
Bug#743063: ITP: ruby-github-linguist -- detection and highlight of the programming language of source code and ignore binary files



Package: wnpp
Severity: wishlist
Owner: timothee <debian@timotheegirard.com>

* Package name    : ruby-github-linguist
  Version         : 2.10.11
  Upstream Author : GitHub <support@github.com>
* URL             : https://github.com/github/linguist
* License         : MIT
  Programming Lang: Ruby
  Description     : detection and highlight of the programming language of source code and ignore binary files

Library used by GitHub to detect blob languages, highlight code, ignore binary files, suppress generated files in diffs, and generate language breakdown
graphs.

Features :
	- Language detection : ruby-github-linguist defines a list of all languages known to GitHub in a YAML file. For a file to be highlighted, a
language and a lexer must be defined there.

Most languages are detected by their file extension. To disambiguate between files with common extensions, we first apply some common-sense heuristics to
pick out obvious languages. After that, we use a statistical classifier. This process can help us tell the difference between, for example, .h files, which
could be C, C++, or Objective-C.

	- Syntax Highlighting : The actual syntax highlighting is handled by our Pygments wrapper, ruby-pygments.rb. It also provides a Lexer abstraction 
that determines which highlighter should be used on a file.

	- Stats : The Language stats bar that you see on every repository is built by aggregating the languages of each file in that repository. The top 
language in the graph determines the project's primary language.

	- Ignore vendored files : Checking third-party code into your Git repository is a common practice, but it often inflates your project's language stats and may
even cause your project to be labeled as another language. ruby-github-linguist is able to identify some of these files and directories and exclude them.

	- Generated file detection : Not all plain text files are true source files. Generated files such as minified JavaScript and compiled CoffeeScript can be
detected and excluded from language stats. As a bonus, these files are suppressed in diffs.
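
To make the features above more concrete, here is a minimal Ruby sketch of the general idea: pick a language from the file extension, fall back to simple content heuristics for an ambiguous extension such as .h, skip vendored paths, and aggregate the result into per-language stats. It is not the ruby-github-linguist API. Every name in it (EXTENSIONS, VENDORED, detect_language, language_stats, primary_language) is hypothetical, the extension table and patterns are illustrative only, counting bytes per language is an assumption, and the statistical classifier step is replaced by a trivial fallback.

```ruby
# Hypothetical sketch; none of these names come from ruby-github-linguist.

# Extension => candidate languages (a tiny stand-in for the YAML language list).
EXTENSIONS = {
  ".rb" => ["Ruby"],
  ".py" => ["Python"],
  ".c"  => ["C"],
  ".h"  => ["C", "C++", "Objective-C"]
}.freeze

# Paths treated as vendored and left out of the stats (illustrative patterns).
VENDORED = [%r{(^|/)vendor/}, %r{(^|/)node_modules/}].freeze

def vendored?(path)
  VENDORED.any? { |pattern| path.match?(pattern) }
end

# Extension first, then content heuristics when the extension is ambiguous.
# Returns nil when the extension is unknown.
def detect_language(path, content)
  candidates = EXTENSIONS.fetch(File.extname(path).downcase, [])
  return candidates.first if candidates.size <= 1

  # Simple heuristics for ambiguous .h headers.
  return "Objective-C" if content.match?(/^\s*@(interface|implementation|protocol)\b/)
  return "C++" if content.match?(/^\s*(class|namespace|template)\b/)

  # A statistical classifier would run here; this sketch just takes the
  # first candidate instead.
  candidates.first
end

# Sum bytes per language (an assumed metric), skipping vendored and unknown files.
def language_stats(files)
  stats = Hash.new(0)
  files.each do |path, content|
    next if vendored?(path)

    language = detect_language(path, content)
    stats[language] += content.bytesize if language
  end
  stats
end

# The language with the largest share is taken as the primary language.
def primary_language(stats)
  top = stats.max_by { |_language, bytes| bytes }
  top && top.first
end
```

For example, calling language_stats with a hash that maps paths to file contents and passing the result to primary_language gives the top language of that set of files, with anything under vendor/ left out.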

