Bug#743063: ITP: ruby-github-linguist -- detection and highlight of the programming language of source code and ignore binary files
Owner: timothee <firstname.lastname@example.org>
* Package name : ruby-github-linguist
Version : 2.10.11
Upstream Author : GitHub <email@example.com>
* URL : https://github.com/github/linguist
* License : MIT
Programming Lang: Ruby
Description : detection and highlighting of the programming language of source code, ignoring binary files
Library used by GitHub to detect blob languages, highlight code, ignore binary files, suppress generated files in diffs, and generate a language breakdown.
- Language detection : ruby-github-linguist defines a list of all languages known to GitHub in a YAML file. For a file to be highlighted, a
language and a lexer must be defined there.
Most languages are detected by their file extension. To disambiguate between files that share a common extension, we first apply some common-sense
heuristics to pick out obvious languages. After that, we use a statistical classifier. This process can help us tell the difference between, for example,
.h files, which could be C, C++, or Objective-C.
- Syntax Highlighting : The actual syntax highlighting is handled by our Pygments wrapper, ruby-pygments.rb. ruby-github-linguist also provides a Lexer
abstraction that determines which highlighter should be used on a file.
- Stats : The language stats bar that you see on every repository is built by aggregating the languages of each file in that repository. The top
language in the graph determines the project's primary language.
- Ignore vendored files : Checking third-party code into your Git repository is a common practice, but it often inflates your project's language stats
and may even cause your project to be labeled as another language. ruby-github-linguist is able to identify some of these files and directories and
exclude them.
- Generated file detection : Not all plain text files are true source files. Generated files such as minified JavaScript and compiled CoffeeScript
can be detected and excluded from language stats. As a bonus, these files are suppressed in diffs.
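
The detection, vendoring, and stats steps described above can be illustrated with a minimal Ruby sketch. This is not the ruby-github-linguist API: every constant and method name below (EXTENSIONS, VENDORED, detect_language, language_stats, and so on) is hypothetical, the extension table stands in for the YAML language list, the regular expressions are made-up heuristics, and the statistical classifier is replaced by a plain fallback to C.

```ruby
# Hypothetical sketch; none of these names come from ruby-github-linguist.

# Extension => candidate languages (a stand-in for the YAML language list).
EXTENSIONS = {
  ".rb" => ["Ruby"],
  ".py" => ["Python"],
  ".c"  => ["C"],
  ".h"  => ["C", "C++", "Objective-C"]
}.freeze

# Path patterns treated as vendored (assumed examples, not the real list).
VENDORED = [%r{(\A|/)vendor/}, %r{(\A|/)node_modules/}].freeze

# Made-up common-sense heuristics for the ambiguous ".h" extension.
def disambiguate_header(source)
  return "Objective-C" if source =~ /^\s*@(interface|implementation|end)\b/
  return "C++" if source =~ /^\s*(class|namespace|template)\b/
  "C" # fallback; a statistical classifier would be consulted at this point
end

# Detect by extension first, then disambiguate only when several
# languages share the extension. Returns nil for unknown extensions.
def detect_language(path, source)
  candidates = EXTENSIONS[File.extname(path)]
  return nil if candidates.nil?
  return candidates.first if candidates.size == 1
  disambiguate_header(source)
end

def vendored?(path)
  VENDORED.any? { |pattern| path =~ pattern }
end

# Aggregate bytes per language over a { path => source } hash, skipping
# vendored paths. Returns [stats, primary_language].
def language_stats(files)
  stats = Hash.new(0)
  files.each do |path, source|
    next if vendored?(path)
    language = detect_language(path, source)
    stats[language] += source.bytesize if language
  end
  primary = stats.max_by { |_, bytes| bytes }
  [stats, primary && primary.first]
end
```

Calling language_stats with a hash of paths and file contents returns the per-language byte counts together with the language holding the most bytes, which plays the role of the project's primary language. Counting bytes rather than files is an assumption made for this sketch.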