
Re: Bug#722156: ITP: binaryornot -- Ultra-lightweight pure Python package to check if a file is binary or text.

* Julian Taylor <jtaylor.debian@googlemail.com>, 2013-09-08, 17:04:
* URL             : https://github.com/audreyr/binaryornot
* License         : BSD
 Programming Lang: Python
 Description     : Ultra-lightweight pure Python package to check if a file is binary or text.

This Python package provides a function to check if a file is a text file or a binary file. It uses the same heuristic as file(1) by looking at the first 1024 bytes of the file and checks that all characters are printable.

do we need a package for that?

I wouldn't mind a library that does only one little thing, if it did it right, was well-documented and came with a decent test suite. Unfortunately, binaryornot is currently not like that. Its bug density is rather high:

PY3 = sys.version > '3'

Eww, the sys.version_info tuple should be used for comparisons instead.
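
To illustrate, here is a minimal sketch of the tuple-based check (my suggestion, not anything upstream ships). The variable names "lexicographic" and "numeric" are made up for the illustration; they only show why comparing version strings is fragile:

```python
import sys

# Compare the structured version tuple, not the free-form version string.
PY3 = sys.version_info >= (3,)

# Strings compare character by character, so a hypothetical "10.0"
# would sort before "3"; tuples compare numerically.
lexicographic = '10.0' > '3'   # False
numeric = (10, 0) > (3,)       # True
```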

def unicode_open(filename, *args, **kwargs):
   Opens a file as usual on Python 3, and with UTF-8 encoding on Python 2.

So it uses locale encoding in Python 3, but UTF-8 in Python 2. Why such inconsistency? Also, this function isn't used anywhere...
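
If the function were to be kept, something along these lines would at least behave the same on both major versions. This is only a sketch: the explicit "mode" and "encoding" parameters and the UTF-8 default are my assumptions, not upstream's signature.

```python
import io


def unicode_open(filename, mode='r', encoding='utf-8', **kwargs):
    """Open a text file with the same explicit encoding on Python 2 and 3."""
    # io.open is available on Python 2.6+ and is the same function as the
    # builtin open on Python 3, so the encoding no longer depends on the
    # interpreter version or the locale.
    return io.open(filename, mode, encoding=encoding, **kwargs)
```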

def get_starting_chunk(filename):
   :param filename: File to open and get the first little chunk of.
   :returns: Starting chunk of bytes.
   with open(filename, 'r') as f:
       chunk = f.read(1024)
       return chunk

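The docstring says it returns "bytes", but in Python 3 it returns a Unicode string, because the file is opened in text mode ('r'). On an actual binary file, the read may even fail with UnicodeDecodeError, depending on the locale encoding.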
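
A sketch of what I would expect instead, assuming the intent really is to return raw bytes. The "length" parameter is hypothetical, added only so that the 1024 is not hard-coded:

```python
def get_starting_chunk(filename, length=1024):
    """Return the first `length` bytes of the file as a bytes object."""
    # Binary mode ('rb') yields bytes on Python 3 and a byte string (str)
    # on Python 2, and never tries to decode the contents.
    with open(filename, 'rb') as f:
        return f.read(length)
```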

def is_binary_string(bytes_to_check):
   :param bytes: A chunk of bytes to check.

The parameter's name is "bytes_to_check", not "bytes".

   textchars = ''.join(
       map(chr, [7, 8, 9, 10, 12, 13, 27] + range(0x20, 0x100)))

In Python 3, this raises TypeError: range() no longer returns a list, and a list cannot be concatenated with a range object.
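
One way to build the same set of "text" bytes so that it works on Python 3 (and is intended to work on Python 2 as well) is sketched below. The constant name TEXTCHARS and the translate-based check are my guess at what the function is meant to do, not code quoted from upstream:

```python
# Bytes treated as "text": a few control characters plus 0x20..0xFF.
# list(range(...)) makes the concatenation valid on Python 3, and going
# through bytearray gives a byte string on both Python 2 and Python 3.
TEXTCHARS = bytes(bytearray([7, 8, 9, 10, 12, 13, 27] + list(range(0x20, 0x100))))


def is_binary_string(bytes_to_check):
    """Return True if the chunk contains any byte outside TEXTCHARS."""
    # translate(None, delete) removes every byte listed in `delete`;
    # whatever is left over is a non-text byte.
    return bool(bytes_to_check.translate(None, TEXTCHARS))
```

With this, is_binary_string(b'hello\n') is false, while a chunk containing a NUL byte is reported as binary.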

def is_binary_alt(filename):
   :param filename: File to check.
   :returns: True if it's a binary file, otherwise False.

How is is_binary_alt() different from is_binary()? They have identical docstrings.

       chunk = get_starting_chunk(filename)
       if not PY3:
           return is_binary_string(chunk)

There's no "else" branch...

Jakub Wilk
