
Re: Intent to package UPC barcode generator ; barcode



> I don't want to create bar codes, but I'd like to scan my books/CD's
> into a database. I haven't made it through all my wife's books and she

I have a modest science fiction collection (~400 paperback, ~100
hardcover) so I've looked into much of this already; allow me to
summarize, and suggest...

CDs don't *have* useful barcodes.  Look at CDDB for the one approach
that has been found to work - namely, reading the track start times
and total length, in seconds, from the disc's table of contents and
hashing those.  This even works with "mp3info" output of "cooked"
discs (already ripped to MP3) - one-second resolution is sufficient.
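For the curious, the CDDB/freedb disc ID as it is commonly documented is a digit-sum checksum rather than a cryptographic hash: it combines the track start times and the total playing time, both truncated to whole seconds, with the track count.  A minimal Python sketch, assuming the table-of-contents offsets are already available in CD frames (75 per second); the function names and the two-track disc are made up for illustration:

```python
from typing import List

FRAMES_PER_SECOND = 75  # CD audio frames per second


def digit_sum(n: int) -> int:
    """Sum of the decimal digits of n (CDDB's per-track contribution)."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def cddb_disc_id(track_offsets: List[int], leadout: int) -> str:
    """Return the 8-hex-digit disc ID for a table of contents.

    track_offsets: start frame of each track, in order (the first track
                   usually starts at frame 150, after the 2 s pregap).
    leadout:       frame offset of the lead-out (end of the last track).
    """
    # Only whole seconds are used, which is why 1 s resolution suffices.
    n = sum(digit_sum(off // FRAMES_PER_SECOND) for off in track_offsets)
    total_seconds = (leadout // FRAMES_PER_SECOND
                     - track_offsets[0] // FRAMES_PER_SECOND)
    # Note the mod 0xFF (255), a long-standing quirk of the algorithm.
    disc_id = ((n % 0xFF) << 24) | (total_seconds << 8) | len(track_offsets)
    return f"{disc_id:08x}"


if __name__ == "__main__":
    # Hypothetical two-track disc, not a real CD.
    print(cddb_disc_id([150, 15000], 30000))
```

Because only whole seconds go into the ID, a per-track listing with one-second resolution is, in principle, enough to reproduce it.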

UPC codes specifically are useless for books.  They're too short, and
they're split into a vendor code plus a code that isn't unique within
that vendor.  For example, Baen Books' "Shards of Honor" by Lois
McMaster Bujold has UPC 76714 00599: 76714 is Baen, and 00599 means
"any Baen paperback that lists for $5.99".  There is an extension
code, 72087 on this copy, made up of some of the digits of the ISBN;
that makes the code unique among Baen's actual books, but it does
*not* identify the book by itself.
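To make that structure concrete, here is a rough sketch using the Baen example.  Reading the item code as a price in cents, and taking the add-on as ISBN digits 5-9, are assumptions that happen to fit this one copy, not a documented rule; the function names are hypothetical:

```python
from typing import Tuple

# Hypothetical helpers for the Baen example above.  The "price in cents"
# reading and the "ISBN digits 5-9" add-on rule are assumptions that fit
# this one copy, not a documented standard.


def split_price_upc(printed: str) -> Tuple[str, str]:
    """Split the printed UPC digits ('76714 00599') into (vendor, item)."""
    digits = "".join(ch for ch in printed if ch.isdigit())
    if len(digits) != 10:
        raise ValueError("expected 5 vendor digits + 5 item digits")
    return digits[:5], digits[5:]


def item_code_cents(item: str) -> int:
    """Assumed: the item code is simply the list price in cents."""
    return int(item)


def addon_from_isbn10(isbn10: str) -> str:
    """Assumed: the add-on is ISBN digits 5-9 (1-based), as on this copy."""
    digits = isbn10.replace("-", "")
    return digits[4:9]


if __name__ == "__main__":
    vendor, item = split_price_upc("76714 00599")
    print(vendor, item_code_cents(item), addon_from_isbn10("0-671-72087-2"))
```

Every Baen paperback at $5.99 yields the same (vendor, item) pair, which is exactly why the UPC alone can't tell you which book you're holding.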

What you *really* want is the EAN (which is set to supplant UPC even in
the US within the next 3 years or so).  The EAN is a longer code; on
mass-market paperbacks whose back cover still carries a UPC, you'll
find the EAN inside the front cover.  Over the next couple of years the
EAN will move to primary placement on the back, sometimes alongside a
UPC, but more often replacing it altogether (look at any XML book for
an example; they're all new enough to have only an EAN).

The *really* cool bit about EAN is the content (the encoding in ink is
the same as UPC, just longer): the aforementioned book is
9780671720872, with an extra field of 50599.  You see, the first three
digits of the EAN are the country code - and 978 is a very special
country, called "Bookland" (really, I'm not kidding; search the web for
it if you don't believe me).  In a "Bookland EAN" the 978 is followed
directly by the ISBN: 978-067172087-2 corresponds to ISBN 067172087-2.
The last digit of each is a checksum; the two algorithms differ, and it
is a coincidence that they match for this example.  Book EANs also
carry a five-digit add-on that encodes the list price: 5-0599 is
US$5.99 (the leading 5 means US dollars), and so on.
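The Bookland-to-ISBN conversion and both check digits can be sketched like this, using the standard EAN-13 rule (weights 1,3,1,3,... mod 10) and the ISBN-10 rule (weights 10 down to 2, mod 11).  Only the US-dollar add-on is decoded; the other leading currency digits are deliberately left out, and the function names are made up:

```python
from typing import Optional

# Sketch: Bookland EAN-13 -> ISBN-10, plus the 5-digit price add-on.
# Only the leading '5' (US dollars) currency digit is decoded here.


def ean13_check_digit(first12: str) -> str:
    """EAN-13: weights 1,3,1,3,... over the first 12 digits, mod 10."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_check_digit(first9: str) -> str:
    """ISBN-10: weights 10 down to 2, mod 11; a value of 10 is written 'X'."""
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def bookland_to_isbn10(ean: str) -> str:
    """Validate a 978 EAN-13 and rebuild the ISBN-10 it carries."""
    if len(ean) != 13 or not ean.isdigit() or not ean.startswith("978"):
        raise ValueError("not a 13-digit Bookland (978) EAN")
    if ean13_check_digit(ean[:12]) != ean[12]:
        raise ValueError("bad EAN-13 check digit")
    core = ean[3:12]  # the nine ISBN digits, without the ISBN check digit
    return core + isbn10_check_digit(core)


def decode_price_addon(addon: str) -> Optional[str]:
    """'50599' -> 'US$5.99'; other currency digits return None here."""
    if len(addon) == 5 and addon.isdigit() and addon[0] == "5":
        cents = int(addon[1:])
        return f"US${cents // 100}.{cents % 100:02d}"
    return None


if __name__ == "__main__":
    print(bookland_to_isbn10("9780671720872"), decode_price_addon("50599"))
```

For the book above this gives ISBN 0671720872 and US$5.99, and both check digits independently come out as 2, which is the coincidence mentioned.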

> Would a database with the codes for books be free/affordable/expensive?

And there's the tricky part.  I have some tools I've written which get
pointers to other libraries from the Library of Congress Z39.50 index
page.  I selected 10 random science fiction titles from my collection,
and hit 160 libraries - no more than 20 libraries got hits for *any*
of them; I think the most hits from a single library was 6 out of 10.
The problem seems to be that libraries only have "cards" for books
that they actually *own*.  I'm not sure why the LoC itself isn't
better, but last I checked they were still only running the production
server during business hours and shutting it down at night.

There are sources for raw MARC card catalog data; apparently the data
sets *start* in the US$20,000 range - so I'm planning to
	1) slurp what individual "cards" I can from the libraries out there
	2) edit and republish them (there is a *lot* of variation in
	   quality in the cards I have gotten)
	3) write and publish software to manage it all (with a heavy
	   XML and web slant, but a pilot app is also a minimal requirement :-)

Barcode scanning itself is the easy part: the "keyboard wedge" model
seems to have basically taken over the industry.  Any barcode device
above the "parts" level will speak either ASCII serial or PC keyboard,
and the latter is the most common.
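Since a wedge scanner just "types", reading scans needs nothing more than reading lines.  A sketch, assuming the scanner is set to send Enter after each code and, optionally, to append the 5-digit add-on directly to the main code; the length-based classification is my assumption about that configuration, not something every scanner does, and the names are hypothetical:

```python
import io
from typing import Iterable, Iterator, Tuple

# Sketch only: assumes the wedge scanner sends Enter after each code, and
# that it may be configured to append the 5-digit add-on to the main code.

KINDS = {13: "EAN-13", 12: "UPC-A", 5: "add-on"}


def classify_scan(line: str) -> Tuple[str, str, str]:
    """Return (kind, main code, add-on) for one scanned line."""
    code = line.strip()
    if not code.isdigit():
        return ("unknown", code, "")
    if len(code) in KINDS:
        return (KINDS[len(code)], code, "")
    if len(code) == 18:  # EAN-13 with the add-on appended
        return ("EAN-13+5", code[:13], code[13:])
    if len(code) == 17:  # UPC-A with the add-on appended
        return ("UPC-A+5", code[:12], code[12:])
    return ("unknown", code, "")


def scans(stream: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Classify each non-blank line; pass sys.stdin when a scanner is attached."""
    for line in stream:
        if line.strip():
            yield classify_scan(line)


if __name__ == "__main__":
    # Simulated keyboard input instead of a real scanner.
    for scan in scans(io.StringIO("978067172087250599\n72087\n")):
        print(scan)
```

No driver is involved at all: from the program's point of view the scanner is indistinguishable from someone typing digits very quickly.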

I'm actually taking the approach of scanning the EAN and the UPC
together; since the extended UPC (UPC plus extension code) is still
*unique*, I can use it to find my records without opening the book
when signing books out to friends...
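A minimal sketch of that lending lookup: when a book is first catalogued, scan both codes and store the ISBN (from the EAN) under the extended UPC; later, the back-cover scan alone finds it.  The class and method names are hypothetical, and the UPC string in the example is just the printed digits from the Baen copy, not a full scanned UPC-A:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical in-memory index: (UPC, 5-digit add-on) -> ISBN.
# Relies on the extended UPC being unique per title, as described above.


@dataclass
class Shelf:
    by_extended_upc: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def catalogue(self, upc: str, addon: str, isbn: str) -> None:
        """Record a book once, when both the UPC and the EAN are scanned."""
        key = (upc, addon)
        existing = self.by_extended_upc.get(key)
        if existing is not None and existing != isbn:
            # Two titles sharing an extended UPC would break the scheme.
            raise ValueError(f"extended UPC {key} already maps to {existing}")
        self.by_extended_upc[key] = isbn

    def lookup(self, upc: str, addon: str) -> Optional[str]:
        """Find the ISBN later from the back-cover scan alone."""
        return self.by_extended_upc.get((upc, addon))


if __name__ == "__main__":
    shelf = Shelf()
    # Printed digits from the Baen copy; a real scan would be a full UPC-A.
    shelf.catalogue("7671400599", "72087", "0671720872")
    print(shelf.lookup("7671400599", "72087"))
```

The collision check is there because the whole approach only works if no two books on the shelf share both the UPC and the extension code.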

Anyone interested in this is welcome to contact me directly; there
isn't much relevance to Debian until I get an implementation started.
(That said, if there's a good XML interface to any of the free *SQL
databases in Debian, that would be good to know.  Also, if there's
actually enough *pure Debian* support for Java - by which I mean "stuff
I can apt-get install, without contrib/non-free" - that it becomes a
viable development option, I'd like to see that discussed too, though
debian-java is a better list for it.)

			_Mark_ <eichin@thok.org>
			The Herd of Kittens
			Debian Package Maintainer

