Full paper-to-bibliography toolchain

To: debian-science <debian-science@lists.debian.org>, kanzure@gmail.com
Subject: Full paper-to-bibliography toolchain
From: Bryan Bishop <kanzure@gmail.com>
Date: Sat, 14 Mar 2009 21:41:51 -0500
Message-id: <55ad6af70903141941s6ae5f129q2bc7554b521a87cf@mail.gmail.com>

Hi all,

This email comes about because of the recent thread about bibliography
management. In particular, I've always had my eye out for what sort of
software should (or should not) exist for scientific papers. Some
immediate examples:

AutoScholar
http://heybryan.org/projects/autoscholar/

AutoScholar is a Perl script that takes a paper title, queries
Google Scholar, and fetches a PDF link if one is available, either on
the first page of search results or in the "Get This Article" link.
This truly belongs as a module in the 'surfraw' project more than
anything. Future bugfixes should honestly include automatically
following through to the publisher's website via WWW::Mechanize and
looking for PDF links. PDF links are usually of three types: (1)
direct links, (2) links to a page that refreshes to the PDF, or (3) a
popup with some JavaScript black magic (as in the case of
ScienceDirect), which I don't know how to handle with Perl's
WWW::Mechanize. Any hints?
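
As a rough illustration of link types (1) and (2) above, here is a
minimal sketch in Python rather than Perl, using only the standard
library. It is not how AutoScholar works; the class and function names
are made up for this example, and it assumes that a direct link is any
anchor whose path ends in ".pdf" and that a refresh page uses a
<meta http-equiv="refresh"> tag. Type (3), the JavaScript popup, is
not handled at all.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkFinder(HTMLParser):
    """Collect candidate PDF URLs from one HTML page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.direct = []   # type (1): <a href="...pdf">
        self.refresh = []  # type (2): <meta http-equiv="refresh" ...>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            href = attrs.get("href") or ""
            # Assumption: a PDF link has a path ending in ".pdf",
            # ignoring any query string.
            if href.lower().split("?")[0].endswith(".pdf"):
                self.direct.append(urljoin(self.base_url, href))
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # The content attribute looks like "0; url=/files/paper.pdf".
            content = attrs.get("content") or ""
            _, _, target = content.partition(";")
            target = target.strip()
            if target.lower().startswith("url="):
                target = target[4:].strip("'\" ")
                self.refresh.append(urljoin(self.base_url, target))


def find_pdf_links(html, base_url):
    """Return (direct_links, refresh_targets) as lists of absolute URLs."""
    finder = PdfLinkFinder(base_url)
    finder.feed(html)
    return finder.direct, finder.refresh
```

A refresh target still has to be fetched and checked, since the page
may redirect to something other than a PDF.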

"Call for Papers" (CFP) file format standards: suggestions for a microformat
http://heybryan.org/cfp.html

""""
http://wikicfp.org/ - a wiki for posting CFPs. I've posted a few. Many
of the CFPs that flood my inbox are forwarded (by me) over to the
wikicfp gmail e-mail address, but I know that the poor guy who runs it
isn't keeping up with the CFP emails that I send his way. Also, I know
that there's no automatic way of reading CFPs since they hardly have a
standard format. Yes, there is standardized information that is
contained within each, but not always in the same format. Anyway, CFPs
should be released in a standardized format so that there is metadata,
descriptions, authors/participants/keynote speakers, locations and
addresses, deadlines, email addresses, URLs, BibTeX for previously
published proceedings, and so on. The wikicfp wiki has an interface
for inserting information, but unfortunately it doesn't always capture
all of the information of a CFP since not all CFPs follow the same
three-tiered submission deadline format.

What are the advantages of sending around CFP files? You could process
more of them, and more quickly. You would only have to download an RSS
feed or zip file of CFPs and search for terms that you are interested
in. You could use the calendar/date-time information to import into
your own personal calendar/scheduling system. And it might also be a
good way to keep track of your work on different abstracts, posters,
papers, etc., with respect to deadlines, topics, etc. Perhaps even
through the submission-review-editing-(hopeful)-acceptance process?
""""
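
To make the CFP file idea a little more concrete, here is a minimal
sketch in Python. Nothing in it is an existing standard: the JSON
container, the CallForPapers class, and every field name are
assumptions made up for illustration. Deadlines are stored as
free-form labels so that a CFP is not forced into the three-tiered
layout mentioned above.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import date


@dataclass
class CallForPapers:
    # All field names are hypothetical, chosen only for illustration.
    name: str
    url: str
    location: str
    topics: list = field(default_factory=list)
    # Maps a free-form label ("abstract", "full paper", ...) to an
    # ISO date string such as "2009-06-15".
    deadlines: dict = field(default_factory=dict)

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))


def upcoming(cfps, keyword, today):
    """Return (deadline, label, name) tuples for matching CFPs, soonest first."""
    hits = []
    for cfp in cfps:
        # Case-insensitive substring match against the topic list.
        if not any(keyword.lower() in topic.lower() for topic in cfp.topics):
            continue
        for label, iso in cfp.deadlines.items():
            when = date.fromisoformat(iso)
            if when >= today:
                hits.append((when, label, cfp.name))
    return sorted(hits)
```

With records like these, the "search for terms and import the dates
into a calendar" step becomes a filter over a list, and the resulting
tuples could be handed to whatever scheduling system you use.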

Recently I mentioned the idea of a GreaseMonkey userscript to
complement paper-reading over Google Scholar, here on this mailing
list:
http://lists.debian.org/debian-science/2009/03/msg00046.html

Google has an option in the user preferences page on Google Scholar (
http://scholar.google.com/ ) to show "Export citation" links next to
each paper that it turns up in response to a user's query, including
a BibTeX format. If you're downloading all of these papers, would a
userscript that detects a click and downloads both the PDF and the
citation at the same time be appropriate? Or even better, perhaps
exporting just the citation with the link into a queue for later
processing? (This goes hand-in-hand with my "list of things to improve
Google with", such as "search session management" (to see recent
queries and recent results, instead of going in circles with Google
Trends and Google Search History), which I'll probably never get
around to implementing.)
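
The "queue for later processing" could be as simple as an append-only
file. The following is a minimal sketch in Python, not a userscript:
it assumes that something else (the browser side) hands over the
BibTeX text and the PDF link, and the function names and the
one-JSON-object-per-line layout are my own choices for illustration.

```python
import json


def enqueue(queue_path, bibtex, pdf_url):
    """Append one citation and its PDF link to the queue file."""
    with open(queue_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps({"bibtex": bibtex, "pdf_url": pdf_url}) + "\n")


def drain(queue_path):
    """Return all queued entries in order and empty the queue file."""
    try:
        with open(queue_path, encoding="utf-8") as fh:
            entries = [json.loads(line) for line in fh if line.strip()]
    except FileNotFoundError:
        # Nothing has been queued yet.
        return []
    # Truncate the file so the same entries are not processed twice.
    open(queue_path, "w").close()
    return entries
```

A later batch job would call drain() and then download each pdf_url
at its own pace.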

Btw, speaking of GreaseMonkey, here's a script that will fix
ScienceDirect's naughty popup behavior for showing PDFs:
http://userscripts.org/tags/sciencedirect
http://userscripts.org/scripts/show/41663

There are some browser plugins for Firefox, such as Zotero, which do
in-browser bibliography management.
http://www.zotero.org/

But it's not entirely clear how often Zotero is able to capture both
the bibliographic information and the actual PDF. Anybody know? I'd
like to be able to just impose a standard on all of you: a tar file
with a PDF and a dot bib (BibTeX) file. But alas, this doesn't seem
like it will happen. ;-) I did however have an opportunity once to
impose code on PLoS ONE, but I didn't take advantage of the situation.
Silly me!
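
For what it's worth, the "tar file with a PDF and a dot bib file"
convention is easy to sketch. The Python below is only an
illustration under my own assumptions: gzip compression, both members
sharing one file stem, and the function names bundle_paper and
read_bibtex are all made up here.

```python
import io
import tarfile


def bundle_paper(tar_path, stem, pdf_bytes, bibtex_text):
    """Write <stem>.pdf and <stem>.bib into a gzip-compressed tar file."""
    members = {
        stem + ".pdf": pdf_bytes,
        stem + ".bib": bibtex_text.encode("utf-8"),
    }
    with tarfile.open(tar_path, "w:gz") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # addfile() reads exactly this many bytes
            tar.addfile(info, io.BytesIO(data))


def read_bibtex(tar_path):
    """Return the text of the first .bib member, or None if there is none."""
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.name.endswith(".bib"):
                return tar.extractfile(member).read().decode("utf-8")
    return None
```

Anything that can open a tar file can then pull out the citation
without touching the PDF.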

Another software package I once put a few hours into was something I
foolishly called "Autozen 2008". It was a Perl script for cyclical PDF
viewing: in other words, pages would be flashed up for a few seconds
at a time on one of my many idle monitors. Instead of having a
television blazing around in the background, when I get distracted at
least I'm being distracted by something educational and interesting.
"Huh. I never knew that the wetting properties of acrylic liquids were
inversely proportional to their capillary crawl distance." (or
something) It would also have been interesting to set up a public
repository for a few clients to connect to for a "reading circle" of
sorts, where we all throw in some really interesting papers. Friends
and I are always talking about papers, and it's annoying to copy and
paste links when we know that in general the content is of high
quality. For instance, sites like physorg and KurzweilAI commonly
carry press releases related to some paper published in some journal,
and then I have to hunt through the news release to find any hints as
to a citation, whereas I'm sure I know somebody who has already found
the paper, which is usually more informative than the news article.

So to clear up the rambling I've written above, here are some issues:
(1) downloading PDFs and getting the bibliographical information easily
(2) keeping track of what I have and have not read
(3) keeping track of literature searching and paper-reading
(4) scheduling "papers to be written deadlines" re: CFPs, managing
large volumes of CFPs
(5) somehow integrating the "bibtools" package into all of this
(6) and also somehow integrating emacs' "org-mode"

I feel that the present lack of a system is somewhat broken, and I'd
like to help build a toolchain, but I'd also like to see if any of
these problems strike a chord with any other d-s people. Thoughts?

- Bryan
http://heybryan.org/
1 512 203 0507
