Re: Using wget to fill in a form
wget isn't the right tool for that job. However, its brother wput may be
able to do the job.

On Sat, 22 Sep 2012, Gary Dale wrote:
> On 22/09/12 11:27 AM, Gary Dale wrote:
> > On 22/09/12 11:01 AM, craig@gtek.biz wrote:
> > > Greetings,
> > >
> > > I have a small book collection (~150) that I thought would be neat to
> > > catalog by the Library of Congress catalog numbers. I have found a LOC
> > > search form that will allow me to input the ISBN, and it will return the
> > > information I want:
> > >
> > > http://www.loc.gov/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090
> > >
> > >
> > > I have the list of book ISBNs in a text file, so scripting this should be
> > > quite easy. The problem is I can't figure out how to submit the form from
> > > the command line. I figured wget would be the best way, but everything I
> > > try results in downloading a single line that reads "Your form didn't
> > > include an ACTION!" So I thought I would turn to here for help. The test
> > > ISBN I am using is for The Linux Cookbook: 1886411484, QA76.76.O63S788
> > > 2001.
> > >
> > > And a related side question. From my reading, I've learned that the Z39.50
> > > protocol is used to query databases, usually library related. Is anyone
> > > aware of an ISBN database table that can be downloaded by the user,
> > > preferably in a format that can be imported into MySQL or PostgreSQL?
> > >
> > > Thanks, Craig
> > >
> > The url you give is for the form. If you enter an ISBN number it will do the
> > search.
> >
> > What you need to do is capture the HTTP request sent when you click "Submit
> > Query", then replace the test ISBN with whatever number you want to
> > search. Wireshark can do this. Simply look for the query packet(s).
> >
> The fields you need are shown in the page source:
>
> <FORM METHOD="POST" ACTION="/cgi-bin/zgate">
> <INPUT NAME="ACTION" VALUE="SEARCH" TYPE="HIDDEN">
> <INPUT NAME="DBNAME" VALUE="VOYAGER" TYPE="HIDDEN">
> <INPUT NAME="ESNAME" VALUE="B" TYPE="HIDDEN">
> <INPUT NAME="MAXRECORDS" VALUE="20" TYPE="HIDDEN">
> <INPUT NAME="RECSYNTAX" VALUE="1.2.840.10003.5.10" TYPE="HIDDEN">
> <INPUT
> NAME="REINIT" TYPE="HIDDEN" VALUE="/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090">
> <INPUT NAME="srchtype" VALUE="1,1016,2,102,3,3,4,2,5,100,6,1" TYPE="HIDDEN">
>
> <P>
> <STRONG>Enter Search Term(s):</STRONG><br>(The search term can be a single
> word or a phrase from anywhere in the record. Enter an author's name in
> indirect order, i.e., last_name, first_name.)<p>
> <INPUT NAME="TERM_1" SIZE="60">
> <p>
> <INPUT TYPE="SUBMIT" VALUE="Submit Query">
> <INPUT Type="RESET" VALUE="Clear Form">
> <HR>
> Use of this form results in a search of the LC Voyager database (approximately
> 14 million records). This database contains records in all bibliographic
> formats (i.e., books, serials, music, maps, manuscripts, computer files, and
> visual materials), and includes the retrospective, unedited older
> bibliographic records known as the PreMARC File. LC name and subject
> authority records cannot be searched.
> <INPUT NAME="SESSION_ID" VALUE="5923056" TYPE="HIDDEN">
> </FORM>
>
>
> You need to construct the query using those fields with those values, with
> TERM_1 containing the ISBN number.
>
> From the error you are getting, it seems like your query either didn't include
> the SEARCH action or the header wasn't understood.
>
>
>
>
>
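
For what it's worth, here is a rough sketch of how the query described above
could be put together in Python. It only builds the POST body and the request
object and sends nothing. The endpoint URL is my assumption (the form's ACTION
path joined to the host from Craig's URL), the function names are made up for
illustration, and SESSION_ID presumably has to come from a freshly fetched
copy of the form rather than the value quoted here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the form's ACTION path on the host from the original URL.
ZGATE_URL = "http://www.loc.gov/cgi-bin/zgate"

# Hidden fields copied from the quoted page source.
HIDDEN_FIELDS = {
    "ACTION": "SEARCH",
    "DBNAME": "VOYAGER",
    "ESNAME": "B",
    "MAXRECORDS": "20",
    "RECSYNTAX": "1.2.840.10003.5.10",
    "REINIT": "/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT="
              "/prod/www/data/z3950/locils2.html,z3950.loc.gov,7090",
    "srchtype": "1,1016,2,102,3,3,4,2,5,100,6,1",
}


def build_form_data(isbn, session_id):
    """Return the URL-encoded POST body for one ISBN (hypothetical helper)."""
    fields = dict(HIDDEN_FIELDS)
    fields["TERM_1"] = isbn            # the search term goes in TERM_1
    fields["SESSION_ID"] = session_id  # assumed to be taken from a fresh form
    return urlencode(fields)


def build_request(isbn, session_id):
    """Wrap the encoded body in a POST request object without sending it."""
    body = build_form_data(isbn, session_id).encode("ascii")
    return Request(ZGATE_URL, data=body, method="POST")
```

Calling build_form_data("1886411484", "5923056") gives a string that any HTTP
client able to send a POST body should accept. Passing the result of
build_request() to urllib.request.urlopen() would actually submit it, though I
have not tried that against the LOC server.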
---------------------------------------------------------------------------
jude <jdashiel@shellworld.net>
Adobe fiend for failing to Flash