Automated ad-hoc URL extraction



        I'm looking for a program or some code to help extract URLs from
arbitrary file types. I imagine I could write such a program using bison,
but I'd rather use an existing one to cut down on the research I would
have to do to work out what is and isn't a valid URL.
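A rough starting point, short of a full grammar, is a regular expression scanned over the raw bytes of any file. The sketch below is only an illustration: the pattern (a scheme of http, https or ftp followed by characters that are legal in URLs) and the helper names `extract_urls` and `extract_urls_from_file` are hypothetical, and the pattern is a heuristic rather than a complete RFC 3986 validator.

```python
import re

# Heuristic pattern (an assumption, not a full RFC 3986 grammar):
# a common scheme, "://", then characters that may legally appear in a URL.
URL_RE = re.compile(rb"(?:https?|ftp)://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")

# Punctuation that usually ends the surrounding sentence rather than the URL.
TRAILING = b".,;:!?)'\""


def extract_urls(data: bytes) -> list[str]:
    """Return URLs found in raw bytes, in order of appearance, without duplicates."""
    seen = set()
    result = []
    for match in URL_RE.finditer(data):
        url = match.group().rstrip(TRAILING).decode("ascii", "replace")
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


def extract_urls_from_file(path: str) -> list[str]:
    # Read as bytes so that binary file types do not raise decode errors.
    with open(path, "rb") as f:
        return extract_urls(f.read())
```

A byte-level scan like this will miss URLs stored in compressed containers (a .docx file, for instance, is a ZIP archive of XML parts) or in UTF-16 text, which is one reason format-specific parsers are still worth having.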

        I'm also looking for something to convert relative URLs to
absolute URLs.
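For the relative-to-absolute step, Python's standard `urllib.parse.urljoin` already implements reference resolution against a base URL, so nothing has to be written from scratch. A minimal sketch follows; the helper name `absolutize` is hypothetical, and dropping the `#fragment` part is an optional assumption that is convenient when de-duplicating links.

```python
from urllib.parse import urldefrag, urljoin


def absolutize(base_url: str, links: list[str]) -> list[str]:
    """Resolve each (possibly relative) link against base_url."""
    absolute = []
    for link in links:
        # urljoin handles "../x", "./x", "/x", "?q" and scheme-relative "//host/x".
        full = urljoin(base_url, link.strip())
        # Optional (an assumption): drop "#fragment", which only points inside a page.
        full, _fragment = urldefrag(full)
        absolute.append(full)
    return absolute
```

For example, resolving `../img/a.png` against `http://example.com/docs/page.html` gives `http://example.com/img/a.png`.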

        It would also be useful to have format-specific parsers for files
whose type I can identify. E.g., if I see an HTML file, I'd run it through
an HTML parser, and likewise for XML, MS Office documents, ...
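For the HTML case, the standard library's `html.parser.HTMLParser` is enough to pull URLs out of tag attributes and resolve them in one pass. This is a sketch under stated assumptions: the attribute list `URL_ATTRS` is illustrative rather than exhaustive, the class and function names are hypothetical, and `<base href>` is assumed to appear before the links it should affect.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that commonly carry URLs (illustrative, not exhaustive).
URL_ATTRS = {"href", "src", "action", "background", "cite", "data"}


class LinkExtractor(HTMLParser):
    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Honour <base href="...">; assumed to precede the links it affects.
        if tag == "base":
            for name, value in attrs:
                if name == "href" and value:
                    self.base_url = urljoin(self.base_url, value)
            return
        for name, value in attrs:
            # value is None for valueless attributes such as <input disabled>.
            if name in URL_ATTRS and value:
                self.links.append(urljoin(self.base_url, value.strip()))


def html_links(html_text: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Self-closing tags such as `<img src="..." />` are covered too, because `HTMLParser`'s default `handle_startendtag` calls `handle_starttag`. XML and Office formats would need their own readers along the same lines.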

     Drew Daniels
