
Re: trying to parse lines from an awkwardly formatted HAR file ...



> Archive.org has a well-documented API at
> https://archive.org/developers/. There's even a command-line tool
> (assuming one doesn't want to use, say, the python library).

I gave their API a fairly thorough reading some time ago and didn't
find much of interest. I was thinking of developing a Java (GraalVM)
API instead, which would be more customizable and easy to reuse for
other text banks. I took a second look, and they still don't address
their own problems: repeated texts (the exact same text/publication
under different identifiers) and non-standardized metadata, e.g.
fr., french, French, fr, … all used to specify the language. Author
names are entered as free text as well. So what is the point of even
having an API when the metadata is neither well defined nor well kept?
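
To make the two complaints concrete, here is a minimal sketch of the
client-side cleanup one ends up writing. Everything in it is an
assumption for illustration: the alias table, the function names, and
the (identifier, text) record shape are hypothetical and not part of
the archive.org API or its Python library.

```python
import hashlib
from collections import defaultdict

# Hypothetical alias table; a real one would need far more entries.
LANGUAGE_ALIASES = {
    "fr": "fr", "fre": "fr", "fra": "fr", "french": "fr", "français": "fr",
    "en": "en", "eng": "en", "english": "en",
}


def normalize_language(label):
    """Map a free-text language label to a two-letter code, or None if unknown."""
    key = label.strip().strip(".").strip().lower()
    return LANGUAGE_ALIASES.get(key)


def text_fingerprint(text):
    """Hash the text after collapsing whitespace and lowercasing it."""
    collapsed = " ".join(text.split()).lower()
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


def group_duplicates(records):
    """Return lists of identifiers whose texts share the same fingerprint.

    `records` is an iterable of (identifier, text) pairs.
    """
    groups = defaultdict(list)
    for identifier, text in records:
        groups[text_fingerprint(text)].append(identifier)
    return [ids for ids in groups.values() if len(ids) > 1]
```

This only catches texts that are identical up to whitespace and case.
Near-duplicates (different scans or OCR runs of the same publication)
would need fuzzier matching, and free-text author names would need a
similar alias or authority-file approach.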

