
Re: trying to parse lines from an awkwardly formatted HAR file ...



Greg Wooledge via lists.debian.org


>Furthermore, whatever method you are using to *create* this HAR file
>is questionable, since apparently you aren't even getting a properly
>formatted file in the end.


>So, putting these together, it looks like you are taking a file that
>was intended to be used for diagnosing browser/network performance
>issues, and attempting to use this in place of a downloadable index
>of documents from archive.org.


Well, the Chromium HAR log utility captured that file as a HAR
formatted one of sorts, describing the client-server back and forth, and
the Linux file utility tells me it is "JSON text data". You may
also go to:


https://archive.org/search?query=Euklid+OR+Euclid+OR+Euclides&and%5B%5D=lending%3A%22is_readable%22


to save that page and tell me what you can do with its content.
This is what I mean by hellishly obfuscated "js cr@p", and I can't
understand why archive.org would do that.
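
Since file reports the capture as JSON, any JSON parser can walk it. Here
is a minimal sketch in Python, assuming the capture follows the usual HAR
layout (log -> entries -> request/response). The function name
har_request_urls and the two-entry sample are made up for illustration;
they are not taken from the real file.

```python
import json


def har_request_urls(har, mime_prefix=None):
    """Return the request URLs of a parsed HAR dict.

    If mime_prefix is given, keep only entries whose response MIME type
    starts with it (e.g. "application/json").
    """
    urls = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url")
        mime = entry.get("response", {}).get("content", {}).get("mimeType", "")
        if url and (mime_prefix is None or mime.startswith(mime_prefix)):
            urls.append(url)
    return urls


# Made-up sample, only to show the shape of a HAR capture.
sample = json.loads("""
{"log": {"version": "1.2", "entries": [
  {"request": {"url": "https://example.org/page"},
   "response": {"content": {"mimeType": "text/html"}}},
  {"request": {"url": "https://example.org/api?q=1"},
   "response": {"content": {"mimeType": "application/json"}}}
]}}
""")

print(har_request_urls(sample, "application/json"))
```

Filtering on the response MIME type is only one possible way to separate
the search-result responses from the page assets in such a capture.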


>Do you have one of these HAR files in a *DIRECTLY DOWNLOADABLE URL*?


The sample JSON file (the HAR file from archive.org) I am using right
now was uploaded to:


https://ergosumus.files.wordpress.com/2024/03/karl_rosenkranz02_ia.har_.odt


date
url="https://ergosumus.files.wordpress.com/2024/03/karl_rosenkranz02_ia.har_.odt";
time wget -q --spider --no-verbose --server-response "${url}"; _wgetq=$?
echo "// __ \$_wgetq: |$_wgetq|"


Sat Mar 23 01:39:17 PM CDT 2024
HTTP/1.1 200 OK
Server: nginx
Date: Sat, 23 Mar 2024 18:38:16 GMT
Content-Type: application/vnd.oasis.opendocument.text
Content-Length: 686303
Connection: keep-alive
Last-Modified: Sat, 23 Mar 2024 17:01:03 GMT
Expires: Thu, 18 Apr 2024 19:04:42 GMT
X-Orig-Src: 01_mogdir
X-nc: MISS mdw 24 np
X-Content-Type-Options: nosniff
Alt-Svc: h3=":443"; ma=86400
Accept-Ranges: bytes

real 0m0.582s
user 0m0.080s
sys 0m0.069s
// __ $_wgetq: |0|

~

$ date
Sat Mar 23 11:59:53 AM CDT 2024

$ ls -l Karl_Rosenkranz02_IA.har.*
-rw-r--r-- 1 user user 686303 Mar 23 11:59 Karl_Rosenkranz02_IA.har.odt
-rw-r--r-- 1 user user 4290474 Mar 21 19:17 Karl_Rosenkranz02_IA.har.txt
-rw-r--r-- 1 user user 686303 Mar 23 11:59 Karl_Rosenkranz02_IA.har.zip

$ file --brief Karl_Rosenkranz02_IA.har.*
Zip archive data, at least v2.0 to extract, compression method=deflate
JSON text data
Zip archive data, at least v2.0 to extract, compression method=deflate

$ file Karl_Rosenkranz02_IA.har.*
Karl_Rosenkranz02_IA.har.odt: Zip archive data, at least v2.0 to extract, compression method=deflate
Karl_Rosenkranz02_IA.har.txt: JSON text data
Karl_Rosenkranz02_IA.har.zip: Zip archive data, at least v2.0 to extract, compression method=deflate

$ sha256sum Karl_Rosenkranz02_IA.har.*
95c2bf849d67b6812193b72fc8504fcab71b49da7937ea8fd9421bee4075ac86  Karl_Rosenkranz02_IA.har.odt
79dd5a23748db1a7270927b6c16fc28cfff59eaf804ba24b2443da578903ede2  Karl_Rosenkranz02_IA.har.txt
95c2bf849d67b6812193b72fc8504fcab71b49da7937ea8fd9421bee4075ac86  Karl_Rosenkranz02_IA.har.zip
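
For anyone who downloads it: the identical sizes and checksums above
suggest the .odt is just the .zip under another extension. A rough Python
sketch for pulling the HAR back out, assuming the archive holds the JSON
as one of its members (I am not stating the member name here, so the
sketch simply takes the first member that parses as JSON with a top-level
"log" key). The function name first_har_member and the in-memory demo
archive are made up for illustration.

```python
import io
import json
import zipfile


def first_har_member(zip_source):
    """Return (member_name, parsed_har) for the first zip member that is
    JSON with a top-level "log" key, or None if there is no such member
    or zip_source is not a zip archive at all.

    zip_source may be a path or a seekable file-like object.
    """
    if not zipfile.is_zipfile(zip_source):
        return None
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            try:
                data = json.loads(zf.read(name).decode("utf-8-sig"))
            except ValueError:
                # Not UTF-8 text or not JSON: skip this member.
                continue
            if isinstance(data, dict) and "log" in data:
                return name, data
    return None


# Demo on a made-up in-memory archive, not the real upload.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("notes.txt", "not json")
    zf.writestr("capture.har", json.dumps({"log": {"entries": []}}))

result = first_har_member(buf)
print(result[0])
```

With the real download you would pass the local file path instead of buf.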

~

or you could:


a) go to: https://en.wikipedia.org/wiki/Karl_Rosenkranz

b) click on: Works by or about Karl Rosenkranz (at Internet Archive)

c) on the archive.org page, select "texts" and "always available"
(meaning texts which are in the public domain)

d) open "More Tools" ... as I explained before (by d.5 I meant that you
may have to scroll down or use key press combinations to "manually"
load all records). In Rosenkranz's case I got 169 texts.

~

>This tells me we're deep inside an X-Y problem. The original goal is
>possibly something like "I want an index of all the books about this
>Greek dude". Maybe start from there, and see what answers you get.


Actually, in order to deX-Y it in case anyone can offer any help, it
is more like "I want an index of all the books which have ever been
written/published" in order to read all of them ;-)


Data registries mind their own extant entries. There is no general,
"orbis unum" registry of all texts ("texts" meant generally, in a
philological, semiological sense: videos, paintings, ...): just the
registry, not the extant data itself. Terribly persuasive, silly me
tried to explain this idea to the archive.org folks and they told me off.

What would that registry be good for? Well, let me use self-serving
metaphors: some time ago people didn't know how many people lived in
their countries or even their cities, where the Nile river started,
what a map of the earth would look like, ... There was a moment in the
history of humankind in which one person could actually have read all
extant literature (at least relating to one culture, say: "natural
philosophy"). Technically it is not so hard: according to Google, some
130 million books have been printed since the invention of the
printing press. Not that many, anyway. The idea of reading them all
seized me when I was little, after reading a one-liner by some Perugian
dude (as cannibalized by me):


"the greatest of all gifts and graces that God has granted us is
the capacity of overcoming oneself".


Now, there you have a thoroughly true statement whose ontological or
physiological/metabolic bearings you don't have the slightest
clue about. How on earth is that even possible! I think he was talking
about the natural, spiritual ability one has to bootstrap oneself,
which no one and nothing can take away from you, as part of one’s own
very existence. It doesn't simply mean that you can go:


grep --help | grep "help"


lbrtchx

