Re: sha256sum --text generating blank spaces and hyphens?
On 27/04/2023 11:02, David Christensen wrote:
Things get more interesting when you approach the problem as a database.
Save the content wherever and put the metadata into a table -- content
hash (primary key), URL, download timestamp, author, subject, title,
keywords, etc. Create fully inverted indexes. Create a search engine.
Create a spider. Implementation could range from a CSV/TSV flat-file
and shell/P* scripts, to a desktop database/UI, to a LAMP stack, and
beyond (NoSQL, N-tier). There are distributed file sharing systems
based on such ideas.
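
For illustration only, here is a minimal sketch of the metadata table described above, using Python's standard sqlite3 and hashlib modules. The table name "pages", the column names, and the helper functions (content_hash, open_catalog, add_page, find_by_keyword) are all made up for this example and are not taken from any existing tool. The keyword lookup is a plain LIKE scan, not a real inverted index.

```python
import hashlib
import sqlite3
import time


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the saved content."""
    return hashlib.sha256(data).hexdigest()


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the catalog; the schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pages (
               hash       TEXT PRIMARY KEY,  -- content hash
               url        TEXT NOT NULL,
               downloaded REAL NOT NULL,     -- Unix timestamp
               author     TEXT,
               subject    TEXT,
               title      TEXT,
               keywords   TEXT               -- space-separated
           )"""
    )
    return conn


def add_page(conn, data, url, title=None, keywords=None):
    """Record metadata for one saved page; identical content is stored once."""
    digest = content_hash(data)
    conn.execute(
        "INSERT OR IGNORE INTO pages (hash, url, downloaded, title, keywords) "
        "VALUES (?, ?, ?, ?, ?)",
        (digest, url, time.time(), title, keywords),
    )
    conn.commit()
    return digest


def find_by_keyword(conn, word):
    """Return URLs whose keywords contain the word (simple substring scan)."""
    rows = conn.execute(
        "SELECT url FROM pages WHERE keywords LIKE ? ORDER BY url",
        (f"%{word}%",),
    )
    return [row[0] for row in rows]
```

Because the hash is the primary key, saving the same content twice (even from a different URL) leaves a single row. Whether that is desirable depends on whether you want to track every URL at which a document was seen.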
I have never tried it, but there is "Open-source self-hosted web archiving":
https://github.com/ArchiveBox/ArchiveBox
This one allows saving a selected part of a page:
https://github.com/danny0838/webscrapbook/