
Re: sha256sum --text generating blank spaces and hyphens?



On 4/26/23 00:41, Albretch Mueller wrote:
  This is not a Debian question per se (more like a Linux bash one),
but I wasn't able to find an answer on the Internet.

  Here is first the problem I am having before you start reading a
conspiracy theory into it ;-)

  I need to somehow map URLs on the web to local files, but you can't
do that directly for two main reasons:

  1) URLs are free text
  2) which people use to their heart's content.

  Take for example:

  https://dokumen.pub/qdownload/nietzsche-und-der-deutsche-geist-band-4-ausbreitung-und-wirkung-des-nietzscheschen-werkes-im-deutschen-sprachraum-bis-zum-ende-des-zweiten-weltkrieges-ein-schrifttumsverzeichnis-der-jahre-1867-1945-ergnzungen-berichtigungen-und-gesamtverzeichnisse-zu-den-bnden-i-iii-9783110202861-9783110189865-3110189860.html

  I need to map that file, and the PDF you would download, to a local
directory looking like: ... /pub/dokumen/qdownload/ ...

  but the file name (excluding the extension) is 306 characters long,
which Windows NTFS would not accept (it limits a file name to 255
characters). There may also be funky rules regarding character sets
and where in a string certain chars may be used; so, as a way to work
around those kinds of problems I:

  a) encode the string name as base64
  b) calculate the sha256sum of §a
  c) use §b as file name (of course, leaving the original extension as it is)
  d) include a "§b_file_name.txt" plain text file descriptor whose only
content is the actual pre-hash name of that file.


  # https://dokumen.pub/qdownload/nietzsche-und-der-deutsche-geist-band-4-ausbreitung-und-wirkung-des-nietzscheschen-werkes-im-deutschen-sprachraum-bis-zum-ende-des-zweiten-weltkrieges-ein-schrifttumsverzeichnis-der-jahre-1867-1945-ergnzungen-berichtigungen-und-gesamtverzeichnisse-zu-den-bnden-i-iii-9783110202861-9783110189865-3110189860.html
  _TXT="nietzsche-und-der-deutsche-geist-band-4-ausbreitung-und-wirkung-des-nietzscheschen-werkes-im-deutschen-sprachraum-bis-zum-ende-des-zweiten-weltkrieges-ein-schrifttumsverzeichnis-der-jahre-1867-1945-ergnzungen-berichtigungen-und-gesamtverzeichnisse-zu-den-bnden-i-iii-9783110202861-9783110189865-3110189860"
  _B64TXTENC=$(printf '%s' "${_TXT}" | base64 )
  echo "// __ \$_B64TXTENC: |${_B64TXTENC}|"
  _B64TXTDEC=$(printf '%s' "${_B64TXTENC}" | base64 --decode)
  echo "// __ \$_B64TXTDEC: |${_B64TXTDEC}|"
  if [[ "${_TXT}" == "${_B64TXTDEC}" ]]; then
   echo "// __ [[ \${_TXT} == \${_B64TXTDEC} ]]: |${_TXT}|"
   _SHA256=$(printf '%s' "${_TXT}" | sha256sum --text )
   echo "// __ \$_SHA256: |${_SHA256}|"
  fi

// __ $_SHA256: |7d5895cb24ab49692a8ad495e036074fec8e61b22040544f02a9b69c926dbdeb  -|

  I am trying to avoid funky characters and sha256sum --text still
generates them!?!

  I work like this because I need to replicate the original URL as a
local path in a way that would be compatible with any file system.

  Do you know of a better way to deal with such issues?

  lbrtchx


I will assume you have solved the sha256sum output issue. (I would use Perl and Digest::SHA.)
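
For anyone else who lands on this thread: the two spaces and the hyphen are not part of the hash. sha256sum prints the digest followed by the name of its input, and "-" stands for standard input. Here is a minimal sketch of keeping only the hex digest, assuming bash and GNU coreutils sha256sum. The function name hex_digest is made up for illustration.

```shell
#!/usr/bin/env bash
# hex_digest is a hypothetical helper name, not part of any tool.
# sha256sum prints "<digest>  <name>"; when reading stdin the name is "-".
hex_digest() {
  local out
  out=$(printf '%s' "$1" | sha256sum) || return 1
  # Keep everything before the first space: the 64 hex characters.
  printf '%s\n' "${out%% *}"
}

# Usage: the result contains only [0-9a-f], so it is safe as a file name.
hex_digest "abc"
```

Piping through cut -d' ' -f1 instead of the parameter expansion should give the same result; the expansion just avoids one extra process.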


I suggest hashing the document content rather than the URL. This would work nicely for static documents.
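
A rough sketch of what I mean, again assuming bash and GNU coreutils. The helper name content_name is hypothetical. It takes a file that has already been downloaded and prints a name built from the SHA-256 of its content plus the original extension:

```shell
#!/usr/bin/env bash
# content_name is a hypothetical helper, shown only to illustrate the idea.
# It prints "<sha256-of-content><.ext>" for an existing local file.
content_name() {
  local file=$1 base ext out
  # Read via stdin so the output is always "<digest>  -",
  # whatever characters the file name contains.
  out=$(sha256sum < "$file") || return 1
  base=${file##*/}
  case $base in
    *.*) ext=.${base##*.} ;;  # keep the original extension
    *)   ext= ;;              # no extension to keep
  esac
  printf '%s%s\n' "${out%% *}" "$ext"
}

# Usage (the path is a placeholder):
#   mv -- "$download" "$(dirname -- "$download")/$(content_name "$download")"
```

With this scheme the same document fetched from two different URLs ends up under one name, and a page whose content changes gets a new name on every fetch, which is why it suits static documents best.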


David

