
Re: sha256sum --text generating blank spaces and hyphens?



On 4/26/23, David Wright <deblis@lionunicorn.co.uk> wrote:
> I guess you need the expense of sha256 rather than md5 as you're
> downloading the entire web?

 I am not downloading the entire web. I have no way of knowing how
they arrived at the figure, but I think we could use Google's
estimate that approximately one and a half million books have ever
been published. Think of it! It is not that much data. It would all
fit nicely on one hard drive; add some search capability and
"bye bye google" will be the name of your movie. At times you need
to get a sense of things before going into exposed mode to search
for something (which these days means making sure you are not being
baited into something else).

On 4/26/23, Dan Ritter <dsr@randomstring.org> wrote:
> The only characters used in the sha256 hash itself are [a-f] and
> [0-9]

 Yes, I knew that; that is why I could not understand why sha256sum
was being "courteous" to me.

On 4/26/23, Nicolas George <george@nsup.org> wrote:
> shaXsum always writes X/4 hexadecimal nibbles then two spaces then the
> file name. If the input is from stdin, then the convention is the file
> name is ‘-’.
>
> (Well, not always always: if the file name contains very special
> characters, it will use an escaped output format. And there is the -z
> option.)

On 4/26/23, Thomas Schmitt <scdbackup@gmx.net> wrote:
> "FILE" is the minus-sign for standard input. The second blank is there
> to indicate the text mode of sha256sum.
> Only the first blank is somewhat puzzling. But it's always there.
>
>
> https://www.gnu.org/software/coreutils/manual/html_node/sha2-utilities#sha2-utilities
> points to
>
> https://www.gnu.org/software/coreutils/manual/html_node/md5sum-invocation.html
> which says
>   For each file, ‘md5sum’ outputs by default, the MD5 checksum, a space,
>   a flag indicating binary or text input mode, and the file name. Binary
>   mode is indicated with ‘*’, text mode with ‘ ’ (space). Binary mode is
>   the default on systems where it’s significant, otherwise text mode is
>   the default. The cksum command always uses binary mode and a ‘ ’
>   (space) flag.
>
> So the first blank can be relied on and thus the proposal by Andy Smith
> to use "awk '{print $1}'" is valid.

 OK, now I see why cutting the string at the first space is safe.
I never saw such cases because I had always used sha*sum on files.
I expected that if a user feeds in a string via printf, the digest
would be all there was to it. Of course, sha*sum can tell a file
apart from a text string.
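
 To make the point above concrete, here is a minimal sketch, assuming
GNU coreutils sha256sum and a POSIX shell. The function name
hash_string is made up for this example. It feeds the string through
stdin and keeps only the first field, which drops the separating
space, the mode flag and the "-" that stands in for the file name.

```shell
#!/bin/sh
# hash_string is a hypothetical helper name, not part of coreutils.
# printf '%s' is used so that no trailing newline is added to the input.
hash_string() {
    printf '%s' "$1" | sha256sum | awk '{print $1}'
}

# Raw output for comparison: digest, a space, the mode flag, then "-" for stdin.
printf '%s' 'abc' | sha256sum

# Only the 64 hexadecimal digits.
hash_string 'abc'
# -> ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

 As far as I understand the coreutils manual, the escaped output
format only applies to file names containing a backslash or newline,
so it should not come into play for stdin input.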

On 4/26/23, Jeffrey Walton <noloader@gmail.com> wrote:
> There's no guarantee a URL will map onto a filesystem.

> I seem to
> recall Stunnel tried to do that in a caching mode, but it had weird
> corner cases. (In addition to problems with filesystems that had
> character set and path limitations).

 Well, no; and I am fine with:
 a) matching the URL path to the file system path as closely as possible
 b) the extra juggling of base64-ing and hashing the name of the file ...
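
 One possible reading of b), as a rough sketch only: hash the URL to
get a fixed-length, file-system-safe name, and keep a Base64URL copy
of the URL so the original can be recovered. The helper names
url_to_name and url_to_b64 are made up for this example, SHA-256 is
used simply because it is the tool under discussion, and a POSIX
shell with sha256sum, base64, tr and awk is assumed.

```shell
#!/bin/sh
# url_to_name (hypothetical helper): SHA-256 of the URL as 64 hex digits,
# usable as a file name regardless of the characters in the URL.
url_to_name() {
    printf '%s' "$1" | sha256sum | awk '{print $1}'
}

# url_to_b64 (hypothetical helper): Base64URL form of the URL itself.
# Line wraps and '=' padding are removed, then '+' and '/' are mapped
# to '-' and '_' so that no path separator appears in the result.
url_to_b64() {
    printf '%s' "$1" | base64 | tr -d '\n=' | tr '+/' '-_'
}

# The URL below is a placeholder for illustration.
url='https://example.org/some/path?q=a b'
printf '%s\t%s\n' "$(url_to_name "$url")" "$(url_to_b64 "$url")"
```

 The Base64URL form grows with the URL and could exceed the file name
length limit of a file system (commonly 255 bytes), so in this sketch
it would go into an index file rather than being used as the name.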

 Something I have learned as a corpora research kind of guy is not to
ever try to "educate" people. I would just take their sh!t as they
dump it and cleanse, deal with it!

 You would not hear the end of it if I start telling stories of the
kind of cr@p you find out there when you look at the web from that
point of view.

> I think your best bet is to digest the URL into a representation. I
> suggest using SipHash+Base64 or Base64URL. SipHash provides collision
> resistance, a uniform distribution, and it's fast. SipHash has a very
> good pedigree since it was designed by Jean-Philippe Aumasson and
> Daniel J. Bernstein. The final Base64 or Base64URL encoding ensures
> you stay within printable character range without reserved file system
> characters.

 Thank you, I will look into what they did when I get a chance,

 lbrtchx


On 4/26/23, Albretch Mueller <lbrtchx@gmail.com> wrote:
> On 4/26/23, David Wright <deblis@lionunicorn.co.uk> wrote:
>> I guess you need the expense of sha256 rather than md5 as you're
>> downloading the entire web?
>
>  I am not downloading the entire web. I have no way of knowing how
> they entertained those ideations but I think we could use their
> estimate when they said that approximately one and a half million
> books have ever been published. Think of it! It is not that much data.
> It would all fit nicely on one hard drive; add some search
> capability and "bye bye google" will be the name of your movie.
>
> On 4/26/23, Dan Ritter <dsr@randomstring.org> wrote:
>> The only characters used in the sha256 hash itself are [a-f] and
>> [0-9]
>
>  Yes, I knew that; that is why I could not understand why sha256sum
> was being "courteous" to me.
>
> On 4/26/23, Nicolas George <george@nsup.org> wrote:
>> shaXsum always writes X/4 hexadecimal nibbles then two spaces then the
>> file name. If the input is from stdin, then the convention is the file
>> name is ‘-’.
>>
>> (Well, not always always: if the file name contains very special
>> characters, it will use an escaped output format. And there is the -z
>> option.)
>
> On 4/26/23, Thomas Schmitt <scdbackup@gmx.net> wrote:
>> "FILE" is the minus-sign for standard input. The second blank is there
>> to indicate the text mode of sha256sum.
>> Only the first blank is somewhat puzzling. But it's always there.
>>
>>
>> https://www.gnu.org/software/coreutils/manual/html_node/sha2-utilities#sha2-utilities
>> points to
>>
>> https://www.gnu.org/software/coreutils/manual/html_node/md5sum-invocation.html
>> which says
>>   For each file, ‘md5sum’ outputs by default, the MD5 checksum, a space,
>>   a flag indicating binary or text input mode, and the file name. Binary
>>   mode is indicated with ‘*’, text mode with ‘ ’ (space). Binary mode is
>>   the default on systems where it’s significant, otherwise text mode is
>>   the default. The cksum command always uses binary mode and a ‘ ’
>>   (space) flag.
>>
>> So the first blank can be relied on and thus the proposal by Andy Smith
>> to use "awk '{print $1}'" is valid.
>
>  OK, now I see why cutting off the string on the first space that
> appears is safe. I never saw such cases because I always used sha*sums
> on files. I would expect if a user enters a string via printf that was
> all there was to it. Of course, sha*sums can tell apart a file from a
> plain-text string.
>
> On 4/26/23, Jeffrey Walton <noloader@gmail.com> wrote:
>> There's no guarantee a URL will map onto a filesystem.
>
>> I seem to
>> recall Stunnel tried to do that in a caching mode, but it had weird
>> corner cases. (In addition to problems with filesystems that had
>> character set and path limitations).
>
>  Well, no; and I am fine with:
>  a) trying to best match both; the URL path as best as possible
>  b) the extra juggling of base64-ing and hashing the name of the file ...
>
>  Something I have learned as a corpora research kind of guy is not to
> ever try to "educate" people. I would just take their sh!t as they
> dump it and cleanse, deal with it!
>
>  You would not hear the end of it if I start telling stories of the
> kind of cr@p you find out there when you look at the web from that
> point of view: from folks at archive.org who would list: "Henry
> Valentine Miller", "Henry V. Miller", "Henry Miller", "henry miller",
> "Miller, Henry", "Miller, Henry 12-1891 06-1980" apparently as
> different authors/"creators", to the gutenberg.org large text bank
> including some protagonistic bs in the actual texts, to developers of
> libreoffice watermarking text with some cr@p which of course is being
> used for "monitoring" purposes by the kinds of folks who put
> "intelligence" in the names of the organizations they work for and to
> make sure they are making sense they put flags around them when they
> fart through their mouths whatever nonsense they think of.
>
>  I had had rehearsing day dreams about becoming a dictator of the
> world ;-) and making people do "the right thing" (tm) ... until I had
> once an epiphany while watching Trump talk to a media prestitude who
> characteristically wasn't making much sense. After asking a few
> questions trying to make sense of what she was saying, prestitude said
> "let me formulate it better". Trump quietly sat back saying: "OK, take
> your time"!!!
>
>  I was amazed! There you have someone the U.S. media, who as a mouth
> piece of the status quo, were being viscerally offensive towards
> anything relating to him, including posting on the front page of
> mainstream US news papers naked pictures of his wife and mother of his
> child one month before she became "the first lady" and he took it
> easy, respectfully on her! That was the best case I have noticed so
> far of "separating the message from the messenger". I mean people who
> erect all those pay walls and somehow see themselves as authoring,
> guarding content are not even the messengers and we all have to put up
> with their bs.
>
>> I think your best bet is to digest the URL into a representation. I
>> suggest using SipHash+Base64 or Base64URL. SipHash provides collision
>> resistance, a uniform distribution, and it's fast. SipHash has a very
>> good pedigree since it was designed by Jean-Philippe Aumasson and
>> Daniel J. Bernstein. The final Base64 or Base64URL encoding ensures
>> you stay within printable character range without reserved file system
>> characters.
>
>  Thank you, I will look into what they did when I get a chance,
>
>  lbrtchx
>

