
Re: One-line password generator



On Fri, Sep 01, 2017 at 09:38:14PM -0500, Mario Castelán Castro wrote:
> On 01/09/17 18:43, Zenaan Harkness wrote:
> > (Probably obvious, but as long as you're reading from urandom,
> > "entropy" is the wrong word, in this context, better to say "128 bits
> > of cryptographically secure numbers" as that which has been said e.g.
> > by the Linux kernel urandom developers as being "cryptographically
> > secure" has changed a few times, and may change again in the future -
> > if it truly were entropy (as /dev/random suggests it provides), the
> > ongoing changes for "security" would not be necessary.)
> 
> No. Entropy is the appropriate word. Please recall that “entropy” is
> just a different scale

Use of the word "scale" is one example of the things that lead
people to use loose terms like "stretching of entropy". Such terms,
though useful in certain contexts, not only readily give rise to
imprecise comprehension in the mind of someone who has no robust
definition of the term, but are mathematically bogus on the face of
it, unless one gets really precise in each and every definition of
every term in one's "turtles on turtles" stack of terms.

We humans are in general woefully untrained in axiomatic
communication and thus abundant confusions and misunderstandings
arise (and I'm no less guilty of misunderstandings than anyone else -
this is in the nature of human communication).


> for probability and quantities comparable to
> probability (like expected probability). Nothing more, nothing less.

https://en.wikipedia.org/wiki/Entropy_(information_theory)

"Information entropy is defined as the average amount of information
produced by a probabilistic stochastic source of data."

(See also for disambiguation:
https://en.wikipedia.org/wiki/Entropy_(disambiguation)#Information_theory_and_mathematics
)

Now let's go to that first link's second sentence:
"The measure of information entropy associated with each possible
data value is the negative logarithm of the probability mass function
for the value."

I am not mathematically literate enough to even properly parse that
sentence!
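
Though, trying to unpack it anyway: each possible value v with
probability p(v) contributes -log2(p(v)) bits, and the "entropy" is
just the average of that over the whole distribution. A quick sketch
in Python (my own illustration, nothing to do with the kernel):

    import math

    def shannon_entropy(probs):
        """Average of -log2(p) over a probability distribution."""
        return sum(-p * math.log2(p) for p in probs if p > 0)

    print(shannon_entropy([0.5, 0.5]))        # fair coin    -> 1.0 bit
    print(shannon_entropy([0.99, 0.01]))      # biased coin  -> ~0.08 bits
    print(16 * shannon_entropy([1/256] * 256))  # 16 uniform bytes -> 128.0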

The last two sentences of that first paragraph sound a little more
comprehensible/promising:

"Generally, entropy refers to disorder or uncertainty, and the
definition of entropy used in information theory is directly
analogous to the definition used in statistical thermodynamics. The
concept of information entropy was introduced by Claude Shannon in
his 1948 paper "A Mathematical Theory of Communication"."

From my naive comprehension of what I read here, the term "entropy
stretching" kind of makes sense - there's a statistical "amount" of
randomness, and that randomness is "spread" over the Linux kernel's
"entropy pool" by the mixing function (ChaCha or something these
days). So there may be only 1 bit of entropy fed into that pool in,
say, a 5-minute period, which could make it easier for an attacker
to reverse-calculate a primary key generated some minutes ago if,
say, he is able to suck out a high rate of numbers directly from the
kernel's /dev/urandom "output source".

Yet as soon as one more bit is fed ("randomly" by the mixing
function) into that pool about 5 minutes later (and further bits
each 5 minutes thereafter), reverse-calculating what some
key-generating program may have been delivered by /dev/urandom
~5 minutes ago steadily becomes exponentially more difficult - even
IF you were able to extract a high rate of output from /dev/urandom.
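
To make that hand-waving a little more concrete, here is a toy model
of why periodic reseeding matters (my own sketch - NOT the kernel's
actual algorithm; SHA-256 stands in for the real mixing and output
functions): once even a little fresh input is mixed into the state,
outputs produced after the reseed no longer let an attacker work
backwards to outputs produced before it without also guessing the
fresh input.

    import hashlib, os

    class ToyReseededGenerator:
        """Toy CSPRNG for illustration only - not Linux's random.c."""
        def __init__(self, seed: bytes):
            self.state = hashlib.sha256(seed).digest()

        def reseed(self, fresh: bytes):
            # Mix fresh input into the state; an attacker who later
            # learns the state must also guess `fresh` to rewind.
            self.state = hashlib.sha256(self.state + fresh).digest()

        def read(self, n: int) -> bytes:
            out = b""
            while len(out) < n:
                out += hashlib.sha256(self.state + b"out").digest()
                # Ratchet the state forward so past outputs cannot be
                # recomputed from knowledge of the current state.
                self.state = hashlib.sha256(self.state + b"next").digest()
            return out[:n]

    g = ToyReseededGenerator(os.urandom(32))
    key_material = g.read(16)   # the "key generated some minutes ago"
    g.reseed(os.urandom(1))     # a little fresh entropy arrives later
    later_output = g.read(16)   # what the hypothetical attacker sees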

On this basis of "understanding", I can see why Ted Ts'o appears to
be saying "just use /dev/urandom, even when you're generating
primary keys for highly important data" - there's enough "entropy
spread 'randomly' throughout the kernel's 'entropy' pool" that
/dev/urandom is nowhere near the weakest link in your security
(software) stack, let alone your hardware stack with e.g. Intel RME
etc.
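
To tie this back to the thread subject: drawing 128 bits from the
kernel CSPRNG for a password is a one-liner in most languages. A
sketch in Python, shown only to illustrate the same idea as the
original shell one-liner, not to replace it:

    import secrets

    # 16 bytes == 128 bits from the OS CSPRNG (/dev/urandom on
    # Linux), base64url-encoded so it is usable as a password.
    print(secrets.token_urlsafe(16))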


> Also note that all the theoretical (and very unrealistic) attacks on
> /dev/urandom apply only when the attacker knows part of the *past*
> output of /dev/urandom, and he uses this to predict the *current* and
> *future* output of /dev/urandom. This is not applicable in our scenario.

In general, absolutely yes. Extracting any rate of numbers out of
your target's remote server's /dev/urandom device is a significant
challenge in and of itself, and if you've achieved that, you almost
certainly have 0wned the machine already - at which point it's one
hell of a lot easier to just scan the memory of that server and
extract the keys of interest directly. Thus in real-world scenarios,
such /dev/urandom attacks are (and have always been?) pretty close
to "entirely theoretical; no one would ever bother anyway given any
reasonable implementation of /dev/urandom".


> In short: Given that the state of the CSPRNG is larger than the amount
> of bits read[1], the bits can be assumed to be distributed at random.

Ack.


> Longer answer:
> 
> According to my reading of
> <https://github.com/torvalds/linux/blob/master/drivers/char/random.c>,
> /dev/urandom uses a variation of ChaCha20 which is periodically
> re-seeded from the “entropy pool”.
> 
> In a reasonable scenario for password generation, the attacker does not
> know the state of the 512-bit CRNG state, and so the best he can do in
> practice is to model it with uniform probability distribution.

Ack.


> According to my understanding, the output of /dev/urandom when reading
> with my command will be truncate(ChaCha20(X)) where (X) is the aforesaid
> 512-bit state and “truncate” is the function that returns the first 128
> bits of its input. The processing with ChaCha20 and truncation skew the
> distribution a bit, but this is negligible.

Interesting - I thought ChaCha was being used because it was such a
good (non-skewing, suitably crypto-random mixing, reasonably
performant) algorithm.

Even theoretical attacks will undoubtedly focus on this skewing, if
indeed ChaCha20, or the kernel's implementation of it, actually
skews the distribution.
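
For the record, "truncate(ChaCha20(X))" is easy enough to picture:
generate a block of ChaCha20 keystream and keep only the first 128
bits. A sketch using the third-party Python "cryptography" package
(an assumption on my part - this only illustrates the shape of the
construction, not what random.c literally does):

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms

    key = os.urandom(32)    # stand-in for the secret generator state
    nonce = os.urandom(16)  # this library expects a 16-byte nonce

    # Encrypting zero bytes yields raw ChaCha20 keystream.
    encryptor = Cipher(algorithms.ChaCha20(key, nonce), mode=None).encryptor()
    keystream = encryptor.update(b"\x00" * 64)

    output_128_bits = keystream[:16]  # "truncate" to the first 128 bits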


> As a side note, I noticed that Linux uses weird constants in the
> ChaCha20 input for the aforesaid CSPRNG: the ASCII text “expand 32-byte
> k”. This looks like a bad choice, but I doubt that it has any security
> impact in practice.

I assume the opposite - almost always, such constants will and do
affect the security of the algorithm, AIUI. It may be that this
constant is a "more recent / more recommended" constant to use over
the one in "the official ChaCha20 standard" (whatever/wherever that
is), or that ChaCha20 says something like "you can generate
constants for this constant value in the following way, and it is
(or is not, or doesn't matter) recommended to do so".


> Anyway, they should have used the constants
> recommended by D. J. Bernstein (the designer of ChaCha20).

I have not read his ChaCha paper for a long time - I don't remember
what he actually recommends in this regard. It is unwise for either
of us to express certainty on the matter when what he actually says
can be readily looked up.
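
That said, one thing can be checked without the paper: the ASCII
string "expand 32-byte k" is exactly 16 bytes and splits into four
32-bit words, and from memory those are the standard constants used
in the ChaCha/Salsa20 family for 256-bit keys - so the kernel may
simply be following the reference design rather than deviating from
it. A quick check in Python:

    import struct

    sigma = b"expand 32-byte k"          # 16 bytes -> four 32-bit words
    words = struct.unpack("<4I", sigma)  # little-endian, as ChaCha uses
    print([hex(w) for w in words])
    # -> ['0x61707865', '0x3320646e', '0x79622d32', '0x6b206574']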


> [1]: 384 bits according to my understanding, since 128 of the 512 bits
> fed to ChaCha seem to be fixed to the ASCII “expand 32-byte k”.


Good luck,

