# Re: streql - Constant-time string comparison

```On Thu, Oct 30, 2014 at 4:58 AM, Riley Baird
<BM-2cVqnDuYbAU5do2DfJTrN7ZbAJ246S4Xix@bitmessage.ch> wrote:
> On 29/10/14 19:55, Richard van den Berg wrote:
>> On 28-10-14 20:59 , Riley Baird wrote:
>>> As far as I can tell, your code ensures that even if the strings are of
>>> different length, an equality calculation should be performed anyway,
>>> however returning 0, on the grounds that this would make it more
>>> difficult for an attacker to know that the two strings entered were of
>>> different lengths. Is this right?
>>
>> Pardon my ignorance, but how much more difficult does it actually become
>> to determine the two inputs are of different length? In the original the
>> function returns right away if xlen != ylen. If the attacker can control
>> one of the inputs (say x), the change proposed by Joel will cause the
>> time of the compare to increment when xlen in increased until xlen ==
>> ylen. If this can be observed with enough precision the same objective
>> can be achieved.
>
> Good point. Perhaps this could be fixed by,
>
> origleny=len(y)
>
> while len(x) >= len(y):
>         y += '0'
>
> result = i = 0
> for i in xrange(len(x)):
>         result |= ord(x[i]) ^ ord(y[i])
> return result == 0 and len(y) == origleny
>
> This way, the time taken to complete the function will increase even
> after xlen >= ylen
>
> However, with this I'm concerned that the 'while' loop will take up too
> much time, thus still allowing an attacker to see when the string
> lengths become equal. Is there a quicker way to append zeros to a string
> such that it is equal in length to the other string?

Well, allocation of a string would tend to make the run time somewhat
arbitrary, but whether it would mask the length or not is kind of hard
to predict. Considering that the attack uses statistics, it's not hard
to look for double peaks.

I had a sort of similar idea, using a pair of local buffers, cycling
the string through the buffers, and got workable code. But I realized
I was just working too hard. A one byte buffer is plenty, and
simplifies the loop. Seriously simplifies the loop. (I think. Still
haven't really tested it or looked for corner cases.)

I'm going to spend some time testing the following today and tomorrow:

-----------------------------
// The core function: test two regions of memory for bytewise equality
with constant time.
// If cmplength is less than min( xlen, ylen ), comparison is incomplete.
static int equals_internal_constime(
const char *x, unsigned int xlen,
const char *y, unsigned int ylen,
int cmplength) {

int result = 0;

while ( --cmplength >= 0 ) {
char xtemp = 0;
char ytemp = 0;

if ( --xlen >= 0 ) xtemp = *x++;
if ( --ylen >= 0 ) ytemp = *y++;

result |= xtemp ^ ytemp;
}

return (xlen == ylen) && (result == 0);
}
-----------------------------

Picking the minimum length for cmplength when you call it would be the
same as I posted before. Picking the maximum length would bias the
execution time to the longer of the pair. Either way, sequencing
through lengths of guesses would (using statistical methods) reveal a
point where the time goes from constant to variable, as Richard notes
above.

If you can set an absolute max cmplength, you can always call it with
the absolute max and get strictly constant time compares. (Or maybe
use a manifest constant for cmplength.) For instance, for an API
dedicated to passwords, if you know that all passwords are 31 bytes
or less, you can pass 31 for cmplength on the call.

But when the same API could be used for passwords or 2048 bit keys,
you'd need to pass 256 for cmplength, which might be too long a wait
for some things. That's why I suggested the possibility of needing to
change the API.

Another possibility is to use the greater of xlen, ylen, and cmplength
in this function, which would avoid the problem of incomplete compares
I put in the comments, and would allow one to set a length, beyond
which we would not care if the length leaked. (20 or 32 might be
reasonable.) This could be done without changing the API from the
Python side.

A third possibility might be to offer an API where we call the x
parameter "guess" and the y parameter "actual_key", and always set the
cmplength parameter to the guess length. That way, the time would
always be dependent on the attacker's guess (assuming the caller
function treated the parameters correctly).

Completely separate topic, but with just a little more effort and care
in the conditional paths, that could even be made an ordered compare,
by noting the difference of the first pair of bytes that differ.

--
Joel Rees

Be careful when you see conspiracy.
Look first in your own heart,