[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#451886: fgets() and poison NULL byte attacks (aka NULL escapes)



package: libc6
version: 2.3.6.ds1-13etch2
severity: wishlist

Possible partial fix for fgets and alternatives.

Bug #57729 is marked as done.  It could be fixed for real.
I have found null escapes a pretty reliable way of breaking many C
programs including various editors.

The standard C <stdio.h> fgets function is just plain wrong.

fputs stops at a '\0', but fgets may only stop at '\n' even if a '\0'
is encountered.  There is no way of knowing how much is read by fgets.
These functions are inconsistent and are not binary safe.

The standard should be fixed.

In fcat2.c I add a function int fngets(char *s, int size, FILE *stream);
which has similar functionality to GNU getline.  This can be used with
fwrite.  As fngets does not exist it could be added by any <stdio.h>
user and will be portable.

In fcat.c I simply fix the broken fputs function in such a way that it
should not break existing use.  I also suggest some macros to check the
<stdio.h> buffer so that more efficient <unistd.h> can be mixed, even
though POSIX makes it clear that this should not be done.

How many extra CPU cycles to match '\n' or '\0' rather than just
match '\n' in the library fgets functions?

I give ESMTP as an example of a new line matching problem as this often
uses <stdio.h> and is often exploited by remote.  Many other
applications match new lines and could also be fed '\0's by remote.

Trivial examples attached.  These are not thread safe.
/*BINFMTC: -O -Wuninitialized -Werror -pedantic-errors

Binary safe <stdio.h> functions.

fputs stops at a '\0', but fgets may only stop at a '\n'.
These functions are inconsistenat and are not binary safe.

fgets could be made binary safe without breaking existing applications,
but its safe use would be inefficient.  See exaample fcat.c.

> $ gcc -o fcat2.out fcat2.c
> $ ./fcat2.out <fcat2.out >fcat2.out2
> $ cmp fcat2.out*

Define fngets and fnputs.  These functions use <stdio.h>, still store a
null '\0', however they read past null '\0' and return the number read
or written.

*/
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <errno.h>

/* returns size_t of number written */
#define fnputs(x,y,z) fwrite_unlocked((x),1,(y),(z))

/* Reads size-1 characters up to \n including any \0 cf. getline */
int fngets(char *s, int size, FILE *stream)
{
	char *p;
	int c;

	p=s,c=EOF;
	while(--size>0&&(c=getc_unlocked(stream))!=EOF) {
		*p++=(char)c;
		if (c=='\n') { /* Will read '\0'. */
			break;
		}
	}
	*p='\0'; /* Always mark the end. */
	if(c==EOF) { /* Expected. */
		if(feof_unlocked(stream)) {
			if(p==s) {
				return -1;
			}
		} else {
			return -1;
		}
	}
	return p-s; /* Return something meaningful. */
}

/* fdopen of TCP_NODELAY SO_KEEPALIVE socket 
   call fflush for continuation lines teergrube from hell */

int main() {
	char b[1024];
	int i;

	/*(void)setvbuf(stdout,(char *)NULL,_IOLBF,0);*/
	/*setlinebuf(stdout);*/ /* before fprintf(stderr,...) */

	while((i=fngets(b,1024,stdin))>0) {
		(void)fprintf(stderr,"Got %d bytes.\n",i);
		(void)fprintf(stderr,"Put %u bytes.\n",\
		fnputs(b,i,stdout));
		(void)fflush(stdout); /* after fprintf(stderr,...) */
	}

	return 0;
}



/*BINFMTC: -O -Wuninitialized -Werror -pedantic-errors

We often need new line matching, but the peer may have sent additional
information after the new line so we can not use the light weight read.

POSIX has adopted the <stdio.h> functions such as fgets which match new
lines.

The problem with UNIX/POSIX/IEEE/ISO fgets definition is that it may not
necessarily be binary safe.  This should really be fixed in a way that
is compatible with existing POSIX applications.

Bug #57729 says use non POSIX getline GNU extension which returns length
rather than the infamous pointer. (dietlibc: What were they smoking?)

The function libc6 fgets_unlocked has read past the '\0' and there is no
easy way to find how much has been read.

> $ gcc -o fcat.out fcat.c
> $ ./fcat.out <fcat.out >fcat.out2
> $ cmp fcat.out*
> fcat.out fcat.out2 differ: byte 9, line 1

The Linux manpage for fgets(3) says: -

> fgets() reads in at most one less than size characters from stream  and
> stores  them  into  the buffer pointed to by s.  Reading stops after an
> EOF or a newline.  If a newline is read, it is stored into the buffer.
> A '\0' is stored after the last character in the buffer.

It does not say that reading should stop at a '\0'.

The behaviour of fgets in the POSIX manpage is also unclear if fgets
reads a '\0'.

UTF-16 could be used with C99 wcslen which matches L'\0', but UTF-16 is
inefficient even for Asian languages so it should not be used.

ESMTP 8BITMIME is fine for ISO-8859-15 which does not contain '\0' or
control characters, but it is already misused with other character sets so
if 8BITMIME is implemented then it should be made binary safe.  So long as
the MUA ensures that messages including binary ones have a "\n.\r\n" to
terminate the message.  This could be done by matching new lines and then
".\r\n".

Alternatively a way of mixing <stdio.h> and <unistd.h> should exist
as read() and write() are binary safe.  You could fread up to fnread
until feob with fread.  Then you could use read.  This would make the
standard POSIX fileno() function more useful.  A turn such as ESMTP
without PIPELINING could be used to ensure that binary data should not
be in the <stdio.h> buffer, however with illegal PIPELINING your
<stdio.h> buffer may never be empty.  The real fix will be to make use
of <stdio.h> binary safe.

Mixing <stdio.h> and <unistd.h> would also help with functions like
Linux sendfile.

In a particular version of GNU/linux and libc6 this works: -
#define feob_unlocked(a)	((a)->_IO_read_ptr>=(a)->_IO_read_end)
#define fnread_unlocked(a)	((a)->_IO_read_end-(a)->_IO_read_ptr)

These are only whishlist bugs as to be useful the defects in standards
need to be corrected and GNU already provides the getline extension.
The defect is really in The Single UNIX Specification, Version 3 XBD,
IEEE Std 1003.1-2001 and ISO/IEC 9899:1999.

I have not thought about thread safe versions as I use fork().

For now the portable way to match '\n' is to implement your own fgets.

*/

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <errno.h>

char *my_fgets(char *s, int size, FILE *stream)
{
	char *p;
	int c;

	p=s,c=EOF;
	while(--size>0&&(c=getc_unlocked(stream))!=EOF) {
		*p++=(char)c;
		if (c=='\n'||c=='\0') { /* Many do not stop at '\0'. */
			break;
		}
	}
	*p='\0'; /* Always mark the end. */
	if(c==EOF) { /* Expected. */
		if(feof_unlocked(stream)) {
			if(p==s) { /* Got nothing - not even '\0'. */
				return (char *)NULL;
			}
		} else { /* Other error. */
			return (char *)NULL;
		}
	} /*else got something or an empty string which means null by itself*/
	return s;
}

int main() {
	char b[1024];
	int i;

	/*while(fgets_unlocked(b,1024,stdin))*/
	while(my_fgets(b,1024,stdin)) {
		i=strlen(b); /* Resigned to the fact <stdio.h> */
		if(i>0) { /* must scan the same thing many times. */
			(void)fputs_unlocked(b,stdout);
			if(b[i-1]=='\n') { /* Got a line. */
				(void)fflush_unlocked(stdout);
			} else {
				if(i<1023) { /* Got up to a null. */
					(void)putc_unlocked(0,stdout);
				} /* Got part of a line. */
			}
		} else { /* Got null by itself. */
			(void)putc_unlocked(0,stdout);
		}
	}
	(void)fflush_unlocked(stdout);

	return 0;
}




Reply to: