Re: Invalid UTF-8 byte? (was: Re: utf)

To: debian-user@lists.debian.org
Subject: Re: Invalid UTF-8 byte? (was: Re: utf)
From: <tomas@tuxteam.de>
Date: Wed, 4 Apr 2018 21:24:40 +0200
Message-id: <20180404192440.GA12608@tuxteam.de>
In-reply-to: <20180404184423.sk2eev6i5sy4rn2o@khazad-dum.debian.net>
References: <92aa2f6d-d39f-61a6-311b-f0c45b00b9c9@gmx.com> <201804020837.54725.rhkramer@gmail.com> <20180403004328.f49e19cbe32cfd5773b9e5e7@freenet.de> <201804030743.02707.rhkramer@gmail.com> <20180403135833.3156da4df8b9e11298ae6306@freenet.de> <20180404111823.rcrwymrtpctajffi@khazad-dum.debian.net> <20180404140927.GB3431@tuxteam.de> <20180404184423.sk2eev6i5sy4rn2o@khazad-dum.debian.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, Apr 04, 2018 at 03:44:23PM -0300, Henrique de Moraes Holschuh wrote:

[...]

> That said, it is always safe to break valid "modified UTF-8" into
> records using zero bytes, as long as you don't expect the result to be
> valid UTF-8 (it isn't, because NULs will be encoded using a
> non-minimal byte sequence that *will* decode to a zero even though it
> is invalid) or valid modified UTF-8 (it isn't, because a zero byte is
> not a valid encoding for NUL in modified UTF-8).  But a lax UTF-8 or
> modified UTF-8 decoder *would* parse "modified UTF-8 with zero bytes
> as record separators" and reconstruct the Unicode text properly (but
> it would read the record separators as NULs, so you'd get extra NULs
> in the resulting text).
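
For anyone who wants to poke at this, here is a rough Python sketch of
the trick quoted above. It is only an illustration under assumptions:
the helper names encode_mutf8_simple and decode_lax are made up, the
sketch covers only U+0000..U+FFFF (real modified UTF-8 also encodes
higher code points via surrogate pairs, which is skipped here), and as
far as I know Python ships no modified UTF-8 codec, so both directions
are done by hand.

```python
def encode_mutf8_simple(text: str) -> bytes:
    """Encode text as modified UTF-8, BMP only: NUL becomes C0 80."""
    if any(ord(ch) > 0xFFFF for ch in text):
        # Real modified UTF-8 uses surrogate pairs here; not handled.
        raise ValueError("sketch handles only U+0000..U+FFFF")
    # In UTF-8 a 0x00 byte can only ever be an encoded NUL.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_lax(data: bytes) -> str:
    """Lax decoder: accept both C0 80 and a raw zero byte as NUL."""
    # 0xC0 never occurs in valid UTF-8, so this replace is unambiguous.
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


records = ["foo\x00bar", "baz"]
stream = b"\x00".join(encode_mutf8_simple(r) for r in records)

# Splitting on zero bytes is safe: none occur inside a record.
assert [decode_lax(chunk) for chunk in stream.split(b"\x00")] == records

# A strict UTF-8 decoder rejects the stream because of the C0 80 pair.
try:
    stream.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lax decoder reads the whole stream, separators included, as NULs.
whole = decode_lax(stream)
```

Splitting first and decoding each record gives back the original
strings; decoding the whole stream in one go turns the separator into
one more NUL, which is the "extra NULs" effect mentioned in the quote.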

You are a nasty guy, aren't you ;-)

Pretty cunning...

Cheers
- -- t
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlrFJngACgkQBcgs9XrR2kZqLgCdEuap+rqSU6HCrXpkL6XHl3Az
lRUAnjwGhiMNNlY+SXwIxpd/kfnvst1z
=kHBa
-----END PGP SIGNATURE-----

References:
- utf
  - From: mess-mate <mess-mate@gmx.com>
- Invalid UTF-8 byte? (was: Re: utf)
  - From: rhkramer@gmail.com
- Re: Invalid UTF-8 byte? (was: Re: utf)
  - From: Michael Lange <klappnase@freenet.de>
- Re: Invalid UTF-8 byte? (was: Re: utf)
  - From: rhkramer@gmail.com
- Re: Invalid UTF-8 byte? (was: Re: utf)
  - From: Michael Lange <klappnase@freenet.de>
- Re: Invalid UTF-8 byte? (was: Re: utf)
  - From: Henrique de Moraes Holschuh <hmh@debian.org>
- Re: Invalid UTF-8 byte? (was: Re: utf)
  - From: <tomas@tuxteam.de>
- Re: Invalid UTF-8 byte? (was: Re: utf)
  - From: Henrique de Moraes Holschuh <hmh@debian.org>
