[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Glibc 2.28 breaks collation for PostgreSQL (and others?)



* Christoph Berg:

> with the update to glibc 2.28, collation aka sort ordering is
> changing:
>
> $ echo $LANG
> de_DE.utf8
> $ (echo 'a-a'; echo 'a a'; echo 'a+a'; echo 'aa') | sort
>
> stretch:
>   aa
>   a a
>   a-a
>   a+a
>
> buster:
>   a a
>   a+a
>   a-a
>   aa
>
> A vast number of locales is affected, including en_US, possibly all of
> them.
>
> For PostgreSQL, this means that the ordering of indexes on disk is
> becoming corrupt, and all "text" (varchar, char, ...) indexes need to
> be rebuilt. (And worse, if that is not done immediately, the tables
> might become corrupt because some tuples aren't index-visible anymore
> due to the incorrect btree ordering.)

That's fairly normal in a glibc update.  glibc upstream prefers it
this way.  I've discussed it several times with other glibc
maintainers.

My understanding is that ICU provides versioned collation tables,
which would allow you to avoid this issue.

  <https://blog.2ndquadrant.com/icu-support-postgresql-10/>

> I've been thinking about this for some time, and the best I could come
> up so far is "raise a debconf note that people need to invoke REINDEX
> DATABASE". The open question about this plan is, how should this note
> be triggered.

That might not work for unique indices because locale data changes
could cause strings to sort the same that were distinct before the
update.


Reply to: