Re: Endianness of data files in MultiArch (was: Please test gzip -9n - related to dpkg with multiarch support)
Sorry, the thread was broken and I saw your reply just now.
On Thu, Feb 9, 2012 at 16:23, Jan Hauke Rahm <email@example.com> wrote:
> On Thu, Feb 09, 2012 at 01:58:28AM +0800, Aron Xu wrote:
>> This is valid for widely used applications/formats like gettext and
>> images that are designed to behave this way, but on the other hand
>> there are upstreams that don't want such a change, especially because
>> of the added complexity and the performance impact.
>> Currently I am using arch:any for data files which aren't affected by
>> multiarch, i.e. neither "same" nor "foreign". For endianness-critical
>> data that is required to make a library work, I have to force it to be
>> installed into /usr/lib/<triplet>/$package/data/ and mark the package
>> as "Multi-Arch: same". This is sufficient to avoid breakage, but again
>> it consumes a lot of space on the mirrors.
> Actually, what is "a lot" here? I mean, how many libraries are there
> containing endianness-critical data and how big are the actual files?
> Not that I'm any kind of expert, but this solution sounds reasonable to
> me.
As far as I know, there aren't many libraries known to ship
endianness-critical data, but there may be landmines simply because
the maintainers aren't aware of the issue.
I happened to notice this problem because my team maintains several
input method stacks, which usually need to deal with linguistic data.
Right now I have a library named libpinyin at hand to package, which
ships data files of ~7.5MiB after gzip -9 (the whole library is no
more than 9MiB after gzip -9). We have 14 architectures on ftp-master,
so the data files alone eat up 7.5MiB x 14 = 105MiB, while if we found
some way to keep only one copy per endianness, the two copies (be/le)
would use just 15MiB. And think about what happens once it is released
as stable: yet another copy of that data makes its way into the archive
every time a new version is uploaded.
The same concern applies to other endianness-critical data that isn't
touched by Multi-Arch at present: we have to make it arch:any, and in
the end it eats up more and more space.
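For reference, the workaround I described above looks roughly like
this in packaging terms (a minimal sketch; the package name and file
paths are invented for illustration):

  # debian/control (relevant fields of the hypothetical data package)
  Package: libpinyin-data
  Architecture: any
  Multi-Arch: same

  # debian/libpinyin-data.install -- the triplet shown is only an
  # example; in practice it has to be substituted at build time,
  # e.g. via dh-exec or $(shell dpkg-architecture -qDEB_HOST_MULTIARCH)
  data/pinyin.bin usr/lib/x86_64-linux-gnu/libpinyin/data/

With Architecture: any plus Multi-Arch: same, dpkg will happily
co-install one copy per architecture, which is exactly where the
14-fold duplication comes from.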
Performance is critical for these applications. That doesn't mean
they consume a lot of CPU overall, but they must respond very quickly
to the user's input: splitting a sentence into words and finding a
list of the most relevant suggestions involves some complex
calculations, and a single such action means querying 10^5 ~ 10^6
lines of data several times. There were projects that tried to use
something like SQLite3, but the performance was rather frustrating,
so they have now decided to stop worrying about that and simply
design a data format that fits their requirements.
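To make the endianness problem concrete: such custom formats are
typically fixed-layout binary files that are mmap'ed and searched in
place, so integers are read in whatever byte order they were written.
Here is a minimal sketch in C of a loader that detects the file's
byte order from a magic number and swaps on mismatch, so a single
arch-independent copy could serve both be and le hosts (the header
layout and magic value are invented for illustration):

  #include <stdint.h>
  #include <stdio.h>

  /* Invented example header: the file declares its own byte order
   * via the magic number, so one copy can serve both endiannesses. */
  #define MAGIC 0x50594442u  /* "PYDB" in the producer's byte order */

  struct header {
      uint32_t magic;
      uint32_t n_records;
  };

  static uint32_t bswap32(uint32_t v)
  {
      return (v >> 24) | ((v >> 8) & 0x0000ff00u)
           | ((v << 8) & 0x00ff0000u) | (v << 24);
  }

  int main(int argc, char **argv)
  {
      if (argc != 2)
          return 1;
      FILE *f = fopen(argv[1], "rb");
      if (!f)
          return 1;

      struct header h;
      if (fread(&h, sizeof h, 1, f) != 1) {
          fclose(f);
          return 1;
      }

      int swapped;
      if (h.magic == MAGIC)
          swapped = 0;            /* file matches host byte order */
      else if (bswap32(h.magic) == MAGIC)
          swapped = 1;            /* other endianness: swap each field */
      else {
          fclose(f);
          return 1;               /* not our format */
      }

      uint32_t n = swapped ? bswap32(h.n_records) : h.n_records;
      printf("%u records (byte-swapped: %s)\n",
             n, swapped ? "yes" : "no");
      fclose(f);
      return 0;
  }

And this is precisely the trade-off upstreams object to: either every
integer access pays a potential swap, or the data is converted once at
load time, which costs startup time and memory for files this large.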