
Re: pci 0000:00:01:0: MSI quirk detected; subordinated MSI disabled ...



On 29/04/2021 14:03, Albretch Mueller wrote:
>> What is "alpha-offset format"?
>  We corpus-research folks need to process thousands of files the
> way other people process bytes. UTF-8 was basically an
> Americanization of all alphabets. UTF is great for describing an
> alphabet, but not for text files.
>
>  UTF-8 turned all files into streams, which is not good for questions
> such as: what is the character/string sequence starting at the nth
> addressable unit of a file ...

Depends on what you mean by "addressable unit", surely? UTF-8 is a
variable-length record format, but it's still addressable. Basically,
it's like taking a CSV file and asking "what are the contents of the
cell starting at byte 123?" CSV cells are variable length. Perhaps there
isn't such a cell. If you want to know the contents of the cell that
includes byte 123, then you need some context, don't you?
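To make the "you need some context" point concrete, here is a minimal Python sketch that finds the character including a given byte offset. The name `char_containing_byte` is hypothetical, not from any library. It assumes the input is valid UTF-8 and relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so at most three steps back reach the lead byte, which in turn encodes the sequence length.

```python
from typing import Tuple


def char_containing_byte(data: bytes, offset: int) -> Tuple[int, str]:
    """Return (start_offset, character) for the UTF-8 character that
    includes byte `offset`. Assumes `data` is valid UTF-8."""
    if not 0 <= offset < len(data):
        raise IndexError("offset out of range")
    start = offset
    # Continuation bytes look like 10xxxxxx; back up over them to the
    # lead byte (at most 3 steps in valid UTF-8).
    while start > 0 and (data[start] & 0xC0) == 0x80:
        start -= 1
    lead = data[start]
    # The lead byte tells us how many bytes the character occupies.
    if lead < 0x80:
        length = 1  # 0xxxxxxx: ASCII
    elif lead >= 0xF0:
        length = 4  # 11110xxx
    elif lead >= 0xE0:
        length = 3  # 1110xxxx
    else:
        length = 2  # 110xxxxx
    return start, data[start:start + length].decode("utf-8")
```

For example, in "Grüße".encode("utf-8") byte 3 is the second byte of "ü", so `char_containing_byte(text, 3)` returns `(2, 'ü')`: no character *starts* at byte 3, but the one that *includes* it is found with a short backward scan.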

>
>  Doing that with UTF-8 ranges from way too complicated to impossible.
> Also, alpha offset nicely splits a file's segments into their
> different parts: ALPHABETICAL text, JS, CSS, ...
So, do you use something more like UTF-32?
>
>  lbrtchx
>
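For contrast, a fixed-width encoding such as UTF-32 turns "the nth unit" into a plain multiplication: code point n starts at byte 4 * n. A minimal sketch, assuming little-endian UTF-32 without a byte-order mark (`code_point_at` is a hypothetical helper):

```python
def code_point_at(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-32-LE data (no BOM).
    Every code point occupies exactly 4 bytes, so no scanning is needed."""
    chunk = data[4 * n : 4 * n + 4]
    if n < 0 or len(chunk) != 4:
        raise IndexError("code point index out of range")
    return chunk.decode("utf-32-le")
```

One caveat: this indexes code points, not user-perceived characters. "e\u0301" (an "e" followed by a combining acute accent) is two code points in any Unicode encoding, UTF-32 included.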
