
Re: [OFFTOPIC] Re: pci 0000:00:01:0: MSI quirk detected; subordinated MSI disabled ...



On Fri 30 Apr 2021 at 19:17:55 (-0400), Stefan Monnier wrote:
> > Now I wonder how this might enable random access to the nth
> > character.  I will keep looking around.
> 
> Another part of the question is: why would someone give you the position
> information in terms of characters rather than in terms of (say) bytes,
> or words, or ...

I thought we'd scotched bytes, because the number of bytes in a
character depends on the encoding: the simple letter A occupies
1 byte in ASCII and UTF-8, 2 bytes in UTF-16, and 4 in UTF-32.
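
For anyone who wants to check those figures, here is a minimal Python
sketch. The helper name encoded_length is made up for illustration, and
the "-be" codec variants are chosen on the assumption that we want the
bare code unit sizes: Python's plain "utf-16" and "utf-32" codecs
prepend a byte-order mark, which would inflate the counts.

```python
def encoded_length(text, encoding):
    """Return the number of bytes `text` occupies in `encoding`."""
    return len(text.encode(encoding))

# Big-endian variants are used so that no byte-order mark is added.
encodings = ("ascii", "utf-8", "utf-16-be", "utf-32-be")
sizes = {enc: encoded_length("A", enc) for enc in encodings}

for enc, size in sizes.items():
    print(f"{enc:10} {size} byte(s)")
```

So a byte offset only identifies a character once the encoding is known.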

> or words,

Counting words raises the problem of compound words, because people
spell them differently: as separate words with spaces, or hyphenated,
or as a single word. In English, compound words tend to evolve in
that order, as we grow comfortable with seeing the words joined
together. So the same sentence might have fewer words in it just
because it was written in more modern times.

> or ...

It's fairly usual to count characters, ignoring whitespace
and punctuation. There are still complications for languages
like Welsh, where digraphs such as ch and ll are single letters.
I remember, as a small child on holiday, learning to spell,
pronounce and translate the 58-letter place name
Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch
and not realising for years that it was only 51 letters long,
and didn't really contain a 4-letter repeat.
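
The 58 versus 51 arithmetic can be reproduced with a rough Python
sketch. It assumes only the two digraphs named above (ch and ll) and
matches them greedily from left to right; count_letters and DIGRAPHS
are names invented for this example. Welsh has further digraphs (dd,
ff, ng, ph, rh, th), and a naive matcher that included ng would wrongly
merge the n and g in "gwyngyll", so this is not a general Welsh letter
counter.

```python
# Assumption: only the two digraphs mentioned in this message.
DIGRAPHS = ("ch", "ll")

def count_letters(word, digraphs=DIGRAPHS):
    """Count letters, treating each listed digraph as one letter."""
    word = word.lower()
    count = 0
    i = 0
    while i < len(word):
        # Consume two characters for a digraph, otherwise one.
        i += 2 if word[i:i + 2] in digraphs else 1
        count += 1
    return count

name = "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"
print(len(name), count_letters(name))  # 58 51
```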

BTW I, too, naturally googled A-O F, and found the same telescopic
reference as rhkramer, though looking once again, I see this thread is
starting to add far more hits for the term (which I won't repeat here).

Cheers,
David.

