
Bug#470220: marked as done (unicode ligatures to ASCII)



Your message dated Mon, 08 Jan 2024 22:49:16 +0000
with message-id <E1rMyQm-003bRM-0l@fasolo.debian.org>
and subject line Bug#1060239: Removed package(s) from unstable
has caused the Debian Bug report #470220,
regarding unicode ligatures to ASCII
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
470220: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=470220
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: uni2ascii
Version: 4.4-1
Severity: minor

I would like to discuss today the Unicodes
¯ ’“”− ﬀ ﬁ ﬂ ﬃ ...
that is
00AF 2019 201C 201D 2212 FB00 FB01 FB02 FB03 ...

You see, I noticed them when I used pdftotext on
http://www.cs.ucr.edu/~anirban/Anir-networking07.pdf
and then tried to read the results on my ASCII PDA.

I wish pdftotext had a flag to make the output ASCII.

Anyway, even uni2ascii -ydpxef wouldn't get all of them into ASCII.
The ligatures remained -- but turned into 0x codes. (P.S., I wish
there was one flag to "give me best ASCII", lest one ponder the man
page too long.) Also, apparently there is no way to stop uni2ascii from
turning what it can't handle into 0x codes, so that those characters
could pass through untouched for some other filter to complete the job.

Now turning to pstotext, whose man page says "pstotext deals better
with punctuation and ligatures." Not in this case.

Now turning to Text::Unidecode: sorry: mangled ligatures.

Anyways, I ended up having to write by hand:

#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
    s/¯/_/g; #just a guess
    s/’/'/g;
    s/“/"/g;
    s/”/"/g;
    s/−/-/g;
    s/ﬀ/ff/g;
    s/ﬁ/fi/g;
    s/ﬂ/fl/g;
    s/ﬃ/ffi/g;
    s/ﬄ/ffl/g;
    s/ﬅ/ft/g;
    s/ﬆ/st/g;
    print;
}
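
The same idea can be sketched without listing every ligature by hand.
The following is only an illustration in Python 3, assuming the standard
unicodedata module; it is not something uni2ascii, pdftotext or
Text::Unidecode provide. The names to_ascii and PUNCTUATION are
hypothetical. Latin ligatures in the range U+FB00..U+FB06 are expanded
by Unicode compatibility decomposition (NFKD), a small table handles the
punctuation from the script above, and everything else passes through
unchanged for some other filter to deal with. Note that NFKD expands
U+FB05 to "st", not "ft".

```python
import sys
import unicodedata

# Hypothetical table: replacement choices copied from the script above.
PUNCTUATION = {
    "\u00af": "_",   # MACRON (just a guess, as in the script)
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
    "\u2212": "-",   # MINUS SIGN
}


def to_ascii(text):
    """Expand Latin ligatures and a few punctuation marks to ASCII.

    Characters that are not handled are returned unchanged.
    """
    out = []
    for ch in text:
        if ch in PUNCTUATION:
            out.append(PUNCTUATION[ch])
        elif "\ufb00" <= ch <= "\ufb06":
            # Compatibility decomposition turns U+FB01 into "fi", etc.
            out.append(unicodedata.normalize("NFKD", ch))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Filter standard input to standard output, line by line.
    for line in sys.stdin:
        sys.stdout.write(to_ascii(line))
```

Saved under a name of your choosing, it would be used as a filter after
pdftotext, reading standard input and writing standard output.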



--- End Message ---
--- Begin Message ---
Version: 1.9-7+rm

Dear submitter,

as the package pstotext has just been removed from the Debian archive
unstable we hereby close the associated bug reports.  We are sorry
that we couldn't deal with your issue properly.

For details on the removal, please see https://bugs.debian.org/1060239

The version of this package that was in Debian prior to this removal
can still be found using https://snapshot.debian.org/.

Please note that the changes have been done on the master archive and
will not propagate to any mirrors until the next dinstall run at the
earliest.

This message was generated automatically; if you believe that there is
a problem with it please contact the archive administrators by mailing
ftpmaster@ftp-master.debian.org.

Debian distribution maintenance software
pp.
Thorsten Alteholz (the ftpmaster behind the curtain)

--- End Message ---
