Re: Ruby help needed (Was: Bug#676114: wordnet: FTBFS: debian/wn-for-goldendict.rb:465: invalid multibyte char (US-ASCII))

To: Per Andersson <avtobiff@gmail.com>
Cc: Andreas Tille <andreas@an3as.eu>, Debian Ruby List <debian-ruby@lists.debian.org>, 676114@bugs.debian.org, "Dmitry E. Oboukhov" <unera@debian.org>, Debian Mentors List <debian-mentors@lists.debian.org>
Subject: Re: Ruby help needed (Was: Bug#676114: wordnet: FTBFS: debian/wn-for-goldendict.rb:465: invalid multibyte char (US-ASCII))
From: Antonio Terceiro <terceiro@debian.org>
Date: Tue, 5 Jun 2012 21:37:32 -0300
Message-id: <20120606003732.GD7517@debian.org>
Mail-followup-to: Per Andersson <avtobiff@gmail.com>, Andreas Tille <andreas@an3as.eu>, Debian Ruby List <debian-ruby@lists.debian.org>, 676114@bugs.debian.org, "Dmitry E. Oboukhov" <unera@debian.org>, Debian Mentors List <debian-mentors@lists.debian.org>
In-reply-to: <CABYrXST9JdaCZTr8v81eD75EAmu9cu5d71rRMoEMNg7Su1tBQw@mail.gmail.com>
References: <20120604223349.GA26743@xanadu.blop.info> <20120605140851.GG10297@an3as.eu> <4FCE2532.9090705@intertwingly.net> <20120605165338.GA24393@an3as.eu> <CABYrXST9JdaCZTr8v81eD75EAmu9cu5d71rRMoEMNg7Su1tBQw@mail.gmail.com>

Hi,

Per Andersson wrote:
> Please see the attached patch and try if it works.

Heh, except that you missed other usages of the same pattern. Andreas,
please try the attached patch.

As Per explained, the problem is that in Ruby 1.8, a string is an array
of bytes, so str[0] returns a number (the byte at position 0).  Ruby 1.9
is fully encoding-aware, so a string is an array of characters and
str[0] returns a string that is the first *character* in str.
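To make the difference concrete, here is a small sketch of what indexing
does on Ruby 1.9 or later. The string and the variable names are made up
for illustration; they do not come from wn-for-goldendict.rb.

```ruby
# Illustrative only: "h\u00e9llo" is a made-up sample string, written with
# an escape so the snippet does not depend on the source file's encoding.
str = "h\u00e9llo"

first  = str[0]   # on 1.9+, a one-character String, not a number
second = str[1]   # the accented character: one character, two bytes in UTF-8

puts first.class       # String
puts second.bytesize   # 2
puts str.length        # 5 (characters)
puts str.bytesize      # 6 (bytes)
```

On 1.8 the same str[0] would have given back the integer value of the
first byte instead, which is what the script was relying on.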

What I did was replace the occurrences of str[0] with str.bytes.first
to explicitly request the first *byte* in str.
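As a rough sketch of what the replacement does for the hex fields the
script unpacks: the helper names hex_byte and hex_nibble below are
hypothetical, invented for this example, and the input values are made up.

```ruby
# Hypothetical helpers, not part of wn-for-goldendict.rb.

# Decode a two-digit hex field (high nibble first) into an integer.
def hex_byte(field)
  [field].pack('H2').bytes.first
end

# Decode a single hex digit packed low nibble first, the same format
# the script uses for the lex_id fields.
def hex_nibble(field)
  [field].pack('h').bytes.first
end

w_cnt  = hex_byte("0a")   # 10
lex_id = hex_nibble("3")  # 3

puts w_cnt
puts lex_id

# On 1.9+, getbyte(0) is another way to ask for the first byte:
puts ["0a"].pack('H2').getbyte(0)   # 10
```

Because the result is an integer again, calls such as @w_cnt.times and
words[src - 1] keep working the way they did under 1.8.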

The resulting output when run with Ruby 1.9 is pretty much the same as
the original version generates when run by Ruby 1.8, *except* for the
ordering between lowercase and uppercase letters. Maybe that is due to
some other detail, but understanding that script completely is too much
for me. :)

If that's not acceptable, you can also (at least for Wheezy) run the
script from debian/rules with `ruby1.8` instead of `ruby` and
build-depend on ruby1.8 explicitly.

-- 
Antonio Terceiro <terceiro@debian.org>

Index: debian/wn-for-goldendict.rb
===================================================================
--- debian/wn-for-goldendict.rb	(revision 44965)
+++ debian/wn-for-goldendict.rb	(working copy)
@@ -1,4 +1,5 @@
 #!/usr/bin/env ruby
+# encoding: utf-8
 
 # A script to convert WordNet 3.0 dictionary from original
 # format (http://wordnet.princeton.edu/wordnet/download/)
@@ -293,14 +294,14 @@
     @offset = data[0].to_i
     @lex_filenum = data[1]
     @pos = data[2]
-    @w_cnt = [data[3]].pack('H2')[0]
+    @w_cnt = [data[3]].pack('H2').bytes.first
     @words = []
     i = 4
     @lex_ids = []
     @w_cnt.times {
       @words << data[i].gsub(/_/, ' ').gsub(/\s*\((p|a|ip)\)\s*$/, '')
       i += 1
-      @lex_ids << [data[i]].pack('h')[0]
+      @lex_ids << [data[i]].pack('h').bytes.first
       i += 1
     }
 
@@ -362,8 +363,8 @@
     if (src_target == "0000")
       return other.words
     else
-      src = [src_target[0, 2]].pack('H2')[0]
-      target = [src_target[2, 2]].pack('H2')[0]
+      src = [src_target[0, 2]].pack('H2').bytes.first
+      target = [src_target[2, 2]].pack('H2').bytes.first
       h_src = words[src - 1]
       if (h_src == headword)
         return [other.words[target - 1]]
@@ -374,7 +375,7 @@
   end
   def get_frame_data(headword, frame)
     f_num = frame[0].to_i
-    w_num = [frame[1]].pack('H2')[0]
+    w_num = [frame[1]].pack('H2').bytes.first
     if (w_num == 0)
       return [$frames[f_num]]
     else
