
Re: Ruby help needed (Was: Bug#676114: wordnet: FTBFS: debian/wn-for-goldendict.rb:465: invalid multibyte char (US-ASCII))



Hi,

Per Andersson wrote:
> Please see the attached patch and try if it works.

Heh, except that you missed other usages of the same pattern. Andreas,
please try the attached patch.

As Per explained, the problem is that in Ruby 1.8, a string is an array
of bytes, so str[0] returns a number (the byte at position 0).  Ruby 1.9
is fully encoding-aware, so a string is an array of characters and
str[0] returns a string that is the first *character* in str.
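
A minimal sketch of that difference, assuming Ruby 1.9 or later; the
sample string and variable names are made up for illustration and do not
come from the script:

```ruby
# Hypothetical sample string: "h", then U+00E9 (two bytes in UTF-8), then "llo".
s = "h\u00e9llo"

first_char = s[0]         # Ruby 1.9+: a one-character String (1.8 gave a number)
first_byte = s.bytes.first # always the numeric value of the first byte

puts first_char.class     # String
puts first_byte           # 104
puts s.length             # 5 characters
puts s.bytesize           # 6 bytes, because U+00E9 takes two
```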

What I did was replace the occurrences of str[0] with str.bytes.first
to explicitly request the first *byte* in str.
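
Roughly, this is what the changed lines do. It is only a sketch: the
helper name hex_field_to_int is hypothetical and does not exist in
wn-for-goldendict.rb, and the input "0a" is a made-up field value.

```ruby
# Hypothetical helper: decode a two-digit hex field into an Integer,
# using the same pack('H2').bytes.first pattern as the patch.
def hex_field_to_int(field)
  [field].pack('H2').bytes.first
end

w_cnt = hex_field_to_int("0a")  # Integer 10 on both 1.8.7 and 1.9

packed    = ["0a"].pack('H2')   # a one-byte String
old_style = packed[0]           # on Ruby 1.9+ this is a String, not a number

puts w_cnt                      # 10
puts old_style.class            # String (on Ruby 1.9+)

# An Integer responds to #times, which is what the script relies on;
# a String does not.
puts w_cnt.respond_to?(:times)      # true
puts old_style.respond_to?(:times)  # false (on Ruby 1.9+)
```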

The resulting output when run with Ruby 1.9 is pretty much the same as
what the original version generates when run with Ruby 1.8, *except* for
the ordering between lowercase and uppercase letters. Maybe that is due
to some other detail, but understanding that script completely is too
much for me. :)

If that's not acceptable, you can also (at least for Wheezy) run the
script from debian/rules with `ruby1.8` instead of `ruby` and
build-depend on ruby1.8 explicitly.

-- 
Antonio Terceiro <terceiro@debian.org>
Index: debian/wn-for-goldendict.rb
===================================================================
--- debian/wn-for-goldendict.rb	(revision 44965)
+++ debian/wn-for-goldendict.rb	(working copy)
@@ -1,4 +1,5 @@
 #!/usr/bin/env ruby
+# encoding: utf-8
 
 # A script to convert WordNet 3.0 dictionary from original
 # format (http://wordnet.princeton.edu/wordnet/download/)
@@ -293,14 +294,14 @@
     @offset = data[0].to_i
     @lex_filenum = data[1]
     @pos = data[2]
-    @w_cnt = [data[3]].pack('H2')[0]
+    @w_cnt = [data[3]].pack('H2').bytes.first
     @words = []
     i = 4
     @lex_ids = []
     @w_cnt.times {
       @words << data[i].gsub(/_/, ' ').gsub(/\s*\((p|a|ip)\)\s*$/, '')
       i += 1
-      @lex_ids << [data[i]].pack('h')[0]
+      @lex_ids << [data[i]].pack('h').bytes.first
       i += 1
     }
 
@@ -362,8 +363,8 @@
     if (src_target == "0000")
       return other.words
     else
-      src = [src_target[0, 2]].pack('H2')[0]
-      target = [src_target[2, 2]].pack('H2')[0]
+      src = [src_target[0, 2]].pack('H2').bytes.first
+      target = [src_target[2, 2]].pack('H2').bytes.first
       h_src = words[src - 1]
       if (h_src == headword)
         return [other.words[target - 1]]
@@ -374,7 +375,7 @@
   end
   def get_frame_data(headword, frame)
     f_num = frame[0].to_i
-    w_num = [frame[1]].pack('H2')[0]
+    w_num = [frame[1]].pack('H2').bytes.first
     if (w_num == 0)
       return [$frames[f_num]]
     else
