[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#984717: unblock: opencc/1.1.1+git20200624+ds2-10



Package: release.debian.org
Severity: normal
User: release.debian.org@packages.debian.org
Usertags: unblock
X-Debbugs-Cc: zhsj@debian.org, byang@debian.org

Please unblock package opencc

Summary of changes since testing/1.1.1+git20200624+ds2-5

+ Enable upstream tests in package building.
+ Add autopkgtest
+ Backport 2 upstream patches to fix performance regression.

[ Reason ]

This fix is small, but the performance is improved a lot.
As opencc is also used in generating Debian official website
for Simplified and Traditional Chinese version, and a lot of text
needs to be processed by opencc, so the performance matters.

opencc is key package, so it needs manual unblock.

[ Impact ]
Without this patch, the performance drops a lot.

[ Tests ]
+ Upstream unit and integration tests
+ Autopkgtest for installed library and tool
+ Manual tests
  + With opencc in testing:

    $ time opencc -c /usr/share/opencc/t2s.json -i <(printf "Open Chinese Convert 開放中文轉換\n%.0s" {1..50000}) -o /dev/null

    real    0m40.328s
    user    0m40.272s
    sys     0m0.105s


  + With opencc in unstable:

    $ time opencc -c /usr/share/opencc/t2s.json -i <(printf "Open Chinese Convert 開放中文轉換\n%.0s" {1..50000}) -o /dev/null

    real    0m0.556s
    user    0m0.551s
    sys     0m0.065s


[ Risks ]

+ Patch is small
+ Key package

[ Checklist ]
  [x] all changes are documented in the d/changelog
  [x] I reviewed all changes and I approve them
  [x] attach debdiff against the package in testing

[ Other info ]
The upstream patch was first backported in 1.1.1+git20200624+ds2-6, but the tests
were not run, so Boyuan didn't notice the backport is incomplete. Then the backport
was reverted in -7. After adding tests and autopkgtest, the patches are backported
again in -10.


unblock opencc/1.1.1+git20200624+ds2-10

Diff:

Real effected code added are only 13 lines.
+ https://salsa.debian.org/debian/opencc/-/blob/debian/1.1.1+git20200624+ds2-10/debian/patches/0006-Fix-a-bug-in-the-calculation-of-DictGroup-keyMaxLeng.patch
  L28-L42
+ https://salsa.debian.org/debian/opencc/-/blob/debian/1.1.1+git20200624+ds2-10/debian/patches/0007-Fix-a-severe-performance-bug-in-Conversion-Convert-t.patch
  L78-L80

Full diff:

Also available at:
https://salsa.debian.org/debian/opencc/-/compare/debian%2F1.1.1+git20200624+ds2-5...debian%2F1.1.1+git20200624+ds2-10

Some long test code added by upstream commit are skipped below.

diff --git a/debian/changelog b/debian/changelog
index eee8331..69db793 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,48 @@
+opencc (1.1.1+git20200624+ds2-10) unstable; urgency=medium
+
+  * Team upload.
+  * Upload to unstable.
+  * Backport patch to fix performance regression again.
+    Add
+    + 0006-Fix-a-bug-in-the-calculation-of-DictGroup-keyMaxLeng.patch
+    + 0007-Fix-a-severe-performance-bug-in-Conversion-Convert-t.patch
+
+ -- Shengjing Zhu <zhsj@debian.org>  Sun, 07 Mar 2021 14:20:40 +0800
+
+opencc (1.1.1+git20200624+ds2-9) experimental; urgency=medium
+
+  * Team upload.
+  * Remove unused command from autopkgtest scripts
+  * Add build-essential to autopkgtest
+
+ -- Shengjing Zhu <zhsj@debian.org>  Sun, 07 Mar 2021 00:54:22 +0800
+
+opencc (1.1.1+git20200624+ds2-8) experimental; urgency=medium
+
+  * Team upload.
+  * Enable test when building
+  * Add autopkgtest
+
+ -- Shengjing Zhu <zhsj@debian.org>  Sat, 06 Mar 2021 17:57:00 +0800
+
+opencc (1.1.1+git20200624+ds2-7) unstable; urgency=high
+
+  * Team upload.
+  * Drop debian/patches/0005 for now due to regression reported.
+    See also https://github.com/fcitx/fcitx5/issues/238 .
+
+ -- Boyuan Yang <byang@debian.org>  Fri, 05 Mar 2021 09:37:48 -0500
+
+opencc (1.1.1+git20200624+ds2-6) unstable; urgency=high
+
+  * Team upload.
+  * debian/patches/0005: Add upstream patch to fix severe performance
+    regression in `Conversion::Convert` that caused O(N^2) complexity.
+  * debian/rules: Disable parallel build to workaround some random
+    build error for now.
+
+ -- Boyuan Yang <byang@debian.org>  Sun, 28 Feb 2021 19:48:01 -0500
+
 opencc (1.1.1+git20200624+ds2-5) unstable; urgency=medium
 
   * Team upload.
diff --git a/debian/control b/debian/control
index 2eadce4..834dae0 100644
--- a/debian/control
+++ b/debian/control
@@ -13,6 +13,7 @@ Build-Depends:
  darts,
  debhelper-compat (= 13),
  doxygen <!nodoc>,
+ googletest <!nocheck>,
  libmarisa-dev,
  libtclap-dev,
  python3:any,
diff --git a/debian/gbp.conf b/debian/gbp.conf
new file mode 100644
index 0000000..cec628c
--- /dev/null
+++ b/debian/gbp.conf
@@ -0,0 +1,2 @@
+[DEFAULT]
+pristine-tar = True
diff --git a/debian/patches/use-cmake-install-libdir.patch b/debian/patches/0001-use-cmake-install-libdir.patch
similarity index 100%
rename from debian/patches/use-cmake-install-libdir.patch
rename to debian/patches/0001-use-cmake-install-libdir.patch
diff --git a/debian/patches/0003-use-system-libraries.patch b/debian/patches/0002-use-system-libraries.patch
similarity index 100%
rename from debian/patches/0003-use-system-libraries.patch
rename to debian/patches/0002-use-system-libraries.patch
diff --git a/debian/patches/no-remote-images-when-reading-docs-on-disk.patch b/debian/patches/0004-no-remote-images-when-reading-docs-on-disk.patch
similarity index 77%
rename from debian/patches/no-remote-images-when-reading-docs-on-disk.patch
rename to debian/patches/0004-no-remote-images-when-reading-docs-on-disk.patch
index 3ea24ee..60b1bbc 100644
--- a/debian/patches/no-remote-images-when-reading-docs-on-disk.patch
+++ b/debian/patches/0004-no-remote-images-when-reading-docs-on-disk.patch
@@ -1,9 +1,17 @@
-Description: Don't fetch remote images when reading docs on disk
- This fixes a privacy breach previously reported as Lintian warnings
-Author: Gunnar Hjalmarsson <gunnarhj@ubuntu.com>
+From: Gunnar Hjalmarsson <gunnarhj@ubuntu.com>
+Date: Sat, 6 Mar 2021 17:45:20 +0800
+Subject: Don't fetch remote images when reading docs on disk
+
 Forwarded: no
 Last-Update: 2021-01-14
 
+This fixes a privacy breach previously reported as Lintian warnings
+---
+ README.md | 8 ++++----
+ 1 file changed, 4 insertions(+), 4 deletions(-)
+
+diff --git a/README.md b/README.md
+index d941d6b..0663dcd 100644
 --- a/README.md
 +++ b/README.md
 @@ -1,12 +1,12 @@
diff --git a/debian/patches/0005-Use-system-googletest.patch b/debian/patches/0005-Use-system-googletest.patch
new file mode 100644
index 0000000..783d6bb
--- /dev/null
+++ b/debian/patches/0005-Use-system-googletest.patch
@@ -0,0 +1,22 @@
+From: Shengjing Zhu <zhsj@debian.org>
+Date: Sat, 6 Mar 2021 17:53:25 +0800
+Subject: Use system googletest
+
+Forwarded: not-needed
+---
+ CMakeLists.txt | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/CMakeLists.txt b/CMakeLists.txt
+index efb51ae..e26b934 100644
+--- a/CMakeLists.txt
++++ b/CMakeLists.txt
+@@ -195,7 +195,7 @@ add_subdirectory(test)
+ ######## Testing
+ 
+ if (ENABLE_GTEST)
+-  add_subdirectory(deps/gtest-1.11.0)
++  add_subdirectory(/usr/src/googletest/googletest ${CMAKE_BINARY_DIR}/googletest-build EXCLUDE_FROM_ALL)
+   enable_testing()
+ endif()
+ 
diff --git a/debian/patches/0006-Fix-a-bug-in-the-calculation-of-DictGroup-keyMaxLeng.patch b/debian/patches/0006-Fix-a-bug-in-the-calculation-of-DictGroup-keyMaxLeng.patch
new file mode 100644
index 0000000..d89f9a3
--- /dev/null
+++ b/debian/patches/0006-Fix-a-bug-in-the-calculation-of-DictGroup-keyMaxLeng.patch
@@ -0,0 +1,97 @@
+From: Carbo Kuo <byvoid@byvoid.com>
+Date: Thu, 25 Feb 2021 20:48:50 +0900
+Subject: Fix a bug in the calculation of DictGroup::keyMaxLength.
+
+The length should be the maximum of all sub-dictionaries in the dictionary group.
+---
+ src/DictGroup.cpp     | 16 ++++++++++++++--
+ src/DictGroupTest.cpp | 32 ++++++++++++++++++++++++--------
+ 2 files changed, 38 insertions(+), 10 deletions(-)
+
+diff --git a/src/DictGroup.cpp b/src/DictGroup.cpp
+index 4ca9e33..c682e96 100644
+--- a/src/DictGroup.cpp
++++ b/src/DictGroup.cpp
+@@ -1,7 +1,7 @@
+ /*
+  * Open Chinese Convert
+  *
+- * Copyright 2010-2014 Carbo Kuo <byvoid@byvoid.com>
++ * Copyright 2010-2021 Carbo Kuo <byvoid@byvoid.com>
+  *
+  * Licensed under the Apache License, Version 2.0 (the "License");
+  * you may not use this file except in compliance with the License.
+@@ -24,8 +24,20 @@
+ 
+ using namespace opencc;
+ 
++namespace {
++
++size_t GetKeyMaxLength(const std::list<DictPtr>& dicts) {
++  size_t keyMaxLength = 0;
++  for (const DictPtr& dict : dicts) {
++    keyMaxLength = std::max(keyMaxLength, dict->KeyMaxLength());
++  }
++  return keyMaxLength;
++}
++
++} // namespace
++
+ DictGroup::DictGroup(const std::list<DictPtr>& _dicts)
+-    : keyMaxLength(0), dicts(_dicts) {}
++    : keyMaxLength(GetKeyMaxLength(_dicts)), dicts(_dicts) {}
+ 
+ DictGroup::~DictGroup() {}
+ 
+diff --git a/src/DictGroupTest.cpp b/src/DictGroupTest.cpp
+index 7003506..c91731e 100644
+--- a/src/DictGroupTest.cpp
++++ b/src/DictGroupTest.cpp
<skip, upstream test file>
diff --git a/debian/patches/0007-Fix-a-severe-performance-bug-in-Conversion-Convert-t.patch b/debian/patches/0007-Fix-a-severe-performance-bug-in-Conversion-Convert-t.patch
new file mode 100644
index 0000000..a614830
--- /dev/null
+++ b/debian/patches/0007-Fix-a-severe-performance-bug-in-Conversion-Convert-t.patch
@@ -0,0 +1,188 @@
+From: Carbo Kuo <byvoid@byvoid.com>
+Date: Thu, 25 Feb 2021 21:13:38 +0900
+Subject: Fix a severe performance bug in `Conversion::Convert` that caused
+ O(N^2) complexity.
+
<skip, upstream log commit message>
+This bug was reported in https://github.com/BYVoid/OpenCC/issues/478 and https://github.com/BYVoid/OpenCC/issues/517.
+---
+ src/Dict.hpp                  |  7 ++++
+ src/benchmark/Performance.cpp | 77 ++++++++++++++++++++++++++++++++++---------
+ 2 files changed, 69 insertions(+), 15 deletions(-)
+
+diff --git a/src/Dict.hpp b/src/Dict.hpp
+index 461d6d2..1c81034 100644
+--- a/src/Dict.hpp
++++ b/src/Dict.hpp
+@@ -49,6 +49,13 @@ public:
+   virtual Optional<const DictEntry*> MatchPrefix(const char* word,
+                                                  size_t len) const;
+ 
++  /**
++   * Matches the longest matched prefix of a word.
++   */
++  Optional<const DictEntry*> MatchPrefix(const char* word) const {
++    return MatchPrefix(word, KeyMaxLength());
++  }
++
+   /**
+    * Matches the longest matched prefix of a word.
+    */
+diff --git a/src/benchmark/Performance.cpp b/src/benchmark/Performance.cpp
+index cf8d3aa..d1b6468 100644
+--- a/src/benchmark/Performance.cpp
++++ b/src/benchmark/Performance.cpp
+@@ -1,7 +1,26 @@
<skip, upstream performance test>
diff --git a/debian/patches/series b/debian/patches/series
index 74933a6..ad3c3d6 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1,4 +1,7 @@
-use-cmake-install-libdir.patch
-0003-use-system-libraries.patch
+0001-use-cmake-install-libdir.patch
+0002-use-system-libraries.patch
 0003-data-Explicitly-use-python3.patch
-no-remote-images-when-reading-docs-on-disk.patch
+0004-no-remote-images-when-reading-docs-on-disk.patch
+0005-Use-system-googletest.patch
+0006-Fix-a-bug-in-the-calculation-of-DictGroup-keyMaxLeng.patch
+0007-Fix-a-severe-performance-bug-in-Conversion-Convert-t.patch
diff --git a/debian/rules b/debian/rules
index 9bfc988..250a007 100755
--- a/debian/rules
+++ b/debian/rules
@@ -8,8 +8,10 @@ export DEB_BUILD_MAINT_OPTIONS = hardening=+all
 
 include /usr/share/dpkg/architecture.mk
 
+# Disable parallel build to circumvent some random build error
+# (needs further investigation)
 %:
-       dh $@ --buildsystem=cmake --with pkgkde_symbolshelper
+       dh $@ --buildsystem=cmake --with pkgkde_symbolshelper --no-parallel
 
 BUILD_OPTIONS = \
     -DCMAKE_INSTALL_PREFIX=/usr \
@@ -23,6 +25,11 @@ else
 BUILD_OPTIONS += -DBUILD_DOCUMENTATION=OFF
 endif
 
+ifeq (,$(filter nocheck,$(DEB_BUILD_OPTIONS)))
+BUILD_OPTIONS += -DENABLE_GTEST=ON
+else
+BUILD_OPTIONS += -DENABLE_GTEST=OFF
+endif
 
 override_dh_auto_configure:
        dh_auto_configure -- $(BUILD_OPTIONS)
diff --git a/debian/tests/CMakeLists.txt b/debian/tests/CMakeLists.txt
new file mode 100644
index 0000000..e6f88c7
--- /dev/null
+++ b/debian/tests/CMakeLists.txt
@@ -0,0 +1,24 @@
+cmake_minimum_required(VERSION 3.18)
+project (opencc-integration CXX)
+include (CTest)
+enable_testing()
+
+find_package(PkgConfig REQUIRED)
+find_program (OPENCC_TOOL opencc REQUIRED)
+pkg_check_modules(OPENCC REQUIRED IMPORTED_TARGET opencc)
+
+add_definitions(
+  -DCMAKE_BINARY_DIR="${CMAKE_BINARY_DIR}"
+  -DCMAKE_SOURCE_DIR="${CMAKE_SOURCE_DIR}"
+  -DPROJECT_BINARY_PATH="${OPENCC_TOOL}"
+)
+make_directory(${CMAKE_BINARY_DIR}/test)
+add_subdirectory(/usr/src/googletest/googletest ${CMAKE_BINARY_DIR}/googletest-build EXCLUDE_FROM_ALL)
+set(UNITTESTS
+  CommandLineConvertTest
+)
+foreach(UNITTEST ${UNITTESTS})
+  add_executable(${UNITTEST} ${UNITTEST}.cpp)
+  target_link_libraries(${UNITTEST} gtest gtest_main PkgConfig::OPENCC)
+  add_test(${UNITTEST} ${UNITTEST})
+endforeach(UNITTEST)
diff --git a/debian/tests/CommandLineConvertTest.cpp b/debian/tests/CommandLineConvertTest.cpp
new file mode 100644
index 0000000..6ca6068
--- /dev/null
+++ b/debian/tests/CommandLineConvertTest.cpp
@@ -0,0 +1,104 @@
+/*
+ * Open Chinese Convert
+ *
+ * Copyright 2015 Carbo Kuo <byvoid@byvoid.com>
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <fstream>
+
+#include "Common.hpp"
+#include "gtest/gtest.h"
+
+namespace opencc {
+
+class CommandLineConvertTest : public ::testing::Test {
+protected:
+  CommandLineConvertTest() { GetCurrentWorkingDirectory(); }
+
+  virtual ~CommandLineConvertTest() { free(originalWorkingDirectory); }
+
+  virtual void SetUp() {
+    ASSERT_NE("", CMAKE_BINARY_DIR);
+    ASSERT_NE("", CMAKE_SOURCE_DIR);
+    ASSERT_EQ(0, chdir("/usr/share/opencc/"));
+  }
+
+  virtual void TearDown() { ASSERT_EQ(0, chdir(originalWorkingDirectory)); }
+
+  std::string GetFileContents(const std::string& fileName) const {
+    std::ifstream fs(fileName);
+    EXPECT_TRUE(fs.is_open());
+    const std::string content((std::istreambuf_iterator<char>(fs)),
+                              (std::istreambuf_iterator<char>()));
+    fs.close();
+    return content;
+  }
+
+  void GetCurrentWorkingDirectory() {
+    originalWorkingDirectory = getcwd(nullptr, 0);
+  }
+
+  const char* OpenccCommand() const {
+    return PROJECT_BINARY_PATH;
+  }
+
+  const char* InputDirectory() const {
+    return CMAKE_SOURCE_DIR "/../../test/testcases/";
+  }
+
+  const char* OutputDirectory() const { return CMAKE_BINARY_DIR "/test/"; }
+
+  const char* AnswerDirectory() const {
+    return CMAKE_SOURCE_DIR "/../../test/testcases/";
+  }
+
+  const char* ConfigurationDirectory() const {
+    return "/usr/share/opencc/";
+  }
+
+  std::string OutputFile(const char* config) const {
+    return std::string(OutputDirectory()) + config + ".out";
+  }
+
+  std::string AnswerFile(const char* config) const {
+    return std::string(AnswerDirectory()) + config + ".ans";
+  }
+
+  std::string TestCommand(const char* config) const {
+    return OpenccCommand() + std::string("") + " -i " + InputDirectory() +
+           config + ".in" + " -o " + OutputFile(config) + " -c " +
+           ConfigurationDirectory() + config + ".json";
+  }
+
+  char* originalWorkingDirectory;
+};
+
+class ConfigurationTest : public CommandLineConvertTest,
+                          public ::testing::WithParamInterface<const char*> {};
+
+TEST_P(ConfigurationTest, Convert) {
+  const char* config = GetParam();
+  ASSERT_EQ(0, system(TestCommand(config).c_str()));
+  const std::string& output = GetFileContents(OutputFile(config));
+  const std::string& answer = GetFileContents(AnswerFile(config));
+  ASSERT_EQ(answer, output);
+}
+
+INSTANTIATE_TEST_CASE_P(CommandLine, ConfigurationTest,
+                        ::testing::Values("hk2s", "hk2t", "jp2t", "s2hk", "s2t",
+                                          "s2tw", "s2twp", "t2hk", "t2jp", "t2s",
+                                          "tw2s", "tw2sp", "tw2t"));
+
+} // namespace opencc
diff --git a/debian/tests/README.md b/debian/tests/README.md
new file mode 100644
index 0000000..12d6cec
--- /dev/null
+++ b/debian/tests/README.md
@@ -0,0 +1,3 @@
+Fork from ../../test, please refresh this file if source has changed.
+
+Test with installed opencc tool and library.
diff --git a/debian/tests/control b/debian/tests/control
new file mode 100644
index 0000000..36df47b
--- /dev/null
+++ b/debian/tests/control
@@ -0,0 +1,8 @@
+Tests: integration
+Depends:
+ build-essential,
+ cmake,
+ googletest,
+ pkg-config,
+ @,
+Restrictions: allow-stderr,
diff --git a/debian/tests/integration b/debian/tests/integration
new file mode 100755
index 0000000..c1f1746
--- /dev/null
+++ b/debian/tests/integration
@@ -0,0 +1,12 @@
+#!/bin/sh
+
+set -ex
+
+cd "$(dirname "$0")"
+mkdir -p build
+cd build
+cmake ..
+make
+make test
+cd ..
+rm -rf build

Attachment: signature.asc
Description: PGP signature


Reply to: