
Bug#928111: [pre-approval] unblock: icu/63.2-1



On Sun, May 5, 2019 at 2:25 PM Ivo De Decker <ivodd@debian.org> wrote:
> On Sun, Apr 28, 2019 at 01:57:15PM +0200, László Böszörményi (GCS) wrote:
> > Second is that I've made local testing and only found a regression.
> > Chromium (the browser) needs to be binNMUed as otherwise it will crash
> > on startup.
>
> First of all, we can't make a statement on something like this without seeing
> the diff.
 Attached. It's a bit larger than one might expect, for these reasons:
- two backported patches are removed, as this release already contains them,
- s/63.1/63.2/g all over the source,
- Windows-only fixes,
- one piece of test data is fully commented out, as its result depends
on the date the self-test is run (before or after the start of the new
Japanese era).

> Secondly, the fact that chromium needs a rebuild suggest there is a change
> that breaks something. This makes it very unlikely that this change is
> appropriate at this point in the freeze. Maybe targeted fixes on top of 63.1
> would be better.
 Chromium is a bit bigger than me, but from what I've seen the crash
comes from one of its startup pre-checks, which is an assert. As the
size of the Unicode set is now bigger due to the addition of the new
Japanese era, that check fails. Call it a lame check, but I also tested
the reverse case: I built Chromium with ICU 63.2, then installed it
with ICU 63.1, and it doesn't fail but works normally.

> Please do not upload 63.2 to unstable at this point.
>
> I suggest you upload the new version to experimental. That way we can look at
> the differences and people can test the new packages.
 Sure, I did not intend to upload to Sid before RM approval. The
experimental upload is done and has already built on all architectures.

On Sun, May 5, 2019 at 3:39 PM Mattia Rizzolo <mattia@debian.org> wrote:
> On Sun, May 05, 2019 at 02:25:01PM +0200, Ivo De Decker wrote:
> > Secondly, the fact that chromium needs a rebuild suggest there is a change
> > that breaks something.
 As noted above, it's not a code fault but a pre-check assert in Chromium.

> > I suggest you upload the new version to experimental. That way we can look at
> > the differences and people can test the new packages.
 Done, uploaded. I've been using it for more than a week, and even
rebooted my system a few times to be sure everything uses the new ICU
package version. I feared LibreOffice might break, but it's working
normally, as are other packages that use ICU (I tested the Python ones
as well).

> And check the symbols and other parts of the ABI, as it would be
> important to understand why chromium needs a rebuild.
 Please see above; it's a pre-check assert. For anyone who knows
Chromium better, it's in zygote_host_impl_linux.cc:
CHECK(ReceiveFixedMessage(fds[0], kZygoteBootMessage,
sizeof(kZygoteBootMessage), &boot_pid));
The ICU library symbols are current and unchanged.
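
For readers unfamiliar with this kind of check, here is a minimal sketch of what a fixed-message handshake might look like. It is not Chromium's code: receive_fixed_message, handshake_succeeds and the kBootMessage value are hypothetical names made up for illustration, and it assumes a POSIX system. The point is only that the parent expects an exact byte sequence from the freshly started child, so anything that makes the child exit or answer differently causes the check to fail.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the boot message; the real value lives in Chromium.
static const char kBootMessage[] = "BOOT_OK";

// Reads exactly `expected_len` bytes from `fd` and compares them to `expected`.
// Returns false on EOF before enough bytes arrived, on a read error,
// or when the received bytes differ from the expected ones.
bool receive_fixed_message(int fd, const char* expected, std::size_t expected_len) {
    assert(expected != nullptr);
    std::vector<char> buf(expected_len);
    std::size_t got = 0;
    while (got < expected_len) {
        ssize_t n = read(fd, buf.data() + got, expected_len - got);
        if (n <= 0) {
            return false;  // peer closed early, or read failed
        }
        got += static_cast<std::size_t>(n);
    }
    return std::memcmp(buf.data(), expected, expected_len) == 0;
}

// Sends `msg` over a fresh socketpair, closes the sending end (as if the
// child had exited), and reports whether the receiving side accepts it.
bool handshake_succeeds(const char* msg, std::size_t len) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return false;
    }
    ssize_t written = write(fds[1], msg, len);
    close(fds[1]);
    bool ok = written == static_cast<ssize_t>(len) &&
              receive_fixed_message(fds[0], kBootMessage, sizeof(kBootMessage));
    close(fds[0]);
    return ok;
}
```

In this sketch a caller would abort startup when handshake_succeeds returns false, which is roughly the role the CHECK plays above. A child that dies before sending anything and a child that sends the wrong bytes both end up on the same failing path.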

Regards,
Laszlo/GCS
diff -Nru icu-63.1/debian/changelog icu-63.2/debian/changelog
--- icu-63.1/debian/changelog	2019-01-23 16:51:20.000000000 +0000
+++ icu-63.2/debian/changelog	2019-04-27 06:44:54.000000000 +0000
@@ -1,3 +1,12 @@
+icu (63.2-1) experimental; urgency=medium
+
+  * New upstream release with Japanese new era "Reiwa" support
+    (closes: #927933).
+  * Drop backported ICU-20246 and ICU-20208 fixes as no longer needed.
+  * Break Chromium versions that not built with this ICU release.
+
+ -- Laszlo Boszormenyi (GCS) <gcs@debian.org>  Sat, 27 Apr 2019 06:44:54 +0000
+
 icu (63.1-6) unstable; urgency=medium
 
   * Build without icu-config (closes: #898820).
diff -Nru icu-63.1/debian/control icu-63.2/debian/control
--- icu-63.1/debian/control	2019-01-23 16:51:20.000000000 +0000
+++ icu-63.2/debian/control	2019-04-27 06:44:54.000000000 +0000
@@ -13,7 +13,7 @@
 Multi-Arch: same
 Pre-Depends: ${misc:Pre-Depends}
 Depends: ${misc:Depends}, ${shlibs:Depends}
-Breaks: openttd (<< 1.8.0-2~), libiculx63 (<< 63.1-5)
+Breaks: openttd (<< 1.8.0-2~), libiculx63 (<< 63.1-5), chromium (<= 74.0.3729.108-1)
 Replaces: libiculx63 (<< 63.1-5)
 Description: International Components for Unicode
  ICU is a C++ and C library that provides robust and full-featured
@@ -38,8 +38,8 @@
 Multi-Arch: same
 Pre-Depends: ${misc:Pre-Depends}
 Depends: ${misc:Depends}, libicu63 (= ${binary:Version}), icu-devtools (>= ${binary:Version}), libc6-dev | libc-dev
-Replaces: icu-devtools (<< 63.1-1~)
-Breaks: icu-devtools (<< 63.1-1~)
+Replaces: icu-devtools (<< 63.2-1~)
+Breaks: icu-devtools (<< 63.2-1~)
 Suggests: icu-doc
 Description: Development files for International Components for Unicode
  ICU is a C++ and C library that provides robust and full-featured
@@ -52,8 +52,8 @@
 Multi-Arch: foreign
 Pre-Depends: ${misc:Pre-Depends}
 Depends: ${misc:Depends}, ${shlibs:Depends}
-Replaces: libicu-dev (<< ${binary:Version}), icu-tools (<< 63.1-1~)
-Breaks: libicu-dev (<< ${binary:Version}), icu-tools (<< 63.1-1~)
+Replaces: libicu-dev (<< ${binary:Version}), icu-tools (<< 63.2-1~)
+Breaks: libicu-dev (<< ${binary:Version}), icu-tools (<< 63.2-1~)
 Description: Development utilities for International Components for Unicode
  ICU is a C++ and C library that provides robust and full-featured
  Unicode and locale support. This package contains programs used to
diff -Nru icu-63.1/debian/patches/ICU-20208_uspoof.cpp_function_checkImpl_should_be_static.patch icu-63.2/debian/patches/ICU-20208_uspoof.cpp_function_checkImpl_should_be_static.patch
--- icu-63.1/debian/patches/ICU-20208_uspoof.cpp_function_checkImpl_should_be_static.patch	2018-11-07 18:15:15.000000000 +0000
+++ icu-63.2/debian/patches/ICU-20208_uspoof.cpp_function_checkImpl_should_be_static.patch	1970-01-01 00:00:00.000000000 +0000
@@ -1,37 +0,0 @@
-From 8baff8f03e07d8e02304d0c888d0bb21ad2eeb01 Mon Sep 17 00:00:00 2001
-From: Jeff Genovy <29107334+jefgen@users.noreply.github.com>
-Date: Wed, 17 Oct 2018 19:47:35 -0700
-Subject: [PATCH] ICU-20208 uspoof.cpp function checkImpl should be static,
- regenerate urename.h
-
-(cherry picked from commit 9ec2c332c1c9156323944ea2b15c2b91952efae4)
----
- source/common/unicode/urename.h | 1 -
- source/i18n/uspoof.cpp          | 2 +-
- 2 files changed, 1 insertion(+), 2 deletions(-)
-
-diff --git a/source/common/unicode/urename.h b/source/common/unicode/urename.h
-index 5812173e39c..0512be3b6e5 100644
---- a/source/common/unicode/urename.h
-+++ b/source/common/unicode/urename.h
-@@ -110,7 +110,6 @@
- #define _UTF7Data U_ICU_ENTRY_POINT_RENAME(_UTF7Data)
- #define _UTF8Data U_ICU_ENTRY_POINT_RENAME(_UTF8Data)
- #define allowedHourFormatsCleanup U_ICU_ENTRY_POINT_RENAME(allowedHourFormatsCleanup)
--#define checkImpl U_ICU_ENTRY_POINT_RENAME(checkImpl)
- #define cmemory_cleanup U_ICU_ENTRY_POINT_RENAME(cmemory_cleanup)
- #define dayPeriodRulesCleanup U_ICU_ENTRY_POINT_RENAME(dayPeriodRulesCleanup)
- #define deleteAllowedHourFormats U_ICU_ENTRY_POINT_RENAME(deleteAllowedHourFormats)
-diff --git a/source/i18n/uspoof.cpp b/source/i18n/uspoof.cpp
-index 8e3d69ede2b..66f228f037a 100644
---- a/source/i18n/uspoof.cpp
-+++ b/source/i18n/uspoof.cpp
-@@ -547,7 +547,7 @@ uspoof_checkUnicodeString(const USpoofChecker *sc,
-     return uspoof_check2UnicodeString(sc, id, NULL, status);
- }
- 
--int32_t checkImpl(const SpoofImpl* This, const UnicodeString& id, CheckResult* checkResult, UErrorCode* status) {
-+static int32_t checkImpl(const SpoofImpl* This, const UnicodeString& id, CheckResult* checkResult, UErrorCode* status) {
-     U_ASSERT(This != NULL);
-     U_ASSERT(checkResult != NULL);
-     checkResult->clear();
diff -Nru icu-63.1/debian/patches/ICU-20246_Fixing_another_integer_overflow_in_number_parsing.patch icu-63.2/debian/patches/ICU-20246_Fixing_another_integer_overflow_in_number_parsing.patch
--- icu-63.1/debian/patches/ICU-20246_Fixing_another_integer_overflow_in_number_parsing.patch	2018-11-07 18:13:36.000000000 +0000
+++ icu-63.2/debian/patches/ICU-20246_Fixing_another_integer_overflow_in_number_parsing.patch	1970-01-01 00:00:00.000000000 +0000
@@ -1,60 +0,0 @@
-From 6cbd62e59e30f73b444be89ea71fd74275ac53a4 Mon Sep 17 00:00:00 2001
-From: Shane Carr <shane@unicode.org>
-Date: Mon, 29 Oct 2018 23:52:44 -0700
-Subject: [PATCH] ICU-20246 Fixing another integer overflow in number parsing.
-
-(cherry picked from commit 53d8c8f3d181d87a6aa925b449b51c4a2c922a51)
----
- source/i18n/fmtable.cpp                          |  2 +-
- source/i18n/number_decimalquantity.cpp           |  5 ++++-
- source/test/intltest/numfmtst.cpp                |  8 ++++++++
- 6 files changed, 31 insertions(+), 4 deletions(-)
-
-diff --git a/source/i18n/fmtable.cpp b/source/i18n/fmtable.cpp
-index 45c7024fc29..8601d95f4a6 100644
---- a/source/i18n/fmtable.cpp
-+++ b/source/i18n/fmtable.cpp
-@@ -734,7 +734,7 @@ CharString *Formattable::internalGetCharString(UErrorCode &status) {
-       // not print scientific notation for magnitudes greater than -5 and smaller than some amount (+5?).
-       if (fDecimalQuantity->isZero()) {
-         fDecimalStr->append("0", -1, status);
--      } else if (std::abs(fDecimalQuantity->getMagnitude()) < 5) {
-+      } else if (fDecimalQuantity->getMagnitude() != INT32_MIN && std::abs(fDecimalQuantity->getMagnitude()) < 5) {
-         fDecimalStr->appendInvariantChars(fDecimalQuantity->toPlainString(), status);
-       } else {
-         fDecimalStr->appendInvariantChars(fDecimalQuantity->toScientificString(), status);
-diff --git a/source/i18n/number_decimalquantity.cpp b/source/i18n/number_decimalquantity.cpp
-index 2c4182b1c6e..f6f2b20fab0 100644
---- a/source/i18n/number_decimalquantity.cpp
-+++ b/source/i18n/number_decimalquantity.cpp
-@@ -820,7 +820,10 @@ UnicodeString DecimalQuantity::toScientificString() const {
-     }
-     result.append(u'E');
-     int32_t _scale = upperPos + scale;
--    if (_scale < 0) {
-+    if (_scale == INT32_MIN) {
-+        result.append({u"-2147483648", -1});
-+        return result;
-+    } else if (_scale < 0) {
-         _scale *= -1;
-         result.append(u'-');
-     } else {
-diff --git a/source/test/intltest/numfmtst.cpp b/source/test/intltest/numfmtst.cpp
-index 34355939113..8d52dc122bf 100644
---- a/source/test/intltest/numfmtst.cpp
-+++ b/source/test/intltest/numfmtst.cpp
-@@ -9226,6 +9226,14 @@ void NumberFormatTest::Test20037_ScientificIntegerOverflow() {
-     assertEquals(u"Should not overflow and should parse only the first exponent",
-                  u"1E-2147483647",
-                  {sp.data(), sp.length(), US_INV});
-+
-+    // Test edge case overflow of exponent
-+    result = Formattable();
-+    nf->parse(u".0003e-2147483644", result, status);
-+    sp = result.getDecimalNumber(status);
-+    assertEquals(u"Should not overflow",
-+                 u"3E-2147483648",
-+                 {sp.data(), sp.length(), US_INV});
- }
- 
- void NumberFormatTest::Test13840_ParseLongStringCrash() {
diff -Nru icu-63.1/debian/patches/series icu-63.2/debian/patches/series
--- icu-63.1/debian/patches/series	2018-11-07 18:15:15.000000000 +0000
+++ icu-63.2/debian/patches/series	2019-04-27 06:44:54.000000000 +0000
@@ -3,5 +3,3 @@
 icuinfo-man.patch
 hurd-fix.diff
 layout-test-fix.patch
-ICU-20246_Fixing_another_integer_overflow_in_number_parsing.patch
-ICU-20208_uspoof.cpp_function_checkImpl_should_be_static.patch
diff -Nru icu-63.1/readme.html icu-63.2/readme.html
--- icu-63.1/readme.html	2018-10-15 18:02:37.000000000 +0000
+++ icu-63.2/readme.html	2019-04-11 22:38:30.000000000 +0000
@@ -3,7 +3,7 @@
 
 <html lang="en-US" xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
   <head>
-    <title>ReadMe for ICU 63.1</title>
+    <title>ReadMe for ICU 63.2</title>
     <meta name="COPYRIGHT" content=
     "Copyright (C) 2016 and later: Unicode, Inc. and others. License &amp; terms of use: http://www.unicode.org/copyright.html"/>
     <!-- meta name="COPYRIGHT" content=
@@ -32,7 +32,7 @@
       International Components for Unicode<br/>
       <span class="only-rc">Release Candidate</span>
       <span class="only-milestone">(Milestone Release)</span>
-      <abbr title="International Components for Unicode">ICU</abbr> 63.1 ReadMe
+      <abbr title="International Components for Unicode">ICU</abbr> 63.2 ReadMe
     </h1>
 
     <!-- Shouldn't need to comment/uncomment this paragraph, just change the body class -->
diff -Nru icu-63.1/source/common/characterproperties.cpp icu-63.2/source/common/characterproperties.cpp
--- icu-63.1/source/common/characterproperties.cpp	2018-10-01 22:39:56.000000000 +0000
+++ icu-63.2/source/common/characterproperties.cpp	2019-04-11 22:38:30.000000000 +0000
@@ -23,6 +23,9 @@
 #include "umutex.h"
 #include "uprops.h"
 
+using icu::LocalPointer;
+using icu::Normalizer2Factory;
+using icu::Normalizer2Impl;
 using icu::UInitOnce;
 using icu::UnicodeSet;
 
@@ -30,11 +33,13 @@
 
 UBool U_CALLCONV characterproperties_cleanup();
 
+constexpr int32_t NUM_INCLUSIONS = UPROPS_SRC_COUNT + UCHAR_INT_LIMIT - UCHAR_INT_START;
+
 struct Inclusion {
     UnicodeSet  *fSet;
     UInitOnce    fInitOnce;
 };
-Inclusion gInclusions[UPROPS_SRC_COUNT]; // cached getInclusions()
+Inclusion gInclusions[NUM_INCLUSIONS]; // cached getInclusions()
 
 UnicodeSet *sets[UCHAR_BINARY_LIMIT] = {};
 
@@ -80,35 +85,22 @@
     return TRUE;
 }
 
-}  // namespace
-
-U_NAMESPACE_BEGIN
-
-/*
-Reduce excessive reallocation, and make it easier to detect initialization problems.
-Usually you don't see smaller sets than this for Unicode 5.0.
-*/
-constexpr int32_t DEFAULT_INCLUSION_CAPACITY = 3072;
-
-void U_CALLCONV CharacterProperties::initInclusion(UPropertySource src, UErrorCode &errorCode) {
+void U_CALLCONV initInclusion(UPropertySource src, UErrorCode &errorCode) {
     // This function is invoked only via umtx_initOnce().
-    // This function is a friend of class UnicodeSet.
-
     U_ASSERT(0 <= src && src < UPROPS_SRC_COUNT);
     if (src == UPROPS_SRC_NONE) {
         errorCode = U_INTERNAL_PROGRAM_ERROR;
         return;
     }
-    UnicodeSet * &incl = gInclusions[src].fSet;
-    U_ASSERT(incl == nullptr);
+    U_ASSERT(gInclusions[src].fSet == nullptr);
 
-    incl = new UnicodeSet();
-    if (incl == nullptr) {
+    LocalPointer<UnicodeSet> incl(new UnicodeSet());
+    if (incl.isNull()) {
         errorCode = U_MEMORY_ALLOCATION_ERROR;
         return;
     }
     USetAdder sa = {
-        (USet *)incl,
+        (USet *)incl.getAlias(),
         _set_add,
         _set_addRange,
         _set_addString,
@@ -116,7 +108,6 @@
         nullptr // don't need removeRange()
     };
 
-    incl->ensureCapacity(DEFAULT_INCLUSION_CAPACITY, errorCode);
     switch(src) {
     case UPROPS_SRC_CHAR:
         uchar_addPropertyStarts(&sa, &errorCode);
@@ -183,12 +174,15 @@
     }
 
     if (U_FAILURE(errorCode)) {
-        delete incl;
-        incl = nullptr;
         return;
     }
-    // Compact for caching
+    if (incl->isBogus()) {
+        errorCode = U_MEMORY_ALLOCATION_ERROR;
+        return;
+    }
+    // Compact for caching.
     incl->compact();
+    gInclusions[src].fSet = incl.orphan();
     ucln_common_registerCleanup(UCLN_COMMON_CHARACTERPROPERTIES, characterproperties_cleanup);
 }
 
@@ -199,15 +193,66 @@
         return nullptr;
     }
     Inclusion &i = gInclusions[src];
-    umtx_initOnce(i.fInitOnce, &CharacterProperties::initInclusion, src, errorCode);
+    umtx_initOnce(i.fInitOnce, &initInclusion, src, errorCode);
     return i.fSet;
 }
 
+void U_CALLCONV initIntPropInclusion(UProperty prop, UErrorCode &errorCode) {
+    // This function is invoked only via umtx_initOnce().
+    U_ASSERT(UCHAR_INT_START <= prop && prop < UCHAR_INT_LIMIT);
+    int32_t inclIndex = UPROPS_SRC_COUNT + prop - UCHAR_INT_START;
+    U_ASSERT(gInclusions[inclIndex].fSet == nullptr);
+    UPropertySource src = uprops_getSource(prop);
+    const UnicodeSet *incl = getInclusionsForSource(src, errorCode);
+    if (U_FAILURE(errorCode)) {
+        return;
+    }
+
+    LocalPointer<UnicodeSet> intPropIncl(new UnicodeSet(0, 0));
+    if (intPropIncl.isNull()) {
+        errorCode = U_MEMORY_ALLOCATION_ERROR;
+        return;
+    }
+    int32_t numRanges = incl->getRangeCount();
+    int32_t prevValue = 0;
+    for (int32_t i = 0; i < numRanges; ++i) {
+        UChar32 rangeEnd = incl->getRangeEnd(i);
+        for (UChar32 c = incl->getRangeStart(i); c <= rangeEnd; ++c) {
+            // TODO: Get a UCharacterProperty.IntProperty to avoid the property dispatch.
+            int32_t value = u_getIntPropertyValue(c, prop);
+            if (value != prevValue) {
+                intPropIncl->add(c);
+                prevValue = value;
+            }
+        }
+    }
+
+    if (intPropIncl->isBogus()) {
+        errorCode = U_MEMORY_ALLOCATION_ERROR;
+        return;
+    }
+    // Compact for caching.
+    intPropIncl->compact();
+    gInclusions[inclIndex].fSet = intPropIncl.orphan();
+    ucln_common_registerCleanup(UCLN_COMMON_CHARACTERPROPERTIES, characterproperties_cleanup);
+}
+
+}  // namespace
+
+U_NAMESPACE_BEGIN
+
 const UnicodeSet *CharacterProperties::getInclusionsForProperty(
         UProperty prop, UErrorCode &errorCode) {
     if (U_FAILURE(errorCode)) { return nullptr; }
-    UPropertySource src = uprops_getSource(prop);
-    return getInclusionsForSource(src, errorCode);
+    if (UCHAR_INT_START <= prop && prop < UCHAR_INT_LIMIT) {
+        int32_t inclIndex = UPROPS_SRC_COUNT + prop - UCHAR_INT_START;
+        Inclusion &i = gInclusions[inclIndex];
+        umtx_initOnce(i.fInitOnce, &initIntPropInclusion, prop, errorCode);
+        return i.fSet;
+    } else {
+        UPropertySource src = uprops_getSource(prop);
+        return getInclusionsForSource(src, errorCode);
+    }
 }
 
 U_NAMESPACE_END
@@ -216,7 +261,7 @@
 
 UnicodeSet *makeSet(UProperty property, UErrorCode &errorCode) {
     if (U_FAILURE(errorCode)) { return nullptr; }
-    icu::LocalPointer<UnicodeSet> set(new UnicodeSet());
+    LocalPointer<UnicodeSet> set(new UnicodeSet());
     if (set.isNull()) {
         errorCode = U_MEMORY_ALLOCATION_ERROR;
         return nullptr;
diff -Nru icu-63.1/source/common/ucptrie.cpp icu-63.2/source/common/ucptrie.cpp
--- icu-63.1/source/common/ucptrie.cpp	2018-10-01 22:39:56.000000000 +0000
+++ icu-63.2/source/common/ucptrie.cpp	2019-04-11 22:38:30.000000000 +0000
@@ -280,7 +280,7 @@
     int32_t prevI3Block = -1;
     int32_t prevBlock = -1;
     UChar32 c = start;
-    uint32_t value;
+    uint32_t trieValue, value;
     bool haveValue = false;
     do {
         int32_t i3Block;
@@ -319,6 +319,7 @@
                         return c - 1;
                     }
                 } else {
+                    trieValue = trie->nullValue;
                     value = nullValue;
                     if (pValue != nullptr) { *pValue = nullValue; }
                     haveValue = true;
@@ -357,6 +358,7 @@
                             return c - 1;
                         }
                     } else {
+                        trieValue = trie->nullValue;
                         value = nullValue;
                         if (pValue != nullptr) { *pValue = nullValue; }
                         haveValue = true;
@@ -364,23 +366,32 @@
                     c = (c + dataBlockLength) & ~dataMask;
                 } else {
                     int32_t di = block + (c & dataMask);
-                    uint32_t value2 = getValue(trie->data, valueWidth, di);
-                    value2 = maybeFilterValue(value2, trie->nullValue, nullValue,
-                                              filter, context);
+                    uint32_t trieValue2 = getValue(trie->data, valueWidth, di);
                     if (haveValue) {
-                        if (value2 != value) {
-                            return c - 1;
+                        if (trieValue2 != trieValue) {
+                            if (filter == nullptr ||
+                                    maybeFilterValue(trieValue2, trie->nullValue, nullValue,
+                                                     filter, context) != value) {
+                                return c - 1;
+                            }
+                            trieValue = trieValue2;  // may or may not help
                         }
                     } else {
-                        value = value2;
+                        trieValue = trieValue2;
+                        value = maybeFilterValue(trieValue2, trie->nullValue, nullValue,
+                                                 filter, context);
                         if (pValue != nullptr) { *pValue = value; }
                         haveValue = true;
                     }
                     while ((++c & dataMask) != 0) {
-                        if (maybeFilterValue(getValue(trie->data, valueWidth, ++di),
-                                             trie->nullValue, nullValue,
-                                             filter, context) != value) {
-                            return c - 1;
+                        trieValue2 = getValue(trie->data, valueWidth, ++di);
+                        if (trieValue2 != trieValue) {
+                            if (filter == nullptr ||
+                                    maybeFilterValue(trieValue2, trie->nullValue, nullValue,
+                                                     filter, context) != value) {
+                                return c - 1;
+                            }
+                            trieValue = trieValue2;  // may or may not help
                         }
                     }
                 }
diff -Nru icu-63.1/source/common/umutablecptrie.cpp icu-63.2/source/common/umutablecptrie.cpp
--- icu-63.1/source/common/umutablecptrie.cpp	2018-10-01 22:39:56.000000000 +0000
+++ icu-63.2/source/common/umutablecptrie.cpp	2019-04-11 22:38:30.000000000 +0000
@@ -60,6 +60,7 @@
 constexpr int32_t INDEX_3_18BIT_BLOCK_LENGTH = UCPTRIE_INDEX_3_BLOCK_LENGTH + UCPTRIE_INDEX_3_BLOCK_LENGTH / 8;
 
 class AllSameBlocks;
+class MixedBlocks;
 
 class MutableCodePointTrie : public UMemory {
 public:
@@ -92,8 +93,10 @@
     void maskValues(uint32_t mask);
     UChar32 findHighStart() const;
     int32_t compactWholeDataBlocks(int32_t fastILimit, AllSameBlocks &allSameBlocks);
-    int32_t compactData(int32_t fastILimit, uint32_t *newData, int32_t dataNullIndex);
-    int32_t compactIndex(int32_t fastILimit, UErrorCode &errorCode);
+    int32_t compactData(
+            int32_t fastILimit, uint32_t *newData, int32_t newDataCapacity,
+            int32_t dataNullIndex, MixedBlocks &mixedBlocks, UErrorCode &errorCode);
+    int32_t compactIndex(int32_t fastILimit, MixedBlocks &mixedBlocks, UErrorCode &errorCode);
     int32_t compactTrie(int32_t fastILimit, UErrorCode &errorCode);
 
     uint32_t *index = nullptr;
@@ -301,41 +304,56 @@
     uint32_t nullValue = initialValue;
     if (filter != nullptr) { nullValue = filter(context, nullValue); }
     UChar32 c = start;
-    uint32_t value;
+    uint32_t trieValue, value;
     bool haveValue = false;
     int32_t i = c >> UCPTRIE_SHIFT_3;
     do {
         if (flags[i] == ALL_SAME) {
-            uint32_t value2 = maybeFilterValue(index[i], initialValue, nullValue,
-                                               filter, context);
+            uint32_t trieValue2 = index[i];
             if (haveValue) {
-                if (value2 != value) {
-                    return c - 1;
+                if (trieValue2 != trieValue) {
+                    if (filter == nullptr ||
+                            maybeFilterValue(trieValue2, initialValue, nullValue,
+                                             filter, context) != value) {
+                        return c - 1;
+                    }
+                    trieValue = trieValue2;  // may or may not help
                 }
             } else {
-                value = value2;
+                trieValue = trieValue2;
+                value = maybeFilterValue(trieValue2, initialValue, nullValue, filter, context);
                 if (pValue != nullptr) { *pValue = value; }
                 haveValue = true;
             }
             c = (c + UCPTRIE_SMALL_DATA_BLOCK_LENGTH) & ~UCPTRIE_SMALL_DATA_MASK;
         } else /* MIXED */ {
             int32_t di = index[i] + (c & UCPTRIE_SMALL_DATA_MASK);
-            uint32_t value2 = maybeFilterValue(data[di], initialValue, nullValue,
-                                               filter, context);
+            uint32_t trieValue2 = data[di];
             if (haveValue) {
-                if (value2 != value) {
-                    return c - 1;
+                if (trieValue2 != trieValue) {
+                    if (filter == nullptr ||
+                            maybeFilterValue(trieValue2, initialValue, nullValue,
+                                             filter, context) != value) {
+                        return c - 1;
+                    }
+                    trieValue = trieValue2;  // may or may not help
                 }
             } else {
-                value = value2;
+                trieValue = trieValue2;
+                value = maybeFilterValue(trieValue2, initialValue, nullValue, filter, context);
                 if (pValue != nullptr) { *pValue = value; }
                 haveValue = true;
             }
             while ((++c & UCPTRIE_SMALL_DATA_MASK) != 0) {
-                if (maybeFilterValue(data[++di], initialValue, nullValue,
-                                     filter, context) != value) {
-                    return c - 1;
+                trieValue2 = data[++di];
+                if (trieValue2 != trieValue) {
+                    if (filter == nullptr ||
+                            maybeFilterValue(trieValue2, initialValue, nullValue,
+                                             filter, context) != value) {
+                        return c - 1;
+                    }
                 }
+                trieValue = trieValue2;  // may or may not help
             }
         }
         ++i;
@@ -548,28 +566,8 @@
     }
 }
 
-inline bool
-equalBlocks(const uint32_t *s, const uint32_t *t, int32_t length) {
-    while (length > 0 && *s == *t) {
-        ++s;
-        ++t;
-        --length;
-    }
-    return length == 0;
-}
-
-inline bool
-equalBlocks(const uint16_t *s, const uint32_t *t, int32_t length) {
-    while (length > 0 && *s == *t) {
-        ++s;
-        ++t;
-        --length;
-    }
-    return length == 0;
-}
-
-inline bool
-equalBlocks(const uint16_t *s, const uint16_t *t, int32_t length) {
+template<typename UIntA, typename UIntB>
+bool equalBlocks(const UIntA *s, const UIntB *t, int32_t length) {
     while (length > 0 && *s == *t) {
         ++s;
         ++t;
@@ -585,36 +583,6 @@
 }
 
 /** Search for an identical block. */
-int32_t findSameBlock(const uint32_t *p, int32_t pStart, int32_t length,
-                      const uint32_t *q, int32_t qStart, int32_t blockLength) {
-    // Ensure that we do not even partially get past length.
-    length -= blockLength;
-
-    q += qStart;
-    while (pStart <= length) {
-        if (equalBlocks(p + pStart, q, blockLength)) {
-            return pStart;
-        }
-        ++pStart;
-    }
-    return -1;
-}
-
-int32_t findSameBlock(const uint16_t *p, int32_t pStart, int32_t length,
-                      const uint32_t *q, int32_t qStart, int32_t blockLength) {
-    // Ensure that we do not even partially get past length.
-    length -= blockLength;
-
-    q += qStart;
-    while (pStart <= length) {
-        if (equalBlocks(p + pStart, q, blockLength)) {
-            return pStart;
-        }
-        ++pStart;
-    }
-    return -1;
-}
-
 int32_t findSameBlock(const uint16_t *p, int32_t pStart, int32_t length,
                       const uint16_t *q, int32_t qStart, int32_t blockLength) {
     // Ensure that we do not even partially get past length.
@@ -655,30 +623,9 @@
  * Look for maximum overlap of the beginning of the other block
  * with the previous, adjacent block.
  */
-int32_t getOverlap(const uint32_t *p, int32_t length,
-                   const uint32_t *q, int32_t qStart, int32_t blockLength) {
-    int32_t overlap = blockLength - 1;
-    U_ASSERT(overlap <= length);
-    q += qStart;
-    while (overlap > 0 && !equalBlocks(p + (length - overlap), q, overlap)) {
-        --overlap;
-    }
-    return overlap;
-}
-
-int32_t getOverlap(const uint16_t *p, int32_t length,
-                   const uint32_t *q, int32_t qStart, int32_t blockLength) {
-    int32_t overlap = blockLength - 1;
-    U_ASSERT(overlap <= length);
-    q += qStart;
-    while (overlap > 0 && !equalBlocks(p + (length - overlap), q, overlap)) {
-        --overlap;
-    }
-    return overlap;
-}
-
-int32_t getOverlap(const uint16_t *p, int32_t length,
-                   const uint16_t *q, int32_t qStart, int32_t blockLength) {
+template<typename UIntA, typename UIntB>
+int32_t getOverlap(const UIntA *p, int32_t length,
+                   const UIntB *q, int32_t qStart, int32_t blockLength) {
     int32_t overlap = blockLength - 1;
     U_ASSERT(overlap <= length);
     q += qStart;
@@ -807,6 +754,171 @@
     int32_t refCounts[CAPACITY];
 };
 
+// Custom hash table for mixed-value blocks to be found anywhere in the
+// compacted data or index so far.
+class MixedBlocks {
+public:
+    MixedBlocks() {}
+    ~MixedBlocks() {
+        uprv_free(table);
+    }
+
+    bool init(int32_t maxLength, int32_t newBlockLength) {
+        // We store actual data indexes + 1 to reserve 0 for empty entries.
+        int32_t maxDataIndex = maxLength - newBlockLength + 1;
+        int32_t newLength;
+        if (maxDataIndex <= 0xfff) {  // 4k
+            newLength = 6007;
+            shift = 12;
+            mask = 0xfff;
+        } else if (maxDataIndex <= 0x7fff) {  // 32k
+            newLength = 50021;
+            shift = 15;
+            mask = 0x7fff;
+        } else if (maxDataIndex <= 0x1ffff) {  // 128k
+            newLength = 200003;
+            shift = 17;
+            mask = 0x1ffff;
+        } else {
+            // maxDataIndex up to around MAX_DATA_LENGTH, ca. 1.1M
+            newLength = 1500007;
+            shift = 21;
+            mask = 0x1fffff;
+        }
+        if (newLength > capacity) {
+            uprv_free(table);
+            table = (uint32_t *)uprv_malloc(newLength * 4);
+            if (table == nullptr) {
+                return false;
+            }
+            capacity = newLength;
+        }
+        length = newLength;
+        uprv_memset(table, 0, length * 4);
+
+        blockLength = newBlockLength;
+        return true;
+    }
+
+    template<typename UInt>
+    void extend(const UInt *data, int32_t minStart, int32_t prevDataLength, int32_t newDataLength) {
+        int32_t start = prevDataLength - blockLength;
+        if (start >= minStart) {
+            ++start;  // Skip the last block that we added last time.
+        } else {
+            start = minStart;  // Begin with the first full block.
+        }
+        for (int32_t end = newDataLength - blockLength; start <= end; ++start) {
+            uint32_t hashCode = makeHashCode(data, start);
+            addEntry(data, start, hashCode, start);
+        }
+    }
+
+    template<typename UIntA, typename UIntB>
+    int32_t findBlock(const UIntA *data, const UIntB *blockData, int32_t blockStart) const {
+        uint32_t hashCode = makeHashCode(blockData, blockStart);
+        int32_t entryIndex = findEntry(data, blockData, blockStart, hashCode);
+        if (entryIndex >= 0) {
+            return (table[entryIndex] & mask) - 1;
+        } else {
+            return -1;
+        }
+    }
+
+    int32_t findAllSameBlock(const uint32_t *data, uint32_t blockValue) const {
+        uint32_t hashCode = makeHashCode(blockValue);
+        int32_t entryIndex = findEntry(data, blockValue, hashCode);
+        if (entryIndex >= 0) {
+            return (table[entryIndex] & mask) - 1;
+        } else {
+            return -1;
+        }
+    }
+
+private:
+    template<typename UInt>
+    uint32_t makeHashCode(const UInt *blockData, int32_t blockStart) const {
+        int32_t blockLimit = blockStart + blockLength;
+        uint32_t hashCode = blockData[blockStart++];
+        do {
+            hashCode = 37 * hashCode + blockData[blockStart++];
+        } while (blockStart < blockLimit);
+        return hashCode;
+    }
+
+    uint32_t makeHashCode(uint32_t blockValue) const {
+        uint32_t hashCode = blockValue;
+        for (int32_t i = 1; i < blockLength; ++i) {
+            hashCode = 37 * hashCode + blockValue;
+        }
+        return hashCode;
+    }
+
+    template<typename UInt>
+    void addEntry(const UInt *data, int32_t blockStart, uint32_t hashCode, int32_t dataIndex) {
+        U_ASSERT(0 <= dataIndex && dataIndex < (int32_t)mask);
+        int32_t entryIndex = findEntry(data, data, blockStart, hashCode);
+        if (entryIndex < 0) {
+            table[~entryIndex] = (hashCode << shift) | (dataIndex + 1);
+        }
+    }
+
+    template<typename UIntA, typename UIntB>
+    int32_t findEntry(const UIntA *data, const UIntB *blockData, int32_t blockStart,
+                      uint32_t hashCode) const {
+        uint32_t shiftedHashCode = hashCode << shift;
+        int32_t initialEntryIndex = (hashCode % (length - 1)) + 1;  // 1..length-1
+        for (int32_t entryIndex = initialEntryIndex;;) {
+            uint32_t entry = table[entryIndex];
+            if (entry == 0) {
+                return ~entryIndex;
+            }
+            if ((entry & ~mask) == shiftedHashCode) {
+                int32_t dataIndex = (entry & mask) - 1;
+                if (equalBlocks(data + dataIndex, blockData + blockStart, blockLength)) {
+                    return entryIndex;
+                }
+            }
+            entryIndex = nextIndex(initialEntryIndex, entryIndex);
+        }
+    }
+
+    int32_t findEntry(const uint32_t *data, uint32_t blockValue, uint32_t hashCode) const {
+        uint32_t shiftedHashCode = hashCode << shift;
+        int32_t initialEntryIndex = (hashCode % (length - 1)) + 1;  // 1..length-1
+        for (int32_t entryIndex = initialEntryIndex;;) {
+            uint32_t entry = table[entryIndex];
+            if (entry == 0) {
+                return ~entryIndex;
+            }
+            if ((entry & ~mask) == shiftedHashCode) {
+                int32_t dataIndex = (entry & mask) - 1;
+                if (allValuesSameAs(data + dataIndex, blockLength, blockValue)) {
+                    return entryIndex;
+                }
+            }
+            entryIndex = nextIndex(initialEntryIndex, entryIndex);
+        }
+    }
+
+    inline int32_t nextIndex(int32_t initialEntryIndex, int32_t entryIndex) const {
+        // U_ASSERT(0 < initialEntryIndex && initialEntryIndex < length);
+        return (entryIndex + initialEntryIndex) % length;
+    }
+
+    // Hash table.
+    // The length is a prime number, larger than the maximum data length.
+    // The "shift" lower bits store a data index + 1.
+    // The remaining upper bits store a partial hashCode of the block data values.
+    uint32_t *table = nullptr;
+    int32_t capacity = 0;
+    int32_t length = 0;
+    int32_t shift = 0;
+    uint32_t mask = 0;
+
+    int32_t blockLength = 0;
+};
+
 int32_t MutableCodePointTrie::compactWholeDataBlocks(int32_t fastILimit, AllSameBlocks &allSameBlocks) {
 #ifdef UCPTRIE_DEBUG
     bool overflow = false;
@@ -962,8 +1074,9 @@
  *
  * It does not try to find an optimal order of writing, deduplicating, and overlapping blocks.
  */
-int32_t MutableCodePointTrie::compactData(int32_t fastILimit,
-                                          uint32_t *newData, int32_t dataNullIndex) {
+int32_t MutableCodePointTrie::compactData(
+        int32_t fastILimit, uint32_t *newData, int32_t newDataCapacity,
+        int32_t dataNullIndex, MixedBlocks &mixedBlocks, UErrorCode &errorCode) {
 #ifdef UCPTRIE_DEBUG
     int32_t countSame=0, sumOverlaps=0;
     bool printData = dataLength == 29088 /* line.brk */ ||
@@ -983,8 +1096,14 @@
 #endif
     }
 
-    int32_t iLimit = highStart >> UCPTRIE_SHIFT_3;
     int32_t blockLength = UCPTRIE_FAST_DATA_BLOCK_LENGTH;
+    if (!mixedBlocks.init(newDataCapacity, blockLength)) {
+        errorCode = U_MEMORY_ALLOCATION_ERROR;
+        return 0;
+    }
+    mixedBlocks.extend(newData, 0, 0, newDataLength);
+
+    int32_t iLimit = highStart >> UCPTRIE_SHIFT_3;
     int32_t inc = SMALL_DATA_BLOCKS_PER_BMP_BLOCK;
     int32_t fastLength = 0;
     for (int32_t i = ASCII_I_LIMIT; i < iLimit; i += inc) {
@@ -992,12 +1111,17 @@
             blockLength = UCPTRIE_SMALL_DATA_BLOCK_LENGTH;
             inc = 1;
             fastLength = newDataLength;
+            if (!mixedBlocks.init(newDataCapacity, blockLength)) {
+                errorCode = U_MEMORY_ALLOCATION_ERROR;
+                return 0;
+            }
+            mixedBlocks.extend(newData, 0, 0, newDataLength);
         }
         if (flags[i] == ALL_SAME) {
             uint32_t value = index[i];
-            int32_t n;
             // Find an earlier part of the data array of length blockLength
             // that is filled with this value.
+            int32_t n = mixedBlocks.findAllSameBlock(newData, value);
             // If we find a match, and the current block is the data null block,
             // and it is not a fast block but matches the start of a fast block,
             // then we need to continue looking.
@@ -1005,12 +1129,10 @@
             // and not all of the rest of the fast block is filled with this value.
             // Otherwise trie.getRange() would detect that the fast block starts at
             // dataNullOffset and assume incorrectly that it is filled with the null value.
-            for (int32_t start = 0;
-                    (n = findAllSameBlock(newData, start, newDataLength,
-                                value, blockLength)) >= 0 &&
-                            i == dataNullIndex && i >= fastILimit && n < fastLength &&
-                            isStartOfSomeFastBlock(n, index, fastILimit);
-                    start = n + 1) {}
+            while (n >= 0 && i == dataNullIndex && i >= fastILimit && n < fastLength &&
+                    isStartOfSomeFastBlock(n, index, fastILimit)) {
+                n = findAllSameBlock(newData, n + 1, newDataLength, value, blockLength);
+            }
             if (n >= 0) {
                 DEBUG_DO(++countSame);
                 index[i] = n;
@@ -1023,14 +1145,16 @@
                 }
 #endif
                 index[i] = newDataLength - n;
+                int32_t prevDataLength = newDataLength;
                 while (n < blockLength) {
                     newData[newDataLength++] = value;
                     ++n;
                 }
+                mixedBlocks.extend(newData, 0, prevDataLength, newDataLength);
             }
         } else if (flags[i] == MIXED) {
             const uint32_t *block = data + index[i];
-            int32_t n = findSameBlock(newData, 0, newDataLength, block, 0, blockLength);
+            int32_t n = mixedBlocks.findBlock(newData, block, 0);
             if (n >= 0) {
                 DEBUG_DO(++countSame);
                 index[i] = n;
@@ -1043,9 +1167,11 @@
                 }
 #endif
                 index[i] = newDataLength - n;
+                int32_t prevDataLength = newDataLength;
                 while (n < blockLength) {
                     newData[newDataLength++] = block[n++];
                 }
+                mixedBlocks.extend(newData, 0, prevDataLength, newDataLength);
             }
         } else /* SAME_AS */ {
             uint32_t j = index[i];
@@ -1061,7 +1187,8 @@
     return newDataLength;
 }
 
-int32_t MutableCodePointTrie::compactIndex(int32_t fastILimit, UErrorCode &errorCode) {
+int32_t MutableCodePointTrie::compactIndex(int32_t fastILimit, MixedBlocks &mixedBlocks,
+                                           UErrorCode &errorCode) {
     int32_t fastIndexLength = fastILimit >> (UCPTRIE_FAST_SHIFT - UCPTRIE_SHIFT_3);
     if ((highStart >> UCPTRIE_FAST_SHIFT) <= fastIndexLength) {
         // Only the linear fast index, no multi-stage index tables.
@@ -1095,6 +1222,12 @@
         }
     }
 
+    if (!mixedBlocks.init(fastIndexLength, UCPTRIE_INDEX_3_BLOCK_LENGTH)) {
+        errorCode = U_MEMORY_ALLOCATION_ERROR;
+        return 0;
+    }
+    mixedBlocks.extend(fastIndex, 0, 0, fastIndexLength);
+
     // Examine index-3 blocks. For each determine one of:
     // - same as the index-3 null block
     // - same as a fast-index block
@@ -1105,6 +1238,7 @@
     // Also determine an upper limit for the index-3 table length.
     int32_t index3Capacity = 0;
     i3FirstNull = index3NullOffset;
+    bool hasLongI3Blocks = false;
     // If the fast index covers the whole BMP, then
     // the multi-stage index is only for supplementary code points.
     // Otherwise, the multi-stage index covers all of Unicode.
@@ -1129,13 +1263,13 @@
                     index3Capacity += UCPTRIE_INDEX_3_BLOCK_LENGTH;
                 } else {
                     index3Capacity += INDEX_3_18BIT_BLOCK_LENGTH;
+                    hasLongI3Blocks = true;
                 }
                 i3FirstNull = 0;
             }
         } else {
             if (oredI3 <= 0xffff) {
-                int32_t n = findSameBlock(fastIndex, 0, fastIndexLength,
-                                          index, i, UCPTRIE_INDEX_3_BLOCK_LENGTH);
+                int32_t n = mixedBlocks.findBlock(fastIndex, index, i);
                 if (n >= 0) {
                     flags[i] = I3_BMP;
                     index[i] = n;
@@ -1146,6 +1280,7 @@
             } else {
                 flags[i] = I3_18;
                 index3Capacity += INDEX_3_18BIT_BLOCK_LENGTH;
+                hasLongI3Blocks = true;
             }
         }
         i = j;
@@ -1166,6 +1301,18 @@
     }
     uprv_memcpy(index16, fastIndex, fastIndexLength * 2);
 
+    if (!mixedBlocks.init(index16Capacity, UCPTRIE_INDEX_3_BLOCK_LENGTH)) {
+        errorCode = U_MEMORY_ALLOCATION_ERROR;
+        return 0;
+    }
+    MixedBlocks longI3Blocks;
+    if (hasLongI3Blocks) {
+        if (!longI3Blocks.init(index16Capacity, INDEX_3_18BIT_BLOCK_LENGTH)) {
+            errorCode = U_MEMORY_ALLOCATION_ERROR;
+            return 0;
+        }
+    }
+
     // Compact the index-3 table and write an uncompacted version of the index-2 table.
     uint16_t index2[UNICODE_LIMIT >> UCPTRIE_SHIFT_2];  // index2Capacity
     int32_t i2Length = 0;
@@ -1185,8 +1332,7 @@
         } else if (f == I3_BMP) {
             i3 = index[i];
         } else if (f == I3_16) {
-            int32_t n = findSameBlock(index16, index3Start, indexLength,
-                                      index, i, UCPTRIE_INDEX_3_BLOCK_LENGTH);
+            int32_t n = mixedBlocks.findBlock(index16, index, i);
             if (n >= 0) {
                 i3 = n;
             } else {
@@ -1198,12 +1344,18 @@
                                    index, i, UCPTRIE_INDEX_3_BLOCK_LENGTH);
                 }
                 i3 = indexLength - n;
+                int32_t prevIndexLength = indexLength;
                 while (n < UCPTRIE_INDEX_3_BLOCK_LENGTH) {
                     index16[indexLength++] = index[i + n++];
                 }
+                mixedBlocks.extend(index16, index3Start, prevIndexLength, indexLength);
+                if (hasLongI3Blocks) {
+                    longI3Blocks.extend(index16, index3Start, prevIndexLength, indexLength);
+                }
             }
         } else {
             U_ASSERT(f == I3_18);
+            U_ASSERT(hasLongI3Blocks);
             // Encode an index-3 block that contains one or more data indexes exceeding 16 bits.
             int32_t j = i;
             int32_t jLimit = i + UCPTRIE_INDEX_3_BLOCK_LENGTH;
@@ -1236,8 +1388,7 @@
                 index16[k++] = v;
                 index16[k - 9] = upperBits;
             } while (j < jLimit);
-            int32_t n = findSameBlock(index16, index3Start, indexLength,
-                                      index16, indexLength, INDEX_3_18BIT_BLOCK_LENGTH);
+            int32_t n = longI3Blocks.findBlock(index16, index16, indexLength);
             if (n >= 0) {
                 i3 = n | 0x8000;
             } else {
@@ -1249,6 +1400,7 @@
                                    index16, indexLength, INDEX_3_18BIT_BLOCK_LENGTH);
                 }
                 i3 = (indexLength - n) | 0x8000;
+                int32_t prevIndexLength = indexLength;
                 if (n > 0) {
                     int32_t start = indexLength;
                     while (n < INDEX_3_18BIT_BLOCK_LENGTH) {
@@ -1257,6 +1409,10 @@
                 } else {
                     indexLength += INDEX_3_18BIT_BLOCK_LENGTH;
                 }
+                mixedBlocks.extend(index16, index3Start, prevIndexLength, indexLength);
+                if (hasLongI3Blocks) {
+                    longI3Blocks.extend(index16, index3Start, prevIndexLength, indexLength);
+                }
             }
         }
         if (index3NullOffset < 0 && i3FirstNull >= 0) {
@@ -1279,16 +1435,23 @@
     }
 
     // Compact the index-2 table and write the index-1 table.
+    static_assert(UCPTRIE_INDEX_2_BLOCK_LENGTH == UCPTRIE_INDEX_3_BLOCK_LENGTH,
+                  "must re-init mixedBlocks");
     int32_t blockLength = UCPTRIE_INDEX_2_BLOCK_LENGTH;
     int32_t i1 = fastIndexLength;
     for (int32_t i = 0; i < i2Length; i += blockLength) {
-        if ((i2Length - i) < blockLength) {
+        int32_t n;
+        if ((i2Length - i) >= blockLength) {
+            // normal block
+            U_ASSERT(blockLength == UCPTRIE_INDEX_2_BLOCK_LENGTH);
+            n = mixedBlocks.findBlock(index16, index2, i);
+        } else {
             // highStart is inside the last index-2 block. Shorten it.
             blockLength = i2Length - i;
+            n = findSameBlock(index16, index3Start, indexLength,
+                              index2, i, blockLength);
         }
         int32_t i2;
-        int32_t n = findSameBlock(index16, index3Start, indexLength,
-                                  index2, i, blockLength);
         if (n >= 0) {
             i2 = n;
         } else {
@@ -1299,9 +1462,11 @@
                 n = getOverlap(index16, indexLength, index2, i, blockLength);
             }
             i2 = indexLength - n;
+            int32_t prevIndexLength = indexLength;
             while (n < blockLength) {
                 index16[indexLength++] = index2[i + n++];
             }
+            mixedBlocks.extend(index16, index3Start, prevIndexLength, indexLength);
         }
         // Set the index-1 table entry.
         index16[i1++] = i2;
@@ -1369,7 +1534,11 @@
     uprv_memcpy(newData, asciiData, sizeof(asciiData));
 
     int32_t dataNullIndex = allSameBlocks.findMostUsed();
-    int32_t newDataLength = compactData(fastILimit, newData, dataNullIndex);
+
+    MixedBlocks mixedBlocks;
+    int32_t newDataLength = compactData(fastILimit, newData, newDataCapacity,
+                                        dataNullIndex, mixedBlocks, errorCode);
+    if (U_FAILURE(errorCode)) { return 0; }
     U_ASSERT(newDataLength <= newDataCapacity);
     uprv_free(data);
     data = newData;
@@ -1394,7 +1563,7 @@
         dataNullOffset = UCPTRIE_NO_DATA_NULL_OFFSET;
     }
 
-    int32_t indexLength = compactIndex(fastILimit, errorCode);
+    int32_t indexLength = compactIndex(fastILimit, mixedBlocks, errorCode);
     highStart = realHighStart;
     return indexLength;
 }
diff -Nru icu-63.1/source/common/umutex.h icu-63.2/source/common/umutex.h
--- icu-63.1/source/common/umutex.h	2018-10-01 22:39:56.000000000 +0000
+++ icu-63.2/source/common/umutex.h	2019-04-11 22:38:30.000000000 +0000
@@ -54,15 +54,23 @@
 
 #include <atomic>
 
-U_NAMESPACE_BEGIN
-
 // Export an explicit template instantiation of std::atomic<int32_t>. 
 // When building DLLs for Windows this is required as it is used as a data member of the exported SharedObject class.
 // See digitlst.h, pluralaffix.h, datefmt.h, and others for similar examples.
-#if U_PF_WINDOWS <= U_PLATFORM && U_PLATFORM <= U_PF_CYGWIN
+#if U_PF_WINDOWS <= U_PLATFORM && U_PLATFORM <= U_PF_CYGWIN && !defined(U_IN_DOXYGEN)
+  #if defined(__clang__)
+  // Suppress the warning that the explicit instantiation after explicit specialization has no effect.
+  #pragma clang diagnostic push
+  #pragma clang diagnostic ignored "-Winstantiation-after-specialization"
+  #endif
 template struct U_COMMON_API std::atomic<int32_t>;
+  #if defined(__clang__)
+  #pragma clang diagnostic pop
+  #endif
 #endif
 
+U_NAMESPACE_BEGIN
+
 typedef std::atomic<int32_t> u_atomic_int32_t;
 #define ATOMIC_INT32_T_INITIALIZER(val) ATOMIC_VAR_INIT(val)
 
diff -Nru icu-63.1/source/common/unicode/uniset.h icu-63.2/source/common/unicode/uniset.h
--- icu-63.1/source/common/unicode/uniset.h	2018-10-01 22:39:56.000000000 +0000
+++ icu-63.2/source/common/unicode/uniset.h	2019-04-11 22:38:30.000000000 +0000
@@ -27,7 +27,6 @@
 
 // Forward Declarations.
 class BMPSet;
-class CharacterProperties;
 class ParsePosition;
 class RBBIRuleScanner;
 class SymbolTable;
@@ -276,14 +275,23 @@
  * @stable ICU 2.0
  */
 class U_COMMON_API UnicodeSet U_FINAL : public UnicodeFilter {
-
-    int32_t len; // length of list used; 0 <= len <= capacity
-    int32_t capacity; // capacity of list
-    UChar32* list; // MUST be terminated with HIGH
-    BMPSet *bmpSet; // The set is frozen iff either bmpSet or stringSpan is not NULL.
-    UChar32* buffer; // internal buffer, may be NULL
-    int32_t bufferCapacity; // capacity of buffer
-    int32_t patLen;
+private:
+    /**
+     * Enough for sets with few ranges.
+     * For example, White_Space has 10 ranges, list length 21.
+     */
+    static constexpr int32_t INITIAL_CAPACITY = 25;
+    // fFlags constant
+    static constexpr uint8_t kIsBogus = 1;  // This set is bogus (i.e. not valid)
+
+    UChar32* list = stackList; // MUST be terminated with HIGH
+    int32_t capacity = INITIAL_CAPACITY; // capacity of list
+    int32_t len = 1; // length of list used; 1 <= len <= capacity
+    uint8_t fFlags = 0;         // Bit flag (see constants above)
+
+    BMPSet *bmpSet = nullptr; // The set is frozen iff either bmpSet or stringSpan is not NULL.
+    UChar32* buffer = nullptr; // internal buffer, may be NULL
+    int32_t bufferCapacity = 0; // capacity of buffer
 
     /**
      * The pattern representation of this set.  This may not be the
@@ -294,15 +302,19 @@
      * indicating that toPattern() must generate a pattern
      * representation from the inversion list.
      */
-    char16_t *pat;
-    UVector* strings; // maintained in sorted order
-    UnicodeSetStringSpan *stringSpan;
+    char16_t *pat = nullptr;
+    int32_t patLen = 0;
+
+    UVector* strings = nullptr; // maintained in sorted order
+    UnicodeSetStringSpan *stringSpan = nullptr;
+
+    /**
+     * Initial list array.
+     * Avoids some heap allocations, and list is never nullptr.
+     * Increases the object size a bit.
+     */
+    UChar32 stackList[INITIAL_CAPACITY];
 
-private:
-    enum { // constants
-        kIsBogus = 1       // This set is bogus (i.e. not valid)
-    };
-    uint8_t fFlags;         // Bit flag (see constants above)
 public:
     /**
      * Determine if this object contains a valid set.
@@ -1480,8 +1492,6 @@
 
     friend class USetAccess;
 
-    int32_t getStringCount() const;
-
     const UnicodeString* getString(int32_t index) const;
 
     //----------------------------------------------------------------
@@ -1528,13 +1538,18 @@
     // Implementation: Utility methods
     //----------------------------------------------------------------
 
-    void ensureCapacity(int32_t newLen, UErrorCode& ec);
+    static int32_t nextCapacity(int32_t minCapacity);
+
+    bool ensureCapacity(int32_t newLen);
 
-    void ensureBufferCapacity(int32_t newLen, UErrorCode& ec);
+    bool ensureBufferCapacity(int32_t newLen);
 
     void swapBuffers(void);
 
     UBool allocateStrings(UErrorCode &status);
+    UBool hasStrings() const;
+    int32_t stringsSize() const;
+    UBool stringsContains(const UnicodeString &s) const;
 
     UnicodeString& _toPattern(UnicodeString& result,
                               UBool escapeUnprintable) const;
@@ -1614,7 +1629,6 @@
                               UnicodeString& rebuiltPat,
                               UErrorCode& ec);
 
-    friend class CharacterProperties;
     static const UnicodeSet* getInclusions(int32_t src, UErrorCode &status);
 
     /**
@@ -1646,7 +1660,10 @@
     /**
      * Set the new pattern to cache.
      */
-    void setPattern(const UnicodeString& newPat);
+    void setPattern(const UnicodeString& newPat) {
+        setPattern(newPat.getBuffer(), newPat.length());
+    }
+    void setPattern(const char16_t *newPat, int32_t newPatLen);
     /**
      * Release existing cached pattern.
      */
diff -Nru icu-63.1/source/common/unicode/urename.h icu-63.2/source/common/unicode/urename.h
--- icu-63.1/source/common/unicode/urename.h	2018-10-15 18:02:37.000000000 +0000
+++ icu-63.2/source/common/unicode/urename.h	2019-04-11 22:38:30.000000000 +0000
@@ -110,7 +110,6 @@
 #define _UTF7Data U_ICU_ENTRY_POINT_RENAME(_UTF7Data)
 #define _UTF8Data U_ICU_ENTRY_POINT_RENAME(_UTF8Data)
 #define allowedHourFormatsCleanup U_ICU_ENTRY_POINT_RENAME(allowedHourFormatsCleanup)
-#define checkImpl U_ICU_ENTRY_POINT_RENAME(checkImpl)
 #define cmemory_cleanup U_ICU_ENTRY_POINT_RENAME(cmemory_cleanup)
 #define dayPeriodRulesCleanup U_ICU_ENTRY_POINT_RENAME(dayPeriodRulesCleanup)
 #define deleteAllowedHourFormats U_ICU_ENTRY_POINT_RENAME(deleteAllowedHourFormats)
diff -Nru icu-63.1/source/common/unicode/uvernum.h icu-63.2/source/common/unicode/uvernum.h
--- icu-63.1/source/common/unicode/uvernum.h	2018-10-01 22:39:56.000000000 +0000
+++ icu-63.2/source/common/unicode/uvernum.h	2019-04-11 22:38:30.000000000 +0000
@@ -66,7 +66,7 @@
  *  This value will change in the subsequent releases of ICU
  *  @stable ICU 2.6
  */
-#define U_ICU_VERSION_MINOR_NUM 1
+#define U_ICU_VERSION_MINOR_NUM 2
 
 /** The current ICU patchlevel version as an integer.
  *  This value will change in the subsequent releases of ICU
@@ -121,7 +121,7 @@
  *  This value will change in the subsequent releases of ICU
  *  @stable ICU 2.4
  */
-#define U_ICU_VERSION "63.1"
+#define U_ICU_VERSION "63.2"
 
 /**
  * The current ICU library major version number as a string, for library name suffixes.
@@ -140,7 +140,7 @@
 /** Data version in ICU4C.
  * @internal ICU 4.4 Internal Use Only
  **/
-#define U_ICU_DATA_VERSION "63.1"
+#define U_ICU_DATA_VERSION "63.2"
 #endif  /* U_HIDE_INTERNAL_API */
 
 /*===========================================================================
diff -Nru icu-63.1/source/common/uniset.cpp icu-63.2/source/common/uniset.cpp
--- icu-63.1/source/common/uniset.cpp	2018-10-01 22:39:56.000000000 +0000
+++ icu-63.2/source/common/uniset.cpp	2019-04-11 22:38:30.000000000 +0000
@@ -14,6 +14,7 @@
 #include "unicode/parsepos.h"
 #include "unicode/symtable.h"
 #include "unicode/uniset.h"
+#include "unicode/ustring.h"
 #include "unicode/utf8.h"
 #include "unicode/utf16.h"
 #include "ruleiter.h"
@@ -53,11 +54,8 @@
 // LOW <= all valid values. ZERO for codepoints
 #define UNICODESET_LOW 0x000000
 
-// initial storage. Must be >= 0
-#define START_EXTRA 16
-
-// extra amount for growth. Must be >= 0
-#define GROW_EXTRA START_EXTRA
+/** Max list [0, 1, 2, ..., max code point, HIGH] */
+constexpr int32_t MAX_LENGTH = UNICODESET_HIGH + 1;
 
 U_NAMESPACE_BEGIN
 
@@ -137,6 +135,18 @@
     return a.compare(b);
 }
 
+UBool UnicodeSet::hasStrings() const {
+    return strings != nullptr && !strings->isEmpty();
+}
+
+int32_t UnicodeSet::stringsSize() const {
+    return strings == nullptr ? 0 : strings->size();
+}
+
+UBool UnicodeSet::stringsContains(const UnicodeString &s) const {
+    return strings != nullptr && strings->contains((void*) &s);
+}
+
 //----------------------------------------------------------------
 // Constructors &c
 //----------------------------------------------------------------
@@ -144,24 +154,8 @@
 /**
  * Constructs an empty set.
  */
-UnicodeSet::UnicodeSet() :
-    len(1), capacity(1 + START_EXTRA), list(0), bmpSet(0), buffer(0),
-    bufferCapacity(0), patLen(0), pat(NULL), strings(NULL), stringSpan(NULL),
-    fFlags(0)
-{
-    UErrorCode status = U_ZERO_ERROR;
-    allocateStrings(status);
-    if (U_FAILURE(status)) {
-        setToBogus(); // If memory allocation failed, set to bogus state.
-        return;
-    }
-    list = (UChar32*) uprv_malloc(sizeof(UChar32) * capacity);
-    if(list!=NULL){
-        list[0] = UNICODESET_HIGH;
-    } else { // If memory allocation failed, set to bogus state.
-        setToBogus();
-        return;
-    }
+UnicodeSet::UnicodeSet() {
+    list[0] = UNICODESET_HIGH;
     _dbgct(this);
 }
 
@@ -172,89 +166,39 @@
  * @param start first character, inclusive, of range
  * @param end last character, inclusive, of range
  */
-UnicodeSet::UnicodeSet(UChar32 start, UChar32 end) :
-    len(1), capacity(1 + START_EXTRA), list(0), bmpSet(0), buffer(0),
-    bufferCapacity(0), patLen(0), pat(NULL), strings(NULL), stringSpan(NULL),
-    fFlags(0)
-{
-    UErrorCode status = U_ZERO_ERROR;
-    allocateStrings(status);
-    if (U_FAILURE(status)) {
-        setToBogus(); // If memory allocation failed, set to bogus state.
-        return;
-    }
-    list = (UChar32*) uprv_malloc(sizeof(UChar32) * capacity);
-    if(list!=NULL){
-        list[0] = UNICODESET_HIGH;
-        complement(start, end);
-    } else { // If memory allocation failed, set to bogus state.
-        setToBogus();
-        return;
-    }
+UnicodeSet::UnicodeSet(UChar32 start, UChar32 end) {
+    list[0] = UNICODESET_HIGH;
+    add(start, end);
     _dbgct(this);
 }
 
 /**
  * Constructs a set that is identical to the given UnicodeSet.
  */
-UnicodeSet::UnicodeSet(const UnicodeSet& o) :
-    UnicodeFilter(o),
-    len(0), capacity(o.isFrozen() ? o.len : o.len + GROW_EXTRA), list(0),
-    bmpSet(0),
-    buffer(0), bufferCapacity(0),
-    patLen(0), pat(NULL), strings(NULL), stringSpan(NULL),
-    fFlags(0)
-{
-    UErrorCode status = U_ZERO_ERROR;
-    allocateStrings(status);
-    if (U_FAILURE(status)) {
-        setToBogus(); // If memory allocation failed, set to bogus state.
-        return;
-    }
-    list = (UChar32*) uprv_malloc(sizeof(UChar32) * capacity);
-    if(list!=NULL){
-        *this = o;
-    } else { // If memory allocation failed, set to bogus state.
-        setToBogus();
-        return;
-    }
+UnicodeSet::UnicodeSet(const UnicodeSet& o) : UnicodeFilter(o) {
+    *this = o;
     _dbgct(this);
 }
 
 // Copy-construct as thawed.
-UnicodeSet::UnicodeSet(const UnicodeSet& o, UBool /* asThawed */) :
-    UnicodeFilter(o),
-    len(0), capacity(o.len + GROW_EXTRA), list(0),
-    bmpSet(0),
-    buffer(0), bufferCapacity(0),
-    patLen(0), pat(NULL), strings(NULL), stringSpan(NULL),
-    fFlags(0)
-{
-    UErrorCode status = U_ZERO_ERROR;
-    allocateStrings(status);
-    if (U_FAILURE(status)) {
-        setToBogus(); // If memory allocation failed, set to bogus state.
-        return;
-    }
-    list = (UChar32*) uprv_malloc(sizeof(UChar32) * capacity);
-    if(list!=NULL){
+UnicodeSet::UnicodeSet(const UnicodeSet& o, UBool /* asThawed */) : UnicodeFilter(o) {
+    if (ensureCapacity(o.len)) {
         // *this = o except for bmpSet and stringSpan
         len = o.len;
         uprv_memcpy(list, o.list, (size_t)len*sizeof(UChar32));
-        if (strings != NULL && o.strings != NULL) {
-            strings->assign(*o.strings, cloneUnicodeString, status);
-        } else { // Invalid strings.
-            setToBogus();
-            return;
+        if (o.hasStrings()) {
+            UErrorCode status = U_ZERO_ERROR;
+            if (!allocateStrings(status) ||
+                    (strings->assign(*o.strings, cloneUnicodeString, status), U_FAILURE(status))) {
+                setToBogus();
+                return;
+            }
         }
         if (o.pat) {
-            setPattern(UnicodeString(o.pat, o.patLen));
+            setPattern(o.pat, o.patLen);
         }
-    } else { // If memory allocation failed, set to bogus state.
-        setToBogus();
-        return;
+        _dbgct(this);
     }
-    _dbgct(this);
 }
 
 /**
@@ -262,9 +206,11 @@
  */
 UnicodeSet::~UnicodeSet() {
     _dbgdt(this); // first!
-    uprv_free(list);
+    if (list != stackList) {
+        uprv_free(list);
+    }
     delete bmpSet;
-    if (buffer) {
+    if (buffer != stackList) {
         uprv_free(buffer);
     }
     delete strings;
@@ -290,32 +236,30 @@
         setToBogus();
         return *this;
     }
-    UErrorCode ec = U_ZERO_ERROR;
-    ensureCapacity(o.len, ec);
-    if (U_FAILURE(ec)) {
+    if (!ensureCapacity(o.len)) {
         // ensureCapacity will mark the UnicodeSet as Bogus if OOM failure happens.
         return *this;
     }
     len = o.len;
     uprv_memcpy(list, o.list, (size_t)len*sizeof(UChar32));
-    if (o.bmpSet == NULL || asThawed) {
-        bmpSet = NULL;
-    } else {
+    if (o.bmpSet != nullptr && !asThawed) {
         bmpSet = new BMPSet(*o.bmpSet, list, len);
         if (bmpSet == NULL) { // Check for memory allocation error.
             setToBogus();
             return *this;
         }
     }
-    if (strings != NULL && o.strings != NULL) {
-        strings->assign(*o.strings, cloneUnicodeString, ec);
-    } else { // Invalid strings.
-        setToBogus();
-        return *this;
+    if (o.hasStrings()) {
+        UErrorCode status = U_ZERO_ERROR;
+        if ((strings == nullptr && !allocateStrings(status)) ||
+                (strings->assign(*o.strings, cloneUnicodeString, status), U_FAILURE(status))) {
+            setToBogus();
+            return *this;
+        }
+    } else if (hasStrings()) {
+        strings->removeAllElements();
     }
-    if (o.stringSpan == NULL || asThawed) {
-        stringSpan = NULL;
-    } else {
+    if (o.stringSpan != nullptr && !asThawed) {
         stringSpan = new UnicodeSetStringSpan(*o.stringSpan, *strings);
         if (stringSpan == NULL) { // Check for memory allocation error.
             setToBogus();
@@ -324,7 +268,7 @@
     }
     releasePattern();
     if (o.pat) {
-        setPattern(UnicodeString(o.pat, o.patLen));
+        setPattern(o.pat, o.patLen);
     }
     return *this;
 }
@@ -357,7 +301,8 @@
     for (int32_t i = 0; i < len; ++i) {
         if (list[i] != o.list[i]) return FALSE;
     }
-    if (*strings != *o.strings) return FALSE;
+    if (hasStrings() != o.hasStrings()) { return FALSE; }
+    if (hasStrings() && *strings != *o.strings) return FALSE;
     return TRUE;
 }
 
@@ -393,7 +338,7 @@
     for (int32_t i = 0; i < count; ++i) {
         n += getRangeEnd(i) - getRangeStart(i) + 1;
     }
-    return n + strings->size();
+    return n + stringsSize();
 }
 
 /**
@@ -402,7 +347,7 @@
  * @return <tt>true</tt> if this set contains no elements.
  */
 UBool UnicodeSet::isEmpty(void) const {
-    return len == 1 && strings->size() == 0;
+    return len == 1 && !hasStrings();
 }
 
 /**
@@ -502,7 +447,7 @@
     if (s.length() == 0) return FALSE;
     int32_t cp = getSingleCP(s);
     if (cp < 0) {
-        return strings->contains((void*) &s);
+        return stringsContains(s);
     } else {
         return contains((UChar32) cp);
     }
@@ -524,8 +469,7 @@
             return FALSE;
         }
     }
-    if (!strings->containsAll(*c.strings)) return FALSE;
-    return TRUE;
+    return !c.hasStrings() || (strings != nullptr && strings->containsAll(*c.strings));
 }
 
 /**
@@ -571,8 +515,7 @@
             return FALSE;
         }
     }
-    if (!strings->containsNone(*c.strings)) return FALSE;
-    return TRUE;
+    return strings == nullptr || !c.hasStrings() || strings->containsNone(*c.strings);
 }
 
 /**
@@ -613,7 +556,7 @@
             return TRUE;
         }
     }
-    if (strings->size() != 0) {
+    if (hasStrings()) {
         for (i=0; i<strings->size(); ++i) {
             const UnicodeString& s = *(const UnicodeString*)strings->elementAt(i);
             //if (s.length() == 0) {
@@ -648,7 +591,7 @@
             return U_MISMATCH;
         }
     } else {
-        if (strings->size() != 0) { // try strings first
+        if (hasStrings()) { // try strings first
 
             // might separate forward and backward loops later
             // for now they are combined
@@ -849,7 +792,39 @@
  */
 UnicodeSet& UnicodeSet::add(UChar32 start, UChar32 end) {
     if (pinCodePoint(start) < pinCodePoint(end)) {
-        UChar32 range[3] = { start, end+1, UNICODESET_HIGH };
+        UChar32 limit = end + 1;
+        // Fast path for adding a new range after the last one.
+        // Odd list length: [..., lastStart, lastLimit, HIGH]
+        if ((len & 1) != 0) {
+            // If the list is empty, set lastLimit low enough to not be adjacent to 0.
+            UChar32 lastLimit = len == 1 ? -2 : list[len - 2];
+            if (lastLimit <= start && !isFrozen() && !isBogus()) {
+                if (lastLimit == start) {
+                    // Extend the last range.
+                    list[len - 2] = limit;
+                    if (limit == UNICODESET_HIGH) {
+                        --len;
+                    }
+                } else {
+                    list[len - 1] = start;
+                    if (limit < UNICODESET_HIGH) {
+                        if (ensureCapacity(len + 2)) {
+                            list[len++] = limit;
+                            list[len++] = UNICODESET_HIGH;
+                        }
+                    } else {  // limit == UNICODESET_HIGH
+                        if (ensureCapacity(len + 1)) {
+                            list[len++] = UNICODESET_HIGH;
+                        }
+                    }
+                }
+                releasePattern();
+                return *this;
+            }
+        }
+        // This is slow. Could be much faster using findCodePoint(start)
+        // and modifying the list, dealing with adjacent & overlapping ranges.
+        UChar32 range[3] = { start, limit, UNICODESET_HIGH };
         add(range, 2, 0);
     } else if (start == end) {
         add(start);
@@ -918,9 +893,7 @@
         list[i] = c;
         // if we touched the HIGH mark, then add a new one
         if (c == (UNICODESET_HIGH - 1)) {
-            UErrorCode status = U_ZERO_ERROR;
-            ensureCapacity(len+1, status);
-            if (U_FAILURE(status)) {
+            if (!ensureCapacity(len+1)) {
                 // ensureCapacity will mark the object as Bogus if OOM failure happens.
                 return *this;
             }
@@ -964,21 +937,13 @@
         //                             ^
         //                             list[i]
 
-        UErrorCode status = U_ZERO_ERROR;
-        ensureCapacity(len+2, status);
-        if (U_FAILURE(status)) {
+        if (!ensureCapacity(len+2)) {
             // ensureCapacity will mark the object as Bogus if OOM failure happens.
             return *this;
         }
 
-        //for (int32_t k=len-1; k>=i; --k) {
-        //    list[k+2] = list[k];
-        //}
-        UChar32* src = list + len;
-        UChar32* dst = src + 2;
-        UChar32* srclimit = list + i;
-        while (src > srclimit) *(--dst) = *(--src);
-
+        UChar32 *p = list + i;
+        uprv_memmove(p + 2, p, (len - i) * sizeof(*p));
         list[i] = c;
         list[i+1] = c+1;
         len += 2;
@@ -1014,7 +979,7 @@
     if (s.length() == 0 || isFrozen() || isBogus()) return *this;
     int32_t cp = getSingleCP(s);
     if (cp < 0) {
-        if (!strings->contains((void*) &s)) {
+        if (!stringsContains(s)) {
             _add(s);
             releasePattern();
         }
@@ -1033,12 +998,16 @@
     if (isFrozen() || isBogus()) {
         return;
     }
+    UErrorCode ec = U_ZERO_ERROR;
+    if (strings == nullptr && !allocateStrings(ec)) {
+        setToBogus();
+        return;
+    }
     UnicodeString* t = new UnicodeString(s);
     if (t == NULL) { // Check for memory allocation error.
         setToBogus();
         return;
     }
-    UErrorCode ec = U_ZERO_ERROR;
     strings->sortedInsert(t, compareUnicodeString, ec);
     if (U_FAILURE(ec)) {
         setToBogus();
@@ -1121,7 +1090,10 @@
 }
 
 UnicodeSet& UnicodeSet::removeAllStrings() {
-    strings->removeAllElements();
+    if (!isFrozen() && hasStrings()) {
+        strings->removeAllElements();
+        releasePattern();
+    }
     return *this;
 }
 
@@ -1217,8 +1189,9 @@
     if (s.length() == 0 || isFrozen() || isBogus()) return *this;
     int32_t cp = getSingleCP(s);
     if (cp < 0) {
-        strings->removeElement((void*) &s);
-        releasePattern();
+        if (strings != nullptr && strings->removeElement((void*) &s)) {
+            releasePattern();
+        }
     } else {
         remove((UChar32)cp, (UChar32)cp);
     }
@@ -1260,24 +1233,17 @@
     if (isFrozen() || isBogus()) {
         return *this;
     }
-    UErrorCode status = U_ZERO_ERROR;
     if (list[0] == UNICODESET_LOW) {
-        ensureBufferCapacity(len-1, status);
-        if (U_FAILURE(status)) {
-            return *this;
-        }
-        uprv_memcpy(buffer, list + 1, (size_t)(len-1)*sizeof(UChar32));
+        uprv_memmove(list, list + 1, (size_t)(len-1)*sizeof(UChar32));
         --len;
     } else {
-        ensureBufferCapacity(len+1, status);
-        if (U_FAILURE(status)) {
+        if (!ensureCapacity(len+1)) {
             return *this;
         }
-        uprv_memcpy(buffer + 1, list, (size_t)len*sizeof(UChar32));
-        buffer[0] = UNICODESET_LOW;
+        uprv_memmove(list + 1, list, (size_t)len*sizeof(UChar32));
+        list[0] = UNICODESET_LOW;
         ++len;
     }
-    swapBuffers();
     releasePattern();
     return *this;
 }
@@ -1294,7 +1260,7 @@
     if (s.length() == 0 || isFrozen() || isBogus()) return *this;
     int32_t cp = getSingleCP(s);
     if (cp < 0) {
-        if (strings->contains((void*) &s)) {
+        if (stringsContains(s)) {
             strings->removeElement((void*) &s);
         } else {
             _add(s);
@@ -1325,7 +1291,7 @@
     if ( c.strings!=NULL ) {
         for (int32_t i=0; i<c.strings->size(); ++i) {
             const UnicodeString* s = (const UnicodeString*)c.strings->elementAt(i);
-            if (!strings->contains((void*) s)) {
+            if (!stringsContains(*s)) {
                 _add(*s);
             }
         }
@@ -1347,7 +1313,13 @@
         return *this;
     }
     retain(c.list, c.len, 0);
-    strings->retainAll(*c.strings);
+    if (hasStrings()) {
+        if (!c.hasStrings()) {
+            strings->removeAllElements();
+        } else {
+            strings->retainAll(*c.strings);
+        }
+    }
     return *this;
 }
 
@@ -1365,7 +1337,9 @@
         return *this;
     }
     retain(c.list, c.len, 2);
-    strings->removeAll(*c.strings);
+    if (hasStrings() && c.hasStrings()) {
+        strings->removeAll(*c.strings);
+    }
     return *this;
 }
 
@@ -1383,10 +1357,12 @@
     }
     exclusiveOr(c.list, c.len, 0);
 
-    for (int32_t i=0; i<c.strings->size(); ++i) {
-        void* e = c.strings->elementAt(i);
-        if (!strings->removeElement(e)) {
-            _add(*(const UnicodeString*)e);
+    if (c.strings != nullptr) {
+        for (int32_t i=0; i<c.strings->size(); ++i) {
+            void* e = c.strings->elementAt(i);
+            if (strings == nullptr || !strings->removeElement(e)) {
+                _add(*(const UnicodeString*)e);
+            }
         }
     }
     return *this;
@@ -1400,18 +1376,14 @@
     if (isFrozen()) {
         return *this;
     }
-    if (list != NULL) {
-        list[0] = UNICODESET_HIGH;
-    }
+    list[0] = UNICODESET_HIGH;
     len = 1;
     releasePattern();
     if (strings != NULL) {
         strings->removeAllElements();
     }
-    if (list != NULL && strings != NULL) {
-        // Remove bogus
-        fFlags = 0;
-    }
+    // Remove bogus
+    fFlags = 0;
     return *this;
 }
 
@@ -1445,10 +1417,6 @@
     return list[index*2 + 1] - 1;
 }
 
-int32_t UnicodeSet::getStringCount() const {
-    return strings->size();
-}
-
 const UnicodeString* UnicodeSet::getString(int32_t index) const {
     return (const UnicodeString*) strings->elementAt(index);
 }
@@ -1462,22 +1430,32 @@
         return *this;
     }
     // Delete buffer first to defragment memory less.
-    if (buffer != NULL) {
+    if (buffer != stackList) {
         uprv_free(buffer);
         buffer = NULL;
+        bufferCapacity = 0;
     }
-    if (len < capacity) {
-        // Make the capacity equal to len or 1.
-        // We don't want to realloc of 0 size.
-        int32_t newCapacity = len + (len == 0);
-        UChar32* temp = (UChar32*) uprv_realloc(list, sizeof(UChar32) * newCapacity);
+    if (list == stackList) {
+        // pass
+    } else if (len <= INITIAL_CAPACITY) {
+        uprv_memcpy(stackList, list, len * sizeof(UChar32));
+        uprv_free(list);
+        list = stackList;
+        capacity = INITIAL_CAPACITY;
+    } else if ((len + 7) < capacity) {
+        // If we have more than a little unused capacity, shrink it to len.
+        UChar32* temp = (UChar32*) uprv_realloc(list, sizeof(UChar32) * len);
         if (temp) {
             list = temp;
-            capacity = newCapacity;
+            capacity = len;
         }
         // else what the heck happened?! We allocated less memory!
         // Oh well. We'll keep our original array.
     }
+    if (strings != nullptr && strings->isEmpty()) {
+        delete strings;
+        strings = nullptr;
+    }
     return *this;
 }
 
@@ -1488,10 +1466,8 @@
 /**
  * Deserialize constructor.
  */
-UnicodeSet::UnicodeSet(const uint16_t data[], int32_t dataLen, ESerialization serialization, UErrorCode &ec)
-  : len(1), capacity(1+START_EXTRA), list(0), bmpSet(0), buffer(0),
-    bufferCapacity(0), patLen(0), pat(NULL), strings(NULL), stringSpan(NULL),
-    fFlags(0) {
+UnicodeSet::UnicodeSet(const uint16_t data[], int32_t dataLen, ESerialization serialization,
+                       UErrorCode &ec) {
 
   if(U_FAILURE(ec)) {
     setToBogus();
@@ -1506,24 +1482,15 @@
     return;
   }
 
-  allocateStrings(ec);
-  if (U_FAILURE(ec)) {
-    setToBogus();
-    return;
-  }
-
   // bmp?
   int32_t headerSize = ((data[0]&0x8000)) ?2:1;
   int32_t bmpLength = (headerSize==1)?data[0]:data[1];
 
-  len = (((data[0]&0x7FFF)-bmpLength)/2)+bmpLength;
+  int32_t newLength = (((data[0]&0x7FFF)-bmpLength)/2)+bmpLength;
 #ifdef DEBUG_SERIALIZE
-  printf("dataLen %d headerSize %d bmpLen %d len %d. data[0]=%X/%X/%X/%X\n", dataLen,headerSize,bmpLength,len, data[0],data[1],data[2],data[3]);
+  printf("dataLen %d headerSize %d bmpLen %d len %d. data[0]=%X/%X/%X/%X\n", dataLen,headerSize,bmpLength,newLength, data[0],data[1],data[2],data[3]);
 #endif
-  capacity = len+1;
-  list = (UChar32*) uprv_malloc(sizeof(UChar32) * capacity);
-  if(!list || U_FAILURE(ec)) {
-    setToBogus();
+  if(!ensureCapacity(newLength + 1)) {  // +1 for HIGH
     return;
   }
   // copy bmp
@@ -1535,15 +1502,18 @@
 #endif
   }
   // copy smp
-  for(i=bmpLength;i<len;i++) {
+  for(i=bmpLength;i<newLength;i++) {
     list[i] = ((UChar32)data[headerSize+bmpLength+(i-bmpLength)*2+0] << 16) +
               ((UChar32)data[headerSize+bmpLength+(i-bmpLength)*2+1]);
 #ifdef DEBUG_SERIALIZE
     printf("<<32@%d+[%d] %lX\n", headerSize+bmpLength+i, i, list[i]);
 #endif
   }
-  // terminator
-  list[len++]=UNICODESET_HIGH;
+  U_ASSERT(i == newLength);
+  if (i == 0 || list[i - 1] != UNICODESET_HIGH) {
+    list[i++] = UNICODESET_HIGH;
+  }
+  len = i;
 }
 
 
@@ -1664,33 +1634,65 @@
     return TRUE;
 }
 
-void UnicodeSet::ensureCapacity(int32_t newLen, UErrorCode& ec) {
+int32_t UnicodeSet::nextCapacity(int32_t minCapacity) {
+    // Grow exponentially to reduce the frequency of allocations.
+    if (minCapacity < INITIAL_CAPACITY) {
+        return minCapacity + INITIAL_CAPACITY;
+    } else if (minCapacity <= 2500) {
+        return 5 * minCapacity;
+    } else {
+        int32_t newCapacity = 2 * minCapacity;
+        if (newCapacity > MAX_LENGTH) {
+            newCapacity = MAX_LENGTH;
+        }
+        return newCapacity;
+    }
+}
+
+bool UnicodeSet::ensureCapacity(int32_t newLen) {
+    if (newLen > MAX_LENGTH) {
+        newLen = MAX_LENGTH;
+    }
     if (newLen <= capacity) {
-        return;
+        return true;
     }
-    UChar32* temp = (UChar32*) uprv_realloc(list, sizeof(UChar32) * (newLen + GROW_EXTRA));
+    int32_t newCapacity = nextCapacity(newLen);
+    UChar32* temp = (UChar32*) uprv_malloc(newCapacity * sizeof(UChar32));
     if (temp == NULL) {
-        ec = U_MEMORY_ALLOCATION_ERROR;
         setToBogus(); // set the object to bogus state if an OOM failure occurred.
-        return;
+        return false;
+    }
+    // Copy only the actual contents.
+    uprv_memcpy(temp, list, len * sizeof(UChar32));
+    if (list != stackList) {
+        uprv_free(list);
     }
     list = temp;
-    capacity = newLen + GROW_EXTRA;
-    // else we keep the original contents on the memory failure.
+    capacity = newCapacity;
+    return true;
 }
 
-void UnicodeSet::ensureBufferCapacity(int32_t newLen, UErrorCode& ec) {
-    if (buffer != NULL && newLen <= bufferCapacity)
-        return;
-    UChar32* temp = (UChar32*) uprv_realloc(buffer, sizeof(UChar32) * (newLen + GROW_EXTRA));
+bool UnicodeSet::ensureBufferCapacity(int32_t newLen) {
+    if (newLen > MAX_LENGTH) {
+        newLen = MAX_LENGTH;
+    }
+    if (newLen <= bufferCapacity) {
+        return true;
+    }
+    int32_t newCapacity = nextCapacity(newLen);
+    UChar32* temp = (UChar32*) uprv_malloc(newCapacity * sizeof(UChar32));
     if (temp == NULL) {
-        ec = U_MEMORY_ALLOCATION_ERROR;
         setToBogus();
-        return;
+        return false;
+    }
+    // The buffer has no contents to be copied.
+    // It is always filled from scratch after this call.
+    if (buffer != stackList) {
+        uprv_free(buffer);
     }
     buffer = temp;
-    bufferCapacity = newLen + GROW_EXTRA;
-    // else we keep the original contents on the memory failure.
+    bufferCapacity = newCapacity;
+    return true;
 }
 
 /**
@@ -1727,9 +1729,7 @@
     if (isFrozen() || isBogus()) {
         return;
     }
-    UErrorCode status = U_ZERO_ERROR;
-    ensureBufferCapacity(len + otherLen, status);
-    if (U_FAILURE(status)) {
+    if (!ensureBufferCapacity(len + otherLen)) {
         return;
     }
 
@@ -1777,9 +1777,7 @@
     if (isFrozen() || isBogus() || other==NULL) {
         return;
     }
-    UErrorCode status = U_ZERO_ERROR;
-    ensureBufferCapacity(len + otherLen, status);
-    if (U_FAILURE(status)) {
+    if (!ensureBufferCapacity(len + otherLen)) {
         return;
     }
 
@@ -1890,9 +1888,7 @@
     if (isFrozen() || isBogus()) {
         return;
     }
-    UErrorCode status = U_ZERO_ERROR;
-    ensureBufferCapacity(len + otherLen, status);
-    if (U_FAILURE(status)) {
+    if (!ensureBufferCapacity(len + otherLen)) {
         return;
     }
 
@@ -2138,12 +2134,14 @@
         }
     }
 
-    for (int32_t i = 0; i<strings->size(); ++i) {
-        result.append(OPEN_BRACE);
-        _appendToPat(result,
-                     *(const UnicodeString*) strings->elementAt(i),
-                     escapeUnprintable);
-        result.append(CLOSE_BRACE);
+    if (strings != nullptr) {
+        for (int32_t i = 0; i<strings->size(); ++i) {
+            result.append(OPEN_BRACE);
+            _appendToPat(result,
+                         *(const UnicodeString*) strings->elementAt(i),
+                         escapeUnprintable);
+            result.append(CLOSE_BRACE);
+        }
     }
     return result.append(SET_CLOSE);
 }
@@ -2162,13 +2160,12 @@
 /**
 * Set the new pattern to cache.
 */
-void UnicodeSet::setPattern(const UnicodeString& newPat) {
+void UnicodeSet::setPattern(const char16_t *newPat, int32_t newPatLen) {
     releasePattern();
-    int32_t newPatLen = newPat.length();
     pat = (UChar *)uprv_malloc((newPatLen + 1) * sizeof(UChar));
     if (pat) {
         patLen = newPatLen;
-        newPat.extractBetween(0, patLen, pat);
+        u_memcpy(pat, newPat, patLen);
         pat[patLen] = 0;
     }
     // else we don't care if malloc failed. This was just a nice cache.
@@ -2177,30 +2174,15 @@
 
 UnicodeFunctor *UnicodeSet::freeze() {
     if(!isFrozen() && !isBogus()) {
-        // Do most of what compact() does before freezing because
-        // compact() will not work when the set is frozen.
-        // Small modification: Don't shrink if the savings would be tiny (<=GROW_EXTRA).
-
-        // Delete buffer first to defragment memory less.
-        if (buffer != NULL) {
-            uprv_free(buffer);
-            buffer = NULL;
-        }
-        if (capacity > (len + GROW_EXTRA)) {
-            // Make the capacity equal to len or 1.
-            // We don't want to realloc of 0 size.
-            capacity = len + (len == 0);
-            list = (UChar32*) uprv_realloc(list, sizeof(UChar32) * capacity);
-            if (list == NULL) { // Check for memory allocation error.
-                setToBogus();
-                return this;
-            }
-        }
+        compact();
 
         // Optimize contains() and span() and similar functions.
-        if (!strings->isEmpty()) {
+        if (hasStrings()) {
             stringSpan = new UnicodeSetStringSpan(*this, *strings, UnicodeSetStringSpan::ALL);
-            if (stringSpan != NULL && !stringSpan->needsStringSpanUTF16()) {
+            if (stringSpan == nullptr) {
+                setToBogus();
+                return this;
+            } else if (!stringSpan->needsStringSpanUTF16()) {
                 // All strings are irrelevant for span() etc. because
                 // all of each string's code points are contained in this set.
                 // Do not check needsStringSpanUTF8() because UTF-8 has at most as
@@ -2233,7 +2215,7 @@
     }
     if(stringSpan!=NULL) {
         return stringSpan->span(s, length, spanCondition);
-    } else if(!strings->isEmpty()) {
+    } else if(hasStrings()) {
         uint32_t which= spanCondition==USET_SPAN_NOT_CONTAINED ?
                             UnicodeSetStringSpan::FWD_UTF16_NOT_CONTAINED :
                             UnicodeSetStringSpan::FWD_UTF16_CONTAINED;
@@ -2270,7 +2252,7 @@
     }
     if(stringSpan!=NULL) {
         return stringSpan->spanBack(s, length, spanCondition);
-    } else if(!strings->isEmpty()) {
+    } else if(hasStrings()) {
         uint32_t which= spanCondition==USET_SPAN_NOT_CONTAINED ?
                             UnicodeSetStringSpan::BACK_UTF16_NOT_CONTAINED :
                             UnicodeSetStringSpan::BACK_UTF16_CONTAINED;
@@ -2308,7 +2290,7 @@
     }
     if(stringSpan!=NULL) {
         return stringSpan->spanUTF8((const uint8_t *)s, length, spanCondition);
-    } else if(!strings->isEmpty()) {
+    } else if(hasStrings()) {
         uint32_t which= spanCondition==USET_SPAN_NOT_CONTAINED ?
                             UnicodeSetStringSpan::FWD_UTF8_NOT_CONTAINED :
                             UnicodeSetStringSpan::FWD_UTF8_CONTAINED;
@@ -2346,7 +2328,7 @@
     }
     if(stringSpan!=NULL) {
         return stringSpan->spanBackUTF8((const uint8_t *)s, length, spanCondition);
-    } else if(!strings->isEmpty()) {
+    } else if(hasStrings()) {
         uint32_t which= spanCondition==USET_SPAN_NOT_CONTAINED ?
                             UnicodeSetStringSpan::BACK_UTF8_NOT_CONTAINED :
                             UnicodeSetStringSpan::BACK_UTF8_CONTAINED;
diff -Nru icu-63.1/source/common/uniset_closure.cpp icu-63.2/source/common/uniset_closure.cpp
--- icu-63.1/source/common/uniset_closure.cpp	2018-09-29 00:34:41.000000000 +0000
+++ icu-63.2/source/common/uniset_closure.cpp	2019-04-11 22:38:30.000000000 +0000
@@ -31,10 +31,6 @@
 #include "util.h"
 #include "uvector.h"
 
-// initial storage. Must be >= 0
-// *** same as in uniset.cpp ! ***
-#define START_EXTRA 16
-
 U_NAMESPACE_BEGIN
 
 // TODO memory debugging provided inside uniset.cpp
@@ -49,42 +45,16 @@
 UnicodeSet::UnicodeSet(const UnicodeString& pattern,
                        uint32_t options,
                        const SymbolTable* symbols,
-                       UErrorCode& status) :
-    len(0), capacity(START_EXTRA), list(0), bmpSet(0), buffer(0),
-    bufferCapacity(0), patLen(0), pat(NULL), strings(NULL), stringSpan(NULL),
-    fFlags(0)
-{
-    if(U_SUCCESS(status)){
-        list = (UChar32*) uprv_malloc(sizeof(UChar32) * capacity);
-        /* test for NULL */
-        if(list == NULL) {
-            status = U_MEMORY_ALLOCATION_ERROR;  
-        }else{
-            allocateStrings(status);
-            applyPattern(pattern, options, symbols, status);
-        }
-    }
+                       UErrorCode& status) {
+    applyPattern(pattern, options, symbols, status);
     _dbgct(this);
 }
 
 UnicodeSet::UnicodeSet(const UnicodeString& pattern, ParsePosition& pos,
                        uint32_t options,
                        const SymbolTable* symbols,
-                       UErrorCode& status) :
-    len(0), capacity(START_EXTRA), list(0), bmpSet(0), buffer(0),
-    bufferCapacity(0), patLen(0), pat(NULL), strings(NULL), stringSpan(NULL),
-    fFlags(0)
-{
-    if(U_SUCCESS(status)){
-        list = (UChar32*) uprv_malloc(sizeof(UChar32) * capacity);
-        /* test for NULL */
-        if(list == NULL) {
-            status = U_MEMORY_ALLOCATION_ERROR;   
-        }else{
-            allocateStrings(status);
-            applyPattern(pattern, pos, options, symbols, status);
-        }
-    }
+                       UErrorCode& status) {
+    applyPattern(pattern, pos, options, symbols, status);
     _dbgct(this);
 }
 
@@ -199,7 +169,7 @@
             // start with input set to guarantee inclusion
             // USET_CASE: remove strings because the strings will actually be reduced (folded);
             //            therefore, start with no strings and add only those needed
-            if (attribute & USET_CASE_INSENSITIVE) {
+            if ((attribute & USET_CASE_INSENSITIVE) && foldSet.hasStrings()) {
                 foldSet.strings->removeAllElements();
             }
 
@@ -234,7 +204,7 @@
                     }
                 }
             }
-            if (strings != NULL && strings->size() > 0) {
+            if (hasStrings()) {
                 if (attribute & USET_CASE_INSENSITIVE) {
                     for (int32_t j=0; j<strings->size(); ++j) {
                         str = *(const UnicodeString *) strings->elementAt(j);
diff -Nru icu-63.1/source/common/uniset_props.cpp icu-63.2/source/common/uniset_props.cpp
--- icu-63.1/source/common/uniset_props.cpp	2018-10-01 22:39:56.000000000 +0000
+++ icu-63.2/source/common/uniset_props.cpp	2019-04-11 22:38:30.000000000 +0000
@@ -47,10 +47,6 @@
 
 U_NAMESPACE_USE
 
-// initial storage. Must be >= 0
-// *** same as in uniset.cpp ! ***
-#define START_EXTRA 16
-
 // Define UChar constants using hex for EBCDIC compatibility
 // Used #define to reduce private static exports and memory access time.
 #define SET_OPEN        ((UChar)0x005B) /*[*/
@@ -185,21 +181,8 @@
  * @param pattern a string specifying what characters are in the set
  */
 UnicodeSet::UnicodeSet(const UnicodeString& pattern,
-                       UErrorCode& status) :
-    len(0), capacity(START_EXTRA), list(0), bmpSet(0), buffer(0),
-    bufferCapacity(0), patLen(0), pat(NULL), strings(NULL), stringSpan(NULL),
-    fFlags(0)
-{
-    if(U_SUCCESS(status)){
-        list = (UChar32*) uprv_malloc(sizeof(UChar32) * capacity);
-        /* test for NULL */
-        if(list == NULL) {
-            status = U_MEMORY_ALLOCATION_ERROR;  
-        }else{
-            allocateStrings(status);
-            applyPattern(pattern, status);
-        }
-    }
+                       UErrorCode& status) {
+    applyPattern(pattern, status);
     _dbgct(this);
 }
 
@@ -713,6 +696,11 @@
     return u_getNumericValue(ch) == *(double*)context;
 }
 
+static UBool generalCategoryMaskFilter(UChar32 ch, void* context) {
+    int32_t value = *(int32_t*)context;
+    return (U_GET_GC_MASK((UChar32) ch) & value) != 0;
+}
+
 static UBool versionFilter(UChar32 ch, void* context) {
     static const UVersionInfo none = { 0, 0, 0, 0 };
     UVersionInfo v;
@@ -721,6 +709,16 @@
     return uprv_memcmp(&v, &none, sizeof(v)) > 0 && uprv_memcmp(&v, version, sizeof(v)) <= 0;
 }
 
+typedef struct {
+    UProperty prop;
+    int32_t value;
+} IntPropertyContext;
+
+static UBool intPropertyFilter(UChar32 ch, void* context) {
+    IntPropertyContext* c = (IntPropertyContext*)context;
+    return u_getIntPropertyValue((UChar32) ch, c->prop) == c->value;
+}
+
 static UBool scriptExtensionsFilter(UChar32 ch, void* context) {
     return uscript_hasScript(ch, *(UScriptCode*)context);
 }
@@ -781,43 +779,6 @@
 
 namespace {
 
-/** Maps map values to 1 if the mask contains their value'th bit, all others to 0. */
-uint32_t U_CALLCONV generalCategoryMaskFilter(const void *context, uint32_t value) {
-    uint32_t mask = *(const uint32_t *)context;
-    value = U_MASK(value) & mask;
-    if (value != 0) { value = 1; }
-    return value;
-}
-
-/** Maps one map value to 1, all others to 0. */
-uint32_t U_CALLCONV intValueFilter(const void *context, uint32_t value) {
-    uint32_t v = *(const uint32_t *)context;
-    return value == v ? 1 : 0;
-}
-
-}  // namespace
-
-void UnicodeSet::applyIntPropertyValue(const UCPMap *map,
-                                       UCPMapValueFilter *filter, const void *context,
-                                       UErrorCode &errorCode) {
-    if (U_FAILURE(errorCode)) { return; }
-    clear();
-    UChar32 start = 0, end;
-    uint32_t value;
-    while ((end = ucpmap_getRange(map, start, UCPMAP_RANGE_NORMAL, 0,
-                                  filter, context, &value)) >= 0) {
-        if (value != 0) {
-            add(start, end);
-        }
-        start = end + 1;
-    }
-    if (isBogus()) {
-        errorCode = U_MEMORY_ALLOCATION_ERROR;
-    }
-}
-
-namespace {
-
 static UBool mungeCharName(char* dst, const char* src, int32_t dstCapacity) {
     /* Note: we use ' ' in compiler code page */
     int32_t j = 0;
@@ -845,11 +806,10 @@
 
 UnicodeSet&
 UnicodeSet::applyIntPropertyValue(UProperty prop, int32_t value, UErrorCode& ec) {
-    if (U_FAILURE(ec)) { return *this; }
-    // All of the following check isFrozen() before modifying this set.
+    if (U_FAILURE(ec) || isFrozen()) { return *this; }
     if (prop == UCHAR_GENERAL_CATEGORY_MASK) {
-        const UCPMap *map = u_getIntPropertyMap(UCHAR_GENERAL_CATEGORY, &ec);
-        applyIntPropertyValue(map, generalCategoryMaskFilter, &value, ec);
+        const UnicodeSet* inclusions = CharacterProperties::getInclusionsForProperty(prop, ec);
+        applyFilter(generalCategoryMaskFilter, &value, inclusions, ec);
     } else if (prop == UCHAR_SCRIPT_EXTENSIONS) {
         const UnicodeSet* inclusions = CharacterProperties::getInclusionsForProperty(prop, ec);
         UScriptCode script = (UScriptCode)value;
@@ -866,14 +826,11 @@
             clear();
         }
     } else if (UCHAR_INT_START <= prop && prop < UCHAR_INT_LIMIT) {
-        const UCPMap *map = u_getIntPropertyMap(prop, &ec);
-        applyIntPropertyValue(map, intValueFilter, &value, ec);
+        const UnicodeSet* inclusions = CharacterProperties::getInclusionsForProperty(prop, ec);
+        IntPropertyContext c = {prop, value};
+        applyFilter(intPropertyFilter, &c, inclusions, ec);
     } else {
-        // This code used to always call getInclusions(property source)
-        // which sets an error for an unsupported property.
         ec = U_ILLEGAL_ARGUMENT_ERROR;
-        // Otherwise we would just clear() this set because
-        // getIntPropertyValue(c, prop) returns 0 for all code points.
     }
     return *this;
 }
diff -Nru icu-63.1/source/common/uprops.h icu-63.2/source/common/uprops.h
--- icu-63.1/source/common/uprops.h	2018-10-01 22:39:56.000000000 +0000
+++ icu-63.2/source/common/uprops.h	2019-04-11 22:38:30.000000000 +0000
@@ -462,7 +462,6 @@
 class CharacterProperties {
 public:
     CharacterProperties() = delete;
-    static void U_CALLCONV initInclusion(UPropertySource src, UErrorCode &errorCode);
     static const UnicodeSet *getInclusionsForProperty(UProperty prop, UErrorCode &errorCode);
 };
 
diff -Nru icu-63.1/source/common/uset.cpp icu-63.2/source/common/uset.cpp
--- icu-63.1/source/common/uset.cpp	2018-09-29 00:34:41.000000000 +0000
+++ icu-63.2/source/common/uset.cpp	2019-04-11 22:38:30.000000000 +0000
@@ -249,7 +249,7 @@
 public:
     /* Try to have the compiler inline these*/
     inline static int32_t getStringCount(const UnicodeSet& set) {
-        return set.getStringCount();
+        return set.stringsSize();
     }
     inline static const UnicodeString* getString(const UnicodeSet& set,
                                                  int32_t i) {
diff -Nru icu-63.1/source/common/usetiter.cpp icu-63.2/source/common/usetiter.cpp
--- icu-63.1/source/common/usetiter.cpp	2018-09-29 00:34:41.000000000 +0000
+++ icu-63.2/source/common/usetiter.cpp	2019-04-11 22:38:30.000000000 +0000
@@ -116,7 +116,7 @@
         stringCount = 0;
     } else {
         endRange = set->getRangeCount() - 1;
-        stringCount = set->strings->size();
+        stringCount = set->stringsSize();
     }
     range = 0;
     endElement = -1;
diff -Nru icu-63.1/source/configure icu-63.2/source/configure
--- icu-63.1/source/configure	2018-10-01 22:39:56.000000000 +0000
+++ icu-63.2/source/configure	2019-04-11 22:38:30.000000000 +0000
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.69 for ICU 63.1.
+# Generated by GNU Autoconf 2.69 for ICU 63.2.
 #
 # Report bugs to <http://icu-project.org/bugs>.
 #
@@ -582,8 +582,8 @@
 # Identity of this package.
 PACKAGE_NAME='ICU'
 PACKAGE_TARNAME='International Components for Unicode'
-PACKAGE_VERSION='63.1'
-PACKAGE_STRING='ICU 63.1'
+PACKAGE_VERSION='63.2'
+PACKAGE_STRING='ICU 63.2'
 PACKAGE_BUGREPORT='http://icu-project.org/bugs'
 PACKAGE_URL='http://icu-project.org'
 
@@ -1370,7 +1370,7 @@
   # Omit some internal or obsolete options to make the list less imposing.
   # This message is too long to be a string in the A/UX 3.1 sh.
   cat <<_ACEOF
-\`configure' configures ICU 63.1 to adapt to many kinds of systems.
+\`configure' configures ICU 63.2 to adapt to many kinds of systems.
 
 Usage: $0 [OPTION]... [VAR=VALUE]...
 
@@ -1437,7 +1437,7 @@
 
 if test -n "$ac_init_help"; then
   case $ac_init_help in
-     short | recursive ) echo "Configuration of ICU 63.1:";;
+     short | recursive ) echo "Configuration of ICU 63.2:";;
    esac
   cat <<\_ACEOF
 
@@ -1574,7 +1574,7 @@
 test -n "$ac_init_help" && exit $ac_status
 if $ac_init_version; then
   cat <<\_ACEOF
-ICU configure 63.1
+ICU configure 63.2
 generated by GNU Autoconf 2.69
 
 Copyright (C) 2012 Free Software Foundation, Inc.
@@ -2266,7 +2266,7 @@
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
 
-It was created by ICU $as_me 63.1, which was
+It was created by ICU $as_me 63.2, which was
 generated by GNU Autoconf 2.69.  Invocation command line was
 
   $ $0 $@
@@ -8434,7 +8434,7 @@
 # report actual input values of CONFIG_FILES etc. instead of their
 # values after options handling.
 ac_log="
-This file was extended by ICU $as_me 63.1, which was
+This file was extended by ICU $as_me 63.2, which was
 generated by GNU Autoconf 2.69.  Invocation command line was
 
   CONFIG_FILES    = $CONFIG_FILES
@@ -8488,7 +8488,7 @@
 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
 ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`"
 ac_cs_version="\\
-ICU config.status 63.1
+ICU config.status 63.2
 configured by $0, generated by GNU Autoconf 2.69,
   with options \\"\$ac_cs_config\\"
 
Binary files /tmp/Jsaapc2Fpt/icu-63.1/source/data/in/icudt63l.dat and /tmp/cdhst3GSjB/icu-63.2/source/data/in/icudt63l.dat differ
diff -Nru icu-63.1/source/i18n/fmtable.cpp icu-63.2/source/i18n/fmtable.cpp
--- icu-63.1/source/i18n/fmtable.cpp	2018-09-29 00:34:42.000000000 +0000
+++ icu-63.2/source/i18n/fmtable.cpp	2019-04-11 22:38:30.000000000 +0000
@@ -734,7 +734,7 @@
       // not print scientific notation for magnitudes greater than -5 and smaller than some amount (+5?).
       if (fDecimalQuantity->isZero()) {
         fDecimalStr->append("0", -1, status);
-      } else if (std::abs(fDecimalQuantity->getMagnitude()) < 5) {
+      } else if (fDecimalQuantity->getMagnitude() != INT32_MIN && std::abs(fDecimalQuantity->getMagnitude()) < 5) {
         fDecimalStr->appendInvariantChars(fDecimalQuantity->toPlainString(), status);
       } else {
         fDecimalStr->appendInvariantChars(fDecimalQuantity->toScientificString(), status);
diff -Nru icu-63.1/source/i18n/japancal.cpp icu-63.2/source/i18n/japancal.cpp
--- icu-63.1/source/i18n/japancal.cpp	2018-10-01 22:39:56.000000000 +0000
+++ icu-63.2/source/i18n/japancal.cpp	2019-04-11 22:38:30.000000000 +0000
@@ -18,6 +18,16 @@
 #if !UCONFIG_NO_FORMATTING
 #if U_PLATFORM_HAS_WINUWP_API == 0
 #include <stdlib.h> // getenv() is not available in UWP env
+#else
+#ifndef WIN32_LEAN_AND_MEAN
+#   define WIN32_LEAN_AND_MEAN
+#endif
+#   define VC_EXTRALEAN
+#   define NOUSER
+#   define NOSERVICE
+#   define NOIME
+#   define NOMCX
+#include <windows.h>
 #endif
 #include "cmemory.h"
 #include "erarules.h"
diff -Nru icu-63.1/source/i18n/number_decimalquantity.cpp icu-63.2/source/i18n/number_decimalquantity.cpp
--- icu-63.1/source/i18n/number_decimalquantity.cpp	2018-10-01 22:39:56.000000000 +0000
+++ icu-63.2/source/i18n/number_decimalquantity.cpp	2019-04-11 22:38:30.000000000 +0000
@@ -820,7 +820,10 @@
     }
     result.append(u'E');
     int32_t _scale = upperPos + scale;
-    if (_scale < 0) {
+    if (_scale == INT32_MIN) {
+        result.append({u"-2147483648", -1});
+        return result;
+    } else if (_scale < 0) {
         _scale *= -1;
         result.append(u'-');
     } else {
diff -Nru icu-63.1/source/i18n/unicode/numberrangeformatter.h icu-63.2/source/i18n/unicode/numberrangeformatter.h
--- icu-63.1/source/i18n/unicode/numberrangeformatter.h	2018-10-15 18:02:37.000000000 +0000
+++ icu-63.2/source/i18n/unicode/numberrangeformatter.h	2019-04-11 22:38:30.000000000 +0000
@@ -185,8 +185,14 @@
  * Export an explicit template instantiation. See datefmt.h
  * (When building DLLs for Windows this is required.)
  */
-#if U_PF_WINDOWS <= U_PLATFORM && U_PLATFORM <= U_PF_CYGWIN && !defined(U_IN_DOXYGEN)
-template struct U_I18N_API std::atomic<impl::NumberRangeFormatterImpl*>;
+#if U_PLATFORM == U_PF_WINDOWS && !defined(U_IN_DOXYGEN)
+} // namespace icu::number
+U_NAMESPACE_END
+
+template struct U_I18N_API std::atomic< U_NAMESPACE_QUALIFIER number::impl::NumberRangeFormatterImpl*>;
+
+U_NAMESPACE_BEGIN
+namespace number {  // icu::number
 #endif
 /** \endcond */
 
diff -Nru icu-63.1/source/i18n/uspoof.cpp icu-63.2/source/i18n/uspoof.cpp
--- icu-63.1/source/i18n/uspoof.cpp	2018-09-29 00:34:42.000000000 +0000
+++ icu-63.2/source/i18n/uspoof.cpp	2019-04-11 22:38:30.000000000 +0000
@@ -547,7 +547,7 @@
     return uspoof_check2UnicodeString(sc, id, NULL, status);
 }
 
-int32_t checkImpl(const SpoofImpl* This, const UnicodeString& id, CheckResult* checkResult, UErrorCode* status) {
+static int32_t checkImpl(const SpoofImpl* This, const UnicodeString& id, CheckResult* checkResult, UErrorCode* status) {
     U_ASSERT(This != NULL);
     U_ASSERT(checkResult != NULL);
     checkResult->clear();
diff -Nru icu-63.1/source/test/intltest/convtest.cpp icu-63.2/source/test/intltest/convtest.cpp
--- icu-63.1/source/test/intltest/convtest.cpp	2018-09-29 00:34:42.000000000 +0000
+++ icu-63.2/source/test/intltest/convtest.cpp	2019-04-11 22:38:30.000000000 +0000
@@ -606,12 +606,7 @@
                 // First try to see if we have different sets because ucnv_getUnicodeSet()
                 // added strings: The above conversion method does not tell us what strings might be convertible.
                 // Remove strings from the set and compare again.
-                // Unfortunately, there are no good, direct set methods for finding out whether there are strings
-                // in the set, nor for enumerating or removing just them.
-                // Intersect all code points with the set. The intersection will not contain strings.
-                UnicodeSet temp(0, 0x10ffff);
-                temp.retainAll(set);
-                set=temp;
+                set.removeAllStrings();
             }
             if(set!=expected) {
                 UnicodeSet diffSet;
diff -Nru icu-63.1/source/test/intltest/incaltst.cpp icu-63.2/source/test/intltest/incaltst.cpp
--- icu-63.1/source/test/intltest/incaltst.cpp	2018-09-29 00:34:42.000000000 +0000
+++ icu-63.2/source/test/intltest/incaltst.cpp	2019-04-11 22:38:30.000000000 +0000
@@ -77,6 +77,7 @@
     CASE(7,TestPersian);
     CASE(8,TestPersianFormat);
     CASE(9,TestTaiwan);
+    CASE(10,TestJapaneseHeiseiToReiwa);
     default: name = ""; break;
     }
 }
@@ -626,23 +627,23 @@
         // Test simple parse/format with adopt
         UDate aDate = 0; 
         
-        // Test parse with missing era (should default to current era, heisei)
+        // Test parse with missing era (should default to current era)
         // Test parse with incomplete information
         logln("Testing parse w/ missing era...");
-        SimpleDateFormat *fmt = new SimpleDateFormat(UnicodeString("y.M.d"), Locale("ja_JP@calendar=japanese"), status);
+        SimpleDateFormat *fmt = new SimpleDateFormat(UnicodeString("y/M/d"), Locale("ja_JP@calendar=japanese"), status);
         CHECK(status, "creating date format instance");
         if(!fmt) { 
             errln("Couldn't create en_US instance");
         } else {
             UErrorCode s2 = U_ZERO_ERROR;
             cal2->clear();
-            UnicodeString samplestr("1.1.9");
+            UnicodeString samplestr("1/5/9");
             logln(UnicodeString() + "Test Year: " + samplestr);
             aDate = fmt->parse(samplestr, s2);
             ParsePosition pp=0;
             fmt->parse(samplestr, *cal2, pp);
-            CHECK(s2, "parsing the 1.1.9 string");
-            logln("*cal2 after 119 parse:");
+            CHECK(s2, "parsing the 1/5/9 string");
+            logln("*cal2 after 159 parse:");
             str.remove();
             fmt2->format(aDate, str);
             logln(UnicodeString() + "as Gregorian Calendar: " + str);
@@ -653,7 +654,7 @@
             int32_t expectYear = 1;
             int32_t expectEra = JapaneseCalendar::getCurrentEra();
             if((gotYear!=1) || (gotEra != expectEra)) {
-                errln(UnicodeString("parse "+samplestr+" of 'y.m.d' as Japanese Calendar, expected year ") + expectYear + 
+                errln(UnicodeString("parse "+samplestr+" of 'y/m/d' as Japanese Calendar, expected year ") + expectYear + 
                     UnicodeString(" and era ") + expectEra +", but got year " + gotYear + " and era " + gotEra + " (Gregorian:" + str +")");
             } else {            
                 logln(UnicodeString() + " year: " + gotYear + ", era: " + gotEra);
@@ -666,7 +667,7 @@
         // Test simple parse/format with adopt
         UDate aDate = 0; 
         
-        // Test parse with missing era (should default to current era, heisei)
+        // Test parse with missing era (should default to current era)
         // Test parse with incomplete information
         logln("Testing parse w/ just year...");
         SimpleDateFormat *fmt = new SimpleDateFormat(UnicodeString("y"), Locale("ja_JP@calendar=japanese"), status);
@@ -678,7 +679,7 @@
             cal2->clear();
             UnicodeString samplestr("1");
             logln(UnicodeString() + "Test Year: " + samplestr);
-            aDate = fmt->parse(samplestr, s2);
+            aDate = fmt->parse(samplestr, s2);  // Should be parsed as the first day of the current era
             ParsePosition pp=0;
             fmt->parse(samplestr, *cal2, pp);
             CHECK(s2, "parsing the 1 string");
@@ -691,7 +692,7 @@
             int32_t gotYear = cal2->get(UCAL_YEAR, s2);
             int32_t gotEra = cal2->get(UCAL_ERA, s2);
             int32_t expectYear = 1;
-            int32_t expectEra = 235; //JapaneseCalendar::kCurrentEra;
+            int32_t expectEra = JapaneseCalendar::getCurrentEra();
             if((gotYear!=1) || (gotEra != expectEra)) {
                 errln(UnicodeString("parse "+samplestr+" of 'y' as Japanese Calendar, expected year ") + expectYear + 
                     UnicodeString(" and era ") + expectEra +", but got year " + gotYear + " and era " + gotEra + " (Gregorian:" + str +")");
@@ -707,6 +708,40 @@
     delete fmt2;
 }
 
+void IntlCalendarTest::TestJapaneseHeiseiToReiwa() {
+    Calendar *cal;
+    UErrorCode status = U_ZERO_ERROR;
+    cal = Calendar::createInstance(status);
+    CHECK(status, UnicodeString("Creating default Gregorian Calendar"));
+    cal->set(2019, UCAL_APRIL, 29);
+
+    DateFormat *jfmt = DateFormat::createDateInstance(DateFormat::LONG, "ja@calendar=japanese");
+    CHECK(status, UnicodeString("Creating date format ja@calendar=japanese"))
+
+    const char* EXPECTED_FORMAT[4] = {
+        "\\u5E73\\u621031\\u5E744\\u670829\\u65E5", // Heisei 31 April 29
+        "\\u5E73\\u621031\\u5E744\\u670830\\u65E5", // Heisei 31 April 30
+        "\\u4EE4\\u548c1\\u5E745\\u67081\\u65E5",   // Reiwa 1 May 1
+        "\\u4EE4\\u548c1\\u5E745\\u67082\\u65E5"    // Reiwa 1 May 2
+    };
+
+    for (int32_t i = 0; i < 4; i++) {
+        UnicodeString dateStr;
+        UDate d = cal->getTime(status);
+        CHECK(status, UnicodeString("Get test date"));
+        jfmt->format(d, dateStr);
+        UnicodeString expected(UnicodeString(EXPECTED_FORMAT[i], -1, US_INV).unescape());
+        if (expected.compare(dateStr) != 0) {
+            errln(UnicodeString("Formatting year:") + cal->get(UCAL_YEAR, status) + " month:"
+                + cal->get(UCAL_MONTH, status) + " day:" + (cal->get(UCAL_DATE, status) + 1)
+                + " - expected: " + expected + " / actual: " + dateStr);
+        }
+        cal->add(UCAL_DATE, 1, status);
+        CHECK(status, UnicodeString("Add 1 day"));
+    }
+    delete jfmt;
+    delete cal;
+}
 
 
 
diff -Nru icu-63.1/source/test/intltest/incaltst.h icu-63.2/source/test/intltest/incaltst.h
--- icu-63.1/source/test/intltest/incaltst.h	2018-09-29 00:34:42.000000000 +0000
+++ icu-63.2/source/test/intltest/incaltst.h	2019-04-11 22:38:30.000000000 +0000
@@ -34,6 +34,7 @@
     void TestJapanese(void);
     void TestJapaneseFormat(void);
     void TestJapanese3860(void);
+    void TestJapaneseHeiseiToReiwa(void);
     
     void TestPersian(void);
     void TestPersianFormat(void);
diff -Nru icu-63.1/source/test/intltest/numbertest.h icu-63.2/source/test/intltest/numbertest.h
--- icu-63.1/source/test/intltest/numbertest.h	2018-10-01 22:39:56.000000000 +0000
+++ icu-63.2/source/test/intltest/numbertest.h	2019-04-11 22:38:30.000000000 +0000
@@ -10,6 +10,7 @@
 #include "intltest.h"
 #include "number_affixutils.h"
 #include "numparse_stringsegment.h"
+#include "numrange_impl.h"
 #include "unicode/locid.h"
 #include "unicode/numberformatter.h"
 #include "unicode/numberrangeformatter.h"
diff -Nru icu-63.1/source/test/intltest/numfmtst.cpp icu-63.2/source/test/intltest/numfmtst.cpp
--- icu-63.1/source/test/intltest/numfmtst.cpp	2018-10-01 22:39:56.000000000 +0000
+++ icu-63.2/source/test/intltest/numfmtst.cpp	2019-04-11 22:38:30.000000000 +0000
@@ -9226,6 +9226,14 @@
     assertEquals(u"Should not overflow and should parse only the first exponent",
                  u"1E-2147483647",
                  {sp.data(), sp.length(), US_INV});
+
+    // Test edge case overflow of exponent
+    result = Formattable();
+    nf->parse(u".0003e-2147483644", result, status);
+    sp = result.getDecimalNumber(status);
+    assertEquals(u"Should not overflow",
+                 u"3E-2147483648",
+                 {sp.data(), sp.length(), US_INV});
 }
 
 void NumberFormatTest::Test13840_ParseLongStringCrash() {
diff -Nru icu-63.1/source/test/testdata/format.txt icu-63.2/source/test/testdata/format.txt
--- icu-63.1/source/test/testdata/format.txt	2018-10-01 22:39:56.000000000 +0000
+++ icu-63.2/source/test/testdata/format.txt	2019-04-11 22:38:30.000000000 +0000
@@ -488,42 +488,44 @@
                     "AD 02008"
                },
 
-				// Japanese
-               {
-                    "en_US@calendar=japanese",         
-                    "",
-                    "PATTERN=G y",
-                    "YEAR=8",
-                    "Heisei 8"
-               },
-               {
-                    "en_US@calendar=japanese",         
-                    "",
-                    "PATTERN=G yy",
-                    "YEAR=8",
-                    "Heisei 08"
-               },
-               {
-                    "en_US@calendar=japanese",         
-                    "",
-                    "PATTERN=G yyy",
-                    "YEAR=8",
-                    "Heisei 008"
-               },
-               {
-                    "en_US@calendar=japanese",         
-                    "",
-                    "PATTERN=G yyyy",
-                    "YEAR=8",
-                    "Heisei 0008"
-               },
-               {
-                    "en_US@calendar=japanese",         
-                    "",
-                    "PATTERN=G yyyyy",
-                    "YEAR=8",
-                    "Heisei 00008"
-               },
+// The following test case is commented out as the current era
+// depends on the current time when the test is run.
+//				// Japanese
+//               {
+//                    "en_US@calendar=japanese",         
+//                    "",
+//                    "PATTERN=G y",
+//                    "YEAR=8",
+//                    "Reiwa 8"
+//               },
+//               {
+//                    "en_US@calendar=japanese",         
+//                    "",
+//                    "PATTERN=G yy",
+//                    "YEAR=8",
+//                    "Reiwa 08"
+//               },
+//               {
+//                    "en_US@calendar=japanese",         
+//                    "",
+//                    "PATTERN=G yyy",
+//                    "YEAR=8",
+//                    "Reiwa 008"
+//               },
+//               {
+//                    "en_US@calendar=japanese",         
+//                    "",
+//                    "PATTERN=G yyyy",
+//                    "YEAR=8",
+//                    "Reiwa 0008"
+//               },
+//               {
+//                    "en_US@calendar=japanese",         
+//                    "",
+//                    "PATTERN=G yyyyy",
+//                    "YEAR=8",
+//                    "Reiwa 00008"
+//               },
 
             }
         }
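
For readers skimming the diff: the fmtable.cpp and number_decimalquantity.cpp hunks both guard against the same edge case, an exponent of INT32_MIN, whose absolute value (+2147483648) does not fit in int32_t. Below is a minimal standalone sketch of that guard. The helper names magnitudeIsSmall and formatExponent are hypothetical and not ICU API, and std::string stands in for UnicodeString. The diff does not show what the original does for non-negative exponents, so the sketch simply emits no sign there.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper, not ICU API: mirrors the condition added in fmtable.cpp.
// std::abs(INT32_MIN) is undefined behavior because the result is not
// representable, so INT32_MIN is rejected before std::abs is evaluated.
bool magnitudeIsSmall(int32_t magnitude) {
    return magnitude != INT32_MIN && std::abs(magnitude) < 5;
}

// Hypothetical helper, not ICU API: mirrors the branch added in
// number_decimalquantity.cpp. Negating INT32_MIN would overflow, so that
// one value is written out as a literal instead.
std::string formatExponent(int32_t scale) {
    std::string result = "E";
    if (scale == INT32_MIN) {
        result += "-2147483648";
        return result;
    }
    if (scale < 0) {
        scale *= -1;   // safe: scale is greater than INT32_MIN here
        result += '-';
    }
    // Assumption: no explicit sign for non-negative exponents in this sketch.
    result += std::to_string(scale);
    return result;
}
```

With these helpers, formatExponent(INT32_MIN) yields "E-2147483648", which is the exponent part of the "3E-2147483648" string expected by the new case in numfmtst.cpp, and magnitudeIsSmall(INT32_MIN) is false, so such a value takes the scientific-notation path.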
