Source: scipy, statsmodels
Control: found -1 scipy/1.7.1-1
Control: found -1 statsmodels/0.12.2-1
Severity: serious
Tags: sid bookworm
X-Debbugs-CC: debian-ci@lists.debian.org
User: debian-ci@lists.debian.org
Usertags: breaks needs-update

Dear maintainer(s),

With a recent upload of scipy the autopkgtest of statsmodels fails in testing
when that autopkgtest is run with the binary packages of scipy from unstable.
It passes when run with only packages from testing. In tabular form:

                       pass            fail
scipy                  from testing    1.7.1-1
statsmodels            from testing    0.12.2-1
all others             from testing    from testing

I copied some of the output at the bottom of this report. Currently this
regression is blocking the migration of scipy to testing [1]. Due to the
nature of this issue, I filed this bug report against both packages. Can you
please investigate the situation and reassign the bug to the right package?

More information about this bug and the reason for filing it can be found on
https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation

Paul

[1] https://qa.debian.org/excuses.php?package=scipy

https://ci.debian.net/data/autopkgtest/testing/amd64/s/statsmodels/14751207/log.gz

=================================== FAILURES ===================================
______________ TestZeroInflatedPoisson_predict.test_predict_prob _______________

self = <statsmodels.discrete.tests.test_count_model.TestZeroInflatedPoisson_predict object at 0x7f5324966a60>

    def test_predict_prob(self):
        res = self.res

>       pr = res.predict(which='prob')

/usr/lib/python3/dist-packages/statsmodels/discrete/tests/test_count_model.py:267:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python3/dist-packages/statsmodels/base/model.py:1099: in predict
    predict_results = self.model.predict(self.params, exog, *args,
/usr/lib/python3/dist-packages/statsmodels/discrete/count_model.py:451: in predict
    return self._predict_prob(params, exog, exog_infl, exposure, offset)
/usr/lib/python3/dist-packages/statsmodels/discrete/count_model.py:535: in _predict_prob
    result = self.distribution.pmf(counts, mu, w)
/usr/lib/python3/dist-packages/scipy/stats/_distn_infrastructure.py:3150: in pmf
    place(output, cond, np.clip(self._pmf(*goodargs), 0, 1))
/usr/lib/python3/dist-packages/statsmodels/distributions/discrete.py:40: in _pmf
    return np.exp(self._logpmf(x, mu, w))
/usr/lib/python3/dist-packages/statsmodels/distributions/discrete.py:34: in _logpmf
    return _lazywhere(x != 0, (x, mu, w),
/usr/lib/python3/dist-packages/statsmodels/compat/scipy.py:97: in _lazywhere
    np.place(out, cond, f(*temp))
<__array_function__ internals>:5: in place
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

arr = array([[-1.73119153, -1.73119153, -1.73119153, ..., -1.73119153,
        -1.73119153, -1.73119153],
       [-1.7311915...895, -1.28321895],
       [-1.28321895, -1.28321895, -1.28321895, ..., -1.28321895,
        -1.28321895, -1.28321895]])
mask = array([[False,  True,  True,  True,  True,  True,  True,  True,  True,
         True]])
vals = array([-1.37428414, -1.36072044, -1.75262184, -2.43220532, -3.33493235,
       -4.41998093, -5.6591802 , -7.03191085, -8.52242455])

    @array_function_dispatch(_place_dispatcher)
    def place(arr, mask, vals):
        """
        Change elements of an array based on conditional and input values.

        Similar to ``np.copyto(arr, vals, where=mask)``, the difference is that
        `place` uses the first N elements of `vals`, where N is the number of
        True values in `mask`, while `copyto` uses the elements where `mask`
        is True.
Note that `extract` does the exact opposite of `place`. Parameters ---------- arr : ndarray Array to put data into. mask : array_like Boolean mask array. Must have the same size as `a`. vals : 1-D sequence Values to put into `a`. Only the first N elements are used, where N is the number of True values in `mask`. If `vals` is smaller than N, it will be repeated, and if elements of `a` are to be masked, this sequence must be non-empty. See Also -------- copyto, put, take, extract Examples -------- >>> arr = np.arange(6).reshape(2, 3) >>> np.place(arr, arr>2, [44, 55]) >>> arr array([[ 0, 1, 2], [44, 55, 44]]) """ if not isinstance(arr, np.ndarray): raise TypeError("argument 1 must be numpy.ndarray, " "not {name}".format(name=type(arr).__name__)) > return _insert(arr, mask, vals) E ValueError: place: mask and data must be the same size /usr/lib/python3/dist-packages/numpy/lib/function_base.py:1742: ValueError _________ TestZeroInflatedGeneralizedPoisson_predict.test_predict_prob _________ self = <statsmodels.discrete.tests.test_count_model.TestZeroInflatedGeneralizedPoisson_predict object at 0x7f5324d31340> def test_predict_prob(self): res = self.res > pr = res.predict(which='prob') /usr/lib/python3/dist-packages/statsmodels/discrete/tests/test_count_model.py:397: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ /usr/lib/python3/dist-packages/statsmodels/base/model.py:1099: in predict predict_results = self.model.predict(self.params, exog, *args, /usr/lib/python3/dist-packages/statsmodels/discrete/count_model.py:451: in predict return self._predict_prob(params, exog, exog_infl, exposure, offset) /usr/lib/python3/dist-packages/statsmodels/discrete/count_model.py:610: in _predict_prob result = self.distribution.pmf(counts, mu, params_main[-1], p, w) /usr/lib/python3/dist-packages/scipy/stats/_distn_infrastructure.py:3150: in pmf place(output, cond, np.clip(self._pmf(*goodargs), 0, 1)) /usr/lib/python3/dist-packages/statsmodels/distributions/discrete.py:83: in _pmf return np.exp(self._logpmf(x, mu, alpha, p, w)) /usr/lib/python3/dist-packages/statsmodels/distributions/discrete.py:76: in _logpmf return _lazywhere(x != 0, (x, mu, alpha, p, w), /usr/lib/python3/dist-packages/statsmodels/compat/scipy.py:97: in _lazywhere np.place(out, cond, f(*temp)) <__array_function__ internals>:5: in place ??? _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ arr = array([[-0.45911613, -0.48614313, -0.5506312 , ..., -0.71569584, -0.71569958, -0.71570203], [-0.4591161...405, -0.715705 ], [-0.3476652 , -0.48199281, -0.58039586, ..., -0.71570259, -0.71570405, -0.715705 ]]) mask = array([[False, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]]) vals = array([ -2.07030493, -2.43338553, -2.86597292, -3.32288546, -3.78777419, -4.2538559 , -4.71814042, -5.17...55133, -10.45451419, -10.87752249, -11.29869988, -11.71815969, -12.13600587, -12.55233382, -12.96723119]) @array_function_dispatch(_place_dispatcher) def place(arr, mask, vals): """ Change elements of an array based on conditional and input values. Similar to ``np.copyto(arr, vals, where=mask)``, the difference is that `place` uses the first N elements of `vals`, where N is the number of True values in `mask`, while `copyto` uses the elements where `mask` is True. Note that `extract` does the exact opposite of `place`. Parameters ---------- arr : ndarray Array to put data into. 
mask : array_like Boolean mask array. Must have the same size as `a`. vals : 1-D sequence Values to put into `a`. Only the first N elements are used, where N is the number of True values in `mask`. If `vals` is smaller than N, it will be repeated, and if elements of `a` are to be masked, this sequence must be non-empty. See Also -------- copyto, put, take, extract Examples -------- >>> arr = np.arange(6).reshape(2, 3) >>> np.place(arr, arr>2, [44, 55]) >>> arr array([[ 0, 1, 2], [44, 55, 44]]) """ if not isinstance(arr, np.ndarray): raise TypeError("argument 1 must be numpy.ndarray, " "not {name}".format(name=type(arr).__name__)) > return _insert(arr, mask, vals) E ValueError: place: mask and data must be the same size /usr/lib/python3/dist-packages/numpy/lib/function_base.py:1742: ValueError _________ TestZeroInflatedNegativeBinomialP_predict.test_predict_prob __________ self = <statsmodels.discrete.tests.test_count_model.TestZeroInflatedNegativeBinomialP_predict object at 0x7f5324466c10> def test_predict_prob(self): res = self.res endog = res.model.endog > pr = res.predict(which='prob') /usr/lib/python3/dist-packages/statsmodels/discrete/tests/test_count_model.py:542: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ /usr/lib/python3/dist-packages/statsmodels/base/model.py:1099: in predict predict_results = self.model.predict(self.params, exog, *args, /usr/lib/python3/dist-packages/statsmodels/discrete/count_model.py:451: in predict return self._predict_prob(params, exog, exog_infl, exposure, offset) /usr/lib/python3/dist-packages/statsmodels/discrete/count_model.py:689: in _predict_prob result = self.distribution.pmf(counts, mu, params_main[-1], p, w) /usr/lib/python3/dist-packages/scipy/stats/_distn_infrastructure.py:3150: in pmf place(output, cond, np.clip(self._pmf(*goodargs), 0, 1)) /usr/lib/python3/dist-packages/statsmodels/distributions/discrete.py:115: in _pmf return np.exp(self._logpmf(x, mu, alpha, p, w)) /usr/lib/python3/dist-packages/statsmodels/distributions/discrete.py:108: in _logpmf return _lazywhere(x != 0, (x, s, p, w), /usr/lib/python3/dist-packages/statsmodels/compat/scipy.py:97: in _lazywhere np.place(out, cond, f(*temp)) <__array_function__ internals>:5: in place ??? _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ arr = array([[-1.16651535, -1.09866187, -1.17857054, ..., -1.85937182, -1.85937187, -1.8593719 ], [-1.1665153...622, -1.85793214], [-1.64298738, -1.53672373, -1.48753839, ..., -1.8571836 , -1.85759622, -1.85793214]]) mask = array([False, True, True, True, True, True, True, True, True, True, True, True, True, True, True,... True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]) vals = array([ -1.72852346, -1.88421783, -2.15741638, -2.49498617, -2.87326646, -3.27963799, -3.70656908, -4.14...4252 , -15.76582549, -16.29519967, -16.82548955, -17.35664206, -17.88860857, -18.42134447, -18.95480873]) @array_function_dispatch(_place_dispatcher) def place(arr, mask, vals): """ Change elements of an array based on conditional and input values. Similar to ``np.copyto(arr, vals, where=mask)``, the difference is that `place` uses the first N elements of `vals`, where N is the number of True values in `mask`, while `copyto` uses the elements where `mask` is True. Note that `extract` does the exact opposite of `place`. Parameters ---------- arr : ndarray Array to put data into. mask : array_like Boolean mask array. Must have the same size as `a`. 
vals : 1-D sequence Values to put into `a`. Only the first N elements are used, where N is the number of True values in `mask`. If `vals` is smaller than N, it will be repeated, and if elements of `a` are to be masked, this sequence must be non-empty. See Also -------- copyto, put, take, extract Examples -------- >>> arr = np.arange(6).reshape(2, 3) >>> np.place(arr, arr>2, [44, 55]) >>> arr array([[ 0, 1, 2], [44, 55, 44]]) """ if not isinstance(arr, np.ndarray): raise TypeError("argument 1 must be numpy.ndarray, " "not {name}".format(name=type(arr).__name__)) > return _insert(arr, mask, vals) E ValueError: place: mask and data must be the same size /usr/lib/python3/dist-packages/numpy/lib/function_base.py:1742: ValueError ______ TestZeroInflatedNegativeBinomialP_predict.test_predict_generic_zi _______ self = <statsmodels.discrete.tests.test_count_model.TestZeroInflatedNegativeBinomialP_predict object at 0x7f5324e73fa0> def test_predict_generic_zi(self): # These tests do not use numbers from other packages. # Tests are on closeness of estimated to true/DGP values # and theoretical relationship between quantities res = self.res endog = self.endog exog = self.res.model.exog prob_infl = self.prob_infl nobs = len(endog) freq = np.bincount(endog.astype(int)) / len(endog) > probs = res.predict(which='prob') /usr/lib/python3/dist-packages/statsmodels/discrete/tests/test_count_model.py:563: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ /usr/lib/python3/dist-packages/statsmodels/base/model.py:1099: in predict predict_results = self.model.predict(self.params, exog, *args, /usr/lib/python3/dist-packages/statsmodels/discrete/count_model.py:451: in predict return self._predict_prob(params, exog, exog_infl, exposure, offset) /usr/lib/python3/dist-packages/statsmodels/discrete/count_model.py:689: in _predict_prob result = self.distribution.pmf(counts, mu, params_main[-1], p, w) /usr/lib/python3/dist-packages/scipy/stats/_distn_infrastructure.py:3150: in pmf place(output, cond, np.clip(self._pmf(*goodargs), 0, 1)) /usr/lib/python3/dist-packages/statsmodels/distributions/discrete.py:115: in _pmf return np.exp(self._logpmf(x, mu, alpha, p, w)) /usr/lib/python3/dist-packages/statsmodels/distributions/discrete.py:108: in _logpmf return _lazywhere(x != 0, (x, s, p, w), /usr/lib/python3/dist-packages/statsmodels/compat/scipy.py:97: in _lazywhere np.place(out, cond, f(*temp)) <__array_function__ internals>:5: in place ??? _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ arr = array([[-1.16651535, -1.09866187, -1.17857054, ..., -1.85937182, -1.85937187, -1.8593719 ], [-1.1665153...622, -1.85793214], [-1.64298738, -1.53672373, -1.48753839, ..., -1.8571836 , -1.85759622, -1.85793214]]) mask = array([False, True, True, True, True, True, True, True, True, True, True, True, True, True, True,... True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]) vals = array([ -1.72852346, -1.88421783, -2.15741638, -2.49498617, -2.87326646, -3.27963799, -3.70656908, -4.14...4252 , -15.76582549, -16.29519967, -16.82548955, -17.35664206, -17.88860857, -18.42134447, -18.95480873]) @array_function_dispatch(_place_dispatcher) def place(arr, mask, vals): """ Change elements of an array based on conditional and input values. 
Similar to ``np.copyto(arr, vals, where=mask)``, the difference is that `place` uses the first N elements of `vals`, where N is the number of True values in `mask`, while `copyto` uses the elements where `mask` is True. Note that `extract` does the exact opposite of `place`. Parameters ---------- arr : ndarray Array to put data into. mask : array_like Boolean mask array. Must have the same size as `a`. vals : 1-D sequence Values to put into `a`. Only the first N elements are used, where N is the number of True values in `mask`. If `vals` is smaller than N, it will be repeated, and if elements of `a` are to be masked, this sequence must be non-empty. See Also -------- copyto, put, take, extract Examples -------- >>> arr = np.arange(6).reshape(2, 3) >>> np.place(arr, arr>2, [44, 55]) >>> arr array([[ 0, 1, 2], [44, 55, 44]]) """ if not isinstance(arr, np.ndarray): raise TypeError("argument 1 must be numpy.ndarray, " "not {name}".format(name=type(arr).__name__)) > return _insert(arr, mask, vals) E ValueError: place: mask and data must be the same size /usr/lib/python3/dist-packages/numpy/lib/function_base.py:1742: ValueError _____________________________ test_extension_types _____________________________ df = a b c d 0 1.764052 0 NaN <NA> 1 0.400157 1 1.0 1 2 0.978738 2 NaN <NA> 3 2.... NaN <NA> 97 1.785870 7 97.0 97 98 0.126912 8 NaN <NA> 99 0.401989 9 99.0 99 [100 rows x 4 columns] @pytest.mark.skipif(not hasattr(pd, "NA"), reason="Must support NA") def test_extension_types(df): df["c"] = pd.Series(np.arange(100.0)) df["d"] = pd.Series(np.arange(100), dtype=pd.Int64Dtype()) df.loc[df.index[::2], "c"] = np.nan df.loc[df.index[::2], "d"] = pd.NA res = Description(df) > np.testing.assert_allclose(res.frame.c, res.frame.d) /usr/lib/python3/dist-packages/statsmodels/stats/tests/test_descriptivestats.py:212: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ pandas/_libs/properties.pyx:33: in pandas._libs.properties.CachedProperty.__get__ ??? /usr/lib/python3/dist-packages/statsmodels/stats/descriptivestats.py:384: in frame numeric = self.numeric pandas/_libs/properties.pyx:33: in pandas._libs.properties.CachedProperty.__get__ ??? /usr/lib/python3/dist-packages/statsmodels/stats/descriptivestats.py:449: in numeric jb = df.apply( /usr/lib/python3/dist-packages/pandas/core/frame.py:7552: in apply return op.get_result() /usr/lib/python3/dist-packages/pandas/core/apply.py:185: in get_result return self.apply_standard() /usr/lib/python3/dist-packages/pandas/core/apply.py:276: in apply_standard results, res_index = self.apply_series_generator() /usr/lib/python3/dist-packages/pandas/core/apply.py:305: in apply_series_generator results[i] = self.f(v) /usr/lib/python3/dist-packages/statsmodels/stats/descriptivestats.py:450: in <lambda> lambda x: list(jarque_bera(x.dropna())), result_type="expand" /usr/lib/python3/dist-packages/statsmodels/stats/stattools.py:123: in jarque_bera skew = stats.skew(resids, axis=axis) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99], dtype=object) axis = 0, bias = True, nan_policy = 'propagate' def skew(a, axis=0, bias=True, nan_policy='propagate'): r"""Compute the sample skewness of a data set. For normally distributed data, the skewness should be about zero. 
For unimodal continuous distributions, a skewness value greater than zero means that there is more weight in the right tail of the distribution. The function `skewtest` can be used to determine if the skewness value is close enough to zero, statistically speaking. Parameters ---------- a : ndarray Input array. axis : int or None, optional Axis along which skewness is calculated. Default is 0. If None, compute over the whole array `a`. bias : bool, optional If False, then the calculations are corrected for statistical bias. nan_policy : {'propagate', 'raise', 'omit'}, optional Defines how to handle when input contains nan. The following options are available (default is 'propagate'): * 'propagate': returns nan * 'raise': throws an error * 'omit': performs the calculations ignoring nan values Returns ------- skewness : ndarray The skewness of values along an axis, returning 0 where all values are equal. Notes ----- The sample skewness is computed as the Fisher-Pearson coefficient of skewness, i.e. .. math:: g_1=\frac{m_3}{m_2^{3/2}} where .. math:: m_i=\frac{1}{N}\sum_{n=1}^N(x[n]-\bar{x})^i is the biased sample :math:`i\texttt{th}` central moment, and :math:`\bar{x}` is the sample mean. If ``bias`` is False, the calculations are corrected for bias and the value computed is the adjusted Fisher-Pearson standardized moment coefficient, i.e. .. math:: G_1=\frac{k_3}{k_2^{3/2}}= \frac{\sqrt{N(N-1)}}{N-2}\frac{m_3}{m_2^{3/2}}. References ---------- .. [1] Zwillinger, D. and Kokoska, S. (2000). CRC Standard Probability and Statistics Tables and Formulae. Chapman & Hall: New York. 2000. Section 2.2.24.1 Examples -------- >>> from scipy.stats import skew >>> skew([1, 2, 3, 4, 5]) 0.0 >>> skew([2, 8, 0, 4, 1, 9, 9, 0]) 0.2650554122698573 """ a, axis = _chk_asarray(a, axis) n = a.shape[axis] contains_nan, nan_policy = _contains_nan(a, nan_policy) if contains_nan and nan_policy == 'omit': a = ma.masked_invalid(a) return mstats_basic.skew(a, axis, bias) mean = a.mean(axis, keepdims=True) m2 = _moment(a, 2, axis, mean=mean) m3 = _moment(a, 3, axis, mean=mean) with np.errstate(all='ignore'): > zero = (m2 <= (np.finfo(m2.dtype).resolution * mean.squeeze(axis))**2) E AttributeError: 'float' object has no attribute 'dtype' /usr/lib/python3/dist-packages/scipy/stats/stats.py:1111: AttributeError _________ TestDistDependenceMeasures.test_results_on_the_iris_dataset __________ self = <statsmodels.stats.tests.test_dist_dependant_measures.TestDistDependenceMeasures object at 0x7f5317f90cd0> def test_results_on_the_iris_dataset(self): """ R code example from the `energy` package documentation for `energy::distance_covariance.test`: > x <- iris[1:50, 1:4] > y <- iris[51:100, 1:4] > set.seed(1) > dcov.test(x, y, R=200) dCov independence test (permutation test) data: index 1, replicates 200 nV^2 = 0.5254, p-value = 0.9552 sample estimates: dCov 0.1025087 """ try: iris = get_rdataset("iris", cache=True).data.values[:, :4] except IGNORED_EXCEPTIONS: pytest.skip('Failed with HTTPError or URLError, these are random') x = iris[:50] y = iris[50:100] > stats = ddm.distance_statistics(x, y) /usr/lib/python3/dist-packages/statsmodels/stats/tests/test_dist_dependant_measures.py:147: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ /usr/lib/python3/dist-packages/statsmodels/stats/dist_dependence_measures.py:355: in distance_statistics a = x_dist if x_dist is not None else squareform(pdist(x, "euclidean")) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ X = 
array([[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2], [4.6, 3.1, 1.5, 0.2], ..., 3.8, 1.6, 0.2], [4.6, 3.2, 1.4, 0.2], [5.3, 3.7, 1.5, 0.2], [5.0, 3.3, 1.4, 0.2]], dtype=object) metric = 'euclidean', out = None, kwargs = {}, s = (50, 4), m = 50, n = 4 mstr = 'euclidean' metric_info = MetricInfo(canonical_name='euclidean', aka={'eu', 'euclidean', 'euclid', 'e'}, dist_func=<function euclidean at 0x7f53...pdist_euclidean of PyCapsule object at 0x7f53360b7390>, validator=None, types=['double'], requires_contiguous_out=True) def pdist(X, metric='euclidean', *, out=None, **kwargs): """ Pairwise distances between observations in n-dimensional space. See Notes for common calling conventions. Parameters ---------- X : array_like An m by n array of m original observations in an n-dimensional space. metric : str or function, optional The distance metric to use. The distance function can be 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'. **kwargs : dict, optional Extra arguments to `metric`: refer to each metric documentation for a list of all possible arguments. Some possible arguments: p : scalar The p-norm to apply for Minkowski, weighted and unweighted. Default: 2. w : ndarray The weight vector for metrics that support weights (e.g., Minkowski). V : ndarray The variance vector for standardized Euclidean. Default: var(X, axis=0, ddof=1) VI : ndarray The inverse of the covariance matrix for Mahalanobis. Default: inv(cov(X.T)).T out : ndarray. The output array If not None, condensed distance matrix Y is stored in this array. Returns ------- Y : ndarray Returns a condensed distance matrix Y. For each :math:`i` and :math:`j` (where :math:`i<j<m`),where m is the number of original observations. The metric ``dist(u=X[i], v=X[j])`` is computed and stored in entry ``m * i + j - ((i + 2) * (i + 1)) // 2``. See Also -------- squareform : converts between condensed distance matrices and square distance matrices. Notes ----- See ``squareform`` for information on how to calculate the index of this entry or to convert the condensed distance matrix to a redundant square matrix. The following are common calling conventions. 1. ``Y = pdist(X, 'euclidean')`` Computes the distance between m points using Euclidean distance (2-norm) as the distance metric between the points. The points are arranged as m n-dimensional row vectors in the matrix X. 2. ``Y = pdist(X, 'minkowski', p=2.)`` Computes the distances using the Minkowski distance :math:`||u-v||_p` (p-norm) where :math:`p \\geq 1`. 3. ``Y = pdist(X, 'cityblock')`` Computes the city block or Manhattan distance between the points. 4. ``Y = pdist(X, 'seuclidean', V=None)`` Computes the standardized Euclidean distance. The standardized Euclidean distance between two n-vectors ``u`` and ``v`` is .. math:: \\sqrt{\\sum {(u_i-v_i)^2 / V[x_i]}} V is the variance vector; V[i] is the variance computed over all the i'th components of the points. If not passed, it is automatically computed. 5. ``Y = pdist(X, 'sqeuclidean')`` Computes the squared Euclidean distance :math:`||u-v||_2^2` between the vectors. 6. ``Y = pdist(X, 'cosine')`` Computes the cosine distance between vectors u and v, .. 
math:: 1 - \\frac{u \\cdot v} {{||u||}_2 {||v||}_2} where :math:`||*||_2` is the 2-norm of its argument ``*``, and :math:`u \\cdot v` is the dot product of ``u`` and ``v``. 7. ``Y = pdist(X, 'correlation')`` Computes the correlation distance between vectors u and v. This is .. math:: 1 - \\frac{(u - \\bar{u}) \\cdot (v - \\bar{v})} {{||(u - \\bar{u})||}_2 {||(v - \\bar{v})||}_2} where :math:`\\bar{v}` is the mean of the elements of vector v, and :math:`x \\cdot y` is the dot product of :math:`x` and :math:`y`. 8. ``Y = pdist(X, 'hamming')`` Computes the normalized Hamming distance, or the proportion of those vector elements between two n-vectors ``u`` and ``v`` which disagree. To save memory, the matrix ``X`` can be of type boolean. 9. ``Y = pdist(X, 'jaccard')`` Computes the Jaccard distance between the points. Given two vectors, ``u`` and ``v``, the Jaccard distance is the proportion of those elements ``u[i]`` and ``v[i]`` that disagree. 10. ``Y = pdist(X, 'jensenshannon')`` Computes the Jensen-Shannon distance between two probability arrays. Given two probability vectors, :math:`p` and :math:`q`, the Jensen-Shannon distance is .. math:: \\sqrt{\\frac{D(p \\parallel m) + D(q \\parallel m)}{2}} where :math:`m` is the pointwise mean of :math:`p` and :math:`q` and :math:`D` is the Kullback-Leibler divergence. 11. ``Y = pdist(X, 'chebyshev')`` Computes the Chebyshev distance between the points. The Chebyshev distance between two n-vectors ``u`` and ``v`` is the maximum norm-1 distance between their respective elements. More precisely, the distance is given by .. math:: d(u,v) = \\max_i {|u_i-v_i|} 12. ``Y = pdist(X, 'canberra')`` Computes the Canberra distance between the points. The Canberra distance between two points ``u`` and ``v`` is .. math:: d(u,v) = \\sum_i \\frac{|u_i-v_i|} {|u_i|+|v_i|} 13. ``Y = pdist(X, 'braycurtis')`` Computes the Bray-Curtis distance between the points. The Bray-Curtis distance between two points ``u`` and ``v`` is .. math:: d(u,v) = \\frac{\\sum_i {|u_i-v_i|}} {\\sum_i {|u_i+v_i|}} 14. ``Y = pdist(X, 'mahalanobis', VI=None)`` Computes the Mahalanobis distance between the points. The Mahalanobis distance between two points ``u`` and ``v`` is :math:`\\sqrt{(u-v)(1/V)(u-v)^T}` where :math:`(1/V)` (the ``VI`` variable) is the inverse covariance. If ``VI`` is not None, ``VI`` will be used as the inverse covariance matrix. 15. ``Y = pdist(X, 'yule')`` Computes the Yule distance between each pair of boolean vectors. (see yule function documentation) 16. ``Y = pdist(X, 'matching')`` Synonym for 'hamming'. 17. ``Y = pdist(X, 'dice')`` Computes the Dice distance between each pair of boolean vectors. (see dice function documentation) 18. ``Y = pdist(X, 'kulsinski')`` Computes the Kulsinski distance between each pair of boolean vectors. (see kulsinski function documentation) 19. ``Y = pdist(X, 'rogerstanimoto')`` Computes the Rogers-Tanimoto distance between each pair of boolean vectors. (see rogerstanimoto function documentation) 20. ``Y = pdist(X, 'russellrao')`` Computes the Russell-Rao distance between each pair of boolean vectors. (see russellrao function documentation) 21. ``Y = pdist(X, 'sokalmichener')`` Computes the Sokal-Michener distance between each pair of boolean vectors. (see sokalmichener function documentation) 22. ``Y = pdist(X, 'sokalsneath')`` Computes the Sokal-Sneath distance between each pair of boolean vectors. (see sokalsneath function documentation) 23. 
        ``Y = pdist(X, 'wminkowski', p=2, w=w)``

           Computes the weighted Minkowski distance between each pair of
           vectors. (see wminkowski function documentation)

           'wminkowski' is deprecated and will be removed in SciPy 1.8.0.
           Use 'minkowski' instead.

        24. ``Y = pdist(X, f)``

           Computes the distance between all pairs of vectors in X
           using the user supplied 2-arity function f. For example,
           Euclidean distance between the vectors could be computed
           as follows::

             dm = pdist(X, lambda u, v: np.sqrt(((u-v)**2).sum()))

           Note that you should avoid passing a reference to one of
           the distance functions defined in this library. For example,::

             dm = pdist(X, sokalsneath)

           would calculate the pair-wise distances between the vectors in
           X using the Python function sokalsneath. This would result in
           sokalsneath being called :math:`{n \\choose 2}` times, which
           is inefficient. Instead, the optimized C version is more
           efficient, and we call it using the following syntax.::

             dm = pdist(X, 'sokalsneath')
        """
        # You can also call this as:
        #     Y = pdist(X, 'test_abc')
        # where 'abc' is the metric being tested.  This computes the distance
        # between all pairs of vectors in X using the distance metric 'abc' but
        # with a more succinct, verifiable, but less efficient implementation.
        X = _asarray_validated(X, sparse_ok=False, objects_ok=True, mask_ok=True,
                               check_finite=False)

        s = X.shape
        if len(s) != 2:
            raise ValueError('A 2-dimensional array must be passed.')

        m, n = s

        if callable(metric):
            mstr = getattr(metric, '__name__', 'UnknownCustomMetric')
            metric_info = _METRIC_ALIAS.get(mstr, None)

            if metric_info is not None:
                X, typ, kwargs = _validate_pdist_input(
                    X, m, n, metric_info, **kwargs)

            return _pdist_callable(X, metric=metric, out=out, **kwargs)
        elif isinstance(metric, str):
            mstr = metric.lower()
            metric_info = _METRIC_ALIAS.get(mstr, None)

            if metric_info is not None:
                pdist_fn = metric_info.pdist_func
>               return pdist_fn(X, out=out, **kwargs)
E               ValueError: Unsupported dtype object

/usr/lib/python3/dist-packages/scipy/spatial/distance.py:2250: ValueError
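
For the first four failures above, everything ends in numpy's np.place()
rejecting a boolean mask whose total size differs from that of the output
array; the call reaches np.place() through statsmodels' _lazywhere compat shim
(statsmodels/compat/scipy.py:97), invoked from the zero-inflated
distributions' _logpmf. I have not dug into which side produces the wrong
shape, but the numpy constraint being violated is easy to show in isolation
(the shapes below are made up for illustration, they are not the ones from the
test fixtures):

    import numpy as np

    out = np.zeros((3, 10))           # 2-D output buffer, 30 elements in total
    cond = np.ones(10, dtype=bool)    # boolean mask with a different total size
    vals = np.arange(10.0)

    # np.place requires the mask to cover the output array element for element;
    # with mismatched sizes it raises the same
    # "ValueError: place: mask and data must be the same size" seen above.
    try:
        np.place(out, cond, vals)
    except ValueError as exc:
        print(exc)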
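
The test_extension_types and TestDistDependenceMeasures failures look like a
separate problem: in both cases an object-dtype array ends up being passed to
scipy, and scipy 1.7 no longer accepts that (stats.skew trips over
np.finfo(m2.dtype) because the intermediate moment is a plain Python float,
and spatial.distance.pdist refuses object input outright). A rough, untested
sketch that should reproduce both error messages, using made-up data rather
than the actual fixtures:

    import numpy as np
    from scipy import stats
    from scipy.spatial.distance import pdist

    # Object-dtype 1-D array, roughly what .dropna() on a nullable pandas
    # Int64 column hands to jarque_bera/skew in the statsmodels test.
    a = np.arange(1, 100, 2).astype(object)
    try:
        stats.skew(a)
    except AttributeError as exc:
        print("skew:", exc)   # 'float' object has no attribute 'dtype'

    # Object-dtype 2-D array, similar to the iris values that
    # distance_statistics feeds to pdist; scipy 1.7 rejects the dtype.
    x = np.array([[5.1, 3.5], [4.9, 3.0], [4.7, 3.2]], dtype=object)
    try:
        pdist(x, "euclidean")
    except ValueError as exc:
        print("pdist:", exc)  # Unsupported dtype object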