issues: 1858211666

id: 1858211666
node_id: I_kwDOAMm_X85uwg9S
number: 8092
title: xarray/tests/test_dask.py::TestToDaskDataFrame::test_to_dask_dataframe_* test failures when dask+pyarrow are installed
user: 110765
state: open
locked: 0
assignee: (empty)
milestone: (empty)
comments: 0
created_at: 2023-08-20T17:45:57Z
updated_at: 2023-08-20T17:45:57Z
closed_at: (empty)
author_association: CONTRIBUTOR
active_lock_reason: (empty)
draft: (empty)
pull_request: (empty)

What happened?

When running the test suite in an environment where both dask and pyarrow are installed, two tests fail (log below):

FAILED xarray/tests/test_dask.py::TestToDaskDataFrame::test_to_dask_dataframe_2D - AssertionError: Attributes of DataFrame.iloc[:, 1] (column name="y") are different
FAILED xarray/tests/test_dask.py::TestToDaskDataFrame::test_to_dask_dataframe_not_daskarray - AssertionError: DataFrame.index are different

What did you expect to happen?

Tests passing ;-).

Minimal Complete Verifiable Example

    pip install . pytest 'dask[complete]'
    python -m pytest xarray/tests/test_dask.py

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
______ TestToDaskDataFrame.test_to_dask_dataframe_2D ______

self = <xarray.tests.test_dask.TestToDaskDataFrame object at 0x7f59a87bce50>

def test_to_dask_dataframe_2D(self):
    # Test if 2-D dataset is supplied
    w = np.random.randn(2, 3)
    ds = Dataset({"w": (("x", "y"), da.from_array(w, chunks=(1, 2)))})
    ds["x"] = ("x", np.array([0, 1], np.int64))
    ds["y"] = ("y", list("abc"))

    # dask dataframes do not (yet) support multiindex,
    # but when it does, this would be the expected index:
    exp_index = pd.MultiIndex.from_arrays(
        [[0, 0, 0, 1, 1, 1], ["a", "b", "c", "a", "b", "c"]], names=["x", "y"]
    )
    expected = pd.DataFrame({"w": w.reshape(-1)}, index=exp_index)
    # so for now, reset the index
    expected = expected.reset_index(drop=False)
    actual = ds.to_dask_dataframe(set_index=False)

    assert isinstance(actual, dd.DataFrame)
>   assert_frame_equal(expected, actual.compute())

E   AssertionError: Attributes of DataFrame.iloc[:, 1] (column name="y") are different
E
E   Attribute "dtype" are different
E   [left]:  object
E   [right]: string[pyarrow]

/tmp/xarray/xarray/tests/test_dask.py:822: AssertionError
_____ TestToDaskDataFrame.test_to_dask_dataframe_not_daskarray _______

self = <xarray.tests.test_dask.TestToDaskDataFrame object at 0x7f59a87d5750>

def test_to_dask_dataframe_not_daskarray(self):
    # Test if DataArray is not a dask array
    x = np.random.randn(10)
    y = np.arange(10, dtype="uint8")
    t = list("abcdefghij")

    ds = Dataset({"a": ("t", x), "b": ("t", y), "t": ("t", t)})

    expected = pd.DataFrame({"a": x, "b": y}, index=pd.Index(t, name="t"))

    actual = ds.to_dask_dataframe(set_index=True)
    assert isinstance(actual, dd.DataFrame)
>   assert_frame_equal(expected, actual.compute())

/tmp/xarray/xarray/tests/test_dask.py:867:


left = Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'], dtype='object', name='t')
right = Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'], dtype='string', name='t'), obj = 'DataFrame.index'

def _check_types(left, right, obj: str = "Index") -> None:
    if not exact:
        return

    assert_class_equal(left, right, exact=exact, obj=obj)
    assert_attr_equal("inferred_type", left, right, obj=obj)

    # Skip exact dtype checking when `check_categorical` is False
    if is_categorical_dtype(left.dtype) and is_categorical_dtype(right.dtype):
        if check_categorical:
            assert_attr_equal("dtype", left, right, obj=obj)
            assert_index_equal(left.categories, right.categories, exact=exact)
        return
>   assert_attr_equal("dtype", left, right, obj=obj)

E   AssertionError: DataFrame.index are different
E
E   Attribute "dtype" are different
E   [left]:  object
E   [right]: string[pyarrow]

/tmp/xarray/.venv/lib/python3.11/site-packages/pandas/_testing/asserters.py:250: AssertionError
```
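
Both failures come down to the same dtype mismatch: the expected frame holds strings as `object`, while the computed dask frame holds them as `string[pyarrow]`. This looks consistent with dask's `dataframe.convert-string` setting, although the log does not confirm that this is the cause. The sketch below reproduces the mismatch with pandas alone and shows one possible way a test could normalize dtypes before comparing. It uses the generic pandas `"string"` dtype as a stand-in for the pyarrow-backed one, and `strings_to_object` is a hypothetical helper, not part of xarray, dask, or pandas.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def strings_to_object(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast string-extension columns and index to object."""
    string_cols = {
        name: object
        for name, dtype in df.dtypes.items()
        if isinstance(dtype, pd.StringDtype)
    }
    out = df.astype(string_cols)  # returns a new frame, input is untouched
    if isinstance(out.index.dtype, pd.StringDtype):
        out.index = out.index.astype(object)
    return out


# Expected frame in the style of the tests: string data stored as object dtype.
expected = pd.DataFrame(
    {"y": pd.Series(list("abc"), dtype=object), "w": [0.1, 0.2, 0.3]}
)
expected.index = pd.Index(list("xyz"), dtype=object, name="t")

# Stand-in for the computed dask result: same values, string extension dtype.
# (Assumption: the storage backend does not matter for this illustration.)
actual = expected.astype({"y": "string"})
actual.index = actual.index.astype("string")

# The strict comparison fails on dtype alone, as in the log above.
try:
    assert_frame_equal(expected, actual)
    mismatch = None
except AssertionError as err:
    mismatch = str(err)

# After normalizing dtypes, the frames compare equal.
normalized = strings_to_object(actual)
assert_frame_equal(expected, normalized)
```

Whether the tests should normalize the result, build the expected frame with the string dtype, or turn the conversion off in dask is a design decision for the maintainers; this only illustrates that the values agree and only the dtypes differ.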

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS
------------------
commit: 83c2919b27b4b2d8a01bfa380226134c71321aa0
python: 3.11.4 (main, Jun 8 2023, 06:01:19) [GCC 13.1.1 20230527]
python-bits: 64
OS: Linux
OS-release: 6.4.7-gentoo-dist
machine: x86_64
processor: AMD Ryzen 5 3600 6-Core Processor
byteorder: little
LC_ALL: None
LANG: pl_PL.UTF-8
LOCALE: ('pl_PL', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2023.8.0
pandas: 2.0.3
numpy: 1.25.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.8.1
distributed: 2023.8.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.1.2
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: None
IPython: None
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8092/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: 13221727
type: issue
