

issues


24 rows where state = "open" and user = 5635139 sorted by updated_at descending


id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1920361792 PR_kwDOAMm_X85bl988 8258 Add a `.drop_attrs` method max-sixty 5635139 open 0     9 2023-09-30T18:42:12Z 2024-02-09T18:49:22Z   MEMBER   0 pydata/xarray/pulls/8258

Part of #3891

~Do we think this is a good idea? I'll add docs & tests if so...~

Ready to go, just needs agreement on whether it's good
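For reference, a rough sketch of the behavior as a free function (not the PR's implementation — the name and recursion depth are illustrative only):

```python
import xarray as xr

def drop_attrs(ds: xr.Dataset) -> xr.Dataset:
    # Clear the dataset-level attrs, then each variable's attrs.
    # .copy() copies the attrs dicts, so the original object is untouched.
    ds = ds.copy()
    ds.attrs = {}
    for var in ds.variables.values():
        var.attrs = {}
    return ds
```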

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8258/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1916677049 I_kwDOAMm_X85yPiu5 8245 Tools for writing distributed zarrs max-sixty 5635139 open 0     0 2023-09-28T04:25:45Z 2024-01-04T00:15:09Z   MEMBER      

What is your issue?

There seems to be a common pattern for writing zarrs from a distributed set of machines, in parallel. It's somewhat described in the prose of the io docs. Quoting:

  • Creating the template — "the first step is creating an initial Zarr store without writing all of its array data. This can be done by first creating a Dataset with dummy values stored in dask, and then calling to_zarr with compute=False to write only metadata to Zarr"
  • Writing out each region from workers — "a Zarr store with the correct variable shapes and attributes exists that can be filled out by subsequent calls to to_zarr. The region provides a mapping from dimension names to Python slice objects indicating where the data should be written (in index space, not coordinate space)"

I've been using this fairly successfully recently. It's much better than writing hundreds or thousands of data variables, since many small data variables create a huge number of files.
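For concreteness, roughly what the pattern looks like end-to-end (a sketch; the store path and sizes are invented):

```python
import dask.array
import numpy as np
import xarray as xr

# 1. Template: dummy dask-backed values; compute=False writes only metadata.
template = xr.Dataset({"foo": ("x", dask.array.zeros(100, chunks=10))})
template.to_zarr("store.zarr", compute=False)

# 2. On each worker: compute one piece and write it into its region.
piece = xr.Dataset({"foo": ("x", np.ones(10))})
piece.to_zarr("store.zarr", region={"x": slice(0, 10)})
```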

Are there some tools we can provide to make this easier? Some ideas:

  • [ ] compute=False is arguably a less-than-obvious kwarg meaning "write metadata". Maybe this should be a method, maybe it's a candidate for renaming? Or maybe make_template can be an abstraction over it. Something like xarray_beam.make_template to make the template from a Dataset?
    • Or from an array of indexes?
    • https://github.com/pydata/xarray/issues/8343
    • https://github.com/pydata/xarray/pull/8460
  • [ ] What happens if one worker's data isn't aligned on some dimensions? Will that write to the wrong location? Could we offer an option, similar to the above, to reindex on the template dimensions?

  • [ ] When writing a region, we need to drop other vars. Can we offer this as a kwarg? Occasionally I'll add a dimension with an index to a dataset, run the function to write it — and it'll fail, because I forgot to add that index to the .drop_vars call that precedes the write. When we're writing a template, all the indexes are written up front anyway. (edit: #6260)
    • https://github.com/pydata/xarray/pull/8460

More minor papercuts:

  • [ ] I've hit an issue where writing a region seemed to cause the worker to attempt to load the whole array into memory — can we offer guarantees for when (non-metadata) data will be loaded during to_zarr?
  • [ ] How about adding raise_if_dask_computes to our public API? The alternative I've been doing is watching htop and exiting if I see memory ballooning, which is less cerebral...
  • [ ] It doesn't seem easy to write coords on a DataArray. For example, writing xr.tutorial.load_dataset('air_temperature').assign_coords(lat2=da.lat + 2, a=(('lon',), ['a'] * len(da.lon))).chunk().to_zarr('foo.zarr', compute=False) will cause the non-index coords to be written as empty. But writing them separately conflicts with having a single variable. Currently I manually load each coord before writing, which is not super-friendly.

Some things that were in the list here have been completed!

  • [x] Requiring region to be specified as an int range can be inconvenient — would it be feasible to have a function that grabs the template metadata, calculates the region ints, and then calculates the implied indexes?
    • Edit: suggested at https://github.com/pydata/xarray/issues/7702

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8245/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
    xarray 13221727 issue
2052840951 I_kwDOAMm_X856W933 8566 Use `ddof=1` for `std` & `var` max-sixty 5635139 open 0     2 2023-12-21T17:47:21Z 2023-12-27T16:58:46Z   MEMBER      

What is your issue?

I've discussed this a bunch with @dcherian (though I'm not sure he necessarily agrees, I'll let him comment)

Currently xarray uses ddof=0 for std & var. This is:

  • Rarely what someone actually wants — xarray data is almost always a sample of some underlying distribution, for which ddof=1 is correct
  • Inconsistent with pandas

OTOH:

  • It is consistent with numpy
  • It wouldn't be a painless change — folks who don't read deprecation messages would see values change very slightly

Any thoughts?
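For concreteness (values rounded):

```python
import numpy as np
import pandas as pd
import xarray as xr

data = [1.0, 2.0, 3.0, 4.0]
np.std(data)              # 1.118 — numpy defaults to ddof=0
pd.Series(data).std()     # 1.291 — pandas defaults to ddof=1
xr.DataArray(data).std()  # 1.118 — xarray currently matches numpy
```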

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8566/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
988158051 MDU6SXNzdWU5ODgxNTgwNTE= 5764 Implement __sizeof__ on objects? max-sixty 5635139 open 0     6 2021-09-03T23:36:53Z 2023-12-19T18:23:08Z   MEMBER      

Is your feature request related to a problem? Please describe.

Currently ds.nbytes returns the size of the data.

But sys.getsizeof(ds) returns a very small number.

Describe the solution you'd like

If we implement __sizeof__ on DataArrays & Datasets, this would work.

I think that would be something like ds.nbytes + the size of the ds container, + maybe attrs if those aren't handled by .nbytes?
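A minimal sketch of the idea, written as a free function rather than the real method (ignoring attrs):

```python
import sys
import xarray as xr

def dataset_sizeof(ds: xr.Dataset) -> int:
    # What Dataset.__sizeof__ could return: the container's own
    # footprint plus the underlying array data.
    return object.__sizeof__(ds) + ds.nbytes

ds = xr.Dataset({"a": ("x", list(range(1000)))})
sys.getsizeof(ds)   # small — just the container
dataset_sizeof(ds)  # container + ~8000 bytes of data
```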

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5764/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  reopened xarray 13221727 issue
2000154383 PR_kwDOAMm_X85fzju6 8466 Move Sphinx directives out of `See also` max-sixty 5635139 open 0     2 2023-11-18T01:57:17Z 2023-11-21T18:25:05Z   MEMBER   0 pydata/xarray/pulls/8466

This is potentially causing the See also section to not render the links? (Does anyone know this area better? It doesn't seem easy to build the docs locally...)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8466/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1995308522 I_kwDOAMm_X8527f3q 8454 Formalize `mode` / safety guarantees for Zarr max-sixty 5635139 open 0     1 2023-11-15T18:28:38Z 2023-11-15T20:38:04Z   MEMBER      

What is your issue?

It sounds like we're coalescing on when it's safe to write concurrently:

  • mode="r+" is safe to write concurrently to different parts of a dataset
  • mode="a" isn't safe, because it changes the shape of an array, for example extending a dimension

What are the existing operations that aren't consistent with this?

  • Is concurrently writing additional variables safe? Or does it require updating the centralized consolidated metadata? Currently that requires mode="a", which is overly conservative based on the above rules — assuming it is safe, we can liberalize to allow it with mode="r+".
  • https://github.com/pydata/xarray/issues/8371, ~but that's a bug~ — edit: or possibly an artifact of writing concurrently to overlapping chunks with a single to_zarr call. We could at least restrict non-aligned writes to mode="a", so it wasn't possible to hit this mistakenly while writing to different parts of a dataset.
  • Writing the same values to the same chunks concurrently isn't safe at the moment — we'll get a "Stale file handle" error if two processes write to the same location at the same time. I'm not sure if that's possible to allow; possibly it requires work on the Zarr side. If it were possible, we wouldn't have to be as careful about ensuring that each process has mutually exclusive chunks to write. (lower priority)
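Under those rules, the safe concurrent pattern is something like this (a sketch — `part` is one worker's disjoint slice, and the path and slice bounds are invented):

```python
# mode="r+" can only modify pre-existing array values, so it can't
# change shapes, and disjoint regions can be written concurrently.
part.to_zarr("store.zarr", mode="r+", region={"time": slice(0, 10)})
```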

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8454/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1953001043 I_kwDOAMm_X850aG5T 8343 Add `metadata_only` param to `.to_zarr`? max-sixty 5635139 open 0     17 2023-10-19T20:25:11Z 2023-11-15T05:22:12Z   MEMBER      

Is your feature request related to a problem?

A leaf from https://github.com/pydata/xarray/issues/8245, which has a bullet:

compute=False is arguably a less-than-obvious kwarg meaning "write metadata". Maybe this should be a method, maybe it's a candidate for renaming? Or maybe make_template can be an abstraction over it

I've also noticed that for large arrays, running compute=False can take several minutes, despite the indexes being very small. I think this is because it's building a dask task graph — which is then discarded, since the array is written from different machines with the region pattern.

Describe the solution you'd like

Would introducing a metadata_only parameter to to_zarr help here (sketched below)?

  • Better name
  • No dask graph
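Side by side (the metadata_only kwarg is the hypothetical part):

```python
# Current: writes only metadata, but builds (then discards) a dask graph
ds.to_zarr("store.zarr", compute=False)

# Proposed: clearer name, and could skip graph construction entirely
ds.to_zarr("store.zarr", metadata_only=True)
```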

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8343/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1986643906 I_kwDOAMm_X852acfC 8437 Restrict pint test runs max-sixty 5635139 open 0     10 2023-11-10T00:50:52Z 2023-11-13T21:57:45Z   MEMBER      

What is your issue?

Pint tests are failing on main — https://github.com/pydata/xarray/actions/runs/6817674274/job/18541677930

E TypeError: no implementation found for 'numpy.min' on types that implement __array_function__: [<class 'pint.util.Quantity'>]

If we can't fix this soon, should we disable them?

CC @keewis

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8437/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
874039546 MDU6SXNzdWU4NzQwMzk1NDY= 5246 test_save_mfdataset_compute_false_roundtrip fails max-sixty 5635139 open 0     1 2021-05-02T20:41:48Z 2023-11-02T04:38:05Z   MEMBER      

What happened:

test_save_mfdataset_compute_false_roundtrip consistently fails in windows-latest-3.9, e.g. https://github.com/pydata/xarray/pull/5244/checks?check_run_id=2485202784

Here's the traceback:

```python
self = <xarray.tests.test_backends.TestDask object at 0x000001FF45A9B640>

def test_save_mfdataset_compute_false_roundtrip(self):
    from dask.delayed import Delayed

    original = Dataset({"foo": ("x", np.random.randn(10))}).chunk()
    datasets = [original.isel(x=slice(5)), original.isel(x=slice(5, 10))]
    with create_tmp_file(allow_cleanup_failure=ON_WINDOWS) as tmp1:
        with create_tmp_file(allow_cleanup_failure=ON_WINDOWS) as tmp2:
            delayed_obj = save_mfdataset(
                datasets, [tmp1, tmp2], engine=self.engine, compute=False
            )
            assert isinstance(delayed_obj, Delayed)
            delayed_obj.compute()
            with open_mfdataset(
                [tmp1, tmp2], combine="nested", concat_dim="x"
            ) as actual:
              assert_identical(actual, original)

E AssertionError: Left and right Dataset objects are not identical
E
E Differing data variables:
E L   foo      (x) float64 dask.array<chunksize=(5,), meta=np.ndarray>
E R   foo      (x) float64 dask.array<chunksize=(10,), meta=np.ndarray>
```

Anything else we need to know?:

xfailed in https://github.com/pydata/xarray/pull/5245

Environment:

[Eliding since it's the test env]

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5246/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1923431725 I_kwDOAMm_X85ypT0t 8264 Improve error messages max-sixty 5635139 open 0     4 2023-10-03T06:42:57Z 2023-10-24T18:40:04Z   MEMBER      

Is your feature request related to a problem?

Coming back to xarray, and using it based on what I remember from a year ago or so, means I make lots of mistakes. I've also been using it outside of a repl, where error messages are more important, given I can't explore a dataset inline.

Some of the error messages could be much more helpful. Take one example:

xarray.core.merge.MergeError: conflicting values for variable 'date' on objects to be combined. You can skip this check by specifying compat='override'.

The second sentence is nice. But the first could give us much more information:

  • Which variables conflict? I'm merging four objects, so it would be so helpful to know which are causing the issue.
  • What is the conflict? Is one a superset and I can join=...? Are they off by 1 or are they completely different types?
  • Our testing.assert_equal produces pretty nice errors, as a comparison.
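For reference, a minimal way to trigger that error (exact wording varies by version):

```python
import xarray as xr

a = xr.Dataset({"date": ("x", [1, 2])})
b = xr.Dataset({"date": ("x", [1, 3])})
xr.merge([a, b])  # MergeError: conflicting values for variable 'date' ...
```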

Having good error messages is really useful: it lets folks stay in the flow while they're working, and it signals that we're a well-built, refined library.

Describe the solution you'd like

I'm not sure the best way to surface the issues — error messages make for less legible contributions than features or bug fixes, and the primary audience for good error messages is often the opposite of those actively developing the library. They're also more difficult to manage as GH issues — there could be scores of marginal issues which would often be out of date.

One thing we do in PRQL is have a file that snapshots error messages, test_bad_error_messages.rs, which can then be a nice contribution to change those from bad to good. I'm not sure whether that would work here (python doesn't seem to have a great snapshotter; pytest-regtest is the best I've found — I wrote pytest-accept, but it requires doctests).

Any other ideas?

Describe alternatives you've considered

No response

Additional context

A couple of specific error-message issues:

  • https://github.com/pydata/xarray/issues/2078
  • https://github.com/pydata/xarray/issues/5290

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8264/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1216647336 PR_kwDOAMm_X8421oXV 6521 Move license from readme to LICENSE max-sixty 5635139 open 0     3 2022-04-27T00:59:03Z 2023-10-01T09:31:37Z   MEMBER   0 pydata/xarray/pulls/6521  
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6521/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1918061661 I_kwDOAMm_X85yU0xd 8251 `.chunk()` doesn't create chunks on 0 dim arrays max-sixty 5635139 open 0     0 2023-09-28T18:30:50Z 2023-09-30T21:31:05Z   MEMBER      

What happened?

.chunk's docstring states:

``` """Coerce this array's data into a dask arrays with the given chunks.

    If this variable is a non-dask array, it will be converted to dask
    array. If it's a dask array, it will be rechunked to the given chunk
    sizes.

```

...but this doesn't happen for 0 dim arrays; example below.

For context, as part of #8245, I had a function that creates a template array. It created an empty DataArray, then expanded dims for each dimension. And it kept blowing up memory! ...until I realized that it was actually not a lazy array.

What did you expect to happen?

It may be that we can't have a 0-dim dask array — but then we should raise in this method, rather than return the wrong thing.

Minimal Complete Verifiable Example

```python
In [1]: type(xr.DataArray().chunk().data)
Out[1]: numpy.ndarray

In [2]: type(xr.DataArray(1).chunk().data)
Out[2]: numpy.ndarray

In [3]: type(xr.DataArray([1]).chunk().data)
Out[3]: dask.array.core.Array
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS
------------------
commit: 0d6cd2a39f61128e023628c4352f653537585a12
python: 3.9.18 (main, Aug 24 2023, 21:19:58) [Clang 14.0.3 (clang-1403.0.22.14.1)]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2023.8.1.dev25+g8215911a.d20230914
pandas: 2.1.1
numpy: 1.25.2
scipy: 1.11.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.4.0
distributed: 2023.7.1
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: 0.2.3.dev30+gd26e29e
fsspec: 2021.11.1
cupy: None
pint: None
sparse: None
flox: 0.7.2
numpy_groupies: 0.9.19
setuptools: 68.1.2
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: 1.5.1
IPython: 8.15.0
sphinx: 4.3.2
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8251/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1917820711 I_kwDOAMm_X85yT58n 8248 `write_empty_chunks` not in `DataArray.to_zarr` max-sixty 5635139 open 0     0 2023-09-28T15:48:22Z 2023-09-28T15:49:35Z   MEMBER      

What is your issue?

Our to_zarr methods on DataArray & Dataset are slightly inconsistent — Dataset.to_zarr has write_empty_chunks and chunkmanager_store_kwargs, which DataArray.to_zarr doesn't. The shared parameters are also in a different order.


Up a level — I'm not sure of the best way of enforcing consistency here; a couple of ideas (one is sketched below):

  • We could have tests that operate on both a DataArray and Dataset, parameterized by fixtures (might also help reduce the duplication in some of our tests), though we then need to make the tests generic. We could have some general tests which just test that methods work, and then delegate to the current per-object tests for finer guarantees.
  • We could have a tool which collects the differences between DataArray & Dataset methods and snapshots them — then we'll see if they diverge, while allowing for some divergences.
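A sketch of the fixture idea (test and fixture names invented):

```python
import pytest
import xarray as xr

@pytest.fixture(params=["DataArray", "Dataset"])
def obj(request):
    # The same test body runs against both objects.
    da = xr.DataArray([1.0, 2.0], dims="x", name="a")
    return da if request.param == "DataArray" else da.to_dataset()

def test_to_zarr_writes(obj, tmp_path):
    # A "general" test: the method exists and runs with shared kwargs;
    # finer guarantees stay in the per-object tests.
    obj.to_zarr(tmp_path / "store.zarr")
```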

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8248/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
587895591 MDU6SXNzdWU1ODc4OTU1OTE= 3891 Keep attrs by default? (keep_attrs) max-sixty 5635139 open 0     14 2020-03-25T18:17:35Z 2023-09-22T02:27:50Z   MEMBER      

I've held this view in low confidence for a while and wanted to socialize it to see whether there's something to it: Should we keep attrs in operations by default?

Advantages:

  • I think most of the time people want to keep attrs after operations
    • Is that right? Are there cases where it wouldn't be a reasonable default? e.g. good points here for not always keeping coords around
  • It's easy to remove them with a (currently unimplemented) drop_attrs method when people do want to remove them

Disadvantages:

  • Backward incompatible change with an expensive deprecation cycle (it would be impractical to have a deprecation warning every time someone ran a function on an object with attrs, I think? At least without adding a once filter warning)
  • ?

Here are some existing relevant discussions:

  • https://github.com/pydata/xarray/issues/3815#issuecomment-603974527
  • https://github.com/pydata/xarray/issues/688
  • https://github.com/pydata/xarray/pull/2482
  • https://github.com/pydata/xarray/issues/3304

I think this is an easy situation to get into:

  • We make an incorrect-but-insignificant design decision; e.g. some methods don't keep attrs
  • We want to change that, but avoid breaking backward-compatibility
  • So we add kwargs and eventually a global config (illustrated below)
  • But now we have a global config that requires global context and lots of kwargs! :(
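For illustration, the end state we have today (assuming current defaults):

```python
import xarray as xr

da = xr.DataArray([1.0, 2.0], dims="x", attrs={"units": "m"})
da.mean().attrs                 # {} — attrs dropped by default
da.mean(keep_attrs=True).attrs  # {'units': 'm'} — the kwarg
with xr.set_options(keep_attrs=True):
    da.mean().attrs             # {'units': 'm'} — the global config
```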

I'm up for leaning towards breaking changes if it makes the library better: I think xarray will grow immensely, and so the narrow immediate pain is worth the broader future positive impact. Clearly if the immediate pain stops xarray growing, then it's not a good tradeoff.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3891/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1905824568 I_kwDOAMm_X85xmJM4 8221 Frequent doc build timeout / OOM max-sixty 5635139 open 0     4 2023-09-20T23:02:37Z 2023-09-21T03:50:07Z   MEMBER      

What is your issue?

I'm frequently seeing Command killed due to timeout or excessive memory consumption in the doc build.

It happens after 1552 seconds; since that's not a round number, it might be the memory rather than the timeout?

It follows writing output... [ 90%] generated/xarray.core.rolling.DatasetRolling.max, which I wouldn't have thought was a particularly memory-intensive part of the build?

Here's an example: https://readthedocs.org/projects/xray/builds/21983708/

Any thoughts for what might be going on?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8221/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1890982762 I_kwDOAMm_X85wthtq 8173 HTML repr with many data vars max-sixty 5635139 open 0     1 2023-09-11T17:49:32Z 2023-09-11T20:38:01Z   MEMBER      

What is your issue?

I've been working with Datasets with 1000+ data vars. The HTML repr is extremely slow.

My current solution is to change the config to use the text repr at the top of the notebook, and then kick myself & restart when I forget.

Would folks be OK with us falling back to the text repr automatically for, say, >100 data vars?
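For reference, the config change I'm making today:

```python
import xarray as xr

# Fall back to the text repr globally; the proposal would effectively
# do this automatically above some data-var threshold.
xr.set_options(display_style="text")
```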

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8173/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1874148181 I_kwDOAMm_X85vtTtV 8123 `.rolling_exp` arguments could be clearer max-sixty 5635139 open 0     6 2023-08-30T18:09:04Z 2023-09-01T00:25:08Z   MEMBER      

Is your feature request related to a problem?

Currently we call .rolling_exp like:

da.rolling_exp(date=20).mean()

20 refers to a "standard" window type — broadly "the same average distance as a simple rolling window". That works well, and matches the .rolling(date=20).mean() format.

But we also have different window types, and this makes it a bit incongruent:

da.rolling_exp(date=0.5, window_type="alpha").mean()

...since the window_type is completely changing the meaning of the value we pass to the dimension argument. A bit like someone asking "how many apples would you like to buy", and replying "5", and then separately saying "when I said 5, I meant 5 tonnes".

Describe the solution you'd like

One option would be:

.rolling_exp(date={"alpha": 0.5})

We pass a dict if we want a non-standard window type — so the value is attached to its type.

We could still have the original form for da.rolling_exp(date=20).mean().

Describe alternatives you've considered

No response

Additional context

(I realize I wrote this originally, all criticism directed at me! This is based on feedback from a colleague, which on reflection I agree with.)

Unless anyone disagrees, I'll try and do this soon-ish™

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8123/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1410336255 I_kwDOAMm_X85UEAX_ 7164 Error on xarray warnings in tests? max-sixty 5635139 open 0     5 2022-10-16T01:09:27Z 2022-10-18T09:51:20Z   MEMBER      

What is your issue?

We've done a superb job of cutting the number of warnings in https://github.com/pydata/xarray/issues/3266.

On another project I've been spending time with recently, we raise an error on any warnings in the test suite. It's easy mode there — the dependencies are locked (it's not python...) — but I wonder whether we can get some of the way there with this:

Would it be worth failing on the following? (one possible setup is sketched after the list)

  • Warnings from within xarray
    • There's no chance of an external change causing main to fail. When we deprecate something, we'd update calling code with it.
    • This would also ensure doctests & docs don't use old versions. Currently doctests have some warnings.
  • Warnings from the min-versions test
    • It prevents us from using outdated APIs
    • min-versions are fixed dependencies, so also no chance of an external change causing main to fail
    • It would fail in a more deliberate way than the upstream tests do now
    • OTOH, possibly it would discourage us from bumping those min versions — the burden falls on the bumper — already a generous PR!
    • ...and it's not perfectly matched — really we want to update from an old API before it changes in the new version, not before it becomes deprecated in an old version
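A sketch of what the first bullet could look like (pytest's own warnings plugin may require the equivalent `filterwarnings = error:::xarray` ini setting instead — worth checking):

```python
# conftest.py
import warnings

def pytest_configure(config):
    # Escalate warnings emitted from within xarray's own modules to errors.
    warnings.filterwarnings("error", module="xarray")
```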

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7164/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
485446209 MDU6SXNzdWU0ODU0NDYyMDk= 3266 Warnings in the test suite max-sixty 5635139 open 0     8 2019-08-26T20:52:34Z 2022-07-16T14:14:00Z   MEMBER      

If anyone is looking for any bite-size contributions, the test suite is throwing off many warnings. Most of these indicate that something will break in the future without code changes; though mostly the code changes are small.

```

=============================== warnings summary =============================== /usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/heapdict.py:11 /usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/heapdict.py:11: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working class heapdict(collections.MutableMapping):

/usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/pydap/model.py:175 /usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/pydap/model.py:175: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working from collections import OrderedDict, Mapping

/usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/pydap/responses/das.py:14 /usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/pydap/responses/das.py:14: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working from collections import Iterable

xarray/tests/test_accessor_dt.py::test_cftime_strftime_access[365_day] /home/vsts/work/1/s/xarray/tests/test_accessor_dt.py:226: RuntimeWarning: Converting a CFTimeIndex with dates from a non-standard calendar, 'noleap', to a pandas.DatetimeIndex, which uses dates from the standard calendar. This may lead to subtle errors in operations that depend on the length of time between dates. xr.coding.cftimeindex.CFTimeIndex(data.time.values).to_datetimeindex(),

xarray/tests/test_accessor_dt.py::test_cftime_strftime_access[360_day] /home/vsts/work/1/s/xarray/tests/test_accessor_dt.py:226: RuntimeWarning: Converting a CFTimeIndex with dates from a non-standard calendar, '360_day', to a pandas.DatetimeIndex, which uses dates from the standard calendar. This may lead to subtle errors in operations that depend on the length of time between dates. xr.coding.cftimeindex.CFTimeIndex(data.time.values).to_datetimeindex(),

xarray/tests/test_accessor_dt.py::test_cftime_strftime_access[julian] /home/vsts/work/1/s/xarray/tests/test_accessor_dt.py:226: RuntimeWarning: Converting a CFTimeIndex with dates from a non-standard calendar, 'julian', to a pandas.DatetimeIndex, which uses dates from the standard calendar. This may lead to subtle errors in operations that depend on the length of time between dates. xr.coding.cftimeindex.CFTimeIndex(data.time.values).to_datetimeindex(),

xarray/tests/test_accessor_dt.py::test_cftime_strftime_access[all_leap] xarray/tests/test_accessor_dt.py::test_cftime_strftime_access[366_day] /home/vsts/work/1/s/xarray/tests/test_accessor_dt.py:226: RuntimeWarning: Converting a CFTimeIndex with dates from a non-standard calendar, 'all_leap', to a pandas.DatetimeIndex, which uses dates from the standard calendar. This may lead to subtle errors in operations that depend on the length of time between dates. xr.coding.cftimeindex.CFTimeIndex(data.time.values).to_datetimeindex(),

xarray/tests/test_accessor_str.py::test_empty_str_methods xarray/tests/test_accessor_str.py::test_empty_str_methods xarray/tests/test_accessor_str.py::test_empty_str_methods xarray/tests/test_accessor_str.py::test_empty_str_methods xarray/tests/test_accessor_str.py::test_empty_str_methods xarray/tests/test_accessor_str.py::test_empty_str_methods xarray/tests/test_accessor_str.py::test_empty_str_methods xarray/tests/test_accessor_str.py::test_empty_str_methods xarray/tests/test_accessor_str.py::test_empty_str_methods /home/vsts/work/1/s/xarray/core/duck_array_ops.py:202: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison flag_array = (arr1 == arr2) | (isnull(arr1) & isnull(arr2))

xarray/tests/test_backends.py::TestZarrDictStore::test_to_zarr_append_compute_false_roundtrip xarray/tests/test_backends.py::TestZarrDictStore::test_to_zarr_append_compute_false_roundtrip xarray/tests/test_backends.py::TestZarrDirectoryStore::test_to_zarr_append_compute_false_roundtrip xarray/tests/test_backends.py::TestZarrDirectoryStore::test_to_zarr_append_compute_false_roundtrip /home/vsts/work/1/s/xarray/conventions.py:184: SerializationWarning: variable None has data in the form of a dask array with dtype=object, which means it is being loaded into memory to determine a data type that can be safely stored on disk. To avoid this, coerce this variable to a fixed-size dtype with astype() before saving it. SerializationWarning,

xarray/tests/test_backends.py::TestScipyInMemoryData::test_zero_dimensional_variable /usr/share/miniconda/envs/xarray-tests/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject return f(args, *kwds)

xarray/tests/test_backends.py::TestPseudoNetCDFFormat::test_ict_format xarray/tests/test_backends.py::TestPseudoNetCDFFormat::test_ict_format_write xarray/tests/test_backends.py::TestPseudoNetCDFFormat::test_ict_format_write /usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/PseudoNetCDF/icarttfiles/ffi1001.py:80: DeprecationWarning: 'U' mode is deprecated f = openf(path, 'rU', encoding = encoding)

xarray/tests/test_backends.py::TestPseudoNetCDFFormat::test_ict_format xarray/tests/test_backends.py::TestPseudoNetCDFFormat::test_ict_format_write /usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/_pytest/python.py:170: RuntimeWarning: deallocating CachingFileManager(<function pncopen at 0x7f252e49a6a8>, '/home/vsts/work/1/s/xarray/tests/data/example.ict', kwargs={'format': 'ffi1001'}), but file is not already closed. This may indicate a bug. result = testfunction(**testargs)

xarray/tests/test_backends.py::TestPseudoNetCDFFormat::test_uamiv_format_read xarray/tests/test_backends.py::TestPseudoNetCDFFormat::test_uamiv_format_mfread xarray/tests/test_backends.py::TestPseudoNetCDFFormat::test_uamiv_format_write xarray/tests/test_backends.py::TestPseudoNetCDFFormat::test_uamiv_format_write /usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/PseudoNetCDF/camxfiles/uamiv/Memmap.py:141: UserWarning: UnboundLocalError("local variable 'dims' referenced before assignment") warn(repr(e))

xarray/tests/test_backends.py::TestPseudoNetCDFFormat::test_uamiv_format_mfread /home/vsts/work/1/s/xarray/tests/test_backends.py:103: FutureWarning: In xarray version 0.13 the default behaviour of open_mfdataset will change. To retain the existing behavior, pass combine='nested'. To use future default behavior, pass combine='by_coords'. See http://xarray.pydata.org/en/stable/combining.html#combining-multi

**kwargs

xarray/tests/test_backends.py::TestPseudoNetCDFFormat::test_uamiv_format_mfread /home/vsts/work/1/s/xarray/backends/api.py:931: FutureWarning: Also open_mfdataset will no longer accept a concat_dim argument. To get equivalent behaviour from now on please use the new combine_nested function instead (or the combine='nested' option to open_mfdataset).The datasets supplied do not have global dimension coordinates. In future, to continue concatenating without supplying dimension coordinates, please use the new combine_nested function (or the combine='nested' option to open_mfdataset. from_openmfds=True,

xarray/tests/test_coding_times.py::test_cf_datetime_nan[num_dates1-days since 2000-01-01-expected_list1] xarray/tests/test_coding_times.py::test_cf_datetime_nan[num_dates2-days since 2000-01-01-expected_list2] /usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/numpy/testing/_private/utils.py:913: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'. To accept the future behavior, pass 'dtype=object'. To keep the old behavior, pass 'dtype="datetime64[ns]"'. verbose=verbose, header='Arrays are not equal')

xarray/tests/test_dataarray.py::TestDataArray::test_drop_index_labels xarray/tests/test_dataarray.py::TestDataArray::test_drop_index_labels xarray/tests/test_dataarray.py::TestDataArray::test_drop_index_labels /home/vsts/work/1/s/xarray/core/dataarray.py:1842: DeprecationWarning: dropping dimensions using list-like labels is deprecated; use dict-like arguments. ds = self._to_temp_dataset().drop(labels, dim, errors=errors)

xarray/tests/test_dataset.py::TestDataset::test_drop_index_labels /home/vsts/work/1/s/xarray/tests/test_dataset.py:2066: DeprecationWarning: dropping dimensions using list-like labels is deprecated; use dict-like arguments. actual = data.drop(["a"], "x")

xarray/tests/test_dataset.py::TestDataset::test_drop_index_labels /home/vsts/work/1/s/xarray/tests/test_dataset.py:2070: DeprecationWarning: dropping dimensions using list-like labels is deprecated; use dict-like arguments. actual = data.drop(["a", "b"], "x")

xarray/tests/test_dataset.py::TestDataset::test_drop_index_labels /home/vsts/work/1/s/xarray/tests/test_dataset.py:2078: DeprecationWarning: dropping dimensions using list-like labels is deprecated; use dict-like arguments. data.drop(["c"], dim="x")

xarray/tests/test_dataset.py::TestDataset::test_drop_index_labels /home/vsts/work/1/s/xarray/tests/test_dataset.py:2080: DeprecationWarning: dropping dimensions using list-like labels is deprecated; use dict-like arguments. actual = data.drop(["c"], dim="x", errors="ignore")

xarray/tests/test_dataset.py::TestDataset::test_drop_index_labels /home/vsts/work/1/s/xarray/tests/test_dataset.py:2086: DeprecationWarning: dropping dimensions using list-like labels is deprecated; use dict-like arguments. actual = data.drop(["a", "b", "c"], "x", errors="ignore")

xarray/tests/test_dataset.py::TestDataset::test_drop_labels_by_keyword /home/vsts/work/1/s/xarray/tests/test_dataset.py:2135: DeprecationWarning: dropping dimensions using list-like labels is deprecated; use dict-like arguments. data.drop(labels=["a"], dim="x", x="a")

xarray/tests/test_dataset.py::TestDataset::test_convert_dataframe_with_many_types_and_multiindex /home/vsts/work/1/s/xarray/core/dataset.py:3959: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'. To accept the future behavior, pass 'dtype=object'. To keep the old behavior, pass 'dtype="datetime64[ns]"'. data = np.asarray(series).reshape(shape)

xarray/tests/test_dataset.py::TestDataset::test_convert_dataframe_with_many_types_and_multiindex /usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/pandas/core/apply.py:321: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'. To accept the future behavior, pass 'dtype=object'. To keep the old behavior, pass 'dtype="datetime64[ns]"'. results[i] = self.f(v)

xarray/tests/test_distributed.py::test_dask_distributed_cfgrib_integration_test /usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/tornado/gen.py:772: RuntimeWarning: deallocating CachingFileManager(<function open at 0x7f2527b49bf8>, '/tmp/tmpt4tmnjh3/temp-2044.tif', mode='r', kwargs={}), but file is not already closed. This may indicate a bug. self.future = convert_yielded(yielded)

xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-False-False-float-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-False-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-False-False-int-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-False-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-False-False-float32-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-False-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-False-False-bool_-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-False-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-False-False-str-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-False-False-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-True-False-float-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-True-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-True-False-int-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-True-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-True-False-float32-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-True-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-True-False-bool_-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-True-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-True-False-str-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-min-True-False-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-False-False-float-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-False-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-False-False-int-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-False-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-False-False-float32-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-False-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-False-False-bool_-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-False-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-False-False-str-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-False-False-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-True-False-float-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-True-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-True-False-int-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-True-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-True-False-float32-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-True-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-True-False-bool_-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-True-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-True-False-str-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-False-max-True-False-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-False-True-bool_-1] 
xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-False-True-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-False-True-str-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-False-True-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-False-False-float-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-False-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-False-False-int-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-False-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-False-False-float32-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-False-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-False-False-bool_-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-False-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-False-False-str-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-False-False-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-True-True-bool_-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-True-True-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-True-True-str-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-True-True-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-True-False-float-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-True-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-True-False-int-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-True-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-True-False-float32-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-True-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-True-False-bool_-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-True-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-True-False-str-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-min-True-False-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-False-True-bool_-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-False-True-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-False-True-str-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-False-True-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-False-False-float-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-False-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-False-False-int-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-False-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-False-False-float32-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-False-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-False-False-bool_-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-False-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-False-False-str-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-False-False-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-True-True-bool_-1] 
xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-True-True-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-True-True-str-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-True-True-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-True-False-float-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-True-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-True-False-int-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-True-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-True-False-float32-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-True-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-True-False-bool_-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-True-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-True-False-str-1] xarray/tests/test_duck_array_ops.py::test_argmin_max[x-True-max-True-False-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-min-False-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-min-False-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-min-False-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-min-False-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-min-False-False-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-min-True-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-min-True-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-min-True-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-min-True-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-min-True-False-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-max-False-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-max-False-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-max-False-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-max-False-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-max-False-False-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-max-True-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-max-True-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-max-True-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-max-True-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-False-max-True-False-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-min-False-True-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-min-False-True-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-min-False-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-min-False-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-min-False-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-min-False-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-min-False-False-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-min-True-True-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-min-True-True-str-2] 
xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-min-True-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-min-True-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-min-True-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-min-True-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-min-True-False-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-max-False-True-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-max-False-True-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-max-False-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-max-False-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-max-False-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-max-False-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-max-False-False-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-max-True-True-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-max-True-True-str-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-max-True-False-float-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-max-True-False-int-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-max-True-False-float32-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-max-True-False-bool_-2] xarray/tests/test_duck_array_ops.py::test_argmin_max[y-True-max-True-False-str-2] /home/vsts/work/1/s/xarray/core/dataarray.py:1842: FutureWarning: dropping coordinates using key values of dict-like labels is deprecated; use drop_vars or a list of coordinates. ds = self._to_temp_dataset().drop(labels, dim, errors=errors)

xarray/tests/test_plot.py::TestPlotStep::test_step /home/vsts/work/1/s/xarray/plot/plot.py:321: MatplotlibDeprecationWarning: Passing the drawstyle with the linestyle as a single string is deprecated since Matplotlib 3.1 and support will be removed in 3.3; please pass the drawstyle separately using the drawstyle keyword argument to Line2D or set_drawstyle() method (or ds/set_ds()). primitive = ax.plot(xplt_val, yplt_val, args, *kwargs)

xarray/tests/test_print_versions.py::test_show_versions /usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses import imp

xarray/tests/test_sparse.py::test_dataarray_method[obj.roll((), {'x': 2})-True] /home/vsts/work/1/s/xarray/core/dataarray.py:2632: FutureWarning: roll_coords will be set to False in the future. Explicitly set roll_coords to silence warning. shifts=shifts, roll_coords=roll_coords, *shifts_kwargs

xarray/tests/test_sparse.py::TestSparseDataArrayAndDataset::test_ufuncs /home/vsts/work/1/s/xarray/tests/test_sparse.py:711: PendingDeprecationWarning: xarray.ufuncs will be deprecated when xarray no longer supports versions of numpy older than v1.17. Instead, use numpy ufuncs directly. assert_equal(np.sin(x), xu.sin(x))

xarray/tests/test_sparse.py::TestSparseDataArrayAndDataset::test_ufuncs /home/vsts/work/1/s/xarray/core/dataarray.py:2393: PendingDeprecationWarning: xarray.ufuncs will be deprecated when xarray no longer supports versions of numpy older than v1.17. Instead, use numpy ufuncs directly. return self.array_wrap(f(self.variable.data, args, *kwargs))

xarray/tests/test_sparse.py::TestSparseDataArrayAndDataset::test_groupby_bins /home/vsts/work/1/s/xarray/core/groupby.py:780: FutureWarning: Default reduction dimension will be changed to the grouped dimension in a future version of xarray. To silence this warning, pass dim=xarray.ALL_DIMS explicitly. **kwargs

-- Docs: https://docs.pytest.org/en/latest/warnings.html
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3266/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
295959111 MDU6SXNzdWUyOTU5NTkxMTE= 1900 Representing & checking Dataset schemas max-sixty 5635139 open 0     15 2018-02-09T18:06:08Z 2022-07-14T11:28:37Z   MEMBER      

What would be the best way to canonically describe a dataset, which could be read by both humans and machines?

For example, frequently in our code we have docstrings which look something like:

```
def get_returns(security_ids):
    """
    Returns mega-dimensional dataset which gives recent returns for a set of
    securities by:
    - Date
    - Return (raw / economic / smoothed / etc)
    - Scaling (constant / risk_scaled)
    - Span
    - Hedged vs Unhedged

    Dataset keys are security ids. All dimensions have coords.
    """
```

This helps when attempting to understand what code is doing while only reading it. But this isn't consistent between docstrings and can't be read or checked by a machine. Has anyone solved this problem / have any suggestions for resources out there?

Tangentially related to https://github.com/python/typing/issues/513 (but our issues are less about the type, dimension sizes, and more about the arrays within a dataset, their dimensions, and their names)
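Absent a standard, a hypothetical sketch of a machine-checkable schema (all names invented):

```python
import xarray as xr

# Map each expected variable to the dims it must have.
SCHEMA = {"returns": ("date", "security_id")}

def check_schema(ds: xr.Dataset, schema: dict) -> None:
    for name, dims in schema.items():
        if name not in ds:
            raise ValueError(f"missing variable {name!r}")
        if ds[name].dims != dims:
            raise ValueError(f"{name!r} has dims {ds[name].dims}, expected {dims}")
```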

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1900/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1125030343 I_kwDOAMm_X85DDpnH 6243 Maintenance improvements max-sixty 5635139 open 0     0 2022-02-05T21:01:51Z 2022-02-05T21:01:51Z   MEMBER      

Is your feature request related to a problem?

At the end of the dev call, we discussed ways to do better at maintenance. I'd like to make Xarray a wonderful place to contribute, partly because it was so formative for me in becoming more involved with software engineering.

Describe the solution you'd like

We've already come far, because of the hard work of many of us!

A few ideas, in increasing order of radical-ness:

  • We looked at @andersy005's dashboards for PRs & Issues. Could we expose this, both to hold ourselves accountable and signal to potential contributors that we care about turnaround time for their contributions?
  • Is there a systematic way of understanding who should review something?
    • FWIW a few months ago I looked for a bot that would recommend a reviewer based on who had contributed code in the past, which I think I've seen before. But I couldn't find one generally available. This would be really helpful — we wouldn't have n people each assessing whether they're the best reviewer for each contribution. If anyone does better than me at finding something like this, that would be awesome.
  • Could we add a label so people can say "now I'm waiting for a review", and track how long those stay up?
    • Ensuring the 95th percentile is < 2 days is more important than the median being in the hours. It does pain me when I see PRs get dropped for a few weeks. TBC, I'm as responsible as anyone.
  • Could we have a bot that asks for feedback on the review process — i.e. "I received a prompt and helpful review", "I would recommend a friend contribute to Xarray", etc?

Describe alternatives you've considered

No response

Additional context

There's always a danger with making stats legible that Goodhart's law strikes. And sometimes stats are not joyful, and lots of people come here for joy. So probably there's a tradeoff.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6243/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
907715257 MDU6SXNzdWU5MDc3MTUyNTc= 5409 Split up tests? max-sixty 5635139 open 0     4 2021-05-31T21:07:53Z 2021-06-16T15:51:19Z   MEMBER      

Currently a large share of our tests are in test_dataset.py and test_dataarray.py — each of which are around 7k lines.

There's a case for splitting these up:

  • Many of the tests are somewhat duplicated between the files (and test_variable.py in some cases) — i.e. we're running the same test over a Dataset & DataArray, but putting them far away from each other in separate files. Should we instead have them split by "function"; e.g. test_rolling.py for all rolling tests?
  • My editor takes 5-20 seconds to run the linter and save the file. This is a very narrow complaint.
  • Now that we're all onto pytest, there's no need to have them in the same class.

If we do this, we could start on the margin — new tests around some specific functionality — e.g. join / rolling / reindex / stack (just a few from browsing through) — could go into a new respective test_{}.py file. Rather than some big copy and paste commit.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5409/reactions",
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
521754870 MDU6SXNzdWU1MjE3NTQ4NzA= 3514 Should we cache some small properties? max-sixty 5635139 open 0     7 2019-11-12T19:28:21Z 2019-11-16T04:32:11Z   MEMBER      

I was doing some profiling on isel, and see there are some properties that (I think) never change, but are called frequently. Should we cache these on their object?

Pandas uses cache_readonly for these cases.

Here's a case: we call LazilyOuterIndexedArray.shape frequently when doing a simple indexing operation. Each call takes ~150µs. An attribute lookup on a python object takes ~50ns (i.e. 3000x faster). IIUC the result of that property should never change.

I don't think this is the solution to performance issues, and there's some additional complexity. Could they be easy & small wins, though?
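The stdlib now has an equivalent of pandas' cache_readonly; a minimal sketch (the class is invented for illustration):

```python
from functools import cached_property  # Python 3.8+

class LazyArray:
    def __init__(self, shape):
        self._shape = shape

    @cached_property
    def shape(self):
        # Stand-in for the real ~150µs computation: runs once on first
        # access, then lives on the instance as a plain attribute (~50ns).
        return tuple(self._shape)
```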

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3514/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
366510937 MDU6SXNzdWUzNjY1MTA5Mzc= 2460 Update docs to include how to Join using a non-index coord max-sixty 5635139 open 0 max-sixty 5635139   3 2018-10-03T20:19:15Z 2018-11-01T15:37:44Z   MEMBER      

I originally posted this on SO, as I thought it was a user question rather than a library issue. But after working on it more today, I'm not so sure.

I'm trying to do a 'join' in xarray, but using a non-index coordinate rather than a shared dim.

I have a Dataset indexed on 'a' with a coord on 'b', and a DataArray indexed on 'b':

```python
In [17]: ds = xr.Dataset(dict(a=(('x'), np.random.rand(10))), coords=dict(b=(('x'), list(range(10)))))

In [18]: ds
Out[18]:
<xarray.Dataset>
Dimensions:  (x: 10)
Coordinates:
    b        (x) int64 0 1 2 3 4 5 6 7 8 9
Dimensions without coordinates: x
Data variables:
    a        (x) float64 0.3634 0.2132 0.6945 0.5359 0.1053 0.07045 0.5945 ...

In [19]: da = xr.DataArray(np.random.rand(10), dims=('b',), coords=dict(b=(('b'), list(range(10)))))

In [20]: da
Out[20]:
<xarray.DataArray (b: 10)>
array([0.796987, 0.275992, 0.747882, 0.240374, 0.435143, 0.285271, 0.753582,
       0.556038, 0.365889, 0.434844])
Coordinates:
  * b        (b) int64 0 1 2 3 4 5 6 7 8 9
```

Can I add da onto my dataset, by joining on ds.b equalling da.b? The result would be:

```python
<xarray.Dataset>
Dimensions:  (x: 10)
Coordinates:
    b        (x) int64 0 1 2 3 4 5 6 7 8 9
Dimensions without coordinates: x
Data variables:
    a        (x) float64 0.3634 0.2132 0.6945 0.5359 0.1053 0.07045 0.5945 ...
    da       (x) float64 0.796987 0.275992 0.747882 0.240374 0.435143 ...
```

(for completeness — the data isn't currently in the correct position)
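One workaround with today's API, for the docs to consider (a sketch using the objects above):

```python
# Promote the non-index coord 'b' to a dimension coordinate, so that
# assignment aligns `da` against it.
ds2 = ds.swap_dims({"x": "b"})
ds2["da"] = da
```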

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2460/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);