issues


12 rows where comments = 4, type = "issue" and user = 2448579 sorted by updated_at descending


#8965 Support concurrent loading of variables
dcherian (2448579) · open · 4 comments · MEMBER · created 2024-04-23T16:41:24Z · updated 2024-04-29T22:21:51Z · id 2259316341 (I_kwDOAMm_X86Gqm51)

Is your feature request related to a problem?

Today, if users want to load multiple variables in a DataArray or Dataset concurrently, they have to use dask.

It struck me that it would be pretty easy for .load to gain an executor kwarg that accepts anything following the concurrent.futures.Executor interface, and to parallelize this loop:

https://github.com/pydata/xarray/blob/b0036749542145794244dee4c4869f3750ff2dee/xarray/core/dataset.py#L853-L857
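A minimal sketch of the idea, written as a standalone helper rather than a Dataset method; the name `load_concurrently` and the `executor` parameter are hypothetical illustrations, not xarray's actual API:

```python
# Hypothetical helper sketching the proposal; not xarray's real API.
from concurrent.futures import ThreadPoolExecutor

import xarray as xr


def load_concurrently(ds: xr.Dataset, executor) -> xr.Dataset:
    """Load every variable's data via executor.map instead of a serial loop.

    `executor` can be anything following the concurrent.futures.Executor
    interface, e.g. a ThreadPoolExecutor for I/O-bound backend reads.
    """
    # Variable.load() reads the data into memory in place.
    list(executor.map(lambda var: var.load(), ds.variables.values()))
    return ds


# usage: parallelize loading with 8 threads instead of pulling in dask
# ds = load_concurrently(ds, ThreadPoolExecutor(max_workers=8))
```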

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8965/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo xarray (13221727) · type issue
#8523 tree-reduce the combine for `open_mfdataset(..., parallel=True, combine="nested")`
dcherian (2448579) · open · 4 comments · MEMBER · created 2023-12-05T21:24:51Z · updated 2023-12-18T19:32:39Z · id 2027147099 (I_kwDOAMm_X854089b)

Is your feature request related to a problem?

When parallel=True and a distributed client is active, Xarray reads every file in parallel, constructs a Dataset per file with indexed coordinates loaded, and then sends all of that back to the "head node" for the combine.

Instead, we can tree-reduce the combine (example) by switching from dask.delayed to dask.bag, and skip the overhead of shipping 1000s of copies of an indexed coordinate back to the head node; a sketch follows below.

  1. The downside is that the dask graph is "worse", but perhaps that shouldn't stop us.
  2. I think this is only feasible for combine="nested".

cc @TomNicholas
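A hedged sketch of the idea, under the assumption that the nested combine can be expressed as a pairwise reduction; the glob pattern, partitioning, and the `xr.concat` stand-in for the real combine logic are all illustrative:

```python
# Illustrative only: real combine="nested" logic is more involved.
import glob

import dask.bag as db
import xarray as xr

paths = sorted(glob.glob("data/*.nc"))  # hypothetical file layout

# One Dataset per file, constructed on the workers.
bag = db.from_sequence(paths, npartitions=16).map(xr.open_dataset)

# fold() reduces within each partition first, then merges partition results
# pairwise in a tree (depth controlled by split_every), so indexed
# coordinates are combined worker-side instead of all being shipped to the
# head node at once.
combined = bag.fold(
    binop=lambda a, b: xr.concat([a, b], dim="time"),
    split_every=8,
).compute()
```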

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8523/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo xarray (13221727) · type issue
#7573 Add optional min versions to conda-forge recipe (`run_constrained`)
dcherian (2448579) · closed · 4 comments · MEMBER · created 2023-02-28T23:12:15Z · updated 2023-08-21T16:12:34Z · closed 2023-08-21T16:12:21Z · id 1603957501 (I_kwDOAMm_X85fmnL9)

Is your feature request related to a problem?

I opened this PR to add minimum versions for our optional dependencies, to prevent issues like #7467: https://github.com/conda-forge/xarray-feedstock/pull/84/files

I think we'd need a policy to choose which ones to list. Here's the current list:

```yaml
run_constrained:
  - bottleneck >=1.3
  - cartopy >=0.20
  - cftime >=1.5
  - dask-core >=2022.1
  - distributed >=2022.1
  - flox >=0.5
  - h5netcdf >=0.13
  - h5py >=3.6
  - hdf5 >=1.12
  - iris >=3.1
  - matplotlib-base >=3.5
  - nc-time-axis >=1.4
  - netcdf4 >=1.5.7
  - numba >=0.55
  - pint >=0.18
  - scipy >=1.7
  - seaborn >=0.11
  - sparse >=0.13
  - toolz >=0.11
  - zarr >=2.10
```

Some examples to think about:

  1. iris seems like a bad one to force. It seems like people might use Iris and Xarray independently, and Xarray shouldn't force a minimum version.
  2. For backends, I arbitrarily kept netcdf4, h5netcdf and zarr.
  3. It seems like we should keep array types: so dask, sparse, pint.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7573/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#7962 Better chunk manager error
dcherian (2448579) · closed · 4 comments · MEMBER · created 2023-07-05T17:27:25Z · updated 2023-07-24T22:26:14Z · closed 2023-07-24T22:26:13Z · id 1789989152 (I_kwDOAMm_X85qsREg)

What happened?

I just ran into this error in an environment without dask: TypeError: Could not find a Chunk Manager which recognises type <class 'dask.array.core.Array'>

I think we could easily recommend that the user install the package providing dask by looking at type(array).__name__. This would make the message a lot friendlier.
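A hedged sketch of what the friendlier message could look like; the mapping and function below are illustrative (keyed on the array type's module rather than `__name__`, which makes the lookup simpler), not xarray's actual internals:

```python
# Illustrative only; xarray's real chunk-manager lookup differs.
KNOWN_CHUNKED_ARRAY_MODULES = {
    "dask": "dask",    # dask.array.core.Array -> suggest installing "dask"
    "cubed": "cubed",
}


def chunkmanager_error(array) -> TypeError:
    base = f"Could not find a Chunk Manager which recognises type {type(array)!r}."
    top_level_module = type(array).__module__.split(".")[0]
    package = KNOWN_CHUNKED_ARRAY_MODULES.get(top_level_module)
    if package is not None:
        base += f" Try installing the '{package}' package."
    return TypeError(base)
```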

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7962/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#7924 Migrate from nbsphinx to myst, myst-nb
dcherian (2448579) · open · 4 comments · MEMBER · created 2023-06-16T14:17:41Z · updated 2023-06-20T22:07:42Z · id 1760733017 (I_kwDOAMm_X85o8qdZ)

Is your feature request related to a problem?

I think we should switch to MyST markdown for our docs. I've been using MyST markdown and MyST-NB in docs in other projects and it works quite well.

Advantages:

  1. We get HTML reprs in the docs (example), which is a big improvement. (#6620)
  2. I think many find markdown a lot easier to write than RST.

There's a tool to migrate RST to MyST (RTD's migration guide).
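A minimal conf.py sketch of the switch; `myst_nb` is the real extension name, but the exact options xarray's docs build would need are assumptions:

```python
# docs/conf.py (sketch) -- swap nbsphinx for MyST-NB.
extensions = [
    # "nbsphinx",  # removed
    "myst_nb",     # parses MyST markdown pages and executes notebooks
]

# Assumed settings, to be tuned for the actual build:
myst_enable_extensions = ["colon_fence", "deflist"]
nb_execution_mode = "auto"  # only execute notebooks lacking stored outputs
```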

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7924/reactions",
    "total_count": 5,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
repo xarray (13221727) · type issue
#6222 test packaging & distribution
dcherian (2448579) · closed · 4 comments · MEMBER · created 2022-01-31T17:42:40Z · updated 2022-02-03T15:45:17Z · closed 2022-02-03T15:45:17Z · id 1119738354 (I_kwDOAMm_X85Cvdny)

Is your feature request related to a problem?

It seems like we should have a test to make sure our dependencies are specified correctly.

Describe the solution you'd like

For instance, we could add a step to the release workflow, after twine check, where we pip install the built artifact and then try to import xarray: https://github.com/pydata/xarray/blob/b09de8195a9e22dd35d1b7ed608ea15dad0806ef/.github/workflows/pypi-release.yaml#L34-L43

Alternatively we could have another test config in our regular CI to build + import.

Thoughts? Is this excessive for a somewhat rare problem?
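For the import half of that step, a minimal smoke-test script (the file name is hypothetical) run in a fresh environment after pip installing the built artifact might look like:

```python
# smoke_test.py -- fails loudly if the wheel's metadata or runtime
# dependencies are broken.
import xarray as xr

print("imported xarray", xr.__version__)

# Exercise a sliver of functionality beyond the bare import.
da = xr.DataArray([1, 2, 3], dims="x")
assert float(da.sum()) == 6.0
```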

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6222/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#6051 Check for just ... in stack etc, and raise with a useful error message
dcherian (2448579) · closed · 4 comments · MEMBER · created 2021-12-06T18:35:27Z · updated 2022-01-03T23:05:23Z · closed 2022-01-03T23:05:23Z · id 1072473598 (I_kwDOAMm_X84_7KX-)

Is your feature request related to a problem? Please describe.

The following doesn't work:

```python
import xarray as xr

da = xr.DataArray([[1, 2], [1, 2]], dims=("x", "y"))
da.stack(flat=...)
```

Describe the solution you'd like

This could be equivalent to:

```python
da.stack(flat=da.dims)
```

I think using ds.dims would make this work for Datasets too.
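A hedged sketch of the dispatch, not xarray's implementation; `_resolve_stack_dims` is a hypothetical helper:

```python
# Expand a lone Ellipsis into all of the object's dimensions before stacking.
def _resolve_stack_dims(obj, dims):
    if dims is ...:
        return tuple(obj.dims)  # same attribute on DataArray and Dataset
    return dims
```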

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6051/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#3371 Add xr.unify_chunks top level method
dcherian (2448579) · closed · 4 comments · MEMBER · created 2019-10-03T15:49:09Z · updated 2021-06-16T14:56:59Z · closed 2021-06-16T14:56:58Z · id 502149236 (MDU6SXNzdWU1MDIxNDkyMzY=)

This should handle multiple DataArrays and Datasets.

Implemented in #3276 as Dataset.unify_chunks and DataArray.unify_chunks
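A usage sketch of the requested top-level function, which has since landed in xarray's public API (assuming dask is installed):

```python
# Align dask chunk boundaries across several objects at once.
import numpy as np
import xarray as xr

a = xr.DataArray(np.zeros(100), dims="x").chunk({"x": 10})
b = xr.DataArray(np.zeros(100), dims="x").chunk({"x": 25})

a2, b2 = xr.unify_chunks(a, b)
assert a2.chunks == b2.chunks  # both now share the same chunk boundaries
```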

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3371/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#4146 sparse upstream-dev test failures
dcherian (2448579) · closed · 4 comments · MEMBER · created 2020-06-11T02:20:11Z · updated 2021-03-17T23:10:45Z · closed 2020-06-16T16:00:10Z · id 636666706 (MDU6SXNzdWU2MzY2NjY3MDY=)

Full log here: https://dev.azure.com/xarray/xarray/_build/results?buildId=3023&view=logs&jobId=2280efed-fda1-53bd-9213-1fa8ec9b4fa8&j=2280efed-fda1-53bd-9213-1fa8ec9b4fa8&t=175181ee-1928-5a6b-f537-168f7a8b7c2d

Here are three of the errors:

```
/usr/share/miniconda/envs/xarray-tests/lib/python3.8/site-packages/sparse/_coo/umath.py:739: SystemError
_ test_variable_method[obj.where(*(), **{'cond': <xarray.Variable (x: 10, y: 5)>
<COO: shape=(10, 5), dtype=bool, nnz=3, fill_value=False>})-True] _
TypeError: expected dtype object, got 'numpy.dtype[uint64]'

    def _match_coo(*args, **kwargs):
        """
        Matches the coordinates for any number of input :obj:`COO` arrays.
        Equivalent to "sparse" broadcasting for all arrays.

        Parameters
        ----------
        args : Tuple[COO]
            The input :obj:`COO` arrays.
        return_midx : bool
            Whether to return matched indices or matched arrays. Matching
            only supported for two arrays. ``False`` by default.
        cache : dict
            Cache of things already matched. No cache by default.

        Returns
        -------
        matched_idx : List[ndarray]
            The indices of matched elements in the original arrays. Only returned if
            ``return_midx`` is ``True``.
        matched_arrays : List[COO]
            The expanded, matched :obj:`COO` objects. Only returned if
            ``return_midx`` is ``False``.
        """
        from .core import COO
        from .common import linear_loc

        cache = kwargs.pop("cache", None)
        return_midx = kwargs.pop("return_midx", False)
        broadcast_shape = kwargs.pop("broadcast_shape", None)

        if kwargs:

        linear = [idx[s] for idx, s in zip(linear, sorted_idx)]
>       matched_idx = _match_arrays(*linear)
E       SystemError: CPUDispatcher(<function _match_arrays at 0x7f66b6272af0>) returned a result with an error set
```

```
_________________________________ test_dask_token _________________________________

    @requires_dask
    def test_dask_token():
        import dask

        s = sparse.COO.from_numpy(np.array([0, 0, 1, 2]))

        # https://github.com/pydata/sparse/issues/300
        s.__dask_tokenize__ = lambda: dask.base.normalize_token(s.__dict__)

        a = DataArray(s)
        t1 = dask.base.tokenize(a)
        t2 = dask.base.tokenize(a)
        t3 = dask.base.tokenize(a + 1)
        assert t1 == t2
        assert t3 != t2
        assert isinstance(a.data, sparse.COO)

        ac = a.chunk(2)
        t4 = dask.base.tokenize(ac)
        t5 = dask.base.tokenize(ac + 1)
        assert t4 != t5
>       assert isinstance(ac.data._meta, sparse.COO)
E       AssertionError: assert False
E        +  where False = isinstance(array([], dtype=int64), <class 'sparse._coo.core.COO'>)
E        +  where array([], dtype=int64) = dask.array<xarray-<this-array>, shape=(4,), dtype=int64, chunksize=(2,), chunktype=numpy.ndarray>._meta
E        +  where dask.array<xarray-<this-array>, shape=(4,), dtype=int64, chunksize=(2,), chunktype=numpy.ndarray> = <xarray.DataArray (dim_0: 4)>
E          dask.array<xarray-<this-array>, shape=(4,), dtype=int64, chunksize=(2,), chunktype=numpy.ndarray>
E          Dimensions without coordinates: dim_0.data
E        +  and <class 'sparse._coo.core.COO'> = sparse.COO
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4146/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#4145 Fix matplotlib in upstream-dev test config
dcherian (2448579) · closed · 4 comments · MEMBER · created 2020-06-11T02:15:52Z · updated 2020-06-12T09:11:31Z · closed 2020-06-12T09:11:31Z · id 636665269 (MDU6SXNzdWU2MzY2NjUyNjk=)

From @keewis's comment in #4138:

I just noticed that the rackcdn.org repository doesn't have matplotlib>=3.2.0, so since about late February we don't test against matplotlib upstream anymore.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4145/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#2667 datetime interpolation doesn't work
dcherian (2448579) · closed · 4 comments · MEMBER · created 2019-01-11T06:45:55Z · updated 2019-02-11T09:47:09Z · closed 2019-02-11T09:47:09Z · id 398152613 (MDU6SXNzdWUzOTgxNTI2MTM=)

Code Sample, a copy-pastable example if possible

This code doesn't work anymore on master:

```python
import numpy as np
import pandas as pd
import xarray as xr

a = xr.DataArray(
    np.arange(21).reshape(3, 7),
    dims=['x', 'time'],
    coords={'x': [1, 2, 3],
            'time': pd.date_range('01-01-2001', periods=7, freq='D')},
)
xi = xr.DataArray(
    np.linspace(1, 3, 50),
    dims=['time'],
    coords={'time': pd.date_range('01-01-2001', periods=50, freq='H')},
)
a.interp(x=xi, time=xi.time)
```

Problem description

The above code now raises the error

```
AttributeError                            Traceback (most recent call last)
<ipython-input-26-dda3a6d5725b> in <module>
      6     dims=['time'],
      7     coords={'time': pd.date_range('01-01-2001', periods=50, freq='H')})
----> 8 a.interp(x=xi, time=xi.time)

~/work/python/xarray/xarray/core/dataarray.py in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
   1032         ds = self._to_temp_dataset().interp(
   1033             coords, method=method, kwargs=kwargs, assume_sorted=assume_sorted,
-> 1034             **coords_kwargs)
   1035         return self._from_temp_dataset(ds)
   1036

~/work/python/xarray/xarray/core/dataset.py in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
   2008                         in indexers.items() if k in var.dims}
   2009                     variables[name] = missing.interp(
-> 2010                         var, var_indexers, method, **kwargs)
   2011                 elif all(d not in indexers for d in var.dims):
   2012                     # keep unrelated object array

~/work/python/xarray/xarray/core/missing.py in interp(var, indexes_coords, method, **kwargs)
    468     new_dims = broadcast_dims + list(destination[0].dims)
    469     interped = interp_func(var.transpose(*original_dims).data,
--> 470                            x, destination, method, **kwargs)
    471
    472     result = Variable(new_dims, interped, attrs=var.attrs)

~/work/python/xarray/xarray/core/missing.py in interp_func(var, x, new_x, method, **kwargs)
    535                           new_axis=new_axis, drop_axis=drop_axis)
    536
--> 537     return _interpnd(var, x, new_x, func, **kwargs)

~/work/python/xarray/xarray/core/missing.py in _interpnd(var, x, new_x, func, **kwargs)
    558     var = var.transpose(*range(-len(x), var.ndim - len(x)))
    559     # stack new_x to 1 vector, with reshape
--> 560     xi = np.stack([x1.values.ravel() for x1 in new_x], axis=-1)
    561     rslt = func(x, var, xi, **kwargs)
    562     # move back the interpolation axes to the last position

~/work/python/xarray/xarray/core/missing.py in <listcomp>(.0)
    558     var = var.transpose(*range(-len(x), var.ndim - len(x)))
    559     # stack new_x to 1 vector, with reshape
--> 560     xi = np.stack([x1.values.ravel() for x1 in new_x], axis=-1)
    561     rslt = func(x, var, xi, **kwargs)
    562     # move back the interpolation axes to the last position

AttributeError: 'numpy.ndarray' object has no attribute 'values'
```

I think the issue is this line, which returns a numpy array instead of a Variable. It was added in the coarsen PR (cc @fujiisoup): https://github.com/pydata/xarray/blob/d4c46829b283ab7e7b7db8b86dae77861ce68f3c/xarray/core/utils.py#L636

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2667/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#2510 Dataset-wide _FillValue
dcherian (2448579) · closed · 4 comments · MEMBER · created 2018-10-25T13:44:46Z · updated 2018-10-25T17:39:35Z · closed 2018-10-25T17:37:26Z · id 373955021 (MDU6SXNzdWUzNzM5NTUwMjE=)

I'm looking at a netCDF file that has the variable

```
float T_20(time, depth, lat, lon) ;
    T_20:name = "T" ;
    T_20:long_name = "TEMPERATURE (C)" ;
    T_20:generic_name = "temp" ;
    T_20:FORTRAN_format = "f10.2" ;
    T_20:units = "C" ;
    T_20:epic_code = 20 ;
```

and global attributes

```
// global attributes:
    :platform_code = "8n90e" ;
    :site_code = "8n90e" ;
    :wmo_platform_code = 23007 ;
    :array = "RAMA" ;
    :Request_for_acknowledgement = "If you use these data in publications or presentations, please acknowledge the GTMBA Project Office of NOAA/PMEL. Also, we would appreciate receiving a preprint and/or reprint of publications utilizing the data for inclusion in our bibliography. Relevant publications should be sent to: GTMBA Project Office, NOAA/Pacific Marine Environmental Laboratory, 7600 Sand Point Way NE, Seattle, WA 98115" ;
    :Data_Source = "Global Tropical Moored Buoy Array Project Office/NOAA/PMEL" ;
    :File_info = "Contact: Dai.C.McClurg@noaa.gov" ;
    :missing_value = 1.e+35f ;
    :_FillValue = 1.e+35f ;
    :CREATION_DATE = "13:05 28-JUL-2017" ;
    :_Format = "classic" ;
```

Problem description

In this case the _FillValue and missing_value attributes are set for the entire dataset and not each individual variable. decode_cf_variable thus fails to insert NaNs.

I'm not sure that this is standards-compliant but is this something we could support?
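In the meantime, a user-side workaround (a hedged sketch, not an xarray feature; the file name is hypothetical) is to copy the dataset-level attributes onto each variable before decoding:

```python
import xarray as xr

# Open without CF decoding so the raw attributes survive.
ds = xr.open_dataset("rama_8n90e.nc", decode_cf=False)

fill = ds.attrs.get("_FillValue", ds.attrs.get("missing_value"))
for var in ds.variables.values():
    if fill is not None and var.dtype.kind == "f":
        var.attrs.setdefault("_FillValue", fill)

ds = xr.decode_cf(ds)  # now inserts NaNs where data == _FillValue
```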

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2510/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);