
issues


18 rows where user = 1386642 sorted by updated_at descending


type 2

  • issue 14
  • pull 4

state 2

  • closed 12
  • open 6

repo 1

  • xarray 18
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
416962458 MDU6SXNzdWU0MTY5NjI0NTg= 2799 Performance: numpy indexes small amounts of data 1000 faster than xarray nbren12 1386642 open 0     42 2019-03-04T19:44:17Z 2024-03-18T17:51:25Z   CONTRIBUTOR      

Machine learning applications often require iterating over every index along some of the dimensions of a dataset. For instance, iterating over all the (lat, lon) pairs in a 4D dataset with dimensions (time, level, lat, lon). Unfortunately, this is very slow with xarray objects compared to numpy (or h5py) arrays. When the Pangeo machine learning working group met today, we found that several of us have struggled with this.

I made some simplified benchmarks, which show that xarray is about 1000 times slower than numpy when repeatedly grabbing a small amount of data from an array. This is a problem with both isel and [] indexing. After doing some profiling, the main culprits seem to be xarray routines like _validate_indexers and _broadcast_indexes.

While python will always be slower than C when iterating over an array in this fashion, I would hope that xarray could be nearly as fast as numpy. I am not sure what the best way to improve this is though.
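For illustration, a minimal sketch of the kind of benchmark described above (the array shape, dimension names, and loop count are arbitrary; absolute timings will vary by machine and xarray version):

```python
import timeit

import numpy as np
import xarray as xr

arr = np.random.rand(100, 100, 100)
da = xr.DataArray(arr, dims=("time", "lat", "lon"))

# Repeatedly grab a single (lat, lon) profile from numpy vs. xarray.
t_np = timeit.timeit(lambda: arr[:, 0, 0], number=10_000)
t_xr = timeit.timeit(lambda: da.isel(lat=0, lon=0), number=10_000)
print(f"numpy: {t_np:.3f}s  xarray: {t_xr:.3f}s  slowdown: {t_xr / t_np:.0f}x")
```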

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2799/reactions",
    "total_count": 9,
    "+1": 9,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
856172272 MDU6SXNzdWU4NTYxNzIyNzI= 5144 Add chunks argument to {zeros/ones/empty}_like. nbren12 1386642 closed 0     5 2021-04-12T17:01:47Z 2023-10-25T03:18:05Z 2023-10-25T03:18:05Z CONTRIBUTOR      

Describe the solution you'd like

We have started using xarray objects as "schema" for initializing zarrs that will be written to using the region argument of to_zarr. For example,

    output_schema.to_zarr(path, compute=False)

    for region in regions:
        output = func(input_data.isel(region))
        output.to_zarr(path, region=region)

Currently, xarray's tools for computing the output_schema Dataset are lacking, since rechunking existing datasets can be slow. dask.array.zeros_like takes a chunks argument; can we add one here too?
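A sketch of the requested call, mirroring dask.array.zeros_like; the chunks keyword on xr.zeros_like is the proposal being made here, not necessarily existing API, and the template dataset and zarr path are placeholders:

```python
import numpy as np
import xarray as xr

template = xr.Dataset({"a": (("time", "x"), np.zeros((4, 3)))})

# Proposed: build a lazily-chunked, all-zero "schema" dataset directly,
# instead of rechunking an existing dataset.
schema = xr.zeros_like(template, chunks={"time": 1})

# Then write the empty store and fill it region by region, as above.
schema.to_zarr("output.zarr", compute=False)
```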

Describe alternatives you've considered

.chunk

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5144/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1473152374 I_kwDOAMm_X85XzoV2 7348 Using entry_points to register dataset and dataarray accessors? nbren12 1386642 open 0     4 2022-12-02T16:48:42Z 2023-09-14T19:53:46Z   CONTRIBUTOR      

Is your feature request related to a problem?

External libraries often use the dataset/dataarray accessor pattern (e.g. metpy). These accessors are not available until the external package where the registration occurs is imported. This means scripts using these accessors must include an often-unused import that linters will complain about, e.g.:

```python
import metpy  # linter complains here

# some data
ds: xr.Dataset = ...

ds.metpy....
```

Describe the solution you'd like

Use importlib entry points so that accessor registration is handled automatically. This is currently enabled for array backends, but not for accessors (e.g. metpy's setup.cfg).
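A rough sketch of what the loading side could look like; the group name "xarray.accessors" is hypothetical, and Python 3.10+ importlib.metadata is assumed:

```python
from importlib.metadata import entry_points

def load_accessor_plugins():
    # Each entry point would name a module whose import runs
    # register_dataset_accessor / register_dataarray_accessor.
    for ep in entry_points(group="xarray.accessors"):
        ep.load()
```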

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7348/reactions",
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
    xarray 13221727 issue
753852119 MDU6SXNzdWU3NTM4NTIxMTk= 4628 Lazy concatenation of arrays nbren12 1386642 open 0     5 2020-11-30T22:32:08Z 2022-05-10T17:02:34Z   CONTRIBUTOR      

Is your feature request related to a problem? Please describe.

Concatenating xarray objects forces the data to load. I recently learned about this object allowing lazy indexing into DataArrays/Datasets without using dask. Concatenation along a single dimension is the inverse operation of slicing, so it seems natural to also support it. Also, concatenating along dimensions (e.g. "run"/"simulation"/"ensemble") is a common merging workflow.

Describe the solution you'd like

xr.concat([a, b], dim=...) does not load any data in a or b.

Describe alternatives you've considered One could rename the variables in a and b to allow them to be merged (e.g. a['air_temperature'] -> "air_temperature_a"), but it's more natural to make a new dimension.

Additional context

This is useful when not using dask for performance reasons (e.g. using another parallelism engine like Apache Beam).
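For reference, a workaround sketch that gets lazy concatenation today by going through dask (file names are placeholders, and this is exactly the dask dependency the request wants to avoid):

```python
import xarray as xr

a = xr.open_dataset("run_a.nc", chunks={})  # lazy dask-backed variables
b = xr.open_dataset("run_b.nc", chunks={})

runs = xr.concat([a, b], dim="run")  # no data is read yet
subset = runs.isel(run=0).load()     # data is only read here
```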

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4628/reactions",
    "total_count": 8,
    "+1": 8,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
588112617 MDU6SXNzdWU1ODgxMTI2MTc= 3894 Add public API for Dataset._copy_listed nbren12 1386642 open 0     15 2020-03-26T02:39:34Z 2022-04-18T16:41:39Z   CONTRIBUTOR      

In my data pipelines, I have been repeatedly burned using indexing notation to grab a few variables from a dataset in the following way:

    ds = xr.Dataset(...)
    vars = ('a', 'b', 'c')
    ds[vars]        # this errors
    ds[list(vars)]  # this is ok

Moreover, because Dataset.__getitem__ is type unstable, this kind of error is hard to detect with mypy, so it often appears 30 minutes into a long data pipeline. It would be great to have a type-stable method that takes any sequence of variable names and returns the Dataset consisting of only those variables and their coordinates. In fact, this method already exists, but it is currently not public API. Could we make it so? Thanks.
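A minimal sketch of the kind of type-stable wrapper meant here; select_vars is a hypothetical name, not the proposed public API:

```python
from typing import Hashable, Iterable

import xarray as xr

def select_vars(ds: xr.Dataset, names: Iterable[Hashable]) -> xr.Dataset:
    """Always return a Dataset, regardless of how many names are given."""
    return ds[list(names)]

ds = xr.Dataset({"a": ("x", [1, 2]), "b": ("x", [3, 4]), "c": ("x", [5, 6])})
subset = select_vars(ds, ("a", "b"))  # mypy sees xr.Dataset, not a Union
```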

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3894/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
224846826 MDU6SXNzdWUyMjQ4NDY4MjY= 1387 FacetGrid with independent colorbars nbren12 1386642 open 0     7 2017-04-27T16:47:44Z 2022-04-13T11:07:49Z   CONTRIBUTOR      

Sometimes the magnitude of a variable can vary dramatically across a given coordinate, which makes 2d plots generated by xr.FacetGrid difficult to interpret. It would be useful if an option to xr.FacetGrid could be specified which allows each subplot to have its own colorbar.
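In the meantime, a workaround sketch: build the grid by hand with matplotlib so each panel draws its own colorbar (the example data and figure layout are made up):

```python
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Magnitude varies by orders of magnitude across "time".
scale = np.array([1.0, 10.0, 100.0, 1000.0])[:, None, None]
da = xr.DataArray(np.random.rand(4, 10, 10) * scale, dims=("time", "y", "x"))

fig, axs = plt.subplots(2, 2, figsize=(8, 6))
for i, ax in enumerate(axs.ravel()):
    da.isel(time=i).plot(ax=ax, add_colorbar=True)  # independent colorbar per panel
plt.tight_layout()
```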

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1387/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1132894350 I_kwDOAMm_X85DhpiO 6269 Adding CDL Parser/`open_cdl`? nbren12 1386642 open 0     7 2022-02-11T17:31:36Z 2022-02-14T17:18:38Z   CONTRIBUTOR      

Is your feature request related to a problem?

No.

Describe the solution you'd like

It would be nice to load/generate xarray datasets from Common Data Language (CDL) descriptions. CDL is a DSL that defines a netCDF dataset, and it is quite nice for testing. We use it to build mock datasets for e.g. integration testing of plotting routines, complex data analysis, etc. CDL provides a concise format for storing the schema of this data. This schema can be used for validation or generation (using the ncgen CLI).

CDL is basically the format produced by xarray.Dataset.info. It looks like this:

    netcdf example {   // example of CDL notation
    dimensions:
        lon = 3 ;
        lat = 8 ;
    variables:
        float rh(lon, lat) ;
            rh:units = "percent" ;
            rh:long_name = "Relative humidity" ;
    // global attributes
        :title = "Simple example, lacks some conventions" ;
    data:   // optional ...ncgen will still build
        rh = 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
             47, 53, 59, 61, 67, 71, 73, 79, 83, 89 ;
    }

I wrote a small pure-python parser for CDL last night and it seems to work! There are similar projects on github. Sadly, these projects seem to be abandoned, so it would be nice to attach this to an effort like xarray.
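In the meantime, a workaround sketch that uses the netCDF ncgen CLI (which must be installed; the file names are placeholders) to turn a CDL description into a file xarray can open:

```python
import subprocess

import xarray as xr

subprocess.run(["ncgen", "-o", "example.nc", "example.cdl"], check=True)
ds = xr.open_dataset("example.nc")
print(ds)
```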

Describe alternatives you've considered

Some kind of schema object that can be used to validate or generate an xarray Dataset, but does not contain any data.

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6269/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
484863660 MDExOlB1bGxSZXF1ZXN0MzEwNjQxMzE0 3262 [WIP] Implement 1D to ND interpolation nbren12 1386642 closed 0     9 2019-08-24T21:23:21Z 2020-12-17T01:29:12Z 2020-12-17T01:29:12Z CONTRIBUTOR   0 pydata/xarray/pulls/3262
  • [x] Closes #3252
  • [ ] Tests added
  • [ ] Passes black . && mypy . && flake8
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3262/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
334366223 MDU6SXNzdWUzMzQzNjYyMjM= 2241 Slow performance with isel on stacked coordinates nbren12 1386642 closed 0     4 2018-06-21T07:13:32Z 2020-06-20T20:51:48Z 2020-06-20T20:51:48Z CONTRIBUTOR      

Code Sample

```python
a = xr.DataArray(np.random.rand(64, 64, 64), dims=list('xyz')).chunk({'x': 8, 'y': 8})
b = a.stack(b=['x', 'y'])

%timeit b.isel(b=0).load()
# 3.81 ms ± 24.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit a.isel(x=0, y=0).load()
# 822 µs ± 3.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

np.allclose(b.isel(b=0).values, a.isel(x=0, y=0).values)
# True
```

Problem description

I have noticed some pretty significant slowdowns when using dask and stacked indices. As you can see in the example above, selecting the point x=0, y=0 takes about 4 times as long when the x and y dimensions are stacked together. This big difference only appears when .load is called. Does this mean it's a dask issue?

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.3.final.0 python-bits: 64 OS: Darwin OS-release: 17.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: None LOCALE: en_US.UTF-8 xarray: 0.10.7 pandas: 0.22.0 numpy: 1.13.3 scipy: 1.0.0 netCDF4: 1.3.0 h5netcdf: 0.4.2 h5py: 2.7.1 Nio: None zarr: 2.2.0 bottleneck: 1.2.1 cyordereddict: None dask: 0.17.1 distributed: 1.21.1 matplotlib: 2.2.2 cartopy: 0.16.0 seaborn: 0.8.1 setuptools: 39.1.0 pip: 9.0.1 conda: None pytest: 3.5.1 IPython: 6.2.1 sphinx: 1.6.5
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2241/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
636611699 MDExOlB1bGxSZXF1ZXN0NDMyNzU0MDQ5 4144 Improve typehints of xr.Dataset.__getitem__ nbren12 1386642 closed 0     10 2020-06-10T23:33:41Z 2020-06-17T01:41:27Z 2020-06-15T11:25:53Z CONTRIBUTOR   0 pydata/xarray/pulls/4144

To resolve some common type-related errors, this PR adds some overload type hints to Dataset.__getitem__. Now mypy can correctly infer that hashable inputs return DataArrays.
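A small usage check of the intended effect, assuming the overloads land roughly as described (to be verified by running mypy on a file like this):

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2, 3])})

da: xr.DataArray = ds["a"]    # a hashable key yields a DataArray
sub: xr.Dataset = ds[["a"]]   # a list of names yields a Dataset
```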

  • [x] Closes #4125
  • [x] Passes isort -rc . && black . && mypy . && flake8
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4144/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
631940742 MDU6SXNzdWU2MzE5NDA3NDI= 4125 Improving typing of `xr.Dataset.__getitem__` nbren12 1386642 closed 0     2 2020-06-05T20:40:39Z 2020-06-15T11:25:53Z 2020-06-15T11:25:53Z CONTRIBUTOR      

First, I'd like to thank the xarray devs for adding type hints to this library; not many libraries have this feature!

That said, the indexing notation of xr.Dataset does not currently play well with mypy, since it returns a Union type. This results in a lot of mypy errors like this:

    workflows/fine_res_budget/budget/budgets.py:284: error: Argument 6 to "compute_recoarsened_budget_field" has incompatible type "Union[DataArray, Dataset]"; expected "DataArray"
    workflows/fine_res_budget/budget/budgets.py:285: error: Argument 1 to "storage" has incompatible type "Union[DataArray, Dataset]"; expected "DataArray"
    workflows/fine_res_budget/budget/budgets.py:286: error: Argument "unresolved_flux" to "compute_recoarsened_budget_field" has incompatible type "Union[DataArray, Dataset]"; expected "DataArray"
    workflows/fine_res_budget/budget/budgets.py:287: error: Argument "saturation_adjustment" to "compute_recoarsened_budget_field" has incompatible type "Union[DataArray, Dataset]"; expected "DataArray"

MCVE Code Sample

```python
def func(ds: xr.Dataset):
    pass

ds: xr.Dataset = ...

# error:
# this line will give a type error because mypy doesn't know
# if ds[['a', 'b']] is a Dataset or a DataArray
func(ds[['a', 'b']])
```

Expected Output

Mypy should be able to infer that ds[['a', 'b']] is a Dataset, and that ds['a'] is a DataArray.

Problem Description

This requires any routine with type hints that consumes an output of xr.Dataset.__getitem__ to accept a Union[DataArray, Dataset], even if it is really intended to be used with only a DataArray or only a Dataset. Because ds[something] is ubiquitous syntax, this behavior accounts for approximately 50% of the mypy errors in my xarray-heavy code.

Versions

Output of <tt>xr.show_versions()</tt> In [1]: import xarray as xr xr. In [2]: xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.7.7 (default, May 7 2020, 21:25:33) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.3.0-1020-gcp machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.7.3 xarray: 0.15.1 pandas: 1.0.1 numpy: 1.18.1 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: 0.8.0 h5py: 2.10.0 Nio: None zarr: 2.4.0 cftime: 1.1.2 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.17.2 distributed: 2.17.0 matplotlib: 3.1.3 cartopy: 0.17.0 seaborn: 0.10.1 numbagg: None setuptools: 46.4.0.post20200518 pip: 20.0.2 conda: 4.8.3 pytest: 5.4.2 IPython: 7.13.0 sphinx: None

Potential solution

I think we can fix this with typing.overload. I am not too familiar with that library, but I think something like the following might work:

```python
from typing import overload

class Dataset:
    @overload
    def __getitem__(self, key: Hashable) -> DataArray: ...

    @overload
    def __getitem__(self, key: List[Hashable]) -> "Dataset": ...

    # actual implementation
    def __getitem__(self, key):
        ...
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4125/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
289837692 MDU6SXNzdWUyODk4Mzc2OTI= 1839 Add simple array creation functions for easier unit testing nbren12 1386642 closed 0     3 2018-01-19T01:53:20Z 2020-01-19T04:21:10Z 2020-01-19T04:21:10Z CONTRIBUTOR      

When I am writing unit tests for routines that involve DataArray objects, many lines of code are devoted to creating mock objects. Here is an example of a unit test I recently wrote to test some code which computes the fluid derivative of a field given the velocity.

```python
def test_material_derivative():

    dims = ['x', 'y', 'z', 'time']
    coords = {dim: np.arange(10) for dim in dims}
    shape = [coords[dim].shape[0] for dim in coords]

    f = xr.Dataset({'f': (dims, np.ones(shape))}, coords=coords)
    f = f.f

    one = 0 * f + 1
    zero = 0 * f

    md = material_derivative(zero, one, zero, f.x + 0 * f)
    np.testing.assert_array_almost_equal(md.values, 0)

    md = material_derivative(one, zero, zero, f.x + 0 * f)
    np.testing.assert_array_almost_equal(md.isel(x=slice(1, -1)).values,
                                         one.isel(x=slice(1, -1)).values)

    md = material_derivative(zero, one, zero, f.y + 0 * f)
    np.testing.assert_array_almost_equal(md.isel(y=slice(1, -1)).values,
                                         one.isel(y=slice(1, -1)).values)

    md = material_derivative(zero, zero, one, f.z + 0 * f)
    np.testing.assert_array_almost_equal(md.isel(z=slice(1, -1)).values,
                                         one.isel(z=slice(1, -1)).values)
```

As you can see, I devote many lines to initializing a 4D data array of all ones, where all the coordinates are np.arange(10) objects. It isn't too hard to do this once, but it gets pretty annoying to do many times, especially when I forget how the DataArray and Dataset constructors work. Now, I can do something like xr.DataArray(np.ones(...)), but I would still have to initialize the coordinates if I use them.

In any case, having some sort of functions like xr.ones, xr.zeros, and xr.rand which initialize the coordinates and data would be very nice.
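A minimal sketch of what such a helper could look like; ones here is a hypothetical illustration, not a proposed signature:

```python
import numpy as np
import xarray as xr

def ones(dims, size=10):
    """DataArray of ones with an arange coordinate along every dimension."""
    coords = {dim: np.arange(size) for dim in dims}
    shape = tuple(len(coords[dim]) for dim in dims)
    return xr.DataArray(np.ones(shape), dims=list(dims), coords=coords)

f = ones(["x", "y", "z", "time"])  # replaces the boilerplate in the test above
```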

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1839/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
497427114 MDU6SXNzdWU0OTc0MjcxMTQ= 3337 Dataset.groupby reductions give "Dataset does not contain dimensions error" in v0.13 nbren12 1386642 closed 0     1 2019-09-24T03:01:00Z 2019-10-10T18:23:22Z 2019-10-10T18:23:22Z CONTRIBUTOR      

MCVE Code Sample

```python
>>> ds = xr.DataArray(np.ones((4, 5)), dims=['z', 'x']).to_dataset(name='a')
>>> ds.a.groupby('z').mean()
<xarray.DataArray 'a' (z: 4)>
array([1., 1., 1., 1.])
Dimensions without coordinates: z
>>> ds.groupby('z').mean()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/common.py", line 91, in wrapped_func
    **kwargs
  File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/groupby.py", line 848, in reduce
    return self.apply(reduce_dataset)
  File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/groupby.py", line 796, in apply
    return self._combine(applied)
  File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/groupby.py", line 800, in _combine
    applied_example, applied = peek_at(applied)
  File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/utils.py", line 181, in peek_at
    peek = next(gen)
  File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/groupby.py", line 795, in <genexpr>
    applied = (func(ds, *args, **kwargs) for ds in self._iter_grouped())
  File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/groupby.py", line 846, in reduce_dataset
    return ds.reduce(func, dim, keep_attrs, **kwargs)
  File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/dataset.py", line 3888, in reduce
    "Dataset does not contain the dimensions: %s" % missing_dimensions
ValueError: Dataset does not contain the dimensions: ['z']
>>> ds.dims
Frozen(SortedKeysDict({'z': 4, 'x': 5}))
```

Problem Description

Groupby reduction operations on Dataset objects no longer seem to work in xarray v0.13. In the example above, I create an xarray Dataset with one DataArray called "a". The same groupby operation fails on the Dataset, but succeeds when called directly on "a". Is this a bug or an intended change?

In addition, the error message is confusing, since z is one of the Dataset's dimensions.

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 14:38:56) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 18.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: None libnetcdf: None xarray: 0.13.0 pandas: 0.25.1 numpy: 1.17.2 scipy: None netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None setuptools: 41.2.0 pip: 19.2.3 conda: None pytest: None IPython: None sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3337/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
261131958 MDExOlB1bGxSZXF1ZXN0MTQzNTExMTA3 1597 Add methods for combining variables of differing dimensionality nbren12 1386642 closed 0     46 2017-09-27T22:01:57Z 2019-07-05T15:59:51Z 2019-07-05T00:32:51Z CONTRIBUTOR   0 pydata/xarray/pulls/1597
  • [x] Closes #1317
  • [x] Tests added / passed
  • [x] Passes git diff upstream/master | flake8 --diff
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

While working on #1317, I settled upon combining stack and to_array to create two-dimensional numpy arrays from an xarray Dataset. Unfortunately, to_array automatically broadcasts the variables of the dataset, which is not always desirable. For instance, I was trying to combine precipitation (a horizontal field) and temperature (a 3D field) into one array.

This PR enables this by adding two new methods to xarray:

  • Dataset.stack_cat, and
  • DataArray.unstack_cat.

stack_cat uses stack, expand_dims, and concat to reshape a Dataset into a DataArray with a helpful MultiIndex, and unstack_cat reverses the process.

I implemented this functionality as a new method since to_array is such a clean method already. I really appreciate your thoughts on this. Thanks!
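For context, a rough sketch of the same reshaping with existing primitives, for a 3D temperature and a 2D precipitation field (the variable names and shapes are made up; stack_cat/unstack_cat wrap this pattern and keep a MultiIndex so it can be inverted):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "temperature": (("z", "y", "x"), np.random.rand(5, 4, 3)),
        "precip": (("y", "x"), np.random.rand(4, 3)),
    }
)

stacked = ds.stack(sample=("y", "x"))
parts = [
    stacked["temperature"].rename(z="feature"),  # (feature=5, sample)
    stacked["precip"].expand_dims("feature"),    # (feature=1, sample)
]
matrix = xr.concat(parts, dim="feature")         # (feature=6, sample) "data matrix"
```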

cc @jhamman @shoyer

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1597/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
216215022 MDU6SXNzdWUyMTYyMTUwMjI= 1317 API for reshaping DataArrays as 2D "data matrices" for use in machine learning nbren12 1386642 closed 0     9 2017-03-22T21:33:07Z 2019-07-05T00:32:51Z 2019-07-05T00:32:51Z CONTRIBUTOR      

Machine learning and linear algebra problems are often expressed in terms of operations on matrices rather than arrays of arbitrary dimension, and there is currently no convenient way to turn DataArrays (or combinations of DataArrays) into a single "data matrix".

As an example, I have needed to use scikit-learn lately with data from DataArray objects. Scikit-learn requires the data to be expressed in terms of simple 2-dimensional matrices. The rows are called samples, and the columns are known as features. It is annoying and error-prone to transpose and reshape a data array by hand to fit into this format. For instance, this github repo for xarray-aware sklearn-like objects devotes many lines of code to massaging data arrays into data matrices. I think that this reshaping workflow might be common enough to warrant some kind of treatment in xarray.

I have written some code in this gist that I have found pretty convenient for doing this. The gist has an XRReshaper class which can be used for reshaping data to and from a matrix format. The basic usage for an EOF analysis of a dataset A(lat, lon, time) can be done like this:

```python
feature_dims = ['lat', 'lon']

rs = XRReshaper(A)
data_matrix, _ = rs.to(feature_dims)

# Some linear algebra or machine learning
_, _, eofs = svd(data_matrix)

eofs_dataarray = rs.get(eofs[0], ['mode'] + feature_dims)
```

I am not sure this is the best API, but it seems to work pretty well, and I have used it here to implement some xarray-aware sklearn-like objects for PCA, which can be used like:

    feature_dims = ['lat', 'lon']
    pca = XPCA(feature_dims, n_components=10, weight=cos(A.lat))
    pca.fit(A)
    pca.transform(A)
    eofs = pca.components_

Another syntax which might be helpful is some kind of context manager approach, like:

```python
with XRReshaper(A) as rs, data_matrix:
    # do some stuff with data_matrix

    # use rs to restore output to a data array
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1317/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
294089233 MDExOlB1bGxSZXF1ZXN0MTY2OTQ5Nzcw 1885 Raise when pcolormesh coordinate is not sorted nbren12 1386642 closed 0     18 2018-02-03T06:37:34Z 2018-02-18T19:26:36Z 2018-02-18T19:06:31Z CONTRIBUTOR   0 pydata/xarray/pulls/1885
  • [x] Closes #1852 (remove if there is no corresponding issue, which should only be the case for minor changes)
  • [x] Tests added (for all bug fixes or enhancements)
  • [x] Tests passed (for all non-documentation changes)

I added a simple warning to _infer_interval_breaks in xarray/plot/plot.py. The warning does not currently say the name of the coordinate, because that would require introducing a new function or potentially passing a name argument, which seems overly complicated for such a small edit. Hopefully, this isn't a problem, because the user can easily figure out which coordinate is not sorted by process of elimination.
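For illustration, the kind of monotonicity check involved (a sketch only; the actual guard in _infer_interval_breaks may differ in name and detail):

```python
import numpy as np

def is_monotonic(coord):
    """True if the 1D coordinate is entirely non-decreasing or non-increasing."""
    deltas = np.diff(np.asarray(coord))
    return bool((deltas >= 0).all() or (deltas <= 0).all())

is_monotonic([1, 2, 3])  # True
is_monotonic([3, 1, 2])  # False -> plotting should warn/raise
```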

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1885/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
291103680 MDU6SXNzdWUyOTExMDM2ODA= 1852 bug: 2D pcolormesh plots are wrong when coordinate is not ascending order nbren12 1386642 closed 0     9 2018-01-24T07:01:07Z 2018-02-18T19:06:31Z 2018-02-18T19:06:31Z CONTRIBUTOR      

Code Sample, a copy-pastable example if possible

```python
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

x = np.arange(10)
y = np.arange(20)

np.random.shuffle(x)

x = xr.DataArray(x, dims=['x'], coords={'x': x})
y = xr.DataArray(y, dims=['y'], coords={'y': y})

z = x + y
z_sorted = z.isel(x=np.argsort(x.values))

# make plot
fig, axs = plt.subplots(1, 2, figsize=(6, 3))
z_sorted.plot(ax=axs[0])
axs[0].set_title("X is sorted")

z.plot(ax=axs[1])
axs[1].set_title("X is not sorted")
plt.tight_layout()
```

Problem description

Sometimes the coordinates in an xarray dataset are not sorted in ascending order. I recently had an issue where the time coordinate of a 2D dataset was scrambled, so calling x.plot gave very strange results. In my opinion, x.plot should probably sort the data along the coordinates, or at least provide a warning if the coordinates are unsorted.
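A one-line workaround sketch, reusing z from the snippet above: sort along the offending coordinate before plotting.

```python
z.sortby("x").plot()  # both panels then agree, since pcolormesh sees ascending x
```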

Expected Output

Here is the image generated by the snippet above:

The left and right panels should be the same.

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.2.final.0 python-bits: 64 OS: Darwin OS-release: 16.0.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.10.0+dev50.ga988dc2 pandas: 0.20.3 numpy: 1.13.1 scipy: 0.19.1 netCDF4: 1.3.1 h5netcdf: 0.5.0 Nio: None zarr: None bottleneck: 1.2.1 cyordereddict: None dask: 0.15.2 distributed: 1.18.3 matplotlib: 2.0.2 cartopy: None seaborn: 0.8.0 setuptools: 36.5.0.post20170921 pip: 9.0.1 conda: 4.3.29 pytest: 3.2.1 IPython: 6.1.0 sphinx: 1.6.3
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1852/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
258640421 MDU6SXNzdWUyNTg2NDA0MjE= 1577 Potential error in apply_ufunc docstring for input_core_dims nbren12 1386642 closed 0     5 2017-09-18T22:28:10Z 2017-10-10T04:42:21Z 2017-10-10T04:42:21Z CONTRIBUTOR      

The documentation for input_core_dims reads:

```
input_core_dims : Sequence[Sequence], optional
    List of the same length as ``args`` giving the list of core dimensions
    on each input argument that should be broadcast. By default, we assume
    there are no core dimensions on any input arguments.

    For example, ``input_core_dims=[[], ['time']]`` indicates that all
    dimensions on the first argument and all dimensions other than 'time'
    on the second argument should be broadcast.
```

The first and second paragraphs seem contradictory to me. Shouldn't the first paragraph be changed to:

List of the same length as ``args`` giving the list of core dimensions on each input argument that should *not* be broadcast. By default, we assume there are no core dimensions on any input arguments.
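A small example consistent with the corrected wording (the dimension names and the reduction are arbitrary): the core dimension is not broadcast; it is moved to the end and handed to the function whole.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(3, 4), dims=("space", "time"))

# "time" is a core dim of the input: apply_ufunc moves it to the last axis
# and does not broadcast it; np.mean then reduces over it.
result = xr.apply_ufunc(np.mean, da, input_core_dims=[["time"]], kwargs={"axis": -1})
assert result.dims == ("space",)
```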

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1577/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);