issues

8 rows where repo = 13221727 and user = 868027 sorted by updated_at descending

type 2

  • pull 6
  • issue 2

state 2

  • closed 7
  • open 1

repo 1

  • xarray · 8
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1167883842 I_kwDOAMm_X85FnH5C 6352 to_netcdf from subsetted Dataset with strings loaded from char array netCDF can sometimes fail DocOtak 868027 open 0     0 2022-03-14T04:52:38Z 2022-04-09T16:59:52Z   CONTRIBUTOR      

What happened?

Not quite sure what to actually title this, so feel free to edit it.

I have some netCDF files modeled after the Argo _prof file format (the CF Discrete Sampling Geometry incomplete multidimensional array representation). While splitting these into individual profiles, I would occasionally get exceptions complaining about broadcasting. I eventually narrowed this down to some string variables we maintain for historical purposes. Depending on which row is split out, the string data in each cell can be shorter, which results in a stringN dimension whose actual size differs from N (e.g. string4 = 3 in the CDL). If, while serializing, a different string variable that actually has length 4 is encoded, it reuses the now-incorrect string4 dim name.

The above situation seems to only occur when a netCDF file is read back into xarray and the char_dim_name encoding key is set.
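To make the suspected collision concrete, here is a toy model of it. This is only a sketch of the behavior as I understand it, not xarray's or netCDF4's actual code: the function `plan_char_dims` and its naming rule are hypothetical, and the lengths mirror the two-variable example further down.

```python
# Toy model only: this is NOT xarray's or netCDF4's actual code.
def plan_char_dims(variables):
    """Assign a char dimension to each string variable.

    `variables` maps a variable name to (max_string_length, char_dim_name),
    where char_dim_name is None when no encoding hint is present.
    Returns {dim_name: size}; raises ValueError on a size conflict.
    """
    dims = {}
    for name, (length, hint) in variables.items():
        # Hypothetical rule: honour the hint, else derive "string<N>".
        dim_name = hint if hint is not None else f"string{length}"
        existing = dims.setdefault(dim_name, length)
        if existing != length:
            raise ValueError(
                f"{name!r} needs {dim_name}={length}, "
                f"but {dim_name}={existing} already exists"
            )
    return dims


# Full dataset: both variables really need two characters.
print(plan_char_dims({"var0": (2, "string2"), "var1": (2, "string2")}))

# After selecting the row where var0 is "a": the stale hint names a
# two-character dimension for data that is now one character wide.
try:
    plan_char_dims({"var0": (1, "string2"), "var1": (2, "string2")})
except ValueError as err:
    print("conflict:", err)

# Dropping the hint lets var0 fall back to its own "string1" dimension.
print(plan_char_dims({"var0": (1, None), "var1": (2, "string2")}))
```

In this model the conflict only appears when a hint survives a subsetting step that shortens the data, which matches what I see with files that were read back in.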

What did you expect to happen?

Successful serialization to netCDF.

Minimal Complete Verifiable Example

```Python
# setup
import numpy as np
import xarray as xr

one_two = xr.DataArray(np.array(["a", "aa"], dtype="object"), dims=["dim0"])
two_two = xr.DataArray(np.array(["aa", "aa"], dtype="object"), dims=["dim0"])
ds = xr.Dataset({"var0": one_two, "var1": two_two})
ds.var0.encoding["dtype"] = "S1"
ds.var1.encoding["dtype"] = "S1"

# need to write out and read back in
ds.to_netcdf("test.nc")

# only selecting the shorter string will fail
ds1 = xr.load_dataset("test.nc")
ds1[{"dim0": 1}].to_netcdf("ok.nc")
ds1[{"dim0": 0}].to_netcdf("error.nc")

# will work if the char dim name is removed from encoding of the now shorter arr
ds1 = xr.load_dataset("test.nc")
del ds1.var0.encoding["char_dim_name"]
ds1[{"dim0": 0}].to_netcdf("will_work.nc")
```

Relevant log output

```Python
IndexError                                Traceback (most recent call last)
/var/folders/y1/63dlf4614h5d2cgr5g1t_5lh0000gn/T/ipykernel_64155/447008818.py in <module>
      2 ds1 = xr.load_dataset("test.nc")
      3 ds1[{"dim0": 1}].to_netcdf("ok.nc")
----> 4 ds1[{"dim0": 0}].to_netcdf("error.nc")

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1899         from ..backends.api import to_netcdf
   1900
-> 1901         return to_netcdf(
   1902             self,
   1903             path,

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1070         # TODO: allow this work (setting up the file for writing array data)
   1071         # to be parallelized with dask
-> 1072         dump_to_store(
   1073             dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1074         )

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1117         variables, attrs = encoder(variables, attrs)
   1118
-> 1119     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1120
   1121

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    263         self.set_attributes(attributes)
    264         self.set_dimensions(variables, unlimited_dims=unlimited_dims)
--> 265         self.set_variables(
    266             variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
    267         )

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    305             )
    306
--> 307             writer.add(source, target)
    308
    309     def set_dimensions(self, variables, unlimited_dims=None):

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in add(self, source, target, region)
    154                 target[region] = source
    155             else:
--> 156                 target[...] = source
    157
    158     def sync(self, compute=True):

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in __setitem__(self, key, value)
     70         with self.datastore.lock:
     71             data = self.get_array(needs_lock=False)
---> 72             data[key] = value
     73             if self.datastore.autoclose:
     74                 self.datastore.close(needs_lock=False)

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__setitem__()

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._put()

IndexError: size of data array does not conform to slice
```

Anything else we need to know?

I've been unable to recreate the specific error I'm getting in a minimal example. However, removing the char_dim_name encoding key does solve this.
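As a stopgap, the key can be dropped from every variable before writing. The helper below is a minimal sketch, not part of xarray: `drop_char_dim_name` is a hypothetical name, and it only assumes it is handed a mapping of names to objects that carry a dict-like `.encoding`. Stand-in objects are used so the sketch runs without any files.

```python
from types import SimpleNamespace


def drop_char_dim_name(variables):
    """Remove a stale "char_dim_name" hint from every variable's encoding.

    `variables` is any mapping of name -> object with a dict-like `.encoding`.
    Returns the names whose encoding was changed.
    """
    changed = []
    for name, var in variables.items():
        # pop() with a default is a no-op for variables without the key.
        if var.encoding.pop("char_dim_name", None) is not None:
            changed.append(name)
    return changed


# Stand-ins for xarray variables (hypothetical values for illustration).
fake_vars = {
    "var0": SimpleNamespace(encoding={"dtype": "S1", "char_dim_name": "string2"}),
    "var1": SimpleNamespace(encoding={"dtype": "S1", "char_dim_name": "string2"}),
    "pres": SimpleNamespace(encoding={"dtype": "float64"}),
}
print(drop_char_dim_name(fake_vars))  # ['var0', 'var1']
```

With a real dataset I would expect something like `drop_char_dim_name(ds1.variables)` before `to_netcdf` to have the same effect as the `del` in the example above, assuming the objects in `ds1.variables` share their encoding dicts with the dataset. I have not verified that for every xarray version.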

When digging in the xarray issues, these looked maybe relevant: #2219 #2895

Actual traceback I get with my data

```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/y1/63dlf4614h5d2cgr5g1t_5lh0000gn/T/ipykernel_64155/3328648456.py in <module>
----> 1 ds[{"N_PROF": 0}].to_netcdf("test.nc")

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1899         from ..backends.api import to_netcdf
   1900
-> 1901         return to_netcdf(
   1902             self,
   1903             path,

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1070         # TODO: allow this work (setting up the file for writing array data)
   1071         # to be parallelized with dask
-> 1072         dump_to_store(
   1073             dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1074         )

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1117         variables, attrs = encoder(variables, attrs)
   1118
-> 1119     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1120
   1121

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    263         self.set_attributes(attributes)
    264         self.set_dimensions(variables, unlimited_dims=unlimited_dims)
--> 265         self.set_variables(
    266             variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
    267         )

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    305             )
    306
--> 307             writer.add(source, target)
    308
    309     def set_dimensions(self, variables, unlimited_dims=None):

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in add(self, source, target, region)
    154                 target[region] = source
    155             else:
--> 156                 target[...] = source
    157
    158     def sync(self, compute=True):

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in __setitem__(self, key, value)
     70         with self.datastore.lock:
     71             data = self.get_array(needs_lock=False)
---> 72             data[key] = value
     73             if self.datastore.autoclose:
     74                 self.datastore.close(needs_lock=False)

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__setitem__()

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/netCDF4/utils.py in _StartCountStride(elem, shape, dimensions, grp, datashape, put, use_get_vars)
    354         fullslice = False
    355     if fullslice and datashape and put and not hasunlim:
--> 356         datashape = broadcasted_shape(shape, datashape)
    357
    358     # pad datashape with zeros for dimensions not being sliced (issue #906)

~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/netCDF4/utils.py in broadcasted_shape(shp1, shp2)
    962     a = as_strided(x, shape=shp1, strides=[0] * len(shp1))
    963     b = as_strided(x, shape=shp2, strides=[0] * len(shp2))
--> 964     return np.broadcast(a, b).shape

ValueError: shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (5,) and arg 1 with shape (6,).
```

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.9 (main, Jan 5 2022, 11:21:18) [Clang 13.0.0 (clang-1300.0.29.30)]
python-bits: 64
OS: Darwin
OS-release: 21.3.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.13.0
libnetcdf: 4.8.1

xarray: 2022.3.0
pandas: 1.3.5
numpy: 1.22.0
scipy: None
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: 0.18
sparse: None
setuptools: 58.1.0
pip: 21.2.4
conda: None
pytest: 6.2.5
IPython: 7.31.0
sphinx: 4.4.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6352/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
477427854 MDExOlB1bGxSZXF1ZXN0MzA0NzU0OTc2 3187 reduce the size of example dataset in dask docs DocOtak 868027 closed 0     4 2019-08-06T14:50:27Z 2019-08-06T20:41:39Z 2019-08-06T20:41:38Z CONTRIBUTOR   0 pydata/xarray/pulls/3187

Another attempt at getting the docs to build again on RTD (#3182).

The current failure is due to high memory usage in the dask examples. I've converted the two most memory-expensive code blocks into :verbatim: blocks so they don't use any additional memory. The contents of these blocks were taken from the last time the docs successfully built on RTD.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3187/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
477084478 MDExOlB1bGxSZXF1ZXN0MzA0NDgxOTY3 3186 bump rasterio to 1.0.24 in doc building environment DocOtak 868027 closed 0     2 2019-08-05T22:24:59Z 2019-08-06T01:20:16Z 2019-08-06T01:20:15Z CONTRIBUTOR   0 pydata/xarray/pulls/3186

This is hopefully a fix for #3182 but I wasn't sure how to really test this on read the docs (RTD) itself.

There may be a few things going on:

  • Local testing showed that removing the "auto_gallery" dir would cause the failing gallery examples to actually be recognized as failing (i.e. bust the cache).
  • https://github.com/conda-forge/rasterio-feedstock/issues/118 and some examining of the conda list output on RTD led me to think this was a deeper problem. Local testing with a bumped version of rasterio resulted in conda-forge being the channel source (rather than pip?); it also fixed the failing builds which relied on rasterio.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3186/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
476323960 MDExOlB1bGxSZXF1ZXN0MzAzOTAzNjg1 3180 enable sphinx.ext.napoleon DocOtak 868027 closed 0     3 2019-08-02T19:26:46Z 2019-08-02T21:17:43Z 2019-08-02T21:17:43Z CONTRIBUTOR   0 pydata/xarray/pulls/3180

Enables the napoleon extension in sphinx. This will interpret the numpydoc style parameters and types and convert them to the sphinx :param type name: form. Note that sphinx.ext.napoleon must come before the numpydoc extension.
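For reference, the ordering constraint looks roughly like this in a Sphinx conf.py. This is a sketch showing only the two extensions named above; a real configuration lists others as well, and the exact contents of xarray's conf.py are not reproduced here.

```python
# docs conf.py (fragment): only the two extensions discussed above are shown.
extensions = [
    "sphinx.ext.napoleon",  # listed first, so it comes before numpydoc
    "numpydoc",
]
```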

Eventually the numpydoc dependency might be removable, but currently removing it makes the wrapped ufunc documentation omit the "parameters" and "returns" sections. See the attached screenshot for what one of these ufuncs looks like with numpydoc removed.

  • [ ] Closes #3056

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3180/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
476317653 MDExOlB1bGxSZXF1ZXN0MzAzODk4NDE4 3179 remove type annotations from autodoc method signatures DocOtak 868027 closed 0     1 2019-08-02T19:07:44Z 2019-08-02T20:17:59Z 2019-08-02T20:17:58Z CONTRIBUTOR   0 pydata/xarray/pulls/3179

This PR removes all the type hints from method signatures generated by sphinx.ext.autodoc. See http://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#confval-autodoc_typehints
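As a sketch, the setting in question is a single line in conf.py. I am assuming here that "none" is the value that suppresses the hints entirely; the linked autodoc page lists the accepted values.

```python
# docs conf.py (fragment)
# Assumption: "none" hides type hints in autodoc-generated signatures.
autodoc_typehints = "none"
```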

The sphinx documentation doesn't say which version this was added in, but I imagine it is quite recent.

  • [ ] Closes #3178
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3179/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
354539078 MDExOlB1bGxSZXF1ZXN0MjExMjc3OTMz 2386 fix typo in uri in the docs DocOtak 868027 closed 0     1 2018-08-28T01:46:56Z 2018-08-28T01:49:02Z 2018-08-28T01:48:59Z CONTRIBUTOR   0 pydata/xarray/pulls/2386

Seems I left off a trailing > in the whats-new.rst. Here is a minor fix for that.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2386/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
345322908 MDExOlB1bGxSZXF1ZXN0MjA0NTA2MTg2 2322 BUG: modify behavior of Dataset.filter_by_attrs to match netCDF4.Data… DocOtak 868027 closed 0     5 2018-07-27T18:25:42Z 2018-08-28T01:48:00Z 2018-08-28T01:21:20Z CONTRIBUTOR   0 pydata/xarray/pulls/2322

Here is my fix for #2315, which changes the behavior of Dataset.filter_by_attrs to match that of its netCDF4 inspiration. I followed the pattern seen in the netCDF4 library, which sets a boolean flag while looping over attributes and short-circuits the loop if any test returns false. Only if all the key=value/callable tests pass is the variable added to the output.
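The flag-and-break pattern can be sketched on plain dictionaries as follows. This is an illustration only, not the code in the PR: the standalone function takes a mapping of variable name to attrs dict rather than a Dataset, and its handling of missing attributes (passing None to the test) is an assumption.

```python
def filter_by_attrs(variables, **conditions):
    """Return names of variables whose attrs satisfy every condition.

    `variables` maps name -> attrs dict. Each condition value is either a
    literal to compare against or a callable applied to the attribute value
    (None is passed when the attribute is missing).
    """
    selected = []
    for name, attrs in variables.items():
        has_value_flag = False
        for attr_name, pattern in conditions.items():
            attr_value = attrs.get(attr_name)
            if callable(pattern):
                has_value_flag = bool(pattern(attr_value))
            else:
                has_value_flag = attr_value == pattern
            if not has_value_flag:
                break  # short-circuit: one failed test rejects the variable
        if has_value_flag:
            selected.append(name)
    return selected


variables = {
    "var1": {"standard_name": "example1", "priority": 0},
    "var2": {"standard_name": "example2"},
}
print(filter_by_attrs(variables, standard_name="example2", priority=0))  # []
print(filter_by_attrs(variables, standard_name="example2"))  # ['var2']
print(filter_by_attrs(variables, priority=lambda v: v is not None))  # ['var1']
```

Note that with no conditions at all the flag is never set, so this sketch selects nothing.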

  • [x] Closes #2315
  • [x] Tests added (for all bug fixes or enhancements)
  • [x] Tests passed (for all non-documentation changes)
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2322/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
344631360 MDU6SXNzdWUzNDQ2MzEzNjA= 2315 Behavior of filter_by_attrs() does not match netCDF4.Dataset.get_variables_by_attributes DocOtak 868027 closed 0     8 2018-07-25T22:35:14Z 2018-08-28T01:21:20Z 2018-08-28T01:21:20Z CONTRIBUTOR      

When using the filter_by_attrs() method of a Dataset it was returning more matches than I expected. The returned match set appears to be the logical OR of all the key=value pairs passed into the method. This does not match the behavior of the netCDF4.Dataset.get_variables_by_attributes which returns the logical AND of all the key=value pairs passed into it. I couldn't tell from the documentation if this was the intended behavior.

Minimal Example

```python
import xarray as xr

example_dataset = xr.Dataset({
    "var1": xr.DataArray([], attrs={"standard_name": "example1", "priority": 0}),
    "var2": xr.DataArray([], attrs={"standard_name": "example2"})
})
example_dataset.filter_by_attrs(standard_name="example2", priority=0)
```

Example Output

```python
<xarray.Dataset>
Dimensions:  (dim_0: 0)
Dimensions without coordinates: dim_0
Data variables:
    var1     (dim_0) float64
    var2     (dim_0) float64
```

Expected Output

```python
<xarray.Dataset>
Dimensions:  ()
Data variables:
    *empty*
```

Alternatively, chaining calls to filter_by_attrs will result in the expected behavior:

```python
import xarray as xr

example_dataset = xr.Dataset({
    "var1": xr.DataArray([], attrs={"standard_name": "example1", "priority": 0}),
    "var2": xr.DataArray([], attrs={"standard_name": "example2"})
})
example_dataset.filter_by_attrs(standard_name="example2").filter_by_attrs(priority=0)

<xarray.Dataset>
Dimensions:  ()
Data variables:
    *empty*
```
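The difference between the two semantics, and why chaining single-key filters gives the AND result, can be shown on plain attribute dictionaries without xarray. This is only a sketch; the helper `matches` is a hypothetical name and the data mirrors the example above.

```python
def matches(attrs, conditions, combine):
    """Combine per-key equality tests with `any` (OR) or `all` (AND)."""
    return combine(attrs.get(key) == value for key, value in conditions.items())


variables = {
    "var1": {"standard_name": "example1", "priority": 0},
    "var2": {"standard_name": "example2"},
}
conditions = {"standard_name": "example2", "priority": 0}

# Logical OR: a variable is kept if any single key matches.
either = [n for n, a in variables.items() if matches(a, conditions, any)]

# Logical AND: a variable is kept only if every key matches.
both = [n for n, a in variables.items() if matches(a, conditions, all)]

# Chaining one-key filters narrows the set step by step, which is AND.
chained = dict(variables)
for key, value in conditions.items():
    chained = {n: a for n, a in chained.items() if a.get(key) == value}

print(either)         # ['var1', 'var2']
print(both)           # []
print(list(chained))  # []
```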

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.8
pandas: 0.23.3
numpy: 1.14.5
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.2
distributed: 1.22.0
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: None
setuptools: 39.0.1
pip: 10.0.1
conda: None
pytest: 3.6.3
IPython: 6.4.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2315/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
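
The filter described at the top of this page (repo = 13221727, user = 868027, sorted by updated_at descending) can be reproduced against this schema with Python's sqlite3 module. The sketch below uses an in-memory database with a reduced set of columns and three of the rows listed above; it is an illustration, not the query Datasette itself generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the [issues] table: only the columns the query needs.
conn.execute(
    """
    CREATE TABLE [issues] (
        [id] INTEGER PRIMARY KEY,
        [number] INTEGER,
        [user] INTEGER,
        [state] TEXT,
        [updated_at] TEXT,
        [repo] INTEGER,
        [type] TEXT
    )
    """
)
rows = [
    (344631360, 2315, 868027, "closed", "2018-08-28T01:21:20Z", 13221727, "issue"),
    (1167883842, 6352, 868027, "open", "2022-04-09T16:59:52Z", 13221727, "issue"),
    (477427854, 3187, 868027, "closed", "2019-08-06T20:41:39Z", 13221727, "pull"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort correctly as strings.
query = """
    SELECT [number], [type], [state]
    FROM [issues]
    WHERE [repo] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
"""
result = conn.execute(query, (13221727, 868027)).fetchall()
print(result)
```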