
issues


2 rows where state = "open", type = "issue" and user = 40218891 sorted by updated_at descending




Issue #8385: The method to_netcdf does not preserve chunks
id 1966264258 · node_id I_kwDOAMm_X851Ms_C · opened by yt87 (40218891) · state: open · comments: 3 · created: 2023-10-27T22:29:45Z · updated: 2023-10-31T18:51:45Z · author_association: NONE · repo: xarray (13221727) · type: issue

What happened?

The methods to_zarr and to_netcdf behave inconsistently for chunked datasets: the latter does not preserve existing chunk information, so chunks must be specified explicitly within the encoding dictionary.

What did you expect to happen?

I expected the behaviour to be consistent for all to_XXX() methods.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import dask.array as da

rng = da.random.RandomState()
shape = (20, 20)
chunks = [10, 10]
dims = ["x", "y"]
z = rng.standard_normal(shape, chunks=chunks)
ds = xr.DataArray(z, dims=dims, name="z").to_dataset()
ds.chunks

# This one is rechunked
ds.to_netcdf("/tmp/test1.nc", encoding={"z": {"chunksizes": (5, 5)}})

# This one is not rechunked, also the original chunks are lost
ds.chunk({"x": 5, "y": 5}).to_netcdf("/tmp/test2.nc")

# This one is rechunked
ds.chunk({"x": 5, "y": 5}).to_zarr("/tmp/test2", mode="w")

# Output:
# Frozen({'x': (10, 10), 'y': (10, 10)})
# <xarray.backends.zarr.ZarrStore at 0x7f3669f1af80>

xr.open_mfdataset("/tmp/test1.nc").chunks
xr.open_mfdataset("/tmp/test2.nc").chunks
xr.open_mfdataset("/tmp/test2", engine="zarr").chunks

# Frozen({'x': (5, 5, 5, 5), 'y': (5, 5, 5, 5)})
# Frozen({'x': (20,), 'y': (20,)})
# Frozen({'x': (5, 5, 5, 5), 'y': (5, 5, 5, 5)})
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

I got the same results with the h5netcdf and scipy backends, so I am not sure whether this is a bug or not. The above code is a modified version of #2198. A suggestion: the documentation provides only examples of encoding styles; it would be helpful to link to a full specification.
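In the meantime, a workaround sketch (my own code, not an xarray API; the helper name chunked_encoding and the output path are hypothetical) is to derive the encoding from each variable's existing dask chunks so that to_netcdf preserves them:

```Python
# Workaround sketch (my assumption, not part of xarray): mirror each
# dask-backed variable's current chunking into the netCDF4 "chunksizes"
# encoding, so to_netcdf writes the same on-disk chunks that to_zarr would.
def chunked_encoding(ds):
    encoding = {}
    for name, var in ds.data_vars.items():
        if var.chunks is not None:
            # var.chunks is a per-dimension tuple of block sizes; use the
            # first block of each dimension as the target chunk size.
            encoding[name] = {"chunksizes": tuple(c[0] for c in var.chunks)}
    return encoding

rechunked = ds.chunk({"x": 5, "y": 5})
rechunked.to_netcdf("/tmp/test3.nc", encoding=chunked_encoding(rechunked))
```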

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.5-1-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2

xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: 1.2.0
h5py: 3.10.0
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.10.0
distributed: 2023.10.0
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: 0.5.1
fsspec: 2023.10.0
cupy: None
pint: None
sparse: 0.14.0
flox: 0.8.1
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None
Reactions: +1 × 3 (total 3)
Issue #4830: GH2550 revisited
id 789653499 · node_id MDU6SXNzdWU3ODk2NTM0OTk= · opened by yt87 (40218891) · state: open · comments: 2 · created: 2021-01-20T05:40:16Z · updated: 2021-01-25T23:06:01Z · author_association: NONE · repo: xarray (13221727) · type: issue

Is your feature request related to a problem? Please describe.

I am retrieving files from AWS: https://registry.opendata.aws/wrf-se-alaska-snap/. An example:

```
import s3fs
import xarray as xr

s3 = s3fs.S3FileSystem(anon=True)
s3path = 's3://wrf-se-ak-ar5/gfdl/hist/daily/1980/WRFDS_1980-01-0[12].nc'
remote_files = s3.glob(s3path)
fileset = [s3.open(file) for file in remote_files]

ds = xr.open_mfdataset(fileset, concat_dim='Time', decode_cf=False)
ds
```

The data files for 1980 are missing the time coordinate, so the above code fails. The time could be obtained by parsing the file name; however, in the current implementation the source attribute is available only when the fileset consists of strings or Paths.

Describe the solution you'd like

I would suggest returning to the original proposal in #2550: pass filename_or_object as an argument to the preprocess function, but with the necessary inspection. Here is my attempt (code in open_mfdataset):

```
open_kwargs = dict(
    engine=engine, chunks=chunks or {}, lock=lock, autoclose=autoclose, **kwargs
)

if preprocess is not None:
    # Get number of free arguments
    from inspect import signature
    parms = signature(preprocess).parameters
    num_preprocess_args = len([p for p in parms.values() if p.default == p.empty])
    if num_preprocess_args not in (1, 2):
        raise ValueError('preprocess accepts only 1 or 2 arguments')

if parallel:
    import dask

    # wrap the open_dataset, getattr, and preprocess with delayed
    open_ = dask.delayed(open_dataset)
    getattr_ = dask.delayed(getattr)
    if preprocess is not None:
        preprocess = dask.delayed(preprocess)
else:
    open_ = open_dataset
    getattr_ = getattr

datasets = [open_(p, **open_kwargs) for p in paths]
file_objs = [getattr_(ds, "_file_obj") for ds in datasets]
if preprocess is not None:
    if num_preprocess_args == 1:
        datasets = [preprocess(ds) for ds in datasets]
    else:
        datasets = [preprocess(ds, p) for (ds, p) in zip(datasets, paths)]

```

With this, I can define a function *fix* as follows:

```
def fix(ds, source):
    vtime = datetime.strptime(os.path.basename(source.path), 'WRFDS_%Y-%m-%d.nc')
    return ds.assign_coords(Time=[vtime])

ds = xr.open_mfdataset(fileset, preprocess=fix, concat_dim='Time', decode_cf=False)
```

This is backward compatible; *preprocess* can accept any number of arguments:

```
from functools import partial
from pathlib import Path
import xarray as xr

def fix1(ds):
    print('fix1')
    return ds

def fix2(ds, file):
    print('fix2:', file.as_uri())
    return ds

def fix3(ds, file, arg):
    print('fix3:', file.as_uri(), arg)
    return ds

fileset = [Path('/home/george/Downloads/WRFDS_1988-04-23.nc'),
           Path('/home/george/Downloads/WRFDS_1988-04-24.nc')]
ds = xr.open_mfdataset(fileset, preprocess=fix1, concat_dim='Time', parallel=True)
ds = xr.open_mfdataset(fileset, preprocess=fix2, concat_dim='Time')
ds = xr.open_mfdataset(fileset, preprocess=partial(fix3, arg='additional argument'), concat_dim='Time')
```

Output:

```
fix1
fix1
fix2: file:///home/george/Downloads/WRFDS_1988-04-23.nc
fix2: file:///home/george/Downloads/WRFDS_1988-04-24.nc
fix3: file:///home/george/Downloads/WRFDS_1988-04-23.nc additional argument
fix3: file:///home/george/Downloads/WRFDS_1988-04-24.nc additional argument
```

Describe alternatives you've considered

The simple solution would be to make xarray s3fs-aware. IMHO this is not particularly elegant: either a check for an attribute or an import within a try/except block would be needed.
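For concreteness, a minimal sketch of that attribute check, assuming s3fs/fsspec file objects expose the remote key as a .path attribute (the helper name source_name is mine):

```
import os

def source_name(filename_or_obj):
    # Plain strings and Path objects already carry their own name.
    if isinstance(filename_or_obj, (str, os.PathLike)):
        return os.fspath(filename_or_obj)
    # File-like objects from s3fs/fsspec expose the remote key as .path;
    # return None when no source name is recoverable.
    return getattr(filename_or_obj, 'path', None)
```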

Reactions: none
