issues
2 rows where repo = 13221727, state = "open", and user = 40218891, sorted by updated_at descending
id | node_id | number | title | user | state | locked | comments | created_at | updated_at | author_association | repo | type
---|---|---|---|---|---|---|---|---|---|---|---|---
1966264258 | I_kwDOAMm_X851Ms_C | 8385 | The method to_netcdf does not preserve chunks | yt87 (40218891) | open | 0 | 3 | 2023-10-27T22:29:45Z | 2023-10-31T18:51:45Z | NONE | xarray (13221727) | issue
789653499 | MDU6SXNzdWU3ODk2NTM0OTk= | 4830 | GH2550 revisited | yt87 (40218891) | open | 0 | 2 | 2021-01-20T05:40:16Z | 2021-01-25T23:06:01Z | NONE | xarray (13221727) | issue
Issue 8385: The method to_netcdf does not preserve chunks

What happened?

Methods to_netcdf and to_zarr handle dask chunks differently: to_netcdf drops the original chunks unless chunk sizes are passed explicitly through encoding, while to_zarr preserves them.

What did you expect to happen?

I expected the behaviour to be consistent for all output methods.

Minimal Complete Verifiable Example

```python
import xarray as xr
import dask.array as da

rng = da.random.RandomState()
shape = (20, 20)
chunks = [10, 10]
dims = ["x", "y"]
z = rng.standard_normal(shape, chunks=chunks)
ds = xr.DataArray(z, dims=dims, name="z").to_dataset()
ds.chunks
# Frozen({'x': (10, 10), 'y': (10, 10)})

# This one is rechunked
ds.to_netcdf("/tmp/test1.nc", encoding={"z": {"chunksizes": (5, 5)}})

# This one is not rechunked, also original chunks are lost
ds.chunk({"x": 5, "y": 5}).to_netcdf("/tmp/test2.nc")

# This one is rechunked
ds.chunk({"x": 5, "y": 5}).to_zarr("/tmp/test2", mode="w")
# <xarray.backends.zarr.ZarrStore at 0x7f3669f1af80>

xr.open_mfdataset("/tmp/test1.nc").chunks
# Frozen({'x': (5, 5, 5, 5), 'y': (5, 5, 5, 5)})
xr.open_mfdataset("/tmp/test2.nc").chunks
# Frozen({'x': (20,), 'y': (20,)})
xr.open_mfdataset("/tmp/test2", engine="zarr").chunks
# Frozen({'x': (5, 5, 5, 5), 'y': (5, 5, 5, 5)})
```

MVCE confirmation
Relevant log output

No response

Anything else we need to know?

I did get the same results for …

Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.5-1-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2
xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: 1.2.0
h5py: 3.10.0
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.10.0
distributed: 2023.10.0
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: 0.5.1
fsspec: 2023.10.0
cupy: None
pint: None
sparse: 0.14.0
flox: 0.8.1
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None
Reactions (https://api.github.com/repos/pydata/xarray/issues/8385/reactions): total 3, +1: 3
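Not part of the issue, but a possible interim workaround consistent with the first example in the MVCE: copy the dataset's dask chunk sizes into the netCDF encoding before writing. A minimal sketch (the helper name `to_netcdf_keep_chunks` is made up; it assumes dask-backed variables and the netCDF4 engine, which accepts a `chunksizes` encoding):

```python
import xarray as xr

def to_netcdf_keep_chunks(ds: xr.Dataset, path: str) -> None:
    """Write ds to netCDF, carrying its dask chunk sizes into the encoding."""
    encoding = {}
    for name, var in ds.data_vars.items():
        if var.chunks is not None:
            # DataArray.chunks is a per-dimension tuple of block sizes;
            # use the first block along each dimension as the on-disk
            # chunk shape (assumes uniform chunking).
            encoding[name] = {"chunksizes": tuple(c[0] for c in var.chunks)}
    ds.to_netcdf(path, encoding=encoding)
```

With this helper, `to_netcdf_keep_chunks(ds.chunk({"x": 5, "y": 5}), "/tmp/test2.nc")` should round-trip with (5, 5) chunks, matching the zarr behaviour shown above.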
Issue 4830: GH2550 revisited

Is your feature request related to a problem? Please describe.

I am retrieving files from AWS: https://registry.opendata.aws/wrf-se-alaska-snap/. An example:

```python
import s3fs
import xarray as xr

s3 = s3fs.S3FileSystem(anon=True)
s3path = 's3://wrf-se-ak-ar5/gfdl/hist/daily/1980/WRFDS_1980-01-0[12].nc'
remote_files = s3.glob(s3path)
fileset = [s3.open(file) for file in remote_files]
ds = xr.open_mfdataset(fileset, concat_dim='Time', decode_cf=False)
ds
```

Data files for 1980 are missing the time coordinate, so the above code fails. The time could be obtained by parsing the file name, but in the current implementation the source attribute is available only when the fileset consists of strings or Paths.

Describe the solution you'd like

I would suggest returning to the original suggestion in #2550: pass filename_or_object as an argument to the preprocess function, but with the necessary inspection. Here is my attempt (code in open_mfdataset):

```python
open_kwargs = dict(
    engine=engine, chunks=chunks or {}, lock=lock, autoclose=autoclose, **kwargs
)
```
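The attempt snippet breaks off after `open_kwargs`. As a purely hypothetical sketch of the "necessary inspection" (the helper name `apply_preprocess` and the parameter-counting approach are illustrative, not from the issue), open_mfdataset could dispatch on the preprocess signature like this:

```python
import inspect
from typing import Any, Callable

def apply_preprocess(preprocess: Callable[..., Any], ds: Any, path: Any) -> Any:
    """Call preprocess with (ds,) or (ds, path) depending on its signature."""
    sig = inspect.signature(preprocess)
    # Count required positional parameters. Arguments already bound with
    # functools.partial acquire defaults, so partial(fix3, arg=...) below
    # still counts as a two-argument preprocess.
    required = [
        p
        for p in sig.parameters.values()
        if p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]
    if len(required) == 1:
        return preprocess(ds)
    return preprocess(ds, path)
```

Existing one-argument preprocess functions would keep working unchanged, which appears to be the backwards-compatibility concern behind #2550.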
With such a change, the AWS example could then pass a preprocess that also receives the file:

```python
# 'fix' stands for any preprocess function of the two-argument form.
ds = xr.open_mfdataset(fileset, preprocess=fix, concat_dim='Time', decode_cf=False)
```

A quick test with local files, mixing one- and two-argument preprocess functions (extra arguments bound with functools.partial):

```python
from functools import partial
from pathlib import Path

def fix1(ds):
    print('fix1')
    return ds

def fix2(ds, file):
    print('fix2:', file.as_uri())
    return ds

def fix3(ds, file, arg):
    print('fix3:', file.as_uri(), arg)
    return ds

fileset = [Path('/home/george/Downloads/WRFDS_1988-04-23.nc'),
           Path('/home/george/Downloads/WRFDS_1988-04-24.nc')]

ds = xr.open_mfdataset(fileset, preprocess=fix1, concat_dim='Time', parallel=True)
ds = xr.open_mfdataset(fileset, preprocess=fix2, concat_dim='Time')
ds = xr.open_mfdataset(fileset, preprocess=partial(fix3, arg='additional argument'),
                       concat_dim='Time')
```
Reactions (https://api.github.com/repos/pydata/xarray/issues/4830/reactions): none
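As a side note (not part of the issue): with a two-argument preprocess of the fix2 form, the missing 1980 time coordinate could be reconstructed from the file name. A rough sketch, assuming each daily file contributes a single step along the Time dimension and that the open file object exposes its key as `.path` (fsspec file objects, including those from s3fs, do):

```python
import re
from datetime import datetime

import xarray as xr

def add_time_from_name(ds: xr.Dataset, file) -> xr.Dataset:
    # Keys look like .../WRFDS_1980-01-01.nc; recover the date and attach
    # it as the Time coordinate missing from the 1980 files.
    name = str(getattr(file, "path", file))
    m = re.search(r"WRFDS_(\d{4}-\d{2}-\d{2})\.nc$", name)
    if m is None:
        return ds  # leave datasets with unrecognised names untouched
    t = datetime.strptime(m.group(1), "%Y-%m-%d")
    return ds.assign_coords(Time=("Time", [t]))
```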
```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
```
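For reference, the two rows on this page correspond to a query like the following against that schema, shown here via Python's built-in sqlite3 module (the database file name is a placeholder):

```python
import sqlite3

# "github.db" is a placeholder for the SQLite file that holds this schema.
con = sqlite3.connect("github.db")
rows = con.execute(
    """
    SELECT [id], [number], [title], [updated_at]
    FROM [issues]
    WHERE [repo] = ? AND [state] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
    """,
    (13221727, "open", 40218891),
).fetchall()
print(rows)
```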