issues

3 rows where user = 18426352 sorted by updated_at descending

#7109: Multiprocessing unable to pickle Dataset opened with open_mfdataset
  • id: 1391738128 · node_id: I_kwDOAMm_X85S9D0Q
  • user: DanielAdriaansen (18426352) · author_association: CONTRIBUTOR
  • state: closed · locked: 0 · comments: 4
  • created_at: 2022-09-30T02:43:43Z · updated_at: 2022-10-11T16:44:36Z · closed_at: 2022-10-11T16:44:35Z

What happened?

When passing a Dataset object opened using open_mfdataset to a function via Python's multiprocessing.Pool module, I received the following error: AttributeError: Can't pickle local object 'open_mfdataset.<locals>.multi_file_closer'

What did you expect to happen?

I expected the Dataset to be handed off to the function via multiprocessing without error. I can avoid the error by subsetting variables or applying some other reduction (e.g. via where), so I don't understand why the original Dataset object returned from open_mfdataset cannot be used.
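
For context, the failure is ordinary Python behaviour rather than anything data-specific: pickle serializes functions by qualified name, so a function defined inside another function (here open_mfdataset.<locals>.multi_file_closer, attached to the returned Dataset as its closer) cannot be looked up and re-imported by the unpickler. A minimal sketch, independent of xarray and using hypothetical names:

```Python
import pickle

def make_container():
    # Analogous to open_mfdataset defining multi_file_closer locally and
    # attaching it to the Dataset it returns.
    def local_closer():
        pass
    return {"close": local_closer}

obj = make_container()
try:
    pickle.dumps(obj)
except Exception as err:
    # On Python 3.9 this raises AttributeError: Can't pickle local object
    # 'make_container.<locals>.local_closer'; recent Python versions may
    # raise pickle.PicklingError with similar wording.
    print(type(err).__name__, err)
```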

Minimal Complete Verifiable Example

```Python
#!/usr/bin/env python

import xarray as xr
import numpy as np
import glob
import multiprocessing

# Create toy DataArrays
temperature = np.array([[273.15, 220.2, 255.5], [221.1, 260.1, 270.5]])
humidity = np.array([[70.2, 85.4, 29.6], [30.3, 55.4, 100.0]])
da1 = xr.DataArray(temperature, dims=['y0', 'x0'],
                   coords={'y0': np.array([0, 1]), 'x0': np.array([0, 1, 2])})
da2 = xr.DataArray(humidity, dims=['y0', 'x0'],
                   coords={'y0': np.array([0, 1]), 'x0': np.array([0, 1, 2])})

# Create a toy Dataset
ds = xr.Dataset({'TEMP_K': da1, 'RELHUM': da2})

# Write the toy Dataset to disk
ds.to_netcdf('xarray_pickle_dataset.nc')

# Function to use with open_mfdataset
def preprocess(ds):
    ds = ds.rename({'TEMP_K': 'temp_k'})
    return ds

# Function for using with multiprocessing
def calc_stats(ds, stat_name):
    if stat_name == 'mean':
        return ds.mean(dim=['y0']).to_dataframe()

# Get a pool of workers
mp = multiprocessing.Pool(5)

# Glob for the file
ncfiles = glob.glob('xarray*.nc')

# Can we call open_mfdataset() on a ds in memory?
# datasets = [xr.open_dataset(x) for x in ncfiles]
datasets = [xr.open_mfdataset([x], preprocess=preprocess) for x in ncfiles]

# TEST 1: ERROR
results = mp.starmap(calc_stats, [(ds, 'mean') for ds in datasets])
print(results)

# TEST 2: PASS
results = mp.starmap(calc_stats, [(ds[['temp_k', 'RELHUM']], 'mean') for ds in datasets])
print(results)

# TEST 3: ERROR
results = mp.starmap(calc_stats, [(ds.isel(x0=0), 'mean') for ds in datasets])
print(results)

# TEST 4: PASS
results = mp.starmap(calc_stats, [(ds.where(ds.RELHUM > 80.0), 'mean') for ds in datasets])
print(results)

# TEST 5: ERROR
results = mp.starmap(calc_stats, [(ds.sel(x0=slice(0, 1, 1)), 'mean') for ds in datasets])
print(results)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
Traceback (most recent call last):
  File "/d1/git/xarray_pickle_dataset.py", line 35, in <module>
    results = mp.starmap(calc_stats,[(ds,'mean') for ds in datasets])
  File "/home/.conda/envs/icing/lib/python3.9/multiprocessing/pool.py", line 372, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/home/.conda/envs/icing/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/home/.conda/envs/icing/lib/python3.9/multiprocessing/pool.py", line 537, in _handle_tasks
    put(task)
  File "/home/.conda/envs/icing/lib/python3.9/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/.conda/envs/icing/lib/python3.9/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'open_mfdataset.<locals>.multi_file_closer'
```

Anything else we need to know?

Not shown in the verifiable example is another way I was able to get it to work, which looked like this:

results = mp.starmap(calc_stats,[(ds.sel(x0=ds.xvalues,y0=ds.yvalues),'mean') for ds in datasets])
print(results)

I can only assume that, under the hood, passing ds.xvalues (a 1-D DataArray within the Dataset) to sel transforms the Dataset enough to avoid the pickling error.

The error does NOT occur when using open_dataset; e.g., datasets = [xr.open_dataset(x) for x in ncfiles] works. However, in my workflow I would prefer open_mfdataset so I can apply some preprocessing via preprocess, even though I am only opening one Dataset at a time.
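
A possible interim workaround, sketched under the assumption that preprocess only needs to be applied once per file (as in the MVCE above), is to call it by hand on each open_dataset result, so the locally defined closer never enters the picture:

```Python
# Reuses xr, preprocess, calc_stats, ncfiles and the Pool `mp` from the MVCE
# above. Hypothetical workaround: apply the preprocess step manually instead
# of via open_mfdataset, whose local closer cannot be pickled.
datasets = [preprocess(xr.open_dataset(x)) for x in ncfiles]
results = mp.starmap(calc_stats, [(ds, 'mean') for ds in datasets])
print(results)
```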

Environment

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:25:59) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 4.19.0-21-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.22.3
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.05.0
distributed: 2022.5.0
matplotlib: 3.5.1
cartopy: 0.20.2
seaborn: None
numbagg: None
fsspec: 2022.3.0
cupy: None
pint: 0.19.2
sparse: None
setuptools: 62.1.0
pip: 22.0.4
conda: None
pytest: None
IPython: None
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7109/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  • state_reason: completed · repo: xarray (13221727) · type: issue
#7116 (pull request): Fix pickling of Datasets created using open_mfdataset
  • id: 1394947888 · node_id: PR_kwDOAMm_X85AESPp
  • user: DanielAdriaansen (18426352) · author_association: CONTRIBUTOR
  • state: closed · locked: 0 · comments: 2 · draft: 0 · pull_request: pydata/xarray/pulls/7116
  • created_at: 2022-10-03T15:42:41Z · updated_at: 2022-10-11T16:44:35Z · closed_at: 2022-10-11T16:44:35Z
  • [x] Closes #7109
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7116/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  • repo: xarray (13221727) · type: pull
#4733: Xarray with cfgrib backend errors with .where() when drop=True
  • id: 774553196 · node_id: MDU6SXNzdWU3NzQ1NTMxOTY=
  • user: DanielAdriaansen (18426352) · author_association: CONTRIBUTOR
  • state: closed · locked: 0 · comments: 4
  • created_at: 2020-12-24T20:26:58Z · updated_at: 2021-01-02T08:17:37Z · closed_at: 2021-01-02T08:17:37Z

What happened: When loading a HRRR GRIBv2 file in this manner:

ds = xr.open_dataset(gribfile_path,engine='cfgrib',backend_kwargs={'filter_by_keys':{'typeOfLevel':'hybrid'}})

I have trouble using the .where() method when drop=True. If I set drop=False, it works fine. I am attempting to subset via latitude and longitude like this:

ds_sub = ds.where((ds.latitude>=40.0)&(ds.latitude<=50.0)&(ds.longitude>=252.0)&(ds.longitude<=280.0),drop=True)

However, I receive the following error:

```Python
Traceback (most recent call last):
  File "read_interp_format_grib.py", line 77, in <module>
    test = ds.where(mask_lat & mask_lon,drop=True)
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/common.py", line 1268, in where
    return ops.where_method(self, cond, other)
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/ops.py", line 193, in where_method
    return apply_ufunc(
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/computation.py", line 1092, in apply_ufunc
    return apply_dataset_vfunc(
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/computation.py", line 410, in apply_dataset_vfunc
    result_vars = apply_dict_of_variables_vfunc(
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/computation.py", line 356, in apply_dict_of_variables_vfunc
    result_vars[name] = func(*variable_args)
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/computation.py", line 606, in apply_variable_ufunc
    input_data = [
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/computation.py", line 607, in <listcomp>
    broadcast_compat_data(arg, broadcast_dims, core_dims)
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/computation.py", line 522, in broadcast_compat_data
    data = variable.data
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/variable.py", line 359, in data
    return self.values
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/variable.py", line 510, in values
    return _as_array_or_item(self._data)
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/variable.py", line 272, in _as_array_or_item
    data = np.asarray(data)
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/numpy/core/_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/indexing.py", line 685, in __array__
    self._ensure_cached()
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/indexing.py", line 682, in _ensure_cached
    self.array = NumpyIndexingAdapter(np.asarray(self.array))
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/numpy/core/_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/indexing.py", line 655, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/numpy/core/_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/indexing.py", line 560, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/backends/cfgrib_.py", line 23, in __getitem__
    return indexing.explicit_indexing_adapter(
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/indexing.py", line 848, in explicit_indexing_adapter
    result = NumpyIndexingAdapter(np.asarray(result))[numpy_indices]
  File "/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/indexing.py", line 1294, in __getitem__
    return array[key]
IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed
```

What you expected to happen: I expect the dataset to be reduced along the x and y (latitude/longitude) dimensions wherever cond is False when calling .where() with drop=True.
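
To make that expectation concrete, here is a toy in-memory example (hypothetical data, no GRIB involved) of how where(cond, drop=True) trims a dimension:

```Python
import numpy as np
import xarray as xr

# 1-D toy analogue of the latitude subsetting above: latitude is a
# non-dimension coordinate along y.
lat = xr.DataArray(np.linspace(30.0, 60.0, 7), dims='y')
ds = xr.Dataset({'t2m': ('y', np.arange(7.0))}, coords={'latitude': lat})

sub = ds.where((ds.latitude >= 40.0) & (ds.latitude <= 50.0), drop=True)
print(sub.sizes)  # y shrinks from 7 to 3: only the rows inside the box remain
```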

Minimal Complete Verifiable Example:

```python
# Put your MCVE code here
```

Anything else we need to know?: I was able to confirm cfgrib is where the issue lies by doing the following:

ds = xr.open_dataset(gribfile_path,engine='cfgrib',backend_kwargs={'filter_by_keys':{'typeOfLevel':'hybrid'}})
ds.to_netcdf('test.nc')
dsnc = xr.open_dataset('test.nc')
ds_sub = dsnc.where((dsnc.latitude>=40.0)&(dsnc.latitude<=50.0)&(dsnc.longitude>=252.0)&(dsnc.longitude<=280.0),drop=True)

That correctly gives me: Dimensions: (hybrid: 50, x: 792, y: 414)

Originally x=1799 and y=1059.
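
Another avenue that might avoid the round trip through NetCDF, sketched under the assumption that the failure comes from lazy indexing in the cfgrib backend, is to force the variables into memory before subsetting:

```Python
import xarray as xr

gribfile_path = 'hrrr_example.grib2'  # placeholder file name (assumption)

ds = xr.open_dataset(gribfile_path, engine='cfgrib',
                     backend_kwargs={'filter_by_keys': {'typeOfLevel': 'hybrid'}})
ds = ds.load()  # materialize everything as NumPy arrays, detaching the lazy backend

# Untested against this exact file; if lazy indexing is the culprit, the
# in-memory Dataset should now behave like the NetCDF round trip above.
ds_sub = ds.where((ds.latitude >= 40.0) & (ds.latitude <= 50.0)
                  & (ds.longitude >= 252.0) & (ds.longitude <= 280.0), drop=True)
```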

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.9.0-14-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.1
pandas: 1.1.3
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.5.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.5
iris: None
bottleneck: None
dask: 2.30.0
distributed: 2.30.0
matplotlib: 3.3.2
cartopy: 0.17.0
seaborn: None
numbagg: None
pint: 0.16.1
setuptools: 49.6.0.post20200917
pip: 20.2.3
conda: None
pytest: None
IPython: None
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4733/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  • state_reason: completed · repo: xarray (13221727) · type: issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
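
Given this schema, the row selection shown at the top of the page ("3 rows where user = 18426352 sorted by updated_at descending") can be reproduced against a local copy of the database; a minimal sketch, assuming the SQLite file is named github.db:

```Python
import sqlite3

# 'github.db' is a placeholder name for a local copy of this database.
conn = sqlite3.connect('github.db')
rows = conn.execute(
    'select number, type, title, updated_at from issues '
    'where "user" = ? order by updated_at desc',
    (18426352,),
).fetchall()
for number, kind, title, updated_at in rows:
    print(f'#{number} [{kind}] {title} ({updated_at})')
conn.close()
```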