issues
5 rows where user = 33886395 sorted by updated_at descending
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at ▲ | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1962680526 | I_kwDOAMm_X850_CDO | 8377 | Slow performance with groupby using a custom DataArray grouper | alessioarena 33886395 | closed | 0 | 6 | 2023-10-26T04:28:00Z | 2024-02-15T22:44:18Z | 2024-02-15T22:44:18Z | NONE | What is your issue?

I have code that calculates a per-pixel nearest-neighbour match between two datasets, and then performs a groupby + aggregation. The calculation I perform is generally lazy using dask. I recently noticed slow performance of groupby used this way, with the lazy calculations alone taking in excess of 10 minutes for an index of approximately 4000 by 4000. I did a bit of digging around and noticed that the slow line is this:

```Python
Timer unit: 1e-09 s

Total time: 0.263679 s
File: /env/lib/python3.10/site-packages/xarray/core/duck_array_ops.py
Function: array_equiv at line 260

Line #      Hits         Time  Per Hit   % Time  Line Contents
   260                                           def array_equiv(arr1, arr2):
   261                                               """Like np.array_equal, but also allows values to be NaN in both arrays"""
   262     22140   96490101.0   4358.2     36.6      arr1 = asarray(arr1)
   263     22140   34155953.0   1542.7     13.0      arr2 = asarray(arr2)
   264     22140  119855572.0   5413.5     45.5      lazy_equiv = lazy_array_equiv(arr1, arr2)
   265     22140    7390478.0    333.8      2.8      if lazy_equiv is None:
   266                                                   with warnings.catch_warnings():
   267                                                       warnings.filterwarnings("ignore", "In the future, 'NAT == x'")
   268                                                       flag_array = (arr1 == arr2) | (isnull(arr1) & isnull(arr2))
   269                                                       return bool(flag_array.all())
   270                                               else:
   271     22140    5787053.0    261.4      2.2          return lazy_equiv

Total time: 242.247 s
File: /env/lib/python3.10/site-packages/xarray/core/indexing.py
Function: __getitem__ at line 1419

Line #      Hits         Time  Per Hit   % Time  Line Contents
  1419                                           def __getitem__(self, key):
  1420     22140   26764337.0   1208.9      0.0      if not isinstance(key, VectorizedIndexer):
  1421                                                   # if possible, short-circuit when keys are effectively slice(None)
  1422                                                   # This preserves dask name and passes lazy array equivalence checks
  1423                                                   # (see duck_array_ops.lazy_array_equiv)
  1424     22140   10513930.0    474.9      0.0          rewritten_indexer = False
  1425     22140    4602305.0    207.9      0.0          new_indexer = []
  1426     66420   61804870.0    930.5      0.0          for idim, k in enumerate(key.tuple):
  1427     88560   78516641.0    886.6      0.0              if isinstance(k, Iterable) and (
  1428     22140  151748667.0   6854.1      0.1                  not is_duck_dask_array(k)
  1429     22140        2e+11    1e+07     93.6                  and duck_array_ops.array_equiv(k, np.arange(self.array.shape[idim]))
  1430                                                       ):
  1431                                                           new_indexer.append(slice(None))
  1432                                                           rewritten_indexer = True
  1433                                                       else:
  1434     44280   40322984.0    910.6      0.0                  new_indexer.append(k)
  1435     22140    4847251.0    218.9      0.0          if rewritten_indexer:
  1436                                                       key = type(key)(tuple(new_indexer))
  1437
```

The test at line 1429 is the culprit: although the equivalence check itself is performed by `array_equiv`, the array to test against is always created from scratch using `np.arange`, as this cProfile summary shows:

```
Ordered by: internal time

ncalls              tottime  percall  cumtime  percall  filename:lineno(function)
22140               225.296    0.010  225.296    0.010  {built-in method numpy.arange}
177123                3.192    0.000    3.670    0.000  inspect.py:2920(__init__)
110702/110701         2.180    0.000    2.180    0.000  {built-in method numpy.asarray}
11690863/11668723     2.036    0.000    5.043    0.000  {built-in method builtins.isinstance}
287827                1.876    0.000    3.768    0.000  utils.py:25(meta_from_array)
132843                1.872    0.000    7.649    0.000  inspect.py:2280(_signature_from_function)
974166                1.485    0.000    2.558    0.000  inspect.py:2637(__init__)
```
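The report does not include a reproducer; the following is a minimal sketch of the pattern it describes, with hypothetical sizes, chunking, and an integer `label` array standing in for the per-pixel nearest-neighbour match:

```python
import dask.array as da
import numpy as np
import xarray as xr

# Lazy data plus a dense per-pixel label array (the custom DataArray grouper).
data = xr.DataArray(da.random.random((4000, 4000), chunks=1000), dims=("y", "x"))
labels = xr.DataArray(
    np.random.randint(0, 100, size=(4000, 4000)), dims=("y", "x"), name="label"
)

# Grouping by a non-index, multi-dimensional DataArray exercises the
# vectorized indexing path profiled above; building the lazy result was
# the slow step reported here.
result = data.groupby(labels).mean()
```
 |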
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8377/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 1 } |
completed | xarray 13221727 | issue | ||||||
1397532790 | I_kwDOAMm_X85TTKh2 | 7132 | Saving a DataArray of datetime objects as zarr is not a lazy operation despite compute=False | alessioarena 33886395 | closed | 0 | 2 | 2022-10-05T09:50:34Z | 2024-01-29T19:12:32Z | 2024-01-29T19:12:32Z | NONE | What happened?

Trying to save a lazy xr.DataArray of datetime objects as zarr forces a dask compute operation and retrieves the data to the local notebook. This is generally not a problem for indices of datetime objects, as those are already stored locally and are generally small in size. However, if the whole underlying array holds datetime objects, that can be a serious problem: in my case it simply crashed the scheduler upon attempting to retrieve the data persisted on workers.

I managed to isolate the problem to the call stack shown in the log output below; the failure happens in the `encode_cf_datetime` call at the bottom of that stack.

What did you expect to happen?

Storing the data in zarr format should be performed directly by the dask workers, bypassing the scheduler/Client, if `compute=False` is passed.

Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr
import dask.array as da

test = xr.DataArray(
    data=da.full((20000, 20000), np.datetime64('2005-02-25T03:30', 'ns')),
    coords={'x': range(20000), 'y': range(20000)}
).to_dataset(name='test')

print(test.test.dtype)
# dtype('<M8[ns]')

test.to_zarr('test.zarr', compute=False)
# this will take a while and trigger the computation of the array.
# No data will be actually saved though
```

MVCE confirmation
Relevant log output

```Python
File /env/lib/python3.8/site-packages/xarray/core/dataset.py:2036, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   2033 if encoding is None:
   2034     encoding = {}
-> 2036 return to_zarr(
   2037     self,
   2038     store=store,
   2039     chunk_store=chunk_store,
   2040     storage_options=storage_options,
   2041     mode=mode,
   2042     synchronizer=synchronizer,
   2043     group=group,
   2044     encoding=encoding,
   2045     compute=compute,
   2046     consolidated=consolidated,
   2047     append_dim=append_dim,
   2048     region=region,
   2049     safe_chunks=safe_chunks,
   2050 )

File /env/lib/python3.8/site-packages/xarray/backends/api.py:1431, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   1429 writer = ArrayWriter()
   1430 # TODO: figure out how to properly handle unlimited_dims
-> 1431 dump_to_store(dataset, zstore, writer, encoding=encoding)
   1432 writes = writer.sync(compute=compute)
   1434 if compute:

File /env/lib/python3.8/site-packages/xarray/backends/api.py:1119, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1116 if encoder:
   1117     variables, attrs = encoder(variables, attrs)
-> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File /env/lib/python3.8/site-packages/xarray/backends/zarr.py:500, in ZarrStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    498 new_variables = set(variables) - existing_variable_names
    499 variables_without_encoding = {vn: variables[vn] for vn in new_variables}
--> 500 variables_encoded, attributes = self.encode(
    501     variables_without_encoding, attributes
    502 )
    504 if existing_variable_names:
    505     # Decode variables directly, without going via xarray.Dataset to
    506     # avoid needing to load index variables into memory.
    507     # TODO: consider making loading indexes lazy again?
    508     existing_vars, _, _ = conventions.decode_cf_variables(
    509         self.get_variables(), self.get_attrs()
    510     )

File /env/lib/python3.8/site-packages/xarray/backends/common.py:200, in AbstractWritableDataStore.encode(self, variables, attributes)
    183 def encode(self, variables, attributes):
    184     """
    185     Encode the variables and attributes in this store
    186     (...)
    198
    199     """
--> 200     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    201     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
    202     return variables, attributes

File /env/lib/python3.8/site-packages/xarray/backends/common.py:200, in <dictcomp>(.0)
    183 def encode(self, variables, attributes):
    184     """
    185     Encode the variables and attributes in this store
    186     (...)
    198
    199     """
--> 200     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    201     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
    202     return variables, attributes

File /env/lib/python3.8/site-packages/xarray/backends/zarr.py:459, in ZarrStore.encode_variable(self, variable)
    458 def encode_variable(self, variable):
--> 459     variable = encode_zarr_variable(variable)
    460     return variable

File /env/lib/python3.8/site-packages/xarray/backends/zarr.py:258, in encode_zarr_variable(var, needs_copy, name)
    237 def encode_zarr_variable(var, needs_copy=True, name=None):
    238     """
    239     Converts an Variable into an Variable which follows some
    240     of the CF conventions:
    (...)
    255     A variable which has been encoded as described above.
    256     """
--> 258     var = conventions.encode_cf_variable(var, name=name)
    260     # zarr allows unicode, but not variable-length strings, so it's both
    261     # simpler and more compact to always encode as UTF-8 explicitly.
    262     # TODO: allow toggling this explicitly via dtype in encoding.
    263     coder = coding.strings.EncodedStringCoder(allows_unicode=True)

File /env/lib/python3.8/site-packages/xarray/conventions.py:273, in encode_cf_variable(var, needs_copy, name)
    264 ensure_not_multiindex(var, name=name)
    266 for coder in [
    267     times.CFDatetimeCoder(),
    268     times.CFTimedeltaCoder(),
    (...)
    271     variables.UnsignedIntegerCoder(),
    272 ]:
--> 273     var = coder.encode(var, name=name)
    275 # TODO(shoyer): convert all of these to use coders, too:
    276 var = maybe_encode_nonstring_dtype(var, name=name)

File /env/lib/python3.8/site-packages/xarray/coding/times.py:659, in CFDatetimeCoder.encode(self, variable, name)
    655 dims, data, attrs, encoding = unpack_for_encoding(variable)
    656 if np.issubdtype(data.dtype, np.datetime64) or contains_cftime_datetimes(
    657     variable
    658 ):
--> 659     (data, units, calendar) = encode_cf_datetime(
    660         data, encoding.pop("units", None), encoding.pop("calendar", None)
    661     )
    662     safe_setitem(attrs, "units", units, name=name)
    663     safe_setitem(attrs, "calendar", calendar, name=name)

File /env/lib/python3.8/site-packages/xarray/coding/times.py:592, in encode_cf_datetime(dates, units, calendar)
    582 def encode_cf_datetime(dates, units=None, calendar=None):
    583     """Given an array of datetime objects, returns the tuple
```

Anything else we need to know?

Our system uses dask_gateway in an AWS infrastructure (S3 for storage).

Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Jun 22 2022, 20:18:18)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.209-116.367.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.4
libnetcdf: 4.7.3
xarray: 2022.3.0
pandas: 1.5.0
numpy: 1.22.4
scipy: 1.9.1
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.2
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.9.2
distributed: 2022.9.2
matplotlib: 3.6.0
cartopy: 0.20.2
seaborn: 0.12.0
numbagg: None
fsspec: 2022.8.2
cupy: None
pint: None
sparse: 0.13.0
setuptools: 65.4.1
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: None
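Not part of the original report: one possible workaround sketch for the versions above. The idea is to pre-encode the datetimes as lazy int64 nanoseconds so the CF datetime coder has nothing to compute eagerly; the units string and the round-trip behaviour on decode are assumptions.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Same lazy datetime array, but stored as int64 nanoseconds since the
# epoch; the astype cast is itself lazy, so nothing is computed here.
raw = da.full((20000, 20000), np.datetime64('2005-02-25T03:30', 'ns'))
test = xr.DataArray(
    raw.astype('int64'),
    coords={'x': range(20000), 'y': range(20000)},
    attrs={'units': 'nanoseconds since 1970-01-01'},  # assumed units string
).to_dataset(name='test')

# With integer data there is no datetime encoding step, so this stays lazy.
delayed = test.to_zarr('test_int.zarr', compute=False)
```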
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7132/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1307523148 | I_kwDOAMm_X85N7zhM | 6803 | Passing a distributed.Future to the kwargs of apply_ufunc should resolve the future | alessioarena 33886395 | closed | 0 | 10 | 2022-07-18T07:31:28Z | 2024-01-09T18:21:15Z | 2023-12-19T05:40:20Z | NONE | What is your issue?

I am trying to scatter a large array and pass it as a keyword argument to a function applied using `xr.apply_ufunc`. Here is a minimal example to reproduce this issue:

```python
import dask.array as da
import xarray as xr
import numpy as np
from distributed import Client

client = Client()  # any distributed client; needed for scatter below

data = xr.DataArray(
    data=da.random.random((15, 15, 20)),
    coords={'x': range(15), 'y': range(15), 'z': range(20)},
    dims=('x', 'y', 'z')
)

test = np.full((20,), 30)
test_future = client.scatter(test, broadcast=True)

def _copy_test(d, test=None):
    return test

new_data_actual = xr.apply_ufunc(
    _copy_test,
    data,
    input_core_dims=[['z']],
    output_core_dims=[['new_z']],
    vectorize=True,
    dask='parallelized',
    output_dtypes="float64",
    kwargs={'test': test},
    dask_gufunc_kwargs={'output_sizes': {'new_z': 20}}
)

new_data_future = xr.apply_ufunc(
    _copy_test,
    data,
    input_core_dims=[['z']],
    output_core_dims=[['new_z']],
    vectorize=True,
    dask='parallelized',
    output_dtypes="float64",
    kwargs={'test': test_future},
    dask_gufunc_kwargs={'output_sizes': {'new_z': 20}}
)

data[0, 0].compute()
# [0.3034994 , 0.08172002, 0.34731092, ...]

new_data_actual[0, 0].compute()
# [30.0, 30.0, 30.0, ...]

new_data_future[0, 0].compute()
# KilledWorker
```

I tried several variations of this without success. Am I trying to do something completely silly, or is this unexpected behavior?
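Not the resolution adopted in the issue, just a workaround sketch: resolve any `distributed.Future` in the kwargs on the client before calling `apply_ufunc`. `resolve_futures` is a hypothetical helper, and note that `Future.result()` copies the scattered data back to the client, giving up part of the benefit of scattering:

```python
from distributed import Future

def resolve_futures(kwargs):
    # Replace any distributed.Future with its concrete result before
    # apply_ufunc embeds the kwargs in the task graph.
    return {k: v.result() if isinstance(v, Future) else v
            for k, v in kwargs.items()}

new_data = xr.apply_ufunc(
    _copy_test, data,  # from the example above
    input_core_dims=[['z']], output_core_dims=[['new_z']],
    vectorize=True, dask='parallelized', output_dtypes="float64",
    kwargs=resolve_futures({'test': test_future}),
    dask_gufunc_kwargs={'output_sizes': {'new_z': 20}},
)
```
 |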
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6803/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
754789691 | MDU6SXNzdWU3NTQ3ODk2OTE= | 4637 | Support for monotonically decreasing indices in interpolate_na | alessioarena 33886395 | open | 0 | 3 | 2020-12-01T22:58:23Z | 2023-03-31T18:10:24Z | NONE | Currently `interpolate_na` supports only monotonically increasing indices. The current workaround is to flip the image before and after the interpolation (see the sketch below); however, it should not be a lot of effort to support all monotonic indices directly within `interpolate_na`.
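The flip workaround described above, as a minimal sketch (the decreasing `y` coordinate and the sample values are assumptions):

```python
import numpy as np
import xarray as xr

# A monotonically decreasing y coordinate, as is common for imagery.
da = xr.DataArray(
    [1.0, np.nan, 3.0, np.nan, 5.0],
    dims="y",
    coords={"y": [4.0, 3.0, 2.0, 1.0, 0.0]},
)

# Flip to an increasing index, interpolate, then flip back.
flipped = da.isel(y=slice(None, None, -1))
filled = flipped.interpolate_na(dim="y").isel(y=slice(None, None, -1))
```
 |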
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4637/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
584865241 | MDU6SXNzdWU1ODQ4NjUyNDE= | 3872 | Operations resulting in np.timedelta64 are not properly coerced | alessioarena 33886395 | closed | 0 | 2 | 2020-03-20T06:14:30Z | 2020-03-23T20:55:54Z | 2020-03-23T20:55:53Z | NONE | It seems that operations resulting in `np.timedelta64` values are not properly coerced, so the result is given the datetime accessor rather than the timedelta one. The arithmetic itself follows the numpy documentation describing datetime arithmetic resulting in timedelta objects (http://lagrange.univ-lyon1.fr/docs/numpy/1.11.0/reference/arrays.datetime.html#datetime-and-timedelta-arithmetic).

MCVE Code Sample

```python
# this is a DataArray of type np.datetime64
da = xr.DataArray(
    data=pd.date_range('2020-01-01', '2020-01-30', freq='D')
)

# this simple arithmetic will result in np.timedelta64
delta = (da - np.datetime64('2020-01-01'))

type(delta.data[0])
# > numpy.timedelta64

type(delta.dt)
# > xarray.core.accessor_dt.DatetimeAccessor
```

Expected Output

```python
type(delta.dt)
# > xarray.core.accessor_dt.TimedeltaAccessor
```

Problem Description

Having a `DatetimeAccessor` attached to timedelta data exposes the wrong set of `.dt` attributes for the result.

Versions

Output of `xr.show_versions()`:

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.10 | packaged by conda-forge | (default, Mar 5 2020, 10:05:08) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.12.14-95.32-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.14.1
pandas: 0.24.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.1.2
pydap: None
h5netcdf: 0.8.0
h5py: 2.9.0
Nio: None
zarr: 2.2.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.12.0
distributed: 2.12.0
matplotlib: 3.0.3
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 45.2.0.post20200209
pip: 20.0.2
conda: None
pytest: None
IPython: 7.13.0
sphinx: None
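Not part of the report: on the versions listed above, a workaround sketch is to route the (1-D) timedelta data through pandas to reach timedelta fields, since `.dt` here resolves to the datetime accessor; `days` is just one example field:

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(data=pd.date_range('2020-01-01', '2020-01-30', freq='D'))
delta = da - np.datetime64('2020-01-01')

# Build the timedelta field via pandas instead of the .dt accessor.
days = xr.DataArray(pd.TimedeltaIndex(delta.values).days, dims=delta.dims)
print(days.values)  # [ 0  1  2 ... 29]
```
 |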
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3872/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue |
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);