issues


5 rows where user = 33886395 sorted by updated_at descending


#8377: Slow performance with groupby using a custom DataArray grouper

opened by alessioarena · closed (completed) · 6 comments · created 2023-10-26T04:28:00Z · updated 2024-02-15T22:44:18Z · closed 2024-02-15T22:44:18Z

What is your issue?

I have code that calculates a per-pixel nearest-neighbor match between two datasets, and then performs a groupby + aggregation. The calculation I perform is generally lazy, using dask.

I recently noticed slow performance of groupby used this way, with the lazy calculation taking in excess of 10 minutes for an index of approximately 4000 by 4000.

I did a bit of digging around and noticed that the slow line is this:

```Python
Timer unit: 1e-09 s

Total time: 0.263679 s
File: /env/lib/python3.10/site-packages/xarray/core/duck_array_ops.py
Function: array_equiv at line 260

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   260                                           def array_equiv(arr1, arr2):
   261                                               """Like np.array_equal, but also allows values to be NaN in both arrays"""
   262     22140   96490101.0   4358.2     36.6      arr1 = asarray(arr1)
   263     22140   34155953.0   1542.7     13.0      arr2 = asarray(arr2)
   264     22140  119855572.0   5413.5     45.5      lazy_equiv = lazy_array_equiv(arr1, arr2)
   265     22140    7390478.0    333.8      2.8      if lazy_equiv is None:
   266                                                   with warnings.catch_warnings():
   267                                                       warnings.filterwarnings("ignore", "In the future, 'NAT == x'")
   268                                                       flag_array = (arr1 == arr2) | (isnull(arr1) & isnull(arr2))
   269                                                       return bool(flag_array.all())
   270                                               else:
   271     22140    5787053.0    261.4      2.2          return lazy_equiv

Total time: 242.247 s
File: /env/lib/python3.10/site-packages/xarray/core/indexing.py
Function: __getitem__ at line 1419

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1419                                           def __getitem__(self, key):
  1420     22140   26764337.0   1208.9      0.0      if not isinstance(key, VectorizedIndexer):
  1421                                                   # if possible, short-circuit when keys are effectively slice(None)
  1422                                                   # This preserves dask name and passes lazy array equivalence checks
  1423                                                   # (see duck_array_ops.lazy_array_equiv)
  1424     22140   10513930.0    474.9      0.0          rewritten_indexer = False
  1425     22140    4602305.0    207.9      0.0          new_indexer = []
  1426     66420   61804870.0    930.5      0.0          for idim, k in enumerate(key.tuple):
  1427     88560   78516641.0    886.6      0.0              if isinstance(k, Iterable) and (
  1428     22140  151748667.0   6854.1      0.1                  not is_duck_dask_array(k)
  1429     22140        2e+11    1e+07     93.6                  and duck_array_ops.array_equiv(k, np.arange(self.array.shape[idim]))
  1430                                                       ):
  1431                                                           new_indexer.append(slice(None))
  1432                                                           rewritten_indexer = True
  1433                                                       else:
  1434     44280   40322984.0    910.6      0.0                  new_indexer.append(k)
  1435     22140    4847251.0    218.9      0.0          if rewritten_indexer:
  1436                                                       key = type(key)(tuple(new_indexer))
  1437
  1438     22140   24251221.0   1095.4      0.0      if isinstance(key, BasicIndexer):
  1439                                                   return self.array[key.tuple]
  1440     22140    9613954.0    434.2      0.0      elif isinstance(key, VectorizedIndexer):
  1441                                                   return self.array.vindex[key.tuple]
  1442                                               else:
  1443     22140    8618414.0    389.3      0.0          assert isinstance(key, OuterIndexer)
  1444     22140   26601491.0   1201.5      0.0          key = key.tuple
  1445     22140    6010672.0    271.5      0.0          try:
  1446     22140        2e+10 678487.7      6.2              return self.array[key]
  1447                                                   except NotImplementedError:
  1448                                                       # manual orthogonal indexing.
  1449                                                       # TODO: port this upstream into dask in a saner way.
  1450                                                       value = self.array
  1451                                                       for axis, subkey in reversed(list(enumerate(key))):
  1452                                                           value = value[(slice(None),) * axis + (subkey,)]
  1453                                                       return value
```

The test `duck_array_ops.array_equiv(k, np.arange(self.array.shape[idim]))` is repeated many times, and although each individual call is reasonably fast, the total cost adds up. It could be minimized by introducing a prior test of equal length, like:

```python
if isinstance(k, Iterable) and (
    not is_duck_dask_array(k)
    and len(k) == self.array.shape[idim]
    and duck_array_ops.array_equiv(k, np.arange(self.array.shape[idim]))
):
```

This would help because, even though `array_equiv` performs an equivalent check internally, the array to test against is always created with `np.arange` first, and that allocation is ultimately the bottleneck:

```Python
         74992059 function calls (73375414 primitive calls) in 298.934 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    22140  225.296    0.010  225.296    0.010 {built-in method numpy.arange}
   177123    3.192    0.000    3.670    0.000 inspect.py:2920(__init__)
110702/110701    2.180    0.000    2.180    0.000 {built-in method numpy.asarray}
11690863/11668723    2.036    0.000    5.043    0.000 {built-in method builtins.isinstance}
   287827    1.876    0.000    3.768    0.000 utils.py:25(meta_from_array)
   132843    1.872    0.000    7.649    0.000 inspect.py:2280(_signature_from_function)
   974166    1.485    0.000    2.558    0.000 inspect.py:2637(__init__)
```
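The proposed length guard can be sketched in isolation. `maybe_rewrite` below is a hypothetical stand-in for the logic inside `__getitem__`, not xarray's actual code; it only illustrates why a cheap `len` comparison avoids building the `np.arange` array in the common case:

```python
import numpy as np

def maybe_rewrite(k, axis_len):
    # Compare lengths first: np.arange(axis_len) (the profiled bottleneck)
    # is only allocated when the indexer could actually match it.
    if len(k) == axis_len and np.array_equal(k, np.arange(axis_len)):
        return slice(None)   # indexer is effectively slice(None)
    return k                 # cheap length check short-circuited

# A short indexer against a huge axis never allocates np.arange(16_000_000)
assert isinstance(maybe_rewrite(np.array([0, 1, 2]), 4000 * 4000), np.ndarray)
# A full-range indexer is still rewritten to slice(None)
assert maybe_rewrite(np.arange(5), 5) == slice(None)
```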

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8377/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
#7132: Saving a DataArray of datetime objects as zarr is not a lazy operation despite compute=False

opened by alessioarena · closed (completed) · 2 comments · created 2022-10-05T09:50:34Z · updated 2024-01-29T19:12:32Z · closed 2024-01-29T19:12:32Z

What happened?

Trying to save a lazy xr.DataArray of datetime objects as zarr forces a dask.compute operation and retrieves the data to the local notebook. This is generally not a problem for indices of datetime objects, as those are already stored locally and are generally small in size.

However, if the whole underlying array is a datetime object, that can be a serious problem. In my case it simply crashed the scheduler upon attempting to retrieve the data persisted on workers.

I managed to isolate the problem to the call stack shown in the log output. The issue is in the encode_cf_datetime function.

What did you expect to happen?

Storing the data in zarr format should be performed directly by the dask workers, bypassing the scheduler/Client, when compute=True, and should be a completely lazy operation when compute=False.

Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr
import dask.array as da

test = xr.DataArray(
    data=da.full((20000, 20000), np.datetime64('2005-02-25T03:30', 'ns')),
    coords={'x': range(20000), 'y': range(20000)}
).to_dataset(name='test')

print(test.test.dtype)
# dtype('<M8[ns]')

test.to_zarr('test.zarr', compute=False)
# this will take a while and trigger the computation of the array.
# No data will be actually saved though
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
File /env/lib/python3.8/site-packages/xarray/core/dataset.py:2036, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   2033 if encoding is None:
   2034     encoding = {}
-> 2036 return to_zarr(
   2037     self,
   2038     store=store,
   2039     chunk_store=chunk_store,
   2040     storage_options=storage_options,
   2041     mode=mode,
   2042     synchronizer=synchronizer,
   2043     group=group,
   2044     encoding=encoding,
   2045     compute=compute,
   2046     consolidated=consolidated,
   2047     append_dim=append_dim,
   2048     region=region,
   2049     safe_chunks=safe_chunks,
   2050 )

File /env/lib/python3.8/site-packages/xarray/backends/api.py:1431, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
   1429 writer = ArrayWriter()
   1430 # TODO: figure out how to properly handle unlimited_dims
-> 1431 dump_to_store(dataset, zstore, writer, encoding=encoding)
   1432 writes = writer.sync(compute=compute)
   1434 if compute:

File /env/lib/python3.8/site-packages/xarray/backends/api.py:1119, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1116 if encoder:
   1117     variables, attrs = encoder(variables, attrs)
-> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File /env/lib/python3.8/site-packages/xarray/backends/zarr.py:500, in ZarrStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    498 new_variables = set(variables) - existing_variable_names
    499 variables_without_encoding = {vn: variables[vn] for vn in new_variables}
--> 500 variables_encoded, attributes = self.encode(
    501     variables_without_encoding, attributes
    502 )
    504 if existing_variable_names:
    505     # Decode variables directly, without going via xarray.Dataset to
    506     # avoid needing to load index variables into memory.
    507     # TODO: consider making loading indexes lazy again?
    508     existing_vars, _, _ = conventions.decode_cf_variables(
    509         self.get_variables(), self.get_attrs()
    510     )

File /env/lib/python3.8/site-packages/xarray/backends/common.py:200, in AbstractWritableDataStore.encode(self, variables, attributes)
    183 def encode(self, variables, attributes):
    184     """
    185     Encode the variables and attributes in this store
    186     (...)
    198
    199     """
--> 200     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    201     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
    202     return variables, attributes

File /env/lib/python3.8/site-packages/xarray/backends/common.py:200, in <dictcomp>(.0)
    183 def encode(self, variables, attributes):
    184     """
    185     Encode the variables and attributes in this store
    186     (...)
    198
    199     """
--> 200     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    201     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
    202     return variables, attributes

File /env/lib/python3.8/site-packages/xarray/backends/zarr.py:459, in ZarrStore.encode_variable(self, variable)
    458 def encode_variable(self, variable):
--> 459     variable = encode_zarr_variable(variable)
    460     return variable

File /env/lib/python3.8/site-packages/xarray/backends/zarr.py:258, in encode_zarr_variable(var, needs_copy, name)
    237 def encode_zarr_variable(var, needs_copy=True, name=None):
    238     """
    239     Converts an Variable into an Variable which follows some
    240     of the CF conventions:
    (...)
    255     A variable which has been encoded as described above.
    256     """
--> 258     var = conventions.encode_cf_variable(var, name=name)
    260     # zarr allows unicode, but not variable-length strings, so it's both
    261     # simpler and more compact to always encode as UTF-8 explicitly.
    262     # TODO: allow toggling this explicitly via dtype in encoding.
    263     coder = coding.strings.EncodedStringCoder(allows_unicode=True)

File /env/lib/python3.8/site-packages/xarray/conventions.py:273, in encode_cf_variable(var, needs_copy, name)
    264 ensure_not_multiindex(var, name=name)
    266 for coder in [
    267     times.CFDatetimeCoder(),
    268     times.CFTimedeltaCoder(),
    (...)
    271     variables.UnsignedIntegerCoder(),
    272 ]:
--> 273     var = coder.encode(var, name=name)
    275 # TODO(shoyer): convert all of these to use coders, too:
    276 var = maybe_encode_nonstring_dtype(var, name=name)

File /env/lib/python3.8/site-packages/xarray/coding/times.py:659, in CFDatetimeCoder.encode(self, variable, name)
    655 dims, data, attrs, encoding = unpack_for_encoding(variable)
    656 if np.issubdtype(data.dtype, np.datetime64) or contains_cftime_datetimes(
    657     variable
    658 ):
--> 659     (data, units, calendar) = encode_cf_datetime(
    660         data, encoding.pop("units", None), encoding.pop("calendar", None)
    661     )
    662 safe_setitem(attrs, "units", units, name=name)
    663 safe_setitem(attrs, "calendar", calendar, name=name)

File /env/lib/python3.8/site-packages/xarray/coding/times.py:592, in encode_cf_datetime(dates, units, calendar)
    582 def encode_cf_datetime(dates, units=None, calendar=None):
    583     """Given an array of datetime objects, returns the tuple (num, units,
    584     calendar) suitable for a CF compliant time variable.
    585     (...)
    590     cftime.date2num
    591     """
--> 592     dates = np.asarray(dates)
    594     if units is None:
    595         units = infer_datetime_units(dates)
```

Anything else we need to know?

Our system uses dask_gateway in an AWS infrastructure (S3 for storage).

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.209-116.367.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.4
libnetcdf: 4.7.3
xarray: 2022.3.0
pandas: 1.5.0
numpy: 1.22.4
scipy: 1.9.1
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.2
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.9.2
distributed: 2022.9.2
matplotlib: 3.6.0
cartopy: 0.20.2
seaborn: 0.12.0
numbagg: None
fsspec: 2022.8.2
cupy: None
pint: None
sparse: 0.13.0
setuptools: 65.4.1
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7132/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#6803: Passing a distributed.Future to the kwargs of apply_ufunc should resolve the future

opened by alessioarena · closed (completed) · 10 comments · created 2022-07-18T07:31:28Z · updated 2024-01-09T18:21:15Z · closed 2023-12-19T05:40:20Z

What is your issue?

I am trying to scatter a large array and pass it as a keyword argument to a function applied using apply_ufunc, but that is currently not working. The same function works if I provide the actual array, but if I provide the Future linked to the scattered data, the task fails.

Here is a minimal example to reproduce this issue

```python
import dask.array as da
import xarray as xr
import numpy as np

data = xr.DataArray(
    data=da.random.random((15, 15, 20)),
    coords={'x': range(15), 'y': range(15), 'z': range(20)},
    dims=('x', 'y', 'z')
)

test = np.full((20,), 30)
test_future = client.scatter(test, broadcast=True)

def _copy_test(d, test=None):
    return test

new_data_actual = xr.apply_ufunc(
    _copy_test,
    data,
    input_core_dims=[['z']],
    output_core_dims=[['new_z']],
    vectorize=True,
    dask='parallelized',
    output_dtypes="float64",
    kwargs={'test': test},
    dask_gufunc_kwargs={'output_sizes': {'new_z': 20}}
)

new_data_future = xr.apply_ufunc(
    _copy_test,
    data,
    input_core_dims=[['z']],
    output_core_dims=[['new_z']],
    vectorize=True,
    dask='parallelized',
    output_dtypes="float64",
    kwargs={'test': test_future},
    dask_gufunc_kwargs={'output_sizes': {'new_z': 20}}
)

data[0, 0].compute()
# [0.3034994 , 0.08172002, 0.34731092, ...]

new_data_actual[0, 0].compute()
# [30.0, 30.0, 30.0, ...]

new_data_future[0, 0].compute()
# KilledWorker
```

I tried different versions of this, from explicitly calling test.result() to changing the way the Future was passed, but nothing worked. I also tried raising exceptions within the function, and various ways of printing information, but that did not work either. This makes me think that when passing a Future, execution never actually reaches the scope of that function.

Am I trying to do something completely silly, or is this unexpected behavior?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6803/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#4637: Support for monotonically decreasing indices in interpolate_na

opened by alessioarena · open · 3 comments · created 2020-12-01T22:58:23Z · updated 2023-03-31T18:10:24Z

Currently interpolate_na requires all indices to be monotonically increasing. For geographical datasets, indices are generally monotonic; however, in the Southern hemisphere the latitude is monotonically decreasing.

The current workaround is to flip the image before and after the interpolation; however, it should not take much effort to support all monotonic indices directly within interpolate_na.
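The flip workaround described above can be sketched as follows; `lat` is a hypothetical monotonically decreasing coordinate standing in for a Southern-hemisphere latitude axis:

```python
import numpy as np
import xarray as xr

# 'lat' decreases, as it would for a Southern-hemisphere raster.
da = xr.DataArray(
    [3.0, np.nan, 1.0],
    dims='lat',
    coords={'lat': [30.0, 20.0, 10.0]},   # monotonically decreasing
)

filled = (
    da.isel(lat=slice(None, None, -1))    # flip so the index increases
      .interpolate_na(dim='lat')          # linear interpolation now allowed
      .isel(lat=slice(None, None, -1))    # flip back to the original order
)
```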

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4637/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#3872: Operations resulting in np.timedelta64 are not properly coerced

opened by alessioarena · closed (completed) · 2 comments · created 2020-03-20T06:14:30Z · updated 2020-03-23T20:55:54Z · closed 2020-03-23T20:55:53Z

It seems that operations resulting in timedelta64 (for example, datetime64 arithmetic) are not properly coerced. In fact, the result of such an operation is an xarray object whose dt accessor is of type xarray.core.accessor_dt.DatetimeAccessor instead of the expected xarray.core.accessor_dt.TimedeltaAccessor.

This follows the numpy documentation describing datetime arithmetic as resulting in timedelta objects (http://lagrange.univ-lyon1.fr/docs/numpy/1.11.0/reference/arrays.datetime.html#datetime-and-timedelta-arithmetic).

MCVE Code Sample

```python
import numpy as np
import pandas as pd
import xarray as xr

# this is a DataArray of type np.datetime64
da = xr.DataArray(data=pd.date_range('2020-01-01', '2020-01-30', freq='D'))

# this simple arithmetic will result in np.timedelta64
delta = (da - np.datetime64('2020-01-01'))

type(delta.data[0])
# > numpy.timedelta64

type(delta.dt)
# > xarray.core.accessor_dt.DatetimeAccessor
```

Expected Output

```python
type(delta.dt)
# > xarray.core.accessor_dt.TimedeltaAccessor
```

Problem Description

Having .data of type timedelta64, it would be beneficial for the .dt accessor to be of type TimedeltaAccessor. This would allow representing such timedelta64 values using the relevant time units, like days.
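Until the coercion is fixed, one workaround sketch is to bypass the .dt accessor and convert the underlying numpy data directly; this assumes in-memory numpy data (not dask), and `days` is a hypothetical name:

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(data=pd.date_range('2020-01-01', '2020-01-30', freq='D'))
delta = da - np.datetime64('2020-01-01')

# Workaround sketch: divide the underlying timedelta64 data by a unit
# timedelta to express it in the desired unit, without touching .dt at all.
days = delta.data / np.timedelta64(1, 'D')   # plain float64 days
```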

Versions

Output of `xr.show_versions()`:

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.10 | packaged by conda-forge | (default, Mar 5 2020, 10:05:08) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.12.14-95.32-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.14.1
pandas: 0.24.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.1.2
pydap: None
h5netcdf: 0.8.0
h5py: 2.9.0
Nio: None
zarr: 2.2.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.12.0
distributed: 2.12.0
matplotlib: 3.0.3
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 45.2.0.post20200209
pip: 20.0.2
conda: None
pytest: None
IPython: 7.13.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3872/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
