issues
4 rows where type = "issue" and user = 102827 sorted by updated_at descending
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
289342234 | MDU6SXNzdWUyODkzNDIyMzQ= | 1836 | HDF5 error when working with compressed NetCDF files and the dask multiprocessing scheduler | cchwala 102827 | open | 0 | 5 | 2018-01-17T17:05:56Z | 2022-06-21T14:50:02Z | CONTRIBUTOR | Code Sample, a copy-pastable example if possible
```python
import xarray as xr
import numpy as np
import dask.multiprocessing

# Generate dummy data and build xarray dataset
mat = np.random.rand(10, 90, 90)
ds = xr.Dataset(data_vars={'foo': (('time', 'x', 'y'), mat)})

# Write dataset to netcdf without compression
ds.to_netcdf('dummy_data_3d.nc')
# Write with zlib compression
ds.to_netcdf('dummy_data_3d_with_compression.nc', encoding={'foo': {'zlib': True}})
# Write data as int16 with scale factor applied
ds.to_netcdf('dummy_data_3d_with_scale_factor.nc', encoding={'foo': {'dtype': 'int16', 'scale_factor': 0.01, '_FillValue': -9999}})

# Load data from netCDF files
ds_vanilla = xr.open_dataset('dummy_data_3d.nc', chunks={'time': 1})
ds_scaled = xr.open_dataset('dummy_data_3d_with_scale_factor.nc', chunks={'time': 1})
ds_compressed = xr.open_dataset('dummy_data_3d_with_compression.nc', chunks={'time': 1})

# Do computation using dask's multiprocessing scheduler
foo = ds_vanilla.foo.mean(dim=['x', 'y']).compute(get=dask.multiprocessing.get)
foo = ds_scaled.foo.mean(dim=['x', 'y']).compute(get=dask.multiprocessing.get)
foo = ds_compressed.foo.mean(dim=['x', 'y']).compute(get=dask.multiprocessing.get)
# The last line fails
```
Problem description
If NetCDF files are compressed (which is often the case) and opened with chunking enabled to use them with dask, computations using the multiprocessing scheduler fail. The above code shows this in a short example. The last line fails with a long HDF5 error log:
```
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 140736213758912:
#000: H5Dio.c line 171 in H5Dread(): can't read data
major: Dataset
minor: Read failed
#001: H5Dio.c line 544 in H5D__read(): can't read data
major: Dataset
minor: Read failed
#002: H5Dchunk.c line 2022 in H5D__chunk_read(): error looking up chunk address
major: Dataset
minor: Can't get value
#003: H5Dchunk.c line 2768 in H5D__chunk_lookup(): can't query chunk address
major: Dataset
minor: Can't get value
#004: H5Dbtree.c line 1047 in H5D__btree_idx_get_addr(): can't get chunk info
major: Dataset
minor: Can't get value
#005: H5B.c line 341 in H5B_find(): unable to load B-tree node
major: B-Tree node
minor: Unable to protect metadata
#006: H5AC.c line 1763 in H5AC_protect(): H5C_protect() failed
major: Object cache
minor: Unable to protect metadata
#007: H5C.c line 2561 in H5C_protect(): can't load entry
major: Object cache
minor: Unable to load metadata into cache
#008: H5C.c line 6877 in H5C_load_entry(): Can't deserialize image
major: Object cache
minor: Unable to load metadata into cache
#009: H5Bcache.c line 181 in H5B__cache_deserialize(): wrong B-tree signature
major: B-Tree node
minor: Bad value
Traceback (most recent call last):
File "hdf5_bug_minimal_working_example.py", line 27, in <module>
foo = ds_compressed.foo.mean(dim=['x', 'y']).compute(get=dask.multiprocessing.get)
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/xarray/core/dataarray.py", line 658, in compute
return new.load(**kwargs)
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/xarray/core/dataarray.py", line 632, in load
ds = self._to_temp_dataset().load(**kwargs)
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/xarray/core/dataset.py", line 491, in load
evaluated_data = da.compute(*lazy_data.values(), **kwargs)
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/base.py", line 333, in compute
results = get(dsk, keys, **kwargs)
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/multiprocessing.py", line 177, in get
raise_exception=reraise, **kwargs)
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/local.py", line 521, in get_async
raise_exception(exc, tb)
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/local.py", line 290, in execute_task
result = _execute_task(task, data)
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/local.py", line 270, in _execute_task
args2 = [_execute_task(a, cache) for a in args]
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/local.py", line 270, in _execute_task
args2 = [_execute_task(a, cache) for a in args]
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/local.py", line 267, in _execute_task
return [_execute_task(a, cache) for a in arg]
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/local.py", line 271, in _execute_task
return func(*args2)
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/dask/array/core.py", line 72, in getter
c = np.asarray(c)
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/numpy/core/numeric.py", line 531, in asarray
return array(a, dtype, copy=False, order=order)
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/xarray/core/indexing.py", line 538, in __array__
return np.asarray(self.array, dtype=dtype)
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/numpy/core/numeric.py", line 531, in asarray
return array(a, dtype, copy=False, order=order)
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/xarray/core/indexing.py", line 505, in __array__
return np.asarray(array[self.key], dtype=None)
File "/Users/chwala-c/miniconda2/lib/python2.7/site-packages/xarray/backends/netCDF4_.py", line 61, in __getitem__
data = getitem(self.get_array(), key)
File "netCDF4/_netCDF4.pyx", line 3961, in netCDF4._netCDF4.Variable.__getitem__
File "netCDF4/_netCDF4.pyx", line 4798, in netCDF4._netCDF4.Variable._get
File "netCDF4/_netCDF4.pyx", line 1638, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error
```
A possible workaround, if the dataset fits into memory, is to use
Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1836/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
374279704 | MDU6SXNzdWUzNzQyNzk3MDQ= | 2514 | interpolate_na with limit argument changes size of chunks | cchwala 102827 | closed | 0 | 8 | 2018-10-26T08:31:35Z | 2021-03-26T19:50:50Z | 2021-03-26T19:50:50Z | CONTRIBUTOR | Code Sample, a copy-pastable example if possible
```python
import pandas as pd
import xarray as xr
import numpy as np

t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H')
foo = np.sin(np.arange(len(t)))
bar = np.cos(np.arange(len(t)))
foo[1] = np.nan
bar[2] = np.nan
ds_test = xr.Dataset(data_vars={'foo': ('time', foo), 'bar': ('time', bar)}, coords={'time': t}).chunk()
print(ds_test)
print("\n\n### After
```
Output of the above code. Note the different chunk sizes, depending on the value of After
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2514/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
376154741 | MDU6SXNzdWUzNzYxNTQ3NDE= | 2531 | DataArray.rolling() does not preserve chunksizes in some cases | cchwala 102827 | closed | 0 | 2 | 2018-10-31T20:50:33Z | 2021-03-26T19:50:49Z | 2021-03-26T19:50:49Z | CONTRIBUTOR | This issue was found and discussed in the related issue #2514. I am opening a separate issue for clarity.
Code Sample, a copy-pastable example if possible
```python
import pandas as pd
import numpy as np
import xarray as xr

t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H')
bar = np.sin(np.arange(len(t)))
baz = np.cos(np.arange(len(t)))
da_test = xr.DataArray(data=np.stack([bar, baz]), coords={'time': t, 'sensor': ['one', 'two']}, dims=('sensor', 'time'))

print(da_test.chunk({'time': 100}).rolling(time=60).mean().chunks)
print(da_test.chunk({'time': 100}).rolling(time=60).count().chunks)
```
Problem description
DataArray.rolling() does not preserve the chunksizes, apparently depending on the applied method. Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2531/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
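One way to guard against a changed chunk layout, as reported in #2514 and #2531, is to re-apply the input's chunks explicitly after the operation. The sketch below is an illustration only and is not taken from either issue thread: the names `chunked`, `rolled`, `original_layout` and `restored` are hypothetical, a plain integer time axis stands in for the date range used in the reports, and it assumes xarray with dask installed.

```python
import numpy as np
import xarray as xr

# Small stand-in for the arrays in the reports (integer time axis, assumption).
da = xr.DataArray(np.sin(np.arange(745.0)), dims="time")
chunked = da.chunk({"time": 100})

# Rolling reduction whose output chunking may differ from the input's.
rolled = chunked.rolling(time=60).count()

# Map each dimension name to the input's tuple of chunk sizes and re-apply it.
original_layout = dict(zip(chunked.dims, chunked.chunks))
restored = rolled.chunk(original_layout)

print(chunked.chunks)
print(restored.chunks)
```

Rechunking is lazy, so this only adds a step to the dask graph; whether the extra step is needed depends on the installed xarray version.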
226549366 | MDU6SXNzdWUyMjY1NDkzNjY= | 1399 | `decode_cf_datetime()` slow because `pd.to_timedelta()` is slow if floats are passed | cchwala 102827 | closed | 0 | 6 | 2017-05-05T11:48:00Z | 2017-07-25T17:42:52Z | 2017-07-25T17:42:52Z | CONTRIBUTOR | Hi,
Here is a notebook that shows the differences. Working with integers is approx. one order of magnitude faster. Hence, it would be great to automatically convert raw float time values to integer nanoseconds where possible (likely limited to resolutions below days or hours, to avoid coping with differing numbers of nanoseconds within, e.g., different months). As an alternative, maybe avoid forcing the cast to floats and indicate in the docstring that the raw values should be integers to speed up the conversion. This could possibly also be resolved in |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1399/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue |
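The conversion proposed in issue 1399 can be sketched as follows. This is a minimal illustration under stated assumptions, not xarray's implementation: the helper `floats_to_timedelta` and the table `NS_PER_UNIT` are hypothetical names, only fixed-length units (days or shorter) are handled, and the input is assumed to contain no NaN values.

```python
import numpy as np
import pandas as pd

# Nanoseconds per fixed-length time unit (hypothetical lookup table).
NS_PER_UNIT = {
    "days": 86_400_000_000_000,
    "hours": 3_600_000_000_000,
    "minutes": 60_000_000_000,
    "seconds": 1_000_000_000,
}


def floats_to_timedelta(values, unit):
    """Convert float offsets in `unit` to timedeltas via integer nanoseconds."""
    values = np.asarray(values, dtype="float64")
    # Scale to nanoseconds and round, so pandas receives int64 instead of floats.
    nanos = np.round(values * NS_PER_UNIT[unit]).astype("int64")
    return pd.to_timedelta(nanos, unit="ns")


offsets = np.array([0.0, 0.5, 1.25])
deltas = floats_to_timedelta(offsets, "hours")
print(deltas)
```

Because the scaling is still done in float64, very large offsets can lose sub-microsecond precision before the cast; the sketch accepts that trade-off in exchange for passing integers to `pd.to_timedelta`.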
CREATE TABLE [issues] ( [id] INTEGER PRIMARY KEY, [node_id] TEXT, [number] INTEGER, [title] TEXT, [user] INTEGER REFERENCES [users]([id]), [state] TEXT, [locked] INTEGER, [assignee] INTEGER REFERENCES [users]([id]), [milestone] INTEGER REFERENCES [milestones]([id]), [comments] INTEGER, [created_at] TEXT, [updated_at] TEXT, [closed_at] TEXT, [author_association] TEXT, [active_lock_reason] TEXT, [draft] INTEGER, [pull_request] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [state_reason] TEXT, [repo] INTEGER REFERENCES [repos]([id]), [type] TEXT ); CREATE INDEX [idx_issues_repo] ON [issues] ([repo]); CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]); CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]); CREATE INDEX [idx_issues_user] ON [issues] ([user]);
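The filter described at the top of this page (type = "issue", user = 102827, sorted by updated_at descending) can be reproduced against this schema with Python's standard `sqlite3` module. The sketch below is an assumption-laden illustration: it creates an in-memory database with only a subset of the columns and loads the four issue numbers and timestamps listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced version of the [issues] table: only the columns the filter needs.
conn.execute(
    "CREATE TABLE [issues] ("
    "[id] INTEGER PRIMARY KEY, [number] INTEGER, [user] INTEGER, "
    "[state] TEXT, [updated_at] TEXT, [type] TEXT)"
)

# Rows taken from the table above, inserted in no particular order.
rows = [
    (226549366, 1399, 102827, "closed", "2017-07-25T17:42:52Z", "issue"),
    (376154741, 2531, 102827, "closed", "2021-03-26T19:50:49Z", "issue"),
    (289342234, 1836, 102827, "open", "2022-06-21T14:50:02Z", "issue"),
    (374279704, 2514, 102827, "closed", "2021-03-26T19:50:50Z", "issue"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps in the same time zone sort correctly as plain text.
query = (
    "SELECT [number], [state], [updated_at] FROM [issues] "
    "WHERE [type] = ? AND [user] = ? ORDER BY [updated_at] DESC"
)
result = conn.execute(query, ("issue", 102827)).fetchall()

for number, state, updated_at in result:
    print(number, state, updated_at)
```

Storing timestamps as TEXT, as the schema does, keeps this ordering correct only as long as every value uses the same ISO 8601 format and time zone suffix.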