
issues


2 rows where repo = 13221727, type = "issue", and user = 6404167, sorted by updated_at descending




issue #2198 · DataArray.encoding['chunksizes'] not respected in to_netcdf

  • id: 327613219 · node_id: MDU6SXNzdWUzMjc2MTMyMTk=
  • user: Karel-van-de-Plassche (6404167) · author_association: CONTRIBUTOR
  • state: closed · locked: 0 · comments: 2
  • created_at: 2018-05-30T07:50:59Z · updated_at: 2019-06-06T20:35:50Z · closed_at: 2019-06-06T20:35:50Z

This might just be a documentation issue, so apologies if this is not a problem with xarray.

I'm trying to save an intermediate result of an xarray + dask calculation to disk, and I'd like to preserve the on-disk chunking. Setting the encoding of a Dataset.data_var or DataArray through the encoding attribute seems to work for (at least) some encoding keys, but not for chunksizes. For example:

```python
import xarray as xr
import dask.array as da
from dask.distributed import Client
from IPython import embed

# First generate a file with random numbers
rng = da.random.RandomState()
shape = (10, 10000)
chunks = [10, 10]
dims = ['x', 'y']
z = rng.standard_normal(shape, chunks=chunks)
da = xr.DataArray(z, dims=dims, name='z')

# Set encoding of the DataArray
da.encoding['chunksizes'] = chunks  # Not conserved
da.encoding['zlib'] = True          # Conserved
ds = da.to_dataset()
print(ds['z'].encoding)
# out: {'chunksizes': [10, 10], 'zlib': True}

# This one is chunked and compressed correctly
ds.to_netcdf('test1.nc', encoding={'z': {'chunksizes': chunks}})

# While this one is only compressed
ds.to_netcdf('test2.nc')
```
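Not part of the original report, but a quick way to confirm what actually lands on disk is netCDF4's Variable.chunking(), which returns the chunk sizes (or 'contiguous' for an unchunked variable):

```python
# Sketch: inspect the on-disk chunking of the two files written above.
import netCDF4

with netCDF4.Dataset('test1.nc') as nc:
    print(nc.variables['z'].chunking())  # [10, 10]: explicit encoding= was honoured

with netCDF4.Dataset('test2.nc') as nc:
    print(nc.variables['z'].chunking())  # the chunksizes set via .encoding were dropped
```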

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.5-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL:
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.4
pandas: 0.22.0
numpy: 1.14.3
scipy: 0.19.0
netCDF4: 1.4.0
h5netcdf: 0.5.1
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.0.2
cartopy: None
seaborn: 0.7.1
setuptools: 39.1.0
pip: 9.0.1
conda: None
pytest: 3.2.2
IPython: 6.3.1
sphinx: None
```
  • reactions: 0 (https://api.github.com/repos/pydata/xarray/issues/2198/reactions) · state_reason: completed · repo: xarray (13221727) · type: issue
issue #2190 · Parallel non-locked read using dask.Client crashes

  • id: 327064908 · node_id: MDU6SXNzdWUzMjcwNjQ5MDg=
  • user: Karel-van-de-Plassche (6404167) · author_association: CONTRIBUTOR
  • state: closed · locked: 0 · comments: 5
  • created_at: 2018-05-28T15:42:40Z · updated_at: 2019-01-14T21:09:04Z · closed_at: 2019-01-14T21:09:03Z

I'm trying to parallelize my code using Dask. Using its distributed.Client() I was able to do computations in parallel. Unfortunately, it seems ~60% of the time is spent waiting on a file lock. As I'm only reading data and doing computations in memory, I should be able to work without a lock, so I tried to pass lock=False to open_dataset. Unfortunately, this crashes my code. A minimal reproducible example can be found below:

```python
import xarray as xr
import dask.array as da
from dask.distributed import Client
from IPython import embed

# First generate a file with random numbers
rng = da.random.RandomState()
shape = (10, 10000)
chunks = (10, 10)
dims = ['y', 'z']
x = rng.standard_normal(shape, chunks=chunks)
da = xr.DataArray(x, dims=dims, name='x')
da.to_netcdf('test.nc')

# Open file without a lock
client = Client(processes=False)
ds = xr.open_dataset('test.nc', chunks=dict(zip(dims, chunks)), lock=False)

# This will crash!
print((ds['x'] * ds['x']).compute())
```

This (sometimes) crashes with:

```
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x7ffb69033c50>, key=BasicIndexer((slice(None, None, None), slice(None, None, None)))))), (slice(0, 10, None), slice(5710, 5720, None)))
kwargs: {}
Exception: RuntimeError('NetCDF: HDF error',)
```

and usually just with `terminated by signal SIGSEGV (Address boundary error)`.
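A minimal sketch, not from the original report: the same read with the default file lock left in place. Access to the netCDF file is then serialized, which avoids the segfault (the underlying HDF5 library is not thread-safe) at the cost of the lock contention described above.

```python
import xarray as xr
from dask.distributed import Client

client = Client(processes=False)
# Omitting lock=False keeps xarray's default per-file lock.
ds = xr.open_dataset('test.nc', chunks={'y': 10, 'z': 10})
print((ds['x'] * ds['x']).compute())
```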

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.9-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL:
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.2
pandas: 0.20.3
numpy: 1.14.0
scipy: 0.19.1
netCDF4: 1.4.0
h5netcdf: None
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.1.2
cartopy: None
seaborn: 0.8.1
setuptools: 38.5.1
pip: 10.0.1
conda: None
pytest: 3.4.0
IPython: 6.3.1
sphinx: 1.6.4
```

A "Minimal, Complete and Verifiable Example" will make it much easier for maintainers to help you: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

```python

Your code here

```

Problem description

[this should explain why the current behavior is a problem and why the expected output is a better solution.]

Expected Output

Output of xr.show_versions()

# Paste the output here xr.show_versions() here
  • reactions: 0 (https://api.github.com/repos/pydata/xarray/issues/2190/reactions) · state_reason: completed · repo: xarray (13221727) · type: issue


```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```
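For completeness, a sketch of running this page's query against a local copy of the database (the filename github.db is an assumption, not part of this page):

```python
# Sketch: reproduce the filter above with sqlite3 against the schema shown.
import sqlite3

conn = sqlite3.connect('github.db')  # hypothetical local copy of the database
rows = conn.execute(
    """
    SELECT number, title, state, updated_at
    FROM issues
    WHERE repo = ? AND type = ? AND "user" = ?
    ORDER BY updated_at DESC
    """,
    (13221727, 'issue', 6404167),
)
for number, title, state, updated_at in rows:
    print(f'#{number} {title} ({state}, updated {updated_at})')
conn.close()
```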