issues

5 rows where user = 1554921 sorted by updated_at descending

334633212 · MDU6SXNzdWUzMzQ2MzMyMTI= · issue 2242: to_netcdf(compute=False) can be slow
user: neishm (1554921) · state: closed (completed) · locked: 0 · comments: 5 · author_association: CONTRIBUTOR · repo: xarray (13221727)
created_at: 2018-06-21T19:50:36Z · updated_at: 2019-01-13T21:13:28Z · closed_at: 2019-01-13T21:13:28Z

Code Sample

```python
import xarray as xr
from dask.array import ones
import dask
from dask.diagnostics import ProgressBar
ProgressBar().register()

# Define a mock DataSet
dset = {}
for i in range(5):
    name = 'var'+str(i)
    data = i*ones((8,79,200,401),dtype='f4',chunks=(1,1,200,401))
    var = xr.DataArray(data=data, dims=('time','level','lat','lon'), name=name)
    dset[name] = var
dset = xr.Dataset(dset)

# Single thread to facilitate debugging.
# (may require dask < 0.18)
with dask.set_options(get=dask.get):

    # This works fine.
    print ("Testing immediate netCDF4 writing")
    dset.to_netcdf("test1.nc")

    # This can be twice as slow as the version above.
    # Can be even slower (like 10x slower) on a shared filesystem.
    print ("Testing delayed netCDF4 writing")
    dset.to_netcdf("test2.nc",compute=False).compute()
```

Problem description

Using the delayed version of `to_netcdf` can cause a significant slowdown when writing the file. Running the script through cProfile, I see `_open_netcdf4_group` called many times, which suggests the file is opened and closed for each chunk written. In my scripts (which dump to an NFS filesystem), writes can take 10 times longer than they should.
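The cProfile observation can be reproduced with a minimal sketch like the following (the stats filename and the filter pattern are arbitrary choices, not from the original report; it assumes `dset` from the code sample above is defined at the top level of the script):

```python
import cProfile
import pstats

# Profile the delayed write, then filter the stats for the backend
# open function to count how often the file is (re)opened.
cProfile.run('dset.to_netcdf("test2.nc", compute=False).compute()',
             'to_netcdf.prof')
stats = pstats.Stats('to_netcdf.prof')
stats.sort_stats('cumulative').print_stats('_open_netcdf4_group')
```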

Is there a reason for the repeated open/close cycles (e.g. #1198?), or can this behaviour be fixed so the file stays open for the duration of the compute() call?
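For contrast, here is a minimal sketch of what "keeping the file open for the duration of the write" looks like, using the netCDF4 library directly rather than xarray's backend (the filename, variable, and shapes are hypothetical, not from the original report):

```python
import netCDF4
import numpy as np

# One open/close cycle for the whole write: the file handle stays
# live while every chunk is written, unlike the per-chunk reopening
# observed in the profile above.
with netCDF4.Dataset("test3.nc", "w") as nc:
    nc.createDimension("time", 8)
    nc.createDimension("lon", 401)
    var = nc.createVariable("var0", "f4", ("time", "lon"))
    for t in range(8):                       # one write per "chunk"
        var[t, :] = np.zeros(401, dtype="f4")
```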

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-135-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
xarray: 0.10.7
pandas: 0.23.0
numpy: 1.14.4
scipy: None
netCDF4: 1.4.0
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.17.5
distributed: None
matplotlib: 1.3.1
cartopy: None
seaborn: None
setuptools: 39.2.0
pip: None
conda: None
pytest: None
IPython: None
sphinx: None
```
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2242/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
336729475 · MDExOlB1bGxSZXF1ZXN0MTk4MTEzNTQ2 · pull 2257: Write inconsistent chunks to netcdf
user: neishm (1554921) · state: closed · locked: 0 · comments: 2 · author_association: CONTRIBUTOR · draft: 0 · pull_request: pydata/xarray/pulls/2257 · repo: xarray (13221727)
created_at: 2018-06-28T18:23:55Z · updated_at: 2018-06-29T13:52:15Z · closed_at: 2018-06-29T05:07:27Z
  • [x] Closes #2254
  • [x] Tests added
  • [x] Tests passed
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2257/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
336273865 · MDU6SXNzdWUzMzYyNzM4NjU= · issue 2254: Writing Datasets to netCDF4 with "inconsistent" chunks
user: neishm (1554921) · state: closed (completed) · locked: 0 · comments: 3 · author_association: CONTRIBUTOR · repo: xarray (13221727)
created_at: 2018-06-27T15:15:02Z · updated_at: 2018-06-29T05:07:27Z · closed_at: 2018-06-29T05:07:27Z

Code Sample

```python
import xarray as xr
from dask.array import zeros, ones

# Construct two variables with the same dimensions, but different chunking
x = zeros((100,100),dtype='f4',chunks=(50,100))
x = xr.DataArray(data=x, dims=('lat','lon'), name='x')
y = ones((100,100),dtype='f4',chunks=(100,50))
y = xr.DataArray(data=y, dims=('lat','lon'), name='y')

# Put them both into the same dataset
dset = xr.Dataset({'x':x,'y':y})

# Save to a netCDF4 file.
dset.to_netcdf("test.nc")
```

The last line results in `ValueError: inconsistent chunks`.
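The clash is already visible in the per-variable chunk layouts before anything is written (a sketch continuing the snippet above; the printed layouts follow from the `chunks=` arguments):

```python
# Each variable's layout is fine on its own...
print(x.chunks)     # ((50, 50), (100,))
print(y.chunks)     # ((100,), (50, 50))

# ...but Dataset.chunks tries to merge them into a single mapping per
# dimension name, and 'lat'/'lon' disagree between x and y:
print(dset.chunks)  # ValueError: inconsistent chunks
```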

Problem description

This error is triggered by `xarray.backends.api.to_netcdf`'s use of the `dataset.chunks` property in two places:

https://github.com/pydata/xarray/blob/bb581ca206c80eea80270ba508ec80ae0cd3941f/xarray/backends/api.py#L703

https://github.com/pydata/xarray/blob/bb581ca206c80eea80270ba508ec80ae0cd3941f/xarray/backends/api.py#L709

I'm assuming `to_netcdf` only needs to know whether chunks are being used at all, not whether they're consistent across variables?

If I define a more general check

```python
have_chunks = any(v.chunks for v in dataset.variables.values())
```

and replace the instances of `dataset.chunks` with `have_chunks`, then the netCDF4 file gets written without any problems (although the data seems to be stored contiguously instead of chunked).

Is this change as straightforward as I think, or is there something intrinsic about xarray.Dataset objects or writing to netCDF4 that requires consistent chunks?
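Evaluated on the example dataset, the more general check goes through even though the layouts disagree (a sketch; `dset` is the Dataset from the snippet above):

```python
# True as soon as any variable is dask-backed; per-dimension layouts
# are never compared across variables, so mixed chunking is accepted.
have_chunks = any(v.chunks for v in dset.variables.values())
print(have_chunks)  # True
```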

Output of xr.show_versions()

```
commit: bb581ca206c80eea80270ba508ec80ae0cd3941f
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-128-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
xarray: 0.10.7
pandas: 0.23.1
numpy: 1.14.5
scipy: None
netCDF4: 1.4.0
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.17.5
distributed: None
matplotlib: None
cartopy: None
seaborn: None
setuptools: 39.2.0
pip: 10.0.1
conda: None
pytest: None
IPython: None
sphinx: None
```
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2254/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
279832457 · MDU6SXNzdWUyNzk4MzI0NTc= · issue 1763: Multi-dimensional coordinate mixup when writing to netCDF
user: neishm (1554921) · state: closed (completed) · locked: 0 · comments: 4 · author_association: CONTRIBUTOR · repo: xarray (13221727)
created_at: 2017-12-06T17:05:36Z · updated_at: 2018-01-11T16:54:48Z · closed_at: 2018-01-11T16:54:48Z

Problem description

Under certain conditions, the netCDF files produced by `Dataset.to_netcdf()` have the wrong coordinates attributed to the variables. This seems to happen when there are multiple multi-dimensional coordinates that share some (but not all) of their dimensions.

Test Dataset

Some sample code to generate a problematic Dataset:

```python
import xarray as xr
import numpy as np

zeros1 = np.zeros((5,3))
zeros2 = np.zeros((6,3))
zeros3 = np.zeros((5,4))
d = xr.Dataset({
  'lon1': (['x1','y1'], zeros1, {}),
  'lon2': (['x2','y1'], zeros2, {}),
  'lon3': (['x1','y2'], zeros3, {}),
  'lat1': (['x1','y1'], zeros1, {}),
  'lat2': (['x2','y1'], zeros2, {}),
  'lat3': (['x1','y2'], zeros3, {}),
  'foo1': (['x1','y1'], zeros1, {'coordinates': 'lon1 lat1'}),
  'foo2': (['x2','y1'], zeros2, {'coordinates': 'lon2 lat2'}),
  'foo3': (['x1','y2'], zeros3, {'coordinates': 'lon3 lat3'}),
})
d = xr.conventions.decode_cf(d)
```

Here, the coordinates lat1, lat2, lat3 (and lon1, lon2, lon3) share one dimension with each other. The Dataset itself gets created properly:

```python
print(d)
```

```
<xarray.Dataset>
Dimensions:  (x1: 5, x2: 6, y1: 3, y2: 4)
Coordinates:
    lat1     (x1, y1) float64 ...
    lat3     (x1, y2) float64 ...
    lat2     (x2, y1) float64 ...
    lon1     (x1, y1) float64 ...
    lon3     (x1, y2) float64 ...
    lon2     (x2, y1) float64 ...
Dimensions without coordinates: x1, x2, y1, y2
Data variables:
    foo1     (x1, y1) float64 ...
    foo2     (x2, y1) float64 ...
    foo3     (x1, y2) float64 ...
```

and each DataArray does have the right coordinates associated with it:

```python
print (d.foo1)
```

```
<xarray.DataArray 'foo1' (x1: 5, y1: 3)>
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
Coordinates:
    lat1     (x1, y1) float64 ...
    lon1     (x1, y1) float64 ...
Dimensions without coordinates: x1, y1
```

```python
print (d.foo2)
```

```
<xarray.DataArray 'foo2' (x2: 6, y1: 3)>
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
Coordinates:
    lat2     (x2, y1) float64 ...
    lon2     (x2, y1) float64 ...
Dimensions without coordinates: x2, y1
```

```python
print (d.foo3)
```

```
<xarray.DataArray 'foo3' (x1: 5, y2: 4)>
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
Coordinates:
    lat3     (x1, y2) float64 ...
    lon3     (x1, y2) float64 ...
Dimensions without coordinates: x1, y2
```

The problem

The problem happens when I try to write this to netCDF (using either the netCDF4 or scipy engines):

```python
d.to_netcdf("test.nc")
```

The resulting file has extra coordinates on the variables:

```
~$ ncdump -h test.nc
netcdf test {
dimensions:
    x1 = 5 ;
    y1 = 3 ;
    y2 = 4 ;
    x2 = 6 ;
variables:
    double lat1(x1, y1) ;
        lat1:_FillValue = NaN ;
    double lat3(x1, y2) ;
        lat3:_FillValue = NaN ;
    double lat2(x2, y1) ;
        lat2:_FillValue = NaN ;
    double lon1(x1, y1) ;
        lon1:_FillValue = NaN ;
    double lon3(x1, y2) ;
        lon3:_FillValue = NaN ;
    double lon2(x2, y1) ;
        lon2:_FillValue = NaN ;
    double foo1(x1, y1) ;
        foo1:_FillValue = NaN ;
        foo1:coordinates = "lat1 lat3 lat2 lon1 lon3 lon2" ;
    double foo2(x2, y1) ;
        foo2:_FillValue = NaN ;
        foo2:coordinates = "lon1 lon2 lat1 lat2" ;
    double foo3(x1, y2) ;
        foo3:_FillValue = NaN ;
        foo3:coordinates = "lon1 lon3 lat1 lat3" ;

// global attributes:
        :_NCProperties = "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18" ;
}
```

Here, foo1, foo2, and foo3 have extra coordinates associated with them. Interestingly, if I re-open this netCDF file with xarray.open_dataset, I get the correct coordinates back for each DataArray. However, other netCDF utilities may not be so forgiving.
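The forgiving round trip can be checked directly (a sketch reusing the test.nc written above; the exact ordering of the coordinate names may vary):

```python
# xarray prunes the spurious names on read, so each variable comes
# back with only its own coordinate pair.
d2 = xr.open_dataset("test.nc")
print(list(d2.foo1.coords))  # ['lat1', 'lon1']
print(list(d2.foo2.coords))  # ['lat2', 'lon2']
print(list(d2.foo3.coords))  # ['lat3', 'lon3']
```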

Expected Output

I would expect the netCDF file to have a single pair of lat/lon coordinates for each variable:

```
...
    double foo1(x1, y1) ;
        foo1:_FillValue = NaN ;
        foo1:coordinates = "lat1 lon1" ;
    double foo2(x2, y1) ;
        foo2:_FillValue = NaN ;
        foo2:coordinates = "lon2 lat2" ;
    double foo3(x1, y2) ;
        foo3:_FillValue = NaN ;
        foo3:coordinates = "lon3 lat3" ;
...
}
```
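To verify what `to_netcdf` actually stored without leaving Python, the raw attribute can be read back with the netCDF4 library (a sketch; assumes the test.nc written above):

```python
import netCDF4

# Inspect the coordinates attribute exactly as written to the file.
with netCDF4.Dataset("test.nc") as nc:
    for name in ("foo1", "foo2", "foo3"):
        print(name, nc.variables[name].getncattr("coordinates"))
# With the bug, extra lat*/lon* names show up in every list; the
# expected output is just the matching pair, e.g. foo1 -> "lat1 lon1".
```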

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: c2b205f29467a4431baa80b5c07fe31bda67fbef
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-101-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
xarray: 0.10.0-5-gc2b205f
pandas: 0.21.0
numpy: 1.13.3
scipy: None
netCDF4: 1.3.1
h5netcdf: None
Nio: None
bottleneck: None
cyordereddict: None
dask: None
matplotlib: None
cartopy: None
seaborn: None
setuptools: 38.2.4
pip: 9.0.1
conda: None
pytest: 3.3.1
IPython: None
sphinx: None
```
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1763/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
280274296 · MDExOlB1bGxSZXF1ZXN0MTU3MDk4NTY0 · pull 1768: Fix multidimensional coordinates
user: neishm (1554921) · state: closed · locked: 0 · comments: 2 · author_association: CONTRIBUTOR · draft: 0 · pull_request: pydata/xarray/pulls/1768 · repo: xarray (13221727)
created_at: 2017-12-07T20:50:33Z · updated_at: 2018-01-11T16:54:48Z · closed_at: 2018-01-11T16:54:48Z
  • [x] Closes #1763
  • [x] Tests added
  • [x] Tests passed
  • [x] Passes `git diff upstream/master **/*py | flake8 --diff`
  • [x] Fully documented, including `whats-new.rst` for all changes
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1768/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
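The listing above is, in effect, the following query against this schema (a sketch using Python's sqlite3 module; the database filename github.db is hypothetical):

```python
import sqlite3

# Reproduce "5 rows where user = 1554921 sorted by updated_at descending".
conn = sqlite3.connect("github.db")
rows = conn.execute(
    "SELECT id, number, type, title FROM issues"
    " WHERE [user] = ? ORDER BY updated_at DESC",
    (1554921,),
).fetchall()
for row in rows:
    print(row)
conn.close()
```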