
issues


16 rows where type = "issue" and user = 7799184 (rafa-guedes), sorted by updated_at descending. State: closed 11, open 5. Type: issue (16). Repo: xarray (16).

216626776 · issue #1324 · Choose time units in output netcdf · rafa-guedes · closed (completed) · 10 comments · created 2017-03-24T02:25:22Z · updated 2023-08-09T08:01:43Z · closed 2019-12-04T14:25:59Z · CONTRIBUTOR

Is there any way to define the time units in a netcdf created with the to_netcdf method, without having to manually convert the time objects into floats, for example? It could be handy if the units attribute of the time coordinate could be modified and the modified value (if CF-compliant) used for encoding time in the output netcdf. Currently, to_netcdf raises ValueError: Failed hard to prevent overwriting key 'units' if the attribute is modified.

Thanks
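
For reference, xarray exposes this through the encoding argument of to_netcdf rather than through attrs; a minimal sketch (file name illustrative):

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"x": ("time", np.arange(3.0))},
    coords={"time": pd.date_range("2000-01-01", periods=3)},
)

# Request CF-compliant time units via encoding instead of editing attrs:
ds.to_netcdf("out.nc", encoding={"time": {"units": "hours since 2000-01-01"}})
```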

1268630439 · issue #6688 · 2D extrapolation not working · rafa-guedes · open · 3 comments · created 2022-06-12T16:11:04Z · updated 2022-06-14T06:19:20Z · CONTRIBUTOR

What happened?

Extrapolation does not seem to be working on 2D data arrays. The area outside the input grid is NaN in the interpolated data when passing kwargs={"fill_value": None} to the interp method (extrapolation does work when calling scipy.interpolate.interpn directly with fill_value=None and bounds_error=False).

A figure (not reproduced here) compared the input and interpolated data arrays from the code snippet below.

What did you expect to happen?

Area outside the input grid filled with extrapolated data.

Minimal Complete Verifiable Example

```python
import xarray as xr

da = xr.DataArray(
    data=[[1, 2, 3], [3, 4, 5]],
    coords=dict(y=[0, 1], x=[10, 20, 30]),
    dims=("y", "x"),
)

dai = da.interp(x=[25, 30, 35], y=[0, 1], kwargs={"fill_value": None})
```
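
For comparison, a sketch of the direct scipy call mentioned above, which does extrapolate (values and grid mirror the example):

```python
import numpy as np
from scipy.interpolate import interpn

values = np.array([[1, 2, 3], [3, 4, 5]], dtype=float)
points = ([0, 1], [10, 20, 30])  # (y, x) grid coordinates

# Target points, shaped (n, ndim) as (y, x) pairs
xi = np.array([(y, x) for y in (0, 1) for x in (25, 30, 35)], dtype=float)

# bounds_error=False together with fill_value=None asks interpn to extrapolate
out = interpn(points, values, xi, method="linear", bounds_error=False, fill_value=None)
print(out.reshape(2, 3))
```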

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.13.0-1031-gcp
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.20.2
pandas: 1.3.5
numpy: 1.19.5
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.02.0
distributed: None
matplotlib: 3.5.2
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: 0.18
sparse: None
setuptools: 59.8.0
pip: 22.1.1
conda: 4.12.0
pytest: 7.1.2
IPython: 7.33.0
sphinx: None
```

220533356 · issue #1366 · Setting attributes to multi-index coordinate · rafa-guedes · closed (completed) · 5 comments · created 2017-04-10T04:11:12Z · updated 2022-03-17T17:11:40Z · closed 2022-03-17T17:11:40Z · CONTRIBUTOR

I don't seem to be able to set attributes on the "virtual" coordinates derived from a multi-index coordinate. Taking the example from the docs:

```python
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: import xarray as xr
In [4]: midx = pd.MultiIndex.from_arrays([['R','R','V','V'], [.1,.2,.7,.9]], names=('band','wn'))
In [5]: mda = xr.DataArray(np.random.rand(4), coords={'spec': midx}, dims='spec')
```

Setting attrs on the full coordinate works:

```python
In [6]: mda['spec'].attrs
Out[6]: OrderedDict()
In [7]: mda['spec'].attrs = {'spec_attr': 'some_attr'}
In [8]: mda['spec'].attrs
Out[8]: OrderedDict([('spec_attr', 'some_attr')])
```

Setting attrs on the virtual coordinate does not produce any effect:

```python
In [9]: mda['band'].attrs
Out[9]: OrderedDict()
In [10]: mda['band'].attrs = {'band_attr': 'another_attr'}
In [11]: mda['band'].attrs
Out[11]: OrderedDict()
```

595492608 · issue #3942 · Time dtype encoding defaulting to `int64` when writing netcdf or zarr · rafa-guedes · open · 8 comments · created 2020-04-06T23:36:37Z · updated 2021-11-11T12:32:06Z · CONTRIBUTOR

Time dtype encoding defaults to "int64" for datasets with only zero-hour times when writing to netcdf or zarr.

This gives those datasets a precision constrained by how the time units are defined (daily precision in the example below, given the units are defined as 'days since ...'). If we create a zarr dataset using this default encoding with such a dataset, and subsequently append some non-zero times onto it, we lose the hour/minute/second information from the appended parts.

MCVE Code Sample

```python
In [1]: ds = xr.DataArray(
   ...:     data=[0.5],
   ...:     coords={"time": [datetime.datetime(2012,1,1)]},
   ...:     dims=("time",),
   ...:     name="x",
   ...: ).to_dataset()

In [2]: ds
Out[2]:
<xarray.Dataset>
Dimensions:  (time: 1)
Coordinates:
  * time     (time) datetime64[ns] 2012-01-01
Data variables:
    x        (time) float64 0.5

In [3]: ds.to_zarr("/tmp/x.zarr")

In [4]: ds1 = xr.open_zarr("/tmp/x.zarr")

In [5]: ds1.time.encoding
Out[5]:
{'chunks': (1,),
 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
 'filters': None,
 'units': 'days since 2012-01-01 00:00:00',
 'calendar': 'proleptic_gregorian',
 'dtype': dtype('int64')}

In [6]: dsnew = xr.DataArray(
   ...:     data=[1.5],
   ...:     coords={"time": [datetime.datetime(2012,1,1,3,0,0)]},
   ...:     dims=("time",),
   ...:     name="x",
   ...: ).to_dataset()

In [7]: dsnew.to_zarr("/tmp/x.zarr", append_dim="time")

In [8]: ds1 = xr.open_zarr("/tmp/x.zarr")

In [9]: ds1.time.values
Out[9]:
array(['2012-01-01T00:00:00.000000000', '2012-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
```

Expected Output

```python
In [9]: ds1.time.values
Out[9]:
array(['2012-01-01T00:00:00.000000000', '2012-01-01T03:00:00.000000000'],
      dtype='datetime64[ns]')
```

Problem Description

Perhaps it would be useful to default the time dtype to "float64". Another option could be using a finer time resolution by default than the one xarray infers from the dataset times (for instance, if the units would be inferred as "days since ...", use "seconds since ..." instead).
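
Until the default changes, one workaround is to pin the time encoding explicitly on the first write; a sketch (path illustrative):

```python
import datetime

import xarray as xr

ds = xr.DataArray(
    data=[0.5],
    coords={"time": [datetime.datetime(2012, 1, 1)]},
    dims=("time",),
    name="x",
).to_dataset()

# Forcing finer units (or a float dtype) keeps sub-daily precision on append:
ds.to_zarr(
    "/tmp/x.zarr",
    encoding={"time": {"units": "seconds since 2012-01-01", "dtype": "float64"}},
)
```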


Versions

Output of `xr.show_versions()`:

```
In [10]: xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.5 (default, Nov 20 2019, 09:21:52) [GCC 9.2.1 20191008]
python-bits: 64
OS: Linux
OS-release: 5.3.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: en_NZ.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3
xarray: 0.15.0
pandas: 1.0.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.1.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.3
cfgrib: None
iris: None
bottleneck: None
dask: 2.14.0
distributed: 2.12.0
matplotlib: 3.2.0
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 45.3.0
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.13.0
sphinx: None
```

517799069 · issue #3486 · Should performance be equivalent when opening with chunks or re-chunking a dataset? · rafa-guedes · open · 2 comments · created 2019-11-05T14:14:58Z · updated 2021-08-31T15:28:04Z · CONTRIBUTOR

I was wondering whether the chunking behaviour would be expected to be equivalent under two different use cases:

(1) When opening a dataset using the chunks option;
(2) When re-chunking an existing dataset using the Dataset.chunk method.

I'm interested in performance when slicing across different dimensions. In my case the performance is quite different; please see the example below:

Open dataset with one single chunk along station dimension (fast for slicing one time)

```python
In [1]: import xarray as xr

In [2]: dset = xr.open_dataset(
   ...:     "/source/wavespectra/tests/sample_files/spec20170101T00_spec.nc",
   ...:     chunks={"station": None}
   ...: )

In [3]: dset
Out[3]:
<xarray.Dataset>
Dimensions:    (direction: 24, frequency: 25, station: 14048, time: 249)
Coordinates:
  * time       (time) datetime64[ns] 2017-01-01 ... 2017-02-01
  * station    (station) float64 1.0 2.0 3.0 ... 1.405e+04 1.405e+04
  * frequency  (frequency) float32 0.04118 0.045298003 ... 0.40561208
  * direction  (direction) float32 90.0 75.0 60.0 45.0 ... 135.0 120.0 105.0
Data variables:
    longitude  (time, station) float32 dask.array<chunksize=(249, 14048), meta=np.ndarray>
    latitude   (time, station) float32 dask.array<chunksize=(249, 14048), meta=np.ndarray>
    efth       (time, station, frequency, direction) float32 dask.array<chunksize=(249, 14048, 25, 24), meta=np.ndarray>

In [4]: %time lats = dset.latitude.isel(time=0).values
CPU times: user 171 ms, sys: 49.2 ms, total: 220 ms
Wall time: 219 ms
```

Open dataset with many size=1 chunks along station dimension (fast for slicing one station, slow for slicing one time)

```python
In [5]: dset = xr.open_dataset(
   ...:     "/source/wavespectra/tests/sample_files/spec20170101T00_spec.nc",
   ...:     chunks={"station": 1}
   ...: )

In [6]: dset
Out[6]:
<xarray.Dataset>
Dimensions:    (direction: 24, frequency: 25, station: 14048, time: 249)
Coordinates:
  * time       (time) datetime64[ns] 2017-01-01 ... 2017-02-01
  * station    (station) float64 1.0 2.0 3.0 ... 1.405e+04 1.405e+04
  * frequency  (frequency) float32 0.04118 0.045298003 ... 0.40561208
  * direction  (direction) float32 90.0 75.0 60.0 45.0 ... 135.0 120.0 105.0
Data variables:
    longitude  (time, station) float32 dask.array<chunksize=(249, 1), meta=np.ndarray>
    latitude   (time, station) float32 dask.array<chunksize=(249, 1), meta=np.ndarray>
    efth       (time, station, frequency, direction) float32 dask.array<chunksize=(249, 1, 25, 24), meta=np.ndarray>

In [7]: %time lats = dset.latitude.isel(time=0).values
CPU times: user 13.1 s, sys: 1.94 s, total: 15 s
Wall time: 11.1 s
```

Try rechunking station into one single chunk (still slow when slicing one time)

```python
In [8]: dset = dset.chunk({"station": None})

In [8]: dset
Out[8]:
<xarray.Dataset>
Dimensions:    (direction: 24, frequency: 25, station: 14048, time: 249)
Coordinates:
  * time       (time) datetime64[ns] 2017-01-01 ... 2017-02-01
  * station    (station) float64 1.0 2.0 3.0 ... 1.405e+04 1.405e+04
  * frequency  (frequency) float32 0.04118 0.045298003 ... 0.40561208
  * direction  (direction) float32 90.0 75.0 60.0 45.0 ... 135.0 120.0 105.0
Data variables:
    longitude  (time, station) float32 dask.array<chunksize=(249, 14048), meta=np.ndarray>
    latitude   (time, station) float32 dask.array<chunksize=(249, 14048), meta=np.ndarray>
    efth       (time, station, frequency, direction) float32 dask.array<chunksize=(249, 14048, 25, 24), meta=np.ndarray>

In [9]: %time lats = dset.latitude.isel(time=0).values
CPU times: user 9.06 s, sys: 1.13 s, total: 10.2 s
Wall time: 7.7 s
```
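
A plausible explanation, sketched with plain dask arrays (shapes illustrative): rechunking composes on top of the original task graph rather than changing how the file is read, so the tiny per-station reads remain underneath.

```python
import dask.array as da

x = da.ones((249, 14048), chunks=(249, 1))  # many tiny chunks, as with chunks={"station": 1}
y = x.rechunk((249, 14048))                 # one big chunk, as with dset.chunk({"station": None})

# y keeps all of x's per-chunk tasks underneath, plus the rechunk step,
# which is consistent with the rechunked slice staying slow above.
print(len(x.__dask_graph__()), len(y.__dask_graph__()))
```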

223231729 · issue #1379 · xr.concat consuming too much resources · rafa-guedes · open · 4 comments · created 2017-04-20T23:33:52Z · updated 2021-07-08T17:42:18Z · CONTRIBUTOR

Hi, I am reading several (~1000) small ascii files into Dataset objects and trying to concatenate them over one specific dimension, but I eventually blow up my memory. The file glob is not huge (~700M; my computer has ~16G) and it works fine if I only read the Datasets into a list without concatenating them (my memory increases by only 5% or so by the time I have read them all).

However, when concatenating each file into one single Dataset while reading over a loop, the processing speed drops drastically before I have read 10% of the files or so, and memory usage keeps climbing until it eventually blows up before 30% of the files are read and concatenated (a screenshot, not reproduced here, showed memory usage under 20% at the start of the processing).

I was wondering if this is expected, or if there is something that could be improved to make this work more efficiently. I'm changing my approach now: extracting numpy arrays from the individual Datasets, concatenating those numpy arrays, and defining the final Dataset only at the end.

Thanks.
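
For what it's worth, the usual pattern is to collect the Datasets in a list and call xr.concat once at the end, rather than concatenating incrementally inside the loop; a sketch (the glob pattern, the read_ascii reader, and the concat dimension are hypothetical placeholders):

```python
import glob

import xarray as xr

paths = sorted(glob.glob("data/*.asc"))

# read_ascii is a placeholder for whatever parses one ascii file into a Dataset
datasets = [read_ascii(p) for p in paths]

# A single concat at the end aligns and copies once, instead of ~1000 times
combined = xr.concat(datasets, dim="site")
```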

Reactions: 👍 1

129630652 · issue #733 · coordinate variable not written in netcdf file in some cases · rafa-guedes · open · 5 comments · created 2016-01-29T00:55:54Z · updated 2020-12-25T16:49:54Z · CONTRIBUTOR

I came across a situation where a coordinate variable was not dumped as a variable in the output netcdf file by dataset.to_netcdf. In my case I managed to fix it by simply adding variable attributes to this coordinate variable (which didn't have any).

It happened while creating a sliced dataset with the dataset.isel_points method, which automatically defines a new coordinate called points in the sliced dataset. If I dump that dataset as is, the coordinate isn't written as a variable in the netcdf; adding attributes to points, however, changes that. Here is an example:

```python
In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('netcdf_file_with_longitude_and_latitude.nc')

In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:    (latitude: 576, longitude: 1152, time: 745)
Coordinates:
  * latitude   (latitude) float64 -89.76 -89.45 -89.14 -88.83 -88.52 -88.2 ...
  * longitude  (longitude) float64 0.0 0.3125 0.625 0.9375 1.25 1.562 1.875 ...
  * time       (time) datetime64[ns] 1979-01-01 1979-01-01T01:00:00 ...
Data variables:
    ugrd10m    (time, latitude, longitude) float64 0.2094 0.25 0.2799 0.3183 ...
    vgrd10m    (time, latitude, longitude) float64 -5.929 -5.918 -5.918 ...

In [4]: ds2 = ds.isel_points(longitude=[0], latitude=[0]).reset_coords()

In [5]: ds2
Out[5]:
<xarray.Dataset>
Dimensions:    (points: 1, time: 745)
Coordinates:
  * time       (time) datetime64[ns] 1979-01-01 1979-01-01T01:00:00 ...
  * points     (points) int64 0
Data variables:
    latitude   (points) float64 -89.76
    vgrd10m    (points, time) float64 -5.929 -6.078 -6.04 -5.958 -5.858 ...
    ugrd10m    (points, time) float64 0.2094 0.109 0.008546 -0.09828 -0.2585 ...
    longitude  (points) float64 0.0

In [6]: ds2['points'].attrs
Out[6]: OrderedDict()

In [7]: ds2.to_netcdf('/home/rafael/ncout1.nc')

In [8]: ds2['points'].attrs.update({'standard_name': 'site'})

In [9]: ds2['points'].attrs
Out[9]: OrderedDict([('standard_name', 'site')])

In [10]: ds2.to_netcdf('/home/rafael/ncout2.nc')
```

Here is the ncdump output for these two files:

```
$ ncdump -h /home/rafael/ncout1.nc
netcdf ncout1 {
dimensions:
	time = 745 ;
	points = 1 ;
variables:
	double time(time) ;
		time:_FillValue = 9.999e+20 ;
		string time:long_name = "verification time generated by wgrib2 function verftime()" ;
		time:reference_time = 283996800. ;
		time:reference_time_type = 0 ;
		string time:reference_date = "1979.01.01 00:00:00 UTC" ;
		string time:reference_time_description = "kind of product unclear, reference date is variable, min found reference date is given" ;
		string time:time_step_setting = "auto" ;
		time:time_step = 3600. ;
		string time:units = "seconds since 1970-01-01" ;
		time:calendar = "proleptic_gregorian" ;
	double latitude(points) ;
		string latitude:units = "degrees_north" ;
		string latitude:long_name = "latitude" ;
	double vgrd10m(points, time) ;
		string vgrd10m:short_name = "vgrd10m" ;
		string vgrd10m:long_name = "V-Component of Wind" ;
		string vgrd10m:level = "10 m above ground" ;
		string vgrd10m:units = "m/s" ;
	double ugrd10m(points, time) ;
		string ugrd10m:short_name = "ugrd10m" ;
		string ugrd10m:long_name = "U-Component of Wind" ;
		string ugrd10m:level = "10 m above ground" ;
		string ugrd10m:units = "m/s" ;
	double longitude(points) ;
		string longitude:units = "degrees_east" ;
		string longitude:long_name = "longitude" ;
}
```

```
$ ncdump -h /home/rafael/ncout2.nc
netcdf ncout2 {
dimensions:
	time = 745 ;
	points = 1 ;
variables:
	double time(time) ;
		time:_FillValue = 9.999e+20 ;
		string time:long_name = "verification time generated by wgrib2 function verftime()" ;
		time:reference_time = 283996800. ;
		time:reference_time_type = 0 ;
		string time:reference_date = "1979.01.01 00:00:00 UTC" ;
		string time:reference_time_description = "kind of product unclear, reference date is variable, min found reference date is given" ;
		string time:time_step_setting = "auto" ;
		time:time_step = 3600. ;
		string time:units = "seconds since 1970-01-01" ;
		time:calendar = "proleptic_gregorian" ;
	double latitude(points) ;
		string latitude:units = "degrees_north" ;
		string latitude:long_name = "latitude" ;
	double vgrd10m(points, time) ;
		string vgrd10m:short_name = "vgrd10m" ;
		string vgrd10m:long_name = "V-Component of Wind" ;
		string vgrd10m:level = "10 m above ground" ;
		string vgrd10m:units = "m/s" ;
	double ugrd10m(points, time) ;
		string ugrd10m:short_name = "ugrd10m" ;
		string ugrd10m:long_name = "U-Component of Wind" ;
		string ugrd10m:level = "10 m above ground" ;
		string ugrd10m:units = "m/s" ;
	double longitude(points) ;
		string longitude:units = "degrees_east" ;
		string longitude:long_name = "longitude" ;
	int64 points(points) ;
		points:standard_name = "site" ;
}
```

518966560 · issue #3490 · Dataset global attributes dropped when performing operations against numpy data type · rafa-guedes · closed (completed) · 2 comments · created 2019-11-07T00:22:04Z · updated 2020-10-14T16:29:51Z · closed 2020-10-14T16:29:51Z · CONTRIBUTOR

Operations against numpy data types seem to cause global attributes in a dataset to be dropped; example below. I also noticed in a real dataset with multiple dimensions that the order of dset.coords was swapped.

```python
In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: import xarray as xr

In [4]: dset = xr.DataArray(
   ...:     np.random.rand(4, 3),
   ...:     [("time", pd.date_range("2000-01-01", periods=4)), ("space", ["IA", "IL", "IN"])],
   ...:     name="test",
   ...: ).to_dataset()
   ...: dset.attrs = {"attr1": "val1", "attr2": "val2"}

In [5]: 1.0 * dset
Out[5]:
<xarray.Dataset>
Dimensions:  (space: 3, time: 4)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) <U2 'IA' 'IL' 'IN'
Data variables:
    test     (time, space) float64 0.3114 0.8757 0.4467 ... 0.2784 0.8502 0.581
Attributes:
    attr1:   val1
    attr2:   val2

In [6]: np.float64(1.0) * dset
Out[6]:
<xarray.Dataset>
Dimensions:  (space: 3, time: 4)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) <U2 'IA' 'IL' 'IN'
Data variables:
    test     (time, space) float64 0.3114 0.8757 0.4467 ... 0.2784 0.8502 0.581

In [7]: xr.__version__
Out[7]: '0.14.0'
```
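
In later xarray versions this can be worked around by opting into attribute propagation explicitly; a sketch (xr.set_options gained the keep_attrs option after the version shown above, so whether it covers this exact code path depends on the version):

```python
import numpy as np
import xarray as xr

# dset as constructed in the snippet above
with xr.set_options(keep_attrs=True):
    result = np.float64(1.0) * dset  # global attrs survive the operation here
```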

117018372 · issue #658 · are there methods to abstract coordinate variables? · rafa-guedes · closed (completed) · 2 comments · created 2015-11-15T21:15:30Z · updated 2019-01-30T02:21:03Z · closed 2019-01-30T02:21:03Z · CONTRIBUTOR

Hi guys,

just wondering if there are, or if you plan to implement, methods similar to cdms2's getLongitude(), getLatitude(), getTime(), getLevel(), which allow reading these coordinate variables without knowing a priori what they are called in the netcdf files?

Thanks
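
One way to approximate this is to look coordinates up by their CF attributes rather than by name; a minimal sketch (get_coord is a hypothetical helper and assumes the file carries CF standard_name attributes):

```python
import numpy as np
import xarray as xr

def get_coord(ds, standard_name):
    """Return the first coordinate whose attrs carry the given CF standard_name."""
    for coord in ds.coords.values():
        if coord.attrs.get("standard_name") == standard_name:
            return coord
    raise KeyError(standard_name)

# Tiny example dataset with a CF-tagged coordinate
ds = xr.Dataset(coords={"x": ("x", np.arange(3.0), {"standard_name": "longitude"})})
print(get_coord(ds, "longitude"))
```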

298839307 · issue #1932 · Not able to slice dataset using its own coordinate value · rafa-guedes · closed (completed) · 2 comments · created 2018-02-21T04:35:01Z · updated 2018-02-27T01:13:45Z · closed 2018-02-27T01:13:45Z · CONTRIBUTOR

Code Sample, a copy-pastable example if possible

```python
In [1]: import xarray as xr
In [2]: ds = xr.open_dataset('test.nc')
In [3]: ds.sel(time=ds.time[0])                    # works
In [4]: ds.sel(time=ds.time[1], method='nearest')  # works
In [5]: ds.sel(time=ds.time[1])                    # does not work
```

```python
In [6]: ds.time[0]
Out[6]:
<xarray.DataArray 'time' ()>
array('2018-02-12T06:00:00.000000000', dtype='datetime64[ns]')
Coordinates:
    time     datetime64[ns] 2018-02-12T06:00:00
    site     float64 ...
Attributes:
    standard_name: time

In [7]: ds.time[1]
Out[7]:
<xarray.DataArray 'time' ()>
array('2018-02-12T06:59:59.999986000', dtype='datetime64[ns]')
Coordinates:
    time     datetime64[ns] 2018-02-12T06:59:59.999986
    site     float64 ...
Attributes:
    standard_name: time
```

Problem description

xarray sometimes fails to slice using its own coordinate values. It looks like it may be related to precision. Traceback below; test file attached.

```python
In [7]: ds.sel(time=ds.time[1])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-7-371d2f896b4a> in <module>()
----> 1 ds.sel(time=ds.time[1])

/usr/lib/python2.7/site-packages/xarray/core/dataset.pyc in sel(self, method, tolerance, drop, **indexers)
   1444
   1445         pos_indexers, new_indexes = indexing.remap_label_indexers(
-> 1446             self, v_indexers, method=method, tolerance=tolerance
   1447         )
   1448         # attach indexer's coordinate to pos_indexers

/usr/lib/python2.7/site-packages/xarray/core/indexing.pyc in remap_label_indexers(data_obj, indexers, method, tolerance)
    234     else:
    235         idxr, new_idx = convert_label_indexer(index, label,
--> 236                                               dim, method, tolerance)
    237         pos_indexers[dim] = idxr
    238         if new_idx is not None:

/usr/lib/python2.7/site-packages/xarray/core/indexing.pyc in convert_label_indexer(index, label, index_name, method, tolerance)
    163             indexer, new_index = index.get_loc_level(label.item(), level=0)
    164         else:
--> 165             indexer = get_loc(index, label.item(), method, tolerance)
    166     elif label.dtype.kind == 'b':
    167         indexer = label

/usr/lib/python2.7/site-packages/xarray/core/indexing.pyc in get_loc(index, label, method, tolerance)
     93 def get_loc(index, label, method=None, tolerance=None):
     94     kwargs = _index_method_kwargs(method, tolerance)
---> 95     return index.get_loc(label, **kwargs)
     96
     97

/usr/lib/python2.7/site-packages/pandas/core/indexes/datetimes.pyc in get_loc(self, key, method, tolerance)
   1444                 return Index.get_loc(self, stamp, method, tolerance)
   1445             except KeyError:
-> 1446                 raise KeyError(key)
   1447         except ValueError as e:
   1448             # list-like tolerance size must match target index size

KeyError: 1518418799999986000L
```

Expected Output

Output of xr.show_versions()

```python
In [9]: xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.15-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: en_NZ.UTF-8
xarray: 0.10.0
pandas: 0.22.0
numpy: 1.14.0
scipy: 0.17.1
netCDF4: 1.2.9
h5netcdf: None
Nio: None
bottleneck: None
cyordereddict: None
dask: 0.11.1
matplotlib: 2.1.0
cartopy: 0.14.2
seaborn: None
setuptools: 34.2.0
pip: 9.0.1
conda: None
pytest: 3.3.1
IPython: 5.2.2
sphinx: None
```

[test.zip](https://github.com/pydata/xarray/files/1742872/test.zip)

128980804 · issue #728 · Cannot inherit DataArray anymore in 0.7 release · rafa-guedes · closed (completed) · 15 comments · created 2016-01-26T23:57:03Z · updated 2017-05-24T17:30:35Z · closed 2016-01-29T02:48:57Z · CONTRIBUTOR

I understand from @shoyer that inheriting from DataArray may not be the best approach for extending DataArray with other specific methods, but this was working before the latest release and is not working anymore. Just wondering if this is an issue caused by the new internal structure of DataArray, or maybe something I'm doing wrong?

For example, the code below works using xray.0.6.1:

```python
import numpy as np

import xarray as xr  # xarray 0.7.0
# import xray as xr  # xray 0.6.1 (use one or the other)

class NewArray(xr.DataArray):
    def __init__(self, darray):
        super(NewArray, self).__init__(darray, name='spec')

data = np.random.randint(0, 10, 12).reshape(4, 3)
x = [10, 20, 30]
y = [1, 2, 3, 4]

darray = xr.DataArray(data, coords={'y': y, 'x': x}, dims=['y', 'x'])
narray = NewArray(darray)

print 'xr version: %s\n' % xr.__version__
print 'DataArray object:\n%s\n' % darray
print 'NewArray object:\n%s' % narray
```

but it does not work anymore with the new xarray release. The NewArray instance is actually created, but if I try to access the object, or its narray.coords attribute, I get the traceback below. I can, however, access some other attributes of narray, such as narray.values or narray.dims.

```python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/source/pymsl/pymsl/core/tests/inherit_test.py in <module>()
     16 print 'xr version: %s\n' % xr.__version__
     17 print 'DataArray object:\n%s\n' % darray
---> 18 print 'NewArray object:\n%s' % narray

/usr/local/lib/python2.7/site-packages/xarray/core/common.pyc in __repr__(self)
     76
     77     def __repr__(self):
---> 78         return formatting.array_repr(self)
     79
     80     def _iter(self):

/usr/local/lib/python2.7/site-packages/xarray/core/formatting.pyc in array_repr(arr)
    254     if hasattr(arr, 'coords'):
    255         if arr.coords:
--> 256             summary.append(repr(arr.coords))
    257
    258     if arr.attrs:

/usr/local/lib/python2.7/site-packages/xarray/core/coordinates.pyc in __repr__(self)
     64
     65     def __repr__(self):
---> 66         return formatting.coords_repr(self)
     67
     68     @property

/usr/local/lib/python2.7/site-packages/xarray/core/formatting.pyc in _mapping_repr(mapping, title, summarizer, col_width)
    208     summary = ['%s:' % title]
    209     if mapping:
--> 210         summary += [summarizer(k, v, col_width) for k, v in mapping.items()]
    211     else:
    212         summary += [EMPTY_REPR]

/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/_abcoll.pyc in items(self)
    412     def items(self):
    413         "D.items() -> list of D's (key, value) pairs, as 2-tuples"
--> 414         return [(key, self[key]) for key in self]
    415
    416     def values(self):

/usr/local/lib/python2.7/site-packages/xarray/core/coordinates.pyc in __getitem__(self, key)
     44                 key.split('.')[0] in self._names)):
     45             # allow indexing current coordinates or components
---> 46             return self._data[key]
     47         else:
     48             raise KeyError(key)

/usr/local/lib/python2.7/site-packages/xarray/core/dataarray.pyc in __getitem__(self, key)
    395             _, key, var = _get_virtual_variable(self._coords, key)
    396
--> 397             return self._replace_maybe_drop_dims(var, name=key)
    398         else:
    399             # orthogonal array indexing

/usr/local/lib/python2.7/site-packages/xarray/core/dataarray.pyc in _replace_maybe_drop_dims(self, variable, name)
    234         coords = OrderedDict((k, v) for k, v in self._coords.items()
    235                              if set(v.dims) <= allowed_dims)
--> 236         return self._replace(variable, coords, name)
    237
    238     __this_array = _ThisArray()

/usr/local/lib/python2.7/site-packages/xarray/core/dataarray.pyc in _replace(self, variable, coords, name)
    225         if name is self.__default:
    226             name = self.name
--> 227         return type(self)(variable, coords, name=name, fastpath=True)
    228
    229     def _replace_maybe_drop_dims(self, variable, name=__default):

TypeError: __init__() takes exactly 2 arguments (5 given)
```

124915222 · issue #706 · Subclassing Dataset and DataArray · rafa-guedes · closed (completed) · 8 comments · created 2016-01-05T07:55:03Z · updated 2016-05-24T22:14:30Z · closed 2016-05-13T16:48:37Z · CONTRIBUTOR

Hi guys,

I have started writing a SpecArray class which inherits from DataArray and defines some methods useful for dealing with wave spectra, such as calculating spectral wave statistics like significant wave height and peak wave period, interpolating, splitting, and performing some other tasks. I'd like to ask please:

  • Is this something you would maybe be interested in adding to your library?
  • Is there a simple way to ensure the methods I am defining are preserved when creating a Dataset out of this SpecArray object? Currently I can create / add to a Dataset using this new object, but all the new methods get lost in doing so.

Thanks, Rafael
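
For the record, the approach xarray eventually settled on for this kind of extension is a registered accessor rather than a subclass; a minimal sketch (the accessor name spec and the hs statistic are illustrative placeholders):

```python
import xarray as xr

@xr.register_dataarray_accessor("spec")
class SpecAccessor(object):
    def __init__(self, da):
        self._da = da

    def hs(self):
        # Placeholder statistic; real spectral integration would go here
        return 4 * self._da.mean()

da = xr.DataArray([1.0, 2.0, 3.0], dims="freq")
print(da.spec.hs())  # methods survive because they live on the accessor, not the array
```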

123384529 · issue #682 · to_netcdf: not able to set dtype encoding with netCDF4 backend · rafa-guedes · closed (completed) · 1 comment · created 2015-12-21T23:57:56Z · updated 2016-01-08T01:27:58Z · closed 2016-01-08T01:27:58Z · CONTRIBUTOR

I'm trying to set dtype as an encoding option when saving a netcdf file using to_netcdf with the netCDF4 backend. However, a check in netCDF4_._extract_nc4_encoding currently doesn't seem to allow it.

dtype is not defined in the valid_encoding set. When I include dtype there, I am able to use dtype as an encoding option, as long as the dtype key is a tuple in the nested encoding dictionary argument:

```python
dset_nearest.to_netcdf('/home/rafael/tmp/test.nc', format='netcdf4_CLASSIC',
                       encoding={'specden': {('dtype',): 'float32'}})
```

When the dtype key is a plain string, this does not work either, because the list comprehension that collects invalid keys iterates over the string character by character:

```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-72122e207569> in <module>()
----> 1 dset_nearest.to_netcdf('/home/rafael/tmp/test3.nc', format='netcdf3_CLASSIC', encoding={'specden': {'dtype': 'float32'}})

/source/xray/xray/core/dataset.pyc in to_netcdf(self, path, mode, format, group, engine, encoding)
    880         from ..backends.api import to_netcdf
    881         return to_netcdf(self, path, mode, format=format, group=group,
--> 882                          engine=engine, encoding=encoding)
    883
    884     dump = utils.function_alias(to_netcdf, 'dump')

/source/xray/xray/backends/api.pyc in to_netcdf(dataset, path, mode, format, group, engine, writer, encoding)
    352     store = store_cls(path, mode, format, group, writer)
    353     try:
--> 354         dataset.dump_to_store(store, sync=sync, encoding=encoding)
    355         if isinstance(path, BytesIO):
    356             return path.getvalue()

/source/xray/xray/core/dataset.pyc in dump_to_store(self, store, encoder, sync, encoding)
    826             variables, attrs = encoder(variables, attrs)
    827
--> 828         store.store(variables, attrs, check_encoding)
    829         if sync:
    830             store.sync()

/source/xray/xray/backends/common.pyc in store(self, variables, attributes, check_encoding_set)
    226         cf_variables, cf_attrs = cf_encoder(variables, attributes)
    227         AbstractWritableDataStore.store(self, cf_variables, cf_attrs,
--> 228                                         check_encoding_set)

/source/xray/xray/backends/common.pyc in store(self, variables, attributes, check_encoding_set)
    201                          if not (k in neccesary_dims and
    202                                  is_trivial_index(v)))
--> 203         self.set_variables(variables, check_encoding_set)
    204
    205     def set_attributes(self, attributes):

/source/xray/xray/backends/common.pyc in set_variables(self, variables, check_encoding_set)
    211             name = _encode_variable_name(vn)
    212             check = vn in check_encoding_set
--> 213             target, source = self.prepare_variable(name, v, check)
    214             self.writer.add(source, target)
    215

/source/xray/xray/backends/netCDF4_.py in prepare_variable(self, name, variable, check_encoding)
    260
    261         encoding = _extract_nc4_encoding(variable,
--> 262                                          raise_on_invalid=check_encoding)
    263         nc4_var = self.ds.createVariable(
    264             varname=name,

/source/xray/xray/backends/netCDF4_.py in _extract_nc4_encoding(variable, raise_on_invalid, lsd_okay, backend)
    157     if raise_on_invalid:
    158         import pdb; pdb.set_trace()
--> 159         invalid = [k for enc in encoding for k in enc
    160                    if k not in valid_encodings]
    161     if invalid:

ValueError: unexpected encoding parameters for 'netCDF4' backend: ['d', 't', 'y', 'p', 'e']
```
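
For reference, in current xarray the plain string form is the supported spelling; a sketch (variable name and path illustrative):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"specden": ("x", np.arange(4.0))})

# dtype as an ordinary string key in the per-variable encoding dict
ds.to_netcdf("test.nc", format="NETCDF4_CLASSIC",
             encoding={"specden": {"dtype": "float32"}})
```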

117262604 · issue #660 · time slice cannot be list · rafa-guedes · closed (completed) · 3 comments · created 2015-11-17T01:53:15Z · updated 2015-11-18T02:25:36Z · closed 2015-11-18T02:25:29Z · CONTRIBUTOR

Not sure if this is a problem or expected behaviour. When slicing a variable from a dataset using the sel() method, if I only want one time, the time value cannot be given in a list (in my case it failed when the level slice had more than one value). With a scalar float, however, it works. Please see the example below, where I try to slice from a CFSR currents file.

This does not work:

```python
dset['uo'].sel(latitude=-49, longitude=162.75, lev=[0,100,1000], time=[1.41747840e+09], method='nearest')
```

```
*** IndexError: The indexing operation you are attempting to perform is not valid on netCDF4.Variable object. Try loading your data into memory first by calling .load().

Original traceback:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/xray-0.6.1_15_g5109f4f-py2.7.egg/xray/backends/netCDF4_.py", line 47, in __getitem__
    data = getitem(self.array, key)
  File "netCDF4.pyx", line 2991, in netCDF4.Variable.__getitem__ (netCDF4.c:36676)
  File "/usr/lib64/python2.7/site-packages/netCDF4_utils.py", line 245, in _StartCountStride
    raise IndexError("Indice mismatch. Indices must have the same length.")
IndexError: Indice mismatch. Indices must have the same length.
```

This works:

```python
dset['uo'].sel(latitude=-49, longitude=162.75, lev=[0,100,1000], time=1.41747840e+09, method='nearest')
```

```
<xray.DataArray 'uo' (lev: 3)>
array([ 0.024,  0.012, -0.008])
Coordinates:
    latitude   float64 -48.75
  * lev        (lev) float64 5.0 105.0 949.0
    longitude  float64 162.8
    time       float64 1.417e+09
Attributes:
    short_name: uo
    long_name: U-Component of Current
    level: Depth below sea surface
    units: m/s
```

117478779 · issue #662 · Problem with checking in Variable._parse_dimensions() (xray.core.variable) · rafa-guedes · closed (completed) · 12 comments · created 2015-11-17T23:53:26Z · updated 2015-11-18T02:18:43Z · closed 2015-11-18T02:18:43Z · CONTRIBUTOR

I have had a problem with a dataset I created by slicing an existing dataset. Some operations I try to perform on the new dataset fail because it doesn't pass a check in xray.core.variable.py (Variable._parse_dimensions()).

This is the original dataset:

```
<xray.Dataset>
Dimensions:     (depth: 40, lat: 2001, lon: 4500, time: 1)
Coordinates:
  * time        (time) float64 3.502e+04
  * depth       (depth) float64 0.0 2.0 4.0 6.0 8.0 10.0 12.0 15.0 20.0 25.0 ...
  * lat         (lat) float64 -80.0 -79.92 -79.84 -79.76 -79.68 -79.6 -79.52 ...
  * lon         (lon) float64 -180.0 -179.9 -179.8 -179.8 -179.7 -179.6 ...
Data variables:
    tau         (time) float64 0.0
    water_u     (time, depth, lat, lon) float64 nan nan nan nan nan nan nan ...
    water_v     (time, depth, lat, lon) float64 nan nan nan nan nan nan nan ...
    water_temp  (time, depth, lat, lon) float64 nan nan nan nan nan nan nan ...
    salinity    (time, depth, lat, lon) float64 nan nan nan nan nan nan nan ...
    surf_el     (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
Attributes:
    classification_level: UNCLASSIFIED
    distribution_statement: Approved for public release. Distribution unlimited.
    downgrade_date: not applicable
    classification_authority: not applicable
    institution: Naval Oceanographic Office
    source: HYCOM archive file
    history: archv2ncdf3z
    field_type: instantaneous
    Conventions: CF-1.0 NAVO_netcdf_v1.0
```

This is how I have sliced it to create my new dataset:

```python
dset_new = dset['water_u'].sel(lat=[-30], depth=[0, 10, 20], lon=[0], method='nearest').to_dataset(name='water_u')
```

And this is what the new dataset looks like:

```
<xray.Dataset>
Dimensions:  (depth: 3, lat: 1, lon: 1, time: 1)
Coordinates:
  * lat      (lat) float64 -30.0
  * depth    (depth) float64 0.0 10.0 20.0
  * lon      (lon) float64 0.0
  * time     (time) float64 3.502e+04
Data variables:
    water_u  (time, depth, lat, lon) float64 0.104 0.138 0.144
```

This only has one variable, but in my case I also add some others. I could not identify anything obviously wrong with this new dataset.

I was trying to concatenate similar datasets sliced from multiple files, but the same error happens if, for example, I try to dump it to netcdf using to_netcdf(). This is the most recent call of the traceback:

```python
/usr/lib/python2.7/site-packages/xray-0.6.1_15_g5109f4f-py2.7.egg/xray/core/variable.py in _parse_dimensions(self, dims)
    302             raise ValueError('dimensions %s must have the same length as the '
    303                              'number of data dimensions, ndim=%s'
--> 304                              % (dims, self.ndim))
    305         return dims
    306

ValueError: dimensions (u'time', u'depth', u'lat', u'lon') must have the same length as the number of data dimensions, ndim=2
```

I'm not sure what the "number of data dimensions" ndim represents, but my new dataset is not passing that check (len(dims)==4 but self.ndim==2). However, if I comment out that check, everything works: I can concatenate datasets and dump them to netcdf files.

Thanks, Rafael

95788263 · issue #479 · Define order of coordinates / variables of netcdf created from dset · rafa-guedes · closed (completed) · 1 comment · created 2015-07-18T04:49:56Z · updated 2015-07-20T03:18:25Z · closed 2015-07-20T03:18:25Z · CONTRIBUTOR

Hi guys,

I'm saving this dataset as a netcdf file:

```
<xray.Dataset>
Dimensions:    (lat: 41, lon: 41, month: 12)
Coordinates:
  * lat        (lat) float32 -45.5 -45.4 -45.3 -45.2 -45.1 -45.0 -44.9 -44.8 ...
  * lon        (lon) float32 182.0 182.1 182.2 182.3 182.4 182.5 182.6 182.7 ...
  * month      (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    wspd_mean  (month, lat, lon) float64 9.068 9.082 9.097 9.111 9.129 9.149 ...
    wdir_mean  (month, lat, lon) float64 112.3 112.4 112.2 112.3 112.5 112.6 ...
```

However I'm not sure how to preserve the order of the coordinates in the output netcdf:

```
netcdf test {
dimensions:
	lat = 41 ;
	month = 12 ;
	lon = 41 ;
variables:
	float lat(lat) ;
	double wdir_mean(month, lat, lon) ;
	double wspd_mean(month, lat, lon) ;
	float lon(lon) ;
	int64 month(month) ;

// global attributes:
		:date_created = "2015-07-18 16:20:02.603378" ;
}
```

Is there a way to make sure coordinates / variables are written to the netcdf in a specific order, please? Thanks



The issues table schema:

```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```