id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 1268630439,I_kwDOAMm_X85LncOn,6688,2D extrapolation not working,7799184,open,0,,,3,2022-06-12T16:11:04Z,2022-06-14T06:19:20Z,,CONTRIBUTOR,,,,"### What happened? Extrapolation does not seem to be working on 2D data arrays. The area outside the input grid is NaN in the interpolated data when using `kwargs={""fill_value"": None}` as arguments to the `interp` function (the extrapolation does work when using `scipy.interpolate.interpn` and passing `fill_value=None` along with `bounds_error=False`). This figure shows the example data arrays from the code snippet provided here: ![Screenshot from 2022-06-12 13-10-08](https://user-images.githubusercontent.com/7799184/173242484-e1be9c56-bb28-417b-a29c-54babcde96da.png) ### What did you expect to happen? Area outside the input grid filled with extrapolated data. ### Minimal Complete Verifiable Example ```Python import xarray as xr da = xr.DataArray( data=[[1, 2, 3], [3, 4, 5]], coords=dict(y=[0, 1], x=[10, 20, 30]), dims=(""y"", ""x"") ) dai = da.interp(x=[25, 30, 35], y=[0, 1], kwargs={""fill_value"": None}) ``` ### MVCE confirmation - [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. - [ ] Complete example — the example is self-contained, including all data and the text of any traceback. - [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result. - [ ] New issue — a search of GitHub Issues suggests this is not a duplicate. ### Relevant log output _No response_ ### Anything else we need to know? _No response_ ### Environment
INSTALLED VERSIONS ------------------ commit: None python: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) [GCC 9.4.0] python-bits: 64 OS: Linux OS-release: 5.13.0-1031-gcp machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 0.20.2 pandas: 1.3.5 numpy: 1.19.5 scipy: 1.7.3 netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: 3.7.0 Nio: None zarr: 2.11.3 cftime: 1.6.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2022.02.0 distributed: None matplotlib: 3.5.2 cartopy: None seaborn: 0.11.2 numbagg: None fsspec: 2022.5.0 cupy: None pint: 0.18 sparse: None setuptools: 59.8.0 pip: 22.1.1 conda: 4.12.0 pytest: 7.1.2 IPython: 7.33.0 sphinx: None
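For reference, here is the direct scipy call that does extrapolate on the same toy grid (a minimal sketch, numpy/scipy only, no xarray involved):

```python
# Minimal sketch of the scipy call that does extrapolate, using the same toy
# grid as the example above (numpy/scipy only, no xarray involved).
import numpy as np
from scipy.interpolate import interpn

points = (np.array([0.0, 1.0]), np.array([10.0, 20.0, 30.0]))  # y and x grid coords
values = np.array([[1, 2, 3], [3, 4, 5]], dtype=float)

# Target points, including x=35 which lies outside the input grid.
yi, xi = np.meshgrid([0, 1], [25, 30, 35], indexing='ij')
targets = np.stack([yi.ravel(), xi.ravel()], axis=-1)

# bounds_error=False together with fill_value=None enables linear extrapolation.
result = interpn(points, values, targets, bounds_error=False, fill_value=None)
print(result.reshape(2, 3))  # no NaNs; the x=35 column is extrapolated
```

With `da.interp(..., kwargs={'fill_value': None})` the equivalent out-of-grid points come back as NaN, which is the discrepancy reported above.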
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6688/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 595492608,MDU6SXNzdWU1OTU0OTI2MDg=,3942,Time dtype encoding defaulting to `int64` when writing netcdf or zarr,7799184,open,0,,,8,2020-04-06T23:36:37Z,2021-11-11T12:32:06Z,,CONTRIBUTOR,,,," Time `dtype` encoding defaults to `""int64""` for datasets with only zero-hour times when writing to netcdf or zarr. This results in these datasets having a precision constrained by how the time units are defined (in the example below `daily` precision, given units are defined as `'days since ...'`). If we, for instance, create a zarr dataset using this default encoding option with such datasets, and subsequently append some non-zero times onto it, we lose the hour/minute/second information from the appended data. #### MCVE Code Sample ```python In [1]: ds = xr.DataArray( ...: data=[0.5], ...: coords={""time"": [datetime.datetime(2012,1,1)]}, ...: dims=(""time"",), ...: name=""x"", ...: ).to_dataset() In [2]: ds Out[2]: Dimensions: (time: 1) Coordinates: * time (time) datetime64[ns] 2012-01-01 Data variables: x (time) float64 0.5 In [3]: ds.to_zarr(""/tmp/x.zarr"") In [4]: ds1 = xr.open_zarr(""/tmp/x.zarr"") In [5]: ds1.time.encoding Out[5]: {'chunks': (1,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'units': 'days since 2012-01-01 00:00:00', 'calendar': 'proleptic_gregorian', 'dtype': dtype('int64')} In [6]: dsnew = xr.DataArray( ...: data=[1.5], ...: coords={""time"": [datetime.datetime(2012,1,1,3,0,0)]}, ...: dims=(""time"",), ...: name=""x"", ...: ).to_dataset() In [7]: dsnew.to_zarr(""/tmp/x.zarr"", append_dim=""time"") In [8]: ds1 = xr.open_zarr(""/tmp/x.zarr"") In [9]: ds1.time.values Out[9]: array(['2012-01-01T00:00:00.000000000', '2012-01-01T00:00:00.000000000'], dtype='datetime64[ns]') ``` #### 
Expected Output ``` In [9]: ds1.time.values Out[9]: array(['2012-01-01T00:00:00.000000000', '2012-01-01T03:00:00.000000000'], dtype='datetime64[ns]') ``` #### Problem Description Perhaps it would be useful to default the time `dtype` to `""float64""`. Another option could be to use a finer time resolution by default than the one xarray automatically infers from the dataset times (for instance, if the units would be automatically defined as ""days since ..."", use ""seconds since ..."" instead). #### Versions
Output of `xr.show_versions()` In [10]: xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.7.5 (default, Nov 20 2019, 09:21:52) [GCC 9.2.1 20191008] python-bits: 64 OS: Linux OS-release: 5.3.0-45-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_NZ.UTF-8 LOCALE: en_NZ.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.3 xarray: 0.15.0 pandas: 1.0.1 numpy: 1.18.1 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: 0.8.0 h5py: 2.10.0 Nio: None zarr: 2.4.0 cftime: 1.1.0 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.3 cfgrib: None iris: None bottleneck: None dask: 2.14.0 distributed: 2.12.0 matplotlib: 3.2.0 cartopy: 0.17.0 seaborn: None numbagg: None setuptools: 45.3.0 pip: 20.0.2 conda: None pytest: 5.3.5 IPython: 7.13.0 sphinx: None
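The truncation can be reproduced without zarr at all; a numpy-only sketch of the arithmetic behind the lost 03:00 time (the variable names here are just for illustration):

```python
# Numpy-only sketch of the truncation described above: with an int64
# 'days since ...' encoding, the appended 03:00 offset is 0.125 days,
# which truncates to 0 and decodes back to midnight.
import numpy as np

epoch = np.datetime64('2012-01-01')
times = np.array(['2012-01-01T00:00', '2012-01-01T03:00'], dtype='datetime64[s]')

offset_days = (times - epoch) / np.timedelta64(1, 'D')  # [0.0, 0.125]
encoded = offset_days.astype('int64')                   # [0, 0] -> precision lost
decoded = epoch + encoded * np.timedelta64(1, 'D')      # both decode to midnight

# With 'seconds since ...' units the int64 encoding would be [0, 10800],
# which round-trips losslessly.
```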
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3942/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 517799069,MDU6SXNzdWU1MTc3OTkwNjk=,3486,Should performance be equivalent when opening with chunks or re-chunking a dataset? ,7799184,open,0,,,2,2019-11-05T14:14:58Z,2021-08-31T15:28:04Z,,CONTRIBUTOR,,,,"I was wondering if the chunking behaviour would be expected to be equivalent under two different use cases: (1) When opening a dataset using the `chunks` option; (2) When re-chunking an existing dataset using `Dataset.chunk` method. I'm interested in performance for slicing across different dimensions. In my case the performance is quite different, please see the example below: ### Open dataset with one single chunk along `station` dimension (fast for slicing one time) ``` In [1]: import xarray as xr In [2]: dset = xr.open_dataset( ...: ""/source/wavespectra/tests/sample_files/spec20170101T00_spec.nc"", ...: chunks={""station"": None} ...: ) In [3]: dset Out[3]: Dimensions: (direction: 24, frequency: 25, station: 14048, time: 249) Coordinates: * time (time) datetime64[ns] 2017-01-01 ... 2017-02-01 * station (station) float64 1.0 2.0 3.0 ... 1.405e+04 1.405e+04 * frequency (frequency) float32 0.04118 0.045298003 ... 0.40561208 * direction (direction) float32 90.0 75.0 60.0 45.0 ... 
135.0 120.0 105.0 Data variables: longitude (time, station) float32 dask.array latitude (time, station) float32 dask.array efth (time, station, frequency, direction) float32 dask.array In [4]: %time lats = dset.latitude.isel(time=0).values CPU times: user 171 ms, sys: 49.2 ms, total: 220 ms Wall time: 219 ms ``` ### Open dataset with many size=1 chunks along `station` dimension (fast for slicing one station, slow for slicing one time) ``` In [5]: dset = xr.open_dataset( ...: ""/source/wavespectra/tests/sample_files/spec20170101T00_spec.nc"", ...: chunks={""station"": 1} ...: ) In [6]: dset Out[6]: Dimensions: (direction: 24, frequency: 25, station: 14048, time: 249) Coordinates: * time (time) datetime64[ns] 2017-01-01 ... 2017-02-01 * station (station) float64 1.0 2.0 3.0 ... 1.405e+04 1.405e+04 * frequency (frequency) float32 0.04118 0.045298003 ... 0.40561208 * direction (direction) float32 90.0 75.0 60.0 45.0 ... 135.0 120.0 105.0 Data variables: longitude (time, station) float32 dask.array latitude (time, station) float32 dask.array efth (time, station, frequency, direction) float32 dask.array In [7]: %time lats = dset.latitude.isel(time=0).values CPU times: user 13.1 s, sys: 1.94 s, total: 15 s Wall time: 11.1 s ``` ### Try rechunk `station` into one single chunk (still slow to slice one time) ``` In [8]: dset = dset.chunk({""station"": None}) In [8]: dset Out[8]: Dimensions: (direction: 24, frequency: 25, station: 14048, time: 249) Coordinates: * time (time) datetime64[ns] 2017-01-01 ... 2017-02-01 * station (station) float64 1.0 2.0 3.0 ... 1.405e+04 1.405e+04 * frequency (frequency) float32 0.04118 0.045298003 ... 0.40561208 * direction (direction) float32 90.0 75.0 60.0 45.0 ... 
135.0 120.0 105.0 Data variables: longitude (time, station) float32 dask.array latitude (time, station) float32 dask.array efth (time, station, frequency, direction) float32 dask.array In [9]: %time lats = dset.latitude.isel(time=0).values CPU times: user 9.06 s, sys: 1.13 s, total: 10.2 s Wall time: 7.7 s ``` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3486/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 223231729,MDU6SXNzdWUyMjMyMzE3Mjk=,1379,xr.concat consuming too much resources,7799184,open,0,,,4,2017-04-20T23:33:52Z,2021-07-08T17:42:18Z,,CONTRIBUTOR,,,,"Hi, I am reading in several (~1000) small ascii files into Dataset objects and trying to concatenate them over one specific dimension, but I eventually run out of memory. The file glob is not huge (~700M, my computer has ~16G), and I can do it fine if I only read in the Datasets, appending them to a list without concatenating them (my memory increases by only 5% or so by the time I have read them all). However, when trying to concatenate each file into one single Dataset as I read them in a loop, processing slows down drastically before I have read 10% of the files or so, and my memory usage keeps going up until it eventually blows up before I have read and concatenated 30% of these files (the screenshot below was taken before it blew up; memory usage was under 20% at the start of the processing). I was wondering if this is expected, or if there is something that could be improved to make this work more efficiently. I'm changing my approach now by extracting numpy arrays from the individual Datasets, concatenating these numpy arrays, and defining the final Dataset only at the end. Thanks. 
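For what it's worth, growing a dataset by repeated pairwise `xr.concat` in a loop re-copies the accumulated data on every iteration (roughly quadratic work), while a single `xr.concat` over a list copies each array once; a toy sketch of the two patterns (in-memory arrays standing in for the ascii files):

```python
# Sketch: repeated pairwise concatenation re-copies the accumulated data on
# every iteration, while a single concat over a list copies each array once.
# Toy arrays stand in for the ascii files described above.
import numpy as np
import xarray as xr

def make_piece(i):
    return xr.Dataset({'x': ('time', np.full(10, float(i)))},
                      coords={'time': np.arange(i * 10, (i + 1) * 10)})

# Pattern that blows up: grow the dataset inside the loop.
slow = make_piece(0)
for i in range(1, 50):
    slow = xr.concat([slow, make_piece(i)], dim='time')  # re-copies everything

# Pattern that scales: collect everything, then concatenate once.
fast = xr.concat([make_piece(i) for i in range(50)], dim='time')
```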
![screenshot from 2017-04-21 11-14-27](https://cloud.githubusercontent.com/assets/7799184/25256452/e7cdd4b4-2684-11e7-9c27-e28c76317a77.png) ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1379/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 129630652,MDU6SXNzdWUxMjk2MzA2NTI=,733,coordinate variable not written in netcdf file in some cases,7799184,open,0,,,5,2016-01-29T00:55:54Z,2020-12-25T16:49:54Z,,CONTRIBUTOR,,,,"I came across a situation where my coordinate variable was not dumped as a variable in the output netcdf file using `dataset.to_netcdf`. In my case I managed to fix it by simply adding variable attributes to this coordinate variable (which didn't have any). The situation where that happened was while creating a sliced dataset with the `dataset.isel_points` method, which automatically defines a new coordinate called `points` in the sliced dataset. If I dump that dataset as is, the coordinate isn't written as a variable in the netcdf. Adding attributes to `points`, however, changes that. Here is an example: ``` In [1]: import xarray as xr In [2]: ds = xr.open_dataset('netcdf_file_with_longitude_and_latitude.nc') In [3]: ds Out[3]: Dimensions: (latitude: 576, longitude: 1152, time: 745) Coordinates: * latitude (latitude) float64 -89.76 -89.45 -89.14 -88.83 -88.52 -88.2 ... * longitude (longitude) float64 0.0 0.3125 0.625 0.9375 1.25 1.562 1.875 ... * time (time) datetime64[ns] 1979-01-01 1979-01-01T01:00:00 ... Data variables: ugrd10m (time, latitude, longitude) float64 0.2094 0.25 0.2799 0.3183 ... vgrd10m (time, latitude, longitude) float64 -5.929 -5.918 -5.918 ... In [4]: ds2 = ds.isel_points(longitude=[0], latitude=[0]).reset_coords() In [5]: ds2 Out[5]: Dimensions: (points: 1, time: 745) Coordinates: * time (time) datetime64[ns] 1979-01-01 1979-01-01T01:00:00 ... 
* points (points) int64 0 Data variables: latitude (points) float64 -89.76 vgrd10m (points, time) float64 -5.929 -6.078 -6.04 -5.958 -5.858 ... ugrd10m (points, time) float64 0.2094 0.109 0.008546 -0.09828 -0.2585 ... longitude (points) float64 0.0 In [6]: ds2['points'].attrs Out[6]: OrderedDict() In [7]: ds2.to_netcdf('/home/rafael/ncout1.nc') In [8]: ds2['points'].attrs.update({'standard_name': 'site'}) In [9]: ds2['points'].attrs Out[9]: OrderedDict([('standard_name', 'site')]) In [10]: ds2.to_netcdf('/home/rafael/ncout2.nc') ``` Here is the ncdump output for these two files: ``` $ ncdump -h /home/rafael/ncout1.nc netcdf ncout1 { dimensions: time = 745 ; points = 1 ; variables: double time(time) ; time:_FillValue = 9.999e+20 ; string time:long_name = ""verification time generated by wgrib2 function verftime()"" ; time:reference_time = 283996800. ; time:reference_time_type = 0 ; string time:reference_date = ""1979.01.01 00:00:00 UTC"" ; string time:reference_time_description = ""kind of product unclear, reference date is variable, min found reference date is given"" ; string time:time_step_setting = ""auto"" ; time:time_step = 3600. 
; string time:units = ""seconds since 1970-01-01"" ; time:calendar = ""proleptic_gregorian"" ; double latitude(points) ; string latitude:units = ""degrees_north"" ; string latitude:long_name = ""latitude"" ; double vgrd10m(points, time) ; string vgrd10m:short_name = ""vgrd10m"" ; string vgrd10m:long_name = ""V-Component of Wind"" ; string vgrd10m:level = ""10 m above ground"" ; string vgrd10m:units = ""m/s"" ; double ugrd10m(points, time) ; string ugrd10m:short_name = ""ugrd10m"" ; string ugrd10m:long_name = ""U-Component of Wind"" ; string ugrd10m:level = ""10 m above ground"" ; string ugrd10m:units = ""m/s"" ; double longitude(points) ; string longitude:units = ""degrees_east"" ; string longitude:long_name = ""longitude"" ; } ``` ``` $ ncdump -h /home/rafael/ncout2.nc netcdf ncout2 { dimensions: time = 745 ; points = 1 ; variables: double time(time) ; time:_FillValue = 9.999e+20 ; string time:long_name = ""verification time generated by wgrib2 function verftime()"" ; time:reference_time = 283996800. ; time:reference_time_type = 0 ; string time:reference_date = ""1979.01.01 00:00:00 UTC"" ; string time:reference_time_description = ""kind of product unclear, reference date is variable, min found reference date is given"" ; string time:time_step_setting = ""auto"" ; time:time_step = 3600. 
; string time:units = ""seconds since 1970-01-01"" ; time:calendar = ""proleptic_gregorian"" ; double latitude(points) ; string latitude:units = ""degrees_north"" ; string latitude:long_name = ""latitude"" ; double vgrd10m(points, time) ; string vgrd10m:short_name = ""vgrd10m"" ; string vgrd10m:long_name = ""V-Component of Wind"" ; string vgrd10m:level = ""10 m above ground"" ; string vgrd10m:units = ""m/s"" ; double ugrd10m(points, time) ; string ugrd10m:short_name = ""ugrd10m"" ; string ugrd10m:long_name = ""U-Component of Wind"" ; string ugrd10m:level = ""10 m above ground"" ; string ugrd10m:units = ""m/s"" ; double longitude(points) ; string longitude:units = ""degrees_east"" ; string longitude:long_name = ""longitude"" ; int64 points(points) ; points:standard_name = ""site"" ; } ``` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/733/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue