issues

5 rows where state = "open" and user = 7799184 sorted by updated_at descending

#6688 2D extrapolation not working · rafa-guedes (7799184) · open · 3 comments · created 2022-06-12T16:11:04Z · updated 2022-06-14T06:19:20Z · CONTRIBUTOR · xarray (13221727) · issue · id 1268630439

What happened?

Extrapolation does not seem to be working on 2D data arrays. The area outside the input grid is NaN in the interpolated data when using kwargs={"fill_value": None} as arguments to the interp function (the extrapolation does work when using scipy.interpolate.interpn and passing fill_value=None along with bounds_error=False).

(Figure omitted: it showed the example data arrays from the code snippet below, with NaN outside the input grid.)

What did you expect to happen?

Area outside the input grid filled with extrapolated data.

Minimal Complete Verifiable Example

```python
import xarray as xr

da = xr.DataArray(
    data=[[1, 2, 3], [3, 4, 5]],
    coords=dict(y=[0, 1], x=[10, 20, 30]),
    dims=("y", "x"),
)

dai = da.interp(x=[25, 30, 35], y=[0, 1], kwargs={"fill_value": None})
```
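For comparison, the scipy behaviour described above can be checked directly. This is a minimal sketch (grid and values copied from the example above) showing that `scipy.interpolate.interpn` does extrapolate linearly when called with `bounds_error=False` and `fill_value=None`:

```python
import numpy as np
from scipy.interpolate import interpn

# the same grid and values as the DataArray above
points = ([0, 1], [10, 20, 30])                      # y, x coordinates
values = np.array([[1, 2, 3], [3, 4, 5]], dtype=float)

# interpolate along y=0 at x=25, 30 and 35; x=35 lies outside the grid
xi = np.array([[0.0, 25.0], [0.0, 30.0], [0.0, 35.0]])

out = interpn(points, values, xi, bounds_error=False, fill_value=None)
print(out)  # the x=35 point is linearly extrapolated, not NaN
```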

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.13.0-1031-gcp
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.20.2
pandas: 1.3.5
numpy: 1.19.5
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.02.0
distributed: None
matplotlib: 3.5.2
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: 0.18
sparse: None
setuptools: 59.8.0
pip: 22.1.1
conda: 4.12.0
pytest: 7.1.2
IPython: 7.33.0
sphinx: None
```
Reactions: none
#3942 Time dtype encoding defaulting to `int64` when writing netcdf or zarr · rafa-guedes (7799184) · open · 8 comments · created 2020-04-06T23:36:37Z · updated 2021-11-11T12:32:06Z · CONTRIBUTOR · xarray (13221727) · issue · id 595492608

Time dtype encoding defaults to "int64" for datasets with only zero-hour times when writing to netcdf or zarr.

This results in these datasets having a precision constrained by how the time units are defined (daily precision in the example below, since the units are defined as 'days since ...'). If we create a zarr dataset with this default encoding and subsequently append some non-zero times onto it, we lose the hour/minute/second information from the appended parts.

MCVE Code Sample

```python
In [1]: ds = xr.DataArray(
   ...:     data=[0.5],
   ...:     coords={"time": [datetime.datetime(2012, 1, 1)]},
   ...:     dims=("time",),
   ...:     name="x",
   ...: ).to_dataset()

In [2]: ds
Out[2]:
<xarray.Dataset>
Dimensions:  (time: 1)
Coordinates:
  * time     (time) datetime64[ns] 2012-01-01
Data variables:
    x        (time) float64 0.5

In [3]: ds.to_zarr("/tmp/x.zarr")

In [4]: ds1 = xr.open_zarr("/tmp/x.zarr")

In [5]: ds1.time.encoding
Out[5]:
{'chunks': (1,),
 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
 'filters': None,
 'units': 'days since 2012-01-01 00:00:00',
 'calendar': 'proleptic_gregorian',
 'dtype': dtype('int64')}

In [6]: dsnew = xr.DataArray(
   ...:     data=[1.5],
   ...:     coords={"time": [datetime.datetime(2012, 1, 1, 3, 0, 0)]},
   ...:     dims=("time",),
   ...:     name="x",
   ...: ).to_dataset()

In [7]: dsnew.to_zarr("/tmp/x.zarr", append_dim="time")

In [8]: ds1 = xr.open_zarr("/tmp/x.zarr")

In [9]: ds1.time.values
Out[9]:
array(['2012-01-01T00:00:00.000000000', '2012-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
```

Expected Output

```python
In [9]: ds1.time.values
Out[9]:
array(['2012-01-01T00:00:00.000000000', '2012-01-01T03:00:00.000000000'],
      dtype='datetime64[ns]')
```

Problem Description

Perhaps it would be useful to default the time dtype to "float64". Another option could be to use a finer time resolution by default than the one xarray infers from the dataset times (for instance, if the units would be inferred as "days since ...", use "seconds since ..." instead).
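The mechanism can be illustrated without xarray: encoding times as integers in units of days necessarily truncates sub-daily offsets. A numpy-only sketch of the round trip (illustrative only, not xarray's actual encoder):

```python
import numpy as np

times = np.array(["2012-01-01T00", "2012-01-01T03"], dtype="datetime64[ns]")

# int64 encoding with units of "days since 2012-01-01":
epoch = np.datetime64("2012-01-01")
days = ((times - epoch) // np.timedelta64(1, "D")).astype("int64")

# decoding yields whole days only: the 3-hour offset is gone
decoded = epoch + days * np.timedelta64(1, "D")

# finer units (e.g. seconds) would preserve the offset
seconds = ((times - epoch) // np.timedelta64(1, "s")).astype("int64")
decoded_s = epoch + seconds * np.timedelta64(1, "s")
```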


Versions

Output of `xr.show_versions()`:

```
In [10]: xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.5 (default, Nov 20 2019, 09:21:52) [GCC 9.2.1 20191008]
python-bits: 64
OS: Linux
OS-release: 5.3.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: en_NZ.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3
xarray: 0.15.0
pandas: 1.0.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.1.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.3
cfgrib: None
iris: None
bottleneck: None
dask: 2.14.0
distributed: 2.12.0
matplotlib: 3.2.0
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 45.3.0
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.13.0
sphinx: None
```
Reactions: none
#3486 Should performance be equivalent when opening with chunks or re-chunking a dataset? · rafa-guedes (7799184) · open · 2 comments · created 2019-11-05T14:14:58Z · updated 2021-08-31T15:28:04Z · CONTRIBUTOR · xarray (13221727) · issue · id 517799069

I was wondering whether the chunking behaviour should be equivalent in two different use cases:

  1. opening a dataset with the chunks option;
  2. re-chunking an existing dataset with the Dataset.chunk method.

I'm interested in performance when slicing across different dimensions. In my case the performance is quite different; please see the example below:

Open dataset with one single chunk along station dimension (fast for slicing one time)

```
In [1]: import xarray as xr

In [2]: dset = xr.open_dataset(
   ...:     "/source/wavespectra/tests/sample_files/spec20170101T00_spec.nc",
   ...:     chunks={"station": None},
   ...: )

In [3]: dset
Out[3]:
<xarray.Dataset>
Dimensions:    (direction: 24, frequency: 25, station: 14048, time: 249)
Coordinates:
  * time       (time) datetime64[ns] 2017-01-01 ... 2017-02-01
  * station    (station) float64 1.0 2.0 3.0 ... 1.405e+04 1.405e+04
  * frequency  (frequency) float32 0.04118 0.045298003 ... 0.40561208
  * direction  (direction) float32 90.0 75.0 60.0 45.0 ... 135.0 120.0 105.0
Data variables:
    longitude  (time, station) float32 dask.array<chunksize=(249, 14048), meta=np.ndarray>
    latitude   (time, station) float32 dask.array<chunksize=(249, 14048), meta=np.ndarray>
    efth       (time, station, frequency, direction) float32 dask.array<chunksize=(249, 14048, 25, 24), meta=np.ndarray>

In [4]: %time lats = dset.latitude.isel(time=0).values
CPU times: user 171 ms, sys: 49.2 ms, total: 220 ms
Wall time: 219 ms
```

Open dataset with many size=1 chunks along station dimension (fast for slicing one station, slow for slicing one time)

```
In [5]: dset = xr.open_dataset(
   ...:     "/source/wavespectra/tests/sample_files/spec20170101T00_spec.nc",
   ...:     chunks={"station": 1},
   ...: )

In [6]: dset
Out[6]:
<xarray.Dataset>
Dimensions:    (direction: 24, frequency: 25, station: 14048, time: 249)
Coordinates:
  * time       (time) datetime64[ns] 2017-01-01 ... 2017-02-01
  * station    (station) float64 1.0 2.0 3.0 ... 1.405e+04 1.405e+04
  * frequency  (frequency) float32 0.04118 0.045298003 ... 0.40561208
  * direction  (direction) float32 90.0 75.0 60.0 45.0 ... 135.0 120.0 105.0
Data variables:
    longitude  (time, station) float32 dask.array<chunksize=(249, 1), meta=np.ndarray>
    latitude   (time, station) float32 dask.array<chunksize=(249, 1), meta=np.ndarray>
    efth       (time, station, frequency, direction) float32 dask.array<chunksize=(249, 1, 25, 24), meta=np.ndarray>

In [7]: %time lats = dset.latitude.isel(time=0).values
CPU times: user 13.1 s, sys: 1.94 s, total: 15 s
Wall time: 11.1 s
```

Try rechunk station into one single chunk (still slow to slice one time)

```
In [8]: dset = dset.chunk({"station": None})

In [8]: dset
Out[8]:
<xarray.Dataset>
Dimensions:    (direction: 24, frequency: 25, station: 14048, time: 249)
Coordinates:
  * time       (time) datetime64[ns] 2017-01-01 ... 2017-02-01
  * station    (station) float64 1.0 2.0 3.0 ... 1.405e+04 1.405e+04
  * frequency  (frequency) float32 0.04118 0.045298003 ... 0.40561208
  * direction  (direction) float32 90.0 75.0 60.0 45.0 ... 135.0 120.0 105.0
Data variables:
    longitude  (time, station) float32 dask.array<chunksize=(249, 14048), meta=np.ndarray>
    latitude   (time, station) float32 dask.array<chunksize=(249, 14048), meta=np.ndarray>
    efth       (time, station, frequency, direction) float32 dask.array<chunksize=(249, 14048, 25, 24), meta=np.ndarray>

In [9]: %time lats = dset.latitude.isel(time=0).values
CPU times: user 9.06 s, sys: 1.13 s, total: 10.2 s
Wall time: 7.7 s
```
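One plausible explanation (an assumption, not verified against xarray's implementation) is that Dataset.chunk only layers a dask rechunk on top of the existing task graph, so the many tiny reads implied by the original chunking still happen. A dask-only sketch using the array shape from the example above:

```python
import dask.array as da

# analogue of opening with chunks={"station": 1}: 14048 single-station chunks
x = da.ones((249, 14048), chunks=(249, 1))
assert x.numblocks == (1, 14048)

# analogue of dset.chunk({"station": None}): merge back into one chunk
y = x.rechunk((249, 14048))
assert y.numblocks == (1, 1)

# the rechunked graph still contains all of the original per-station tasks,
# so computing y must still execute every tiny original chunk first
assert len(dict(y.__dask_graph__())) > x.numblocks[1]
```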

Reactions: none
#1379 xr.concat consuming too much resources · rafa-guedes (7799184) · open · 4 comments · created 2017-04-20T23:33:52Z · updated 2021-07-08T17:42:18Z · CONTRIBUTOR · xarray (13221727) · issue · id 223231729

Hi, I am reading several (~1000) small ascii files into Dataset objects and trying to concatenate them over one specific dimension, but I eventually blow my memory up. The file glob is not huge (~700M; my computer has ~16G), and I can do it fine if I only read the Datasets in, appending them to a list without concatenating them (my memory increases by only about 5% by the time I have read them all).

However, when trying to concatenate each file into one single Dataset upon reading over a loop, the processing speed drops drastically before I have read about 10% of the files, and memory usage keeps growing until it eventually blows up before I have read and concatenated 30% of them (a screenshot taken before it blew up showed memory usage under 20% at the start of the processing).

I was wondering if this is expected, or if there is something that could be improved to make this work more efficiently. I'm changing my approach now: extracting numpy arrays from the individual Datasets, concatenating those numpy arrays, and defining the final Dataset only at the end.

Thanks.
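The pattern described above is consistent with quadratic copying: concatenating inside the read loop re-copies everything read so far on every iteration, while collecting into a list and concatenating once copies each element only once. A numpy-only sketch of the difference (not xarray's internals; xr.concat may also add per-call alignment overheads):

```python
import numpy as np

arrays = [np.full(10, i, dtype=float) for i in range(1000)]

# concatenating inside the loop: iteration k copies ~10*k elements, O(n^2) total
out = arrays[0]
for a in arrays[1:]:
    out = np.concatenate([out, a])

# collecting first and concatenating once: each element copied once, O(n)
out2 = np.concatenate(arrays)

assert out.shape == out2.shape == (10000,)
```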

Reactions: 1 (+1: 1)
#733 coordinate variable not written in netcdf file in some cases · rafa-guedes (7799184) · open · 5 comments · created 2016-01-29T00:55:54Z · updated 2020-12-25T16:49:54Z · CONTRIBUTOR · xarray (13221727) · issue · id 129630652

I came across a situation where my coordinate variable was not dumped as a variable in the output netcdf file by dataset.to_netcdf. In my case I managed to fix it by simply adding variable attributes to this coordinate variable (which didn't have any).

This happened while creating a sliced dataset with the dataset.isel_points method, which automatically defines a new coordinate called points in the sliced dataset. If I dump that dataset as is, the coordinate isn't written as a variable in the netcdf; adding attributes to points, however, changes that. Here is an example:

```
In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('netcdf_file_with_longitude_and_latitude.nc')

In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:    (latitude: 576, longitude: 1152, time: 745)
Coordinates:
  * latitude   (latitude) float64 -89.76 -89.45 -89.14 -88.83 -88.52 -88.2 ...
  * longitude  (longitude) float64 0.0 0.3125 0.625 0.9375 1.25 1.562 1.875 ...
  * time       (time) datetime64[ns] 1979-01-01 1979-01-01T01:00:00 ...
Data variables:
    ugrd10m    (time, latitude, longitude) float64 0.2094 0.25 0.2799 0.3183 ...
    vgrd10m    (time, latitude, longitude) float64 -5.929 -5.918 -5.918 ...

In [4]: ds2 = ds.isel_points(longitude=[0], latitude=[0]).reset_coords()

In [5]: ds2
Out[5]:
<xarray.Dataset>
Dimensions:    (points: 1, time: 745)
Coordinates:
  * time       (time) datetime64[ns] 1979-01-01 1979-01-01T01:00:00 ...
  * points     (points) int64 0
Data variables:
    latitude   (points) float64 -89.76
    vgrd10m    (points, time) float64 -5.929 -6.078 -6.04 -5.958 -5.858 ...
    ugrd10m    (points, time) float64 0.2094 0.109 0.008546 -0.09828 -0.2585 ...
    longitude  (points) float64 0.0

In [6]: ds2['points'].attrs
Out[6]: OrderedDict()

In [7]: ds2.to_netcdf('/home/rafael/ncout1.nc')

In [8]: ds2['points'].attrs.update({'standard_name': 'site'})

In [9]: ds2['points'].attrs
Out[9]: OrderedDict([('standard_name', 'site')])

In [10]: ds2.to_netcdf('/home/rafael/ncout2.nc')
```

Here is the ncdump output for these two files:

```
$ ncdump -h /home/rafael/ncout1.nc
netcdf ncout1 {
dimensions:
        time = 745 ;
        points = 1 ;
variables:
        double time(time) ;
                time:_FillValue = 9.999e+20 ;
                string time:long_name = "verification time generated by wgrib2 function verftime()" ;
                time:reference_time = 283996800. ;
                time:reference_time_type = 0 ;
                string time:reference_date = "1979.01.01 00:00:00 UTC" ;
                string time:reference_time_description = "kind of product unclear, reference date is variable, min found reference date is given" ;
                string time:time_step_setting = "auto" ;
                time:time_step = 3600. ;
                string time:units = "seconds since 1970-01-01" ;
                time:calendar = "proleptic_gregorian" ;
        double latitude(points) ;
                string latitude:units = "degrees_north" ;
                string latitude:long_name = "latitude" ;
        double vgrd10m(points, time) ;
                string vgrd10m:short_name = "vgrd10m" ;
                string vgrd10m:long_name = "V-Component of Wind" ;
                string vgrd10m:level = "10 m above ground" ;
                string vgrd10m:units = "m/s" ;
        double ugrd10m(points, time) ;
                string ugrd10m:short_name = "ugrd10m" ;
                string ugrd10m:long_name = "U-Component of Wind" ;
                string ugrd10m:level = "10 m above ground" ;
                string ugrd10m:units = "m/s" ;
        double longitude(points) ;
                string longitude:units = "degrees_east" ;
                string longitude:long_name = "longitude" ;
}
```

```
$ ncdump -h /home/rafael/ncout2.nc
netcdf ncout2 {
dimensions:
        time = 745 ;
        points = 1 ;
variables:
        double time(time) ;
                time:_FillValue = 9.999e+20 ;
                string time:long_name = "verification time generated by wgrib2 function verftime()" ;
                time:reference_time = 283996800. ;
                time:reference_time_type = 0 ;
                string time:reference_date = "1979.01.01 00:00:00 UTC" ;
                string time:reference_time_description = "kind of product unclear, reference date is variable, min found reference date is given" ;
                string time:time_step_setting = "auto" ;
                time:time_step = 3600. ;
                string time:units = "seconds since 1970-01-01" ;
                time:calendar = "proleptic_gregorian" ;
        double latitude(points) ;
                string latitude:units = "degrees_north" ;
                string latitude:long_name = "latitude" ;
        double vgrd10m(points, time) ;
                string vgrd10m:short_name = "vgrd10m" ;
                string vgrd10m:long_name = "V-Component of Wind" ;
                string vgrd10m:level = "10 m above ground" ;
                string vgrd10m:units = "m/s" ;
        double ugrd10m(points, time) ;
                string ugrd10m:short_name = "ugrd10m" ;
                string ugrd10m:long_name = "U-Component of Wind" ;
                string ugrd10m:level = "10 m above ground" ;
                string ugrd10m:units = "m/s" ;
        double longitude(points) ;
                string longitude:units = "degrees_east" ;
                string longitude:long_name = "longitude" ;
        int64 points(points) ;
                points:standard_name = "site" ;
}
```

Reactions: none

Table schema:
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);