issues

6 rows where user = 15570875 sorted by updated_at descending

Facets:
  • state: closed 5, open 1
  • type: issue 6
  • repo: xarray 6
Issue #4401: problem with time axis values in line plot
id 690624634 · opened by klindsay28 (15570875) · state: closed · comments: 4
created 2020-09-02T01:15:14Z · updated 2021-10-23T16:27:32Z · closed 2021-08-10T22:45:20Z · author_association: NONE

When I run the following code inside a Jupyter notebook, the values on the x axis (time) of the generated plot (not reproduced in this export) appear to run from 0 to ~4. I expect them to run from 1 to 4, like the time values do. I can't tell whether this is a problem with what xarray passes to nc_time_axis, or a problem with nc_time_axis itself. Could this be looked into, please?

```python
import cftime
import xarray as xr

time_vals = [cftime.DatetimeNoLeap(1 + year, 1 + month, 15)
             for year in range(3) for month in range(12)]

x_vals = [time_val.year + time_val.dayofyr / 365.0 for time_val in time_vals]

x_da = xr.DataArray(x_vals, coords=[time_vals], dims=["time"])

x_da.plot.line("-o");
```
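As a quick sanity check (a minimal sketch added for clarity, continuing from the snippet above, not part of the original report), the coordinate itself spans year 1 through year 3:

```python
# the first and last coordinate values should print as year 1 and year 3
# (exact string formatting may vary across cftime versions)
print(time_vals[0])   # 0001-01-15 00:00:00
print(time_vals[-1])  # 0003-12-15 00:00:00
```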

Environment:

Output of xr.show_versions():

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1127.13.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.0
pandas: 1.1.1
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.14.0
distributed: 2.14.0
matplotlib: 3.3.1
cartopy: 0.18.0
seaborn: 0.10.1
numbagg: None
pint: 0.15
setuptools: 49.6.0.post20200814
pip: 20.2.2
conda: None
pytest: 6.0.1
IPython: 7.17.0
sphinx: None
```
state_reason: completed · repo: xarray (13221727) · type: issue
Issue #3606: confused by reference to dataset in docs for xarray.DataArray.copy
id 535043825 · opened by klindsay28 (15570875) · state: closed · comments: 2
created 2019-12-09T16:30:23Z · updated 2020-07-24T19:20:45Z · closed 2020-07-24T19:20:45Z · author_association: NONE

The documentation for xarray.DataArray.copy says:

If deep=True, a deep copy is made of the data array. Otherwise, a shallow copy is made, so each variable in the new array’s dataset is also a variable in this array’s dataset.

I do not understand what dataset is being referred to here. In particular, there are no xarray datasets in the examples provided in this documentation. Could someone provide clarification?
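To make the two copy modes concrete, here is a minimal sketch (my illustration of the documented behavior, not code from the issue):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4.0), dims="x", name="foo")

shallow = da.copy(deep=False)  # shares the underlying data with da
deep = da.copy(deep=True)      # gets its own copy of the data

shallow[0] = 99.0
print(da[0].item())    # 99.0 -- the shallow copy shares data with da
print(deep[0].item())  # 0.0  -- the deep copy is unaffected
```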

state_reason: completed · repo: xarray (13221727) · type: issue
Issue #3718: losing shallowness of ds.copy() on ds from xr.open_dataset
id 554376164 · opened by klindsay28 (15570875) · state: open · comments: 1
created 2020-01-23T20:02:55Z · updated 2020-01-23T21:39:05Z · author_association: NONE

MCVE Code Sample

```python
import numpy as np
import xarray as xr

xlen = 4
x = xr.DataArray(np.linspace(0.0, 1.0, xlen), dims=('x'))

varname = 'foo'
xr.Dataset({varname: xr.DataArray(np.arange(xlen, dtype='float64'),
                                  dims=('x'), coords={'x': x})}).to_netcdf('ds.nc')

with xr.open_dataset('ds.nc') as ds:
    ds2 = ds.copy()
    ds2[varname][0] = 11.0
    print(f'ds.equals = {ds.equals(ds2)}')

with xr.open_dataset('ds.nc') as ds:
    ds2 = ds.copy()
    print(f'ds.equals = {ds.equals(ds2)}')
    ds2[varname][0] = 11.0
    print(f'ds.equals = {ds.equals(ds2)}')
```

Expected Output

I expect the code to write out:

```
ds.equals = True
ds.equals = True
ds.equals = True
```

However, when I run it, the last line is instead `ds.equals = False`.

Problem Description

The code above writes a small xr.Dataset to a netCDF file. There are 2 context managers opening the netCDF file as ds. Both context manager blocks start by setting ds2 to a shallow copy of ds.

In the first context manager block, a value in ds2 is modified, and ds2 is compared to ds. The Datasets are still equal, confirming that the copy is shallow.

The second context manager block is the same as the first, except that ds2 is compared to ds prior to changing the value in ds2. When this is done, the Datasets are no longer equal after the modification, indicating that ds2 is no longer a shallow copy of ds.

I don't understand how evaluating ds.equals(ds2), prior to changing a value in ds2, could decouple ds2 from ds.

I only observe this behavior when ds is set via xr.open_dataset. I don't see it when I create ds directly using xr.Dataset.

I'm rather perplexed by this.
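One way to probe this (a hypothetical diagnostic sketch, added here; it relies on the private Variable._data attribute, so behavior may differ across xarray versions) is to check whether ds and ds2 still reference the same underlying data object before and after calling equals():

```python
import xarray as xr

varname = 'foo'  # as in the MCVE above

with xr.open_dataset('ds.nc') as ds:
    ds2 = ds.copy()
    # before equals(): a shallow copy should share the lazily loaded data object
    print(ds[varname].variable._data is ds2[varname].variable._data)
    ds.equals(ds2)  # the comparison may force the lazy data to load
    # after equals(): if loading replaced _data in only one of the objects,
    # the two Datasets have silently decoupled
    print(ds[varname].variable._data is ds2[varname].variable._data)
```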

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.21.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.14.1
pandas: 0.25.3
numpy: 1.17.5
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.1
dask: 2.9.2
distributed: 2.9.3
matplotlib: 3.1.2
cartopy: None
seaborn: None
numbagg: None
setuptools: 45.1.0.post20200119
pip: 19.3.1
conda: None
pytest: 5.3.4
IPython: 7.11.1
sphinx: None
```
repo: xarray (13221727) · type: issue
Issue #2921: to_netcdf with decoded time can create file with inconsistent time:units and time_bounds:units
id 437418525 · opened by klindsay28 (15570875) · state: closed · comments: 4
created 2019-04-25T22:08:52Z · updated 2019-06-25T00:24:42Z · closed 2019-06-25T00:24:42Z · author_association: NONE

Code Sample (copy-pastable)

```python
import numpy as np
import xarray as xr

# create time and time_bounds DataArrays for Jan-1850 and Feb-1850
time_bounds_vals = np.array([[0.0, 31.0], [31.0, 59.0]])
time_vals = time_bounds_vals.mean(axis=1)

time_var = xr.DataArray(time_vals, dims='time', coords={'time': time_vals})
time_bounds_var = xr.DataArray(time_bounds_vals, dims=('time', 'd2'),
                               coords={'time': time_vals})

# create Dataset of time and time_bounds
ds = xr.Dataset(coords={'time': time_var}, data_vars={'time_bounds': time_bounds_var})
ds.time.attrs = {'bounds': 'time_bounds', 'calendar': 'noleap',
                 'units': 'days since 1850-01-01'}

# write Jan-1850 values to file
ds.isel(time=slice(0, 1)).to_netcdf('Jan-1850.nc', unlimited_dims='time')

# write Feb-1850 values to file
ds.isel(time=slice(1, 2)).to_netcdf('Feb-1850.nc', unlimited_dims='time')

# use open_mfdataset to read in files, combining into 1 Dataset
decode_times = True
decode_cf = True
ds = xr.open_mfdataset(['Jan-1850.nc', 'Feb-1850.nc'],
                       decode_cf=decode_cf, decode_times=decode_times)

# write combined Dataset out
ds.to_netcdf('JanFeb-1850.nc', unlimited_dims='time')
```

Problem description

The above code initially creates 2 netCDF files, for Jan-1850 and Feb-1850, that have the variables time and time_bounds, and time:bounds='time_bounds'. It then reads the 2 files back in as a single Dataset, using open_mfdataset, and this Dataset is written back out to a netCDF file. ncdump of this final file is:

```
netcdf JanFeb-1850 {
dimensions:
        time = UNLIMITED ; // (2 currently)
        d2 = 2 ;
variables:
        int64 time(time) ;
                time:bounds = "time_bounds" ;
                time:units = "hours since 1850-01-16 12:00:00.000000" ;
                time:calendar = "noleap" ;
        double time_bounds(time, d2) ;
                time_bounds:_FillValue = NaN ;
                time_bounds:units = "days since 1850-01-01" ;
                time_bounds:calendar = "noleap" ;
data:

 time = 0, 708 ;

 time_bounds =
  0, 31,
  31, 59 ;
}
```

The problem is that the units attributes for time and time_bounds are different in this file, contrary to what CF conventions require.

The final call to to_netcdf is creating a file where time's units (and type) differ from what they are in the intermediate files. These transformations are not being applied to time_bounds.

While the change to time's type is not necessarily an issue, I do find it surprising.

This inconsistency goes away if either of decode_times or decode_cf is set to False in the python code above. In particular, the transformations to time's units and type do not happen.

The inconsistency also goes away if open_mfdataset opens a single file. In this scenario also, the transformations to time's units and type do not happen.

I think that the desired behavior is to either not apply the units and type transformations to time, or to also apply them to time_bounds. The first option would be consistent with the current single-file behavior.
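Until then, one possible workaround (a hedged sketch, not verified against this exact case) is to copy time's encoding onto time_bounds before the final write, so both variables are written with the same units, calendar, and dtype:

```python
import xarray as xr

# reopen the two monthly files; CF/time decoding is on by default
ds = xr.open_mfdataset(['Jan-1850.nc', 'Feb-1850.nc'])

# assumption: giving time_bounds the same encoding as time keeps the two
# variables consistent when the combined Dataset is written back out
ds['time_bounds'].encoding = dict(ds['time'].encoding)

ds.to_netcdf('JanFeb-1850.nc', unlimited_dims='time')
```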

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.12.62-60.64.8-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 1.1.5
distributed: 1.26.1
matplotlib: 3.0.3
cartopy: None
seaborn: None
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: 4.3.1
IPython: 7.4.0
sphinx: None
```
state_reason: completed · repo: xarray (13221727) · type: issue
Issue #2902: DataArray sum().values depends on chunk size
id 433916353 · opened by klindsay28 (15570875) · state: closed · comments: 1
created 2019-04-16T18:09:33Z · updated 2019-04-17T02:01:55Z · closed 2019-04-17T02:01:55Z · author_association: NONE

Hi,

The code below creates a Dataset with an NxNxN DataArray that is equal to a constant val. For various re-chunked copies of the Dataset, the code computes the sum of the array, and compares it to the exact value N*N*N*val. I find that the printed values are different, at the round-off level, for different chunk sizes.

While I'm not surprised at these round-off differences, I could not find mention of such behavior in the xarray documentation.

Is this feature known to xarray developers? Do xarray developers consider it a feature or a bug?

Either way, I think it would be useful if the xarray documentation mentioned that the results of some operations depend on chunk size. The mechanism is presumably that floating-point addition is not associative, so summing chunk-by-chunk changes the order of operations; a minimal illustration in plain NumPy follows (my sketch, assuming this is indeed the mechanism).
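```python
import numpy as np

# summing the same values in a different order can change the result at
# the round-off level, because float addition is not associative
vals = np.full(10**6, 1.9)

total = vals.sum()                                 # one global sum
chunked = sum(vals[i::4].sum() for i in range(4))  # sum of 4 partial sums

# often a small nonzero round-off difference (it can be exactly zero
# for some inputs and array sizes)
print(total - chunked)
```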

code:

```python
import numpy as np
import xarray as xr

N = 128

val = 1.9
val_array = np.full((N, N, N), val)
exact_sum = N * N * N * val

ds = xr.DataArray(val_array, name='val_array', dims=['x', 'y', 'z']).to_dataset()

rel_diff = (ds['val_array'].sum().values - exact_sum) / exact_sum
print('no chunking, rel_diff = %e' % rel_diff)

for chunk_x in [N//16, N//4, N]:
    for chunk_y in [N//16, N//4, N]:
        for chunk_z in [N//16, N//4, N]:
            ds2 = ds.chunk({'x': chunk_x, 'y': chunk_y, 'z': chunk_z})
            rel_diff = (ds2['val_array'].sum().values - exact_sum) / exact_sum
            print('chunk_x = %3d, chunk_y = %3d, chunk_z = %3d, rel_diff = %e'
                  % (chunk_x, chunk_y, chunk_z, rel_diff))
```

results:

```
no chunking, rel_diff = -4.557758e-15
chunk_x =   8, chunk_y =   8, chunk_z =   8, rel_diff = -2.337312e-16
chunk_x =   8, chunk_y =   8, chunk_z =  32, rel_diff = -2.337312e-16
chunk_x =   8, chunk_y =   8, chunk_z = 128, rel_diff = -2.337312e-16
chunk_x =   8, chunk_y =  32, chunk_z =   8, rel_diff = -2.337312e-16
chunk_x =   8, chunk_y =  32, chunk_z =  32, rel_diff = -2.337312e-16
chunk_x =   8, chunk_y =  32, chunk_z = 128, rel_diff = -2.337312e-16
chunk_x =   8, chunk_y = 128, chunk_z =   8, rel_diff = -2.337312e-16
chunk_x =   8, chunk_y = 128, chunk_z =  32, rel_diff = -2.337312e-16
chunk_x =   8, chunk_y = 128, chunk_z = 128, rel_diff = -5.843279e-16
chunk_x =  32, chunk_y =   8, chunk_z =   8, rel_diff = -2.337312e-16
chunk_x =  32, chunk_y =   8, chunk_z =  32, rel_diff = -2.337312e-16
chunk_x =  32, chunk_y =   8, chunk_z = 128, rel_diff = -2.337312e-16
chunk_x =  32, chunk_y =  32, chunk_z =   8, rel_diff = -2.337312e-16
chunk_x =  32, chunk_y =  32, chunk_z =  32, rel_diff = -2.337312e-16
chunk_x =  32, chunk_y =  32, chunk_z = 128, rel_diff = -5.843279e-16
chunk_x =  32, chunk_y = 128, chunk_z =   8, rel_diff = -2.337312e-16
chunk_x =  32, chunk_y = 128, chunk_z =  32, rel_diff = -5.843279e-16
chunk_x =  32, chunk_y = 128, chunk_z = 128, rel_diff = 1.168656e-15
chunk_x = 128, chunk_y =   8, chunk_z =   8, rel_diff = -2.337312e-16
chunk_x = 128, chunk_y =   8, chunk_z =  32, rel_diff = -2.337312e-16
chunk_x = 128, chunk_y =   8, chunk_z = 128, rel_diff = -5.843279e-16
chunk_x = 128, chunk_y =  32, chunk_z =   8, rel_diff = -2.337312e-16
chunk_x = 128, chunk_y =  32, chunk_z =  32, rel_diff = -5.843279e-16
chunk_x = 128, chunk_y =  32, chunk_z = 128, rel_diff = 1.168656e-15
chunk_x = 128, chunk_y = 128, chunk_z =   8, rel_diff = -5.843279e-16
chunk_x = 128, chunk_y = 128, chunk_z =  32, rel_diff = 1.168656e-15
chunk_x = 128, chunk_y = 128, chunk_z = 128, rel_diff = -4.557758e-15
```

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.21.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 1.1.5
distributed: 1.26.1
matplotlib: 3.0.3
cartopy: None
seaborn: None
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: 4.3.1
IPython: 7.4.0
sphinx: None
```
state_reason: completed · repo: xarray (13221727) · type: issue
Issue #2752: document defaults for optional arguments to open_dataset, open_mfdataset
id 407750967 · opened by klindsay28 (15570875) · state: closed · comments: 5
created 2019-02-07T15:19:05Z · updated 2019-02-07T18:28:57Z · closed 2019-02-07T17:22:49Z · author_association: NONE

It would be useful if the docs for open_dataset and open_mfdataset listed the default values of optional arguments (where there is a default).

For example, the docs for open_dataset do not list the defaults for decode_times and decode_coords, and the docs for open_mfdataset do not list the defaults for data_vars and coords.
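In the meantime, the defaults can be read directly off the function signature (a quick sketch; the exact defaults vary by xarray version):

```python
import inspect
import xarray as xr

# print each parameter of open_dataset together with its default value
for name, param in inspect.signature(xr.open_dataset).parameters.items():
    print(name, '=', param.default)
```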

state_reason: completed · repo: xarray (13221727) · type: issue

Table schema:

```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```