issue_comments


9 rows where user = 15570875, sorted by updated_at descending

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at, author_association, body, reactions, performed_via_github_app, issue

id: 1497455189
html_url: https://github.com/pydata/xarray/issues/6323#issuecomment-1497455189
issue_url: https://api.github.com/repos/pydata/xarray/issues/6323
node_id: IC_kwDOAMm_X85ZQVpV
user: klindsay28 (15570875)
created_at: 2023-04-05T13:06:12Z
updated_at: 2023-04-05T13:06:12Z
author_association: NONE

In a future where encoding has been removed from Xarray's data model entirely, would open_dataset_with_encoding, or whatever name gets settled on, still exist? It's not clear to me if removal from the data model means just removing it from Xarray's data structures, or if it also means removing it from Xarray's APIs.

reactions:
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: propagation of `encoding` (1158378382)

id: 1496845498
html_url: https://github.com/pydata/xarray/issues/6323#issuecomment-1496845498
issue_url: https://api.github.com/repos/pydata/xarray/issues/6323
node_id: IC_kwDOAMm_X85ZOAy6
user: klindsay28 (15570875)
created_at: 2023-04-05T02:46:02Z
updated_at: 2023-04-05T02:46:02Z
author_association: NONE

In the hypothetical invocation open_dataset(..., return_encoding=True), do you envision the returned encoding as being a separate returned object, or would it still be an attribute on the Dataset object? I'm guessing the latter, because the subsequent statement 'disable all encoding propagation by discarding encoding attributes once a Dataset has been modified' doesn't make much sense to me for the former. If so, after encoding attributes are discarded, would there still be an encoding attribute on the Dataset object that the user could reset to the values prior to the Dataset modification? This would enable the user to propagate encoding values through their workflow.
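
A minimal sketch of how encoding can be stashed and reapplied around a workflow step under today's behavior, where each variable carries a `.encoding` dict; the filenames are hypothetical:

```
import xarray as xr

ds = xr.open_dataset('input.nc')  # hypothetical input file

# stash each variable's encoding before operations that may discard it
saved = {name: dict(var.encoding) for name, var in ds.variables.items()}

ds = ds * 2.0  # arithmetic returns new objects; encoding does not reliably survive

# reapply the stashed encoding before writing
for name, enc in saved.items():
    if name in ds.variables:
        ds[name].encoding = enc

ds.to_netcdf('output.nc')
```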

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: propagation of `encoding` (1158378382)

id: 691146816
html_url: https://github.com/pydata/xarray/pull/4409#issuecomment-691146816
issue_url: https://api.github.com/repos/pydata/xarray/issues/4409
node_id: MDEyOklzc3VlQ29tbWVudDY5MTE0NjgxNg==
user: klindsay28 (15570875)
created_at: 2020-09-11T15:00:01Z
updated_at: 2020-09-11T15:00:01Z
author_association: NONE

I disagree that this is deterministic. If I run the script multiple times, the plot title varies, and I consider the plot title part of the output.

I have jupyter notebooks that create figures and use this code idiom. If I refactor code of mine that is used by these notebooks, I would like to rerun the notebooks to confirm that the notebook results don't change. Having the plot titles change at random complicates this comparison.

I think sorting the coordinates would avoid this difficulty that I encounter.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Keep the original ordering of the coordinates (694448177)

id: 691120654
html_url: https://github.com/pydata/xarray/pull/4409#issuecomment-691120654
issue_url: https://api.github.com/repos/pydata/xarray/issues/4409
node_id: MDEyOklzc3VlQ29tbWVudDY5MTEyMDY1NA==
user: klindsay28 (15570875)
created_at: 2020-09-11T14:15:42Z
updated_at: 2020-09-11T14:15:42Z
author_association: NONE

Here's another example that yields non-deterministic coordinate order, which propagates into a plot title when selection is done on the coordinates. When I run the code below, the title is sometimes `x = 0.0, y = 0.0` and sometimes `y = 0.0, x = 0.0`.

This is in a new conda environment that I created with the command `conda create -n title_order python=3.7 matplotlib xarray`. Output from `xr.show_versions()` is below.

I think the non-determinism is coming from the statement `ds_subset = ds[['var']]`.

```
import numpy as np
import xarray as xr

xlen = 3
ylen = 4
zlen = 5

x = xr.DataArray(np.linspace(0.0, 1.0, xlen), dims=('x'))
y = xr.DataArray(np.linspace(0.0, 1.0, ylen), dims=('y'))
z = xr.DataArray(np.linspace(0.0, 1.0, zlen), dims=('z'))

vals = np.arange(xlen*ylen*zlen, dtype='float64').reshape((xlen, ylen, zlen))
da = xr.DataArray(vals, dims=('x', 'y', 'z'), coords={'x': x, 'y': y, 'z': z})

ds = xr.Dataset({'var': da})
print('coords for var in original Dataset')
print(ds['var'].coords)

print('****')

ds_subset = ds[['var']]
print('coords for var after subsetting')
print(ds_subset['var'].coords)

print('****')

p = ds_subset['var'].isel(x=0, y=0).plot()
print('title for plot() with dim selection')
print(p[0].axes.get_title())
```

Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1127.13.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.16.0
pandas: 1.1.2
numpy: 1.19.1
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.3.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20200814
pip: 20.2.3
conda: None
pytest: None
IPython: None
sphinx: None
```

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Keep the original ordering of the coordinates (694448177)

id: 563480104
html_url: https://github.com/pydata/xarray/issues/3606#issuecomment-563480104
issue_url: https://api.github.com/repos/pydata/xarray/issues/3606
node_id: MDEyOklzc3VlQ29tbWVudDU2MzQ4MDEwNA==
user: klindsay28 (15570875)
created_at: 2019-12-09T23:02:24Z
updated_at: 2019-12-09T23:02:24Z
author_association: NONE

How about

If deep=True, a deep copy is made of the data array. Otherwise, a shallow copy is made, and the returned data array's values are a new view of this data array's values.
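
For context, a minimal sketch of the behavior the proposed wording describes, assuming a numpy-backed array (the names are illustrative):

```
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros(3), dims='x')

shallow = da.copy(deep=False)  # values are a new view of da's values
deep = da.copy(deep=True)      # values are an independent copy

da[0] = 1.0
print(shallow[0].item())  # 1.0: the mutation is visible through the view
print(deep[0].item())     # 0.0: the deep copy is unaffected
```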

reactions:
{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: confused by reference to dataset in docs for xarray.DataArray.copy (535043825)

id: 489752548
html_url: https://github.com/pydata/xarray/issues/2921#issuecomment-489752548
issue_url: https://api.github.com/repos/pydata/xarray/issues/2921
node_id: MDEyOklzc3VlQ29tbWVudDQ4OTc1MjU0OA==
user: klindsay28 (15570875)
created_at: 2019-05-06T19:52:47Z
updated_at: 2019-05-06T19:52:47Z
author_association: NONE

It looks like `ds.time.encoding` is not getting set when `open_mfdataset` opens multiple files. I suspect that this is what leads to the surprising units for time when the Dataset is written out. The code below demonstrates that `ds.time.encoding` is set by `open_mfdataset` in the single-file case and is not set in the multi-file case. However, `ds.time_bounds.encoding` is set in both the single- and multi-file cases.

The possibility of this is alluded to in a comment in #2436, which relates the issue to #1614.

```
import numpy as np
import xarray as xr

# create time and time_bounds DataArrays for Jan-1850 and Feb-1850

time_bounds_vals = np.array([[0.0, 31.0], [31.0, 59.0]])
time_vals = time_bounds_vals.mean(axis=1)

time_var = xr.DataArray(time_vals, dims='time', coords={'time': time_vals})
time_bounds_var = xr.DataArray(time_bounds_vals, dims=('time', 'd2'), coords={'time': time_vals})

# create Dataset of time and time_bounds

ds = xr.Dataset(coords={'time': time_var}, data_vars={'time_bounds': time_bounds_var})
ds.time.attrs = {'bounds': 'time_bounds', 'calendar': 'noleap', 'units': 'days since 1850-01-01'}

# write Jan-1850 values to file

ds.isel(time=slice(0, 1)).to_netcdf('Jan-1850.nc', unlimited_dims='time')

# write Feb-1850 values to file

ds.isel(time=slice(1, 2)).to_netcdf('Feb-1850.nc', unlimited_dims='time')

# use open_mfdataset to read in files, combining into 1 Dataset

decode_times = True
decode_cf = True
ds = xr.open_mfdataset(['Jan-1850.nc'], decode_cf=decode_cf, decode_times=decode_times)

print('time and time_bounds encoding, single-file open_mfdataset')
print(ds.time.encoding)
print(ds.time_bounds.encoding)

# use open_mfdataset to read in files, combining into 1 Dataset

decode_times = True
decode_cf = True
ds = xr.open_mfdataset(['Jan-1850.nc', 'Feb-1850.nc'], decode_cf=decode_cf, decode_times=decode_times)

print('--------------------')
print('time and time_bounds encoding, multi-file open_mfdataset')
print(ds.time.encoding)
print(ds.time_bounds.encoding)
```

produces

```
time and time_bounds encoding, single-file open_mfdataset
{'zlib': False, 'shuffle': False, 'complevel': 0, 'fletcher32': False, 'contiguous': False, 'chunksizes': (512,), 'source': '/gpfs/fs1/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/Jan-1850.nc', 'original_shape': (1,), 'dtype': dtype('float64'), '_FillValue': nan, 'units': 'days since 1850-01-01', 'calendar': 'noleap'}
{'zlib': False, 'shuffle': False, 'complevel': 0, 'fletcher32': False, 'contiguous': False, 'chunksizes': (1, 2), 'source': '/gpfs/fs1/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/Jan-1850.nc', 'original_shape': (1, 2), 'dtype': dtype('float64'), '_FillValue': nan, 'units': 'days since 1850-01-01', 'calendar': 'noleap'}
--------------------
time and time_bounds encoding, multi-file open_mfdataset
{}
{'zlib': False, 'shuffle': False, 'complevel': 0, 'fletcher32': False, 'contiguous': False, 'chunksizes': (1, 2), 'source': '/gpfs/fs1/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/Jan-1850.nc', 'original_shape': (1, 2), 'dtype': dtype('float64'), '_FillValue': nan, 'units': 'days since 1850-01-01', 'calendar': 'noleap'}
```

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: to_netcdf with decoded time can create file with inconsistent time:units and time_bounds:units (437418525)

id: 474906014
html_url: https://github.com/pydata/xarray/issues/521#issuecomment-474906014
issue_url: https://api.github.com/repos/pydata/xarray/issues/521
node_id: MDEyOklzc3VlQ29tbWVudDQ3NDkwNjAxNA==
user: klindsay28 (15570875)
created_at: 2019-03-20T16:09:54Z
updated_at: 2019-03-20T16:09:54Z
author_association: NONE

@AJueling, do you know the provenance of the file with `time.attrs['bounds'] != 'time_bound'`? If that file is produced by an NCAR- or CESM-supplied workflow, then I am willing to see whether the workflow can be corrected to keep `time.attrs['bounds'] = 'time_bound'`. With this mismatch, it seems hopeless for xarray to figure out automatically how this file was intended to be handled.
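
A minimal sketch of the manual repair implied here: open the file without CF decoding, correct the bounds attribute, then decode (the filename is hypothetical):

```
import xarray as xr

# hypothetical file whose time variable names a bounds variable that
# does not match the actual bounds variable, 'time_bound'
ds = xr.open_dataset('pop_output.nc', decode_cf=False)

# repair the metadata mismatch, then apply CF decoding
ds['time'].attrs['bounds'] = 'time_bound'
ds = xr.decode_cf(ds)
```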

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: time decoding error with "days since" (99836561)

id: 474897336
html_url: https://github.com/pydata/xarray/issues/521#issuecomment-474897336
issue_url: https://api.github.com/repos/pydata/xarray/issues/521
node_id: MDEyOklzc3VlQ29tbWVudDQ3NDg5NzMzNg==
user: klindsay28 (15570875)
created_at: 2019-03-20T15:52:45Z
updated_at: 2019-03-20T15:52:45Z
author_association: NONE

@rabernat, it is not clear to me that issue 2 is an objective error in the metadata.

The CF conventions section on the bounds attribute states:

Since a boundary variable is considered to be part of a coordinate variable’s metadata, it is not necessary to provide it with attributes such as long_name and units.

Boundary variable attributes which determine the coordinate type (units, standard_name, axis and positive) or those which affect the interpretation of the array values (units, calendar, leap_month, leap_year and month_lengths) must always agree exactly with the same attributes of its associated coordinate, scalar coordinate or auxiliary coordinate variable. To avoid duplication, however, it is recommended that these are not provided to a boundary variable.

I conclude from this that software parsing CF metadata should have the variable identified by the bounds attribute inherit the attributes mentioned above from the coordinate variable that carries the bounds attribute. @spencerkclark describes this as a workaround. One could argue that, based on the CF conventions text, xarray would be justified in doing that automatically (see the sketch after this comment).

However, this is confounded by issue 3, that `time.attrs['bounds'] != 'time_bound'`, which I agree is an error in the metadata. As a CESM-POP developer, I'm surprised to see that. Raw model output from CESM-POP has `time.attrs['bounds'] = 'time_bound'`. So it seems like something in a post-processing workflow has the net effect of changing `time.attrs['bounds']` while preserving the name of the bounds variable itself. That is problematic.

If CESM-POP were to adhere more closely to the CF recommendation in this section, I think it would drop `time_bound.attrs['units']`, not add `time_bound.attrs['calendar']`. But I don't think that is what you are suggesting.
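
A minimal sketch of the attribute-inheritance workaround discussed above, assuming the file is opened with decoding disabled (the filename is hypothetical):

```
import xarray as xr

ds = xr.open_dataset('pop_output.nc', decode_cf=False)  # hypothetical file

# let the bounds variable inherit the decoding-relevant attributes from
# its parent coordinate, as the quoted CF text implies a parser should
for attr in ('units', 'calendar'):
    if attr in ds['time'].attrs and attr not in ds['time_bound'].attrs:
        ds['time_bound'].attrs[attr] = ds['time'].attrs[attr]

ds = xr.decode_cf(ds)
```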

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: time decoding error with "days since" (99836561)

id: 461519515
html_url: https://github.com/pydata/xarray/issues/2752#issuecomment-461519515
issue_url: https://api.github.com/repos/pydata/xarray/issues/2752
node_id: MDEyOklzc3VlQ29tbWVudDQ2MTUxOTUxNQ==
user: klindsay28 (15570875)
created_at: 2019-02-07T17:22:49Z
updated_at: 2019-02-07T17:22:49Z
author_association: NONE

Thanks for the quick responses @jhamman and @dcherian. I had focused on the descriptive text below the function signature, which mentions defaults for some, but not all, arguments. I now realize that I also need to examine the function signature in the docs. Sorry for the noise.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: document defaults for optional arguments to open_dataset, open_mfdataset (407750967)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
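
As an illustration, the row selection shown on this page corresponds to a query like the following, a sketch assuming a local SQLite copy of the database (the `github.db` filename is hypothetical):

```
import sqlite3

conn = sqlite3.connect('github.db')  # hypothetical local copy of the database
rows = conn.execute(
    """
    SELECT id, issue, updated_at
    FROM issue_comments
    WHERE user = 15570875
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # 9 rows for this user
```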