
issues


2 rows where user = 6079398 sorted by updated_at descending


type: issue (1), pull (1)
state: closed (2)
repo: xarray (2)
Row 1 (pull request)
  id: 161435547
  node_id: MDExOlB1bGxSZXF1ZXN0NzQ2MTA1NjY=
  number: 889
  title: Add concat_dimensions kwarg to decode_cf
  user: mogismog (6079398)
  state: closed
  locked: 0
  comments: 3
  created_at: 2016-06-21T13:23:45Z
  updated_at: 2023-09-14T02:54:30Z
  closed_at: 2023-09-14T02:54:05Z
  author_association: NONE
  draft: 0
  pull_request: pydata/xarray/pulls/889

Addressing #862 (and maybe others), this adds a new keyword argument concat_dimensions to decode_cf. This allows the user to explicitly specify which dimensions to concatenate over for string arrays. The dimensions that are concatenated over are left in the resulting Dataset. Also added tests!
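The storage layout the PR deals with can be illustrated outside xarray: CF-convention files often store N strings of length maxlen as an N × maxlen array of single bytes (dtype |S1), and "concatenating" means joining the characters along the trailing dimension. A minimal NumPy sketch (the station-id values here are invented, and this mirrors the idea rather than xarray's actual implementation):

```python
import numpy as np

# CF-style storage: each station id is a row of single bytes (|S1).
chars = np.frombuffer(b"KDEN00KBOS01", dtype="S1").reshape(2, 6)

# "Concatenating" the character dimension: reinterpret each row's six
# |S1 bytes as one |S6 string, then drop the now-length-1 trailing axis.
ids = chars.view("S6").ravel()

print(ids)  # array([b'KDEN00', b'KBOS01'], dtype='|S6')
```

The `view` trick works because the characters of each string are contiguous in memory; an equivalent but slower form is `np.array([b"".join(row) for row in chars])`.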

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/889/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  repo: xarray (13221727)
  type: pull
Row 2 (issue)
  id: 157545837
  node_id: MDU6SXNzdWUxNTc1NDU4Mzc=
  number: 862
  title: decode_cf not concatenating string arrays
  user: mogismog (6079398)
  state: closed
  locked: 0
  comments: 5
  created_at: 2016-05-30T19:05:49Z
  updated_at: 2019-02-26T19:51:17Z
  closed_at: 2019-02-26T19:51:17Z
  author_association: NONE

TL;DR: xarray.conventions.decode_cf() doesn't seem to want to concatenate string arrays after opening a dataset with decode_cf=False.

OS: tried on both OS X 10.10 and 10.11
xarray version: 0.7.2 (installed via conda)
Python version: 2.7.11

Hey all,

I'm not sure if this is a bug or the intended behavior, but running xarray.conventions.decode_cf doesn't concatenate 2D string arrays as promised under certain circumstances.

Specifically, MADIS netCDF files have _FillValue/missing_value conflicts. When opening up the file in xarray, the exception gives this suggestion:

ValueError: ('Discovered conflicting _FillValue and missing_value. Considering opening the offending dataset using decode_cf=False, corrected the attributes', 'and decoding explicitly using xarray.conventions.decode_cf(ds)')

Doing this, though, doesn't result in 2D string arrays being concatenated:

```
In [50]: import xarray as xr

In [51]: fname = '20160518_1200'

In [52]: ds = xr.open_dataset(fname, decode_cf=False)

In [53]: ds.stationId
Out[53]:
<xarray.DataArray 'stationId' (recNum: 126154, maxStaIdLen: 6)>
[756924 values with dtype=|S1]
Coordinates:
  * maxStaIdLen  (maxStaIdLen) int64 0 1 2 3 4 5
  * recNum       (recNum) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Attributes:
    long_name: alphanumeric station Id
    reference: station table

In [54]: for _, v in ds.variables.iteritems():
   ....:     _fix_fillval_conflict(v)  # You can find this function in the linked gist
   ....:

In [55]: decoded_ds = xr.conventions.decode_cf(ds, concat_characters=True)

In [56]: decoded_ds.stationId
Out[56]:
<xarray.DataArray 'stationId' (recNum: 126154, maxStaIdLen: 6)>
[756924 values with dtype=|S1]
Coordinates:
  * maxStaIdLen  (maxStaIdLen) int64 0 1 2 3 4 5
  * recNum       (recNum) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Attributes:
    long_name: alphanumeric station Id
    reference: station table
```
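The `_fix_fillval_conflict` helper used above lives in a gist that isn't reproduced here. A plausible pure-Python sketch of what such a fix might do, operating on a plain attribute dict rather than an xarray variable (the choice to keep `_FillValue` over `missing_value` is my assumption, not necessarily what the gist does):

```python
def fix_fillval_conflict(attrs):
    """Hypothetical helper: if both _FillValue and missing_value are
    present, drop one so decode_cf no longer sees a conflict."""
    if "_FillValue" in attrs and "missing_value" in attrs:
        # Assumption: prefer _FillValue, discard missing_value.
        del attrs["missing_value"]
    return attrs

attrs = {"_FillValue": -9999.0, "missing_value": -99999.0, "units": "K"}
fix_fillval_conflict(attrs)
print(attrs)  # {'_FillValue': -9999.0, 'units': 'K'}
```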

That said, if you pass decode_cf=True along with options to skip mask_and_scale and decode_times (due to the aforementioned conflict), the string arrays do get concatenated:

```
In [57]: ds = xr.open_dataset(fname, decode_cf=True, mask_and_scale=False, decode_times=False)

In [58]: ds.stationId
Out[58]:
<xarray.DataArray 'stationId' (recNum: 126154)>
[126154 values with dtype=|S6]
Coordinates:
  * recNum  (recNum) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
Attributes:
    long_name: alphanumeric station Id
    reference: station table
```

We can then fix the conflict and run decode_cf without issue. This is an acceptable (albeit not immediately obvious or intuitive) workaround, but I'm not sure whether this behavior is known or intended. I'll fully admit it may be an issue with the specific netCDF data I'm working with (NOAA MADIS data, which is kind of a train wreck w/r/t CF conventions to begin with), but I don't have any other datasets with which to test.

To that extent, I've coded up tests and uploaded a gzipped MADIS netCDF file to DropBox if you're interested in reproducing this behavior.

Thanks!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/862/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  state_reason: completed
  repo: xarray (13221727)
  type: issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 20.492ms · About: xarray-datasette