
issue_comments


9 rows where issue = 91109966 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
115933651 https://github.com/pydata/xarray/issues/443#issuecomment-115933651 https://api.github.com/repos/pydata/xarray/issues/443 MDEyOklzc3VlQ29tbWVudDExNTkzMzY1MQ== shoyer 1217238 2015-06-27T01:22:25Z 2015-06-27T22:44:02Z MEMBER

OK, I understand now. One of these files looks like:

<xray.Dataset>
Dimensions:            (lat: 39, lon: 59, mean_height_agl: 50, time: 1)
Coordinates:
  * time               (time) datetime64[ns] 2011-05-21T13:00:00
  * lon                (lon) float64 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 ...
  * lat                (lat) float64 32.0 33.0 34.0 35.0 36.0 37.0 38.0 39.0 ...
  * mean_height_agl    (mean_height_agl) float64 28.28 97.21 191.1 310.7 ...
Data variables:
    ash_concentration  (mean_height_agl, lat, lon) float64 9.583e-16 ...
    ash_mass_loading   (lat, lon) float64 1.091e-11 1.091e-11 1.091e-11 ...
    ash_drydep         (lat, lon) float64 4.086e-10 4.084e-10 4.08e-10 ...
    ash_wetdep         (lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    so2_concentration  (mean_height_agl, lat, lon) float64 3.199e-13 ...
    so2_mass_loading   (lat, lon) float64 2.602e-09 2.602e-09 2.602e-09 ...

The problem is that the mean_height_agl coordinate changes between each file.

Another interesting aspect of this file, which relates to how I was hoping to fix https://github.com/xray/xray/issues/438, is that it includes a time coordinate with length 1, but none of the other dataset variables use that coordinate.

This suggests to me that we need some sort of hook that lets you transform each dataset before they are joined with open_mfdataset. Perhaps a preprocess argument? Then you could write, e.g.:

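```python
import xray

def fix_my_data(ds):
    # Replace the per-file height coordinate with a plain integer index
    # and drop the length-1 time dimension.
    return (ds.assign_coords(
                agl=('mean_height_agl', range(ds.dims['mean_height_agl'])))
              .swap_dims({'mean_height_agl': 'agl'})
              .squeeze('time'))

# `preprocess` is the argument proposed in this comment.
ds = xray.open_mfdataset('*.nc', preprocess=fix_my_data)
```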
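The hook idea can also be tried without any library. The following is only a rough sketch under stated assumptions: datasets are modelled as plain dicts, `combine_along_time` and `make_file` are hypothetical stand-ins (not xray's `open_mfdataset`), and the height values for the second file are made up. Unlike the proposal above, it keeps the real heights as a per-time-step variable instead of squeezing `time`.

```python
def combine_along_time(datasets, preprocess=None):
    """Hypothetical combiner: optionally preprocess, then join along 'time'."""
    if preprocess is not None:
        datasets = [preprocess(ds) for ds in datasets]
    first = datasets[0]
    # Every coordinate except 'time' must match across all datasets.
    for name, values in first['coords'].items():
        if name != 'time' and any(ds['coords'][name] != values
                                  for ds in datasets[1:]):
            raise ValueError('variable %r not equal across datasets' % name)
    combined = {'coords': dict(first['coords']), 'data_vars': {}}
    combined['coords']['time'] = [t for ds in datasets
                                  for t in ds['coords']['time']]
    for name in first['data_vars']:
        combined['data_vars'][name] = [row for ds in datasets
                                       for row in ds['data_vars'][name]]
    return combined


def fix_my_data(ds):
    """Swap the varying height coordinate for an integer index 'agl'."""
    coords = dict(ds['coords'])
    heights = coords.pop('mean_height_agl')
    coords['agl'] = list(range(len(heights)))  # identical in every file
    data_vars = dict(ds['data_vars'])
    data_vars['mean_height_agl'] = [heights]   # one row for this file's time step
    return {'coords': coords, 'data_vars': data_vars}


def make_file(time, heights):
    """Build a toy single-time-step dataset (illustrative values only)."""
    return {'coords': {'time': [time], 'mean_height_agl': heights},
            'data_vars': {'ash_concentration': [[0.0] * len(heights)]}}


files = [make_file('2011-05-21T13:00', [28.28, 97.21, 191.1]),
         make_file('2011-05-21T14:00', [30.0, 100.0, 195.0])]

combined = combine_along_time(files, preprocess=fix_my_data)
```

Without `preprocess`, the same call raises the "not equal across datasets" error. With it, `agl` is shared by all files and `mean_height_agl` ends up with one row of heights per time step.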

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multiple files - variable X not equal across datasets 91109966
115906809 https://github.com/pydata/xarray/issues/443#issuecomment-115906809 https://api.github.com/repos/pydata/xarray/issues/443 MDEyOklzc3VlQ29tbWVudDExNTkwNjgwOQ== razcore-rad 1177508 2015-06-26T22:14:09Z 2015-06-26T22:14:09Z NONE

I'm trying to concatenate along an existing axis, the 'time' axis. I uploaded a couple of files here. That's easier, and you can experiment with them yourself.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multiple files - variable X not equal across datasets 91109966
115903258 https://github.com/pydata/xarray/issues/443#issuecomment-115903258 https://api.github.com/repos/pydata/xarray/issues/443 MDEyOklzc3VlQ29tbWVudDExNTkwMzI1OA== shoyer 1217238 2015-06-26T22:02:50Z 2015-06-26T22:02:50Z MEMBER

Do you concatenate these files along one of the existing axes or a new axis? This might require new API but should probably be supported.

Could you print two of these netCDF files that you want to automatically combine with open_mfdataset? I know they have the same structure but it's useful to see how/if the values differ.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multiple files - variable X not equal across datasets 91109966
115901388 https://github.com/pydata/xarray/issues/443#issuecomment-115901388 https://api.github.com/repos/pydata/xarray/issues/443 MDEyOklzc3VlQ29tbWVudDExNTkwMTM4OA== razcore-rad 1177508 2015-06-26T21:57:43Z 2015-06-26T21:57:43Z NONE

Well, I'm not sure if it's a bug, I would say it's more like a missing feature... in my case, each netCDF file has a different mean_height_agl coordinate, that is, they have the same length (it's 1D), but different values in each file. I can understand why it can't concatenate, but I would argue that a better way to handle this is to create a dummy coordinate (as I did) and replace the troublesome coordinate with that dummy coordinate.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multiple files - variable X not equal across datasets 91109966
115887945 https://github.com/pydata/xarray/issues/443#issuecomment-115887945 https://api.github.com/repos/pydata/xarray/issues/443 MDEyOklzc3VlQ29tbWVudDExNTg4Nzk0NQ== shoyer 1217238 2015-06-26T21:26:19Z 2015-06-26T21:26:19Z MEMBER

Marking this as a bug. I'll see if I can reproduce it with a similar dataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multiple files - variable X not equal across datasets 91109966
115555123 https://github.com/pydata/xarray/issues/443#issuecomment-115555123 https://api.github.com/repos/pydata/xarray/issues/443 MDEyOklzc3VlQ29tbWVudDExNTU1NTEyMw== razcore-rad 1177508 2015-06-26T07:13:00Z 2015-06-26T07:13:00Z NONE

I get this with open_mfdataset:

Traceback (most recent call last):
  File "box.py", line 59, in <module>
    concat_dim='time'))
  File "/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/backends/api.py", line 205, in open_mfdataset
    combined = auto_combine(datasets, concat_dim=concat_dim)
  File "/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/core/alignment.py", line 352, in auto_combine
    concatenated = [_auto_concat(ds, dim=concat_dim) for ds in grouped]
  File "/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/core/alignment.py", line 352, in <listcomp>
    concatenated = [_auto_concat(ds, dim=concat_dim) for ds in grouped]
  File "/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/core/alignment.py", line 303, in _auto_concat
    return concat(datasets, dim=dim)
  File "/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/core/alignment.py", line 278, in concat
    return cls._concat(objs, dim, indexers, mode, concat_over, compat)
  File "/ichec/home/users/razvan/.local/lib/python3.4/site-packages/xray/core/dataset.py", line 1712, in _concat
    'variable %r not %s across datasets' % (k, verb))
ValueError: variable 'mean_height_agl' not equal across datasets
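The last frame of the traceback suggests an equality check on variables that are not being concatenated. A minimal, library-free sketch of that kind of check follows. It is not xray's actual code: datasets are modelled as plain dicts, `check_equal_across` is a hypothetical helper, and the values in the second file are invented for illustration.

```python
def check_equal_across(datasets, concat_dim):
    """Raise if any coordinate other than concat_dim differs between datasets."""
    first = datasets[0]
    for name, values in first.items():
        if name == concat_dim:
            continue  # expected to differ; this is the axis being joined
        for other in datasets[1:]:
            if other[name] != values:
                raise ValueError(
                    'variable %r not equal across datasets' % name)


# Two hypothetical files: same length, different height values.
file_a = {'time': ['2011-05-21T13:00'],
          'mean_height_agl': [28.28, 97.21, 191.1]}
file_b = {'time': ['2011-05-21T14:00'],
          'mean_height_agl': [30.02, 99.80, 195.4]}

try:
    check_equal_across([file_a, file_b], concat_dim='time')
    error = None
except ValueError as exc:
    error = str(exc)
```

Here `error` holds the same message as the traceback, because the two height lists have the same length but different values.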

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multiple files - variable X not equal across datasets 91109966
115492420 https://github.com/pydata/xarray/issues/443#issuecomment-115492420 https://api.github.com/repos/pydata/xarray/issues/443 MDEyOklzc3VlQ29tbWVudDExNTQ5MjQyMA== shoyer 1217238 2015-06-26T03:48:08Z 2015-06-26T03:48:08Z MEMBER

Do you get the error message if you specify the full path to this file in open_mfdataset?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multiple files - variable X not equal across datasets 91109966
115485265 https://github.com/pydata/xarray/issues/443#issuecomment-115485265 https://api.github.com/repos/pydata/xarray/issues/443 MDEyOklzc3VlQ29tbWVudDExNTQ4NTI2NQ== razcore-rad 1177508 2015-06-26T03:16:23Z 2015-06-26T03:16:23Z NONE

So I don't know if this is what you're asking for (I only have one dataset with this problem), but here's what it looks like:

<xray.Dataset>
Dimensions:            (agl: 50, lat: 39, lon: 59, time: 192)
Coordinates:
  * lon                (lon) float64 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 ...
  * lat                (lat) float64 32.0 33.0 34.0 35.0 36.0 37.0 38.0 39.0 ...
  * agl                (agl) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ...
  * time               (time) datetime64[ns] 2011-05-21T13:00:00 ...
    mean_height_agl    (time, agl) float64 28.28 97.21 191.1 310.7 460.9 ...
Data variables:
    so2_concentration  (time, agl, lat, lon) float64 3.199e-13 3.199e-13 ...
    ash_wetdep         (time, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    ash_concentration  (time, agl, lat, lon) float64 9.583e-16 9.581e-16 ...
    ash_mass_loading   (time, lat, lon) float64 1.091e-11 1.091e-11 ...
    so2_mass_loading   (time, lat, lon) float64 2.602e-09 2.602e-09 ...
    ash_drydep         (time, lat, lon) float64 4.086e-10 4.084e-10 4.08e-10 ...
Attributes:

This is read in with get_ds from above. I wouldn't be able to read it normally with xray.open_mfdataset because it would give me 'Variable mean_height_agl not equal across datasets'. But 'mean_height_agl' really is a per-file coordinate, so I had to create the dummy 'agl' coordinate and basically convert 'mean_height_agl' to a variable. This way I can still treat the data as being part of one file.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multiple files - variable X not equal across datasets 91109966
115449452 https://github.com/pydata/xarray/issues/443#issuecomment-115449452 https://api.github.com/repos/pydata/xarray/issues/443 MDEyOklzc3VlQ29tbWVudDExNTQ0OTQ1Mg== shoyer 1217238 2015-06-26T01:01:45Z 2015-06-26T01:01:45Z MEMBER

Could you print two of the incompatible datasets? I'm not sure if there is a general pattern here (or not).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multiple files - variable X not equal across datasets 91109966


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 13.294ms · About: xarray-datasette