
issue_comments


12 rows where author_association = "MEMBER" and issue = 201428093 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
336224586 https://github.com/pydata/xarray/issues/1215#issuecomment-336224586 https://api.github.com/repos/pydata/xarray/issues/1215 MDEyOklzc3VlQ29tbWVudDMzNjIyNDU4Ng== shoyer 1217238 2017-10-12T18:25:46Z 2017-10-12T18:25:46Z MEMBER

> I will give xarray.open_mfdataset a shot. Just one question - is this approach memory conservative? My reasoning for chunking in the first place is large file size.

Yes, open_mfdataset uses dask, which allows for streaming computation.
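The multi-file approach described here can be sketched as follows (a minimal illustration, assuming xarray with the dask and netCDF4 backends installed; file and variable names are made up):

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Write each chunk of data to its own netCDF file instead of appending.
tmpdir = tempfile.mkdtemp()
for i in range(3):
    ds = xr.Dataset({'var': ('time', np.random.random(10))},
                    coords={'time': np.arange(i * 10, (i + 1) * 10)})
    ds.to_netcdf(os.path.join(tmpdir, f'chunk_{i}.nc'))

# open_mfdataset opens the pieces lazily with dask and concatenates them;
# values are only read from disk when actually computed.
combined = xr.open_mfdataset(os.path.join(tmpdir, 'chunk_*.nc'),
                             combine='by_coords')
```

Because the combined dataset is backed by dask, memory use stays bounded even when the individual files are large.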

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf() fails to append to an existing file 201428093
336213748 https://github.com/pydata/xarray/issues/1215#issuecomment-336213748 https://api.github.com/repos/pydata/xarray/issues/1215 MDEyOklzc3VlQ29tbWVudDMzNjIxMzc0OA== shoyer 1217238 2017-10-12T17:46:42Z 2017-10-12T17:46:42Z MEMBER

> Is it now possible to append to a netCDF file using xarray?

No, it is not. This issue is about appending new variables to an existing netCDF file.

I think what you are looking for is to append along existing dimensions to a netCDF file. This is possible in the netCDF data model, but not yet supported by xarray. See https://github.com/pydata/xarray/issues/1398 for some discussion.

For these types of use cases, I would generally recommend writing a new netCDF file, and then loading everything afterwards using xarray.open_mfdataset.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf() fails to append to an existing file 201428093
336213403 https://github.com/pydata/xarray/issues/1215#issuecomment-336213403 https://api.github.com/repos/pydata/xarray/issues/1215 MDEyOklzc3VlQ29tbWVudDMzNjIxMzQwMw== jhamman 2443309 2017-10-12T17:45:30Z 2017-10-12T17:45:30Z MEMBER

@TWellman - not yet, see #1215.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf() fails to append to an existing file 201428093
334269106 https://github.com/pydata/xarray/issues/1215#issuecomment-334269106 https://api.github.com/repos/pydata/xarray/issues/1215 MDEyOklzc3VlQ29tbWVudDMzNDI2OTEwNg== shoyer 1217238 2017-10-04T19:48:21Z 2017-10-04T19:48:21Z MEMBER

+1, we probably don't want to read coordinates back from disk.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf() fails to append to an existing file 201428093
334259359 https://github.com/pydata/xarray/issues/1215#issuecomment-334259359 https://api.github.com/repos/pydata/xarray/issues/1215 MDEyOklzc3VlQ29tbWVudDMzNDI1OTM1OQ== fmaussion 10050469 2017-10-04T19:09:55Z 2017-10-04T19:09:55Z MEMBER

@jhamman no I haven't looked into this any further (and I also forgot what my workaround at that time actually was).

I also think your example should work, and that we should never check for values on disk: if the dims and coordinates names match, write the variable and assume the coordinates are ok.

If the variable already exists on file, match the behavior of netCDF4 (I actually don't know what netCDF4 does in that case)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf() fails to append to an existing file 201428093
334251264 https://github.com/pydata/xarray/issues/1215#issuecomment-334251264 https://api.github.com/repos/pydata/xarray/issues/1215 MDEyOklzc3VlQ29tbWVudDMzNDI1MTI2NA== jhamman 2443309 2017-10-04T18:40:39Z 2017-10-04T18:40:39Z MEMBER

@fmaussion and @shoyer - I have a use case that could use this. I'm wondering if either of you have looked at this any further since January?

If not, I'll propose a path forward that fits my use case and we can iterate on the details until we're satisfied:

> Do we load existing variable values to check them for equality with the new values, or alternatively always skip or override them?

I don't think loading variables already written to disk is practical. My preference would be to only append missing variables/coordinates.

> How do we handle cases where dims, attrs or encoding differ from the existing variable? Do we attempt to delete and replace the existing variable, update it in place, or error?

differing dims: raise an error

I'd like to implement this, but keep it as simple as possible. A trivial use case like this should work:

```python
import numpy as np
import pandas as pd
import xarray as xr

fname = 'out.nc'
dates = pd.date_range('2016-01-01', freq='1D', periods=45)
ds = xr.Dataset()
for var in ['A', 'B', 'C']:
    ds[var] = xr.DataArray(np.random.random((len(dates), 4, 5)),
                           dims=('time', 'x', 'y'),
                           coords={'time': dates})

for var in ds.data_vars:
    ds[[var]].to_netcdf(fname, mode='a')
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf() fails to append to an existing file 201428093
274319422 https://github.com/pydata/xarray/issues/1215#issuecomment-274319422 https://api.github.com/repos/pydata/xarray/issues/1215 MDEyOklzc3VlQ29tbWVudDI3NDMxOTQyMg== fmaussion 10050469 2017-01-22T09:22:52Z 2017-01-22T12:50:58Z MEMBER

I see.

> but perhaps we don't need to fix this for v0.9.

Agreed, but it would be good to get this working some day. For now I can see an easy workaround for my purposes.

Another possibility would be to give the user control on whether existing variables should be ignored, overwritten or raise an error when appending to a file.
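The workaround mentioned above isn't specified; one hypothetical version is to load the existing file into memory, add the new variable there, and rewrite the whole file instead of appending (the file path and variable names are illustrative):

```python
import os
import tempfile

import xarray as xr

path = os.path.join(tempfile.mkdtemp(), 'test.nc')

# An existing file with one variable and a coordinate.
ds = xr.Dataset({'var1': ('dim', [10, 11, 12])}, coords={'dim': [0, 1, 2]})
ds.to_netcdf(path)

# Load fully into memory so the file handle can be released, then
# add the new variable and overwrite the file instead of appending.
existing = xr.open_dataset(path).load()
existing.close()
existing['var2'] = ('dim', [13, 14, 15])
existing.to_netcdf(path, mode='w')
```

This sidesteps the duplicate-coordinate write entirely, at the cost of rereading and rewriting the whole file.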

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf() fails to append to an existing file 201428093
274296715 https://github.com/pydata/xarray/issues/1215#issuecomment-274296715 https://api.github.com/repos/pydata/xarray/issues/1215 MDEyOklzc3VlQ29tbWVudDI3NDI5NjcxNQ== shoyer 1217238 2017-01-22T00:01:12Z 2017-01-22T00:01:12Z MEMBER

OK, I understand what's going on now.

Previously, we had a hack that disabled writing variables along dimensions of the form [0, 1, ..., n-1] to disk, because these corresponded to default coordinates and would get created automatically. We disabled this hack as part of #1017 because it was no longer necessary.

So although your example worked in v0.8.2, this small variation did not, because we call netCDF4.Dataset.createVariable twice with a variable of the name 'dim':

```python
ds = xr.Dataset()
ds['dim'] = ('dim', [1, 2, 3])
ds['var1'] = ('dim', [10, 11, 12])
ds.to_netcdf(path)

ds = xr.Dataset()
ds['dim'] = ('dim', [1, 2, 3])
ds['var2'] = ('dim', [10, 11, 12])
ds.to_netcdf(path, 'a')
```

I find it reassuring that this only worked in limited cases before, so it is unlikely that many users are depending on this functionality. It would be nice if mode='a' worked to append new variables to an existing netCDF file even in the case of overlapping variables, but perhaps we don't need to fix this for v0.9.

My main concern with squeezing this in is that the proper behavior is not entirely clear and will need to go through some review:

  • Do we load existing variable values to check them for equality with the new values, or alternatively always skip or override them?
  • How do we handle cases where dims, attrs or encoding differ from the existing variable? Do we attempt to delete and replace the existing variable, update it in place, or error?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf() fails to append to an existing file 201428093
273731575 https://github.com/pydata/xarray/issues/1215#issuecomment-273731575 https://api.github.com/repos/pydata/xarray/issues/1215 MDEyOklzc3VlQ29tbWVudDI3MzczMTU3NQ== fmaussion 10050469 2017-01-19T10:05:59Z 2017-01-19T10:05:59Z MEMBER

I did a few tests: the regression happened in https://github.com/pydata/xarray/pull/1017

Something in the way coordinate variables are handled has changed, which means the writing now happens differently. The question is whether this should be handled downstream (in the netCDF backend) or upstream (at the dataset level).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf() fails to append to an existing file 201428093
273681045 https://github.com/pydata/xarray/issues/1215#issuecomment-273681045 https://api.github.com/repos/pydata/xarray/issues/1215 MDEyOklzc3VlQ29tbWVudDI3MzY4MTA0NQ== shoyer 1217238 2017-01-19T04:45:01Z 2017-01-19T04:45:01Z MEMBER

Good catch! Marking this as a bug.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf() fails to append to an existing file 201428093
273441335 https://github.com/pydata/xarray/issues/1215#issuecomment-273441335 https://api.github.com/repos/pydata/xarray/issues/1215 MDEyOklzc3VlQ29tbWVudDI3MzQ0MTMzNQ== fmaussion 10050469 2017-01-18T10:36:41Z 2017-01-18T10:36:41Z MEMBER

Note that the problem occurs because the backend wants to write the 'dim' coordinate each time. At the second call, the coordinate variable already exists and this raises the error.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf() fails to append to an existing file 201428093
273333969 https://github.com/pydata/xarray/issues/1215#issuecomment-273333969 https://api.github.com/repos/pydata/xarray/issues/1215 MDEyOklzc3VlQ29tbWVudDI3MzMzMzk2OQ== fmaussion 10050469 2017-01-17T23:25:30Z 2017-01-17T23:25:30Z MEMBER

An even simpler example:

```python
import os
import xarray as xr

path = 'test.nc'
if os.path.exists(path):
    os.remove(path)

ds = xr.Dataset()
ds['dim'] = ('dim', [0, 1, 2])
ds['var1'] = ('dim', [10, 11, 12])
ds['var2'] = ('dim', [13, 14, 15])

ds[['var1']].to_netcdf(path)
ds[['var2']].to_netcdf(path, 'a')
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf() fails to append to an existing file 201428093


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);