issue_comments


10 rows where issue = 449706080 (Remote writing NETCDF4 files to Amazon S3), sorted by updated_at descending

rebeccaringuette (NONE) · 2023-04-20T16:38:46Z
https://github.com/pydata/xarray/issues/2995#issuecomment-1516635334

Related issue: https://github.com/pydata/xarray/issues/4122

mullenkamp (NONE) · 2020-11-08T04:13:39Z
https://github.com/pydata/xarray/issues/2995#issuecomment-723528226

Hi all,

I'd love to have an effective method to save a netCDF4 Dataset to a bytes object (specifically for the S3 use case). I'm currently using netCDF3 through SciPy as described earlier, which works fine, but I'm missing out on some newer netCDF4 options as a consequence.

Thanks!
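
A minimal sketch of the netCDF3-via-SciPy route described above, combined with an s3fs upload; the bucket and key names are hypothetical:

```python
import s3fs
import xarray as xr

# to_netcdf() with no target returns bytes -- netCDF3 via the SciPy writer
ds = xr.Dataset({"t": ("x", [1.0, 2.0, 3.0])})
body = ds.to_netcdf()

# "my-bucket/my-dataset.nc" is a hypothetical key, assuming s3fs is configured
fs = s3fs.S3FileSystem()
with fs.open("my-bucket/my-dataset.nc", mode="wb") as f:
    f.write(body)
```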

euyuil (NONE) · 2020-07-16T14:15:28Z
https://github.com/pydata/xarray/issues/2995#issuecomment-659441282

It looks like #23 is related. Do we have a plan about this?

shoyer (MEMBER) · 2020-07-15T04:35:35Z
https://github.com/pydata/xarray/issues/2995#issuecomment-658540125

> That's because it falls back to the 'scipy' engine. Would be nice to have a non-hacky way to write netcdf4 files to byte streams. 😃

I agree, this would be a welcome improvement!

Currently, Dataset.to_netcdf() without a path argument always uses the SciPy netCDF writer, which only supports netCDF3. This is mostly because support for bytestreams is a relatively new feature in netCDF4-Python and h5py.
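
A quick way to see this fallback (the toy dataset is made up):

```python
import xarray as xr

ds = xr.Dataset({"t": ("x", [1.0, 2.0, 3.0])})
raw = ds.to_netcdf()  # no path given: xarray falls back to the SciPy writer

# netCDF3 files begin with the magic bytes b'CDF'; HDF5-based netCDF4
# files would begin with b'\x89HDF' instead
print(raw[:3])  # b'CDF'
```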

NowanIlfideme (NONE) · 2020-07-13T21:17:06Z
https://github.com/pydata/xarray/issues/2995#issuecomment-657798184

I ran into this issue, here's a simple workaround that seems to work:

```python
import netCDF4
import xarray as xr
from xarray.backends import NetCDF4DataStore
from xarray.backends.api import dump_to_store


def dataset_to_bytes(ds: xr.Dataset, name: str = "my-dataset") -> bytes:
    """Convert a dataset to bytes."""
    # diskless netCDF4 dataset: lives in memory, close() returns a memoryview
    nc4_ds = netCDF4.Dataset(name, mode="w", diskless=True, memory=ds.nbytes)
    nc4_store = NetCDF4DataStore(nc4_ds)
    dump_to_store(ds, nc4_store)
    res_mem = nc4_ds.close()
    return res_mem.tobytes()
```

I tested this using the following:

```python
from io import BytesIO

import xarray as xr

fname = "REDACTED.nc"
ds = xr.load_dataset(fname)
ds_bytes = dataset_to_bytes(ds)
ds2 = xr.load_dataset(BytesIO(ds_bytes))

assert ds2.equals(ds) and all(
    ds2.attrs[k] == ds.attrs[k] for k in set(ds2.attrs).union(ds.attrs)
)
```

The assertion holds, but the file size on disk is different. It's possible they were saved using different netCDF4 versions; I haven't had time to test that.

I tried using just ds.to_netcdf() but got the following error:

`ValueError: NetCDF 3 does not support type |S32`

That's because it falls back to the 'scipy' engine. Would be nice to have a non-hacky way to write netcdf4 files to byte streams. :smiley:

NicWayand (NONE) · 2019-08-06T22:39:07Z
https://github.com/pydata/xarray/issues/2995#issuecomment-518869785

Is it possible to read multiple netcdf files on s3 using open_mfdataset?

Reactions: 👀 3
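
One plausible approach, sketched assuming the installed xarray accepts open file objects in open_mfdataset; the key pattern is hypothetical:

```python
import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(anon=True)

# "era5-pds/2008/*/data/..." is a hypothetical key pattern
paths = fs.glob("era5-pds/2008/*/data/air_temperature_at_2_metres.nc")
files = [fs.open(p, mode="rb") for p in paths]

# each open file object is read with the h5netcdf engine, then combined
ds = xr.open_mfdataset(files, engine="h5netcdf", combine="by_coords")
```
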
DocOtak (CONTRIBUTOR) · 2019-05-29T18:56:17Z
https://github.com/pydata/xarray/issues/2995#issuecomment-497066189

Thanks @rabernat, I had forgotten about the other netcdf storage engines... do you know if h5netcdf is stable enough that I should use it in "production" outside of xarray for my netcdf4 reading/writing needs?

fmaussion (MEMBER) · 2019-05-29T18:49:37Z
https://github.com/pydata/xarray/issues/2995#issuecomment-497063685

> This takes about a minute to open for me.

It took me much longer earlier this week when I tried :roll_eyes: Is the bottleneck in the parsing of the coordinates?

rabernat (MEMBER) · 2019-05-29T17:42:45Z
https://github.com/pydata/xarray/issues/2995#issuecomment-497038453

Forget about zarr for a minute. Let's stick with the original goal of remote access to netcdf4 files in S3. You can use s3fs (or gcsfs) for this.

```python
import xarray as xr
import s3fs

fs_s3 = s3fs.S3FileSystem(anon=True)
s3path = 'era5-pds/2008/01/data/air_temperature_at_2_metres.nc'
remote_file_obj = fs_s3.open(s3path, mode='rb')
ds = xr.open_dataset(remote_file_obj, engine='h5netcdf')
```

```
<xarray.Dataset>
Dimensions:                      (lat: 640, lon: 1280, time0: 744)
Coordinates:
  * lon                          (lon) float32 0.0 0.2812494 ... 359.718
  * lat                          (lat) float32 89.784874 89.5062 ... -89.784874
  * time0                        (time0) datetime64[ns] 2008-01-01T07:00:00 ... 2008-02-01T06:00:00
Data variables:
    air_temperature_at_2_metres  (time0, lat, lon) float32 ...
Attributes:
    source:       Reanalysis
    institution:  ECMWF
    title:        "ERA5 forecasts"
    history:      Wed Jul 4 22:08:50 2018: ncatted /data.e1/wrk/s3_out_in/20...
```

This takes about a minute to open for me. I have not tried writing, but this is perhaps a starting point.

If you are unsatisfied by the performance of netcdf4 on cloud, I would indeed encourage you to investigate zarr.
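
For comparison, a minimal zarr-on-S3 sketch along those lines; the bucket and store path are hypothetical:

```python
import s3fs
import xarray as xr

# "my-bucket/my-dataset.zarr" is a hypothetical location
fs = s3fs.S3FileSystem()
store = s3fs.S3Map(root="my-bucket/my-dataset.zarr", s3=fs)

ds.to_zarr(store, mode="w")  # write: chunked, cloud-friendly layout
ds2 = xr.open_zarr(store)    # lazy read straight from S3
```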

DocOtak (CONTRIBUTOR) · 2019-05-29T17:11:10Z (edited 2019-05-29T17:12:51Z)
https://github.com/pydata/xarray/issues/2995#issuecomment-497026828

Hi @Non-Descript-Individual

I've found that the netcdf4-python library really wants direct access to a disk/filesystem to work, and it really wants to do its own file access management. I've always attributed this to the Python library being a wrapper for the netcdf C library.

My guess is that the easiest way to do what you want is to separate writing the netcdf file in xarray from putting the file into S3. Something like this:

```python
x.to_netcdf('temp_file.nc')
s3.upload_file('temp_file.nc', 'bucketname', 'real_name_for_temp_file.nc')
```
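
A variant of the same idea that cleans up the scratch file afterwards, sketched assuming s3 is a boto3 client:

```python
import os
import tempfile

import boto3

s3 = boto3.client("s3")

# x is the xarray Dataset from the question; write locally, upload, then remove
with tempfile.NamedTemporaryFile(suffix=".nc", delete=False) as tmp:
    tmp_path = tmp.name
try:
    x.to_netcdf(tmp_path)
    s3.upload_file(tmp_path, "bucketname", "real_name_for_temp_file.nc")
finally:
    os.remove(tmp_path)
```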

The netcdf4-python library does seem to provide an interface for the "diskless" flags. In that case, the examples suggest it gives you a bunch of bytes in a memoryview object on calling close(). I'm not sure this is accessible from xarray, though.
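
A small sketch of that diskless interface, modeled on the workaround above; the file name, variable names, and buffer size are illustrative:

```python
import netCDF4
import numpy as np

# diskless write: nothing touches the filesystem; close() hands back a memoryview
nc = netCDF4.Dataset("in-memory.nc", mode="w", diskless=True, memory=1024)
nc.createDimension("x", 3)
var = nc.createVariable("v", "f4", ("x",))
var[:] = np.arange(3, dtype="f4")

buf = nc.close()   # memoryview over the finished netCDF4 file
data = bytes(buf)
```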

Alternatively, @rabernat is an advocate of using zarr when putting netcdf-compatible data into cloud storage; the zarr docs provide an example using s3fs.

Quick edit: here are the to_zarr docs in xarray.


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);