issue_comments


24 rows where issue = 631085856 sorted by updated_at descending


Document writing netcdf from xarray directly to S3 · 24 comments
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1453911083 https://github.com/pydata/xarray/issues/4122#issuecomment-1453911083 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85WqOwr martindurant 6042212 2023-03-03T18:12:01Z 2023-03-03T18:12:01Z CONTRIBUTOR

> What are the limitations of the netcdf3 standard vs netcdf4?

No compression, encoding, or chunking, except for the one "append" dimension.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1453906696 https://github.com/pydata/xarray/issues/4122#issuecomment-1453906696 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85WqNsI zoj613 44142765 2023-03-03T18:08:07Z 2023-03-03T18:08:07Z NONE

Based on the docs:

> The default format is NETCDF4 if you are saving a file to disk and have the netCDF4-python library available. Otherwise, xarray falls back to using scipy to write netCDF files and defaults to the NETCDF3_64BIT format (scipy does not support netCDF4).

It appears the scipy engine is safe if one does not need to be bothered with specifying engines. By the way, what are the limitations of the netcdf3 standard vs netcdf4?
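The fallback described in the quoted docs can be sketched as a simple priority list. This is a hedged illustration of the documented behaviour only, not xarray's actual implementation; `pick_engine` and its input are invented for the sketch:

```python
def pick_engine(installed):
    """Sketch of xarray's documented default: prefer a netCDF4-capable
    engine, fall back to scipy (which can only write netCDF3)."""
    priorities = [
        ("netcdf4", "NETCDF4"),
        ("h5netcdf", "NETCDF4"),
        ("scipy", "NETCDF3_64BIT"),
    ]
    for engine, default_format in priorities:
        if engine in installed:
            return engine, default_format
    raise ValueError("cannot write netCDF: no engine available")
```

So with only scipy installed, a plain `ds.to_netcdf(path)` quietly produces a netCDF3 file, which is the situation discussed in this thread.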

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1453902381 https://github.com/pydata/xarray/issues/4122#issuecomment-1453902381 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85WqMot martindurant 6042212 2023-03-03T18:04:29Z 2023-03-03T18:04:29Z CONTRIBUTOR

scipy only reads/writes netCDF2/3 (https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.netcdf_file.html), which is a very different and simpler format than netCDF4. The latter uses HDF5 as a container, with h5netcdf as the corresponding xarray engine. I guess "to_netcdf" is ambiguous.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1453901550 https://github.com/pydata/xarray/issues/4122#issuecomment-1453901550 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85WqMbu peterdudfield 34686298 2023-03-03T18:03:49Z 2023-03-03T18:03:49Z NONE

> I never needed to specify an engine when writing, you only need it when reading the file. I use the engine="scipy" one for reading.

Using engine="scipy" worked - thank you.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1453899322 https://github.com/pydata/xarray/issues/4122#issuecomment-1453899322 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85WqL46 peterdudfield 34686298 2023-03-03T18:02:04Z 2023-03-03T18:02:04Z NONE

> > I use the engine="scipy" one for reading.
>
> This is netCDF3, in that case. If that's fine for you, no problem.

What do you mean this is netCDF3?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1453898602 https://github.com/pydata/xarray/issues/4122#issuecomment-1453898602 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85WqLtq martindurant 6042212 2023-03-03T18:01:30Z 2023-03-03T18:01:30Z CONTRIBUTOR

> I use the engine="scipy" one for reading.

This is netCDF3, in that case. If that's fine for you, no problem.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1453897364 https://github.com/pydata/xarray/issues/4122#issuecomment-1453897364 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85WqLaU zoj613 44142765 2023-03-03T18:00:33Z 2023-03-03T18:00:33Z NONE

I never needed to specify an engine when writing, you only need it when reading the file. I use the engine="scipy" one for reading.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1453562756 https://github.com/pydata/xarray/issues/4122#issuecomment-1453562756 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85Wo5uE peterdudfield 34686298 2023-03-03T13:52:08Z 2023-03-03T17:48:15Z NONE

> Maybe it is netCDF3? xarray is supposed to be able to determine the file type
>
> ```python
> with fsspec.open("s3://some_bucket/some_remote_destination.nc", mode="rb") as ff:
>     ds = xr.open_dataset(ff)
> ```
>
> but maybe play with the engine= argument.

Thanks, I tried to make sure it was engine="h5netcdf" when saving, but not sure this worked.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1453883439 https://github.com/pydata/xarray/issues/4122#issuecomment-1453883439 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85WqIAv peterdudfield 34686298 2023-03-03T17:47:43Z 2023-03-03T17:47:43Z NONE

It could be; here are some running notes of mine: https://github.com/openclimatefix/MetOfficeDataHub/issues/65

The save method is:

```python
with fsspec.open("simplecache::s3://nowcasting-nwp-development/data/test.netcdf", mode="wb") as f:
    dataset.to_netcdf(f, engine="h5netcdf")
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1453873364 https://github.com/pydata/xarray/issues/4122#issuecomment-1453873364 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85WqFjU alaws-USGS 108412194 2023-03-03T17:38:11Z 2023-03-03T17:38:33Z NONE

@peterdudfield Could this be an issue with how you wrote out your NetCDF file? When I write to requester-pays buckets, my approach using your variables would look like this and incorporates instructions from fsspec for remote write caching:

```python
url = "simplecache::s3://file_path/to/file.nc"
# the important part is using s3={"profile": "your_aws_profile"}
with fsspec.open(url, mode="wb", s3={"profile": "default"}) as ff:
    daymet_sel.to_netcdf(ff)
```

I haven't had any read issues with files saved this way.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1453558039 https://github.com/pydata/xarray/issues/4122#issuecomment-1453558039 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85Wo4kX martindurant 6042212 2023-03-03T13:48:09Z 2023-03-03T13:48:09Z CONTRIBUTOR

Maybe it is netCDF3? xarray is supposed to be able to determine the file type:

```python
with fsspec.open("s3://some_bucket/some_remote_destination.nc", mode="rb") as ff:
    ds = xr.open_dataset(ff)
```

but maybe play with the engine= argument.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1453553088 https://github.com/pydata/xarray/issues/4122#issuecomment-1453553088 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85Wo3XA peterdudfield 34686298 2023-03-03T13:43:58Z 2023-03-03T13:45:59Z NONE

> What didn't work:
>
> ```python
> f = fsspec.filesystem("s3", anon=False)
> with f.open("some_bucket/some_remote_destination.nc", mode="wb") as ff:
>     xr.open_dataset("some_local_file.nc").to_netcdf(ff)
> ```
>
> this results in an `OSError: [Errno 29] Seek only available in read mode` exception
>
> Changing the above to
>
> ```python
> with fsspec.open("simplecache::s3://some_bucket/some_remote_destination.nc", mode="wb") as ff:
>     xr.open_dataset("some_local_file.nc").to_netcdf(ff)
> ```
>
> fixed it.

How would you go about reading this file once it is saved in S3?

I'm currently getting an error: `ValueError: b'CDF\x02\x00\x00\x00\x00' is not the signature of a valid netCDF4 file`
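That ValueError is telling: `b'CDF\x02'` is the magic number of a classic netCDF3 file in the 64-bit-offset variant, so the file on S3 simply is not netCDF4. A small sketch of how the flavour can be sniffed from the leading bytes (the function name and variant map are illustrative, not an xarray API):

```python
def sniff_netcdf_format(header: bytes) -> str:
    """Guess the netCDF flavour from a file's first few bytes."""
    # Classic netCDF files start with b"CDF" plus a version byte:
    # \x01 = classic, \x02 = 64-bit offset, \x05 = CDF-5.
    if header.startswith(b"CDF") and len(header) >= 4:
        variants = {1: "NETCDF3_CLASSIC", 2: "NETCDF3_64BIT", 5: "NETCDF3_CDF5"}
        return variants.get(header[3], "netCDF3 (unknown variant)")
    # netCDF4 is an HDF5 container, so it carries the 8-byte HDF5 signature.
    if header.startswith(b"\x89HDF\r\n\x1a\n"):
        return "NETCDF4 (HDF5 container)"
    return "not a netCDF file"

print(sniff_netcdf_format(b"CDF\x02\x00\x00\x00\x00"))  # -> NETCDF3_64BIT
```

The header from the traceback identifies itself as NETCDF3_64BIT, which is why opening it with a netCDF4-only engine fails; a netCDF3-capable engine such as scipy reads it.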

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1401040677 https://github.com/pydata/xarray/issues/4122#issuecomment-1401040677 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85Tgi8l zoj613 44142765 2023-01-23T21:49:46Z 2023-01-23T21:52:29Z NONE

What didn't work:

```python
f = fsspec.filesystem("s3", anon=False)
with f.open("some_bucket/some_remote_destination.nc", mode="wb") as ff:
    xr.open_dataset("some_local_file.nc").to_netcdf(ff)
```

This results in an `OSError: [Errno 29] Seek only available in read mode` exception.

Changing the above to

```python
with fsspec.open("simplecache::s3://some_bucket/some_remote_destination.nc", mode="wb") as ff:
    xr.open_dataset("some_local_file.nc").to_netcdf(ff)
```

fixed it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1400583499 https://github.com/pydata/xarray/issues/4122#issuecomment-1400583499 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85TezVL martindurant 6042212 2023-01-23T15:57:24Z 2023-01-23T15:57:24Z CONTRIBUTOR

Would you mind writing out long-hand the version that worked and the version that didn't?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1400564474 https://github.com/pydata/xarray/issues/4122#issuecomment-1400564474 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85Teur6 zoj613 44142765 2023-01-23T15:44:20Z 2023-01-23T15:44:20Z NONE

> '/silt/usgs/Projects/stellwagen/CF-1.6/BUZZ_BAY/2651-A.cdf') outfile = fsspec.open('simpl

Thanks, this actually worked for me. It seems as though initializing an s3 store beforehand using `fs = fsspec.S3FileSystem(...)` and using it as a context manager via `with fs.open(...) as out: data.to_netcdf(out)` caused the failure.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1400545067 https://github.com/pydata/xarray/issues/4122#issuecomment-1400545067 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85Tep8r martindurant 6042212 2023-01-23T15:31:16Z 2023-01-23T15:31:16Z CONTRIBUTOR

I can confirm that something like the following does work, basically automating the "write local and then push" workflow:

```python
import xarray as xr
import fsspec

ds = xr.open_dataset('http://geoport.usgs.esipfed.org/thredds/dodsC'
                     '/silt/usgs/Projects/stellwagen/CF-1.6/BUZZ_BAY/2651-A.cdf')
outfile = fsspec.open('simplecache::gcs://mdtemp/foo2.nc', mode='wb')
with outfile as f:
    ds.to_netcdf(f)
```

Unfortunately, directly writing to the remote file without a local cached file is not supported, because HDF5 does not write in a linear way.
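The "write local and then push" workflow that `simplecache::` automates can be sketched in a few lines: buffer every write in a seekable local temporary file, then upload the finished file once on close. This is a hedged illustration only; the class and the `push` callback are invented for the sketch and are not fsspec's actual classes:

```python
import os
import tempfile

class LocalThenPush:
    """Minimal sketch of the 'simplecache::' write path: all writes go to
    a seekable local temp file; the finished file is pushed once on close."""

    def __init__(self, push):
        # push(local_path) is a callback that uploads the completed file
        self.push = push
        fd, self.path = tempfile.mkstemp()
        self._f = os.fdopen(fd, "r+b")

    def write(self, data):
        return self._f.write(data)

    def seek(self, pos, whence=0):
        # local files support seek, so HDF5's header rewrites work here
        return self._f.seek(pos, whence)

    def close(self):
        self._f.close()
        try:
            self.push(self.path)
        finally:
            os.unlink(self.path)
```

Because the intermediate file is an ordinary local file, the writer is free to seek backwards; the remote store only ever sees one forward-only upload of the completed bytes.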

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
1400519887 https://github.com/pydata/xarray/issues/4122#issuecomment-1400519887 https://api.github.com/repos/pydata/xarray/issues/4122 IC_kwDOAMm_X85TejzP zoj613 44142765 2023-01-23T15:16:21Z 2023-01-23T15:16:21Z NONE

Is there any reliable way to write an xr.Dataset object as a netcdf file in 2023? I tried using the above approach with fsspec but I keep getting an `OSError: [Errno 29] Seek only available in read mode` exception.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
745520766 https://github.com/pydata/xarray/issues/4122#issuecomment-745520766 https://api.github.com/repos/pydata/xarray/issues/4122 MDEyOklzc3VlQ29tbWVudDc0NTUyMDc2Ng== rsignell-usgs 1872600 2020-12-15T19:39:16Z 2020-12-15T19:39:16Z NONE

I'm closing this; the recommended approach for writing NetCDF to object storage is to write locally, then push.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
655298190 https://github.com/pydata/xarray/issues/4122#issuecomment-655298190 https://api.github.com/repos/pydata/xarray/issues/4122 MDEyOklzc3VlQ29tbWVudDY1NTI5ODE5MA== nbren12 1386642 2020-07-08T05:39:14Z 2020-07-08T05:39:14Z CONTRIBUTOR

I’ve run into this as well. It’s not pretty, but my usual work around is to write to a local temporary file and then upload with fsspec. I can never remember exactly which netCDF engine to use...

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
640548620 https://github.com/pydata/xarray/issues/4122#issuecomment-640548620 https://api.github.com/repos/pydata/xarray/issues/4122 MDEyOklzc3VlQ29tbWVudDY0MDU0ODYyMA== rsignell-usgs 1872600 2020-06-08T11:36:14Z 2020-06-08T11:37:21Z NONE

@martindurant, I asked @ajelenak offline and he reminded me that:

> File metadata are dispersed throughout an HDF5 [and NetCDF4] file in order to support writing and modifying array sizes at any time of execution

Looking forward to simplecache:: for writing in fsspec=0.7.5!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
639771646 https://github.com/pydata/xarray/issues/4122#issuecomment-639771646 https://api.github.com/repos/pydata/xarray/issues/4122 MDEyOklzc3VlQ29tbWVudDYzOTc3MTY0Ng== rsignell-usgs 1872600 2020-06-05T20:08:37Z 2020-06-05T20:54:36Z NONE

Okay @scottyhq, I tried setting engine='h5netcdf', but still got OSError: Seek only available in read mode. Thinking about this a little more, it's pretty clear why writing NetCDF to S3 would require seek mode.

I asked @martindurant about supporting seek for writing in fsspec and he said that would be pretty hard. And in fact, the performance probably would be pretty terrible as lots of little writes would be required.

So maybe it's best just to write netcdf files locally and then push them to S3.

And to facilitate that, @martindurant merged a PR yesterday to enable simplecache for writing in fsspec, so after doing pip install git+https://github.com/intake/filesystem_spec.git in my environment, this now works:

```python
import xarray as xr
import fsspec

ds = xr.open_dataset('http://geoport.usgs.esipfed.org/thredds/dodsC'
                     '/silt/usgs/Projects/stellwagen/CF-1.6/BUZZ_BAY/2651-A.cdf')

outfile = fsspec.open('simplecache::s3://chs-pangeo-data-bucket/rsignell/foo2.nc',
                      mode='wb', s3=dict(profile='default'))
with outfile as f:
    ds.to_netcdf(f)
```

(Here I'm telling fsspec to use the AWS credentials in my "default" profile.)

Thanks Martin!!!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
639777701 https://github.com/pydata/xarray/issues/4122#issuecomment-639777701 https://api.github.com/repos/pydata/xarray/issues/4122 MDEyOklzc3VlQ29tbWVudDYzOTc3NzcwMQ== martindurant 6042212 2020-06-05T20:17:38Z 2020-06-05T20:17:38Z CONTRIBUTOR

The write feature for simplecache isn't released yet, of course.

It would be interesting if someone could subclass file and write locally with h5netcdf to see what kind of seeks it does. Is it popping back to some file header to update array sizes? Presumably it would need a fixed-size header to do that. Parquet and other cloud formats put the metadata in the footer exactly for this reason, so you only write once you know everything and you only ever move forward in the file.
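The subclass-and-log experiment suggested above can be sketched without any netCDF library at all: wrap an in-memory file so that every seek is recorded, then hand it to whichever writer you want to inspect. The `SeekLogger` wrapper and the toy header-rewriting writer below are invented for illustration:

```python
import io

class SeekLogger(io.BytesIO):
    """In-memory file that records every seek, to reveal whether a
    writer pops back to rewrite earlier bytes (e.g. an HDF5 header)."""

    def __init__(self):
        super().__init__()
        self.seeks = []

    def seek(self, pos, whence=0):
        self.seeks.append((pos, whence))
        return super().seek(pos, whence)

# A writer that fixes up its header after the fact triggers a backward seek,
# which a forward-only S3 upload stream cannot honour:
f = SeekLogger()
f.write(b"\x00\x00\x00\x00")   # placeholder header
f.write(b"payload")
f.seek(0)                      # pop back to the start of the file
f.write(b"\x07\x00\x00\x00")   # e.g. record the payload length
```

Any `(0, 0)` entry in `f.seeks` after the payload was written is exactly the backward movement that makes direct streaming writes fail with "Seek only available in read mode".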

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
639774637 https://github.com/pydata/xarray/issues/4122#issuecomment-639774637 https://api.github.com/repos/pydata/xarray/issues/4122 MDEyOklzc3VlQ29tbWVudDYzOTc3NDYzNw== dcherian 2448579 2020-06-05T20:13:04Z 2020-06-05T20:13:04Z MEMBER

I think we should add some documentation on this stuff.

We have "cloud storage buckets" under zarr (https://xarray.pydata.org/en/stable/io.html#cloud-storage-buckets), so maybe a similar section under netCDF?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
639651072 https://github.com/pydata/xarray/issues/4122#issuecomment-639651072 https://api.github.com/repos/pydata/xarray/issues/4122 MDEyOklzc3VlQ29tbWVudDYzOTY1MTA3Mg== scottyhq 3924836 2020-06-05T17:27:58Z 2020-06-05T17:27:58Z MEMBER

Not sure, but I think the h5netcdf engine is the only one that allows for file-like objects (so anything going through fsspec)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette