html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2995#issuecomment-1516635334,https://api.github.com/repos/pydata/xarray/issues/2995,1516635334,IC_kwDOAMm_X85aZgTG,49281118,2023-04-20T16:38:46Z,2023-04-20T16:38:46Z,NONE,Related issue: https://github.com/pydata/xarray/issues/4122,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,449706080
https://github.com/pydata/xarray/issues/2995#issuecomment-723528226,https://api.github.com/repos/pydata/xarray/issues/2995,723528226,MDEyOklzc3VlQ29tbWVudDcyMzUyODIyNg==,2656596,2020-11-08T04:13:39Z,2020-11-08T04:13:39Z,NONE,"Hi all, I'd love to have an effective method to save a netcdf4 Dataset to a bytes object (specifically for uploading to S3). I'm currently using netcdf3 through scipy as described earlier, which works fine, but I'm missing out on some newer netcdf4 options as a consequence. Thanks!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,449706080
https://github.com/pydata/xarray/issues/2995#issuecomment-659441282,https://api.github.com/repos/pydata/xarray/issues/2995,659441282,MDEyOklzc3VlQ29tbWVudDY1OTQ0MTI4Mg==,1539596,2020-07-16T14:15:28Z,2020-07-16T14:15:28Z,NONE,It looks like #23 is related. Do we have a plan for this?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,449706080
https://github.com/pydata/xarray/issues/2995#issuecomment-658540125,https://api.github.com/repos/pydata/xarray/issues/2995,658540125,MDEyOklzc3VlQ29tbWVudDY1ODU0MDEyNQ==,1217238,2020-07-15T04:35:35Z,2020-07-15T04:35:35Z,MEMBER,"> That's because it falls back to the `'scipy'` engine. Would be nice to have a non-hacky way to write netcdf4 files to byte streams. 😃

I agree, this would be a welcome improvement!

Currently, `Dataset.to_netcdf()` without a `path` argument always uses the SciPy netCDF writer, which only supports netCDF3. This is mostly because support for bytestreams is a relatively new feature in netCDF4-Python and h5py.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,449706080
https://github.com/pydata/xarray/issues/2995#issuecomment-657798184,https://api.github.com/repos/pydata/xarray/issues/2995,657798184,MDEyOklzc3VlQ29tbWVudDY1Nzc5ODE4NA==,2067093,2020-07-13T21:17:06Z,2020-07-13T21:17:06Z,NONE,"I ran into this issue; here's a simple workaround that seems to work:

```python
import netCDF4
import xarray as xr
from xarray.backends import NetCDF4DataStore
from xarray.backends.api import dump_to_store


def dataset_to_bytes(ds: xr.Dataset, name: str = ""my-dataset"") -> bytes:
    """"""Convert a dataset to bytes via an in-memory (diskless) netCDF4 file.""""""
    nc4_ds = netCDF4.Dataset(name, mode=""w"", diskless=True, memory=ds.nbytes)
    nc4_store = NetCDF4DataStore(nc4_ds)
    dump_to_store(ds, nc4_store)
    # Closing a diskless dataset returns a memoryview of the file contents.
    res_mem = nc4_ds.close()
    res_bytes = res_mem.tobytes()
    return res_bytes
```

I tested this using the following:

```python
from io import BytesIO

fname = ""REDACTED.nc""
ds = xr.load_dataset(fname)
ds_bytes = dataset_to_bytes(ds)
ds2 = xr.load_dataset(BytesIO(ds_bytes))
assert ds2.equals(ds) and all(ds2.attrs[k] == ds.attrs[k] for k in set(ds2.attrs).union(ds.attrs))
```

The assertion holds true; however, the file size on disk is different. It's possible they were saved using different netCDF4 versions; I haven't had time to test that.

I tried using just `ds.to_netcdf()` but got the following error:

`ValueError: NetCDF 3 does not support type |S32`

That's because it falls back to the `'scipy'` engine. Would be nice to have a non-hacky way to write netcdf4 files to byte streams. :smiley:","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,449706080
https://github.com/pydata/xarray/issues/2995#issuecomment-518869785,https://api.github.com/repos/pydata/xarray/issues/2995,518869785,MDEyOklzc3VlQ29tbWVudDUxODg2OTc4NQ==,1117224,2019-08-06T22:39:07Z,2019-08-06T22:39:07Z,NONE,Is it possible to read multiple netcdf files on s3 using open_mfdataset?,"{""total_count"": 3, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 3}",,449706080
https://github.com/pydata/xarray/issues/2995#issuecomment-497066189,https://api.github.com/repos/pydata/xarray/issues/2995,497066189,MDEyOklzc3VlQ29tbWVudDQ5NzA2NjE4OQ==,868027,2019-05-29T18:56:17Z,2019-05-29T18:56:17Z,CONTRIBUTOR,"Thanks @rabernat, I had forgotten about the other netcdf storage engines... do you know if h5netcdf is stable enough that I should use it in ""production"" outside of xarray for my netcdf4 reading/writing needs?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,449706080
https://github.com/pydata/xarray/issues/2995#issuecomment-497063685,https://api.github.com/repos/pydata/xarray/issues/2995,497063685,MDEyOklzc3VlQ29tbWVudDQ5NzA2MzY4NQ==,10050469,2019-05-29T18:49:37Z,2019-05-29T18:49:37Z,MEMBER,"> This takes about a minute to open for me.

It took me much longer earlier this week when I tried :roll_eyes: Is the bottleneck in the parsing of the coordinates?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,449706080
https://github.com/pydata/xarray/issues/2995#issuecomment-497038453,https://api.github.com/repos/pydata/xarray/issues/2995,497038453,MDEyOklzc3VlQ29tbWVudDQ5NzAzODQ1Mw==,1197350,2019-05-29T17:42:45Z,2019-05-29T17:42:45Z,MEMBER,"Forget about zarr for a minute. Let's stick with the original goal of remote access to netcdf4 files in S3. You can use [s3fs](https://s3fs.readthedocs.io/en/latest/) (or [gcsfs](https://gcsfs.readthedocs.io/en/latest/)) for this.

```python
import xarray as xr
import s3fs

fs_s3 = s3fs.S3FileSystem(anon=True)
s3path = 'era5-pds/2008/01/data/air_temperature_at_2_metres.nc'
remote_file_obj = fs_s3.open(s3path, mode='rb')
ds = xr.open_dataset(remote_file_obj, engine='h5netcdf')
```

```
Dimensions:                      (lat: 640, lon: 1280, time0: 744)
Coordinates:
  * lon                          (lon) float32 0.0 0.2812494 ... 359.718
  * lat                          (lat) float32 89.784874 89.5062 ... -89.784874
  * time0                        (time0) datetime64[ns] 2008-01-01T07:00:00 ... 2008-02-01T06:00:00
Data variables:
    air_temperature_at_2_metres  (time0, lat, lon) float32 ...
Attributes:
    source:       Reanalysis
    institution:  ECMWF
    title:        ""ERA5 forecasts""
    history:      Wed Jul 4 22:08:50 2018: ncatted /data.e1/wrk/s3_out_in/20...
```

This takes about a minute to open for me. I have not tried writing, but this is perhaps a starting point.

If you are unsatisfied with the performance of netcdf4 on the cloud, I would indeed encourage you to investigate zarr.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,449706080 https://github.com/pydata/xarray/issues/2995#issuecomment-497026828,https://api.github.com/repos/pydata/xarray/issues/2995,497026828,MDEyOklzc3VlQ29tbWVudDQ5NzAyNjgyOA==,868027,2019-05-29T17:11:10Z,2019-05-29T17:12:51Z,CONTRIBUTOR,"Hi @Non-Descript-Individual I've found that the netcdf4-python library really wants to have direct access to a disk/filesystem to work, it also really wants to do its own file access management. I've always attributed this to the python library being a wrapper for the netcdf C library. My guess would be that the easiest way to do what you want is to separate the writing of the netcdf file step in xarray from the putting the file into S3. Something like this: ```python x.to_netcdf('temp_file.nc') s3.upload_file('temp_file.nc', 'bucketname', 'real_name_for_temp_file.nc') ``` The netcdf4-python library does seem to provide an interface for the ""diskless"" flags. In this case, from the examples it looks to give you a bunch of bytes in a `memoryview` object on calling `close()`. I'm not sure this is accessible from xarray though. Alternatively, @rabernat is an advocate of using zarr when putting netcdf compatible data into cloud storage, the zarr docs [provide an example using s3fs ](https://zarr.readthedocs.io/en/stable/tutorial.html#distributed-cloud-storage) Quick edit: [Here is the `to_zarr` docs in xarray](http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_zarr.html)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,449706080