issues

2 rows where repo = 13221727 (xarray), state = "open" and user = 57705593 (tovogt), sorted by updated_at descending

Issue 6174: [FEATURE]: Read from/write to several NetCDF4 groups with a single file open/close operation

  • id: 1108138101 · node_id: I_kwDOAMm_X85CDNh1 · type: issue · repo: xarray (13221727)
  • user: tovogt (57705593) · author_association: CONTRIBUTOR
  • state: open · locked: no · comments: 10
  • created_at: 2022-01-19T14:02:36Z · updated_at: 2022-02-03T08:41:16Z

Is your feature request related to a problem?

I know that there is a big discussion going on in #4118 about organizing hierarchies of datasets within xarray's data structures. But this issue addresses only a comparatively simple aspect of that.

Suppose that you have a list ds_list of xarray.Dataset objects with different dimensions etc. and you want to store them all in one NetCDF4 file by using the group feature introduced in NetCDF4. The group name of each dataset is stored in ds_names. Obviously, you can do something like this:

```python
for name, ds in zip(ds_names, ds_list):
    ds.to_netcdf(path, group=name)
```

However, this is really slow when you have many (hundreds or thousands of) small datasets because the file is opened and closed in every iteration.

Describe the solution you'd like

I would like to have a function xr.to_netcdf that writes a list (or a dictionary) of datasets to a single NetCDF4 file with a single open/close operation. Ideally there should also be a way to read many datasets at once from a single NetCDF4 file using xr.open_dataset.
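
For comparison, netCDF4-python itself already allows a single open/close operation for many groups; here is a minimal sketch of the behaviour this request asks xarray to expose (the file, group, and variable names are made up):

```python
import netCDF4
import numpy as np

# One open/close operation for the whole file; each dataset gets its own group.
with netCDF4.Dataset("many_groups.nc", "w", format="NETCDF4") as nc:
    for name in ("group_a", "group_b", "group_c"):
        grp = nc.createGroup(name)
        grp.createDimension("x", 3)
        var = grp.createVariable("data", "f8", ("x",))
        var[:] = np.arange(3.0)
```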

Describe alternatives you've considered

Currently, I'm using the following read/write functions to achieve the same:

```python
import pathlib

from xarray.backends import NetCDF4DataStore
from xarray.backends.api import dump_to_store
from xarray.backends.common import ArrayWriter
from xarray.backends.store import StoreBackendEntrypoint


def _xr_to_netcdf_multi(path, ds_dict, encoding=None):
    """Write multiple xarray Datasets to separate groups in a single NetCDF4 file

    Parameters
    ----------
    path : str or Path
        Path of the target NetCDF file.
    ds_dict : dict whose keys are group names and values are xr.Dataset
        Each xr.Dataset in the dict is stored in the group identified by its key in the dict.
        Note that an empty string ("") is a valid group name and refers to the root group.
    encoding : dict whose keys are group names and values are dict, optional
        For each dataset/group, one dict that is compliant with the format of the `encoding`
        keyword parameter in `xr.Dataset.to_netcdf`. Default: None
    """
    path = str(pathlib.Path(path).expanduser().absolute())
    # Open the file once and reuse the same store for every group.
    store = NetCDF4DataStore.open(path, "w", "NETCDF4", None)
    try:
        writer = ArrayWriter()
        for group, dataset in ds_dict.items():
            # Point the store at the target group before dumping the dataset.
            store._group = group
            unlimited_dims = dataset.encoding.get("unlimited_dims", None)
            # Look up this group's encoding without overwriting the full `encoding` dict.
            group_encoding = None if encoding is None else encoding.get(group)
            dump_to_store(dataset, store, writer,
                          encoding=group_encoding, unlimited_dims=unlimited_dims)
    finally:
        store.close()


def _xr_open_dataset_multi(path, prefix=""):
    """Read multiple xarray Datasets from groups contained in a single NetCDF4 file

    Warning: The data is loaded into memory!

    Parameters
    ----------
    path : str or Path
        Path of the NetCDF file to read.
    prefix : str, optional
        If given, only read groups whose name starts with this prefix. Default: ""

    Returns
    -------
    ds_dict : dict whose keys are group names and values are xr.Dataset
        Each xr.Dataset in the dict is taken from the group identified by its key in the dict.
        Note that an empty string ("") is a valid group name and refers to the root group.
    """
    path = str(pathlib.Path(path).expanduser().absolute())
    store = NetCDF4DataStore.open(path, "r", "NETCDF4", None)
    ds_dict = {}
    try:
        groups = [g for g in _xr_nc4_groups_from_store(store) if g.startswith(prefix)]
        store_entrypoint = StoreBackendEntrypoint()
        for group in groups:
            store._group = group
            ds = store_entrypoint.open_dataset(store)
            ds.load()
            ds_dict[group] = ds
    finally:
        store.close()
    return ds_dict


def _xr_nc4_groups_from_store(store):
    """List all groups contained in the given NetCDF4 data store

    Parameters
    ----------
    store : xarray.backends.NetCDF4DataStore

    Returns
    -------
    list of str
    """
    def iter_groups(ds, prefix=""):
        # The root group is always included (as the empty string).
        groups = [""]
        for group_name, group_ds in ds.groups.items():
            groups.extend([f"{prefix}{group_name}{subgroup}"
                           for subgroup in iter_groups(group_ds, prefix="/")])
        return groups

    with store._manager.acquire_context(False) as root:
        return iter_groups(root)
```
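
A hypothetical round trip with these helpers (the file name and dataset contents are illustrative):

```python
import numpy as np
import xarray as xr

# A few hundred small datasets, one group each, written with one open/close.
ds_dict = {
    f"group_{i}": xr.Dataset({"data": ("x", np.arange(3.0) + i)})
    for i in range(300)
}
_xr_to_netcdf_multi("many_groups.nc", ds_dict)

loaded = _xr_open_dataset_multi("many_groups.nc", prefix="group_")
assert float(loaded["group_7"]["data"][0]) == 7.0
```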

Additional context

No response

  • reactions: none (total_count 0; https://api.github.com/repos/pydata/xarray/issues/6174/reactions)

Issue 6181: [FEATURE]: Support reading/writing NetCDF4 from/to buffer (and .nc.gz)

  • id: 1110166098 · node_id: I_kwDOAMm_X85CK8pS · type: issue · repo: xarray (13221727)
  • user: tovogt (57705593) · author_association: CONTRIBUTOR
  • state: open · locked: no · comments: 0
  • created_at: 2022-01-21T07:59:53Z · updated_at: 2022-01-21T16:07:46Z

Is your feature request related to a problem?

There is a very old PR from 2016 about adding gzip support to NetCDF4 when used with OpenDAP: https://github.com/pydata/xarray/pull/817. The last comment, from 2018, claimed that "netCDF4-Python does not support Python file objects". However, that limitation was already removed in 2017 in netCDF4-python (https://github.com/Unidata/netcdf4-python/pull/652).

Describe the solution you'd like

It would be great if xr.Dataset.to_netcdf as well as xr.open_dataset supported file-like objects when used with engine="netcdf4". Once this is done, it should be easy to also support .nc.gz files for the NETCDF4 format (currently this only works for NETCDF3).
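
A rough sketch of what such support could build on: the `memory` keyword from Unidata/netcdf4-python#652 already reads a NETCDF4 file from bytes, and xarray can wrap the resulting handle (the file names below are made up):

```python
import gzip

import netCDF4
import xarray as xr
from xarray.backends import NetCDF4DataStore

# Decompress the .nc.gz archive into an in-memory byte string ...
with gzip.open("example.nc.gz", "rb") as f:
    nc_bytes = f.read()

# ... and let netCDF4-python parse the bytes directly; the "file name"
# passed here is only a label, nothing is read from disk.
nc = netCDF4.Dataset("in-memory.nc", mode="r", memory=nc_bytes)
ds = xr.open_dataset(NetCDF4DataStore(nc))
```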

Describe alternatives you've considered

No response

Additional context

No response

  • reactions: none (total_count 0; https://api.github.com/repos/pydata/xarray/issues/6181/reactions)

Table schema:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);