issues: 134376872


id: 134376872
node_id: MDU6SXNzdWUxMzQzNzY4NzI=
number: 768
title: save/load DataArray to numpy npz functions
user: 5497186
state: closed
locked: 0
comments: 11
created_at: 2016-02-17T19:29:31Z
updated_at: 2016-12-24T11:55:40Z
closed_at: 2016-12-24T11:55:40Z
author_association: NONE
body:

hey -

Apologies if this is bad form: I wanted to pass this along but don't have time to do a proper pull request.

I have found pickle to be really problematic for serializing data, so I wrote these two functions to save a DataArray to numpy's binary npz format and load it back. Generally, the numpy format is much less likely to bomb when loading on another computer because of some unseen dependency. If there's interest, I could probably add this as a serialization method on DataArray in the next month or so.

``` python
import numpy as np
import xray  # the package has since been renamed xarray


def to_npz(da, file_or_buffer):
    # 'dims' and 'values' are reserved keys in the archive.
    if 'dims' in da.dims:
        raise ValueError('Can\'t use "dims" as a dim name.')
    if 'values' in da.dims:
        raise ValueError('Can\'t use "values" as a dim name.')
    arrays = {}
    arrays['dims'] = da.dims
    # One archive entry per dimension, holding that dimension's index.
    for dim in da.dims:
        arrays[dim] = da.indexes[dim]
    arrays['values'] = da.values
    np.savez(file_or_buffer, **arrays)


def from_npz(file_or_buffer):
    data = np.load(file_or_buffer)
    assert hasattr(data, 'keys'), \
        "np.load returned a {}, not a dict-like object".format(type(data))
    assert 'dims' in data, 'Can\'t locate "dims" key in file'
    assert 'values' in data, 'Can\'t locate "values" key in file'
    dims = data['dims']
    for dimname in dims:
        assert dimname in data, \
            'Can\'t locate "{}" key in file'.format(dimname)
    # Rebuild the coordinates from the per-dimension entries.
    coords = dict(zip(dims, [data[dimname] for dimname in dims]))
    return xray.DataArray(data['values'], dims=dims, coords=coords)
```
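
To make the archive layout concrete, here is a minimal sketch that uses only NumPy and an in-memory buffer. The arrays `dims`, `coords` and `values` are hypothetical stand-ins for the pieces of a DataArray, not data from the functions above. Loading with `allow_pickle=False` is one way to check that nothing in the file depends on pickle:

```python
import io

import numpy as np

# Hypothetical stand-ins for a DataArray's pieces (assumed for illustration).
dims = ("x", "y")
coords = {"x": np.array([10, 20, 30]), "y": np.array(["a", "b"])}
values = np.arange(6.0).reshape(3, 2)

# Same layout as to_npz: a "dims" entry, a "values" entry, one entry per dim.
buffer = io.BytesIO()
np.savez(buffer, dims=np.array(dims), values=values, **coords)
buffer.seek(0)

# allow_pickle=False makes np.load refuse pickled object arrays, so a
# successful load shows the archive holds plain arrays only.
with np.load(buffer, allow_pickle=False) as data:
    loaded_dims = [str(name) for name in data["dims"]]
    loaded_coords = {name: data[name] for name in loaded_dims}
    loaded_values = data["values"]

print(loaded_dims)  # ['x', 'y']
```

Note that this only holds when every index is a plain numeric or string array; an object-dtype index would need pickle and would fail to load with `allow_pickle=False`.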

It's pretty speedy. Here is an example for a DataArray of shape (3, 4, 5):

    In [42]: def save_and_load_again(da):
        ...:     # npz is a binary format, so open the file in binary mode
        ...:     with open('/path/to/datarray.npz', 'wb') as f:
        ...:         to_npz(da, f)
        ...:     with open('/path/to/datarray.npz', 'rb') as f:
        ...:         a = from_npz(f)
        ...:     return a
        ...:
        ...: %time (save_and_load_again(da) == da).all()
    CPU times: user 12.6 ms, sys: 0 ns, total: 12.6 ms
    Wall time: 26.2 ms
    Out[42]:
    <xray.DataArray ()>
    array(True, dtype=bool)
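
For anyone who wants to repeat the timing outside IPython, a rough sketch with the standard `timeit` module could look like the following. It assumes only NumPy is installed, so it times the npz write/read for a (3, 4, 5) array rather than the full DataArray round trip, and `roundtrip` is a hypothetical helper name:

```python
import io
import timeit

import numpy as np


def roundtrip(values):
    """Write `values` to an in-memory npz archive and read it back."""
    buffer = io.BytesIO()
    np.savez(buffer, values=values)
    buffer.seek(0)
    with np.load(buffer, allow_pickle=False) as data:
        return data["values"]


values = np.random.rand(3, 4, 5)
assert np.array_equal(roundtrip(values), values)

# Best of 5 runs of 100 round trips each, reported per round trip.
best = min(timeit.repeat(lambda: roundtrip(values), number=100, repeat=5)) / 100
print("best round trip: {:.3f} ms".format(best * 1e3))
```

Numbers will vary by machine, and an in-memory buffer leaves out disk I/O, so this is not directly comparable to the wall time above.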

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/768/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
