issues: 336458472


  • id: 336458472
  • node_id: MDU6SXNzdWUzMzY0NTg0NzI=
  • number: 2256
  • title: xarray to zarr
  • user: 4338975
  • state: closed
  • locked: 0
  • comments: 16
  • created_at: 2018-06-28T03:17:51Z
  • updated_at: 2018-12-20T17:49:13Z
  • closed_at: 2018-12-20T17:49:13Z
  • author_association: NONE

@jhamman Hi, I've been experimenting with converting Argo float profile data (http://www.argo.ucsd.edu/About_Argo.html) to zarr as a cache for cloud processing of Argo data. One thing I've noticed is that each Argo float performs repeated cycles (trips up and down through the water column). The sample depths are not consistent across cycles, and there are a lot of single-value attributes in each cycle file, e.g. latitude. I loaded 250 cycle files from a single float and pushed them into a zarr store by calling .to_zarr on each file, putting each cycle into its own group:

cache/123456 (float id)/1 (cycle)

This resulted in over 70k small files being created. Small files are very inefficient in terms of disk utilisation: my data went from 10 MB to over 100 MB on disk.

With a straight pickle into a zarr array, the compression brought the whole data series down to <1 MB!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2256/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  • state_reason: completed
  • repo: 13221727
  • type: issue

Links from other tables

  • 0 rows from issues_id in issues_labels
  • 16 rows from issue in issue_comments