issue_comments


6 rows where issue = 336458472 and user = 1197350 sorted by updated_at descending

Comment 449081085 · rabernat (MEMBER) · 2018-12-20T17:49:13Z
https://github.com/pydata/xarray/issues/2256#issuecomment-449081085

I'm going to close this. Please feel free to reopen if more discussion is needed.

Comment 401087741 · rabernat (MEMBER) · 2018-06-28T16:07:02Z
https://github.com/pydata/xarray/issues/2256#issuecomment-401087741

Zarr is most useful for very large, homogeneous arrays. The Argo data are not that large, and they are inhomogeneous, so I'm not sure zarr will really help you out that much here.

In your original post, you said you were doing "cloud processing", but later you referred to a cluster filesystem. Do you plan to put this data in object storage?
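For context, writing a zarr store to object storage usually goes through an fsspec-style mapper. Here is a minimal sketch assuming Google Cloud Storage via the gcsfs package; the project name, bucket path, and file name are placeholders, not details from this thread:

```python
import gcsfs
import xarray as xr

# Hypothetical example: write one dataset to a zarr store in a GCS bucket.
# "my-project", "my-bucket/argo/file.zarr", and "file.nc" are placeholders.
ds = xr.open_dataset("file.nc")

fs = gcsfs.GCSFileSystem(project="my-project")
store = fs.get_mapper("my-bucket/argo/file.zarr")

# to_zarr accepts any mutable mapping, so the same call works for a local
# directory path or an object-storage mapper.
ds.to_zarr(store)
```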

Comment 400906996 · rabernat (MEMBER) · 2018-06-28T04:27:38Z
https://github.com/pydata/xarray/issues/2256#issuecomment-400906996

Thanks for the extra info!

I am still confused about what you are trying to achieve. What do you mean by "cache"? Is your goal to compress the data so that it uses less space on disk? Or is it to provide a more "analysis ready" format?

In other words, why do you feel you need to transform this data to zarr? Why not just work directly with the netcdf files?

Sorry to keep asking questions rather than providing any answers! Just trying to understand your goals...
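For reference, "working directly with the netcdf files" could look like the sketch below. The glob pattern and the variable/dimension names ("TEMP", "N_PROF") are placeholders for the actual Argo files, not names from this thread:

```python
import xarray as xr

# Hypothetical sketch: analyze the netcdf files in place instead of
# converting them to zarr first.
ds = xr.open_mfdataset("argo_profiles/*.nc",
                       combine="nested", concat_dim="N_PROF")

# open_mfdataset loads lazily via dask, so nothing is rewritten on disk;
# computations stream through the original files.
mean_temp = ds["TEMP"].mean().compute()
print(mean_temp)
```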

Comment 400905950 · rabernat (MEMBER) · 2018-06-28T04:18:56Z
https://github.com/pydata/xarray/issues/2256#issuecomment-400905950

FYI, I edited your comment to place the output in a code block (triple ``` before and after) so it is more readable.

Comment 400902555 · rabernat (MEMBER) · 2018-06-28T03:51:34Z
https://github.com/pydata/xarray/issues/2256#issuecomment-400902555

Can you clarify what you are trying to achieve with the transformations?

Why not do something like this?

```python
for file in filenames:
    ds = xr.open_dataset(file)
    ds.to_zarr(file + '.zarr')
```

I'm particularly confused by this line: `cycles[int(ds.CYCLE_NUMBER.values[0])-1] = ds`. Could it be that you are describing the "straight pickle to zarr array" workflow you referred to in your earlier post? This is definitely an unconventional and not recommended way to interface xarray with zarr. It would be better to use the built-in `.to_zarr()` method. We can help you debug why that isn't working well, but we need more information.

Specifically:

Could you please post the repr of a single netcdf dataset from this collection, i.e.

```python
ds = xr.open_dataset('file.nc')
print(ds)
```

Then could you call `ds.to_zarr()` and describe the contents of the resulting zarr store in more detail? (For example, could you list the directories within the store?)
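For illustration, one way to list what is inside a directory-backed zarr store is just to walk it, since it is plain nested directories. This is a sketch with placeholder file names, not code from the thread:

```python
import os

import xarray as xr

# Hypothetical sketch: convert one file, then walk the resulting store.
# "file.nc" and "file.zarr" are placeholders.
ds = xr.open_dataset("file.nc")
ds.to_zarr("file.zarr")

# In the zarr v2 format, a directory store contains one subdirectory per
# variable, with .zarray/.zattrs metadata files plus the chunk files.
for root, dirs, files in os.walk("file.zarr"):
    print(root, sorted(files))
```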

Comment 400899756 · rabernat (MEMBER) · 2018-06-28T03:31:34Z
https://github.com/pydata/xarray/issues/2256#issuecomment-400899756

I think this effort should be of great interest to a lot of computational oceanographers. I have worked a lot with both Argo data and zarr, but have not yet tried to combine them.

I would recommend reading this guide if you have not done so already: http://pangeo-data.org/data.html#guide-to-preparing-cloud-optimized-data

Then could you post the xarray repr of one of the netcdf files you are working with here? i.e.

```python
ds = xr.open_dataset('file.nc')
print(ds)
```

And then finally post the full code you are using to read, transform, and output the zarr data.



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
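The schema and indexes above support the filtered view at the top of this page. As a rough sketch, the equivalent query could be run directly with Python's built-in sqlite3 module; the database filename is a placeholder:

```python
import sqlite3

# "github.db" is a hypothetical filename for the underlying SQLite database.
conn = sqlite3.connect("github.db")

# The same filter shown at the top of the page: comments on issue 336458472
# by user 1197350, newest first. idx_issue_comments_issue and
# idx_issue_comments_user make the WHERE clause cheap.
rows = conn.execute(
    """
    SELECT id, created_at, body
    FROM issue_comments
    WHERE issue = 336458472 AND user = 1197350
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, created_at, body in rows:
    print(comment_id, created_at, body[:60])
```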