issue_comments

6 rows where author_association = "CONTRIBUTOR" and issue = 606683601 sorted by updated_at descending

user 3

  • Mikejmnez 3
  • weiji14 2
  • martindurant 1

issue 1

  • xarray.open_mzar: open multiple zarr files (in parallel) · 6

author_association 1

  • CONTRIBUTOR · 6
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
651530759 https://github.com/pydata/xarray/pull/4003#issuecomment-651530759 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDY1MTUzMDc1OQ== Mikejmnez 8241481 2020-06-30T04:45:42Z 2020-06-30T04:45:42Z CONTRIBUTOR

@weiji14 @shoyer Thank you guys! Sorry it has taken me so long to come back to this PR; I really meant to, but I got stuck on another, bigger PR that is actually part of my main research project. Anyways, much appreciated for the help, cheers!!

  • Since I am a novice at this, should I close this PR on my end?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
651481343 https://github.com/pydata/xarray/pull/4003#issuecomment-651481343 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDY1MTQ4MTM0Mw== weiji14 23487320 2020-06-30T02:24:23Z 2020-06-30T02:33:37Z CONTRIBUTOR

Sure, I can move it, but I just wanted to make sure @Mikejmnez gets the credit for this PR. Edit: moved to https://github.com/pydata/xarray/pull/4187.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
651397892 https://github.com/pydata/xarray/pull/4003#issuecomment-651397892 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDY1MTM5Nzg5Mg== weiji14 23487320 2020-06-29T22:15:08Z 2020-06-29T23:06:10Z CONTRIBUTOR

@Mikejmnez, do you mind if I pick up working on this branch? I'd be really keen to see it get into xarray 0.16, and then it will be possible to resolve the intake-xarray issue at https://github.com/intake/intake-xarray/issues/70. ~~Not sure if it's possible to get commit access here, or if I should just submit a PR to your fork, or maybe there's a better way?~~ Edit: I've opened up a pull request to the fork.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
620943840 https://github.com/pydata/xarray/pull/4003#issuecomment-620943840 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDYyMDk0Mzg0MA== Mikejmnez 8241481 2020-04-29T01:43:43Z 2020-04-29T01:44:46Z CONTRIBUTOR

Following your advice, open_dataset can now open zarr files. This is done with:

```python
ds = xarray.open_dataset(store, engine="zarr", chunks="auto")
```

NOTE: xr.open_dataset has chunks=None by default, whereas xarray.open_zarr defaults to chunks="auto".

Additional feature: as a result of these changes, open_mfdataset can now (automatically) open multiple zarr files (e.g. in parallel) when given a glob. That is,

```python
paths = 'directory_name/*/subdirectory_name/*'
ds = xarray.open_mfdataset(paths, engine="zarr", chunks="auto", concat_dim="time", combine="nested")
```

does yield the desired behavior.

Opening zarr stores from a glob this way differs from intake-xarray, which uses fsspec.open_local for netCDF files and fsspec.get_mapper for zarr stores. But agreed, that can be addressed in a different PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
620151178 https://github.com/pydata/xarray/pull/4003#issuecomment-620151178 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDYyMDE1MTE3OA== martindurant 6042212 2020-04-27T18:19:54Z 2020-04-27T18:19:54Z CONTRIBUTOR

> the behavior of zarr it appears will rely heavily on fsspec more in the future.

Only if we can push on https://github.com/zarr-developers/zarr-python/pull/546; but this is also an opportunity to shape the zarr/fsspec interaction into the behaviour most convenient for this work.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
620133764 https://github.com/pydata/xarray/pull/4003#issuecomment-620133764 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDYyMDEzMzc2NA== Mikejmnez 8241481 2020-04-27T17:45:25Z 2020-04-27T17:45:25Z CONTRIBUTOR

I like this approach (adding the capability to open_mfdataset to open multiple zarr files), as it is the easiest and cleanest. I considered it, and I am glad this is coming up because I wanted to hear different opinions. Two things influenced my decision to keep open_mzarr separate from open_mfdataset:

  1. Zarr stores are inherently different from netCDF files, which becomes more evident when opening multiple files given a glob path (paths='directory*/subdirectory*/*'). Zarr stores can potentially be recognized as directories rather than files (as opposed to paths='directory*/subdirectory*/*.nc'). This distinction comes into play when, for example, trying to open files (zarr vs netCDF) through intake-xarray. I know this is upstream behavior, but I think it needs to be considered, and my end goal in allowing xarray to read multiple zarr files (in parallel) is to use intake-xarray to read them. In intake-xarray, zarr and other formats are again opened separately, using different functions. That is,

For netCDF files (intake-xarray/netcdf.py):

```python
url_path = fsspec.open_local(paths, **kwargs)
```

which can interpret a glob path. The resulting url_path is then passed to xarray.open_mfdataset.

For zarr files (intake-xarray/xzarr):

```python
url_path = fsspec.get_mapper(paths, **kwargs)
```

fsspec.get_mapper does not recognize glob paths, and fsspec.open_local, which does recognize globs, cannot detect zarr stores (these are seen as directories rather than files with a known extension). See an issue I created about this behavior: https://github.com/intake/filesystem_spec/issues/286#issue-606019293 (apologies, I am new to GitHub and don't know if this is the correct way to link issues across repositories).

  2. Zarr continues to be under development, and the behavior of zarr it appears will rely heavily on fsspec more in the future. I wonder whether this anticipated development is the reason why, even in xarray, open_zarr lives in a different file from open_mfdataset; intake-xarray does the same.
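To make the directory-vs-file distinction concrete, here is a stdlib-only sketch of a hypothetical `find_zarr_stores` helper (not part of xarray, fsspec, or intake-xarray). It expands a glob and keeps only matches that are directories containing zarr metadata: `.zgroup`/`.zarray` for the zarr v2 format and `zarr.json` for v3. Treating those marker files as the test for "is a store" is an assumption of this sketch, not how any of those libraries decides.

```python
import glob
import os

# Metadata files assumed to mark the root of a local zarr store:
# ".zgroup"/".zarray" (zarr v2 format) or "zarr.json" (zarr v3 format).
ZARR_MARKERS = (".zgroup", ".zarray", "zarr.json")


def find_zarr_stores(pattern):
    """Expand a glob and return only the matches that look like zarr stores.

    Hypothetical helper for illustration only.
    """
    stores = []
    for path in sorted(glob.glob(pattern)):
        # A zarr store is a directory, so file-only glob handling skips it;
        # here directories are kept if they contain a zarr metadata file.
        if os.path.isdir(path) and any(
            os.path.exists(os.path.join(path, marker)) for marker in ZARR_MARKERS
        ):
            stores.append(path)
    return stores
```

A pattern written for files with a known extension, such as `*.nc`, never matches these store directories, which is the mismatch between the netCDF and zarr code paths described above.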

I am extremely interested in what people think about xarray and intake-xarray compatibility/development when it comes to reading zarr files in parallel.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette