issue_comments

16 rows where issue = 606683601 sorted by updated_at descending

user 10

  • Mikejmnez 3
  • shoyer 2
  • dcherian 2
  • weiji14 2
  • TomNicholas 2
  • rabernat 1
  • jhamman 1
  • martindurant 1
  • keewis 1
  • pep8speaks 1

author_association 3

  • MEMBER 9
  • CONTRIBUTOR 6
  • NONE 1

issue 1

  • xarray.open_mzar: open multiple zarr files (in parallel) · 16
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
651734845 https://github.com/pydata/xarray/pull/4003#issuecomment-651734845 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDY1MTczNDg0NQ== keewis 14808389 2020-06-30T11:30:10Z 2020-06-30T11:30:10Z MEMBER

  • Since I am a novice at this, on my end, should I close this PR?

don't worry about that: we can close this PR when we merge #4187

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
651530759 https://github.com/pydata/xarray/pull/4003#issuecomment-651530759 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDY1MTUzMDc1OQ== Mikejmnez 8241481 2020-06-30T04:45:42Z 2020-06-30T04:45:42Z CONTRIBUTOR

@weiji14 @shoyer Thank you, guys! Sorry it has taken me so long to come back to this PR - I really meant to come back to it, but I got stuck with another, bigger PR that is actually part of my main research project. Anyway, the help is much appreciated, cheers!!

  • Since I am a novice at this, on my end, should I close this PR?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
651486921 https://github.com/pydata/xarray/pull/4003#issuecomment-651486921 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDY1MTQ4NjkyMQ== shoyer 1217238 2020-06-30T02:40:28Z 2020-06-30T02:40:28Z MEMBER

  • Sure, I can move it, but I just wanted to make sure @Mikejmnez gets the credit for this PR

Yes, absolutely! As long as you preserve his original commits and add yours on top of them, both of you will be credited in the Git history. If you're writing a release note in whats-new.rst about the feature, please include both of your names in the credits.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
651481343 https://github.com/pydata/xarray/pull/4003#issuecomment-651481343 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDY1MTQ4MTM0Mw== weiji14 23487320 2020-06-30T02:24:23Z 2020-06-30T02:33:37Z CONTRIBUTOR

Sure, I can move it, but I just wanted to make sure @Mikejmnez gets the credit for this PR. Edit: moved to https://github.com/pydata/xarray/pull/4187.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
651476633 https://github.com/pydata/xarray/pull/4003#issuecomment-651476633 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDY1MTQ3NjYzMw== shoyer 1217238 2020-06-30T02:11:18Z 2020-06-30T02:11:18Z MEMBER

@weiji14 could you kindly reopen your new pull request against the main xarray repository? Your pull request is currently in Mikejmnez/xarray

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
651397892 https://github.com/pydata/xarray/pull/4003#issuecomment-651397892 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDY1MTM5Nzg5Mg== weiji14 23487320 2020-06-29T22:15:08Z 2020-06-29T23:06:10Z CONTRIBUTOR

@Mikejmnez, do you mind if I pick up working on this branch? I'd be really keen to see it get into xarray 0.16, and then it will be possible to resolve the intake-xarray issue at https://github.com/intake/intake-xarray/issues/70. ~~Not sure if it's possible to get commit access here, or if I should just submit a PR to your fork, or maybe there's a better way?~~ Edit: I've opened up a pull request to the fork.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
619316555 https://github.com/pydata/xarray/pull/4003#issuecomment-619316555 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDYxOTMxNjU1NQ== pep8speaks 24736507 2020-04-25T04:08:54Z 2020-05-22T16:45:34Z NONE

Hello @Mikejmnez! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

  • In the file xarray/backends/api.py:

Line 514:13: F841 local variable 'overwrite_encoded_chunks' is assigned to but never used

  • In the file xarray/backends/zarr.py:

Line 362:5: E303 too many blank lines (2)
Line 392:22: F821 undefined name 'get_chunk'
Line 396:22: F821 undefined name 'tokenize'
Line 399:16: F821 undefined name 'overwrite_encoded_chunks'

  • In the file xarray/tests/test_backends.py:

Line 1513:75: E225 missing whitespace around operator

Comment last updated at 2020-05-22 16:45:34 UTC
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
620943840 https://github.com/pydata/xarray/pull/4003#issuecomment-620943840 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDYyMDk0Mzg0MA== Mikejmnez 8241481 2020-04-29T01:43:43Z 2020-04-29T01:44:46Z CONTRIBUTOR

Following your advice, open_dataset can now open zarr files. This is done with:

    ds = xarray.open_dataset(store, engine="zarr", chunks="auto")

NOTE: xr.open_dataset has chunks=None by default, whereas it used to be chunks="auto" on xarray.open_zarr.

Additional feature: as a result of these changes, open_mfdataset can now (automatically) open multiple zarr files (e.g. in parallel) when given a glob. That is,

    paths = 'directory_name/*/subdirectory_name/*'
    ds = xarray.open_mfdataset(paths, engine="zarr", chunks="auto", concat_dim="time", combine="nested")

yields the desired behavior.

This is different from fsspec.open_local vs fsspec.mapper on intake-xarray when opening files with a glob. But agreed, that can be addressed in a different PR.
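A short illustration of the default-chunking difference noted above (a sketch only; the store path is a placeholder, and the behaviour shown reflects the defaults described in this comment):

    import xarray as xr

    store = "path/to/store.zarr"  # placeholder path, not from this PR

    # old entry point: open_zarr chunks the data with dask by default ("auto")
    ds_old = xr.open_zarr(store)

    # new path: open_dataset(engine="zarr") leaves data un-chunked unless asked
    ds_plain = xr.open_dataset(store, engine="zarr")                 # chunks=None by default
    ds_dask = xr.open_dataset(store, engine="zarr", chunks="auto")   # opt back in to dask chunking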

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
620196044 https://github.com/pydata/xarray/pull/4003#issuecomment-620196044 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDYyMDE5NjA0NA== dcherian 2448579 2020-04-27T19:47:38Z 2020-04-27T19:47:38Z MEMBER

IMO we should support

    zarr-store-1/
    zarr-store-2/
    zarr-file-store-3

but raise NotImplementedError for

    zarr-store-1/
    subdir/zarr-store-2

I don't know whether it would be easy to detect the glob pattern for the second example in all cases, though.
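A rough sketch of the kind of check being suggested (the helper name expand_zarr_glob and the .zgroup/.zarray marker-file heuristic are illustrative assumptions, not part of this PR):

    import glob
    import os

    def expand_zarr_glob(pattern):
        """Expand a glob and keep only paths that look like zarr stores."""
        # a directory "looks like" a zarr store if it holds zarr metadata files
        matches = [
            p for p in glob.glob(pattern)
            if os.path.exists(os.path.join(p, ".zgroup"))
            or os.path.exists(os.path.join(p, ".zarray"))
        ]
        # refuse the second layout above: one matched store nested inside another
        for outer in matches:
            for inner in matches:
                if inner != outer and inner.startswith(outer.rstrip("/") + os.sep):
                    raise NotImplementedError(
                        f"nested zarr stores are not supported: {inner!r} is inside {outer!r}"
                    )
        return sorted(matches)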

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
620176655 https://github.com/pydata/xarray/pull/4003#issuecomment-620176655 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDYyMDE3NjY1NQ== rabernat 1197350 2020-04-27T19:09:33Z 2020-04-27T19:09:33Z MEMBER

I agree with everything Joe said. I'm fine with getting a NotImplementedError if I try to glob with zarr and open_mfdataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
620169860 https://github.com/pydata/xarray/pull/4003#issuecomment-620169860 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDYyMDE2OTg2MA== jhamman 2443309 2020-04-27T18:56:10Z 2020-04-27T18:56:10Z MEMBER

+1 on deprecating open_zarr and moving to `open_dataset(..., engine='zarr')`

I also agree that globbing zarr stores is a tricky nut to crack. For the sake of simplicity, I'd like to suggest handling this functionality in separate PRs. Given the heterogeneity in zarr storage options, I'm not sure it's practical to support this behavior within Xarray, but I'd be happy to discuss that in a separate issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
620151178 https://github.com/pydata/xarray/pull/4003#issuecomment-620151178 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDYyMDE1MTE3OA== martindurant 6042212 2020-04-27T18:19:54Z 2020-04-27T18:19:54Z CONTRIBUTOR

  • it appears that the behavior of zarr will rely more heavily on fsspec in the future.

If we can push on https://github.com/zarr-developers/zarr-python/pull/546; but this is also an opportunity to get, out of the zarr/fsspec interaction, whatever behaviour is most convenient for this work.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
620151075 https://github.com/pydata/xarray/pull/4003#issuecomment-620151075 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDYyMDE1MTA3NQ== TomNicholas 35968931 2020-04-27T18:19:41Z 2020-04-27T18:19:41Z MEMBER

@rabernat and @jhamman I expect you will want to weigh in on how best to handle this for zarr

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
620133764 https://github.com/pydata/xarray/pull/4003#issuecomment-620133764 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDYyMDEzMzc2NA== Mikejmnez 8241481 2020-04-27T17:45:25Z 2020-04-27T17:45:25Z CONTRIBUTOR

I like this approach (adding the capability to open_mfdataset to open multiple zarr files), as it is the easiest and cleanest. I considered it, and I am glad this is coming up because I wanted to hear different opinions. Two things influenced my decision to keep open_mzarr separate from open_mfdataset:

  1. Zarr stores are inherently different from netcdf files, which becomes more evident when opening multiple files given a glob path (paths='directory*/subdirectory*/*'). Zarr stores can be recognized as directories rather than files (as opposed to, e.g., paths='directory*/subdirectory*/*.nc'). This distinction comes into play when, for example, trying to open files (zarr vs netcdf) through intake-xarray. I know this is an upstream behavior, but I think it needs to be considered, and it is my end goal in allowing xarray to read multiple zarr files (in parallel): to use intake-xarray to read them. The way files are opened in intake-xarray (zarr vs others) is again kept separate and uses different functions (see the sketch after this comment). That is:

For netcdf files (intake-xarray/netcdf.py): url_path = fsspec.open_local(paths, **kwargs), which can interpret a glob path. The url is then passed to xarray.open_mfdataset.

For zarr files (intake-xarray/xzarr): url_path = fsspec.mapper(paths, **kwargs). fsspec.mapper does not recognize glob paths, and fsspec.open_local, which does recognize globs, cannot detect zarr stores (as these are recognized as directories rather than files with a known extension). See an issue I created about this behavior: https://github.com/intake/filesystem_spec/issues/286#issue-606019293 (apologies, I am new at GitHub and don't know if this is the correct way to link issues across repositories).

  2. Zarr continues to be under development, and it appears that the behavior of zarr will rely more heavily on fsspec in the future. I wonder if such future development is the reason why, even in xarray, open_zarr is contained in a different file from open_mfdataset; a similar separation also exists in intake-xarray.

I am extremely interested in what people think about xarray and intake-xarray compatibility/development when it comes to zarr files being read in parallel...
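A minimal sketch of the contrast described in point 1 (placeholder local paths; fsspec.open_local and fsspec.get_mapper are the public fsspec helpers, and the exact calls inside intake-xarray may differ):

    import fsspec

    # netcdf-style sources: a glob expands to a list of local file paths,
    # which can then be handed to xarray.open_mfdataset
    nc_paths = fsspec.open_local("directory*/subdirectory*/*.nc")

    # zarr-style sources: a single store URL is wrapped in a key/value mapper;
    # get_mapper does not expand globs, so each store must be named explicitly
    store = fsspec.get_mapper("directory/subdirectory/store.zarr")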

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
619644606 https://github.com/pydata/xarray/pull/4003#issuecomment-619644606 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDYxOTY0NDYwNg== TomNicholas 35968931 2020-04-26T23:53:31Z 2020-04-26T23:53:31Z MEMBER

+1 for having open_dataset and open_mfdataset as the main (ideally only) points of entry for users, which then delegate to different backend openers. That will keep the API neater, avoid duplicate code, and be easier to make into a completely general and extensible solution eventually.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601
619641890 https://github.com/pydata/xarray/pull/4003#issuecomment-619641890 https://api.github.com/repos/pydata/xarray/issues/4003 MDEyOklzc3VlQ29tbWVudDYxOTY0MTg5MA== dcherian 2448579 2020-04-26T23:30:46Z 2020-04-26T23:30:46Z MEMBER

I think the better way to do this would be to add a kwarg to open_dataset that specifies the backend to use, e.g. xr.open_dataset(..., format="zarr").

This would then delegate to open_zarr or a new open_netcdf or open_rasterio as appropriate. Then open_mfdataset would just work for all these formats without requiring duplicate code.

cc @pydata/xarray for thoughts.
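A rough sketch of the delegation pattern proposed here, written with the engine keyword that the thread converges on (the wrapper name open_any and the dispatch table are illustrative assumptions, not the final xarray API):

    import xarray as xr

    def open_any(path_or_store, engine="netcdf4", **kwargs):
        """Dispatch to a format-specific opener based on a backend keyword."""
        if engine == "zarr":
            return xr.open_zarr(path_or_store, **kwargs)
        if engine == "rasterio":
            return xr.open_rasterio(path_or_store, **kwargs)
        # everything else falls through to the regular open_dataset machinery
        return xr.open_dataset(path_or_store, engine=engine, **kwargs)

    # open_mfdataset could then reuse the same dispatch for every format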

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.open_mzar: open multiple zarr files (in parallel) 606683601

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
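As a sketch, the view above can be reproduced against a local copy of the database with Python's built-in sqlite3 module (the filename github.db is an assumption, not taken from this page):

    import sqlite3

    conn = sqlite3.connect("github.db")
    rows = conn.execute(
        """
        SELECT id, [user], created_at, updated_at, author_association, body
        FROM issue_comments
        WHERE issue = ?
        ORDER BY updated_at DESC
        """,
        (606683601,),
    ).fetchall()
    print(len(rows))  # expected: 16, per the row count shown at the top of this page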