issue_comments


4 rows where user = 8241481 sorted by updated_at descending


Columns: id · html_url · issue_url · node_id · user · created_at · updated_at ▲ · author_association · body · reactions · performed_via_github_app · issue

id: 1483958731
html_url: https://github.com/pydata/xarray/issues/7549#issuecomment-1483958731
issue_url: https://api.github.com/repos/pydata/xarray/issues/7549
node_id: IC_kwDOAMm_X85Yc2nL
user: Mikejmnez (8241481)
created_at: 2023-03-26T00:41:10Z
updated_at: 2023-03-26T00:41:10Z
author_association: CONTRIBUTOR

Thanks everybody. Similar to @gewitterblitz and based on https://github.com/SciTools/iris/issues/5187 , pinning libnetcdf to v4.8.1 did the trick

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: HDF5-DIAG warnings calling `open_mfdataset` with more than `file_cache_maxsize` datasets (hdf5 1.12.2) (1596115847)
id: 651530759
html_url: https://github.com/pydata/xarray/pull/4003#issuecomment-651530759
issue_url: https://api.github.com/repos/pydata/xarray/issues/4003
node_id: MDEyOklzc3VlQ29tbWVudDY1MTUzMDc1OQ==
user: Mikejmnez (8241481)
created_at: 2020-06-30T04:45:42Z
updated_at: 2020-06-30T04:45:42Z
author_association: CONTRIBUTOR

@weiji14 @shoyer Thank you guys! Sorry it has taken me so long to come back to this PR - I really meant to return to it, but I got stuck with another, bigger PR that is actually part of my main research project. Anyway, the help is much appreciated, cheers!!

  • Since I am a novice at this, on my end, should I close this PR?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: xarray.open_mzar: open multiple zarr files (in parallel) (606683601)
id: 620943840
html_url: https://github.com/pydata/xarray/pull/4003#issuecomment-620943840
issue_url: https://api.github.com/repos/pydata/xarray/issues/4003
node_id: MDEyOklzc3VlQ29tbWVudDYyMDk0Mzg0MA==
user: Mikejmnez (8241481)
created_at: 2020-04-29T01:43:43Z
updated_at: 2020-04-29T01:44:46Z
author_association: CONTRIBUTOR

Following your advice, open_dataset can now open zarr files. This is done with:

```python
ds = xarray.open_dataset(store, engine="zarr", chunks="auto")
```

NOTE: xr.open_dataset has chunks=None by default, whereas it used to be chunks="auto" on xarray.open_zarr.

Additional feature: as a result of these changes, open_mfdataset can now (automatically) open multiple zarr files (e.g. in parallel) when given a glob. That is,

```python
paths = 'directory_name/*/subdirectory_name/*'
ds = xarray.open_mfdataset(paths, engine="zarr", chunks="auto",
                           concat_dim="time", combine="nested")
```

yields the desired behavior.

This is different from fsspec.open_local vs fsspec.mapper on intake-xarray when opening files with a glob. But agreed, that can be addressed in a different PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: xarray.open_mzar: open multiple zarr files (in parallel) (606683601)
id: 620133764
html_url: https://github.com/pydata/xarray/pull/4003#issuecomment-620133764
issue_url: https://api.github.com/repos/pydata/xarray/issues/4003
node_id: MDEyOklzc3VlQ29tbWVudDYyMDEzMzc2NA==
user: Mikejmnez (8241481)
created_at: 2020-04-27T17:45:25Z
updated_at: 2020-04-27T17:45:25Z
author_association: CONTRIBUTOR

I like this approach (adding the capability to open_mfdataset to open multiple zarr files), as it is the easiest and cleanest. I considered it, and I am glad this is coming up because I wanted to hear different opinions. Two things influenced my decision to keep open_mzarr separate from open_mfdataset:

  1. Zarr stores are inherently different from netcdf-files, which becomes more evident when opening multiple files given a glob-path (paths='directory*/subdirectory*/*'). Zarr stores can potentially be recognized as directories rather than files (as opposed to, e.g., paths='directory*/subdirectory*/*.nc'). This distinction comes into play when, for example, trying to open files (zarr vs netcdf) through intake-xarray. I know this is an upstream behavior, but I think it needs to be considered, and it is my end goal in allowing xarray to read multiple zarr files (in parallel) - to use intake-xarray to read them. The way files are opened in intake-xarray (zarr vs others) is again kept separate, using different functions. That is:

For netcdf-files (intake-xarray/netcdf.py):

```python
url_path = fsspec.open_local(paths, *kwargs)
```

which can interpret a glob path. The url is then passed to xarray.open_mfdataset.

For zarr files (intake-xarray/xzarr):

```python
url_path = fsspec.mapper(paths, *kwargs)
```

fsspec.mapper does not recognize glob-paths, and fsspec.open_local, which does recognize globs, cannot detect zarr stores (as these are recognized as directories rather than files with a known extension). See the issue I created about this behavior: https://github.com/intake/filesystem_spec/issues/286#issue-606019293 (apologies, I am new to GitHub and don't know if this is the correct way to link issues across repositories).

  2. Zarr continues to be under development, and it appears its behavior will rely more heavily on fsspec in the future. I wonder if such future development is the reason why, even in xarray, open_zarr is kept in a different file from open_mfdataset - a similar separation also exists in intake-xarray.

I am extremely interested in what people think about xarray and intake-xarray compatibility/development when it comes to zarr files being read in parallel...
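The directory-vs-file distinction raised in point 1 can be demonstrated with the standard library alone. The sketch below uses a hypothetical layout (the `directory1/subdirectory1` names and the empty `.zgroup` marker are illustrative, not taken from intake-xarray) to show why a bare glob picks up a zarr store while a `*.nc` glob misses it:

```python
# Stdlib-only sketch: a zarr store is a *directory*, so a bare glob
# ('directory*/subdirectory*/*') picks it up, while a '*.nc' pattern
# only matches netCDF *files*. The layout here is hypothetical.
import glob
import os
import tempfile

root = tempfile.mkdtemp()
sub = os.path.join(root, "directory1", "subdirectory1")

# A zarr store is a directory (marked here by an empty .zgroup file).
os.makedirs(os.path.join(sub, "store.zarr"))
open(os.path.join(sub, "store.zarr", ".zgroup"), "w").close()

# A netCDF dataset is a single file with a known extension.
open(os.path.join(sub, "data.nc"), "w").close()

# The bare glob matches both the file and the zarr directory...
matches = glob.glob(os.path.join(root, "directory*", "subdirectory*", "*"))
print(sorted(os.path.basename(m) for m in matches))   # ['data.nc', 'store.zarr']

# ...while the '*.nc' glob sees only the file, missing the zarr store entirely.
nc_only = glob.glob(os.path.join(root, "directory*", "subdirectory*", "*.nc"))
print([os.path.basename(m) for m in nc_only])         # ['data.nc']
```

This is the behavior that makes extension-based glob handling (as in fsspec.open_local) unable to detect zarr stores without extra directory-aware logic.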

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: xarray.open_mzar: open multiple zarr files (in parallel) (606683601)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
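As a minimal sketch of how this page's query runs against the schema above, the following rebuilds issue_comments in an in-memory SQLite database and selects rows where user = 8241481 ordered by updated_at descending. The REFERENCES clauses are omitted since the users and issues tables are not shown here, and only two columns per sample row are populated (with real id/updated_at values from the table above):

```python
# Recreate a trimmed issue_comments table and run the page's query:
# rows where user = 8241481, sorted by updated_at descending.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
""")

# Two id/updated_at pairs taken from the rows above.
conn.executemany(
    "INSERT INTO issue_comments (id, [user], updated_at) VALUES (?, ?, ?)",
    [(1483958731, 8241481, "2023-03-26T00:41:10Z"),
     (651530759, 8241481, "2020-06-30T04:45:42Z")],
)

# ISO-8601 timestamps sort correctly as text, so ORDER BY works as expected.
result = conn.execute(
    "SELECT id FROM issue_comments WHERE [user] = ? ORDER BY updated_at DESC",
    (8241481,),
).fetchall()
print(result)  # [(1483958731,), (651530759,)]
```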
Powered by Datasette · Queries took 16.93ms · About: xarray-datasette