issue_comments: 620133764

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/4003#issuecomment-620133764 https://api.github.com/repos/pydata/xarray/issues/4003 620133764 MDEyOklzc3VlQ29tbWVudDYyMDEzMzc2NA== 8241481 2020-04-27T17:45:25Z 2020-04-27T17:45:25Z CONTRIBUTOR

I like this approach (adding the capability to open multiple zarr files to open_mfdataset), as it is the easiest and cleanest. I considered it, and I am glad this is coming up because I wanted to hear different opinions. Two things influenced my decision to keep open_mzarr separate from open_mfdataset:

  1. Zarr stores are inherently different from netcdf files, which becomes more evident when opening multiple files from a glob path (paths='directory*/subdirectory*/*'). Zarr stores may be recognized as directories rather than files (as opposed to, e.g., paths='directory*/subdirectory*/*.nc'). This distinction comes into play when, for example, trying to open files (zarr vs netcdf) through intake-xarray. I know this is upstream behavior, but I think it needs to be considered, since my end goal in allowing xarray to read multiple zarr files (in parallel) is to use intake-xarray to read them. intake-xarray also keeps the way it opens files (zarr vs others) separate and uses different functions. That is,

For netcdf files (intake-xarray/netcdf.py): url_path = fsspec.open_local(paths, **kwargs), which can interpret a glob path. url_path is then passed to xarray.open_mfdataset.

For zarr files (intake-xarray/xzarr): url_path = fsspec.get_mapper(paths, **kwargs). fsspec.get_mapper does not recognize glob paths, and fsspec.open_local, which does recognize globs, cannot detect zarr stores (as these are recognized as directories rather than files with a known extension). See an issue I created about this behavior: https://github.com/intake/filesystem_spec/issues/286#issue-606019293 (apologies, I am new to GitHub and don't know if this is the correct way to link issues across repositories).

  2. Zarr continues to be under development, and it appears that its behavior will rely more heavily on fsspec in the future. I wonder if such future development is the reason why, even in xarray, open_zarr lives in a different file from open_mfdataset, with a similar separation in intake-xarray.
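
To make the directory-versus-file distinction from point 1 concrete, here is a minimal sketch of how a glob path could be expanded into zarr stores. It is not the code in this PR or in intake-xarray. The helper names find_zarr_stores and open_zarr_glob_sketch are hypothetical, and the sketch assumes local zarr v2 directory stores, which carry a .zgroup or .zarray metadata file at their root.

```python
import glob
import os

# Assumption: zarr v2 directory stores hold one of these metadata files at their root.
ZARR_V2_MARKERS = (".zgroup", ".zarray")


def find_zarr_stores(pattern):
    """Expand a glob pattern and keep only directories that look like zarr v2 stores.

    Hypothetical helper: plain files (e.g. *.nc) and ordinary directories are skipped.
    """
    stores = []
    for path in sorted(glob.glob(pattern)):
        is_store = os.path.isdir(path) and any(
            os.path.exists(os.path.join(path, marker)) for marker in ZARR_V2_MARKERS
        )
        if is_store:
            stores.append(path)
    return stores


def open_zarr_glob_sketch(pattern):
    """Open every matching store lazily and combine the results by coordinates.

    Hypothetical helper; xarray is imported here so the glob logic above
    can be used without it.
    """
    import xarray as xr

    stores = find_zarr_stores(pattern)
    if not stores:
        raise FileNotFoundError(f"no zarr stores match {pattern!r}")
    datasets = [xr.open_zarr(store) for store in stores]
    return xr.combine_by_coords(datasets)
```

With a pattern such as 'directory*/subdirectory*/*', find_zarr_stores returns only the matching directories that contain zarr metadata, so a netcdf file sitting next to a store is ignored. Remote stores would need the equivalent glob and existence checks from an fsspec filesystem instead of the local glob and os modules.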

I am extremely interested in what people think about xarray and intake-xarray compatibility/development when it comes to zarr files being read in parallel...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  606683601