issue_comments


7 rows where user = 13684161 sorted by updated_at descending


issue (3)

  • expose zarr caching from xarray (5)
  • Threading Lock issue with to_netcdf and Dask arrays (1)
  • Flox based groupby operations don't support `dtype` in mean method (1)

user (1)

  • tasansal (7)

author_association (1)

  • NONE (7)
Rows, sorted by updated_at descending (columns: id, html_url, issue_url, node_id, user, created_at, updated_at, author_association, body, reactions, performed_via_github_app, issue):
id 1246009657 · tasansal (13684161) · NONE · created 2022-09-13T22:24:59Z · updated 2022-09-13T22:24:59Z
https://github.com/pydata/xarray/issues/2812#issuecomment-1246009657 · issue_url: https://api.github.com/repos/pydata/xarray/issues/2812 · node_id: IC_kwDOAMm_X85KRJk5

@dcherian, I will start a PR. Where do you think this belongs in the docs? Some places I can think of:

  • Examples section https://docs.xarray.dev/en/stable/generated/xarray.open_dataset.html
  • https://docs.xarray.dev/en/stable/user-guide/io.html
  • FAQ? https://docs.xarray.dev/en/stable/getting-started-guide/faq.html
reactions: none · issue: expose zarr caching from xarray (421029352)
id 1246007312 · tasansal (13684161) · NONE · created 2022-09-13T22:20:57Z · updated 2022-09-13T22:20:57Z
https://github.com/pydata/xarray/issues/2812#issuecomment-1246007312 · issue_url: https://api.github.com/repos/pydata/xarray/issues/2812 · node_id: IC_kwDOAMm_X85KRJAQ

I couldn't get `open_zarr` to open a dataset without Daskifying the arrays. `open_dataset(..., engine="zarr")`, on the other hand, does open without Daskifying as long as no `chunks` argument is passed.
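The difference comes down to the default `chunks` argument. A minimal sketch (the store path is a hypothetical placeholder; `open_zarr` defaults to `chunks="auto"`, while `open_dataset` defaults to `chunks=None`):

```python
import xarray as xr

store = "gs://prefix/object.zarr"  # hypothetical path

ds_dask = xr.open_zarr(store)                    # chunks="auto" -> Dask arrays
ds_lazy = xr.open_dataset(store, engine="zarr")  # chunks=None -> no Dask

var = next(iter(ds_dask.data_vars))
print(type(ds_dask[var].data))  # dask.array.core.Array
print(type(ds_lazy[var].data))  # numpy.ndarray, loaded lazily but not Daskified
```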

reactions: none · issue: expose zarr caching from xarray (421029352)
id 1245989599 · tasansal (13684161) · NONE · created 2022-09-13T21:52:45Z · updated 2022-09-13T21:52:45Z
https://github.com/pydata/xarray/issues/2812#issuecomment-1245989599 · issue_url: https://api.github.com/repos/pydata/xarray/issues/2812 · node_id: IC_kwDOAMm_X85KRErf

@rabernat

Following up on the previous, yes it does work with the Zarr backend! I agree with @dcherian, we should add this to the docs.

However, the behavior with Dask is strange. I think each worker ends up with its own cache, which blows up memory when I ask for a large cache.
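A back-of-the-envelope sketch of why that blows up (assuming, as it appears, that every Dask worker deserializes its own copy of the cache; the worker count is a hypothetical example):

```python
n_workers = 8      # hypothetical cluster size
max_size = 2**30   # 1 GiB LRU cache per copy, as in the snippet below
total = n_workers * max_size
print(f"aggregate cache memory: {total / 2**30:.0f} GiB")  # 8 GiB, not 1 GiB
```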

reactions: none · issue: expose zarr caching from xarray (421029352)
id 1245417352 · tasansal (13684161) · NONE · created 2022-09-13T13:30:08Z · updated 2022-09-13T13:58:55Z
https://github.com/pydata/xarray/issues/2812#issuecomment-1245417352 · issue_url: https://api.github.com/repos/pydata/xarray/issues/2812 · node_id: IC_kwDOAMm_X85KO4-I

@rabernat, yes, I have tried that like this:

```python
from zarr.storage import FSStore, LRUStoreCache
import xarray as xr

path = "gs://prefix/object.zarr"

store_nocache = FSStore(path)
store_cached = LRUStoreCache(store_nocache, max_size=2**30)

ds = xr.open_zarr(store_cached)
```

When I read the same data twice, it still downloads. Am I doing something wrong?

While I wait for a response, I will try it again and update if it works, but the last time I checked, it didn't.

Note to self: I also need to check it with Zarr backend and Dask backend.
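For that check, zarr's `LRUStoreCache` exposes `hits` and `misses` counters, so cache effectiveness can be verified directly (a sketch; the path is a hypothetical placeholder):

```python
import xarray as xr
from zarr.storage import FSStore, LRUStoreCache

store = LRUStoreCache(FSStore("gs://prefix/object.zarr"), max_size=2**30)

for attempt in range(2):
    ds = xr.open_dataset(store, engine="zarr")  # no chunks -> no Dask
    ds.load()
    # on the second pass, hits should rise while misses stay roughly flat
    print(f"attempt {attempt}: hits={store.hits}, misses={store.misses}")
```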

reactions: none · issue: expose zarr caching from xarray (421029352)
id 1243814673 · tasansal (13684161) · NONE · created 2022-09-12T14:20:01Z · updated 2022-09-12T14:20:01Z
https://github.com/pydata/xarray/issues/2812#issuecomment-1243814673 · issue_url: https://api.github.com/repos/pydata/xarray/issues/2812 · node_id: IC_kwDOAMm_X85KIxsR

Hi @rabernat, I looked at your PRs, and they don't seem to have gotten much attention.

I tried using a store wrapped with an LRU cache in open_zarr, but it appears to ignore the cache.

For our use cases in https://github.com/TGSAI/mdio-python, we usually want some form of LRU cache (it doesn't necessarily have to be Zarr's).

  • Do you know of a hack to make this work?
  • What can we do to help and start working on this?
reactions: none · issue: expose zarr caching from xarray (421029352)
id 1210890337 · tasansal (13684161) · NONE · created 2022-08-10T15:42:33Z · updated 2022-08-10T15:42:33Z
https://github.com/pydata/xarray/issues/6902#issuecomment-1210890337 · issue_url: https://api.github.com/repos/pydata/xarray/issues/6902 · node_id: IC_kwDOAMm_X85ILLhh

Added a synthetic test case for various configurations in xarray-contrib/flox#131

reactions: none · issue: Flox based groupby operations don't support `dtype` in mean method (1333514579)
id 988359778 · tasansal (13684161) · NONE · created 2021-12-08T00:05:24Z · updated 2021-12-08T00:06:22Z
https://github.com/pydata/xarray/issues/4406#issuecomment-988359778 · issue_url: https://api.github.com/repos/pydata/xarray/issues/4406 · node_id: IC_kwDOAMm_X8466Sxi

I am having a similar issue as well, using the latest versions of dask, xarray, distributed, fsspec, and gcsfs. I use the h5netcdf backend because it is the only one that works with fsspec's binary streams when reading from the cloud.

My workflow consists of the following (sketched in code after the lists below):

1. Start a Dask client with one process per CPU and two threads each, because reading from the cloud doesn't scale up with threads alone.
2. Open 12x monthly climate data (hourly sampled) using xarray.open_mfdataset.
3. Use reasonable Dask chunks in the open function.
4. Take the monthly average across the time axis and write to a local NetCDF file.
5. Repeat steps 2-4 for different years.

It is hit or miss: sometimes it hangs towards the middle or end of a year; the next time I run it, it doesn't.

Once it hangs and I interrupt it, the traceback shows it stuck on an await of a threading lock.

Any ideas how to avoid this?

Things I tried:

1. Processes only, with one thread per worker
2. lock=True and lock=False on open_mfdataset
3. Setting the Dask multiprocessing start method to spawn and to forkserver
4. Different (but recent) versions of all the libraries
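For reference, a minimal sketch of steps 1-5 above (bucket layout, chunk sizes, and worker counts are hypothetical placeholders):

```python
import fsspec
import xarray as xr
from dask.distributed import Client

# 1. one process per CPU, two threads each (8 CPUs assumed here)
client = Client(n_workers=8, threads_per_worker=2)

for year in (2019, 2020):  # 5. repeat steps 2-4 for different years
    # 2. open 12 monthly, hourly-sampled files via fsspec's binary streams
    files = [f.open() for f in fsspec.open_files(f"gs://bucket/hourly/{year}-*.nc")]
    # 3. reasonable dask chunks in the open function
    ds = xr.open_mfdataset(files, engine="h5netcdf", chunks={"time": 744})
    # 4. monthly average across the time axis, written to local NetCDF
    ds.resample(time="1MS").mean("time").to_netcdf(f"monthly_{year}.nc")
```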

reactions: none · issue: Threading Lock issue with to_netcdf and Dask arrays (694112301)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
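
For reference, the query behind this page can be reproduced against the schema above (a sketch assuming a local SQLite copy of the database; the filename is a hypothetical placeholder):

```python
import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical local copy
rows = conn.execute(
    """
    SELECT id, created_at, updated_at, body
    FROM issue_comments
    WHERE user = 13684161
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # 7, matching this page
```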