issue_comments

15 rows where issue = 712782711 sorted by updated_at descending

user 4

  • martindurant 7
  • jhnnsrs 4
  • rabernat 3
  • forman 1

author_association 3

  • CONTRIBUTOR 7
  • NONE 5
  • MEMBER 3

issue 1

  • Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) · 15
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
739959248 https://github.com/pydata/xarray/issues/4478#issuecomment-739959248 https://api.github.com/repos/pydata/xarray/issues/4478 MDEyOklzc3VlQ29tbWVudDczOTk1OTI0OA== martindurant 6042212 2020-12-07T14:39:57Z 2020-12-07T14:39:57Z CONTRIBUTOR

Please try with fsspec master.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) 712782711
738728863 https://github.com/pydata/xarray/issues/4478#issuecomment-738728863 https://api.github.com/repos/pydata/xarray/issues/4478 MDEyOklzc3VlQ29tbWVudDczODcyODg2Mw== forman 206773 2020-12-04T11:18:33Z 2020-12-04T11:18:33Z NONE

I'm still getting IndexError: pop from an empty deque. Can somebody tell me which s3fs version to use after the fix by @martindurant?

Here are my relevant packages:

# Name                    Version                   Build  Channel
aiobotocore               0.10.3                     py_0    conda-forge
aiohttp                   3.7.3            py39hb82d6ee_0    conda-forge
botocore                  1.12.91                    py_0    conda-forge
dask                      2.30.0                     py_0    conda-forge
dask-core                 2.30.0                     py_0    conda-forge
distributed               2.30.1           py39hcbf5309_0    conda-forge
fsspec                    0.8.4                      py_0    conda-forge
python                    3.9.0           h7840368_5_cpython    conda-forge
s3fs                      0.5.1                      py_0    conda-forge
xarray                    0.16.2             pyhd8ed1ab_0    conda-forge
zarr                      2.5.0                      py_0    conda-forge

Thanks in advance!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) 712782711
704365715 https://github.com/pydata/xarray/issues/4478#issuecomment-704365715 https://api.github.com/repos/pydata/xarray/issues/4478 MDEyOklzc3VlQ29tbWVudDcwNDM2NTcxNQ== jhnnsrs 3322897 2020-10-06T15:46:21Z 2020-10-06T15:46:21Z NONE

Welcome to the world of light-sheet microscopy. And this would be considered a tiny dataset.. 😄

@rabernat thanks for the tip. I was wondering what the best chunk size for S3 bucket storage would be. I will aim for that size when I get to performance tuning.

Thanks a lot!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) 712782711
704359408 https://github.com/pydata/xarray/issues/4478#issuecomment-704359408 https://api.github.com/repos/pydata/xarray/issues/4478 MDEyOklzc3VlQ29tbWVudDcwNDM1OTQwOA== rabernat 1197350 2020-10-06T15:40:03Z 2020-10-06T15:40:03Z MEMBER

@jhnnsrs: just a tip based on my own experience. In the code above, you are not specifying any chunks explicitly. The default is then to let zarr choose the chunk sizes. Zarr's default chunks are usually much smaller than what is optimal with cloud storage. I would recommend you explicitly chunk your array and aim for chunks around 100 MB.
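To make the 100 MB guideline concrete, here is a rough sketch of how a chunk shape could be picked by hand. The helper names `chunks_for_target` and `chunk_nbytes` are made up for illustration, and the 8-byte item size (float64) is an assumption, since the thread does not state the dtype. The shape is the (1024,1024,100,3,1) array mentioned elsewhere in this thread.

```python
from math import prod


def chunks_for_target(shape, itemsize, axis, target_bytes=100_000_000):
    """Hypothetical helper: keep every axis at full extent except `axis`,
    which is cut so that one chunk stays at or below `target_bytes`."""
    # Bytes covered by a single step along `axis` (all other axes in full).
    bytes_per_step = prod(shape) // shape[axis] * itemsize
    # At least 1, at most the full axis length.
    length = max(1, min(shape[axis], target_bytes // bytes_per_step))
    return tuple(length if i == axis else n for i, n in enumerate(shape))


def chunk_nbytes(chunks, itemsize):
    """Size in bytes of one chunk with the given shape."""
    return prod(chunks) * itemsize


shape = (1024, 1024, 100, 3, 1)  # shape taken from the thread
itemsize = 8                     # assumption: float64 data

chunks = chunks_for_target(shape, itemsize, axis=2)
print(chunks, chunk_nbytes(chunks, itemsize) / 1e6, "MB")
```

A tuple like this could then be handed to `DataArray.chunk(...)` before calling `to_zarr`, so that the dask chunks (and with them the zarr chunks) are set explicitly. Which axis to split is a judgment call that depends on how the data will be read back.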

Closing this as it appears to be fixed by @martindurant's quick response in fsspec / s3fs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) 712782711
704353239 https://github.com/pydata/xarray/issues/4478#issuecomment-704353239 https://api.github.com/repos/pydata/xarray/issues/4478 MDEyOklzc3VlQ29tbWVudDcwNDM1MzIzOQ== martindurant 6042212 2020-10-06T15:30:50Z 2020-10-06T15:30:50Z CONTRIBUTOR

That's a lot of data!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) 712782711
704351253 https://github.com/pydata/xarray/issues/4478#issuecomment-704351253 https://api.github.com/repos/pydata/xarray/issues/4478 MDEyOklzc3VlQ29tbWVudDcwNDM1MTI1Mw== jhnnsrs 3322897 2020-10-06T15:27:56Z 2020-10-06T15:27:56Z NONE

Confirmed! Works like a charm. Went up all the way to (1024,1024,100,3,1) without any issues. Thanks for the fast fix! 👍

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) 712782711
704285976 https://github.com/pydata/xarray/issues/4478#issuecomment-704285976 https://api.github.com/repos/pydata/xarray/issues/4478 MDEyOklzc3VlQ29tbWVudDcwNDI4NTk3Ng== martindurant 6042212 2020-10-06T13:55:34Z 2020-10-06T13:55:34Z CONTRIBUTOR

Can you confirm that this works ok with fsspec and s3fs master?

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
  Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) 712782711
702846089 https://github.com/pydata/xarray/issues/4478#issuecomment-702846089 https://api.github.com/repos/pydata/xarray/issues/4478 MDEyOklzc3VlQ29tbWVudDcwMjg0NjA4OQ== martindurant 6042212 2020-10-02T16:59:45Z 2020-10-02T16:59:45Z CONTRIBUTOR

I have reproduced it locally (also with moto). Indeed, many threads are trying to stall the event loop at once. This will take a little finesse.
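For readers unfamiliar with this failure mode, the sketch below shows one commonly used way to let many worker threads hand work to a single event loop without touching the loop's internals directly: the loop runs in its own thread and callers go through `asyncio.run_coroutine_threadsafe`. This is only an illustration of the general pattern, not the change that went into fsspec / s3fs; `fake_put` and `sync_put` are placeholder names standing in for real S3 calls.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def start_loop_in_thread():
    """Run a dedicated event loop in a background thread."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


async def fake_put(key, delay=0.01):
    """Placeholder for an async upload; just sleeps and echoes the key."""
    await asyncio.sleep(delay)
    return key


def sync_put(loop, key):
    """Blocking wrapper that is safe to call from any worker thread."""
    future = asyncio.run_coroutine_threadsafe(fake_put(key), loop)
    return future.result(timeout=5)


loop, loop_thread = start_loop_in_thread()

# Many threads submit work at once; only the loop thread runs the loop.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = sorted(pool.map(lambda k: sync_put(loop, k), range(32)))

# Stop the loop from outside its thread, then clean up.
loop.call_soon_threadsafe(loop.stop)
loop_thread.join(timeout=5)
loop.close()

print(len(results), "uploads completed")
```

The key point is that worker threads never call `run_forever`, `run_until_complete` or `call_soon` on the shared loop themselves; they only use the thread-safe entry points.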

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) 712782711
702816676 https://github.com/pydata/xarray/issues/4478#issuecomment-702816676 https://api.github.com/repos/pydata/xarray/issues/4478 MDEyOklzc3VlQ29tbWVudDcwMjgxNjY3Ng== martindurant 6042212 2020-10-02T15:59:59Z 2020-10-02T15:59:59Z CONTRIBUTOR

Thanks for the digging, I'll look into it

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) 712782711
702815971 https://github.com/pydata/xarray/issues/4478#issuecomment-702815971 https://api.github.com/repos/pydata/xarray/issues/4478 MDEyOklzc3VlQ29tbWVudDcwMjgxNTk3MQ== jhnnsrs 3322897 2020-10-02T15:58:44Z 2020-10-02T15:58:44Z NONE

Okay. That was too fast. I wasn't able to get the Django implementation running, so I set up a test environment with a Docker Compose setup running MinIO and a Python 3.8 based container with the libraries (repo attached). The setup runs fine for dask arrays of size (1024,1024,8,2,1) but causes the same asyncio problem at (1024,1024,10,2,1) (tested with random arrays).

https://github.com/jhnnsrs/s3fs_bugreport

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) 712782711
702635437 https://github.com/pydata/xarray/issues/4478#issuecomment-702635437 https://api.github.com/repos/pydata/xarray/issues/4478 MDEyOklzc3VlQ29tbWVudDcwMjYzNTQzNw== jhnnsrs 3322897 2020-10-02T09:51:32Z 2020-10-02T09:51:32Z NONE

Thanks for the troubleshooting. I first hit the problem in a sync worker in django channels (which worked, and still works, with s3fs=0.4.1) (same error). So I tried running the refactored relevant code in a vanilla Python script in a Linux Docker container with the WSL2 backend, with no event loops and no threads running. I got the same results, which is why I posted it in the xarray issues.

Now, coming back to this problem a day later, the vanilla script works perfectly and gets the job done! The Django sync worker example still runs into problems:

[ERROR] asyncio 2020-10-02 09:40:10,950 :: Exception in callback BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))
handle: <Handle BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 510, in _sock_connect_cb
    fut.set_result(None)
asyncio.base_futures.InvalidStateError: invalid state
[ERROR] asyncio 2020-10-02 09:40:11,609 :: Exception in callback BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))
handle: <Handle BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 510, in _sock_connect_cb
    fut.set_result(None)
asyncio.base_futures.InvalidStateError: invalid state
[ERROR] asyncio 2020-10-02 09:40:12,478 :: Exception in callback BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))
handle: <Handle BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 510, in _sock_connect_cb
    fut.set_result(None)
asyncio.base_futures.InvalidStateError: invalid state
[ERROR] asyncio 2020-10-02 09:40:12,743 :: Exception in callback BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))
handle: <Handle BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 510, in _sock_connect_cb
    fut.set_result(None)
asyncio.base_futures.InvalidStateError: invalid state
[ERROR] asyncio 2020-10-02 09:40:12,745 :: Exception in callback BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))
handle: <Handle BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 510, in _sock_connect_cb
    fut.set_result(None)
asyncio.base_futures.InvalidStateError: invalid state
[ERROR] asyncio 2020-10-02 09:40:12,757 :: Exception in callback BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))
handle: <Handle BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 510, in _sock_connect_cb
    fut.set_result(None)
asyncio.base_futures.InvalidStateError: invalid state
[ERROR] asyncio 2020-10-02 09:40:13,631 :: Exception in callback BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))
handle: <Handle BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 510, in _sock_connect_cb
    fut.set_result(None)
asyncio.base_futures.InvalidStateError: invalid state
[ERROR] asyncio 2020-10-02 09:40:13,633 :: Exception in callback BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))
handle: <Handle BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 510, in _sock_connect_cb
    fut.set_result(None)
asyncio.base_futures.InvalidStateError: invalid state
[ERROR] asyncio 2020-10-02 09:40:15,025 :: Exception in callback BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))
handle: <Handle BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 510, in _sock_connect_cb
    fut.set_result(None)
asyncio.base_futures.InvalidStateError: invalid state
[ERROR] asyncio 2020-10-02 09:40:16,317 :: Exception in callback BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))
handle: <Handle BaseSelectorEventLoop._sock_connect_cb(<Future finished result=None>, <socket.socke...8.0.4', 9000)>, ('172.18.0.4', 9000))>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 510, in _sock_connect_cb
    fut.set_result(None)
asyncio.base_futures.InvalidStateError: invalid state
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 1771, in _run_once
    handle = self._ready.popleft()
IndexError: pop from an empty deque

I guess this is now more of a problem with the way django >=3.1 and particularly django-channels is dealing with the event loop. Do you have by any chance quick thoughts on this? Is there a way to get the django "superpowered" event loop instead of the asyncio one?

Anyway, I will mark this bug report as resolved, as it is clearly not xarray's issue. Thanks a lot!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) 712782711
702257613 https://github.com/pydata/xarray/issues/4478#issuecomment-702257613 https://api.github.com/repos/pydata/xarray/issues/4478 MDEyOklzc3VlQ29tbWVudDcwMjI1NzYxMw== rabernat 1197350 2020-10-01T16:38:00Z 2020-10-01T16:38:00Z MEMBER

Martin, thanks for your comments. I have read them several times but am left scratching my head as to their meaning. It would be useful if you could help identify a specific development direction to address @jhnnsrs's bug. In your best judgement, is this

- a bug in xarray
- a bug in dask
- a bug in zarr
- a bug in s3fs
- some problem relating to @jhnnsrs's environment / dependencies
- something else?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) 712782711
702213899 https://github.com/pydata/xarray/issues/4478#issuecomment-702213899 https://api.github.com/repos/pydata/xarray/issues/4478 MDEyOklzc3VlQ29tbWVudDcwMjIxMzg5OQ== martindurant 6042212 2020-10-01T15:27:00Z 2020-10-01T15:27:00Z CONTRIBUTOR

File "/usr/local/lib/python3.7/asyncio/base_events.py", line 1771, in _run_once
    handle = self._ready.popleft()

This looks like it may be a race condition where multiple threads are calling the event loop at once. I wonder if you could list the event loops in use and the threads (perhaps best run with plain Python rather than IPython/Jupyter).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) 712782711
702124268 https://github.com/pydata/xarray/issues/4478#issuecomment-702124268 https://api.github.com/repos/pydata/xarray/issues/4478 MDEyOklzc3VlQ29tbWVudDcwMjEyNDI2OA== martindurant 6042212 2020-10-01T13:11:32Z 2020-10-01T13:11:32Z CONTRIBUTOR

The following code, modified to the style of the s3fs test suite, works OK:

```python
import pytest

# `s3` (a pytest fixture) and `test_bucket_name` are assumed to be provided
# by the surrounding s3fs test module.


def test_with_xzarr(s3):
    # Skip the test when the optional dependencies are not installed.
    da = pytest.importorskip("dask.array")
    xr = pytest.importorskip("xarray")
    name = "sample"

    nana = xr.DataArray(da.zeros((1023, 1023, 3)))

    s3_path = f"{test_bucket_name}/{name}"
    s3store = s3.get_mapper(s3_path)

    print("Storing")
    nana.to_dataset().to_zarr(store=s3store, mode="w", consolidated=True, compute=True)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) 712782711
702118079 https://github.com/pydata/xarray/issues/4478#issuecomment-702118079 https://api.github.com/repos/pydata/xarray/issues/4478 MDEyOklzc3VlQ29tbWVudDcwMjExODA3OQ== rabernat 1197350 2020-10-01T13:00:53Z 2020-10-01T13:00:53Z MEMBER

Thanks for the bug report! There were recently some big changes to s3fs. We will look into it.

cc @martindurant

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset to zarr not working with newest s3fs Storage (s3fs > 0.5.0) 712782711

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
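
The listing on this page ("rows where issue = 712782711 sorted by updated_at descending") can be reproduced against this schema with a plain SQL query. The sketch below is only an illustration: it loads the schema into an in-memory SQLite database, inserts three of the rows shown above (ids, user ids and timestamps copied from this page, all other columns left empty), and runs a query of that shape. The exact SQL that Datasette generates may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# (id, user, created_at, updated_at, issue), inserted oldest first on purpose.
rows = [
    (702118079, 1197350, "2020-10-01T13:00:53Z", "2020-10-01T13:00:53Z", 712782711),
    (738728863, 206773, "2020-12-04T11:18:33Z", "2020-12-04T11:18:33Z", 712782711),
    (739959248, 6042212, "2020-12-07T14:39:57Z", "2020-12-07T14:39:57Z", 712782711),
]
conn.executemany(
    "INSERT INTO issue_comments (id, [user], created_at, updated_at, issue) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# ISO 8601 timestamps in a uniform format sort correctly as plain text.
result = conn.execute(
    "SELECT id, updated_at FROM issue_comments "
    "WHERE issue = ? ORDER BY updated_at DESC",
    (712782711,),
).fetchall()

for comment_id, updated_at in result:
    print(comment_id, updated_at)
```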
Powered by Datasette · Queries took 16.323ms · About: xarray-datasette