
issue_comments


15 rows where issue = 837243943 and user = 1197350 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
826913149 https://github.com/pydata/xarray/pull/5065#issuecomment-826913149 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgyNjkxMzE0OQ== rabernat 1197350 2021-04-26T15:08:43Z 2021-04-26T15:08:43Z MEMBER

I think this PR has received a very thorough review. I would be pleased if someone from @pydata/xarray would merge it soon.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
826888674 https://github.com/pydata/xarray/pull/5065#issuecomment-826888674 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgyNjg4ODY3NA== rabernat 1197350 2021-04-26T14:38:49Z 2021-04-26T14:38:49Z MEMBER

The pre-commit workflow is raising a blackdoc error that I am not seeing in my local env:

```diff
diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst
index eb5c4d8..2bc3c1f 100644
--- a/doc/internals/duck-arrays-integration.rst
+++ b/doc/internals/duck-arrays-integration.rst
@@ -25,7 +25,7 @@ argument:
 ...
     def _repr_inline_(self, max_width):
-        """ format to a single line with at most max_width characters """
+        """format to a single line with at most max_width characters"""
 ...
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
817990859 https://github.com/pydata/xarray/pull/5065#issuecomment-817990859 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxNzk5MDg1OQ== rabernat 1197350 2021-04-12T17:27:28Z 2021-04-12T17:27:28Z MEMBER

Any further feedback on this now reduced-scope PR? Merging it would help move Pangeo Forge forward.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
815019613 https://github.com/pydata/xarray/pull/5065#issuecomment-815019613 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxNTAxOTYxMw== rabernat 1197350 2021-04-07T15:44:25Z 2021-04-07T15:44:25Z MEMBER

I have removed the controversial `encoding['chunks']` stuff from the PR. Now it only contains the `safe_chunks` option in `to_zarr`.

If there are no further comments on this, I think this is good to go.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
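The `safe_chunks` guard the comments above refer to protects against Dask chunks that straddle Zarr chunk boundaries, which can corrupt data under parallel writes. A minimal sketch of the alignment rule it roughly enforces (a hypothetical helper, not xarray's actual implementation):

```python
# Hypothetical sketch of the alignment check behind safe_chunks: every
# dask chunk along a dimension, except possibly the last (partial) one,
# should be an integer multiple of the on-disk Zarr chunk size, so that
# no two writers ever touch the same Zarr chunk.
def chunks_are_safe(dask_chunks, zarr_chunk):
    *interior, last = dask_chunks
    return all(c % zarr_chunk == 0 for c in interior)
```

As the PR describes it, passing `safe_chunks=False` to `to_zarr` opts out of this kind of check when the caller knows the write pattern is safe.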
814102743 https://github.com/pydata/xarray/pull/5065#issuecomment-814102743 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxNDEwMjc0Mw== rabernat 1197350 2021-04-06T13:03:53Z 2021-04-06T13:03:53Z MEMBER

We seem to be unable to resolve the complexities around chunk encoding. I propose to remove this from the PR and reduce the scope to just the safe_chunks features. @aurghs should probably be the one to tackle the chunk encoding problem; unfortunately it exceeds my understanding, and I don't have time to dig deeper at the moment. In the meantime safe_chunks is important for pangeo-forge forward progress.

Please give a 👍 or 👎 to this idea if you have an opinion.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
811975731 https://github.com/pydata/xarray/pull/5065#issuecomment-811975731 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTk3NTczMQ== rabernat 1197350 2021-04-01T15:12:15Z 2021-04-01T15:12:15Z MEMBER

> But it seems to me that having two different definitions of chunks (dask one and encoded one), is not very intuitive and it's not easy to define a clear default in writing.

My use for `encoding.chunks` is to tell Zarr what chunks to use on disk.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
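The use case above — telling Zarr what chunks to use on disk — can be expressed either on the variable's encoding or via the `encoding` argument to `to_zarr`. A short sketch (the store path and variable name are illustrative):

```python
# Two ways to request 100-element Zarr chunks on disk for a variable
# "foo" (names and path are made up for illustration):
encoding = {"foo": {"chunks": (100,)}}
# ds.foo.encoding["chunks"] = (100,)         # set on the variable, or
# ds.to_zarr("out.zarr", encoding=encoding)  # pass at write time
```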
811308284 https://github.com/pydata/xarray/pull/5065#issuecomment-811308284 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTMwODI4NA== rabernat 1197350 2021-03-31T18:23:03Z 2021-03-31T18:23:03Z MEMBER

So any ideas how to proceed? 🧐

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
811275436 https://github.com/pydata/xarray/pull/5065#issuecomment-811275436 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTI3NTQzNg== rabernat 1197350 2021-03-31T17:31:53Z 2021-03-31T17:32:12Z MEMBER

I just pushed a new commit which deletes all encoding inside `variable.chunk()`. But as you will see when the CI finishes, this leads to a lot of test failures. For example:

```
=============================== FAILURES ===============================
______ TestNetCDF4ViaDaskData.test_roundtrip_string_encoded_characters ______

self = <xarray.tests.test_backends.TestNetCDF4ViaDaskData object at 0x18cba4c40>

    def test_roundtrip_string_encoded_characters(self):
        expected = Dataset({"x": ("t", ["ab", "cdef"])})
        expected["x"].encoding["dtype"] = "S1"
        with self.roundtrip(expected) as actual:
            assert_identical(expected, actual)
>           assert actual["x"].encoding["_Encoding"] == "utf-8"
E           KeyError: '_Encoding'

/Users/rpa/Code/xarray/xarray/tests/test_backends.py:485: KeyError
```

Why is `chunk` getting called here? Does it actually get called every time we load a dataset with chunks? If so, we will need a more sophisticated solution.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
811265134 https://github.com/pydata/xarray/pull/5065#issuecomment-811265134 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTI2NTEzNA== rabernat 1197350 2021-03-31T17:17:07Z 2021-03-31T17:17:07Z MEMBER

> Replace `self._encoding` with `None` here?

Thanks! Yeah, that's what I had in mind. But I was wondering if there was an existing example of doing that elsewhere that I could copy.

In any case, I'll give it a try now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
811189539 https://github.com/pydata/xarray/pull/5065#issuecomment-811189539 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTE4OTUzOQ== rabernat 1197350 2021-03-31T16:12:13Z 2021-03-31T16:12:23Z MEMBER

In today's dev call, we proposed to handle encoding in `chunk` the same way we handle it in indexing: by deleting all encoding.

The problem is, I can't figure out where this happens. Can someone point me to the place in the code where indexing operations delete encoding?

A related question: I discovered this encoding option preferred_chunks, which is treated specially: https://github.com/pydata/xarray/blob/57a4479fcd3ebc579cf00e0d6bf85007eda44b56/xarray/core/dataset.py#L396

Should the Zarr backend be setting this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
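On the `preferred_chunks` question above: backends expose their on-disk chunking through that encoding key so that `chunks={}` / `chunks="auto"` can align Dask chunks with storage. A hedged sketch of the shape of that metadata (sizes and dimension names are illustrative):

```python
# preferred_chunks maps dimension names to on-disk chunk sizes, unlike
# encoding["chunks"], which is a positional tuple in axis order
# (values here are illustrative, not from any real dataset).
var_encoding = {
    "chunks": (100, 50),                        # tuple, axis order
    "preferred_chunks": {"time": 100, "x": 50}, # dict, keyed by dim name
}
```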
811148122 https://github.com/pydata/xarray/pull/5065#issuecomment-811148122 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTE0ODEyMg== rabernat 1197350 2021-03-31T15:16:37Z 2021-03-31T15:16:37Z MEMBER

I appreciate the discussion on this PR. Does anyone have a concrete suggestion of what to do?

If we are not in agreement about the encoding stuff, perhaps I should remove that and just move forward with the safe_chunks part of this PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
807128780 https://github.com/pydata/xarray/pull/5065#issuecomment-807128780 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwNzEyODc4MA== rabernat 1197350 2021-03-25T17:19:15Z 2021-03-25T17:19:15Z MEMBER

> Perhaps a kwarg in `to_zarr` like `ignore_encoding_chunks`?

I would argue that this is unnecessary. If you want to explicitly drop encoding, just `del da.encoding['chunks']` before writing. But most users don't figure out that they should do this, because the default behavior is counterintuitive.

The problem here is with the default behavior of propagating chunk encoding through computations when it no longer makes sense. My example with the dtype encoding illustrates that we already drop encoding on certain operations, so it's not unprecedented. It's more of an implementation question: where and how to do the dropping.

FWIW, I would also favor dropping `encoding['chunks']` after indexing, coarsening, interpolating, etc. Basically anything that changes the array shape or chunk structure.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
806724345 https://github.com/pydata/xarray/pull/5065#issuecomment-806724345 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwNjcyNDM0NQ== rabernat 1197350 2021-03-25T13:17:03Z 2021-03-25T13:17:59Z MEMBER

I see your point. I guess I don't fully understand where else in the code path encoding gets dropped. Consider this example

```python
import xarray as xr

ds = xr.Dataset({'foo': ('time', [1, 1], {'dtype': 'int16'})})
ds = xr.decode_cf(ds).compute()
assert "dtype" in ds.foo.encoding
assert "dtype" not in (0.5 * ds.foo).encoding
```

Xarray knows to drop the dtype encoding after an arithmetic operation. How does that work? To me, `.chunk` feels like a similar case: an operation that invalidates any existing encoding.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
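One likely answer to "how does that work?" is simply that arithmetic constructs a brand-new object and never copies encoding onto it. A toy illustration of that pattern (not xarray's actual classes):

```python
class Var:
    """Toy stand-in for an xarray Variable (illustrative only)."""
    def __init__(self, data, encoding=None):
        self.data = data
        self.encoding = encoding if encoding is not None else {}

    def __mul__(self, k):
        # The result is a fresh object; encoding is simply never copied.
        return Var(self.data * k)

v = Var(2, {"dtype": "int16"})
assert (v * 0.5).encoding == {}          # dropped on the result
assert v.encoding == {"dtype": "int16"}  # original is untouched
```

Under this reading, making `.chunk` drop encoding means choosing not to carry the old dict onto the new object, exactly as arithmetic already does.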
804050169 https://github.com/pydata/xarray/pull/5065#issuecomment-804050169 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwNDA1MDE2OQ== rabernat 1197350 2021-03-22T13:12:45Z 2021-03-22T13:12:45Z MEMBER

Thanks Anderson. Fixed by rebasing. Now RTD build is failing, but there is no obvious error in the logs...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
803712024 https://github.com/pydata/xarray/pull/5065#issuecomment-803712024 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwMzcxMjAyNA== rabernat 1197350 2021-03-22T01:58:23Z 2021-03-22T02:02:00Z MEMBER

Confused about the test error. It seems unrelated. In `test_sparse.py:test_variable_method`:

```
E TypeError: no implementation found for 'numpy.allclose' on types that implement __array_function__: [<class 'numpy.ndarray'>, <class 'sparse._coo.core.COO'>]
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
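The error above comes from NumPy's `__array_function__` protocol (NEP 18): when an argument opts into the protocol but no implementation of `numpy.allclose` is registered for its type, NumPy raises rather than coercing. A minimal reproduction with a dummy type, so no `sparse` install is needed:

```python
import numpy as np

class Opaque:
    """Opts into __array_function__ but implements nothing, mimicking
    a duck array with no np.allclose support."""
    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented

try:
    np.allclose(np.zeros(3), Opaque())
    raised = False
except TypeError:
    raised = True  # "no implementation found for 'numpy.allclose' ..."
assert raised
```

A common workaround in tests, assuming the arrays fit in memory, is to densify the duck array before comparing.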


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 61.914ms · About: xarray-datasette