
issue_comments


32 rows where issue = 837243943 sorted by updated_at descending




user 7

  • rabernat 15
  • aurghs 6
  • shoyer 5
  • dcherian 3
  • andersy005 1
  • keewis 1
  • pep8speaks 1

author_association 3

  • MEMBER 25
  • COLLABORATOR 6
  • NONE 1

issue 1

  • Zarr chunking fixes · 32
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
826984456 https://github.com/pydata/xarray/pull/5065#issuecomment-826984456 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgyNjk4NDQ1Ng== dcherian 2448579 2021-04-26T16:37:37Z 2021-04-26T16:37:37Z MEMBER

Thanks @rabernat

  Zarr chunking fixes 837243943
826913149 https://github.com/pydata/xarray/pull/5065#issuecomment-826913149 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgyNjkxMzE0OQ== rabernat 1197350 2021-04-26T15:08:43Z 2021-04-26T15:08:43Z MEMBER

I think this PR has received a very thorough review. I would be pleased if someone from @pydata/xarray would merge it soon.

826906106 https://github.com/pydata/xarray/pull/5065#issuecomment-826906106 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgyNjkwNjEwNg== keewis 14808389 2021-04-26T15:00:59Z 2021-04-26T15:00:59Z MEMBER

the reason is that black released a new version yesterday, and since we don't pin black for the blackdoc entry we get the new version. If you run pre-commit clean before pre-commit run --all-files you should see this change locally, too. To avoid situations like these we could start pinning black in the blackdoc entry (and run a script to synchronize this with the black entry on autoupdate).

Reactions: 👍 1
803706143 https://github.com/pydata/xarray/pull/5065#issuecomment-803706143 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwMzcwNjE0Mw== pep8speaks 24736507 2021-03-22T01:35:40Z 2021-04-26T14:38:56Z NONE

Hello @rabernat! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2021-04-26 14:38:56 UTC
826888674 https://github.com/pydata/xarray/pull/5065#issuecomment-826888674 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgyNjg4ODY3NA== rabernat 1197350 2021-04-26T14:38:49Z 2021-04-26T14:38:49Z MEMBER

The pre-commit workflow is raising a blackdoc error I am not seeing in my local env

```diff
diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst
index eb5c4d8..2bc3c1f 100644
--- a/doc/internals/duck-arrays-integration.rst
+++ b/doc/internals/duck-arrays-integration.rst
@@ -25,7 +25,7 @@ argument:
 ...
     def _repr_inline_(self, max_width):
-        """ format to a single line with at most max_width characters """
+        """format to a single line with at most max_width characters"""
 ...
```
817990859 https://github.com/pydata/xarray/pull/5065#issuecomment-817990859 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxNzk5MDg1OQ== rabernat 1197350 2021-04-12T17:27:28Z 2021-04-12T17:27:28Z MEMBER

Any further feedback on this now reduced-scope PR? Merging it would help move Pangeo Forge forward.

815019613 https://github.com/pydata/xarray/pull/5065#issuecomment-815019613 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxNTAxOTYxMw== rabernat 1197350 2021-04-07T15:44:25Z 2021-04-07T15:44:25Z MEMBER

I have removed the controversial encoding['chunks'] stuff from the PR. Now it only contains the safe_chunks option in to_zarr.

If there are no further comments on this, I think this is good to go.
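For context, the alignment rule that safe_chunks is meant to enforce can be sketched in plain Python. This is a hypothetical helper, not the actual xarray implementation: writing dask-chunked data to a zarr store is only safe along a dimension when every dask chunk except possibly the last covers a whole number of zarr chunks, since otherwise two dask tasks could write to the same zarr chunk concurrently.

```python
def chunks_are_safe(dask_chunks, zarr_chunk):
    """Hypothetical sketch of the safe_chunks alignment rule along one
    dimension: every dask chunk except the last must be an exact multiple
    of the zarr chunk size, so no two dask tasks touch the same zarr chunk.
    """
    for size in dask_chunks[:-1]:
        if size % zarr_chunk != 0:
            return False
    return True

# Aligned: dask chunks of 4 map onto zarr chunks of 2 without overlap.
assert chunks_are_safe((4, 4, 3), zarr_chunk=2)
# Misaligned: a dask chunk of 3 straddles a zarr chunk boundary.
assert not chunks_are_safe((3, 4, 4), zarr_chunk=2)
```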

814102743 https://github.com/pydata/xarray/pull/5065#issuecomment-814102743 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxNDEwMjc0Mw== rabernat 1197350 2021-04-06T13:03:53Z 2021-04-06T13:03:53Z MEMBER

We seem to be unable to resolve the complexities around chunk encoding. I propose to remove this from the PR and reduce the scope to just the safe_chunks features. @aurghs should probably be the one to tackle the chunk encoding problem; unfortunately it exceeds my understanding, and I don't have time to dig deeper at the moment. In the meantime safe_chunks is important for pangeo-forge forward progress.

Please give a 👍 or 👎 to this idea if you have an opinion.

Reactions: 👍 3
811975731 https://github.com/pydata/xarray/pull/5065#issuecomment-811975731 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTk3NTczMQ== rabernat 1197350 2021-04-01T15:12:15Z 2021-04-01T15:12:15Z MEMBER

But it seems to me that having two different definitions of chunks (dask one and encoded one), is not very intuitive and it's not easy to define a clear default in writing.

My use for encoding.chunks is to tell Zarr what chunks to use on disk.
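That intent can be sketched in plain Python (a hypothetical helper, not xarray's actual writer logic): an explicit encoding['chunks'] describes the on-disk layout and takes precedence, with the in-memory dask chunking as a fallback.

```python
def target_zarr_chunks(shape, encoding=None, dask_chunks=None):
    """Hypothetical sketch of how on-disk zarr chunks could be chosen
    at write time: an explicit encoding['chunks'] wins, then the
    variable's dask chunking, then the whole array as a single chunk.
    """
    if encoding and "chunks" in encoding:
        return tuple(encoding["chunks"])
    if dask_chunks is not None:
        # zarr needs uniform chunks; take the first chunk size per dimension.
        return tuple(sizes[0] for sizes in dask_chunks)
    return tuple(shape)

# encoding['chunks'] overrides the in-memory (dask) chunking on write.
assert target_zarr_chunks((10,), {"chunks": (5,)}, ((2, 2, 2, 2, 2),)) == (5,)
assert target_zarr_chunks((10,), None, ((2, 2, 2, 2, 2),)) == (2,)
assert target_zarr_chunks((10,)) == (10,)
```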

811810701 https://github.com/pydata/xarray/pull/5065#issuecomment-811810701 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTgxMDcwMQ== aurghs 35919497 2021-04-01T10:21:15Z 2021-04-01T11:01:44Z COLLABORATOR

```python
new_var = var.chunk(chunks, name=name2, lock=lock)
new_var.encoding = var.encoding
```

Here you are modifying _maybe_chunk, but _maybe_chunk is also used in Dataset.chunk. It would probably be better to change backends/api.py, here: https://github.com/pydata/xarray/blob/ddc352faa6de91f266a1749773d08ae8d6f09683/xarray/backends/api.py#L296-L307

But maybe also in this case we want to drop encoding["chunks"] if they are not compatible with the dask ones.

811818035 https://github.com/pydata/xarray/pull/5065#issuecomment-811818035 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTgxODAzNQ== aurghs 35919497 2021-04-01T10:35:29Z 2021-04-01T10:36:01Z COLLABORATOR

Hmm. I would also be happy with explicitly deleting chunks from encoding for now. It's not adding a lot of technical debt.

I see two reasons for keeping it:
  • We should be able to read and write the data with the same structure on disk.
  • The user may be interested in this information.

But it seems to me that having two different definitions of chunks (dask one and encoded one), is not very intuitive and it's not easy to define a clear default in writing.

811481334 https://github.com/pydata/xarray/pull/5065#issuecomment-811481334 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTQ4MTMzNA== shoyer 1217238 2021-03-31T21:35:11Z 2021-03-31T21:35:11Z MEMBER

Why is chunk getting called here? Does it actually get called every time we load a dataset with chunks? If so, we will need a more sophisticated solution.

This happens specifically on this line: https://github.com/pydata/xarray/blob/ddc352faa6de91f266a1749773d08ae8d6f09683/xarray/core/dataset.py#L438

So perhaps it would make sense to copy encoding specifically in this case, e.g.,

```python
new_var = var.chunk(chunks, name=name2, lock=lock)
new_var.encoding = var.encoding
```

811458761 https://github.com/pydata/xarray/pull/5065#issuecomment-811458761 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTQ1ODc2MQ== shoyer 1217238 2021-03-31T20:54:46Z 2021-03-31T20:54:46Z MEMBER

Hmm. I would also be happy with explicitly deleting chunks from encoding for now. It's not adding a lot of technical debt.

In the long term, the whole handling of encoding should be revisited, e.g., see https://github.com/pydata/xarray/issues/5082

811308284 https://github.com/pydata/xarray/pull/5065#issuecomment-811308284 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTMwODI4NA== rabernat 1197350 2021-03-31T18:23:03Z 2021-03-31T18:23:03Z MEMBER

So any ideas how to proceed? 🧐

811209453 https://github.com/pydata/xarray/pull/5065#issuecomment-811209453 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTIwOTQ1Mw== aurghs 35919497 2021-03-31T16:27:05Z 2021-03-31T17:50:19Z COLLABORATOR

~~rechunk~~ Variable.chunk is always used when you open data with dask, even if you are using the default chunking. So this way you will always drop the encoding whenever dask is used (≈ always).

811284237 https://github.com/pydata/xarray/pull/5065#issuecomment-811284237 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTI4NDIzNw== aurghs 35919497 2021-03-31T17:45:29Z 2021-03-31T17:49:17Z COLLABORATOR

Does it actually get called every time we load a dataset with chunks?

Yes

811275436 https://github.com/pydata/xarray/pull/5065#issuecomment-811275436 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTI3NTQzNg== rabernat 1197350 2021-03-31T17:31:53Z 2021-03-31T17:32:12Z MEMBER

I just pushed a new commit which deletes all encoding inside variable.chunk(). But as you will see when the CI finishes, this leads to a lot of test failures. For example:

```
=============================================================================== FAILURES ================================================================================
_______ TestNetCDF4ViaDaskData.test_roundtrip_string_encoded_characters ________

self = <xarray.tests.test_backends.TestNetCDF4ViaDaskData object at 0x18cba4c40>

    def test_roundtrip_string_encoded_characters(self):
        expected = Dataset({"x": ("t", ["ab", "cdef"])})
        expected["x"].encoding["dtype"] = "S1"
        with self.roundtrip(expected) as actual:
            assert_identical(expected, actual)
>           assert actual["x"].encoding["_Encoding"] == "utf-8"
E           KeyError: '_Encoding'

/Users/rpa/Code/xarray/xarray/tests/test_backends.py:485: KeyError
```

Why is chunk getting called here? Does it actually get called every time we load a dataset with chunks? If so, we will need a more sophisticated solution.

811265134 https://github.com/pydata/xarray/pull/5065#issuecomment-811265134 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTI2NTEzNA== rabernat 1197350 2021-03-31T17:17:07Z 2021-03-31T17:17:07Z MEMBER

Replace self._encoding with None here?

Thanks! Yeah, that's what I had in mind. But I was wondering if there was an example of doing that elsewhere that I could copy.

In any case, I'll give it a try now.

811262328 https://github.com/pydata/xarray/pull/5065#issuecomment-811262328 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTI2MjMyOA== dcherian 2448579 2021-03-31T17:12:55Z 2021-03-31T17:12:55Z MEMBER

The problem is, I can't figure out where this happens.

Replace self._encoding with None here? https://github.com/pydata/xarray/blob/ddc352faa6de91f266a1749773d08ae8d6f09683/xarray/core/variable.py#L1084

Reactions: 👍 1
811199910 https://github.com/pydata/xarray/pull/5065#issuecomment-811199910 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTE5OTkxMA== aurghs 35919497 2021-03-31T16:20:30Z 2021-03-31T16:31:32Z COLLABORATOR

Should the Zarr backend be setting this?

Yes, it is already defined in the zarr backend: preferred_chunks=chunks. We decided to separate chunks and preferred_chunks:
  • preferred_chunks is used by the backend to define the default chunks to be used by xarray.
  • chunks are the on-disk chunks.

They are not necessarily the same. Maybe we can drop the preferred_chunks after they are used.
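The split described above can be sketched in plain Python (a hypothetical helper, not the backend's actual code): user-specified chunks win on open, otherwise the backend's preferred_chunks provide the default dask chunking, while encoding['chunks'] stays reserved for the on-disk layout.

```python
def default_open_chunks(encoding, user_chunks=None):
    """Hypothetical sketch of how a backend's preferred_chunks could
    drive the default dask chunking on open: explicit user chunks take
    precedence, otherwise fall back to preferred_chunks from encoding.
    """
    if user_chunks:
        return dict(user_chunks)
    return dict(encoding.get("preferred_chunks", {}))

encoding = {"preferred_chunks": {"time": 100}, "chunks": (100,)}
# No user chunks: use the backend's preferred chunking.
assert default_open_chunks(encoding) == {"time": 100}
# Explicit user chunks take precedence over preferred_chunks.
assert default_open_chunks(encoding, {"time": 25}) == {"time": 25}
```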

811189539 https://github.com/pydata/xarray/pull/5065#issuecomment-811189539 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTE4OTUzOQ== rabernat 1197350 2021-03-31T16:12:13Z 2021-03-31T16:12:23Z MEMBER

In today's dev call, we proposed to handle encoding in chunk the same way we handle it in indexing: by deleting all encoding.

The problem is, I can't figure out where this happens. Can someone point me to the place in the code where indexing operations delete encoding?

A related question: I discovered this encoding option preferred_chunks, which is treated specially: https://github.com/pydata/xarray/blob/57a4479fcd3ebc579cf00e0d6bf85007eda44b56/xarray/core/dataset.py#L396

Should the Zarr backend be setting this?

811148122 https://github.com/pydata/xarray/pull/5065#issuecomment-811148122 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTE0ODEyMg== rabernat 1197350 2021-03-31T15:16:37Z 2021-03-31T15:16:37Z MEMBER

I appreciate the discussion on this PR. Does anyone have a concrete suggestion of what to do?

If we are not in agreement about the encoding stuff, perhaps I should remove that and just move forward with the safe_chunks part of this PR?

808399567 https://github.com/pydata/xarray/pull/5065#issuecomment-808399567 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwODM5OTU2Nw== aurghs 35919497 2021-03-26T17:34:44Z 2021-03-26T18:08:04Z COLLABORATOR

Perhaps we could also remove overwrite_encoded_chunks; it shouldn't be necessary any more.

807140762 https://github.com/pydata/xarray/pull/5065#issuecomment-807140762 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwNzE0MDc2Mg== shoyer 1217238 2021-03-25T17:26:46Z 2021-03-25T17:26:46Z MEMBER

FWIW, I would also favor dropping encoding['chunks'] after indexing, coarsening, interpolating, etc. Basically anything that changes the array shape or chunk structure.

We already drop all of encoding after indexing. My guess is that we do the same for coarsening and interpolations as well (though I haven't checked).

807128780 https://github.com/pydata/xarray/pull/5065#issuecomment-807128780 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwNzEyODc4MA== rabernat 1197350 2021-03-25T17:19:15Z 2021-03-25T17:19:15Z MEMBER

Perhaps a kwarg in to_zarr like ignore_encoding_chunks?

I would argue that this is unnecessary. If you want to explicitly drop encoding, just del da.encoding['chunks'] before writing. But most users don't figure out that they should do this, because the default behavior is counterintuitive.

The problem here is with the default behavior of propagating chunk encoding through computations when it no longer makes sense. My example with the dtype encoding illustrates that we already drop encoding on certain operations, so it's not unprecedented. It's more of an implementation question: where and how to do the dropping.

FWIW, I would also favor dropping encoding['chunks'] after indexing, coarsening, interpolating, etc. Basically anything that changes the array shape or chunk structure.
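The proposal above can be sketched in plain Python (a hypothetical helper, not xarray's actual propagation logic): keep encoding through operations, but drop the stale 'chunks' entry whenever an operation changes the array's shape or chunk structure.

```python
def propagate_encoding(encoding, shape_changed):
    """Hypothetical sketch of the proposed rule: keep encoding across
    operations, but drop 'chunks' whenever an operation (indexing,
    coarsening, interpolating, ...) changes the array shape or chunk
    structure, since the stored chunk layout no longer matches the data.
    """
    encoding = dict(encoding)
    if shape_changed:
        encoding.pop("chunks", None)
    return encoding

enc = {"chunks": (100,), "dtype": "int16"}
# A shape-preserving operation keeps the chunk encoding.
assert propagate_encoding(enc, shape_changed=False) == enc
# A shape-changing operation drops only the stale chunk layout.
assert propagate_encoding(enc, shape_changed=True) == {"dtype": "int16"}
```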

807120861 https://github.com/pydata/xarray/pull/5065#issuecomment-807120861 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwNzEyMDg2MQ== dcherian 2448579 2021-03-25T17:13:55Z 2021-03-25T17:13:55Z MEMBER

Xarray knows to drop the dtype encoding after an arithmetic operation. How does that work?

There's a subtle difference. It drops all of .encoding, not dtype specifically.

@shoyer's point about indexing changing chunking is a good one too. Perhaps a kwarg in to_zarr like ignore_encoding_chunks?

807111762 https://github.com/pydata/xarray/pull/5065#issuecomment-807111762 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwNzExMTc2Mg== shoyer 1217238 2021-03-25T17:08:09Z 2021-03-25T17:08:09Z MEMBER

Xarray knows to drop the dtype encoding after an arithmetic operation. How does that work? To me .chunk feels like a similar case: an operation that invalidates any existing encoding.

To be honest, the existing convention is quite ad hoc, just based on what seemed most appropriate at the time.

https://github.com/pydata/xarray/issues/1614 is the most comprehensive description of the current state of things.

We were considering saying that attrs and encoding should always use the same rules, but perhaps we should be more aggressive about dropping encoding.

806724345 https://github.com/pydata/xarray/pull/5065#issuecomment-806724345 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwNjcyNDM0NQ== rabernat 1197350 2021-03-25T13:17:03Z 2021-03-25T13:17:59Z MEMBER

I see your point. I guess I don't fully understand where else in the code path encoding gets dropped. Consider this example

```python
import xarray as xr

ds = xr.Dataset({'foo': ('time', [1, 1], {'dtype': 'int16'})})
ds = xr.decode_cf(ds).compute()
assert "dtype" in ds.foo.encoding
assert "dtype" not in (0.5 * ds.foo).encoding
```

Xarray knows to drop the dtype encoding after an arithmetic operation. How does that work? To me .chunk feels like a similar case: an operation that invalidates any existing encoding.

806154872 https://github.com/pydata/xarray/pull/5065#issuecomment-806154872 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwNjE1NDg3Mg== shoyer 1217238 2021-03-24T20:10:19Z 2021-03-24T20:10:19Z MEMBER

I'm a little conflicted about dealing with encoding['chunks'] specifically in chunk():

  • On one hand, it feels inconsistent for only this single method in xarray to modify part of encoding. Nothing else in xarray (after CF decoding) does this. Effectively encoding['chunks'] is now becoming a part of xarray's data model.
  • On the other hand, this would absolutely fix a recurrent pain-point for users, and in that sense it's worth doing.

Maybe this isn't such a big deal in this particular case, especially if we don't think we would need to add such encoding specific logic to any other methods. But are we really sure about that -- what about cases like indexing?

I guess the other alternative is to make chunk() and the various other methods that would change chunking drop encoding entirely. I don't know if this would really be a better comprehensive solution (I know dropping attrs is much hated), but at least it's an easier mental model.

804050169 https://github.com/pydata/xarray/pull/5065#issuecomment-804050169 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwNDA1MDE2OQ== rabernat 1197350 2021-03-22T13:12:45Z 2021-03-22T13:12:45Z MEMBER

Thanks Anderson. Fixed by rebasing. Now the RTD build is failing, but there is no obvious error in the logs...

803754586 https://github.com/pydata/xarray/pull/5065#issuecomment-803754586 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwMzc1NDU4Ng== andersy005 13301940 2021-03-22T04:38:47Z 2021-03-22T04:38:47Z MEMBER

Confused about the test error. It seems unrelated. In test_sparse.py:test_variable_method

```
E TypeError: no implementation found for 'numpy.allclose' on types that implement __array_function__: [<class 'numpy.ndarray'>, <class 'sparse._coo.core.COO'>]
```

Related to #5059, and it appears that @keewis came up with a fix for it in https://github.com/pydata/xarray/pull/5059#issuecomment-803620128

Reactions: 👍 1
803712024 https://github.com/pydata/xarray/pull/5065#issuecomment-803712024 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwMzcxMjAyNA== rabernat 1197350 2021-03-22T01:58:23Z 2021-03-22T02:02:00Z MEMBER

Confused about the test error. It seems unrelated. In test_sparse.py:test_variable_method

```
E TypeError: no implementation found for 'numpy.allclose' on types that implement __array_function__: [<class 'numpy.ndarray'>, <class 'sparse._coo.core.COO'>]
```



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 15.814ms · About: xarray-datasette