issue_comments

17 rows where author_association = "MEMBER", issue = 253136694 and user = 1217238 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
364804265 https://github.com/pydata/xarray/pull/1528#issuecomment-364804265 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM2NDgwNDI2NQ== shoyer 1217238 2018-02-12T00:15:23Z 2018-02-12T00:15:23Z MEMBER

See https://github.com/dask/dask/issues/2000 for the dask issue. Once this works in dask it should be quite easy to implement in xarray, too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
364804162 https://github.com/pydata/xarray/pull/1528#issuecomment-364804162 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM2NDgwNDE2Mg== shoyer 1217238 2018-02-12T00:14:22Z 2018-02-12T00:14:22Z MEMBER

@martindurant that could probably be addressed most cleanly by improving __setitem__ support for dask.array.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
351588678 https://github.com/pydata/xarray/pull/1528#issuecomment-351588678 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM1MTU4ODY3OA== shoyer 1217238 2017-12-14T02:23:03Z 2017-12-14T02:23:03Z MEMBER

woohoo, thank you Ryan!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
350352097 https://github.com/pydata/xarray/pull/1528#issuecomment-350352097 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM1MDM1MjA5Nw== shoyer 1217238 2017-12-08T19:34:09Z 2017-12-08T19:34:09Z MEMBER

The default keyword was introduced in python 3.4, so this doesn't work in 2.7. I have tried a couple of options to overcome this but none of them have worked.

Oops, this is my fault!

Instead, try:

```python
# equivalent to max(ndims, default=0), but works on Python 2.7
ndims = [k.ndim for k in key if isinstance(k, np.ndarray)]
array_subspace_size = max(ndims) if ndims else 0
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
349554730 https://github.com/pydata/xarray/pull/1528#issuecomment-349554730 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0OTU1NDczMA== shoyer 1217238 2017-12-06T07:10:37Z 2017-12-06T07:10:37Z MEMBER

I just pushed a commit adding a test for backends.zarr._replace_slices_with_arrays.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
348569223 https://github.com/pydata/xarray/pull/1528#issuecomment-348569223 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0ODU2OTIyMw== shoyer 1217238 2017-12-01T18:20:32Z 2017-12-01T18:20:32Z MEMBER

To finish it up, I propose to raise an error when attempting to encode variable-length string data. If someone can give me a quick one liner to help identify such datatypes, that would be helpful.

Variable length strings are stored with dtype=object. So something like dtype.kind == 'O' should work.
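A minimal sketch of that check (an editorial illustration, not code from the thread; the helper name is made up), assuming the variable's data is already a numpy array:

```python
import numpy as np

def is_variable_length_string(arr):
    # numpy stores variable-length strings with object dtype, i.e. kind 'O'
    return arr.dtype.kind == 'O'

print(is_variable_length_string(np.array(['a', 'bc'], dtype=object)))  # True
print(is_variable_length_string(np.array([1.0, 2.0])))                 # False
```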

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
348560326 https://github.com/pydata/xarray/pull/1528#issuecomment-348560326 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0ODU2MDMyNg== shoyer 1217238 2017-12-01T17:43:03Z 2017-12-01T17:43:03Z MEMBER

I'll give this another look over the weekend.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
347984582 https://github.com/pydata/xarray/pull/1528#issuecomment-347984582 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0Nzk4NDU4Mg== shoyer 1217238 2017-11-29T20:22:33Z 2017-11-29T20:22:33Z MEMBER

I'm fine skipping strings entirely for now. They are indeed unneeded for most netCDF datasets.

On Wed, Nov 29, 2017 at 8:18 PM Ryan Abernathey notifications@github.com wrote:

Right now I am in a dilemma over how to move forward. Fixing this string encoding issue will require some serious hacks to cf encoding. If I do this before #1087 https://github.com/pydata/xarray/pull/1087 is finished, it will be a waste of time (and a pain). On the other hand #1087 https://github.com/pydata/xarray/pull/1087 could take a long time, since it is a major refactor itself.

Is there some way to punt on the multi-length string encoding for now? We could just error if such variables are present. This would allow us to get the experimental zarr backend out into the wild. FWIW, none of the datasets I want to use this with actually have any string data variables at all. I believe 95% of netcdf datasets are just regular numbers. This is an edge case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
347351224 https://github.com/pydata/xarray/pull/1528#issuecomment-347351224 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NzM1MTIyNA== shoyer 1217238 2017-11-27T22:32:47Z 2017-11-28T07:51:31Z MEMBER

Overall, I find the conventions module to be a bit unwieldy. There is a lot of stuff in there, not all of which is related to CF conventions. It would be useful to separate the actual conventions from the encoding / decoding needed for different backends.

Agreed!

I wonder why zarr doesn't have a UTF-8 variable length string type (https://github.com/alimanfoo/zarr/issues/206) -- that would feel like the obvious first choice for encoding this data.

That said, xarray should be able to use fixed-length bytes just fine, doing UTF-8 encoding/decoding on the fly.
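A minimal sketch of that on-the-fly round trip (an editorial illustration, not xarray's actual encoder), assuming plain numpy arrays:

```python
import numpy as np

strings = np.array(["mean", "st. dev."])     # unicode ('U') dtype
encoded = np.char.encode(strings, "utf-8")   # fixed-length bytes ('S') dtype, zarr-friendly
decoded = np.char.decode(encoded, "utf-8")   # back to unicode on read
assert (decoded == strings).all()
```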

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
345091139 https://github.com/pydata/xarray/pull/1528#issuecomment-345091139 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NTA5MTEzOQ== shoyer 1217238 2017-11-16T23:02:14Z 2017-11-16T23:02:14Z MEMBER

can we brainstorm what a ZarrArrayWrapper would need to be compatible with the new indexing API?

We will need to write new adapter code to map xarray's explicit indexer classes onto the appropriate zarr methods, e.g.,

```python
import numpy as np

def __getitem__(self, key):
    array = self.get_array()
    if isinstance(key, BasicIndexer):
        return array[key.tuple]
    elif isinstance(key, VectorizedIndexer):
        return array.vindex[_replace_slices_with_arrays(key.tuple, self.shape)]
    else:
        assert isinstance(key, OuterIndexer)
        return array.oindex[key.tuple]

# untested, but I think this does the appropriate shape munging to make slices
# appear as the last axes of the result array
def _replace_slice_with_arrays(key, shape):
    num_slices = sum(1 for k in key if isinstance(k, slice))
    num_arrays = len(shape) - num_slices
    new_key = []
    slice_count = 0
    for k, size in zip(key, shape):
        if isinstance(k, slice):
            # expand the slice into an explicit index array, placed on its own
            # trailing axis so it broadcasts against the other index arrays
            array = np.arange(*k.indices(size))
            sl = [np.newaxis] * len(shape)
            sl[num_arrays + slice_count] = slice(None)
            k = array[tuple(sl)]
            slice_count += 1
        else:
            assert isinstance(k, np.ndarray)
            k = k[(slice(None),) * num_arrays + (np.newaxis,) * num_slices]
        new_key.append(k)
    return tuple(new_key)
```
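As an editorial illustration (not part of the original comment), applying that helper to a mixed key with one 1-D index array and one slice:

```python
import numpy as np

key = (np.array([0, 2, 3]), slice(0, 4))
new_key = _replace_slice_with_arrays(key, shape=(5, 4))
print([k.shape for k in new_key])  # [(3, 1), (1, 4)]
# The slice's indices land on the trailing axis, so zarr's vindex
# broadcasts them against the index array like an outer selection.
```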

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
344040250 https://github.com/pydata/xarray/pull/1528#issuecomment-344040250 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NDA0MDI1MA== shoyer 1217238 2017-11-13T20:02:03Z 2017-11-13T20:02:03Z MEMBER

@rabernat sorry for the churn here, but you are also probably going to need to update after the explicit indexing changes in #1705.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
335015485 https://github.com/pydata/xarray/pull/1528#issuecomment-335015485 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMzNTAxNTQ4NQ== shoyer 1217238 2017-10-08T15:46:36Z 2017-10-08T15:46:36Z MEMBER

For serializing attributes, the easiest fix is to call .item() on any numpy scalars (instances of np.generic) and .tolist() on any numpy arrays. For thoroughness this might be worth doing with a custom JSON encoder on the zarr side, but it would also be easy to do in the xarray wrapper.
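A minimal sketch of such an encoder (an editorial illustration; the class name is made up):

```python
import json
import numpy as np

class NumpyJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.generic):    # numpy scalar -> Python scalar
            return obj.item()
        if isinstance(obj, np.ndarray):    # numpy array -> nested lists
            return obj.tolist()
        return super().default(obj)

print(json.dumps({"scale_factor": np.float32(0.5), "valid_range": np.array([0, 100])},
                 cls=NumpyJSONEncoder))
```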

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
327900874 https://github.com/pydata/xarray/pull/1528#issuecomment-327900874 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMyNzkwMDg3NA== shoyer 1217238 2017-09-07T19:32:41Z 2017-09-07T19:32:41Z MEMBER

@rabernat indeed, the backend tests are not terribly well organized right now. Probably the place to start is to inherit from DatasetIOTestCases and TestCase and then implement create_store and roundtrip. DaskTest abuses the "backend" notation a little bit, but these lines cover the essentials: https://github.com/pydata/xarray/blob/98a05f11c6f38489c82e86c9e9df796e7fb65fd2/xarray/tests/test_backends.py#L1271-L1279

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
325742232 https://github.com/pydata/xarray/pull/1528#issuecomment-325742232 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMyNTc0MjIzMg== shoyer 1217238 2017-08-29T17:50:04Z 2017-08-29T17:50:04Z MEMBER

If we think there is an advantage to using the zarr native filters, that could be added via a future PR once we have the basic backend working.

The only advantage here would be for non-xarray users, who could use zarr to do this decoding/encoding automatically.

For what it's worth, the implementation of scale offsets in xarray looks basically equivalent to what's done in zarr. I don't think there's a performance difference either way.

A further rather big advantage in zarr that I'm not aware of in cdf/hdf (I may be wrong) is not just null values, but not having a given block be written to disc at all if it only contains null data.

If you use chunks, I believe HDF5/NetCDF4 do the same thing, e.g.,

```
In [10]: with h5py.File('one-chunk.h5') as f:
    ...:     f.create_dataset('foo', (100, 100), chunks=(100, 100))

In [11]: with h5py.File('many-chunk.h5') as f:
    ...:     f.create_dataset('foo', (100000, 100000), chunks=(100, 100))

In [12]: ls -l | grep chunk.h5
-rw-r--r--  1 shoyer  eng  1400 Aug 29 10:48 many-chunk.h5
-rw-r--r--  1 shoyer  eng  1400 Aug 29 10:48 one-chunk.h5
```

(Note the same file size.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
325723577 https://github.com/pydata/xarray/pull/1528#issuecomment-325723577 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMyNTcyMzU3Nw== shoyer 1217238 2017-08-29T16:43:58Z 2017-08-29T16:44:25Z MEMBER

Is the goal here to be able to round-trip the file, such that calling .to_netcdf() produces an identical file to the original source file?

Yes, exactly.

I don't understand how encoding interacts with attributes? When is something an attribute vs. an encoding (add_offset for example)?

Typically, we store things in encoding that are attributes on the underlying NetCDF file, but no longer make sense as descriptions of the decoded data. For example:

- On the file, add_offset is an attribute.
- If loaded with open_dataset(..., mask_and_scale=True), add_offset can be found in encoding, not attrs, because the data has already been offset.
- If loaded with open_dataset(..., mask_and_scale=False), add_offset will still be on attrs (the data has not been offset).
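For illustration (an editorial addition; the file example.nc and variable name t are hypothetical), assuming the on-disk variable carries an add_offset attribute:

```python
import xarray as xr

decoded = xr.open_dataset("example.nc", mask_and_scale=True)
print(decoded["t"].encoding.get("add_offset"))   # recorded in encoding
print("add_offset" in decoded["t"].attrs)        # False: the data is already offset

raw = xr.open_dataset("example.nc", mask_and_scale=False)
print(raw["t"].attrs.get("add_offset"))          # still a plain attribute
```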

How does xarray know whether the store automatically encodes / decodes the encodings vs. when it has to be done by xarray, e.g. by calling mask_and_scale

Currently, we assume that stores never do this, and always handle it ourselves. We might need a special exception for zarr and scale/offset encoding.

Does this mean that my ZarrStore should inherit from WritableCFDataStore instead of AbstractWritableDataStore?

Maybe, though again it will probably need slightly customized conventions for writing data (if we let zarr handling scale/offset encoding).

I don't yet understand how to make these elements work together properly, for example, do avoid applying the scale / offset function twice, as I mentioned above.

We have two options:

1. Handle it all in xarray via the machinery in conventions.py. Never pass the arguments to do scale/offset encoding to zarr (just save them as attributes).
2. Handle it all in zarr. We'll need special case logic to skip this part of encoding.

I think (2) would be the preferred way to do this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
325716892 https://github.com/pydata/xarray/pull/1528#issuecomment-325716892 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMyNTcxNjg5Mg== shoyer 1217238 2017-08-29T16:19:57Z 2017-08-29T16:19:57Z MEMBER

@rabernat I think this is #1531 -- require_pynio seems to have infected all our other requirements!

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
325525827 https://github.com/pydata/xarray/pull/1528#issuecomment-325525827 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMyNTUyNTgyNw== shoyer 1217238 2017-08-29T01:14:05Z 2017-08-29T01:14:05Z MEMBER

What is "encoding" at the variable level? (I have never understood this part of xarray.) How should encoding be handled with zarr?

encoding keeps track of how variables are represented in a file (e.g., chunking schemes, _FillValue/add_offset/scale_factor, compression, time units), so we can reconstruct a netCDF file that looks almost exactly like the file we've read from disk. In the case of zarr, I guess we might include chunking, fill values, compressor options....

Should we encode / decode CF for zarr stores?

Yes, probably, if we want to handle netcdf conventions for times, fill values and scaling.

Do we want to always automatically align dask chunks with the underlying zarr chunks?

This would be nice! But it's also a bigger issue (will look for the number, I think it's already been opened).

What sort of public API should the zarr backend have? Should you be able to load zarr stores via open_dataset? Or do we need a new method? I think .to_zarr() would be quite useful.

Still need to think about this one.

zarr arrays are extensible along all axes. What does this imply for unlimited dimensions?

I guess we can ignore them (maybe add a warning?) -- they're not part of the zarr data model.

Is any autoclose logic needed? As far as I can tell, zarr objects don't need to be closed.

I don't think we need any autoclose logic at all -- zarr doesn't leave open files hanging around already.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);