issue_comments

17 rows where author_association = "MEMBER", issue = 253136694 and user = 1217238 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
364804265 https://github.com/pydata/xarray/pull/1528#issuecomment-364804265 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM2NDgwNDI2NQ== shoyer 1217238 2018-02-12T00:15:23Z 2018-02-12T00:15:23Z MEMBER

See https://github.com/dask/dask/issues/2000 for the dask issue. Once this works in dask it should be quite easy to implement in xarray, too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
364804162 https://github.com/pydata/xarray/pull/1528#issuecomment-364804162 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM2NDgwNDE2Mg== shoyer 1217238 2018-02-12T00:14:22Z 2018-02-12T00:14:22Z MEMBER

@martindurant that could probably be addressed most cleanly by improving __setitem__ support for dask.array.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
351588678 https://github.com/pydata/xarray/pull/1528#issuecomment-351588678 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM1MTU4ODY3OA== shoyer 1217238 2017-12-14T02:23:03Z 2017-12-14T02:23:03Z MEMBER

woohoo, thank you Ryan!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
350352097 https://github.com/pydata/xarray/pull/1528#issuecomment-350352097 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM1MDM1MjA5Nw== shoyer 1217238 2017-12-08T19:34:09Z 2017-12-08T19:34:09Z MEMBER

The default keyword was introduced in python 3.4, so this doesn't work in 2.7. I have tried a couple of options to overcome this but none of them have worked.

Oops, this is my fault!

Instead, try:

```python
# equivalent to max(ndims, default=0), but works on Python 2.7
ndims = [k.ndim for k in key if isinstance(k, np.ndarray)]
array_subspace_size = max(ndims) if ndims else 0
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
349554730 https://github.com/pydata/xarray/pull/1528#issuecomment-349554730 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0OTU1NDczMA== shoyer 1217238 2017-12-06T07:10:37Z 2017-12-06T07:10:37Z MEMBER

I just pushed a commit adding a test for backends.zarr._replace_slices_with_arrays.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
348569223 https://github.com/pydata/xarray/pull/1528#issuecomment-348569223 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0ODU2OTIyMw== shoyer 1217238 2017-12-01T18:20:32Z 2017-12-01T18:20:32Z MEMBER

To finish it up, I propose to raise an error when attempting to encode variable-length string data. If someone can give me a quick one liner to help identify such datatypes, that would be helpful.

Variable length strings are stored with dtype=object. So something like dtype.kind == 'O' should work.
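A minimal sketch of that check (an editorial illustration, not code from the thread; the helper name is made up), assuming the variable's data is already a numpy array:

```python
import numpy as np

def is_variable_length_string(arr):
    # numpy stores variable-length strings with object dtype, i.e. kind 'O'
    return arr.dtype.kind == 'O'

print(is_variable_length_string(np.array(['a', 'bc'], dtype=object)))  # True
print(is_variable_length_string(np.array([1.0, 2.0])))                 # False
```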

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
348560326 https://github.com/pydata/xarray/pull/1528#issuecomment-348560326 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0ODU2MDMyNg== shoyer 1217238 2017-12-01T17:43:03Z 2017-12-01T17:43:03Z MEMBER

I'll give this another look over the weekend.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
347984582 https://github.com/pydata/xarray/pull/1528#issuecomment-347984582 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0Nzk4NDU4Mg== shoyer 1217238 2017-11-29T20:22:33Z 2017-11-29T20:22:33Z MEMBER

I'm fine skipping strings entirely for now. They are indeed unneeded for most netCDF datasets.

On Wed, Nov 29, 2017 at 8:18 PM Ryan Abernathey notifications@github.com wrote:

Right now I am in a dilemma over how to move forward. Fixing this string encoding issue will require some serious hacks to cf encoding. If I do this before #1087 https://github.com/pydata/xarray/pull/1087 is finished, it will be a waste of time (and a pain). On the other hand #1087 https://github.com/pydata/xarray/pull/1087 could take a long time, since it is a major refactor itself.

Is there some way to punt on the multi-length string encoding for now? We could just error if such variables are present. This would allow us to get the experimental zarr backend out into the wild. FWIW, none of the datasets I want to use this with actually have any string data variables at all. I believe 95% of netcdf datasets are just regular numbers. This is an edge case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
347351224 https://github.com/pydata/xarray/pull/1528#issuecomment-347351224 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NzM1MTIyNA== shoyer 1217238 2017-11-27T22:32:47Z 2017-11-28T07:51:31Z MEMBER

Overall, I find the conventions module to be a bit unwieldy. There is a lot of stuff in there, not all of which is related to CF conventions. It would be useful to separate the actual conventions from the encoding / decoding needed for different backends.

Agreed!

I wonder why zarr doesn't have a UTF-8 variable length string type (https://github.com/alimanfoo/zarr/issues/206) -- that would feel like the obvious first choice for encoding this data.

That said, xarray should be able to use fixed-length bytes just fine, doing UTF-8 encoding/decoding on the fly.
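A minimal sketch of that on-the-fly round trip (an editorial illustration, not xarray's actual encoder), assuming plain numpy arrays:

```python
import numpy as np

strings = np.array(["mean", "st. dev."])     # unicode ('U') dtype
encoded = np.char.encode(strings, "utf-8")   # fixed-length bytes ('S') dtype, zarr-friendly
decoded = np.char.decode(encoded, "utf-8")   # back to unicode on read
assert (decoded == strings).all()
```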

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
345091139 https://github.com/pydata/xarray/pull/1528#issuecomment-345091139 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NTA5MTEzOQ== shoyer 1217238 2017-11-16T23:02:14Z 2017-11-16T23:02:14Z MEMBER

can we brainstorm what a ZarrArrayWrapper would need to be compatible with the new indexing API?

We will need to write new adapter code to map xarray's explicit indexer classes onto the appropriate zarr methods, e.g.,

```python
import numpy as np

def __getitem__(self, key):
    array = self.get_array()
    if isinstance(key, BasicIndexer):
        return array[key.tuple]
    elif isinstance(key, VectorizedIndexer):
        return array.vindex[_replace_slices_with_arrays(key.tuple, self.shape)]
    else:
        assert isinstance(key, OuterIndexer)
        return array.oindex[key.tuple]

# untested, but I think this does the appropriate shape munging to make slices
# appear as the last axes of the result array
def _replace_slice_with_arrays(key, shape):
    num_slices = sum(1 for k in key if isinstance(k, slice))
    num_arrays = len(shape) - num_slices
    new_key = []
    slice_count = 0
    for k, size in zip(key, shape):
        if isinstance(k, slice):
            # expand the slice into an explicit index array, placed on its own
            # trailing axis so it broadcasts against the other index arrays
            array = np.arange(*k.indices(size))
            sl = [np.newaxis] * len(shape)
            sl[num_arrays + slice_count] = slice(None)
            k = array[tuple(sl)]
            slice_count += 1
        else:
            assert isinstance(k, np.ndarray)
            k = k[(slice(None),) * num_arrays + (np.newaxis,) * num_slices]
        new_key.append(k)
    return tuple(new_key)
```
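As an editorial illustration (not part of the original comment), applying that helper to a mixed key with one 1-D index array and one slice:

```python
import numpy as np

key = (np.array([0, 2, 3]), slice(0, 4))
new_key = _replace_slice_with_arrays(key, shape=(5, 4))
print([k.shape for k in new_key])  # [(3, 1), (1, 4)]
# The slice's indices land on the trailing axis, so zarr's vindex
# broadcasts them against the index array like an outer selection.
```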

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
344040250 https://github.com/pydata/xarray/pull/1528#issuecomment-344040250 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM0NDA0MDI1MA== shoyer 1217238 2017-11-13T20:02:03Z 2017-11-13T20:02:03Z MEMBER

@rabernat sorry for the churn here, but you are also probably going to need to update after the explicit indexing changes in #1705.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
335015485 https://github.com/pydata/xarray/pull/1528#issuecomment-335015485 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMzNTAxNTQ4NQ== shoyer 1217238 2017-10-08T15:46:36Z 2017-10-08T15:46:36Z MEMBER

For serializing attributes, the easiest fix is to call .item() on any numpy scalars (instances of np.generic) and .tolist() on any numpy arrays. For thoroughness this might be worth doing with a custom JSON encoder on the zarr side, but it would also be easy to do in the xarray wrapper.
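A minimal sketch of such an encoder (an editorial illustration; the class name is made up):

```python
import json
import numpy as np

class NumpyJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.generic):    # numpy scalar -> Python scalar
            return obj.item()
        if isinstance(obj, np.ndarray):    # numpy array -> nested lists
            return obj.tolist()
        return super().default(obj)

print(json.dumps({"scale_factor": np.float32(0.5), "valid_range": np.array([0, 100])},
                 cls=NumpyJSONEncoder))
```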

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
327900874 https://github.com/pydata/xarray/pull/1528#issuecomment-327900874 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMyNzkwMDg3NA== shoyer 1217238 2017-09-07T19:32:41Z 2017-09-07T19:32:41Z MEMBER

@rabernat indeed, the backend tests are not terribly well organized right now. Probably the place to start is to inherit from DatasetIOTestCases and TestCase and then implement create_store and roundtrip. DaskTest abuses the "backend" notation a little bit, but these lines cover the essentials: https://github.com/pydata/xarray/blob/98a05f11c6f38489c82e86c9e9df796e7fb65fd2/xarray/tests/test_backends.py#L1271-L1279

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
325742232 https://github.com/pydata/xarray/pull/1528#issuecomment-325742232 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMyNTc0MjIzMg== shoyer 1217238 2017-08-29T17:50:04Z 2017-08-29T17:50:04Z MEMBER

If we think there is an advantage to using the zarr native filters, that could be added via a future PR once we have the basic backend working.

The only advantage here would be for non-xarray users, who could use zarr to do this decoding/encoding automatically.

For what it's worth, the implementation of scale offsets in xarray looks basically equivalent to what's done in zarr. I don't think there's a performance difference either way.

A further rather big advantage in zarr that I'm not aware of in cdf/hdf (I may be wrong) is not just null values, but not having a given block be written to disc at all if it only contains null data.

If you use chunks, I believe HDF5/NetCDF4 do the same thing, e.g.,

```
In [10]: with h5py.File('one-chunk.h5') as f:
    ...:     f.create_dataset('foo', (100, 100), chunks=(100, 100))

In [11]: with h5py.File('many-chunk.h5') as f:
    ...:     f.create_dataset('foo', (100000, 100000), chunks=(100, 100))

In [12]: ls -l | grep chunk.h5
-rw-r--r--  1 shoyer  eng  1400 Aug 29 10:48 many-chunk.h5
-rw-r--r--  1 shoyer  eng  1400 Aug 29 10:48 one-chunk.h5
```

(Note the same file size.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
325723577 https://github.com/pydata/xarray/pull/1528#issuecomment-325723577 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMyNTcyMzU3Nw== shoyer 1217238 2017-08-29T16:43:58Z 2017-08-29T16:44:25Z MEMBER

Is the goal here to be able to round-trip the file, such that calling .to_netcdf() produces an identical file to the original source file?

Yes, exactly.

I don't understand how encoding interacts with attributes? When is something an attribute vs. an encoding (add_offset for example)?

Typically, we store things in encoding that are attributes on the underlying NetCDF file, but no longer make sense as descriptions of the decoded data. For example:

- On the file, add_offset is an attribute.
- If loaded with open_dataset(..., mask_and_scale=True), add_offset can be found in encoding, not attrs, because the data has already been offset.
- If loaded with open_dataset(..., mask_and_scale=False), add_offset will still be on attrs (the data has not been offset).
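For illustration (an editorial addition; the file example.nc and variable name t are hypothetical), assuming the on-disk variable carries an add_offset attribute:

```python
import xarray as xr

decoded = xr.open_dataset("example.nc", mask_and_scale=True)
print(decoded["t"].encoding.get("add_offset"))   # recorded in encoding
print("add_offset" in decoded["t"].attrs)        # False: the data is already offset

raw = xr.open_dataset("example.nc", mask_and_scale=False)
print(raw["t"].attrs.get("add_offset"))          # still a plain attribute
```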

How does xarray know whether the store automatically encodes / decodes the encodings vs. when it has to be done by xarray, e.g. by calling mask_and_scale

Currently, we assume that stores never do this, and always handle it ourselves. We might need a special exception for zarr and scale/offset encoding.

Does this mean that my ZarrStore should inherit from WritableCFDataStore instead of AbstractWritableDataStore?

Maybe, though again it will probably need slightly customized conventions for writing data (if we let zarr handling scale/offset encoding).

I don't yet understand how to make these elements work together properly, for example, do avoid applying the scale / offset function twice, as I mentioned above.

We have two options:

1. Handle it all in xarray via the machinery in conventions.py. Never pass the arguments to do scale/offset encoding to zarr (just save them as attributes).
2. Handle it all in zarr. We'll need special case logic to skip this part of encoding.

I think (2) would be the preferred way to do this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
325716892 https://github.com/pydata/xarray/pull/1528#issuecomment-325716892 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMyNTcxNjg5Mg== shoyer 1217238 2017-08-29T16:19:57Z 2017-08-29T16:19:57Z MEMBER

@rabernat I think this is #1531 -- require_pynio seems to have infected all our other requirements!

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
325525827 https://github.com/pydata/xarray/pull/1528#issuecomment-325525827 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDMyNTUyNTgyNw== shoyer 1217238 2017-08-29T01:14:05Z 2017-08-29T01:14:05Z MEMBER

What is "encoding" at the variable level? (I have never understood this part of xarray.) How should encoding be handled with zarr?

encoding keeps track of how variables are represented in a file (e.g., chunking schemes, _FillValue/add_offset/scale_factor, compression, time units), so we can reconstruct a netCDF file that looks almost exactly like the file we've read from disk. In the case of zarr, I guess we might include chunking, fill values, compressor options....

Should we encode / decode CF for zarr stores?

Yes, probably, if we want to handle netcdf conventions for times, fill values and scaling.

Do we want to always automatically align dask chunks with the underlying zarr chunks?

This would be nice! But it's also a bigger issue (will look for the number, I think it's already been opened).

What sort of public API should the zarr backend have? Should you be able to load zarr stores via open_dataset? Or do we need a new method? I think .to_zarr() would be quite useful.

Still need to think about this one.

zarr arrays are extensible along all axes. What does this imply for unlimited dimensions?

I guess we can ignore them (maybe add a warning?) -- they're not part of the zarr data model.

Is any autoclose logic needed? As far as I can tell, zarr objects don't need to be closed.

I don't think we need any autoclose logic at all -- zarr doesn't leave open files hanging around already.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);