
issue_comments


5,143 rows where author_association = "MEMBER" and user = 1217238 sorted by updated_at descending


issue >30

  • WIP: indexing with broadcasting 25
  • Explicit indexes in xarray's data-model (Future of MultiIndex) 24
  • support for units 23
  • Multidimensional groupby 19
  • WIP: Optional indexes (no more default coordinates given by range(n)) 18
  • Hooks for XArray operations 18
  • WIP: Zarr backend 17
  • open_mfdataset too many files 15
  • API design for pointwise indexing 14
  • xarray to and from iris 14
  • New function for applying vectorized functions for unlabeled arrays to xarray objects 14
  • CFTimeIndex 14
  • xarray.backends refactor 14
  • Fixes OS error arising from too many files open 13
  • Document the new __repr__ 13
  • implement interp() 13
  • segmentation fault with `open_mfdataset` 12
  • Integration with dask/distributed (xarray backend design) 12
  • html repr of xarray object (for the notebook) 12
  • allow passing coordinate names as x and y to plot methods 11
  • Feature/rolling 11
  • Add tensordot to dataarray class also add its test to test_dataarray 11
  • Allow concat() to drop/replace duplicate index labels? 11
  • Remove caching logic from xarray.Variable 11
  • Sortby 11
  • Added PNC backend to xarray 11
  • pd.Grouper support? 10
  • added to_dict function for xarray objects 10
  • Towards a (temporary?) workaround for datetime issues at the xarray-level 10
  • v0.10 Release 10
  • …

user 1

  • shoyer · 5,143

author_association 1

  • MEMBER · 5,143
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1572412059 https://github.com/pydata/xarray/pull/7880#issuecomment-1572412059 https://api.github.com/repos/pydata/xarray/issues/7880 IC_kwDOAMm_X85duRqb shoyer 1217238 2023-06-01T16:51:07Z 2023-06-01T17:10:49Z MEMBER

Given that this error is only raised when Python is shutting down, which is exactly a case in which we do not need to clean up open file objects, maybe we can remove the __del__ method instead?

Something like:

```python
import atexit

@atexit.register
def _remove_del_method():
    # We don't need to close unclosed files at program exit,
    # and may not be able to do so, because Python is cleaning up
    # imports.
    del CachingFileManager.__del__
```

(I have not tested this!)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  don't use `CacheFileManager.__del__` on interpreter shutdown 1730664352
1572350143 https://github.com/pydata/xarray/pull/7880#issuecomment-1572350143 https://api.github.com/repos/pydata/xarray/issues/7880 IC_kwDOAMm_X85duCi_ shoyer 1217238 2023-06-01T16:16:40Z 2023-06-01T16:16:40Z MEMBER

I agree that this seems very hard to test!

Have you verified that this fixes things at least on your machine?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  don't use `CacheFileManager.__del__` on interpreter shutdown 1730664352
1546951468 https://github.com/pydata/xarray/issues/5511#issuecomment-1546951468 https://api.github.com/repos/pydata/xarray/issues/5511 IC_kwDOAMm_X85cNJss shoyer 1217238 2023-05-14T17:17:56Z 2023-05-14T17:17:56Z MEMBER

If we can find cases where we know concurrent writes are unsafe, we can definitely start raising errors. Better to be safe than to suffer from silent data corruption!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending data to a dataset stored in Zarr format produce PermissonError or NaN values in the final result 927617256
1543042186 https://github.com/pydata/xarray/issues/7325#issuecomment-1543042186 https://api.github.com/repos/pydata/xarray/issues/7325 IC_kwDOAMm_X85b-PSK shoyer 1217238 2023-05-11T01:24:27Z 2023-05-11T01:24:27Z MEMBER

For anyone following along, I released a small package for reading TensorStore data into Xarray: https://github.com/google/xarray-tensorstore

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support reading Zarr data via TensorStore 1465287257
1530685353 https://github.com/pydata/xarray/issues/4001#issuecomment-1530685353 https://api.github.com/repos/pydata/xarray/issues/4001 IC_kwDOAMm_X85bPGep shoyer 1217238 2023-05-02T00:35:52Z 2023-05-02T00:35:52Z MEMBER

Can we delete the "Flexible indexes" meeting? It doesn't happen anymore.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [community] Bi-weekly community developers meeting 606530049
1526489103 https://github.com/pydata/xarray/issues/7764#issuecomment-1526489103 https://api.github.com/repos/pydata/xarray/issues/7764 IC_kwDOAMm_X85a_GAP shoyer 1217238 2023-04-27T21:15:23Z 2023-04-27T21:15:23Z MEMBER

Allowing for explicitly passing a function matching the einsum interface is certainly more flexible than a boolean or enum argument, so @TomNicholas's suggestion of `einsum_func=np.einsum` is the version I would suggest.

The overhead from optimizing contraction paths is probably very small relative to the overhead of Xarray in general, so I would support setting `optimize=True` by default in Xarray, and/or using opt-einsum automatically if it is installed. JAX always uses opt-einsum (opt-einsum is actually a hard dependency) and I have never heard any complaints.
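
For context, a quick sketch of how the proposed `einsum_func` argument could be exercised (the argument itself is the proposal under discussion, not an existing xarray parameter; `opt_einsum.contract` is a real drop-in replacement for `np.einsum`):

```python
import numpy as np
import opt_einsum

a = np.random.rand(10, 20)
b = np.random.rand(20, 30)

# opt_einsum.contract accepts the same subscripts-and-operands call
# signature as np.einsum, so either function could be passed through
assert np.allclose(
    np.einsum("ij,jk->ik", a, b),
    opt_einsum.contract("ij,jk->ik", a, b),
)
```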

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support opt_einsum in xr.dot 1672288892
1496912849 https://github.com/pydata/xarray/issues/6323#issuecomment-1496912849 https://api.github.com/repos/pydata/xarray/issues/6323 IC_kwDOAMm_X85ZORPR shoyer 1217238 2023-04-05T04:49:34Z 2023-04-05T04:49:34Z MEMBER

In the hypothetical invocation open_dataset(..., return_encoding=True), do you envision the returned encoding as being a separate returned object, or would it still be an attribute on the Dataset object?

My expectation was that this would be a separate object, e.g., dataset, encoding = xarray.open_dataset(..., return_encoding=True), where encoding is a dict providing the encoding on each variable, and which could be passed as the encoding argument into to_netcdf(). That said, I can see how keeping encoding as variable attributes could also be convenient.

"disable all encoding propagation by discarding encoding attributes once a Dataset has been modified" would be an intermediate step, on the route to removing encoding from Xarray's data model entirely entirely.

(As a side note, I would probably spell this as open_dataset_with_encoding rather than having a function with a variable return signature.)
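
A minimal sketch of the hypothetical API being discussed (neither `return_encoding` nor `open_dataset_with_encoding` exists in xarray today):

```python
import xarray

# hypothetical: open_dataset returns the dataset plus a dict mapping
# each variable name to its on-disk encoding
dataset, encoding = xarray.open_dataset("data.nc", return_encoding=True)

# ... modify dataset ...

# the captured encoding can be passed straight back when writing
dataset.to_netcdf("out.nc", encoding=encoding)
```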

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  propagation of `encoding` 1158378382
1464180874 https://github.com/pydata/xarray/issues/2227#issuecomment-1464180874 https://api.github.com/repos/pydata/xarray/issues/2227 IC_kwDOAMm_X85XRaCK shoyer 1217238 2023-03-10T18:04:23Z 2023-03-10T18:04:23Z MEMBER

@dschwoerer are you sure that you are actually calculating the same thing in both cases? What exactly do the values of slc[d] look like? I would test this on smaller inputs to verify. My guess is that you are inadvertently calculating something different, recalling that Xarray's broadcasting rules differ slightly from NumPy's.
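
To illustrate the broadcasting difference, a minimal, self-contained example (the arrays and names here are illustrative, not from the issue):

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(3), dims="x")
b = xr.DataArray(np.arange(4), dims="y")

# NumPy would raise for shapes (3,) and (4,); xarray broadcasts by
# dimension *name*, producing an outer-product-like (3, 4) result
print((a * b).dims, (a * b).shape)  # ('x', 'y') (3, 4)
```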

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of isel 331668890
1434932769 https://github.com/pydata/xarray/issues/4079#issuecomment-1434932769 https://api.github.com/repos/pydata/xarray/issues/4079 IC_kwDOAMm_X85Vh1Yh shoyer 1217238 2023-02-17T17:03:52Z 2023-02-17T17:03:52Z MEMBER

I agree, automatic dimension names only ever really made sense for interactive use cases, where a user could see and fix the default names.

It's a little late to change the default now to raising an error instead, but maybe we could add a warning?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unnamed dimensions 621078539
1414068565 https://github.com/pydata/xarray/issues/5081#issuecomment-1414068565 https://api.github.com/repos/pydata/xarray/issues/5081 IC_kwDOAMm_X85USPlV shoyer 1217238 2023-02-02T17:00:39Z 2023-02-02T17:00:39Z MEMBER

Is LazilyIndexedArray really a public API? I don't see it on the API docs page.

Personally I would not want to guarantee external stability/availability for this API in its current state.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Lazy indexing arrays as a stand-alone package 842436143
1412396693 https://github.com/pydata/xarray/pull/7496#issuecomment-1412396693 https://api.github.com/repos/pydata/xarray/issues/7496 IC_kwDOAMm_X85UL3aV shoyer 1217238 2023-02-01T17:00:21Z 2023-02-01T17:00:21Z MEMBER

I like open_zarr(...) because it's less typing than open_dataset(..., engine='zarr'). The automatic backend detection logic doesn't currently work for Zarr, and in every case it adds overhead, which could be significant in the case of remote storage backends like Zarr.

So personally I would rather go the other direction and add open_netcdf().

The inconsistency in the chunks argument is non-ideal, but that could be handled by a separate deprecation process.
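
For reference, the two spellings compared above (a sketch; note that the default for `chunks` differs between the two entry points, which is exactly the inconsistency mentioned):

```python
import xarray as xr

ds1 = xr.open_zarr("store.zarr")
# roughly equivalent, but spelled through the generic entry point:
ds2 = xr.open_dataset("store.zarr", engine="zarr", chunks="auto")
```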

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  deprecate open_zarr 1564661430
1378074559 https://github.com/pydata/xarray/pull/7418#issuecomment-1378074559 https://api.github.com/repos/pydata/xarray/issues/7418 IC_kwDOAMm_X85SI7-_ shoyer 1217238 2023-01-11T00:27:47Z 2023-01-11T00:27:47Z MEMBER

I agree, datatree is an important data structure for Xarray. My preferred way to do this would be to follow @rabernat's suggestion and to fork the code from the existing repo into the Xarray main codebase.

My main concern is that we should carefully evaluate the datatree API to make sure we won't want to change it soon. Once we bring it into Xarray, there will be a higher expectation that the interface will remain stable.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Import datatree in xarray? 1519552711
1366880017 https://github.com/pydata/xarray/issues/7404#issuecomment-1366880017 https://api.github.com/repos/pydata/xarray/issues/7404 IC_kwDOAMm_X85ReO8R shoyer 1217238 2022-12-28T19:46:07Z 2022-12-28T19:46:07Z MEMBER

If you care about memory usage, you should explicitly close files after you use them, e.g., by calling ds.close() or by using a context manager. Does that work for you?
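
For example, a minimal sketch (the file and variable names are placeholders):

```python
import xarray as xr

with xr.open_dataset("data.nc") as ds:
    result = ds["var"].mean().load()
# the underlying file handle is closed as soon as the block exits
```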

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Memory leak - xr.open_dataset() not releasing memory. 1512460818
1351908915 https://github.com/pydata/xarray/issues/7344#issuecomment-1351908915 https://api.github.com/repos/pydata/xarray/issues/7344 IC_kwDOAMm_X85QlH4z shoyer 1217238 2022-12-14T18:24:04Z 2022-12-14T18:24:04Z MEMBER

I think it's OK to still require bottleneck for ffill and bfill:

  1. There are no numerical concerns: these functions simply repeat numbers forward (or backwards).
  2. There is no good alternative to using a loop, and writing the loop in NumPy would be prohibitively slow (see the sketch below for the simple 1D case).
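
As an aside, the simple 1D case does have a well-known vectorized NumPy trick (sketched below, under the assumption of a plain float array); what it does not handle is the `limit` argument or axis-generic n-dimensional arrays, which is where bottleneck's compiled loop earns its keep:

```python
import numpy as np

def ffill_1d(a):
    # index of the most recent non-NaN position at or before each element
    idx = np.where(np.isnan(a), 0, np.arange(len(a)))
    np.maximum.accumulate(idx, out=idx)
    return a[idx]

print(ffill_1d(np.array([1.0, np.nan, np.nan, 4.0, np.nan])))
# [1. 1. 1. 4. 4.]
```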
{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Disable bottleneck by default? 1471685307
1345646743 https://github.com/pydata/xarray/pull/7368#issuecomment-1345646743 https://api.github.com/repos/pydata/xarray/issues/7368 IC_kwDOAMm_X85QNPCX shoyer 1217238 2022-12-11T20:17:15Z 2022-12-11T20:17:15Z MEMBER

I'm actually trying to merge IndexedCoordinates with Coordinates but I'm stuck: the latter is abstract and I don't really see how I could refactor it together with DatasetCoordinates and DataArrayCoordinates

Coordinates is abstract because in the (current) Xarray data model, it doesn't actually store any data -- coordinates are stored in private attributes of the original Dataset (._variables and ._coord_names) or DataArray (._coords). So Coordinates needs to serve as a proxy for the data.

In the long term, I think we should refactor Dataset/DataArray to actually store data (coordinate variables, indexes and dimension sizes) on Coordinates, but that's a bigger refactor.

For now, it's worth noting that the current Coordinates class isn't actually exposed in Xarray's public API, just the DatasetCoordinates and DataArrayCoordinates classes (and not even their constructors). So the intermediate step I would try is:

  1. Rename the current Coordinates base class to AbstractCoordinates.
  2. Rename your IndexedCoordinates class to Coordinates. Expose it in the public API. Make sure it can handle DatasetCoordinates and DataArrayCoordinates in the constructor.
  3. Maybe: use some Python magic to make DatasetCoordinates/DataArrayCoordinates subclasses of the new Coordinates. Or maybe make them actual subclasses, overriding many of the methods (including the constructor).
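
A rough sketch of the class hierarchy those steps would produce (hypothetical names and structure, not current xarray code):

```python
from collections.abc import Mapping


class AbstractCoordinates(Mapping):
    # formerly ``Coordinates``: a read-only proxy over coordinate data
    # stored elsewhere (on a Dataset or DataArray)
    ...


class Coordinates(AbstractCoordinates):
    # formerly ``IndexedCoordinates``: actually owns its variables and
    # indexes, and is safe to expose publicly
    def __init__(self, coords=None, indexes=None):
        ...


class DatasetCoordinates(Coordinates):
    ...


class DataArrayCoordinates(Coordinates):
    ...
```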

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose "Coordinates" as part of Xarray's public API 1485037066
1344968954 https://github.com/pydata/xarray/pull/7368#issuecomment-1344968954 https://api.github.com/repos/pydata/xarray/issues/7368 IC_kwDOAMm_X85QKpj6 shoyer 1217238 2022-12-10T01:37:35Z 2022-12-10T01:37:35Z MEMBER

Long term, do you think it would make sense to merge together Indexes, Coordinates and IndexedCoordinates? They are sort of all containers for the same thing.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose "Coordinates" as part of Xarray's public API 1485037066
1344944917 https://github.com/pydata/xarray/pull/7368#issuecomment-1344944917 https://api.github.com/repos/pydata/xarray/issues/7368 IC_kwDOAMm_X85QKjsV shoyer 1217238 2022-12-10T00:31:46Z 2022-12-10T00:31:46Z MEMBER

what do you think about the approach proposed here? I'd like to check that with you before going further with tests, docs, etc.

Generally this looks great to me!

  • How to avoid building any default index? It seems silly to add or use the indexes argument just for that purpose? We could address that later.

My suggestion would be:

  • coords passed as a dict: create default indexes
  • coords passed as IndexedCoordinates: do not create defaults

Alternatively to an IndexedCoordinates subclass I was wondering if we could reuse the Coordinates base class?

Yes, this makes more sense to me!

What if the Indexes class was a facade based on IndexedCoordinates instead of the other way around?

Yes, I also agree! This makes more sense.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose "Coordinates" as part of Xarray's public API 1485037066
1341296800 https://github.com/pydata/xarray/issues/6610#issuecomment-1341296800 https://api.github.com/repos/pydata/xarray/issues/6610 IC_kwDOAMm_X85P8pCg shoyer 1217238 2022-12-07T17:12:05Z 2022-12-07T17:12:05Z MEMBER

I also like the idea of creating specific Grouper objects for different types of selection, e.g., UniqueGrouper (the default), BinGrouper, TimeResampleGrouper, etc.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Update GroupBy constructor for grouping by multiple variables, dask arrays 1236174701
1338121102 https://github.com/pydata/xarray/issues/7350#issuecomment-1338121102 https://api.github.com/repos/pydata/xarray/issues/7350 IC_kwDOAMm_X85PwhuO shoyer 1217238 2022-12-05T20:23:46Z 2022-12-05T20:23:46Z MEMBER

IMO, it's not correctly implementing the rule as you phrased it. You said "still present", which isn't the case here since the coordinate wasn't present before.

Another way of describing the current behavior would be that xarray keeps around "every coordinate which could possibly still be valid," which is determined based upon dimension names.

The main challenge is that "Coordinate variables should not have their coordinates changed" doesn't really make sense in Xarray's data model. Only Dataset or DataArray objects have coordinates, which apply to the entire Dataset/DataArray.

Let me give an example of why we might want to keep scalar coordinates around. Consider a Dataset where lat and lon need to be represented as 2D arrays, along x and y dimensions. If we index out a single lat/lon point, i.e., ds.isel(x=0, y=0) it would have scalar coordinates "x", "y", "lat" and "lon." If we now convert any of these to a DataArray, arguably all the coordinates are still valid.
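
The example in the paragraph above, spelled out as a minimal sketch (with made-up values):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"data": (("x", "y"), np.zeros((2, 3)))},
    coords={
        "lat": (("x", "y"), np.random.rand(2, 3)),
        "lon": (("x", "y"), np.random.rand(2, 3)),
    },
)

point = ds.isel(x=0, y=0)
# "lat" and "lon" survive as scalar coordinates, and arguably remain
# valid on any DataArray extracted from this point
print(point["data"].coords)
```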

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Coordinate variable gains coordinate on subset 1473329967
1336304695 https://github.com/pydata/xarray/issues/7342#issuecomment-1336304695 https://api.github.com/repos/pydata/xarray/issues/7342 IC_kwDOAMm_X85PpmQ3 shoyer 1217238 2022-12-04T02:28:45Z 2022-12-04T02:28:45Z MEMBER

The "robust" part is really just a modification to how the limits for color scales are chosen, i.e., ignoring the bottom and top 2% of the dtaa from the color scale. So it sounds like what you're hoping for is separate per-column or per-row color scaling?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `xr.DataArray.plot.pcolormesh(robust="col/row")` 1471561942
1336302962 https://github.com/pydata/xarray/issues/7350#issuecomment-1336302962 https://api.github.com/repos/pydata/xarray/issues/7350 IC_kwDOAMm_X85Ppl1y shoyer 1217238 2022-12-04T02:16:25Z 2022-12-04T02:16:25Z MEMBER

This was an intentional design choice, back in the early days of Xarray.

The rule Xarray uses for choosing which coordinates to associate with a DataArray created from a Dataset or DataArray is "every coordinate whose dimensions are still present on the new DataArray." This includes scalar coordinates, which are always kept around (because their dimensions are always included).

What rule would you suggest instead? I agree that the behavior in this case "feels" wrong, but keep in mind that once time becomes a scalar coordinate, Xarray doesn't have any way of knowing that it used to have its own dimension.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Coordinate variable gains coordinate on subset 1473329967
1336299057 https://github.com/pydata/xarray/issues/7344#issuecomment-1336299057 https://api.github.com/repos/pydata/xarray/issues/7344 IC_kwDOAMm_X85Ppk4x shoyer 1217238 2022-12-04T01:55:34Z 2022-12-04T01:55:34Z MEMBER

The case where Bottleneck really makes a difference is moving window statistics, where it uses a smarter algorithm than our current NumPy implementation, which creates a moving window view.

Otherwise, I agree, it probably isn't worth the trouble.

That said -- we could also switch to smarter NumPy-based algorithms to implement most moving window calculations, e.g., using np.nancumsum for moving window means.
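
A sketch of the cumulative-sum idea for a NaN-skipping moving mean (illustrative only; window and edge handling would need to match bottleneck's semantics):

```python
import numpy as np

def moving_mean(a, window):
    # prepend 0 so that s[i] is the sum of the first i elements
    s = np.nancumsum(np.concatenate([[0.0], a]))
    n = np.cumsum(np.concatenate([[0], ~np.isnan(a)]))
    # mean over each full window, counting only non-NaN entries
    return (s[window:] - s[:-window]) / (n[window:] - n[:-window])

print(moving_mean(np.array([1.0, 2.0, np.nan, 4.0]), 2))
# [1.5 2.  4. ]
```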

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Disable bottleneck by default? 1471685307
1330164500 https://github.com/pydata/xarray/issues/7299#issuecomment-1330164500 https://api.github.com/repos/pydata/xarray/issues/7299 IC_kwDOAMm_X85PSLMU shoyer 1217238 2022-11-29T06:53:48Z 2022-11-29T06:53:48Z MEMBER

Difference between empty and non-empty arrays comes from different logic used for empty arrays in Variable._getitem_with_mask.

The problem itself can be solved by fixing dtypes.maybe_promote to return fill_value=np.float32('nan') instead of fill_value=np.nan on dtype('float32') input.

Thanks for the excellent report!

I agree, this sounds like a good fix to me.

I think something like the following would work: replace the return line of maybe_promote

```python
return np.dtype(dtype), fill_value
```

with

```python
dtype = np.dtype(dtype)
fill_value = dtype.type(fill_value)
return dtype, fill_value
```
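
The key detail is that `dtype.type(...)` produces a NumPy scalar of the matching precision (a quick demonstration, independent of xarray):

```python
import numpy as np

dtype = np.dtype("float32")
fill_value = dtype.type(np.nan)
print(type(fill_value))  # <class 'numpy.float32'>
print(fill_value.dtype)  # float32
```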

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.reindex of empty array changes dtype from float32 to float64 1455771929
1328156723 https://github.com/pydata/xarray/pull/7323#issuecomment-1328156723 https://api.github.com/repos/pydata/xarray/issues/7323 IC_kwDOAMm_X85PKhAz shoyer 1217238 2022-11-27T02:31:51Z 2022-11-27T02:31:51Z MEMBER

Use cases would be in any web service that would like to provide the final data values back to a user in JSON.

For what it's worth, I think your users will have a poor experience with encoded JSON data for very large arrays. It will be slow to compress and transfer this data.

In the long term, you would probably do better to transmit the data in some binary form (e.g., by calling tobytes() on the underlying np.ndarray objects, or by using Xarray's to_netcdf).
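
For instance, a sketch (where `ds` stands for any loaded Dataset and "var" is a placeholder name):

```python
import xarray as xr

ds = xr.open_dataset("data.nc")

raw = ds["var"].values.tobytes()  # raw binary buffer for one variable
nc_bytes = ds.to_netcdf()         # whole dataset as netCDF bytes when no path is given
```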

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  (Issue #7324) added functions that return data values in memory efficient manner 1465047346
1328156304 https://github.com/pydata/xarray/pull/7323#issuecomment-1328156304 https://api.github.com/repos/pydata/xarray/issues/7323 IC_kwDOAMm_X85PKg6Q shoyer 1217238 2022-11-27T02:27:07Z 2022-11-27T02:27:07Z MEMBER

Thanks for the report and the PR!

This really needs a "minimal complete verifiable" example (e.g., by creating and loading a Zarr array with random data) so others can verify the performance gains you reported: https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports https://stackoverflow.com/help/minimal-reproducible-example

To be honest, this fix looks a little funny to me, because NumPy's own implementation of tolist() is so similar. I would love to understand what is going on.

If you can reproduce the issue only using NumPy, it could also make sense to file this as an upstream bug report to NumPy. The NumPy maintainers are in a better position to debug tricky memory allocation issues involving NumPy.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  (Issue #7324) added functions that return data values in memory efficient manner 1465047346
1295283938 https://github.com/pydata/xarray/pull/7214#issuecomment-1295283938 https://api.github.com/repos/pydata/xarray/issues/7214 IC_kwDOAMm_X85NNHbi shoyer 1217238 2022-10-28T17:49:10Z 2022-10-28T17:49:10Z MEMBER

Explicitly providing indexes is an advanced user feature.

Agreed. However, xr.Dataset(coords={"x": pandas_midx}) is something that presumably a lot of users rely on (it is used extensively in Xarray's tests) and that we should really deprecate IMO. If we don't provide a convenient alternative, I expect many of those users will complain.

I agree -- we should support this for backwards compatibility (even if we deprecate it).

it's easier to explicitly manipulate indexes in the form of a dict

While generally I also prefer handling plain dict objects over custom dict-like objects, here I don't see many reasons for manipulating Xarray index objects independently of their coordinate variables. Indexes allows keeping them tied together, and it is already returned by .xindexes.

EDIT -- For more context: initially an Indexes object was almost equivalent to a Frozen(obj._indexes). In #5692 I tried hard and struggled to keep dealing with separate dicts of indexes and indexed variables, but in the end it made things much easier to encapsulate the variables in Indexes, which is also used internally in different places.

OK, this totally makes sense.

I don't love that it is possible to express invalid states in Xarray's data model. This motivated the creation of assert_internal_invariants, which is currently mostly a concern for Xarray's own developers, but when we expose the indexes argument, it will be easier for users to make the same sort of errors.

I wonder if we should consider the broader refactor of merging the Indexes and Coordinates objects, and expose the constructor as a public API. For clarity, I'll call it CoordinatesAndIndexes for now, but it could likely reuse the public name of Coordinates.

This would have a number of benefits:

  1. It's impossible to provide inconsistent coords and indexes, because there is no separate indexes argument.
  2. Likewise, it is impossible to create inconsistent coordinates and indexes on an existing Xarray object.
  3. All the logic for verifying consistent coords and indexes can go in one place, shared between Dataset/DataArray. (Yes, it would be annoying to refactor Dataset to merge in variables from CoordinatesAndIndexes rather than the current separate Dataset._variables)
  4. The public API also becomes clearer: if users want default indexes, they can pass a dict of variables into coords. If they want to copy indexes from another object, they can pass in a CoordinatesAndIndexes object (either from another Xarray object or constructed directly), as sketched below.
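
A sketch of what that usage could look like (CoordinatesAndIndexes is the placeholder name from this comment; nothing here exists in xarray today):

```python
import xarray as xr

# dict of variables: default indexes get created
ds1 = xr.Dataset({"data": ("x", [10, 20, 30])}, coords={"x": [1, 2, 3]})

# hypothetical: a CoordinatesAndIndexes object carries its indexes with
# it, so no separate ``indexes`` argument is needed and no defaults are
# built behind the user's back
coords = CoordinatesAndIndexes({"x": [1, 2, 3]}, indexes=...)
ds2 = xr.Dataset({"data": ("x", [10, 20, 30])}, coords=coords)
```
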
{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes directly to the DataArray and Dataset constructors 1422543378
1294262457 https://github.com/pydata/xarray/pull/7221#issuecomment-1294262457 https://api.github.com/repos/pydata/xarray/issues/7221 IC_kwDOAMm_X85NJOC5 shoyer 1217238 2022-10-28T00:27:22Z 2022-10-28T00:27:22Z MEMBER

I no longer remember why I added these checks, but I certainly did not expect to see this sort of performance penalty!

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove debugging slow assert statement 1423312198
1293909730 https://github.com/pydata/xarray/pull/7214#issuecomment-1293909730 https://api.github.com/repos/pydata/xarray/issues/7214 IC_kwDOAMm_X85NH37i shoyer 1217238 2022-10-27T18:28:40Z 2022-10-27T18:28:40Z MEMBER

I'm thinking of only accepting one or more instances of Indexes as indexes argument in the Dataset and DataArray constructors

I would lean against this, only because it's easier to explicitly manipulate indexes in the form of a dict than an xarray.Indexes object.

Explicitly providing indexes is an advanced user feature. I think it's OK to require users to do a bit more work in this case and to not necessarily do consistency checks (beyond verifying that the coordinate variables exist).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes directly to the DataArray and Dataset constructors 1422543378
1288188522 https://github.com/pydata/xarray/issues/7132#issuecomment-1288188522 https://api.github.com/repos/pydata/xarray/issues/7132 IC_kwDOAMm_X85MyDJq shoyer 1217238 2022-10-23T19:59:28Z 2022-10-23T19:59:28Z MEMBER

This is correct -- CFDatetimeCoder.encode is not lazy, even if the inputs are Dask arrays.

We would welcome contributions to fix this. This would entail making the encode method look similar to the decode method (using lazy_elemwise_func).

We would also need a fall-back method for determining appropriate time units without looking at the array values. Something like seconds since 1900-01-01T00:00:00 would probably be a reasonable choice.
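
A sketch of such a fall-back encoding, which picks fixed units instead of inspecting array values (illustrative only; the real implementation would live in xarray's CF coding machinery):

```python
import numpy as np

FALLBACK_UNITS = "seconds since 1900-01-01T00:00:00"
_REFERENCE = np.datetime64("1900-01-01T00:00:00", "ns")

def encode_datetime(times):
    # purely elementwise, so it could be wrapped with lazy_elemwise_func
    # and applied lazily to dask arrays
    return (times - _REFERENCE) / np.timedelta64(1, "s")

print(encode_datetime(np.array(["1900-01-01T00:01:00"], dtype="datetime64[ns]")))
# [60.]
```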

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Saving a DataArray of datetime objects as zarr is not a lazy operation despite compute=False 1397532790
1286421985 https://github.com/pydata/xarray/issues/6807#issuecomment-1286421985 https://api.github.com/repos/pydata/xarray/issues/6807 IC_kwDOAMm_X85MrT3h shoyer 1217238 2022-10-21T03:49:18Z 2022-10-21T03:49:18Z MEMBER

Cubed should define a concatenate function, so that should be OK

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Alternative parallel execution frameworks in xarray 1308715638
1278202565 https://github.com/pydata/xarray/pull/4879#issuecomment-1278202565 https://api.github.com/repos/pydata/xarray/issues/4879 IC_kwDOAMm_X85ML9LF shoyer 1217238 2022-10-13T21:34:05Z 2022-10-13T21:34:05Z MEMBER

I think we could fix this by marking CachingFileManager with typing.final
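
For reference, that would look something like this (a sketch; FileManager is the existing base class in xarray.backends):

```python
from typing import final

@final  # type checkers will now flag any attempt to subclass
class CachingFileManager(FileManager):
    ...
```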

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773
1269050790 https://github.com/pydata/xarray/pull/4879#issuecomment-1269050790 https://api.github.com/repos/pydata/xarray/issues/4879 IC_kwDOAMm_X85LpC2m shoyer 1217238 2022-10-05T22:27:28Z 2022-10-05T22:27:28Z MEMBER

Anyone want to review here? I think this should be ready to go in.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773
1268700309 https://github.com/pydata/xarray/pull/4879#issuecomment-1268700309 https://api.github.com/repos/pydata/xarray/issues/4879 IC_kwDOAMm_X85LntSV shoyer 1217238 2022-10-05T17:06:02Z 2022-10-05T17:57:19Z MEMBER

~~Actually maybe we don't need to keep files open after pickling... let me give this one more try.~~

Nevermind, this didn't work -- it still results in failing tests with dask-distributed on Windows.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773
1268684962 https://github.com/pydata/xarray/pull/4879#issuecomment-1268684962 https://api.github.com/repos/pydata/xarray/issues/4879 IC_kwDOAMm_X85Lnpii shoyer 1217238 2022-10-05T16:51:14Z 2022-10-05T16:51:14Z MEMBER

OK, after a bit more futzing, tests are passing and I think this is actually ready to go in. I ended up leaving in the reference counting after all -- I couldn't figure out another way to keep files open after a pickle round-trip.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773
1260250383 https://github.com/pydata/xarray/issues/6293#issuecomment-1260250383 https://api.github.com/repos/pydata/xarray/issues/6293 IC_kwDOAMm_X85LHeUP shoyer 1217238 2022-09-28T00:49:26Z 2022-09-28T00:49:26Z MEMBER

Yes yes -- the sooner we can get rid of MultiIndex special cases the better!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes: next steps 1148021907
1259823913 https://github.com/pydata/xarray/pull/4879#issuecomment-1259823913 https://api.github.com/repos/pydata/xarray/issues/4879 IC_kwDOAMm_X85LF2Mp shoyer 1217238 2022-09-27T17:26:06Z 2022-09-27T17:26:06Z MEMBER

I added @cjauvin's integration test, and verified that the fix works for the scipy and h5netcdf backends.

Unfortunately, it doesn't work yet for the netCDF4 backend. I don't think we can solve this in Xarray without fixes to netCDF4-Python or the netCDF-C library: https://github.com/Unidata/netcdf4-python/issues/1195

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cache files for different CachingFileManager objects separately 803068773
1249910951 https://github.com/pydata/xarray/issues/7045#issuecomment-1249910951 https://api.github.com/repos/pydata/xarray/issues/7045 IC_kwDOAMm_X85KgCCn shoyer 1217238 2022-09-16T22:26:36Z 2022-09-16T22:26:36Z MEMBER

As a concrete example, suppose we have two datasets:

  1. Hourly predictions for 10 days
  2. Daily observations for a month

```python
import numpy as np
import pandas as pd
import xarray

predictions = xarray.DataArray(
    np.random.RandomState(0).randn(24 * 10),
    {'time': pd.date_range('2022-01-01', '2022-01-11', freq='1h', closed='left')},
)
observations = xarray.DataArray(
    np.random.RandomState(1).randn(31),
    {'time': pd.date_range('2022-01-01', '2022-01-31', freq='24h')},
)
```

Today, if you compare these datasets, they automatically align:

```
>>> predictions - observations
<xarray.DataArray (time: 10)>
array([ 0.13970698,  2.88151104, -1.0857261 ,  2.21236931, -0.85490761,
        2.67796423,  0.63833301,  1.94923669, -0.35832191,  0.23234996])
Coordinates:
  * time     (time) datetime64[ns] 2022-01-01 2022-01-02 ... 2022-01-10
```

With this proposed change, you would get an error, e.g., something like:

```
>>> predictions - observations
ValueError: xarray objects are not aligned along dimension 'time':
array(['2022-01-01T00:00:00.000000000', '2022-01-02T00:00:00.000000000',
       '2022-01-03T00:00:00.000000000', '2022-01-04T00:00:00.000000000',
       '2022-01-05T00:00:00.000000000', '2022-01-06T00:00:00.000000000',
       '2022-01-07T00:00:00.000000000', '2022-01-08T00:00:00.000000000',
       '2022-01-09T00:00:00.000000000', '2022-01-10T00:00:00.000000000',
       '2022-01-11T00:00:00.000000000', '2022-01-12T00:00:00.000000000',
       '2022-01-13T00:00:00.000000000', '2022-01-14T00:00:00.000000000',
       '2022-01-15T00:00:00.000000000', '2022-01-16T00:00:00.000000000',
       '2022-01-17T00:00:00.000000000', '2022-01-18T00:00:00.000000000',
       '2022-01-19T00:00:00.000000000', '2022-01-20T00:00:00.000000000',
       '2022-01-21T00:00:00.000000000', '2022-01-22T00:00:00.000000000',
       '2022-01-23T00:00:00.000000000', '2022-01-24T00:00:00.000000000',
       '2022-01-25T00:00:00.000000000', '2022-01-26T00:00:00.000000000',
       '2022-01-27T00:00:00.000000000', '2022-01-28T00:00:00.000000000',
       '2022-01-29T00:00:00.000000000', '2022-01-30T00:00:00.000000000',
       '2022-01-31T00:00:00.000000000'], dtype='datetime64[ns]')
vs
array(['2022-01-01T00:00:00.000000000', '2022-01-01T01:00:00.000000000',
       '2022-01-01T02:00:00.000000000', ...,
       '2022-01-10T21:00:00.000000000', '2022-01-10T22:00:00.000000000',
       '2022-01-10T23:00:00.000000000'], dtype='datetime64[ns]')
```

Instead, you would need to manually align these objects, e.g., with xarray.align, reindex_like() or interp_like():

```python
predictions, observations = xarray.align(predictions, observations)
# or
observations = observations.reindex_like(predictions)
# or
predictions = predictions.interp_like(observations)
```

To (partially) simulate the effect of this change on a codebase today, you could write xarray.set_options(arithmetic_join='exact') -- but presumably it would also make sense to change Xarray's other alignment code (e.g., in concat and merge).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should Xarray stop doing automatic index-based alignment? 1376109308
1249601076 https://github.com/pydata/xarray/issues/7045#issuecomment-1249601076 https://api.github.com/repos/pydata/xarray/issues/7045 IC_kwDOAMm_X85Ke2Y0 shoyer 1217238 2022-09-16T17:16:52Z 2022-09-16T17:18:38Z MEMBER

IMO we could first align (hah) these choices to be the same:

the exact mode of automatic alignment (outer vs inner vs left join) depends on the specific operation.

The problem is that user expectations are actually rather different for different operations:

  • With data movement operations like xarray.merge, you expect to keep around all existing data -- so you want an outer join.
  • With in-place operations that modify an existing Dataset, e.g., by adding new variables, you don't expect the existing coordinates to change -- so you want a left join.
  • With compute-based operations (like arithmetic), you don't have an expectation that all existing data is unmodified, so keeping around a bunch of NaN values felt very wasteful -- hence the inner join.

What do you think of making the default FloatIndex use a reasonable (hard to define!) rtol for comparisons?

This would definitely be a step forward! However, it's a tricky nut to crack. We would both need a heuristic for defining rtol (some fraction of coordinate spacing?) and a method for deciding what the resulting coordinates should be (use values from the first object?).

Even then, automatic alignment is often problematic, e.g., imagine cases where a coordinate is defined in different units.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should Xarray stop doing automatic index-based alignment? 1376109308
1244918028 https://github.com/pydata/xarray/issues/7002#issuecomment-1244918028 https://api.github.com/repos/pydata/xarray/issues/7002 IC_kwDOAMm_X85KM_EM shoyer 1217238 2022-09-13T05:30:12Z 2022-09-13T05:30:12Z MEMBER

I like option (4). If a multi-coordinate index needs to care about order, it can implement that logic itself.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Custom indexes and coordinate (re)ordering 1364388790
1210976795 https://github.com/pydata/xarray/issues/6904#issuecomment-1210976795 https://api.github.com/repos/pydata/xarray/issues/6904 IC_kwDOAMm_X85ILgob shoyer 1217238 2022-08-10T16:43:36Z 2022-08-10T16:43:36Z MEMBER

You might look into different multiprocessing modes: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
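
If you do experiment with start methods, a generic sketch (worker and tasks are placeholders for your own code):

```python
import multiprocessing

def worker(task):
    # open the file *inside* the subprocess, so no HDF5/netCDF state
    # is inherited across a fork
    ...

if __name__ == "__main__":
    # "spawn" starts fresh interpreters instead of forking
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(4) as pool:
        results = pool.map(worker, ["task1", "task2"])
```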

It may also be that the NetCDF or HDF5 libraries were simply not written in a way that can support multi-processing. This would not surprise me.

BTW is there any advantage or difference in terms of cpu and memory consumption in opening the file only one or let it open by every process? I'm asking because I thought opening in every process was just plain stupid but it seems to perform exactly the same, so maybe I'm just creating a problem where there is none

I agree, maybe this isn't worth the trouble. I have not seen it done successfully before.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `sel` behaving randomly when applying to a dataset with multiprocessing 1333650265
1210255676 https://github.com/pydata/xarray/issues/6904#issuecomment-1210255676 https://api.github.com/repos/pydata/xarray/issues/6904 IC_kwDOAMm_X85IIwk8 shoyer 1217238 2022-08-10T07:10:41Z 2022-08-10T07:10:41Z MEMBER

Will that work in the same way if I still use process_map, which uses concurrent.futures under the hood?

Yes it should, as long as you're using multi-processing under the covers.

If you do multi-threading, then you would want to use threading.Lock(). But I believe we already apply a thread lock by default.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `sel` behaving randomly when applying to a dataset with multiprocessing 1333650265
1210233503 https://github.com/pydata/xarray/issues/6904#issuecomment-1210233503 https://api.github.com/repos/pydata/xarray/issues/6904 IC_kwDOAMm_X85IIrKf shoyer 1217238 2022-08-10T06:45:06Z 2022-08-10T06:45:06Z MEMBER

Can you try explicitly passing in a multiprocessing lock into the open_dataset() constructor? Something like:

```python
from multiprocessing import Lock

ds = xarray.open_dataset(file, lock=Lock())
```

(We automatically select appropriate locks if using Dask, but I'm not sure how we would do that more generally...)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `sel` behaving randomly when applying to a dataset with multiprocessing 1333650265
1210190649 https://github.com/pydata/xarray/issues/4285#issuecomment-1210190649 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85IIgs5 shoyer 1217238 2022-08-10T05:48:47Z 2022-08-10T05:48:47Z MEMBER

I am tempted to suggest that the right way to handle Awkward array is to treat "var" dimensions similarly to NumPy's structured dtypes, with shape only handling non-variable dimensions. The uniform dimensions are the only ones for which Xarray's API is going to work properly out of the box, and Awkward array probably already has the right tools for working with ragged dimensions.

Either way, I would definitely encourage figuring out some actual use-cases before building this out :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1204280307 https://github.com/pydata/xarray/pull/6874#issuecomment-1204280307 https://api.github.com/repos/pydata/xarray/issues/6874 IC_kwDOAMm_X85Hx9vz shoyer 1217238 2022-08-03T17:44:20Z 2022-08-03T17:44:20Z MEMBER

As I understand it, the main purpose here is to remove the Xarray lazy indexing classes.

Maybe call this get_duck_array(), just to be a little more descriptive?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid calling np.asarray on lazy indexing classes 1327380960
1200314984 https://github.com/pydata/xarray/issues/2304#issuecomment-1200314984 https://api.github.com/repos/pydata/xarray/issues/2304 IC_kwDOAMm_X85Hi1po shoyer 1217238 2022-07-30T23:55:04Z 2022-07-30T23:55:04Z MEMBER

the unpacked data should match the type of these attributes, which must both be of type float or both be of type double. An additional restriction in this case is that the variable containing the packed data must be of type byte, short or int. It is not advised to unpack an int into a float as there is a potential precision loss.

I find this ambiguous. Is float above referring to float16 or float32? Is double referring to float64?

Yes, I'm pretty sure "float" means single precision (np.float32), given that "double" certainly means double precision (np.float64).

If so, then they do recommend float64, as requested by the OP, because the test data is short and the scale_factor is float64 (a.k.a double?)

Yes, I believe so.

The broader discussion here is about CF compliance. I find the spec ambiguous and xarray non-compliant. So many tests rely on the existing behavior, that I am unsure how best to proceed to improve compliance. I worry it may be a major refactor, and possibly break things relying on the existing behavior. I'd like to discuss architecture. Should this be in a new issue, if this closes with PR #6851? Should there be a new keyword for cf_strict or something?

I think we can treat this as a bug fix and just go forward with it. Yes, some people are going to be surprised, but I don't think it's disruptive enough that we need to go to a major effort to preserve backwards compatibility. It should already be straightforward to work around by setting decode_cf=False when opening a file and then explicitly calling xarray.decode_cf().
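
That workaround, spelled out (the file name is a placeholder):

```python
import xarray as xr

# keep the raw packed values, e.g., int16 plus scale_factor attributes
ds_raw = xr.open_dataset("file.nc", decode_cf=False)

# then decode explicitly, after adjusting attributes if needed
ds = xr.decode_cf(ds_raw)
```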

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray  343659822
1199939328 https://github.com/pydata/xarray/issues/6849#issuecomment-1199939328 https://api.github.com/repos/pydata/xarray/issues/6849 IC_kwDOAMm_X85HhZ8A shoyer 1217238 2022-07-29T20:56:05Z 2022-07-29T20:56:05Z MEMBER

I agree, I think only setting a few indexes at a time would be normal. If we eventually need convenience methods for setting multiple indexes we can add those later.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Public API for setting new indexes: add a set_xindex method? 1322198907
1199753281 https://github.com/pydata/xarray/issues/6849#issuecomment-1199753281 https://api.github.com/repos/pydata/xarray/issues/6849 IC_kwDOAMm_X85HgshB shoyer 1217238 2022-07-29T17:00:06Z 2022-07-29T17:00:06Z MEMBER

This sounds great to me!

I don't think we need support for setting multiple indexes at once in a single method call. You can call set_xindex multiple times for that if needed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Public API for setting new indexes: add a set_xindex method? 1322198907
1198375377 https://github.com/pydata/xarray/issues/6833#issuecomment-1198375377 https://api.github.com/repos/pydata/xarray/issues/6833 IC_kwDOAMm_X85HbcHR shoyer 1217238 2022-07-28T16:29:30Z 2022-07-28T16:29:30Z MEMBER

I just toggled the "Require a pull request before merging" option

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Require a pull request before merging to main 1318800553
1188520871 https://github.com/pydata/xarray/issues/6807#issuecomment-1188520871 https://api.github.com/repos/pydata/xarray/issues/6807 IC_kwDOAMm_X85G12On shoyer 1217238 2022-07-19T02:18:03Z 2022-07-19T02:18:03Z MEMBER

Sounds good to me. The challenge will be defining a parallel computing API that works across all these projects, with their slightly different models.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Alternative parallel execution frameworks in xarray 1308715638
1183458691 https://github.com/pydata/xarray/issues/6505#issuecomment-1183458691 https://api.github.com/repos/pydata/xarray/issues/6505 IC_kwDOAMm_X85GiiWD shoyer 1217238 2022-07-13T16:51:09Z 2022-07-13T16:51:31Z MEMBER

Reopening because my second example print(stacked.assign_coords(z=[1, 2, 3, 4])) is still broken with the same error message. It would be ideal to fix this before the release.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dropping a MultiIndex variable raises an error after explicit indexes refactor 1210267320
1176808719 https://github.com/pydata/xarray/issues/2697#issuecomment-1176808719 https://api.github.com/repos/pydata/xarray/issues/2697 IC_kwDOAMm_X85GJK0P shoyer 1217238 2022-07-06T22:21:48Z 2022-07-06T22:21:48Z MEMBER

Maybe a separate project in xarray-contrib would make sense?

I would be reluctant to add this into Xarray proper if we need a new external dependency for reading XML files.

On Wed, Jul 6, 2022 at 2:37 PM David Huard @.***> wrote:

I've got a first draft that parses an NcML document and spits out an xarray.Dataset. It does not cover all the NcML syntax, but the essential elements are there.

It uses xsdata https://xsdata.readthedocs.io/en/latest/ to parse the XML, using a datamodel automatically generated from the NcML 2-2 schema. I've scrapped test files from the netcdf-java https://github.com/Unidata/netcdf-java repo to create a test suite.

Wondering what's the best place to host the code, tests and test data so others may give it a spin ?


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  read ncml files to create multifile datasets 401874795
1165848854 https://github.com/pydata/xarray/pull/6721#issuecomment-1165848854 https://api.github.com/repos/pydata/xarray/issues/6721 IC_kwDOAMm_X85FfXEW shoyer 1217238 2022-06-24T18:57:42Z 2022-06-24T18:57:42Z MEMBER

The simplest option would probably be a custom Zarr store that raises an error if you try to look at array data. This could be implemented as a subclass of an existing Zarr store (e.g., the in-memory store) that raises an error in __getitem__ if the filename of the requested key does not start with `.`.
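
A minimal sketch of that idea (assuming a Zarr v2-style store, where any MutableMapping works as a store and metadata keys end in filenames like .zarray or .zattrs):

```python
class InaccessibleStore(dict):
    """Zarr store that allows metadata reads but forbids loading chunks."""

    def __getitem__(self, key):
        filename = key.rsplit("/", 1)[-1]
        if not filename.startswith("."):
            raise RuntimeError(f"tried to load array data from {key!r}")
        return super().__getitem__(key)
```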

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix .chunks loading lazy backed array data 1284071791
1165847538 https://github.com/pydata/xarray/pull/6721#issuecomment-1165847538 https://api.github.com/repos/pydata/xarray/issues/6721 IC_kwDOAMm_X85FfWvy shoyer 1217238 2022-06-24T18:55:51Z 2022-06-24T18:55:51Z MEMBER

We have some tests with InaccessibleVariableDataStore for this sort of thing, but I don't know immediately how to hook that into the Zarr backend.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix .chunks loading lazy backed array data 1284071791
1163345547 https://github.com/pydata/xarray/issues/6704#issuecomment-1163345547 https://api.github.com/repos/pydata/xarray/issues/6704 IC_kwDOAMm_X85FVz6L shoyer 1217238 2022-06-22T16:31:33Z 2022-06-22T16:31:33Z MEMBER

Dataset.rename does both variables and dimensions. That seems useful in many cases. I think it also makes more sense than Dataset.drop does, given that variables and dimensions often use the same names -- whereas drop mixed up variable names and index values.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Future of `DataArray.rename` 1275752720
1163299397 https://github.com/pydata/xarray/issues/6646#issuecomment-1163299397 https://api.github.com/repos/pydata/xarray/issues/6646 IC_kwDOAMm_X85FVopF shoyer 1217238 2022-06-22T15:57:14Z 2022-06-22T15:57:14Z MEMBER

NumPy mostly uses axis instead of axes, which we could copy.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `dim` vs `dims` 1250939008
1163296444 https://github.com/pydata/xarray/issues/6646#issuecomment-1163296444 https://api.github.com/repos/pydata/xarray/issues/6646 IC_kwDOAMm_X85FVn68 shoyer 1217238 2022-06-22T15:55:13Z 2022-06-22T15:56:35Z MEMBER

It would be helpful to understand if there are also other uses of dim/dims that are inconsistent. Which is the most common pattern?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `dim` vs `dims` 1250939008
1163292851 https://github.com/pydata/xarray/issues/6704#issuecomment-1163292851 https://api.github.com/repos/pydata/xarray/issues/6704 IC_kwDOAMm_X85FVnCz shoyer 1217238 2022-06-22T15:52:12Z 2022-06-22T15:52:12Z MEMBER

Should we call it rename_vars or rename_coords?

The latter might make more sense, but then it wouldn't mirror Dataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Future of `DataArray.rename` 1275752720
1150280375 https://github.com/pydata/xarray/issues/644#issuecomment-1150280375 https://api.github.com/repos/pydata/xarray/issues/644 IC_kwDOAMm_X85Ej-K3 shoyer 1217238 2022-06-08T18:56:17Z 2022-06-08T18:56:17Z MEMBER

This might fit more naturally into interp() as a new method like "nearest-valid" rather than in sel().

The difference is that sel() only looks at indexes (and not the data) to select out a single value, whereas interp() can combine adjacent values in arbitrary ways.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature request: only allow nearest-neighbor .sel for valid data (not NaN positions) 114773593
1146873595 https://github.com/pydata/xarray/issues/6524#issuecomment-1146873595 https://api.github.com/repos/pydata/xarray/issues/6524 IC_kwDOAMm_X85EW-b7 shoyer 1217238 2022-06-05T19:54:47Z 2022-06-05T19:54:47Z MEMBER

error: "ndarray[Any, dtype[Any]]" has no attribute "rename"

Yes, it's worth discussing. I don't know if there will be a satisfying resolution, though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  NumPy `__array_ufunc__` does not work with typing 1217425815
1137839614 https://github.com/pydata/xarray/issues/6633#issuecomment-1137839614 https://api.github.com/repos/pydata/xarray/issues/6633 IC_kwDOAMm_X85D0g3- shoyer 1217238 2022-05-25T20:55:14Z 2022-05-25T20:55:14Z MEMBER

Looking at this mur-sst dataset in particular, it stores time in chunks of size 5. That means fetching the 6443 time values requires 1289 separate HTTP requests -- no wonder it's so slow! If the time axis were instead stored in a single chunk of 51 KB, Xarray would only need 3 small HTTP requests to load the lat, lon and time indexes, which would probably complete in a fraction of a second.

That said, I agree that this would be nice to have in general.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening dataset without loading any indexes? 1247010680
1137754031 https://github.com/pydata/xarray/issues/6633#issuecomment-1137754031 https://api.github.com/repos/pydata/xarray/issues/6633 IC_kwDOAMm_X85D0L-v shoyer 1217238 2022-05-25T19:12:40Z 2022-05-25T19:12:40Z MEMBER

but another option (post explicit index refactor) might be an option for opening a dataset without creating indexes for 1D coordinates along dimensions.

It might indeed be worth considering this case too in #6392. Maybe indexes=None (default) to create default indexes for 1D coordinates and indexes={} (empty dictionary) to explicitly skip creating indexes?

+1 this syntax makes sense to me!
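
In code, the proposal reads (hypothetical; an indexes argument to open_dataset does not exist yet):

```python
import xarray as xr

ds = xr.open_dataset("store.zarr")              # indexes=None: build default indexes
ds = xr.open_dataset("store.zarr", indexes={})  # skip creating any indexes at all
```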

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening dataset without loading any indexes? 1247010680
1137661171 https://github.com/pydata/xarray/pull/6475#issuecomment-1137661171 https://api.github.com/repos/pydata/xarray/issues/6475 IC_kwDOAMm_X85Dz1Tz shoyer 1217238 2022-05-25T18:10:21Z 2022-05-25T18:10:21Z MEMBER

One issue with relying only on Array and Group as currently implemented in Zarr-Python is that we can create array nodes outside of any group subfolder. e.g. one can currently create an Array directly at path 'array1' and this would put the chunks under 'data/root/array1/', and metadata at 'meta/root/array1.array.json'. However, the root itself is not a Group. A group is basically a subfolder under root (e.g.' open_group with path = group1 creates '/meta/root/group1/' folder and 'meta/root/group1.group.json' metadata). There is no mechanism in the spec to open root directly as a Group!

is there an issue on the Zarr side where this is currently being discussed?

I opened up https://github.com/zarr-developers/zarr-python/issues/1039

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement Zarr v3 spec support 1200581329
1137572812 https://github.com/pydata/xarray/issues/6633#issuecomment-1137572812 https://api.github.com/repos/pydata/xarray/issues/6633 IC_kwDOAMm_X85DzfvM shoyer 1217238 2022-05-25T17:10:04Z 2022-05-25T17:10:04Z MEMBER

Early versions of Xarray used to have lazy loading of data for indexes, but we removed this for the sake of simplicity. In principle we could restore lazy indexes, but another option (post explicit index refactor) might be to open a dataset without creating indexes for 1D coordinates along dimensions.

Another way to solve this sort of challenge might be to load index data in parallel when using Dask. Right now I believe the data corresponding to indexes is always loaded eagerly, without using Dask.

All that said -- do you have a specific example where this has been problematic? In my experience it has been pretty reasonable to use xarray.Dataset objects for schema-like templates, even with index data needing to be loaded eagerly. Possibly another Zarr chunking scheme for your index data could be more efficient?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening dataset without loading any indexes? 1247010680
1126587818 https://github.com/pydata/xarray/issues/6607#issuecomment-1126587818 https://api.github.com/repos/pydata/xarray/issues/6607 IC_kwDOAMm_X85DJl2q shoyer 1217238 2022-05-14T00:10:13Z 2022-05-14T00:10:13Z MEMBER

We could raise an error asking the user to switch to swap_dims.

This seems like a good idea

In the long term, we'd like to decouple indexes from coordinates, and make something like the following work: ds.set_coords(['lon']).rename(x='lon').set_index('lon')

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Coordinate promotion workaround broken 1235725650
1126255398 https://github.com/pydata/xarray/pull/5734#issuecomment-1126255398 https://api.github.com/repos/pydata/xarray/issues/5734 IC_kwDOAMm_X85DIUsm shoyer 1217238 2022-05-13T16:51:24Z 2022-05-13T16:51:24Z MEMBER

👍 this looks great to me!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Enable `flox` in `GroupBy` and `resample` 978356586
1124302215 https://github.com/pydata/xarray/pull/6566#issuecomment-1124302215 https://api.github.com/repos/pydata/xarray/issues/6566 IC_kwDOAMm_X85DA32H shoyer 1217238 2022-05-11T21:15:36Z 2022-05-11T21:15:36Z MEMBER

For whatever reason, Windows seems to be much stricter about requiring file handles to be explicitly closed. So my guess is that this could be solved by using open_dataset() as a context manager.
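
A minimal sketch of that pattern (the filename is a placeholder):

```python
import xarray as xr

with xr.open_dataset('example.nc') as ds:
    ds.load()  # pull everything needed into memory before the file closes
# The file handle is closed deterministically here, which Windows requires
# before the file can be deleted or overwritten.
```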

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New inline_array kwarg for open_dataset 1223270563
1116397246 https://github.com/pydata/xarray/issues/6517#issuecomment-1116397246 https://api.github.com/repos/pydata/xarray/issues/6517 IC_kwDOAMm_X85Cit6- shoyer 1217238 2022-05-03T18:09:42Z 2022-05-03T18:09:42Z MEMBER

I'm a little skeptical that it makes sense to add special case logic into Xarray in an attempt to keep NumPy's "OWNDATA" flag up to date. There are lots of places where we create views of data from existing arrays inside Xarray operations.

There are definitely cases where Xarray's internal operations do memory copies followed by views, which would also result in datasets with misleading "OWNDATA" flags if you look only at resulting datasets, e.g., DataArray.interp() which definitely does internal memory copies:

```
>>> y = xarray.DataArray([1, 2, 3], dims='x', coords={'x': [0, 1, 2]})
>>> y.interp(x=0.5).data.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
```

Overall, I just don't think this is a reliable way to trace memory allocation with NumPy. Maybe you could do better by also tracing back to source arrays with .base?
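
For example, a simple helper along those lines (this is a sketch, not an xarray API):

```python
import numpy as np

def owning_array(arr: np.ndarray) -> np.ndarray:
    """Follow .base until reaching the array that actually owns the memory."""
    while isinstance(arr.base, np.ndarray):
        arr = arr.base
    return arr

a = np.arange(10)
view = a[2:5]          # a view: OWNDATA is False
assert owning_array(view) is a
```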

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Loading from NetCDF creates unnecessary numpy.ndarray-views that clears the OWNDATA-flag 1216517115
1114173984 https://github.com/pydata/xarray/issues/1621#issuecomment-1114173984 https://api.github.com/repos/pydata/xarray/issues/1621 IC_kwDOAMm_X85CaPIg shoyer 1217238 2022-05-01T08:49:40Z 2022-05-01T08:49:40Z MEMBER

Still relevant!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Undesired decoding to timedelta64 (was: units of "seconds" translated to time coordinate) 264321376
1111813044 https://github.com/pydata/xarray/issues/6524#issuecomment-1111813044 https://api.github.com/repos/pydata/xarray/issues/6524 IC_kwDOAMm_X85CROu0 shoyer 1217238 2022-04-28T06:52:04Z 2022-04-28T06:52:04Z MEMBER

I think this would need to get updated on the NumPy side. Ideally NumPy ufuncs would be typed to check for __array_ufunc__. Something like:

```python
from typing import Protocol, TypeVar

from numpy import ndarray

class HasArrayUFunc(Protocol):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): pass

ArrayOrHasArrayUFunc = TypeVar("ArrayOrHasArrayUFunc", ndarray, HasArrayUFunc)

def exp(x: ArrayOrHasArrayUFunc) -> ArrayOrHasArrayUFunc: ...
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  NumPy `__array_ufunc__` does not work with typing 1217425815
863427710 https://github.com/pydata/xarray/issues/2171#issuecomment-863427710 https://api.github.com/repos/pydata/xarray/issues/2171 MDEyOklzc3VlQ29tbWVudDg2MzQyNzcxMA== shoyer 1217238 2021-06-17T17:30:17Z 2022-04-19T03:15:24Z MEMBER

@gagebeni please open a new discussion for your issue: https://github.com/pydata/xarray/discussions

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support alignment/broadcasting with unlabeled dimensions of size 1 325439138
1100953736 https://github.com/pydata/xarray/issues/4267#issuecomment-1100953736 https://api.github.com/repos/pydata/xarray/issues/4267 IC_kwDOAMm_X85BnziI shoyer 1217238 2022-04-17T21:42:36Z 2022-04-17T21:42:36Z MEMBER

This is still relevant

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CachingFileManager should not use __del__ 665488672
1099788049 https://github.com/pydata/xarray/pull/6476#issuecomment-1099788049 https://api.github.com/repos/pydata/xarray/issues/6476 IC_kwDOAMm_X85BjW8R shoyer 1217238 2022-04-15T02:14:56Z 2022-04-15T02:14:56Z MEMBER

I will take a look soon!

On Thu, Apr 14, 2022 at 6:23 PM Maximilian Roos @.***> wrote:

Hi @cisaacstern https://github.com/cisaacstern — thanks a lot and welcome to xarray!

This looks very coherent, as far as the context I have. Any thoughts from others who know the area better?


{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Fix zarr append dtype checks 1200716594
1099309755 https://github.com/pydata/xarray/pull/6420#issuecomment-1099309755 https://api.github.com/repos/pydata/xarray/issues/6420 IC_kwDOAMm_X85BhiK7 shoyer 1217238 2022-04-14T15:36:14Z 2022-04-14T15:36:14Z MEMBER

Thanks @malmans2 !

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add support in the "zarr" backend for reading NCZarr data 1183534905
1099307673 https://github.com/pydata/xarray/pull/6475#issuecomment-1099307673 https://api.github.com/repos/pydata/xarray/issues/6475 IC_kwDOAMm_X85BhhqZ shoyer 1217238 2022-04-14T15:33:54Z 2022-04-14T15:33:54Z MEMBER

One issue with relying only on Array and Group as currently implemented in Zarr-Python is that we can create array nodes outside of any group subfolder. E.g. one can currently create an Array directly at path 'array1', and this would put the chunks under 'data/root/array1/' and the metadata at 'meta/root/array1.array.json'. However, the root itself is not a Group. A group is basically a subfolder under root (e.g. open_group with path='group1' creates the 'meta/root/group1/' folder and 'meta/root/group1.group.json' metadata). There is no mechanism in the spec to open root directly as a Group!

is there an issue on the Zarr side where this is currently being discussed?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement Zarr v3 spec support 1200581329
1098229361 https://github.com/pydata/xarray/pull/6475#issuecomment-1098229361 https://api.github.com/repos/pydata/xarray/issues/6475 IC_kwDOAMm_X85BdaZx shoyer 1217238 2022-04-13T16:04:23Z 2022-04-13T16:04:23Z MEMBER
  • The v3 spec requires a path be specified when calling open_group or open_consolidated. This PR currently just sets a default group name of 'xarray' if one is not specified via the group kwarg to ZarrStore.open_group. I think that is convenient, but one could instead be stricter and raise an error in this case.

Does Zarr v3 have a notion of a "root" group? That feels like a more sensible default to me, both for Xarray and Zarr-Python

  • If a string corresponding to a filesystem path or URL is used for store, then it is not possible to infer which version of the zarr spec is desired. In this case, the user must specify zarr_version to choose the zarr protocol version. The default of zarr_version=None will infer the version from a zarr BaseStore subclass when possible, otherwise defaulting to zarr_version=2 for backwards compatibility.

This sounds fine for now, but I am concerned that it will slow the adoption of Zarr v3. Eventually, we would presumably want to change the default to version 3, but this is difficult to do if it entirely breaks backwards compatibility.

My preference would be for the default behavior to try opening Zarr v2, and fall back to opening in v3 mode, even if this requires attempting to open a file from the store. This is similar to how Xarray handles other Zarr versioning issues (e.g., for consolidated metadata). Perhaps Zarr-Python could raise an informative error that we could catch if the Zarr version is incorrect, or even handle this behavior itself?
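
A hedged sketch of that fallback; zarr.open_group is a real zarr-python call, but the zarr_version keyword and the exact exception to catch are assumptions based on zarr-python's experimental v3 support:

```python
import zarr

def open_group_any_version(store):
    # Try the default (v2) layout first, then fall back to v3. The broad
    # exception handling is an assumption, not a confirmed zarr-python contract.
    try:
        return zarr.open_group(store, mode='r')
    except Exception:
        return zarr.open_group(store, mode='r', zarr_version=3)
```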

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement Zarr v3 spec support 1200581329
1094123521 https://github.com/pydata/xarray/pull/6420#issuecomment-1094123521 https://api.github.com/repos/pydata/xarray/issues/6420 IC_kwDOAMm_X85BNwAB shoyer 1217238 2022-04-09T21:00:04Z 2022-04-09T21:00:04Z MEMBER

Could you also add brief updates to mention NCZarr support in the docstring for open_zarr and the user guide? In particular this paragraph should be updated:

Xarray can’t open just any zarr dataset, because xarray requires special metadata (attributes) describing the dataset dimensions and coordinates. At this time, xarray can only open zarr datasets that have been written by xarray. For implementation details, see Zarr Encoding Specification.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add support in the "zarr" backend for reading NCZarr data 1183534905
1090499559 https://github.com/pydata/xarray/issues/6374#issuecomment-1090499559 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85A_7Pn shoyer 1217238 2022-04-06T17:04:26Z 2022-04-06T17:04:26Z MEMBER

As it is currently it is also not possible to write a zarr which follows the GDAL ZARR driver conventions. Writing the _CRS attribute also results in a TypeError:

Can you elaborate? What API are you using to do the write: python, netcdf-c, or what?

This error message comes from Xarray and can be triggered by calling to_zarr(): https://github.com/pydata/xarray/blob/facafac359c39c3e940391a3829869b4a3df5d70/xarray/backends/api.py#L162

I don't think netCDF-C needs to be involved at all, which is why I suggested opening a separate issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1090464275 https://github.com/pydata/xarray/issues/6374#issuecomment-1090464275 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85A_yoT shoyer 1217238 2022-04-06T16:25:40Z 2022-04-06T16:25:40Z MEMBER

@wankoelias could you kindly open a new issue for writing GDAL ZARR?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1078695337 https://github.com/pydata/xarray/issues/2233#issuecomment-1078695337 https://api.github.com/repos/pydata/xarray/issues/2233 IC_kwDOAMm_X85AS5Wp shoyer 1217238 2022-03-25T06:20:10Z 2022-03-25T06:20:10Z MEMBER

This is the second follow-up item in https://github.com/pydata/xarray/issues/6293

I think we could definitely experiment with relaxing this constraint now, although ideally we would continue to check off auditing all of the methods in that long list first.

{
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problem opening unstructured grid ocean forecasts with 4D vertical coordinates 332471780
1077253534 https://github.com/pydata/xarray/issues/6408#issuecomment-1077253534 https://api.github.com/repos/pydata/xarray/issues/6408 IC_kwDOAMm_X85ANZWe shoyer 1217238 2022-03-24T05:53:56Z 2022-03-24T05:53:56Z MEMBER

I think this is probably fine without a deprecation cycle. This is a very easy fix for users.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  backwards incompatible changes in reductions 1178949620
1076796582 https://github.com/pydata/xarray/issues/6374#issuecomment-1076796582 https://api.github.com/repos/pydata/xarray/issues/6374 IC_kwDOAMm_X85ALpym shoyer 1217238 2022-03-23T20:38:12Z 2022-03-23T20:38:12Z MEMBER

@DennisHeimbigner I think it would be great to standardize NCZarr as a super-set of the "Xarray-Zarr" standard! I think Xarray should indeed be able to read such files. If you want to read a sub-group, you can read the sub-group in a separate call to xarray.open_zarr().

@rabernat I would not be opposed to adding support inside Xarray for reading NCZarr data, specifically to understand NCZarr's encoding of dimension names when using Zarr-Python. This wouldn't give 100% compatibility with NCZarr, but it would be very close (maybe just with incorrect dtypes for attributes) with a minimal amount of work. I don't think it would be a big deal to look for .nczvar files.

{
    "total_count": 3,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
  Should the zarr backend support NCZarr conventions? 1172229856
1071104882 https://github.com/pydata/xarray/pull/5692#issuecomment-1071104882 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84_18Ny shoyer 1217238 2022-03-17T17:12:07Z 2022-03-17T17:12:07Z MEMBER

OK, in it goes! Big thanks to @benbovy for seeing this through :)

{
    "total_count": 24,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 13,
    "confused": 0,
    "heart": 1,
    "rocket": 10,
    "eyes": 0
}
  Explicit indexes 966983801
1069344000 https://github.com/pydata/xarray/pull/5692#issuecomment-1069344000 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84_vOUA shoyer 1217238 2022-03-16T16:47:45Z 2022-03-16T16:47:45Z MEMBER

OK, I think we’re good to go here?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1065381346 https://github.com/pydata/xarray/issues/6345#issuecomment-1065381346 https://api.github.com/repos/pydata/xarray/issues/6345 IC_kwDOAMm_X84_gG3i shoyer 1217238 2022-03-11T18:38:42Z 2022-03-11T18:38:42Z MEMBER

The data type restriction here seems to date back to the original PR adding support for appending. I turned up this comment that seems to summarize the motivation for this check: https://github.com/pydata/xarray/pull/2706#issuecomment-502481584

I think the original issue was that appending a fixed-width string could be a problem if the fixed-width does not match the width of the existing string dtype stored in Zarr.

This obviously doesn't apply in this case, because you are adding an entirely new variable. So I guess the check could be removed in that case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `to_zarr` raises `ValueError: Invalid dtype` with `mode='a'` (but not with `mode='w'`) 1164454058
1062211273 https://github.com/pydata/xarray/issues/1613#issuecomment-1062211273 https://api.github.com/repos/pydata/xarray/issues/1613 IC_kwDOAMm_X84_UA7J shoyer 1217238 2022-03-08T21:09:05Z 2022-03-08T21:09:05Z MEMBER

Another challenge with changing the meaning of slice is handling partial slices, e.g., what does slice(500, None) mean? With a monotonic decreasing index, that would select values below 500, but ignoring underlying coordinate order it would presumably mean selecting values above 500.

I think the separate new API (e.g., xarray.Between or .sel_between()) is probably a better idea.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should sel with slice objects care about underlying coordinate order? 263403430
1059662347 https://github.com/pydata/xarray/pull/5692#issuecomment-1059662347 https://api.github.com/repos/pydata/xarray/issues/5692 IC_kwDOAMm_X84_KSoL shoyer 1217238 2022-03-05T03:05:36Z 2022-03-05T03:05:36Z MEMBER

I would like to merge this PR very soon so it can get testing before the next release. If anyone has any remaining concerns, please speak up!

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes 966983801
1059546596 https://github.com/pydata/xarray/issues/1460#issuecomment-1059546596 https://api.github.com/repos/pydata/xarray/issues/1460 IC_kwDOAMm_X84_J2Xk shoyer 1217238 2022-03-04T21:31:41Z 2022-03-04T21:31:41Z MEMBER

Well, even if we keep squeeze as an option, I think squeeze=False would be much more consistent default behavior :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby should still squeeze for non-monotonic inputs 237008177
1058366320 https://github.com/pydata/xarray/issues/1613#issuecomment-1058366320 https://api.github.com/repos/pydata/xarray/issues/1613 IC_kwDOAMm_X84_FWNw shoyer 1217238 2022-03-03T18:39:59Z 2022-03-03T18:39:59Z MEMBER

One complication with using sel() with slice objects is that you can do selection over non-monotonic indexes, merely based on matching bounds:

```
>>> data = xarray.DataArray([1, 2, 3, 4, 5], dims=['x'], coords=[[5, 1, 4, 3, 2]])
>>> data
<xarray.DataArray (x: 5)>
array([1, 2, 3, 4, 5])
Coordinates:
  * x        (x) int64 5 1 4 3 2
>>> data.sel(x=slice(1, 3))
<xarray.DataArray (x: 3)>
array([2, 3, 4])
Coordinates:
  * x        (x) int64 1 4 3
```

If we change the semantics of slice in sel() to do filtering rather than be concerned about order (which does seem much less useful), we should probably deprecate the handling of non-monotonic ascending or descending indexes.

Alternatively, we could either add a dedicated indexing object like xarray.Between(lower, upper) or add a dedicated method for selecting between values, e.g., perhaps data.sel_between(x=(1, 3)) or data.sel_bounds(x=(1, 3)).
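
A rough sketch of what such a method could do, implemented here with a boolean mask (the name sel_between mirrors the hypothetical API above):

```python
import xarray as xr

def sel_between(obj, dim, lower, upper):
    # Select values whose coordinate falls in [lower, upper], regardless of
    # whether the index is sorted ascending, descending, or not at all.
    coord = obj[dim]
    return obj.where((coord >= lower) & (coord <= upper), drop=True)

data = xr.DataArray([1, 2, 3, 4, 5], dims=['x'], coords=[[5, 1, 4, 3, 2]])
sel_between(data, 'x', 1, 3)  # keeps the values at x = 1, 3 and 2
```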

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
  Should sel with slice objects care about underlying coordinate order? 263403430
1058293194 https://github.com/pydata/xarray/issues/1613#issuecomment-1058293194 https://api.github.com/repos/pydata/xarray/issues/1613 IC_kwDOAMm_X84_FEXK shoyer 1217238 2022-03-03T17:23:09Z 2022-03-03T17:23:09Z MEMBER

This is probably worth fixing if possible in a straightforward way. I don't think anyone is well served by matching the behavior of Python list indexing here -- it's a strange edge case that indexing a list like x[5:0] returns an empty list.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should sel with slice objects care about underlying coordinate order? 263403430
1057657161 https://github.com/pydata/xarray/issues/6176#issuecomment-1057657161 https://api.github.com/repos/pydata/xarray/issues/6176 IC_kwDOAMm_X84_CpFJ shoyer 1217238 2022-03-03T04:32:10Z 2022-03-03T04:32:10Z MEMBER

Breaking changes will continue to be very rare, and whenever possible will be preceded by deprecation or future warnings for multiple months.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Xarray versioning to switch to CalVer 1108564253
1051297372 https://github.com/pydata/xarray/issues/6304#issuecomment-1051297372 https://api.github.com/repos/pydata/xarray/issues/6304 IC_kwDOAMm_X84-qYZc shoyer 1217238 2022-02-25T21:50:15Z 2022-02-25T21:50:15Z MEMBER

Adding a join argument sounds good to me. I do not remember why the default is an outer join.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add join argument to xr.broadcast? 1150251120
1042660100 https://github.com/pydata/xarray/issues/4118#issuecomment-1042660100 https://api.github.com/repos/pydata/xarray/issues/4118 IC_kwDOAMm_X84-JbsE shoyer 1217238 2022-02-17T07:45:24Z 2022-02-17T07:45:24Z MEMBER

One thing that came up in our discussion about this in the developer meeting today is that we could also pretty easily expose a "low level" API for IO using dictionaries of xarray.Variable objects. This intermediate representation could be useful for cleaning up data into a form suitable for conversion into Dataset objects.

On Wed, Feb 16, 2022 at 11:39 PM Alessandro Amici @.***> wrote:

@TomNicholas https://github.com/TomNicholas (cc @mraspaud https://github.com/mraspaud)

Do you have use cases which one of these designs could handle but the other couldn't?

The two main classes of on-disk formats that, I know of, which cannot be always represented in the "group is a Dataset" approach are:

  • in netCDF following the CF conventions for groups https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#groups, it is legal for an array to refer to a dimension or a coordinate in a different group and so arrays in the same group may have dimensions with the same name, but different size / coordinate values,
  • the current spec for the Next-generation file formats (NGFF) https://ngff.openmicroscopy.org for bio-imaging has all scales of the same 5D data in the same group.

I don't have an example at hand, but my impression is that satellite products that use HDF5 file format also place arrays with inconsistent dimensions / coordinates in the same group.


{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature Request: Hierarchical storage and processing in xarray 628719058
1035611864 https://github.com/pydata/xarray/issues/2186#issuecomment-1035611864 https://api.github.com/repos/pydata/xarray/issues/2186 IC_kwDOAMm_X849ui7Y shoyer 1217238 2022-02-10T22:49:40Z 2022-02-10T22:50:01Z MEMBER

For what it's worth, the recommended way to do this is to explicitly close the Dataset with ds.close() rather than using del ds.

Or with a context manager, e.g.:

```python
for num in range(100):
    with xr.open_dataset('data.{}.nc'.format(num)) as ds:
        # do some stuff, but NOT assigning any data in ds to new variables
        ...
```

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Memory leak while looping through a Dataset 326533369
1034196986 https://github.com/pydata/xarray/issues/6069#issuecomment-1034196986 https://api.github.com/repos/pydata/xarray/issues/6069 IC_kwDOAMm_X849pJf6 shoyer 1217238 2022-02-09T21:12:31Z 2022-02-09T21:12:31Z MEMBER

The reason why this isn't allowed is because it's ambiguous what to do with the other variables that are not restricted to the region (['cell', 'face', 'layer', 'max_cell_node', 'max_face_nodes', 'node', 'siglay'] in this case).

I can imagine quite a few different ways this behavior could be implemented:

  1. Ignore these variables entirely.
  2. Ignore variables if they also already exist, but write new ones.
  3. Write or overwrite both new and existing these variables.
  4. Write new variables. Ignore existing variables only if they already exist with the same values, and if not, raise an error.

I believe your proposal here (removing these checks from _validate_region) would achieve (3), but I'm not sure that's the best option.

(4) seems like perhaps the most user-friendly option, but checking existing variables can add significant overhead. When experimenting with adding region support to Xarray-Beam, I found many cases where it was easy to inadvertently make large parallel pipelines much slower by downloading existing variables.

The current solution is not to do any of these, and to force the user to make an explicit choice by dropping new variables, or writing them in a separate call to to_zarr. I think it would also be OK to let a user explicitly opt-in to one of these behaviors, but I don't think guessing what the user wants would be ideal.
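
For reference, a minimal sketch of that explicit choice (variable names are taken from the example above; the store path is hypothetical, and the store is assumed to already be initialized, e.g. via to_zarr(..., compute=False)):

```python
# Drop the variables that do not fall within the region, then write it:
region_ds = ds.isel(time=slice(0, 100)).drop_vars(['cell', 'face', 'layer'])
region_ds.to_zarr('store.zarr', region={'time': slice(0, 100)})
```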

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_zarr: region not recognised as dataset dimensions 1077079208
1032051447 https://github.com/pydata/xarray/issues/6230#issuecomment-1032051447 https://api.github.com/repos/pydata/xarray/issues/6230 IC_kwDOAMm_X849g9r3 shoyer 1217238 2022-02-07T23:40:48Z 2022-02-07T23:40:48Z MEMBER

In the long term (cc @benbovy) I think we would ideally split IndexVariable into two classes:

  1. FrozenVariable which is just an immutable Variable, and thus that can be safely used for coordinates that have indexes.
  2. PandasIndexArray which wraps pandas.Index objects to satisfy the np.ndarray interface. This is the object which could allow duck_array_ops.isin to use the pandas.Index.isin method.
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [PERFORMANCE]: `isin` on `CFTimeIndex`-backed `Coordinate` slow  1120583442
1031811347 https://github.com/pydata/xarray/issues/6230#issuecomment-1031811347 https://api.github.com/repos/pydata/xarray/issues/6230 IC_kwDOAMm_X849gDET shoyer 1217238 2022-02-07T19:01:54Z 2022-02-07T19:01:54Z MEMBER

Oh, I guess the challenge is that apply_ufunc operates on arrays, not indexes. I'm not entirely sure how to deal with this easily....

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [PERFORMANCE]: `isin` on `CFTimeIndex`-backed `Coordinate` slow  1120583442
1031810590 https://github.com/pydata/xarray/issues/6230#issuecomment-1031810590 https://api.github.com/repos/pydata/xarray/issues/6230 IC_kwDOAMm_X849gC4e shoyer 1217238 2022-02-07T19:01:08Z 2022-02-07T19:01:08Z MEMBER

Yes, I think replacing this with something like lambda x, y: x.isin(y) if isinstance(x, pd.Index) else np.isin(x, y) could work
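
Spelled out, that replacement might look like the following sketch (not the final patch):

```python
import numpy as np
import pandas as pd

def isin(element, test_elements):
    # Dispatch to the much faster pandas.Index.isin when handed an index
    # object; otherwise fall back to the generic NumPy implementation.
    if isinstance(element, pd.Index):
        return element.isin(test_elements)
    return np.isin(element, test_elements)
```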

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [PERFORMANCE]: `isin` on `CFTimeIndex`-backed `Coordinate` slow  1120583442
1028136906 https://github.com/pydata/xarray/issues/6174#issuecomment-1028136906 https://api.github.com/repos/pydata/xarray/issues/6174 IC_kwDOAMm_X849SB_K shoyer 1217238 2022-02-02T16:46:24Z 2022-02-02T17:20:50Z MEMBER

Have you seen xarray.save_mfdataset?

In principle, it was designed for exactly this sort of thing.
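
A minimal sketch of how it can be used (this assumes an existing Dataset ds with a time coordinate; paths are illustrative):

```python
import xarray as xr

# Split one dataset into per-year pieces and write them all in a single call:
years, datasets = zip(*ds.groupby('time.year'))
paths = ['out_%s.nc' % year for year in years]
xr.save_mfdataset(datasets, paths)
```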

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [FEATURE]: Read from/write to several NetCDF4 groups with a single file open/close operation 1108138101
1020635094 https://github.com/pydata/xarray/pull/6187#issuecomment-1020635094 https://api.github.com/repos/pydata/xarray/issues/6187 IC_kwDOAMm_X8481afW shoyer 1217238 2022-01-24T23:01:14Z 2022-01-24T23:01:14Z MEMBER

Let me ponder the linked issue. This was not an intentional feature for compute=False, so I'd like to be sure we can be committed to supporting it before we document it :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf: docstrings for compute parameter 1112365912
1011450955 https://github.com/pydata/xarray/issues/6084#issuecomment-1011450955 https://api.github.com/repos/pydata/xarray/issues/6084 IC_kwDOAMm_X848SYRL shoyer 1217238 2022-01-12T21:05:59Z 2022-01-12T21:05:59Z MEMBER

E.g., I think skipping this line would save some of the users in my original post a lot of time.

I don't think that line adds any measurable overhead. It's just telling dask to delay computation of a single function.

For sure this would be worth elaborating on in the Xarray docs! I wrote a little bit about this in the docs for Xarray-Beam: see "One recommended pattern" in https://xarray-beam.readthedocs.io/en/latest/read-write.html#writing-data-to-zarr
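
A condensed sketch of that recommended pattern (paths are placeholders, and ds is assumed to be a dask-backed Dataset):

```python
import dask

# Writing with compute=False initializes the store's metadata eagerly and
# returns a dask.delayed object representing the deferred array writes:
delayed = ds.to_zarr('out.zarr', compute=False)
dask.compute(delayed)  # or hand the delayed object to a distributed scheduler
```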

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Initialise zarr metadata without computing dask graph 1083621690

Next page

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);