

issue_comments


88 rows where author_association = "CONTRIBUTOR" and user = 1828519 sorted by updated_at descending




issue >30

  • Add CRS/projection information to xarray objects 19
  • New deep copy behavior in 2022.9.0 causes maximum recursion error 11
  • Multidimensional dask coordinates unexpectedly computed 5
  • Proposal: Update rasterio backend to store CRS/nodata information in standard locations. 4
  • Fix CRS being WKT instead of PROJ.4 4
  • Idea: functionally-derived non-dimensional coordinates 4
  • Opening fsspec s3 file twice results in invalid start byte 4
  • Fix 'to_masked_array' computing dask arrays twice 3
  • DataArray.unstack taking unreasonable amounts of memory 2
  • Drop support for Python 3.4 2
  • Inconsistent type conversion when doing numpy.sum gives different results 2
  • Anyone working on a to_tiff? Alternatively, how do you write an xarray to a geotiff? 2
  • Let's list all the netCDF files that xarray can't open 2
  • Confusing handling of NetCDF coordinates 2
  • ImplicitToExplicitIndexingAdapter being returned with dask unstable version 2
  • DeprecationWarning regarding use of distutils Version classes 2
  • Support **kwargs form in `.chunk()` 2
  • Unstable pandas causes CF datetime64 issues 2
  • The new NON_NANOSECOND_WARNING is not very nice to end users 2
  • Confusing error message when attribute not equal during concat 1
  • Does interp() work on curvilinear grids (2D coordinates) ? 1
  • Add python_requires to setup.py 1
  • [discussion] Use WKT or PROJ.4 string for CRS representation? 1
  • Segmentation fault reading many groups from many files 1
  • Accessors are recreated on every access 1
  • Allow weakref 1
  • Xarray operations produce read-only array 1
  • Threading Lock issue with to_netcdf and Dask arrays 1
  • Numeric scalar variable attributes (including fill_value, scale_factor, add_offset) are 1-d instead of 0-d with h5netcdf engine, triggering ValueError: non-broadcastable output on application when loading single elements 1
  • Flexible indexes refactoring notes 1
  • …

user 1

  • djhoese · 88

author_association 1

  • CONTRIBUTOR · 88
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1539073371 https://github.com/pydata/xarray/issues/7237#issuecomment-1539073371 https://api.github.com/repos/pydata/xarray/issues/7237 IC_kwDOAMm_X85bvGVb djhoese 1828519 2023-05-08T21:23:59Z 2023-05-08T21:23:59Z CONTRIBUTOR

And with new pandas (which I understand as being the thing/library that is changing) and new xarray, what will happen? What happens between nano and non-nano times?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  The new NON_NANOSECOND_WARNING is not very nice to end users 1428549868
1538397945 https://github.com/pydata/xarray/issues/7237#issuecomment-1538397945 https://api.github.com/repos/pydata/xarray/issues/7237 IC_kwDOAMm_X85bshb5 djhoese 1828519 2023-05-08T13:53:19Z 2023-05-08T13:53:19Z CONTRIBUTOR

Sorry for dragging this issue up again, but even with the new warning message I still have some questions. Do I have to switch to nanosecond precision times or will xarray/pandas/numpy just figure it out when I combine/compare times with different precisions?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  The new NON_NANOSECOND_WARNING is not very nice to end users 1428549868
1482105886 https://github.com/pydata/xarray/pull/7551#issuecomment-1482105886 https://api.github.com/repos/pydata/xarray/issues/7551 IC_kwDOAMm_X85YVyQe djhoese 1828519 2023-03-24T00:59:35Z 2023-03-24T00:59:35Z CONTRIBUTOR

Just curious, what is the status of this PR? Our library's tests contain conditions, keyed on the xarray version, that decide which compression arguments to pass to xarray. We had hoped this PR would be included in 2023.3.0.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for the new compression arguments. 1596511582
1288100778 https://github.com/pydata/xarray/issues/7197#issuecomment-1288100778 https://api.github.com/repos/pydata/xarray/issues/7197 IC_kwDOAMm_X85Mxtuq djhoese 1828519 2022-10-23T12:23:07Z 2022-10-23T12:23:07Z CONTRIBUTOR

Ugh how did I miss that issue. Thanks. I'm fine with closing this since the existing tests have caught it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unstable pandas causes CF datetime64 issues 1419602897
1287986325 https://github.com/pydata/xarray/issues/7197#issuecomment-1287986325 https://api.github.com/repos/pydata/xarray/issues/7197 IC_kwDOAMm_X85MxRyV djhoese 1828519 2022-10-23T02:41:14Z 2022-10-23T02:41:14Z CONTRIBUTOR

Ah, it turns out that calling to_netcdf on that DataArray:

```python
a.to_netcdf("/tmp/mytest.nc")
```

is enough to trigger the failure:

```
File ~/miniconda3/envs/satpy_py39_unstable/lib/python3.9/site-packages/xarray/core/dataarray.py:3752, in DataArray.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   3748 else:
   3749     # No problems with the name - so we're fine!
   3750     dataset = self.to_dataset()
-> 3752 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   3753     dataset,
   3754     path,
   3755     mode=mode,
   3756     format=format,
   3757     group=group,
   3758     engine=engine,
   3759     encoding=encoding,
   3760     unlimited_dims=unlimited_dims,
   3761     compute=compute,
   3762     multifile=False,
   3763     invalid_netcdf=invalid_netcdf,
   3764 )

File ~/miniconda3/envs/satpy_py39_unstable/lib/python3.9/site-packages/xarray/backends/api.py:1230, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1225 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1226 # to avoid this mess of conditionals
   1227 try:
   1228     # TODO: allow this work (setting up the file for writing array data)
   1229     # to be parallelized with dask
-> 1230     dump_to_store(
   1231         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1232     )
   1233     if autoclose:
   1234         store.close()

File ~/miniconda3/envs/satpy_py39_unstable/lib/python3.9/site-packages/xarray/backends/api.py:1277, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1274 if encoder:
   1275     variables, attrs = encoder(variables, attrs)
-> 1277 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/miniconda3/envs/satpy_py39_unstable/lib/python3.9/site-packages/xarray/backends/common.py:266, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    263 if writer is None:
    264     writer = ArrayWriter()
--> 266 variables, attributes = self.encode(variables, attributes)
    268 self.set_attributes(attributes)
    269 self.set_dimensions(variables, unlimited_dims=unlimited_dims)

File ~/miniconda3/envs/satpy_py39_unstable/lib/python3.9/site-packages/xarray/backends/common.py:355, in WritableCFDataStore.encode(self, variables, attributes)
    352 def encode(self, variables, attributes):
    353     # All NetCDF files get CF encoded by default, without this attempting
    354     # to write times, for example, would fail.
--> 355     variables, attributes = cf_encoder(variables, attributes)
    356     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    357     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}

File ~/miniconda3/envs/satpy_py39_unstable/lib/python3.9/site-packages/xarray/conventions.py:868, in cf_encoder(variables, attributes)
    865 # add encoding for time bounds variables if present.
    866 _update_bounds_encoding(variables)
--> 868 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    870 # Remove attrs from bounds variables (issue #2921)
    871 for var in new_vars.values():

File ~/miniconda3/envs/satpy_py39_unstable/lib/python3.9/site-packages/xarray/conventions.py:868, in <dictcomp>(.0)
    865 # add encoding for time bounds variables if present.
    866 _update_bounds_encoding(variables)
--> 868 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    870 # Remove attrs from bounds variables (issue #2921)
    871 for var in new_vars.values():

File ~/miniconda3/envs/satpy_py39_unstable/lib/python3.9/site-packages/xarray/conventions.py:273, in encode_cf_variable(var, needs_copy, name)
    264 ensure_not_multiindex(var, name=name)
    266 for coder in [
    267     times.CFDatetimeCoder(),
    268     times.CFTimedeltaCoder(),
    (...)
    271     variables.UnsignedIntegerCoder(),
    272 ]:
--> 273     var = coder.encode(var, name=name)
    275 # TODO(shoyer): convert all of these to use coders, too:
    276 var = maybe_encode_nonstring_dtype(var, name=name)

File ~/miniconda3/envs/satpy_py39_unstable/lib/python3.9/site-packages/xarray/coding/times.py:676, in CFDatetimeCoder.encode(self, variable, name)
    672 dims, data, attrs, encoding = unpack_for_encoding(variable)
    673 if np.issubdtype(data.dtype, np.datetime64) or contains_cftime_datetimes(
    674     variable
    675 ):
--> 676     (data, units, calendar) = encode_cf_datetime(
    677         data, encoding.pop("units", None), encoding.pop("calendar", None)
    678     )
    679     safe_setitem(attrs, "units", units, name=name)
    680     safe_setitem(attrs, "calendar", calendar, name=name)

File ~/miniconda3/envs/satpy_py39_unstable/lib/python3.9/site-packages/xarray/coding/times.py:612, in encode_cf_datetime(dates, units, calendar)
    609 dates = np.asarray(dates)
    611 if units is None:
--> 612     units = infer_datetime_units(dates)
    613 else:
    614     units = _cleanup_netcdf_time_units(units)

File ~/miniconda3/envs/satpy_py39_unstable/lib/python3.9/site-packages/xarray/coding/times.py:394, in infer_datetime_units(dates)
    392 print("Formatting datetime object")
    393 reference_date = dates[0] if len(dates) > 0 else "1970-01-01"
--> 394 reference_date = format_cftime_datetime(reference_date)
    395 unique_timedeltas = np.unique(np.diff(dates))
    396 units = _infer_time_units_from_diff(unique_timedeltas)

File ~/miniconda3/envs/satpy_py39_unstable/lib/python3.9/site-packages/xarray/coding/times.py:405, in format_cftime_datetime(date)
    400 def format_cftime_datetime(date):
    401     """Converts a cftime.datetime object to a string with the format:
    402     YYYY-MM-DD HH:MM:SS.UUUUUU
    403     """
    404     return "{:04d}-{:02d}-{:02d} {:02d}:{:02d}:{:02d}.{:06d}".format(
--> 405         date.year,
    406         date.month,
    407         date.day,
    408         date.hour,
    409         date.minute,
    410         date.second,
    411         date.microsecond,
    412     )

AttributeError: 'numpy.datetime64' object has no attribute 'year'
```

Note: although this clearly fails the check on this line:

https://github.com/pydata/xarray/blob/6cb97f645475bddf2f3b1e1a5f24f0f9de690683/xarray/coding/times.py#L384

it does not seem to be the only problem, based on my tests. Using datetime64 objects in various parts of a DataArray seems to cause different errors... I think. I need to do more tests.
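The traceback ends this way because numpy datetime64 scalars don't expose the year/month/day attributes that format_cftime_datetime expects from cftime (or stdlib datetime) objects. A minimal illustration of that mismatch, using only numpy and the standard library:

```python
import numpy as np

d64 = np.datetime64("2022-10-23T02:41:14")

# numpy datetime64 scalars have no datetime-style attributes,
# which is exactly what the AttributeError above complains about:
assert not hasattr(d64, "year")

# converting to a stdlib datetime.datetime recovers them:
dt = d64.item()
assert (dt.year, dt.month, dt.day) == (2022, 10, 23)
```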

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unstable pandas causes CF datetime64 issues 1419602897
1267531839 https://github.com/pydata/xarray/issues/7111#issuecomment-1267531839 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LjQA_ djhoese 1828519 2022-10-04T20:20:19Z 2022-10-04T20:20:19Z CONTRIBUTOR

We talked about this today in our pytroll/satpy meeting. We're not sure we agree with cf-xarray putting ancillary variables as coordinates or that it will work for us, so we think we could eventually remove any "automatic" ancillary variable loading and require that the user explicitly request any ancillary variables they want from Satpy's readers.

That said, this will take a lot of work to change. Since it seems like #7112 fixes a majority of our issues I'm hoping that that can still be merged. I'd hope that the memo logic when deepcopying will still protect against other recursive objects (possibly optimize?) even if they can't be directly serialized to NetCDF.

Side note: I feel like there is a difference between the NetCDF model and serializing/saving to a NetCDF file.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1266889779 https://github.com/pydata/xarray/issues/7111#issuecomment-1266889779 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LgzQz djhoese 1828519 2022-10-04T12:07:12Z 2022-10-04T12:07:37Z CONTRIBUTOR

@mraspaud See the cf-xarray link from Deepak. We could make them coordinates. Or we could reference them by name:

```python
ds = xr.open_dataset(...)
anc_name = ds["my_var"].attrs["ancillary_variables"][0]
anc_var = ds[anc_name]
```

Edit: Let's talk more in the pytroll meeting today.
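The "reference by name" idea can be sketched without xarray at all; here plain dicts stand in for Dataset/DataArray, and "dataset", "my_var", and "dqf" are made-up names for illustration. Storing a variable name in attrs keeps the objects decoupled, while storing the object itself is what creates cycles:

```python
# plain-dict sketch of "reference ancillary variables by name, not by object"
dataset = {
    "my_var": {"data": [1, 2, 3], "attrs": {"ancillary_variables": ["dqf"]}},
    "dqf":    {"data": [0, 0, 1], "attrs": {}},
}

# look the ancillary variable up by name when it is actually needed
anc_name = dataset["my_var"]["attrs"]["ancillary_variables"][0]
anc_var = dataset[anc_name]
assert anc_var["data"] == [0, 0, 1]
```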

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1265910157 https://github.com/pydata/xarray/issues/7111#issuecomment-1265910157 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LdEGN djhoese 1828519 2022-10-03T19:13:58Z 2022-10-03T19:13:58Z CONTRIBUTOR

@dcherian Thanks for the feedback. When these decisions were made in Satpy, xarray was not able to contain dask arrays as coordinates, and we depend heavily on dask for our use cases. Putting some of these datasets in as coordinates, as cf-xarray does, may have caused extra unnecessary loading/computation. I'm not sure that would be the case with modern xarray.

Note that ancillary_variables are not the only case of "embedded" DataArrays in our code. We also needed something for CRS + bounds or other geolocation information. As you know I'm very much interested in CRS and geolocation handling in xarray, but for backwards compatibility we also have pyresample AreaDefinition and SwathDefinition objects in our DataArray .attrs["area"] attributes. A SwathDefinition is able to contain two DataArray objects for longitude and latitude. These also get copied with this new deep copy behavior.

We have a monthly Pytroll/Satpy meeting tomorrow so if you have any other suggestions or points for or against our usage please comment here and we'll see what we can do.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1265792923 https://github.com/pydata/xarray/issues/7111#issuecomment-1265792923 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85Lcneb djhoese 1828519 2022-10-03T17:27:55Z 2022-10-03T17:27:55Z CONTRIBUTOR

@TomNicholas Do you mean the "name" of the sub-DataArray? Or the numpy/dask array of the sub-DataArray?

This is what I was trying to describe in https://github.com/pydata/xarray/issues/7111#issuecomment-1264386173. In Satpy we have our own Dataset-like/DataTree-like object where the user explicitly says "I want to load X from input files". As a convenience we put any ancillary variables (ex. data quality flags) in the DataArray .attrs for easier access. In Satpy there is no other direct connection between one DataArray and another. They are overall independent objects on a processing level so there may not be access to this higher-level Dataset-like container object in order to get ancillary variables by name.

@mraspaud was one of the original people who proposed our current design so maybe he can provide more context.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1264515251 https://github.com/pydata/xarray/issues/7111#issuecomment-1264515251 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LXviz djhoese 1828519 2022-10-02T00:25:36Z 2022-10-02T00:41:00Z CONTRIBUTOR

Sorry, false alarm. I was running with an old environment. With this new PR it seems the ancillary_variables tests that were failing now pass, but the dask .copy() related ones still fail...which is expected so I'm ok with that.

Edit: I hacked variable.py so it had this:

```python
if deep:
    if is_duck_dask_array(ndata):
        ndata = ndata
    else:
        ndata = copy.deepcopy(ndata, memo)
```

and that fixed a lot of my dask related tests, but also seems to have introduced two new failures from what I can tell. So :man_shrugging:
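The hack above boils down to "share lazy arrays instead of deep-copying them". A standalone sketch of that rule; FakeLazyArray and copy_data are made-up stand-ins, not xarray or dask API:

```python
import copy

class FakeLazyArray:
    """Stand-in for a dask array: cheap to share, pointless to deep-copy."""
    def __init__(self, name):
        self.name = name

def copy_data(ndata, deep, memo=None):
    # mirror the hacked variable.py logic: skip deep-copying lazy arrays,
    # deep-copy everything else
    if deep and not isinstance(ndata, FakeLazyArray):
        return copy.deepcopy(ndata, memo)
    return ndata

lazy = FakeLazyArray("x")
assert copy_data(lazy, deep=True) is lazy                # shared, not copied

nested = [1, [2, 3]]
copied = copy_data(nested, deep=True)
assert copied == nested and copied[1] is not nested[1]   # eager data still deep-copied
```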

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1264437857 https://github.com/pydata/xarray/issues/7111#issuecomment-1264437857 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LXcph djhoese 1828519 2022-10-01T18:01:16Z 2022-10-01T18:01:16Z CONTRIBUTOR

It looks like that PR fixes all of my Satpy unit tests. I'm not sure how that is possible if it doesn't also change when dask arrays are copied.

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 2,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1264388047 https://github.com/pydata/xarray/issues/7111#issuecomment-1264388047 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LXQfP djhoese 1828519 2022-10-01T14:56:45Z 2022-10-01T14:56:45Z CONTRIBUTOR

Also note the other important change in this new behavior: dask arrays are now copied (.copy()) when they weren't before. This is causing some equality issues for us in Satpy, but I agree with the change on xarray's side (xarray should be able to call .copy() on whatever array it has).

https://github.com/dask/dask/issues/9533

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1264386173 https://github.com/pydata/xarray/issues/7111#issuecomment-1264386173 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LXQB9 djhoese 1828519 2022-10-01T14:47:51Z 2022-10-01T14:47:51Z CONTRIBUTOR

I'm a little torn on this. Obviously I'm not an xarray maintainer so I'm not the one who would have to maintain it or answer support questions about it. We actually had the user-side of this discussion in the Satpy library group a while ago which is leading to this whole problem for us now. In Satpy we don't typically use or deal with xarray Datasets (the new DataTree library is likely what we'll move to) so when we have relationships between DataArrays we'll use something like ancillary variables to connect them. For example, a data quality flag that is used by the other variables in a file. Our users don't usually care about the DQF but we don't want to stop them from being able to easily access it. I was never a huge fan of putting a DataArray in the attrs of another DataArray, but nothing seemed to disallow it so I ultimately lost that argument.

So on one hand I agree it seems like there shouldn't be a need in most cases to have a DataArray inside a DataArray, especially a circular dependency. On the other hand, I'm not looking forward to the updates I'll need to make to Satpy to fix this. Note, we don't do this everywhere in Satpy, just something we use for a few formats we read.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1263967009 https://github.com/pydata/xarray/issues/7111#issuecomment-1263967009 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LVpsh djhoese 1828519 2022-09-30T19:58:11Z 2022-09-30T19:58:11Z CONTRIBUTOR

I'd have to check, but I think this structure was originally produced by xarray reading a CF-compliant NetCDF file. That is my memory at least. It could be that our library (Satpy) is doing this as a convenience, replacing the name of an ancillary variable with the DataArray of that ancillary variable.

My other new issue seems to be related to .copy() doing a .copy() on dask arrays which then makes them not equivalent anymore. Working on an MVCE now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1263952252 https://github.com/pydata/xarray/issues/7111#issuecomment-1263952252 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LVmF8 djhoese 1828519 2022-09-30T19:41:30Z 2022-09-30T19:41:30Z CONTRIBUTOR

I get a similar error for different structures, and also if I do something like data_arr.where(data_arr > 5, drop=True). In this case I have dask-array-based DataArrays, and dask ends up trying to hash the object; it gets into a loop where dask asks xarray to hash the DataArray and xarray tries to hash the DataArrays inside .attrs.

```python
In [9]: import dask.array as da

In [15]: a = xr.DataArray(da.zeros(5.0), attrs={}, dims=("a_dim",))

In [16]: b = xr.DataArray(da.zeros(8.0), attrs={}, dims=("b_dim",))

In [20]: a.attrs["other"] = b

In [24]: lons = xr.DataArray(da.random.random(8), attrs={"ancillary_variables": [b]})

In [25]: lats = xr.DataArray(da.random.random(8), attrs={"ancillary_variables": [b]})

In [26]: b.attrs["some_attr"] = [lons, lats]

In [27]: cond = a > 5

In [28]: c = a.where(cond, drop=True)
...
File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/dask/utils.py:1982, in _HashIdWrapper.__hash__(self)
   1981 def __hash__(self):
-> 1982     return id(self.wrapped)

RecursionError: maximum recursion depth exceeded while calling a Python object
```
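The recursion itself doesn't need dask or xarray to reproduce: any object whose hash naively descends into attrs that eventually point back at the object will blow the stack. A minimal stdlib sketch of the cycle (Arr is a made-up toy class, not xarray API):

```python
class Arr:
    """Toy stand-in for a DataArray whose hash descends into .attrs."""
    def __init__(self):
        self.attrs = {}
    def __hash__(self):
        # naive: hash every attr value, like hashing nested DataArrays
        return hash(tuple(hash(v) for v in self.attrs.values()))

a, b = Arr(), Arr()
a.attrs["other"] = b
b.attrs["other"] = a      # cycle: a -> b -> a -> ...

blew_up = False
try:
    hash(a)
except RecursionError:
    blew_up = True
assert blew_up
```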

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1263927640 https://github.com/pydata/xarray/issues/7111#issuecomment-1263927640 https://api.github.com/repos/pydata/xarray/issues/7111 IC_kwDOAMm_X85LVgFY djhoese 1828519 2022-09-30T19:14:48Z 2022-09-30T19:14:48Z CONTRIBUTOR

CC @headtr1ck any idea if this is supposed to work with your new #7089?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  New deep copy behavior in 2022.9.0 causes maximum recursion error 1392878100
1205503288 https://github.com/pydata/xarray/issues/6813#issuecomment-1205503288 https://api.github.com/repos/pydata/xarray/issues/6813 IC_kwDOAMm_X85H2oU4 djhoese 1828519 2022-08-04T16:36:43Z 2022-08-04T16:36:43Z CONTRIBUTOR

@wroberts4 I'd say maybe make a pull request and we'll see what (if any) tests fail and what the people in charge of merging think about it. I think we've gone through the various possibilities, and if there were any thread-safety issues the old exception was meant to protect against, it wasn't actually protecting against them (later reading of the file could still have caused an issue).

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening fsspec s3 file twice results in invalid start byte 1310058435
1204355953 https://github.com/pydata/xarray/issues/6813#issuecomment-1204355953 https://api.github.com/repos/pydata/xarray/issues/6813 IC_kwDOAMm_X85HyQNx djhoese 1828519 2022-08-03T18:57:20Z 2022-08-03T18:57:20Z CONTRIBUTOR

Good point. My initial answer was going to be that it isn't a problem because on the second use of the file we would get the exception about .tell() not being at 0, but after the .seek(0) the position would be 0 again and we wouldn't get that exception. So... I guess maybe it should be documented that xarray doesn't support opening the same file-like object from different threads. In which case, making the changes suggested here would only add usability/functionality and not cause any additional issues... unless we're missing something.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening fsspec s3 file twice results in invalid start byte 1310058435
1204316906 https://github.com/pydata/xarray/issues/6813#issuecomment-1204316906 https://api.github.com/repos/pydata/xarray/issues/6813 IC_kwDOAMm_X85HyGrq djhoese 1828519 2022-08-03T18:17:41Z 2022-08-03T18:17:41Z CONTRIBUTOR

> I am not certain whether seeking/reading from the same file in multiple places might have unforeseen consequences, such as when doing open_dataset in multiple threads.

Oh duh, that's a good point. So it might be fine dask-wise if the assumption is that open_dataset is called in the main thread and then dask is used to do computations on the arrays later on. If we're talking regular Python Threads or dask delayed functions that are calling open_dataset on the same file-like object (that was passed to the worker function) then it would cause issues. Possibly rare case, but still probably something that xarray wants to support.

Yeah I thought the .read/.write omission from the IOBase class was odd too. Just wanted to point out that the if block is using .read but IOBase is not guaranteed to have .read.
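The class-hierarchy point is easy to verify: io.IOBase itself does not define read (only readline, seek, tell, etc.), while io.RawIOBase does, so an isinstance(obj, io.IOBase) check alone doesn't guarantee .read exists:

```python
import io

# io.IOBase defines readline/seek/tell but not read:
assert not hasattr(io.IOBase, "read")

# read first appears on the more specific base classes:
assert hasattr(io.RawIOBase, "read")
assert hasattr(io.BufferedIOBase, "read")
```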

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening fsspec s3 file twice results in invalid start byte 1310058435
1204300671 https://github.com/pydata/xarray/issues/6813#issuecomment-1204300671 https://api.github.com/repos/pydata/xarray/issues/6813 IC_kwDOAMm_X85HyCt_ djhoese 1828519 2022-08-03T18:02:20Z 2022-08-03T18:02:20Z CONTRIBUTOR

I talked with @wroberts4 about this in person and if we're not missing some reason to not .seek(0) on a data source then this seems like a simple convenience and user experience improvement. We were thinking maybe it would make more sense to change the function to look like:

```python
def read_magic_number_from_file(filename_or_obj, count=8) -> bytes:
    # check byte header to determine file type
    if isinstance(filename_or_obj, bytes):
        magic_number = filename_or_obj[:count]
    elif isinstance(filename_or_obj, io.IOBase):
        if filename_or_obj.tell() != 0:
            filename_or_obj.seek(0)  # warn about re-seeking?
        magic_number = filename_or_obj.read(count)
        filename_or_obj.seek(0)
    else:
        raise TypeError(f"cannot read the magic number from {type(filename_or_obj)}")
    return magic_number
```

Additionally, the isinstance check is for io.IOBase but that base class isn't guaranteed to have a .read method. The check should probably be for RawIOBase:

https://docs.python.org/3/library/io.html#class-hierarchy

@kmuehlbauer @lamorton I saw you commented on the almost related #3991, do you have any thoughts on this? Should we put a PR together to continue the discussion? Maybe the fsspec folks (@martindurant?) have an opinion on this?
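As a quick sanity check of the proposed behavior, the rewind logic can be exercised with an in-memory file; read_magic below is a hypothetical helper that mimics the seek(0)-then-read-then-seek(0) flow suggested above, not xarray's actual function:

```python
import io

def read_magic(f, count=8):
    # sketch of the proposed convenience: rewind, read magic, rewind again
    if f.tell() != 0:
        f.seek(0)
    magic = f.read(count)
    f.seek(0)
    return magic

f = io.BytesIO(b"\x89HDF\r\n\x1a\n" + b"payload")
f.read(5)                                  # simulate a previous consumer moving the cursor
assert read_magic(f) == b"\x89HDF\r\n\x1a\n"
assert f.tell() == 0                       # caller gets the file back rewound
```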

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening fsspec s3 file twice results in invalid start byte 1310058435
1095790236 https://github.com/pydata/xarray/pull/6471#issuecomment-1095790236 https://api.github.com/repos/pydata/xarray/issues/6471 IC_kwDOAMm_X85BUG6c djhoese 1828519 2022-04-12T01:38:12Z 2022-04-12T01:38:12Z CONTRIBUTOR

Thanks for the fix. It is always risky to merge before all CI has finished, especially when code is modified.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support **kwargs form in `.chunk()` 1200309334
1095537116 https://github.com/pydata/xarray/pull/6471#issuecomment-1095537116 https://api.github.com/repos/pydata/xarray/issues/6471 IC_kwDOAMm_X85BTJHc djhoese 1828519 2022-04-11T20:31:24Z 2022-04-11T20:34:34Z CONTRIBUTOR

This uses the Number object but never imports it and is causing CI failures on my unstable dependency environment:

https://github.com/pydata/xarray/blob/ec13944bbd4022614491b6ec479ff2618da14ba8/xarray/core/dataarray.py#L1156

Not sure how the tests didn't hit this or any of the linting checks.

Edit: Ah I see, the tests all failed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support **kwargs form in `.chunk()` 1200309334
999717645 https://github.com/pydata/xarray/issues/6092#issuecomment-999717645 https://api.github.com/repos/pydata/xarray/issues/6092 IC_kwDOAMm_X847lnsN djhoese 1828519 2021-12-22T16:44:32Z 2021-12-22T16:44:32Z CONTRIBUTOR

Yes, sorry, I meant DuckArrayModule.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DeprecationWarning regarding use of distutils Version classes 1085992113
999709747 https://github.com/pydata/xarray/issues/6092#issuecomment-999709747 https://api.github.com/repos/pydata/xarray/issues/6092 IC_kwDOAMm_X847llwz djhoese 1828519 2021-12-22T16:32:24Z 2021-12-22T16:32:24Z CONTRIBUTOR

@mathause I agree that .parse is the preferred method when you are taking a version string from an outside source. If you were using a static/constant string then it would probably be fine to use Version.

I wasn't sure what the best approach would be for xarray given that LooseVersion is a public property on the duck array wrapper, right? I'm not sure if packaging's Version is backwards compatible and whether or not that matters inside xarray.
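For the migration itself, packaging is the usual replacement for distutils' LooseVersion; a small sketch of the distinction discussed above, assuming the packaging library is installed:

```python
from packaging.version import Version, parse

# Version() for trusted/constant strings, parse() for external input
assert Version("2022.3.0") > Version("0.20.2")
assert parse("1.21.0") >= Version("1.20")

# LooseVersion's string comparison got cases like this wrong;
# Version compares release segments numerically
assert Version("1.10") > Version("1.9")
```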

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DeprecationWarning regarding use of distutils Version classes 1085992113
926807091 https://github.com/pydata/xarray/issues/3620#issuecomment-926807091 https://api.github.com/repos/pydata/xarray/issues/3620 IC_kwDOAMm_X843PfQz djhoese 1828519 2021-09-24T17:37:54Z 2021-09-24T17:37:54Z CONTRIBUTOR

@benbovy It's been a while since I've looked into xarray's flexible index work. What's the current state of this work (sorry if there is some issue or blog I should be watching for)? Is it possible for me as a user to create my own index classes that xarray will accept?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Idea: functionally-derived non-dimensional coordinates 537772490
862639338 https://github.com/pydata/xarray/issues/3620#issuecomment-862639338 https://api.github.com/repos/pydata/xarray/issues/3620 MDEyOklzc3VlQ29tbWVudDg2MjYzOTMzOA== djhoese 1828519 2021-06-16T19:06:50Z 2021-06-16T19:06:50Z CONTRIBUTOR

@benbovy I'm reading over the changes in #5322. All of this is preparing for the future, right? Is it worth it to start playing with these base classes (Index) in geoxarray or will I not be able to use them for a CRSIndex until more changes are done to xarray core? For example, none of this set_index for Index classes stuff you showed above is actually implemented yet, right?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Idea: functionally-derived non-dimensional coordinates 537772490
856055553 https://github.com/pydata/xarray/issues/3620#issuecomment-856055553 https://api.github.com/repos/pydata/xarray/issues/3620 MDEyOklzc3VlQ29tbWVudDg1NjA1NTU1Mw== djhoese 1828519 2021-06-07T15:49:42Z 2021-06-07T15:49:42Z CONTRIBUTOR

@benbovy Thanks. This looks really promising and is pretty much in line with what I saw geoxarray's internals doing for a user. In your opinion, will this type of CRSIndex/WCSIndex work need #5322? If so, will it also require (or benefit from) the additional internal xarray refactoring you mention in #5322?

I can really see this becoming super easy for CRS-based dataset users where libraries like geoxarray (or xoak) "know" the common types of schemes/structures that might exist in the scientific field and have a simple .geo.set_index that figures out most of the parameters for .set_index by default.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Idea: functionally-derived non-dimensional coordinates 537772490
790967872 https://github.com/pydata/xarray/pull/4979#issuecomment-790967872 https://api.github.com/repos/pydata/xarray/issues/4979 MDEyOklzc3VlQ29tbWVudDc5MDk2Nzg3Mg== djhoese 1828519 2021-03-04T21:45:53Z 2021-03-04T21:45:53Z CONTRIBUTOR

> 2D lat/lon arrays could be as expensive to store as the image itself, even though the values can be computed on the fly with very cheap arithmetic.

Just wanted to mention in case it comes up later: this is true for some datasets, but for others the lon/lats are not uniformly spaced and can't be calculated (just based on the way the satellite instrument works). They have to be loaded from the original dataset (on-disk file). For a while in the Satpy library we were storing 2D dask arrays for the lon/lat coordinates until we realized xarray was sometimes computing them, and we didn't want that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible indexes refactoring notes 819062172
785432974 https://github.com/pydata/xarray/issues/4406#issuecomment-785432974 https://api.github.com/repos/pydata/xarray/issues/4406 MDEyOklzc3VlQ29tbWVudDc4NTQzMjk3NA== djhoese 1828519 2021-02-24T22:42:15Z 2021-02-24T22:42:15Z CONTRIBUTOR

I'm having a similar issue to what is described here, but I'm seeing it even when I'm not rewriting an output file (although it is an option in my code). I have a delayed function that is calling to_netcdf and I seem to run into some race condition where I get the same deadlock as the original poster. It seems highly dependent on the number of dask tasks and the number of workers. I think I've gotten around it for now by having my delayed function return the Dataset it is working on and then calling to_netcdf later. My problem is that I have cases where I might not want to write the file, so my delayed function returns None. To handle this I need to pre-compute my delayed functions before calling to_netcdf, since I don't think there is a way to pass something to to_netcdf so it doesn't create a file.

With the original code it happened quite a bit, but it was part of a much larger application so I can't really get an MWE together. Just wanted to mention it here as another data point (to_netcdf inside a Delayed function may not work 100% of the time).
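A minimal sketch of the compute-first-then-write workaround described above (all names are hypothetical, and the actual write is only indicated by a comment):

```python
import dask
import xarray as xr


@dask.delayed
def maybe_make_dataset(should_write):
    """Return a Dataset to be written later, or None to skip writing."""
    ds = xr.Dataset({"a": ("x", [1, 2, 3])})
    return ds if should_write else None


# compute the delayed functions first...
results = dask.compute(maybe_make_dataset(True), maybe_make_dataset(False))

# ...then write serially, outside the dask graph, skipping the Nones
for ds in results:
    if ds is not None:
        pass  # ds.to_netcdf("out.nc") would go here, outside the workers
```

This avoids calling to_netcdf from inside a delayed task entirely, at the cost of holding the computed Datasets in memory before writing.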

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Threading Lock issue with to_netcdf and Dask arrays 694112301
784363753 https://github.com/pydata/xarray/issues/4934#issuecomment-784363753 https://api.github.com/repos/pydata/xarray/issues/4934 MDEyOklzc3VlQ29tbWVudDc4NDM2Mzc1Mw== djhoese 1828519 2021-02-23T17:18:31Z 2021-02-23T17:18:31Z CONTRIBUTOR

https://github.com/dask/dask/issues/7263

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ImplicitToExplicitIndexingAdapter being returned with dask unstable version 812692450
784334872 https://github.com/pydata/xarray/issues/4934#issuecomment-784334872 https://api.github.com/repos/pydata/xarray/issues/4934 MDEyOklzc3VlQ29tbWVudDc4NDMzNDg3Mg== djhoese 1828519 2021-02-23T16:37:20Z 2021-02-23T16:37:20Z CONTRIBUTOR

Should I make an issue with dask so this gets more eyes before they make their release (I'm told later this week)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ImplicitToExplicitIndexingAdapter being returned with dask unstable version 812692450
700696450 https://github.com/pydata/xarray/issues/4471#issuecomment-700696450 https://api.github.com/repos/pydata/xarray/issues/4471 MDEyOklzc3VlQ29tbWVudDcwMDY5NjQ1MA== djhoese 1828519 2020-09-29T13:18:16Z 2020-09-29T13:18:16Z CONTRIBUTOR

Just tested this with decode_cf=False and the rest of the loading process seems fine (note: I used engine='h5netcdf' too).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Numeric scalar variable attributes (including fill_value, scale_factor, add_offset) are 1-d instead of 0-d with h5netcdf engine, triggering ValueError: non-broadcastable output on application when loading single elements 710876876
592847155 https://github.com/pydata/xarray/issues/3813#issuecomment-592847155 https://api.github.com/repos/pydata/xarray/issues/3813 MDEyOklzc3VlQ29tbWVudDU5Mjg0NzE1NQ== djhoese 1828519 2020-02-29T03:38:46Z 2020-02-29T03:38:46Z CONTRIBUTOR

@max-sixty That's exactly it. What's really weird is that the original code in Satpy is using a dask array and not a numpy array. It seemed very strange to copy the DataArray (.copy()), convert the dask array to a numpy array (np.asarray), and then still get a read-only array.

I can understand how xarray would treat numpy arrays and dask arrays the same when it comes to this, but coming from outside the project it is very surprising that a dask array would be marked as read-only when it was used to just create a "new" numpy array.

Feel free to close this or use it as a marker to clarify some documentation or error messages as mentioned in #2891.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Xarray operations produce read-only array 573031381
565679586 https://github.com/pydata/xarray/issues/3620#issuecomment-565679586 https://api.github.com/repos/pydata/xarray/issues/3620 MDEyOklzc3VlQ29tbWVudDU2NTY3OTU4Ng== djhoese 1828519 2019-12-14T03:56:38Z 2019-12-14T03:56:38Z CONTRIBUTOR

> For reference, here is how CRS information is handled in rioxarray: CRS management docs.

Nice! I didn't know you had documented this.

Sorry this is going to get long. I'd like to describe the CRS stuff we deal with and the lessons learned and the decisions I've been fighting with in the geoxarray project (https://github.com/geoxarray/geoxarray). I'm looking at this from a meteorological satellite point of view. @snowman2 please correct me if I'm wrong about anything.

  1. In our field(s) and software the CRS object that @snowman2 is talking about and has implemented in pyproj encapsulates our version of these "complex mathematical functions", although ours seem much simpler. The CRS object can hold things like the model of the Earth to use and other parameters defining the coordinate system like the reference longitude/latitude.
  2. When it comes to computing the coordinates of data on a CRS, the coordinates are typically on a cartesian plane so we have an X and a Y and points in between can be linearly interpolated. These work well as 1D xarray coordinates.
  3. These X and Y coordinates don't tell you all the information alone though so we need the CRS information. Xarray's current rasterio functionality adds this CRS definition as a string value in .attrs. The problem with using .attrs for this is most operations on the DataArray object will make this information disappear (ex. adding two DataArrays).
  4. In geoxarray I was going to try putting a pyproj CRS object in a DataArray's coordinates (.coords). I figured this would be good because then if you tried to combine two DataArrays on different CRSes, xarray would fail. Turns out xarray will just ignore the difference and drop the crs coordinate, so that was no longer my "perfect" option. Additionally, the crs object would have to be accessed by doing my_data_arr.coords['crs'].item() because xarray stores the object as a scalar array.
  5. Xarray accessors, last time I checked, often have to be recreated when working with Dataset or DataArray objects. This has to do with how low-level xarray converts Variable objects to DataArrays. I didn't expect this when I started geoxarray and I'm not really sure how to continue now.

As for your use case(s), I'm wondering if an xarray accessor could work around some of the current limitations you're seeing. They could basically be set up like @dcherian described, but "arbitrary_function" could be accessed through x, y, z = my_data_arr.astro.world_coordinates(subpixels=4) or something. You could do my_data_arr.astro.wcs_parameters to get a dictionary of common WCS parameters stored in .attrs. The point being that the accessor could simplify the interface to doing these calculations and accessing these parameters (stored in .coords and .attrs) and maybe make changes to xarray core unnecessary.
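Points 3 and 4 above are easy to demonstrate. This is a minimal sketch assuming xarray's default keep_attrs/conflict behavior, with a placeholder PROJ.4 string standing in for a real CRS:

```python
import numpy as np
import xarray as xr

# point 3: attrs like a CRS string are dropped by default on arithmetic
a = xr.DataArray(np.zeros(3), dims="x", attrs={"crs": "+proj=longlat"})
b = xr.DataArray(np.ones(3), dims="x", attrs={"crs": "+proj=longlat"})
c = a + b
assert "crs" not in c.attrs

# point 4: a scalar "crs" coordinate survives when both operands agree...
d = a.assign_coords(crs="+proj=longlat") + b.assign_coords(crs="+proj=longlat")
assert d.coords["crs"].item() == "+proj=longlat"

# ...but a mismatch is silently dropped instead of raising an error
e = a.assign_coords(crs="+proj=longlat") + b.assign_coords(crs="+proj=merc")
assert "crs" not in e.coords
```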

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Idea: functionally-derived non-dimensional coordinates 537772490
537949928 https://github.com/pydata/xarray/pull/3318#issuecomment-537949928 https://api.github.com/repos/pydata/xarray/issues/3318 MDEyOklzc3VlQ29tbWVudDUzNzk0OTkyOA== djhoese 1828519 2019-10-03T13:39:34Z 2019-10-03T13:39:34Z CONTRIBUTOR

Any idea when a 0.13.1 release might be made to get this fix out in the wild?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow weakref 495221393
520285297 https://github.com/pydata/xarray/issues/3205#issuecomment-520285297 https://api.github.com/repos/pydata/xarray/issues/3205 MDEyOklzc3VlQ29tbWVudDUyMDI4NTI5Nw== djhoese 1828519 2019-08-12T02:37:46Z 2019-08-12T02:37:46Z CONTRIBUTOR

So I guess the questions are:

  1. Is creating a new DataArray for Dataset.__getitem__ important enough that the above behavior is expected and should be documented as a known limitation/gotcha? Or...

  2. Accessors should work when used like this and the Dataset.__getitem__ creation of DataArrays should be "fixed" or worked around to handle this better?

  3. Or some other option?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Accessors are recreated on every access 479420466
508862961 https://github.com/pydata/xarray/issues/3068#issuecomment-508862961 https://api.github.com/repos/pydata/xarray/issues/3068 MDEyOklzc3VlQ29tbWVudDUwODg2Mjk2MQ== djhoese 1828519 2019-07-05T21:10:50Z 2019-07-05T21:10:50Z CONTRIBUTOR

Ah, good call. The transpose currently in xarray would still be a problem though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional dask coordinates unexpectedly computed 462859457
508078893 https://github.com/pydata/xarray/issues/2281#issuecomment-508078893 https://api.github.com/repos/pydata/xarray/issues/2281 MDEyOklzc3VlQ29tbWVudDUwODA3ODg5Mw== djhoese 1828519 2019-07-03T12:49:20Z 2019-07-03T12:49:20Z CONTRIBUTOR

@kmuehlbauer Thanks for the ping. I don't have time to read this whole thread, but based on your comment I have a few things I'd like to point out. First, the pykdtree package is a good alternative to the scipy kdtree implementation. It has been shown to be much faster and uses openmp for parallel processing. Second, the pyresample library is my main way of resampling geolocated data. We use it in Satpy for resampling, but right now we haven't finalized the interfaces so things are kind of spread between satpy and pyresample as far as easy xarray handling goes. Pyresample uses SwathDefinition and AreaDefinition objects to define the geolocation of the data. In Satpy the same KDTree is used for every in-memory gridding, but we also allow a cache_dir which will save the indexes for every (source, target) area pair used in the resampling.

I'm hoping to sit down and get some geoxarray stuff implemented during SciPy next week, but usually get distracted by all the talks so no promises. I'd like geoxarray to provide a low level interface for getting and converting CRS and geolocation information on xarray objects and leave resampling and other tasks to libraries like pyresample and rioxarray.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Does interp() work on curvilinear grids (2D coordinates) ?  340486433
507656176 https://github.com/pydata/xarray/issues/3068#issuecomment-507656176 https://api.github.com/repos/pydata/xarray/issues/3068 MDEyOklzc3VlQ29tbWVudDUwNzY1NjE3Ng== djhoese 1828519 2019-07-02T12:31:54Z 2019-07-02T12:33:15Z CONTRIBUTOR

@shoyer Understood. That explains why something like this wasn't caught before, but what would be the best solution for a short term fix?

For the long term, I also understand that there isn't really a good way to check equality of two dask arrays. I wonder if dask's graph optimization could be used to "simplify" two dask arrays' graphs separately and then check the graphs for equality. For example, two dask arrays created by doing da.zeros((10, 10), chunks=2) + 5 should theoretically be equal because their dask graphs are made up of the same tasks.

Edit: "short term fix": What is the best way to avoid the unnecessary transpose? Or is this not even the right way to approach this? Change dask to avoid the unnecessary transpose or change xarray to not do the tranpose or something else?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional dask coordinates unexpectedly computed 462859457
507410467 https://github.com/pydata/xarray/issues/3068#issuecomment-507410467 https://api.github.com/repos/pydata/xarray/issues/3068 MDEyOklzc3VlQ29tbWVudDUwNzQxMDQ2Nw== djhoese 1828519 2019-07-01T20:20:05Z 2019-07-01T20:20:05Z CONTRIBUTOR

Modifying this line to be:

```python
if dims == expanded_var.sizes:
    return expanded_var
return expanded_var.transpose(*dims)
```

Then this issue is avoided for at least the + case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional dask coordinates unexpectedly computed 462859457
507405717 https://github.com/pydata/xarray/issues/3068#issuecomment-507405717 https://api.github.com/repos/pydata/xarray/issues/3068 MDEyOklzc3VlQ29tbWVudDUwNzQwNTcxNw== djhoese 1828519 2019-07-01T20:05:51Z 2019-07-01T20:05:51Z CONTRIBUTOR

Ok another update. In the previous example I accidentally added the lons coordinate DataArray with the dimensions redefined (('y', 'x'), lons2) which is technically redundant but it worked (no progress bar).

However, if I fix this redundancy and do:

```python
a = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'), coords={'lons': lons2})
b = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'), coords={'lons': lons2})
with ProgressBar():
    c = a + b
```

I do get a progress bar again (lons2 is being computed). I've tracked it down to this transpose which is transposing when it doesn't need to which is causing the dask array to change:

https://github.com/pydata/xarray/blob/master/xarray/core/variable.py#L1223

I'm not sure if this would be considered a bug in dask or xarray. Also, not sure why the redundant version of the example worked.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional dask coordinates unexpectedly computed 462859457
507396912 https://github.com/pydata/xarray/issues/3068#issuecomment-507396912 https://api.github.com/repos/pydata/xarray/issues/3068 MDEyOklzc3VlQ29tbWVudDUwNzM5NjkxMg== djhoese 1828519 2019-07-01T19:38:06Z 2019-07-01T19:38:06Z CONTRIBUTOR

Ok I'm getting a little more of an understanding on this. The main issue is that the dask array is not literally considered the same object because I'm creating the object twice. If I create a single dask array and pass it:

```python
lons = da.zeros((10, 10), chunks=2)
a = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'),
                 coords={'y': np.arange(10), 'x': np.arange(10), 'lons': (('y', 'x'), lons)})
b = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'),
                 coords={'y': np.arange(10), 'x': np.arange(10), 'lons': (('y', 'x'), lons)})
```

I still get the progress bar because xarray is creating two new DataArray objects for this lons coordinate. So lons_data_arr.variable._data is not lons_data_arr2.variable._data causing the equivalency check here to fail.

If I make a single DataArray that becomes the coordinate variable then it seems to work:

```python
lons2 = xr.DataArray(lons, dims=('y', 'x'))
a = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'),
                 coords={'y': np.arange(10), 'x': np.arange(10), 'lons': (('y', 'x'), lons2)})
b = xr.DataArray(da.zeros((10, 10), chunks=2), dims=('y', 'x'),
                 coords={'y': np.arange(10), 'x': np.arange(10), 'lons': (('y', 'x'), lons2)})
```

I get no progress bar.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional dask coordinates unexpectedly computed 462859457
505530269 https://github.com/pydata/xarray/pull/3006#issuecomment-505530269 https://api.github.com/repos/pydata/xarray/issues/3006 MDEyOklzc3VlQ29tbWVudDUwNTUzMDI2OQ== djhoese 1828519 2019-06-25T16:53:34Z 2019-06-25T16:53:34Z CONTRIBUTOR

@shoyer Any idea when there might be another release of xarray where this fix will be included? I'm teaching a tutorial at SciPy this year that is affected by this bug. Learners are starting to prepare for the tutorials and I'd like it if they could have this fix before the day of the tutorial.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix 'to_masked_array' computing dask arrays twice 453964049
500400954 https://github.com/pydata/xarray/pull/3006#issuecomment-500400954 https://api.github.com/repos/pydata/xarray/issues/3006 MDEyOklzc3VlQ29tbWVudDUwMDQwMDk1NA== djhoese 1828519 2019-06-10T12:36:55Z 2019-06-10T12:36:55Z CONTRIBUTOR

@shoyer Makes sense. Any idea what's up with the travis test? It doesn't look like it is from my changes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix 'to_masked_array' computing dask arrays twice 453964049
500264231 https://github.com/pydata/xarray/pull/3006#issuecomment-500264231 https://api.github.com/repos/pydata/xarray/issues/3006 MDEyOklzc3VlQ29tbWVudDUwMDI2NDIzMQ== djhoese 1828519 2019-06-10T01:36:00Z 2019-06-10T01:36:00Z CONTRIBUTOR

In my own tests I've been using the following custom scheduler with dask.config.set(scheduler=CustomScheduler()) to point out what code is computing the array when I don't want it to:

```python
class CustomScheduler(object):
    """Custom dask scheduler that raises an exception if dask is computed too many times."""

    def __init__(self, max_computes=1):
        """Set starting and maximum compute counts."""
        self.max_computes = max_computes
        self.total_computes = 0

    def __call__(self, dsk, keys, **kwargs):
        """Compute dask task and keep track of number of times we do so."""
        import dask
        self.total_computes += 1
        if self.total_computes > self.max_computes:
            raise RuntimeError("Too many dask computations were scheduled: {}".format(self.total_computes))
        return dask.get(dsk, keys, **kwargs)
```
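A hypothetical usage sketch (condensing the class above into a self-contained example, assuming dask is installed) showing the scheduler tripping on an unwanted compute:

```python
import dask
import dask.array as da


class CustomScheduler:
    """Raise if dask computes more than max_computes times."""

    def __init__(self, max_computes=1):
        self.max_computes = max_computes
        self.total_computes = 0

    def __call__(self, dsk, keys, **kwargs):
        self.total_computes += 1
        if self.total_computes > self.max_computes:
            raise RuntimeError("Too many dask computations: %d" % self.total_computes)
        return dask.get(dsk, keys, **kwargs)


arr = da.zeros((4, 4), chunks=2)

# with max_computes=0, any compute at all is an error
with dask.config.set(scheduler=CustomScheduler(max_computes=0)):
    try:
        arr.compute()
        tripped = False
    except RuntimeError:
        tripped = True

# with max_computes=1, a single compute is allowed
with dask.config.set(scheduler=CustomScheduler(max_computes=1)):
    result = arr.compute()
```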

Does something like this exist in the xarray tests? If not, I could add it then add a dask test to the DataArray tests.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix 'to_masked_array' computing dask arrays twice 453964049
499498940 https://github.com/pydata/xarray/issues/2288#issuecomment-499498940 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQ5OTQ5ODk0MA== djhoese 1828519 2019-06-06T13:42:42Z 2019-06-06T13:44:00Z CONTRIBUTOR

@Geosynopsis Cool. Your library is the third library (at least among recently created ones) that does something similar to what's discussed here. I'm glad there are so many people who need this functionality. The packages are: my un-started geoxarray project where I've tried to move these types of conversations (https://github.com/geoxarray/geoxarray) and rioxarray (https://github.com/corteva/rioxarray), which combines xarray and rasterio and was started by @snowman2. Given what your project is trying to do, maybe you could add the geopandas functionality onto rioxarray instead of a separate package? Let's discuss in an issue on rioxarray if possible; feel free to start it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
494597663 https://github.com/pydata/xarray/issues/2976#issuecomment-494597663 https://api.github.com/repos/pydata/xarray/issues/2976 MDEyOklzc3VlQ29tbWVudDQ5NDU5NzY2Mw== djhoese 1828519 2019-05-21T23:36:21Z 2019-05-21T23:36:21Z CONTRIBUTOR

> Rad doesn't include the "band_id" coordinate because "band_id" includes a "band" dimension, which isn't found on the Rad variable.

Ah, that's what I was looking for! Ok, that makes sense. To be as generic as possible, xarray uses the dimensions to determine if a coordinate "belongs" to a variable rather than the "coordinates" attribute of that variable. I can live with that. Thanks for the explanation and the link to that part of the code.
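A minimal sketch of this dimension-based attachment, with synthetic dimensions standing in for the real file's:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"Rad": (("y", "x"), np.zeros((2, 2)))},
    coords={"band_id": ("band", [1])},
)

# "band_id" lives on the "band" dimension, which "Rad" lacks, so it is
# present on the Dataset but not attached to the extracted DataArray
assert "band_id" in ds.coords
assert "band_id" not in ds["Rad"].coords
```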

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Confusing handling of NetCDF coordinates 446722089
494589615 https://github.com/pydata/xarray/issues/2976#issuecomment-494589615 https://api.github.com/repos/pydata/xarray/issues/2976 MDEyOklzc3VlQ29tbWVudDQ5NDU4OTYxNQ== djhoese 1828519 2019-05-21T22:59:24Z 2019-05-21T22:59:24Z CONTRIBUTOR

Ok, but then why are coordinates that exist in the Dataset and in the netcdf variable's 'coordinates' attribute not listed in the DataArray for that variable?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Confusing handling of NetCDF coordinates 446722089
492837199 https://github.com/pydata/xarray/issues/2954#issuecomment-492837199 https://api.github.com/repos/pydata/xarray/issues/2954 MDEyOklzc3VlQ29tbWVudDQ5MjgzNzE5OQ== djhoese 1828519 2019-05-15T21:51:26Z 2019-05-15T21:51:39Z CONTRIBUTOR

> Would it be better if we raised an error in these cases, when you later try to access data from a file that was explicitly closed?

I would prefer if it stayed the way it is. I can use the context manager to access specific variables but still hold on to the DataArray objects with dask arrays underneath and use them later. In the non-dask case, I'm not sure.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segmentation fault reading many groups from many files 442617907
472102830 https://github.com/pydata/xarray/pull/2715#issuecomment-472102830 https://api.github.com/repos/pydata/xarray/issues/2715 MDEyOklzc3VlQ29tbWVudDQ3MjEwMjgzMA== djhoese 1828519 2019-03-12T17:31:50Z 2019-03-12T17:31:50Z CONTRIBUTOR

@shoyer Any idea when a 0.11.4 or 0.12 will be released? I'm trying to work around some other rasterio bugs and would like to remove the restriction on the rasterio version used in my CI tests, but that requires this PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix CRS being WKT instead of PROJ.4 403458737
461097743 https://github.com/pydata/xarray/pull/2715#issuecomment-461097743 https://api.github.com/repos/pydata/xarray/issues/2715 MDEyOklzc3VlQ29tbWVudDQ2MTA5Nzc0Mw== djhoese 1828519 2019-02-06T16:52:06Z 2019-02-06T16:52:06Z CONTRIBUTOR

@shoyer Added something to the whats-new. Let me know if anything needs changing.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix CRS being WKT instead of PROJ.4 403458737
461084264 https://github.com/pydata/xarray/pull/2715#issuecomment-461084264 https://api.github.com/repos/pydata/xarray/issues/2715 MDEyOklzc3VlQ29tbWVudDQ2MTA4NDI2NA== djhoese 1828519 2019-02-06T16:17:41Z 2019-02-06T16:17:41Z CONTRIBUTOR

@dcherian @shoyer This PR isn't changing any functionality. It is making the same functionality available with newer versions of rasterio.

There are discussions going on regarding changing the behavior: https://github.com/pydata/xarray/issues/2723

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix CRS being WKT instead of PROJ.4 403458737
458314851 https://github.com/pydata/xarray/issues/2722#issuecomment-458314851 https://api.github.com/repos/pydata/xarray/issues/2722 MDEyOklzc3VlQ29tbWVudDQ1ODMxNDg1MQ== djhoese 1828519 2019-01-28T21:48:30Z 2019-01-28T21:48:30Z CONTRIBUTOR

@fmaussion I like the idea of the proj4_crs and wkt_crs attributes. In the future I would hope the geoxarray package could handle this type of stuff (some day I'll get to it).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [discussion] Use WKT or PROJ.4 string for CRS representation? 403971686
458260485 https://github.com/pydata/xarray/pull/2715#issuecomment-458260485 https://api.github.com/repos/pydata/xarray/issues/2715 MDEyOklzc3VlQ29tbWVudDQ1ODI2MDQ4NQ== djhoese 1828519 2019-01-28T19:07:24Z 2019-01-28T19:07:24Z CONTRIBUTOR

@fmaussion Done. And I merged @snowman2's suggestion and fixed the indent (I'm guessing github's editor made it difficult to see but it was off by one indentation).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix CRS being WKT instead of PROJ.4 403458737
456410755 https://github.com/pydata/xarray/issues/2042#issuecomment-456410755 https://api.github.com/repos/pydata/xarray/issues/2042 MDEyOklzc3VlQ29tbWVudDQ1NjQxMDc1NQ== djhoese 1828519 2019-01-22T14:05:33Z 2019-01-22T14:05:33Z CONTRIBUTOR

@guillaumeeb Not that I know of, but I'm not completely in the loop with xarray. There is the geoxarray project that I started (https://github.com/geoxarray/geoxarray), but I really haven't had any time to work on it. Otherwise you could look at the satpy library or its dependency trollimage, which uses rasterio but assumes some things about how the data is structured, including an 'area' in .attrs from pyresample. Sorry I don't have a better idea.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Anyone working on a to_tiff? Alternatively, how do you write an xarray to a geotiff?  312203596
441742119 https://github.com/pydata/xarray/issues/2060#issuecomment-441742119 https://api.github.com/repos/pydata/xarray/issues/2060 MDEyOklzc3VlQ29tbWVudDQ0MTc0MjExOQ== djhoese 1828519 2018-11-26T18:17:01Z 2018-11-26T18:17:01Z CONTRIBUTOR

So this would mean concat would not retain any .attrs, right?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Confusing error message when attribute not equal during concat 314457748
427370729 https://github.com/pydata/xarray/pull/2465#issuecomment-427370729 https://api.github.com/repos/pydata/xarray/issues/2465 MDEyOklzc3VlQ29tbWVudDQyNzM3MDcyOQ== djhoese 1828519 2018-10-05T13:44:03Z 2018-10-05T13:44:03Z CONTRIBUTOR

What I have it set to in this PR should match the classifiers in the setup.py: 2.7 and 3.5+

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add python_requires to setup.py 367217516
419166240 https://github.com/pydata/xarray/issues/2368#issuecomment-419166240 https://api.github.com/repos/pydata/xarray/issues/2368 MDEyOklzc3VlQ29tbWVudDQxOTE2NjI0MA== djhoese 1828519 2018-09-06T16:54:43Z 2018-09-06T16:55:11Z CONTRIBUTOR

@rabernat For the grouped NetCDF files I had in mind the NASA L1B data files for the VIIRS satellite instrument onboard the Suomi-NPP and NOAA-20 satellites. You can see an example file here.

The summary of the ncdump is:

```
netcdf VNP02IMG.A2018008.0000.001.2018061001540 {
dimensions:
        number_of_scans = 202 ;
        number_of_lines = 6464 ;
        number_of_pixels = 6400 ;
        number_of_LUT_values = 65536 ;

... lots of global attributes ...

group: scan_line_attributes {
  variables:
        double scan_start_time(number_of_scans) ;
                scan_start_time:long_name = "Scan start time (TAI93)" ;
                scan_start_time:units = "seconds" ;
                scan_start_time:_FillValue = -999.9 ;
                scan_start_time:valid_min = 0. ;
                scan_start_time:valid_max = 2000000000. ;
        ... lots of other variables in this group ...
  }

group: observation_data {
  variables:
        ushort I04(number_of_lines, number_of_pixels) ;
                I04:long_name = "I-band 04 earth view radiance" ;
                I04:units = "Watts/meter^2/steradian/micrometer" ;
                I04:_FillValue = 65535US ;
                I04:valid_min = 0US ;
                I04:valid_max = 65527US ;
                I04:scale_factor = 6.104354e-05f ;
                I04:add_offset = 0.0016703f ;
                I04:flag_values = 65532US, 65533US, 65534US ;
                I04:flag_meanings = "Missing_EV Bowtie_Deleted Cal_Fail" ;
  }
}
```

When I first started out with xarray I assumed I would be able to do something like:

```python
import xarray as xr
nc = xr.open_dataset('VNP02IMG.A2018008.0000.001.2018061001540.nc')
band_data = nc['observation_data/I04']
```

Which I can't do, but can do with the python netcdf4 library:

```
In [7]: from netCDF4 import Dataset

In [8]: nc = Dataset('VNP02IMG.A2018008.0000.001.2018061001540.nc')

In [9]: nc['observation_data/I04']
Out[9]: <class 'netCDF4._netCDF4.Variable'>
```

I understand that I can provide the group keyword to open_dataset but then I have to open the file twice if I want to get the global attributes. So any interface I had set up in my code to pass around one object with all of the file contents won't work. That isn't xarray's fault and shouldn't necessarily be something xarray has to solve, but it is a type of NetCDF4 file that is valid and can't be read "perfectly" with xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
417152163 https://github.com/pydata/xarray/issues/2288#issuecomment-417152163 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQxNzE1MjE2Mw== djhoese 1828519 2018-08-30T00:37:51Z 2018-08-30T00:37:51Z CONTRIBUTOR

@karimbahgat Thanks for the info and questions. As for xarray, it is a generic container (array + dimensions + coordinates for those dimensions + attributes) that resembles the data model of netCDF files. It can technically hold any N-dimensional data. This issue in particular is about what a good "standard" way is for multiple libraries to represent CRS information in xarray's objects.

I think the lack of documentation in pycrs is my biggest hurdle right now, as I don't know how I'm supposed to use the library, but I want to. It may also be that my use cases for CRS information are different than yours, but the structure of the package is not intuitive to me. Again, a simple example of passing a PROJ.4 string to something and getting a CRS object back would solve all that. I'll make some issues on pycrs when I get a chance (add travis/appveyor tests, add documentation, base classes for certain things, etc). For geotiffs, I think most geotiff-reading libraries let you load the CRS info as a PROJ.4 string.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
415982335 https://github.com/pydata/xarray/issues/2288#issuecomment-415982335 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQxNTk4MjMzNQ== djhoese 1828519 2018-08-25T16:52:47Z 2018-08-25T16:52:47Z CONTRIBUTOR

I wouldn't mind that. It seems like there is already a package that handles this: https://github.com/karimbahgat/PyCRS

@karimbahgat I'd love your input on this issue as a whole too.

There are a couple things that I had in mind for a "pycrs" library to handle that the PyCRS library doesn't do (converting to other libraries' CRS objects), but maybe that is a good thing.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
415844110 https://github.com/pydata/xarray/issues/2288#issuecomment-415844110 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQxNTg0NDExMA== djhoese 1828519 2018-08-24T18:29:51Z 2018-08-24T18:29:51Z CONTRIBUTOR

The question is: what is the purpose of that new package? I wouldn't mind a new package like that, but then that becomes something like what was discussed in https://github.com/pangeo-data/pangeo/issues/356. That package should probably cover CRS objects and Grid definitions. Then libraries like geoxarray, pyresample, cartopy, and metpy could all use that library. If that library ends up covering resampling/transforming at all, I'd say I'll just absorb that logic into pyresample and provide some new interfaces to things.

However, what does geoxarray become if it doesn't have that CRS logic in it? Just an xarray accessor? I guess that makes sense from a "Write programs that do one thing and do it well" philosophy point of view.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
415633883 https://github.com/pydata/xarray/issues/2288#issuecomment-415633883 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQxNTYzMzg4Mw== djhoese 1828519 2018-08-24T02:41:00Z 2018-08-24T02:41:00Z CONTRIBUTOR

FYI I've started a really basic layout of the CRS object in geoxarray: https://github.com/geoxarray/geoxarray

It doesn't actually do anything yet, but I copied all the utilities from pyresample that are useful (convert PROJ.4 to cartopy CRS, proj4 str to dict, etc). I decided that the CRS object should use the CF conventions naming for projection parameters based on a conversation I had with @dopplershift. The main factor being they are much more human readable than the PROJ.4 names. The issue with this is that I have much more experience dealing with PROJ.4 parameters.
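The "proj4 str to dict" utility mentioned above can be sketched in a few lines of plain Python. This is an illustrative sketch of the idea, not pyresample's actual implementation:

```python
def proj4_str_to_dict(proj4_str):
    """Parse a PROJ.4 string like '+proj=geos +h=35786023.0 +sweep=x'
    into a dict, converting numeric values to float."""
    result = {}
    for token in proj4_str.split():
        token = token.lstrip('+')
        if '=' in token:
            key, val = token.split('=', 1)
            try:
                val = float(val)
            except ValueError:
                pass  # keep non-numeric values (e.g. datum names) as strings
        else:
            key, val = token, True  # flag parameters like '+no_defs'
        result[key] = val
    return result
```

Going the other direction (dict to string) is similarly mechanical, which is part of why PROJ.4 strings are attractive as an interchange format.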

I can probably also get a lot of information from metpy's CF plotting code: https://github.com/Unidata/MetPy/blob/master/metpy/plots/mapping.py

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
413849756 https://github.com/pydata/xarray/issues/2368#issuecomment-413849756 https://api.github.com/repos/pydata/xarray/issues/2368 MDEyOklzc3VlQ29tbWVudDQxMzg0OTc1Ng== djhoese 1828519 2018-08-17T12:26:42Z 2018-08-17T12:26:42Z CONTRIBUTOR

This is mentioned elsewhere (can't find the issue right now) and may be out of scope for this issue but I'm going to say it anyway: opening a NetCDF file with groups was not as easy as I wanted it to be when first starting out with xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
412302416 https://github.com/pydata/xarray/issues/2288#issuecomment-412302416 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQxMjMwMjQxNg== djhoese 1828519 2018-08-11T21:24:05Z 2018-08-11T21:24:05Z CONTRIBUTOR

Would anyone that is watching this thread hate if I made geoxarray python 3.6+? I doubt there are any features that are needed in 3.6, but also am not going to support python 2.

Additionally, @shoyer @fmaussion and any other xarray-dev, I've been thinking about the case where I have 2D image data and 2D longitude and latitude arrays (one lon/lat pair for each image pixel). Is there a way in xarray to associate these three arrays in a DataArray so that slicing is handled automatically but also not put the arrays in the coordinates? As mentioned above I don't want to put these lon/lat arrays in the .coords because they have to be fully computed if they are dask arrays (or at least that is my understanding). For my use cases this could mean a good chunk of memory being dedicated to these coordinates. From what I can tell my options are .coords or a .Dataset with all 3.

Similarly, is there any concept like a "hidden" coordinate where utilities like to_netcdf ignore the coordinate and don't write it? Maybe something like .coords['_crs'] = "blah blah blah"? I could always add this logic myself to geoxarray's version of to_netcdf.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
410308698 https://github.com/pydata/xarray/issues/2288#issuecomment-410308698 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQxMDMwODY5OA== djhoese 1828519 2018-08-03T16:36:55Z 2018-08-03T16:36:55Z CONTRIBUTOR

@wy2136 Very cool. We have the ability in satpy (via pyresample) to create cartopy CRS objects and therefore cartopy plots from our xarray DataArray objects: https://github.com/pytroll/pytroll-examples/blob/master/satpy/Cartopy%20Plot.ipynb

It would be nice if we could work together in the future since it looks like you do a lot of the same stuff. When I make an official "geoxarray" library I think I'm going to make a "geo" accessor for some of these same operations (see above conversations).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
410068839 https://github.com/pydata/xarray/issues/2288#issuecomment-410068839 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQxMDA2ODgzOQ== djhoese 1828519 2018-08-02T21:09:38Z 2018-08-02T21:09:38Z CONTRIBUTOR

For the user base I think if we can cover as many groups as possible that would be best. I know there are plenty of people who need to describe CRS information in their data, but don't use geotiffs and therefore don't really need rasterio/gdal. The group I immediately thought of was the metpy group, which is why I talked to @dopplershift in the first place. The immediate need for this group (based on his scipy talk) will be people reading NetCDF files and putting the data on a cartopy plot. I think @dopplershift and I agreed that when it comes to problems building/distributing software dealing with this type of data, the cause is almost always gdal/libgdal. I'm in favor of making it optional if possible.

For the to_netcdf stuff I think anything that needs to be "adjusted" before writing to a NetCDF file can be handled by requiring users to call my_data.geo.to_netcdf(...). I'm not a huge fan of the accessor adding information automatically that the user didn't specifically request. Side effects on your data just from importing a module are not good. I will try to put together a package skeleton and lay out some of the stuff in my head in the next month, but am still catching up on work after SciPy and a week of vacation so I'm not sure when exactly I'll get to it.

I just did a search for "geoxarray" on github and @wy2136's repositories came up where they are importing a geoxarray package. @wy2136, is there another geoxarray project that we are not aware of? Do you have anything else to add to this discussion?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
408613922 https://github.com/pydata/xarray/issues/2288#issuecomment-408613922 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQwODYxMzkyMg== djhoese 1828519 2018-07-28T15:06:04Z 2018-07-28T15:06:04Z CONTRIBUTOR

I was talking with @dopplershift the other day on gitter and he brought up a very important point: no matter how CRS information is represented, the user should be able to access the individual parameters (reference longitude, datum, etc). This led me to think that a new CRS class is probably needed, even though I wanted to avoid it, because it would likely be one of the easiest ways to provide access to the individual parameters. There are already cartopy CRS objects, which IMO are difficult to create, and rasterio CRS objects, which require gdal — a pretty huge dependency to require users to install just to describe their data. That said, I think no matter how it is coded I don't want to duplicate all the work that has been done in rasterio/gdal for handling WKT and converting between different CRS formats.

The other thing I've been pondering during idle brain time is: is it better for this library to require an xarray object to have projection information described in one and only one way (a CRS object instance, for example), or should the xarray accessor handle multiple forms of this projection information? Does having a CRS object in .coords allow some functionality that a simple string would not have? Does not having a required .coords CRS element stop the accessor from adding one later? In the latter case of the accessor parsing existing attrs/coords/dims of the xarray object, I was thinking it could handle a PROJ.4 string and the CF "grid_mapping" specification to start. The main functionality available here is that with little to no work a user could import geoxarray and have access to whatever functionality can be provided in a .geo accessor. Or if they load a netcdf file with xr.open_dataset there is no extra work required for a user to supply that data to another library that uses geoxarray.
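The CF "grid_mapping" half of that parsing boils down to following one attribute to another variable. A minimal sketch of that lookup, using plain dicts as a stand-in for an xarray Dataset (the helper name and structure are illustrative, not an existing API):

```python
def find_grid_mapping(variables, var_name):
    """Follow a data variable's CF 'grid_mapping' attribute to the
    scalar variable holding the CRS parameters.  `variables` is a plain
    dict of {name: {'attrs': {...}}} standing in for an xarray Dataset."""
    gm_name = variables[var_name]['attrs'].get('grid_mapping')
    if gm_name is None:
        return None  # no CRS information declared for this variable
    return variables[gm_name]['attrs']

# Stand-in for a CF file with a geostationary grid mapping
variables = {
    'Rad': {'attrs': {'grid_mapping': 'goes_imager_projection'}},
    'goes_imager_projection': {'attrs': {
        'grid_mapping_name': 'geostationary',
        'longitude_of_projection_origin': -75.0,
    }},
}
```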

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
407753531 https://github.com/pydata/xarray/issues/2288#issuecomment-407753531 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQwNzc1MzUzMQ== djhoese 1828519 2018-07-25T13:26:45Z 2018-07-25T13:26:45Z CONTRIBUTOR

I was talking about open_dataset not reading standard CF files in the way we want, at least not the way it is now. I understand that setting the CRS in .coords will write out the CRS when you use to_netcdf. The issue is that a standard CF netcdf file created by someone else that strictly follows the CF standard will not be read in the same way. Put another way, you could not load a CF NetCDF file with open_dataset and write it out with to_netcdf and get the same output.

Also note that having the grid_mapping as a coordinate in xarray object results in it being listed in the coordinates attribute in the output netcdf file which is technically not part of the CF standard.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
407742230 https://github.com/pydata/xarray/issues/2288#issuecomment-407742230 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQwNzc0MjIzMA== djhoese 1828519 2018-07-25T12:46:57Z 2018-07-25T12:50:04Z CONTRIBUTOR

The files I have created have the crs coordinate variable inside

Ok so the netcdf files that you have created and are reading with xarray.open_dataset have grid_mapping set to "crs" for your data variables, right? Do you also include a special "crs" dimension? I believe having this dimension would cause xarray to automatically consider "crs" a coordinate, but this is not CF standard from what I can tell. As I mentioned in your other issue the CF standard files I have for GOES-16 ABI L1B data do not have this "crs" dimension (or similarly named dimension) which means that the variable specified by the grid_mapping attribute is not considered a coordinate for the associated DataArray/Dataset.

This means that to properly associate a CRS with a DataArray/Dataset this new library would require its own version of open_dataset to assign these things correctly based on grid_mapping. Since the library would require users to use this function instead of xarray's then I don't think it would be out of the question for it to also have a custom to_netcdf method if we chose to use a non-CF representation of the CRS information. Not saying I feel strongly about it, just pointing out that it isn't a huge leap to require users to use the new/custom methods.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
407739349 https://github.com/pydata/xarray/issues/2308#issuecomment-407739349 https://api.github.com/repos/pydata/xarray/issues/2308 MDEyOklzc3VlQ29tbWVudDQwNzczOTM0OQ== djhoese 1828519 2018-07-25T12:35:52Z 2018-07-25T12:35:52Z CONTRIBUTOR

@fmaussion I completely agree except now that all of this is being brought up I see why it may have been better to put the 'crs' in the coordinates of the DataArray returned by open_rasterio. Since two DataArrays in different projections are not and should not be considered "mergeable". But I can also see how this walks the line of special handling of a data format by trying to interpret things in a certain way, but that line is already pretty blurry in the case of reading rasterio compatible datasets.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Proposal: Update rasterio backend to store CRS/nodata information in standard locations. 344058811
407613401 https://github.com/pydata/xarray/issues/2308#issuecomment-407613401 https://api.github.com/repos/pydata/xarray/issues/2308 MDEyOklzc3VlQ29tbWVudDQwNzYxMzQwMQ== djhoese 1828519 2018-07-25T02:30:38Z 2018-07-25T02:30:38Z CONTRIBUTOR

This is the output of using xarray with a standard GOES-16 ABI L1B data file:

```python
In [2]: import xarray as xr

In [3]: nc = xr.open_dataset('OR_ABI-L1b-RadF-M3C01_G16_s20181741200454_e20181741211221_c20181741211264.nc')

In [4]: nc.data_vars['Rad']
Out[4]:
<xarray.DataArray 'Rad' (y: 10848, x: 10848)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
    t        datetime64[ns] ...
  * y        (y) float32 0.151858 0.15183 0.151802 0.151774 0.151746 ...
  * x        (x) float32 -0.151858 -0.15183 -0.151802 -0.151774 -0.151746 ...
    y_image  float32 ...
    x_image  float32 ...
Attributes:
    long_name:              ABI L1b Radiances
    standard_name:          toa_outgoing_radiance_per_unit_wavelength
    sensor_band_bit_depth:  10
    valid_range:            [   0 1022]
    units:                  W m-2 sr-1 um-1
    resolution:             y: 0.000028 rad x: 0.000028 rad
    grid_mapping:           goes_imager_projection
    cell_methods:           t: point area: point
    ancillary_variables:    DQF

In [5]: nc.data_vars['goes_imager_projection']
Out[5]:
<xarray.DataArray 'goes_imager_projection' ()>
array(-2147483647, dtype=int32)
Coordinates:
    t        datetime64[ns] ...
    y_image  float32 ...
    x_image  float32 ...
Attributes:
    long_name:                       GOES-R ABI fixed grid projection
    grid_mapping_name:               geostationary
    perspective_point_height:        35786023.0
    semi_major_axis:                 6378137.0
    semi_minor_axis:                 6356752.31414
    inverse_flattening:              298.2572221
    latitude_of_projection_origin:   0.0
    longitude_of_projection_origin:  -75.0
    sweep_angle_axis:                x
```
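As a hedged illustration of what any library would have to do with that grid_mapping variable, the geostationary attributes map almost directly onto a PROJ.4 string. This is a sketch of the conversion for this one projection only, not satpy's or anyone's actual code:

```python
def geos_cf_to_proj4(attrs):
    """Build a PROJ.4 string from CF 'geostationary' grid_mapping
    attributes (sketch; assumes all listed keys are present)."""
    return ('+proj=geos +h={perspective_point_height} +a={semi_major_axis} '
            '+b={semi_minor_axis} +lon_0={longitude_of_projection_origin} '
            '+sweep={sweep_angle_axis} +units=m').format(**attrs)

# Attribute values taken from the goes_imager_projection variable shown above
attrs = {
    'perspective_point_height': 35786023.0,
    'semi_major_axis': 6378137.0,
    'semi_minor_axis': 6356752.31414,
    'longitude_of_projection_origin': -75.0,
    'sweep_angle_axis': 'x',
}
```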

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Proposal: Update rasterio backend to store CRS/nodata information in standard locations. 344058811
407564039 https://github.com/pydata/xarray/issues/2288#issuecomment-407564039 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQwNzU2NDAzOQ== djhoese 1828519 2018-07-24T21:52:01Z 2018-07-24T21:52:50Z CONTRIBUTOR

Regarding non-uniform datasets, I think we have a small misunderstanding. I'm talking about things like data from polar-orbiting satellites where the original data is only geolocated by longitude/latitude values per pixel and the spacing between these pixels is not uniform, so you need every original longitude and latitude coordinate to properly geolocate the data (data, longitude, and latitude arrays all have the same shape). When it comes to the topics in this issue this is a problem because you would expect the lat/lon arrays to be set as coordinates, but if you are dealing with dask arrays that means that these values are now fully computed (correct me if I'm wrong).

For your example of adding a crs attribute, I understand that that is how one could do it, but I'm saying it is not already done in xarray's open_dataset. In my opinion this is one of the biggest downsides of the CF way of specifying projections, they are a special case that doesn't fit the rest of the NetCDF model well (a scalar with all valuable data in the attributes that is indirectly specified on data variables).

In your example of methods, is to_projection a remapping/resampling operation? If not, how does it differ from set_crs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
407560681 https://github.com/pydata/xarray/issues/2308#issuecomment-407560681 https://api.github.com/repos/pydata/xarray/issues/2308 MDEyOklzc3VlQ29tbWVudDQwNzU2MDY4MQ== djhoese 1828519 2018-07-24T21:38:34Z 2018-07-24T21:38:34Z CONTRIBUTOR

I wouldn't expect it to add crs if there wasn't a grid_mapping specified, but if it was then I would. In a simple test where I did xr.open_dataset('my_nc.nc') which has a grid_mapping attribute, xarray does nothing special to create a crs or other named coordinate variable referencing the associated grid_mapping variable.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Proposal: Update rasterio backend to store CRS/nodata information in standard locations. 344058811
407522819 https://github.com/pydata/xarray/issues/2308#issuecomment-407522819 https://api.github.com/repos/pydata/xarray/issues/2308 MDEyOklzc3VlQ29tbWVudDQwNzUyMjgxOQ== djhoese 1828519 2018-07-24T19:23:51Z 2018-07-24T19:23:51Z CONTRIBUTOR

@snowman2 This should mean that open_dataset should handle CRS specially too, right? Currently it doesn't seem to do anything special for the coordinate variable pointed to by grid_mapping.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Proposal: Update rasterio backend to store CRS/nodata information in standard locations. 344058811
407522046 https://github.com/pydata/xarray/issues/2288#issuecomment-407522046 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQwNzUyMjA0Ng== djhoese 1828519 2018-07-24T19:21:05Z 2018-07-24T19:21:05Z CONTRIBUTOR

@snowman2 Awesome. Thanks for the info, this is really good stuff to know. In your own projects and use of raster-like data, do you ever deal with non-uniform/non-projected data? How do you prefer to handle/store individual lon/lat values for each pixel? Also it looks like xarray would have to be updated to add the "crs" coordinate since currently it is not considered a coordinate variable. So a new library may need to have custom to_netcdf/open_dataset methods, right?

It kind of seems like a new library may be needed for this although I was hoping to avoid it. All of the conversions we've talked about could be really useful to a lot of people. I'm not aware of an existing library that handles these conversions as one of its main purposes and they always end up as a "nice utility" that helps the library as a whole. It seems like a library to solve this issue should be able to do the following:

  1. Store CRS information in xarray objects
  2. Write properly geolocated netcdf and geotiff files from xarray objects.
  3. Read netcdf and geotiff files as properly described xarray objects.
  4. Convert CRS information from one format to another: WKT, EPSG (if available), PROJ.4 str/dict, rasterio CRS, cartopy CRS
  5. Optionally (why not) be able to resample datasets to other projections.

Beyond reading/writing NetCDF and geotiff files I would be worried that this new library could easily suffer from major scope creep. Especially since this is one of the main purposes of the satpy library, even if it is dedicated to satellite imagery right now. @snowman2 I'm guessing the data cube project has similar use cases. If the reading/writing is limited to a specific set of formats then I could see pyresample being a playground for this type of functionality. The main reason for a playground versus a new from-scratch package would be the use of existing utilities in pyresample assuming resampling is a major feature of this new specification. Yet another braindump...complete.
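For point 4, the simplest of those conversions (a PROJ.4 dict back to a string) can be sketched with no dependencies; this is illustrative only — real conversions to WKT/EPSG would lean on rasterio, gdal, or pyproj:

```python
def proj4_dict_to_str(proj4_dict):
    """Serialize a PROJ.4 parameter dict to a string, e.g.
    {'proj': 'geos', 'sweep': 'x'} -> '+proj=geos +sweep=x'.
    Keys are sorted so output is deterministic."""
    parts = []
    for key, val in sorted(proj4_dict.items()):
        if val is True:
            parts.append('+' + key)  # flag parameter with no value
        else:
            parts.append('+{}={}'.format(key, val))
    return ' '.join(parts)
```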

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
407394381 https://github.com/pydata/xarray/issues/2288#issuecomment-407394381 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQwNzM5NDM4MQ== djhoese 1828519 2018-07-24T12:47:55Z 2018-07-24T12:47:55Z CONTRIBUTOR

@snowman2 I thought about that too, but here are the reasons I came up with for why this might not be the best idea:

  1. CF conventions change over time and depending on which version of the standard you are using, things can be represented differently. This would tie a geoxarray-like library to a specific version of the CF standard which may be confusing and would require adjustments when writing to a NetCDF file to match the user's desired version of the standard.
  2. Using a CF standard CRS description would require conversion to something more useful for just about every use case (that I can think of) that isn't saving to a netcdf file. For example, a PROJ.4 string can be passed to pyproj.Proj or in the near future cartopy to convert to a cartopy CRS object.
  3. If we have to add more information to the crs coordinate to make it more useful like a PROJ.4 string then we end up with multiple representations of the same thing, making maintenance of the information harder.

The result of this github issue should either be a new package that solves all (90+%) of these topics or an easy to implement, easy to use, geolocation description best practice so that libraries can more easily communicate. I think with the CF standard CRS object we would definitely need a new library to provide all the utilities for converting to and from various things.

Lastly, I don't know if I trust CF to be the one source of truth for stuff like this. If I've missed some other obvious benefits of this or if working with WKT or the CF standard CRS attributes isn't actually that complicated let me know.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
406704673 https://github.com/pydata/xarray/issues/2288#issuecomment-406704673 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQwNjcwNDY3Mw== djhoese 1828519 2018-07-20T19:30:53Z 2018-07-20T19:30:53Z CONTRIBUTOR

I've thought about this a little more and I agree with @fmaussion that this doesn't need to be added to xarray. I think if "we", developers who work with projected datasets, can agree that "crs" in an xarray object's coordinates is a PROJ.4 string, then that's half the battle of passing them between libraries. If not a PROJ.4 string, other ideas (dict?)?
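A minimal sketch of that convention, assuming xarray and numpy are available (the PROJ.4 string here is just an example value, not a proposed default):

```python
import numpy as np
import xarray as xr

data = xr.DataArray(
    np.zeros((5, 10), dtype=np.float32),
    dims=('y', 'x'),
    coords={'y': np.arange(5.), 'x': np.arange(10.)},
)
# Proposed convention: a scalar, dimensionless 'crs' coordinate
# holding the PROJ.4 string describing the y/x coordinates.
data = data.assign_coords(crs='+proj=geos +h=35786023.0 +lon_0=-75.0 +sweep=x')
```

Any downstream library could then check for the "crs" coordinate without needing to know which library produced the object.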

I initially had the idea to start a new geoxarray type library but the more I thought about what features I would want in it, it started looking a lot like a new interface on pyresample via an xarray accessor. If not an accessor then a subclass but that defeats the purpose (easy collaboration between libraries). I'd also like to use the name "geo" for the accessor but have a feeling that won't jive well with everyone so I will likely fall back to "pyresample".

One thing that just came to mind while typing this that is another difficulty is that there will still be the need to have an object like pyresample's AreaDefinition to represent a geographic region (projection, extents, size). These could then be passed to things like a .resample method as a target projection or slicing based on another projection's coordinates.

When I started typing this I thought I had it all laid out in my head, not anymore. 😢

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
406696890 https://github.com/pydata/xarray/issues/2042#issuecomment-406696890 https://api.github.com/repos/pydata/xarray/issues/2042 MDEyOklzc3VlQ29tbWVudDQwNjY5Njg5MA== djhoese 1828519 2018-07-20T18:57:38Z 2018-07-20T18:58:10Z CONTRIBUTOR

I'd like to add to this discussion the issue I brought up here #2288. It is something that could/should probably result in a new xarray add-on package for doing these types of operations. For example, I work on the pyresample and satpy projects. Pyresample uses its own "AreaDefinition" objects to define the geolocation/projection information. SatPy uses these AreaDefinitions by setting DataArray.attrs['area'] and using them when necessary. This includes the ability to write geotiffs using rasterio and a custom array-like class for writing dask chunks to the geotiff from separate threads (does not work multiprocess, yet).

Edit: by "add-on" I mean something like "geoxarray" where it is an optional dependency for a user that depends completely on xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Anyone working on a to_tiff? Alternatively, how do you write an xarray to a geotiff?  312203596
405616219 https://github.com/pydata/xarray/issues/2288#issuecomment-405616219 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQwNTYxNjIxOQ== djhoese 1828519 2018-07-17T15:06:54Z 2018-07-17T15:06:54Z CONTRIBUTOR

@shoyer I haven't read all of #1092 but that is another related issue for satpy where some satellite data formats use groups in NetCDF files which makes it difficult to use xr.open_dataset to access all the variables inside the file.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
405263631 https://github.com/pydata/xarray/issues/2288#issuecomment-405263631 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQwNTI2MzYzMQ== djhoese 1828519 2018-07-16T14:20:43Z 2018-07-16T14:20:43Z CONTRIBUTOR

@fmaussion I guess you're right. And that set of attributes to keep during certain operations would be very nice in my satpy library. We currently have to do a lot of special handling of that.

The one thing that a crs coordinate (PROJ.4 dict or str) doesn't handle is specifying what other coordinates define the X/Y projection coordinates. This logic also helps with non-uniform datasets where a longitude and latitude coordinate are needed. Of course, a downstream library could just define some type of standard for this. However, there are edge cases where I think the default handling of these coordinates by xarray would be bad. For example, satpy doesn't currently use Dataset objects directly and only uses DataArrays because of how coordinates have to be handled in a Dataset:

```python
In [3]: a = xr.DataArray(np.zeros((5, 10), dtype=np.float32),
   ...:                  coords={'y': np.arange(5.), 'x': np.arange(10.)},
   ...:                  dims=('y', 'x'))

In [4]: b = xr.DataArray(np.zeros((5, 10), dtype=np.float32),
   ...:                  coords={'y': np.arange(2., 7.), 'x': np.arange(2., 12.)},
   ...:                  dims=('y', 'x'))

In [6]: ds = xr.Dataset({'a': a, 'b': b})

In [7]: ds.coords
Out[7]:
Coordinates:
  * y        (y) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0
  * x        (x) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0
```

But I guess that is intended behavior, and if the crs is a coordinate then joining things from different projections would not be allowed and would raise an exception. However, that is exactly what satpy wants/needs to handle in some cases (satellite datasets at different resolutions, multiple 'regions' from the same overall instrument, two channels from the same instrument with slightly shifted geolocation, etc). I'm kind of just thinking out loud here, but I'll think about this more in my idle brain cycles today.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
405110142 https://github.com/pydata/xarray/issues/2288#issuecomment-405110142 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQwNTExMDE0Mg== djhoese 1828519 2018-07-15T18:48:01Z 2018-07-15T18:48:01Z CONTRIBUTOR

Also I should add the geopandas library as another reference.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
405109909 https://github.com/pydata/xarray/issues/2288#issuecomment-405109909 https://api.github.com/repos/pydata/xarray/issues/2288 MDEyOklzc3VlQ29tbWVudDQwNTEwOTkwOQ== djhoese 1828519 2018-07-15T18:43:43Z 2018-07-15T18:44:33Z CONTRIBUTOR

@fmaussion Note that I am the one who started the PROJ.4 CRS in cartopy pull request (https://github.com/SciTools/cartopy/pull/1023) and that it was this work that I copied to pyresample for my own work, since I didn't want to wait for everything to be fleshed out in cartopy. You can see an example of the to_cartopy_crs method here: https://github.com/pytroll/pytroll-examples/blob/master/satpy/Cartopy%20Plot.ipynb

It's also these cartopy CRS issues that make me think cartopy CRS objects aren't the right answer to "how do we represent CRS objects". In my experience (see: my cartopy PR :wink:), and from watching and talking with people at SciPy 2018, multiple projects have workarounds for passing their CRS/projection information to cartopy.

In my biased experience/opinion, PROJ.4 is or can be used in quite a few libraries/fields. If PROJ.4, or something that accepts PROJ.4, isn't used, then we might as well come up with a new standard way of defining projections... just kidding.

Side note: FYI, the GeoTIFF format does not currently accept the sweep axis parameter (+sweep) that PROJ.4 needs to properly describe the geos projection used by GOES-16 ABI satellite instrument data. I contacted some of the GeoTIFF library people at some point, and from what I remember, fixing it was a dead end without a lot of work behind it.
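For context, the kind of PROJ.4 string involved looks roughly like the following (the +lon_0/+h/+ellps values are the nominal GOES-East ones and are illustrative; +sweep=x is the parameter GeoTIFF cannot carry):

```
+proj=geos +lon_0=-75 +h=35786023 +sweep=x +ellps=GRS80 +units=m
```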

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add CRS/projection information to xarray objects 341331807
381366584 https://github.com/pydata/xarray/issues/1829#issuecomment-381366584 https://api.github.com/repos/pydata/xarray/issues/1829 MDEyOklzc3VlQ29tbWVudDM4MTM2NjU4NA== djhoese 1828519 2018-04-14T22:56:34Z 2018-04-14T22:56:34Z CONTRIBUTOR

Looks like it is related to pip 10.0; with pip 9.0.3 pandas installs fine on Python 3.4. I'll continue debugging this with the pandas and pip projects. Thanks.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Drop support for Python 3.4 288465429
381365561 https://github.com/pydata/xarray/issues/1829#issuecomment-381365561 https://api.github.com/repos/pydata/xarray/issues/1829 MDEyOklzc3VlQ29tbWVudDM4MTM2NTU2MQ== djhoese 1828519 2018-04-14T22:38:33Z 2018-04-14T22:39:05Z CONTRIBUTOR

I just ran into an issue testing Python 3.4 on Travis where xarray asked for pandas >0.18.0, which pulls in a version of pandas that is not compatible with Python 3.4 (https://github.com/pandas-dev/pandas/issues/20697). It also seems like this could be related to pip 10.0.

I'm OK with dropping Python 3.4 from my tests, but is this Python version check something pip/PyPI should handle, or is it something that xarray has to check in its setup.py?
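For reference, the metadata that newer pip consults is setuptools' `python_requires`; a minimal, hypothetical setup.py sketch (package name and versions are placeholders, not xarray's actual setup.py):

```python
# Hypothetical sketch: with python_requires declared, pip 9.0+ will refuse
# to install this package on unsupported interpreters instead of pulling
# down an incompatible release.
from setuptools import setup

setup(
    name='example-package',   # placeholder name
    version='0.1.0',
    python_requires='>=3.5',
)
```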

Edit: I should have just made a new issue, sorry.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Drop support for Python 3.4 288465429
373221321 https://github.com/pydata/xarray/issues/1989#issuecomment-373221321 https://api.github.com/repos/pydata/xarray/issues/1989 MDEyOklzc3VlQ29tbWVudDM3MzIyMTMyMQ== djhoese 1828519 2018-03-15T00:37:26Z 2018-03-15T00:38:04Z CONTRIBUTOR

@shoyer In my examples rows = cols = 1000 (xarray 0.10.1).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Inconsistent type conversion when doing numpy.sum gvies different results 305373563
373219624 https://github.com/pydata/xarray/issues/1989#issuecomment-373219624 https://api.github.com/repos/pydata/xarray/issues/1989 MDEyOklzc3VlQ29tbWVudDM3MzIxOTYyNA== djhoese 1828519 2018-03-15T00:27:35Z 2018-03-15T00:27:35Z CONTRIBUTOR

Example:

```
import numpy as np
import xarray as xr

a = xr.DataArray(np.random.random((rows, cols)).astype(np.float32), dims=('y', 'x'))

In [65]: np.sum(a).data
Out[65]: array(499858.0625)

In [66]: np.sum(a.data)
Out[66]: 499855.19

In [67]: np.sum(a.data.astype(np.float64))
Out[67]: 499855.21635645436

In [68]: np.sum(a.data.astype(np.float32))
Out[68]: 499855.19
```

I realized after making this example that nansum gives expected results:

```
a = xr.DataArray(np.random.random((rows, cols)).astype(np.float32), dims=('y', 'x'))

In [83]: np.nansum(a.data)
Out[83]: 500027.81

In [84]: np.nansum(a)
Out[84]: 500027.81

In [85]: np.nansum(a.data.astype(np.float64))
Out[85]: 500027.77103802469

In [86]: np.nansum(a.astype(np.float64))
Out[86]: 500027.77103802469
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Inconsistent type conversion when doing numpy.sum gvies different results 305373563
327894887 https://github.com/pydata/xarray/issues/1560#issuecomment-327894887 https://api.github.com/repos/pydata/xarray/issues/1560 MDEyOklzc3VlQ29tbWVudDMyNzg5NDg4Nw== djhoese 1828519 2017-09-07T19:07:40Z 2017-09-07T19:07:40Z CONTRIBUTOR

@shoyer As for the equals shortcut, isn't that what this line is doing: https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexes/multi.py#L1864

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.unstack taking unreasonable amounts of memory 255989233
327849071 https://github.com/pydata/xarray/issues/1560#issuecomment-327849071 https://api.github.com/repos/pydata/xarray/issues/1560 MDEyOklzc3VlQ29tbWVudDMyNzg0OTA3MQ== djhoese 1828519 2017-09-07T16:15:06Z 2017-09-07T16:15:06Z CONTRIBUTOR

I was able to reproduce this on my mac by watching Activity Monitor and saw a peak of ~8GB of memory during the unstack call.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.unstack taking unreasonable amounts of memory 255989233

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 74.973ms · About: xarray-datasette