issue_comments


104 rows where user = 90008 sorted by updated_at descending




issue >30

  • Opening from zarr.ZipStore fails to read (store???) unicode characters 11
  • Avoid loading entire dataset by getting the nbytes in an array 10
  • Remove debugging slow assert statement 6
  • Expand benchmarks for dataset insertion and creation 6
  • FutureWarning: creation of DataArrays w/ coords Dataset 5
  • 🐛 NetCDF4 RuntimeWarning if xarray is imported before netCDF4 5
  • expand_dims erases named dim in the array's coordinates 4
  • Performance: numpy indexes small amounts of data 1000 faster than xarray 4
  • Actually make the fast code path return early for Aligner.align 4
  • coordinates not removed for variable encoding during reset_coords 4
  • Append along an unlimited dimension to an existing netCDF file 3
  • Serialization of just coordinates 3
  • Test failure with TestValidateAttrs.test_validating_attrs 3
  • [WIP] [DEMO] Add tests for ZipStore for zarr 3
  • WIP: Ensure that zarr.ZipStores are closed 3
  • [WIP] Support nano second time encoding. 3
  • netcdf roundtrip fails to preserve the shape of numpy arrays in attributes 2
  • Lazy indexing arrays as a stand-alone package 2
  • Keyword only args for arguments like "drop" 2
  • Use base ImportError not MoudleNotFoundError when testing for plugins 2
  • Read/Write performance optimizations for netcdf files 2
  • get_data or get_varibale method 2
  • Lazy import dask.distributed to reduce import time of xarray 2
  • Insertion speed of new dataset elements 2
  • Unnamed dimensions 1
  • Unable to decode a date in nanoseconds 1
  • [FEATURE]: to_netcdf and additional keyword arguments 1
  • Long import time 1
  • decorator to deprecate positional arguments 1
  • Lazy Imports 1
  • …

user 1

  • hmaarrfk · 104

author_association 1

  • CONTRIBUTOR 104
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1515539273 https://github.com/pydata/xarray/issues/7770#issuecomment-1515539273 https://api.github.com/repos/pydata/xarray/issues/7770 IC_kwDOAMm_X85aVUtJ hmaarrfk 90008 2023-04-20T00:15:23Z 2023-04-20T00:15:23Z CONTRIBUTOR

Understood. Thank you for your prompt replies.

I'll read up and ask again if I have any questions.

I guess in the past I was trying to accommodate users that were not using our wrappers around to_netcdf.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Provide a public API for adding new backends 1675299031
1484222279 https://github.com/pydata/xarray/pull/4400#issuecomment-1484222279 https://api.github.com/repos/pydata/xarray/issues/4400 IC_kwDOAMm_X85Yd29H hmaarrfk 90008 2023-03-26T20:59:00Z 2023-03-26T20:59:00Z CONTRIBUTOR

nice!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Support nano second time encoding. 690546795
1434780029 https://github.com/pydata/xarray/issues/4079#issuecomment-1434780029 https://api.github.com/repos/pydata/xarray/issues/4079 IC_kwDOAMm_X85VhQF9 hmaarrfk 90008 2023-02-17T15:08:50Z 2023-02-17T15:08:50Z CONTRIBUTOR

I know it is "stale" but aligning to these "surprise dimensions" creates "late stage" bugs that are hard to pinpoint.

I'm not sure if it is possible to mark these dimensions as "unnamed"; as such, they should be "merged" into new "unnamed" dimensions that the user isn't tracking at this point in time.

Our workarounds have included calling these dimensions something related to the DataArray (d1_i), or simply turning small "arrays" into a countable number of scalar variables (d1_min, d1_max) instead of a single array containing two values (d1_limits).

```python
import xarray as xr

d1 = xr.DataArray(data=[1, 2])
assert 'dim_0' in d1.dims
d2 = xr.DataArray(data=[1, 2, 3])
assert 'dim_0' in d2.dims

xr.Dataset({'d1': d1, 'd2': d2})
```

Stack trace:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 7
      4 d2 = xr.DataArray(data=[1, 2, 3])
      5 assert 'dim_0' in d2.dims
----> 7 xr.Dataset({'d1': d1, 'd2': d2})

File ~/mambaforge/envs/dev/lib/python3.9/site-packages/xarray/core/dataset.py:612, in Dataset.__init__(self, data_vars, coords, attrs)
    609 if isinstance(coords, Dataset):
    610     coords = coords.variables
--> 612 variables, coord_names, dims, indexes, _ = merge_data_and_coords(
    613     data_vars, coords, compat="broadcast_equals"
    614 )
    616 self._attrs = dict(attrs) if attrs is not None else None
    617 self._close = None

File ~/mambaforge/envs/dev/lib/python3.9/site-packages/xarray/core/merge.py:564, in merge_data_and_coords(data_vars, coords, compat, join)
    562 objects = [data_vars, coords]
    563 explicit_coords = coords.keys()
--> 564 return merge_core(
    565     objects,
    566     compat,
    567     join,
    568     explicit_coords=explicit_coords,
    569     indexes=Indexes(indexes, coords),
    570 )

File ~/mambaforge/envs/dev/lib/python3.9/site-packages/xarray/core/merge.py:741, in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value)
    738 _assert_compat_valid(compat)
    740 coerced = coerce_pandas_values(objects)
--> 741 aligned = deep_align(
    742     coerced, join=join, copy=False, indexes=indexes, fill_value=fill_value
    743 )
    744 collected = collect_variables_and_indexes(aligned, indexes=indexes)
    745 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)

File ~/mambaforge/envs/dev/lib/python3.9/site-packages/xarray/core/alignment.py:848, in deep_align(objects, join, copy, indexes, exclude, raise_on_invalid, fill_value)
    845 else:
    846     out.append(variables)
--> 848 aligned = align(
    849     *targets,
    850     join=join,
    851     copy=copy,
    852     indexes=indexes,
    853     exclude=exclude,
    854     fill_value=fill_value,
    855 )
    857 for position, key, aligned_obj in zip(positions, keys, aligned):
    858     if key is no_key:

File ~/mambaforge/envs/dev/lib/python3.9/site-packages/xarray/core/alignment.py:785, in align(join, copy, indexes, exclude, fill_value, *objects)
    589 """
    590 Given any number of Dataset and/or DataArray objects, returns new
    591 objects with aligned indexes and dimension sizes.
    (...)
    775
    776 """
    777 aligner = Aligner(
    778     objects,
    779     join=join,
    (...)
    783     fill_value=fill_value,
    784 )
--> 785 aligner.align()
    786 return aligner.results

File ~/mambaforge/envs/dev/lib/python3.9/site-packages/xarray/core/alignment.py:573, in Aligner.align(self)
    571 self.assert_no_index_conflict()
    572 self.align_indexes()
--> 573 self.assert_unindexed_dim_sizes_equal()
    575 if self.join == "override":
    576     self.override_indexes()

File ~/mambaforge/envs/dev/lib/python3.9/site-packages/xarray/core/alignment.py:472, in Aligner.assert_unindexed_dim_sizes_equal(self)
    470 add_err_msg = ""
    471 if len(sizes) > 1:
--> 472     raise ValueError(
    473         f"cannot reindex or align along dimension {dim!r} "
    474         f"because of conflicting dimension sizes: {sizes!r}" + add_err_msg
    475     )

ValueError: cannot reindex or align along dimension 'dim_0' because of conflicting dimension sizes: {2, 3}
```
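Editorial aside: a minimal sketch of the two workarounds described above, assuming the illustrative names from the comment (d1_i, d1_min, d1_max); it is not code from the issue itself.

```python
import xarray as xr

# Workaround 1: name the dimension after the owning array so it cannot
# collide with another array's automatically named 'dim_0'.
d1 = xr.DataArray(data=[1, 2], dims=['d1_i'])
d2 = xr.DataArray(data=[1, 2, 3], dims=['d2_i'])
ds = xr.Dataset({'d1': d1, 'd2': d2})  # no alignment conflict

# Workaround 2: store a small two-element "array" as scalar variables
# instead of a single array that would carry an unnamed dimension.
ds['d1_min'] = 1
ds['d1_max'] = 2
```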

cc: @claydugo

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unnamed dimensions 621078539
1421384646 https://github.com/pydata/xarray/issues/7513#issuecomment-1421384646 https://api.github.com/repos/pydata/xarray/issues/7513 IC_kwDOAMm_X85UuJvG hmaarrfk 90008 2023-02-07T20:15:42Z 2023-02-07T20:15:42Z CONTRIBUTOR

I kinda think this reminds me of

https://github.com/pydata/xarray/discussions/7359

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  intermittent failures with h5netcdf, h5py on macos 1574694462
1412388313 https://github.com/pydata/xarray/issues/5081#issuecomment-1412388313 https://api.github.com/repos/pydata/xarray/issues/5081 IC_kwDOAMm_X85UL1XZ hmaarrfk 90008 2023-02-01T16:54:53Z 2023-02-01T16:54:53Z CONTRIBUTOR

As a followup question, is `LazilyIndexedArray` part of the 'public API'? That is, when you do decide to refactor (https://docs.xarray.dev/en/stable/generated/xarray.core.indexing.LazilyIndexedArray.html),

will you try to warn those of us who choose to

`from xarray.core.indexing import LazilyIndexedArray`

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Lazy indexing arrays as a stand-alone package 842436143
1412379773 https://github.com/pydata/xarray/issues/5081#issuecomment-1412379773 https://api.github.com/repos/pydata/xarray/issues/5081 IC_kwDOAMm_X85ULzR9 hmaarrfk 90008 2023-02-01T16:49:15Z 2023-02-01T16:49:15Z CONTRIBUTOR

I'm going to say, the LazilyIndexedArray is pretty cool.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Lazy indexing arrays as a stand-alone package 842436143
1411104404 https://github.com/pydata/xarray/pull/4395#issuecomment-1411104404 https://api.github.com/repos/pydata/xarray/issues/4395 IC_kwDOAMm_X85UG76U hmaarrfk 90008 2023-01-31T21:39:15Z 2023-01-31T21:39:15Z CONTRIBUTOR

Ultimately, I'm not sure how you want to manage resources. This zarr store could be considered a resource and thus may have an owner. Or maybe zarr should close itself upon garbage collection.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Ensure that zarr.ZipStores are closed 689502005
1411102778 https://github.com/pydata/xarray/pull/4395#issuecomment-1411102778 https://api.github.com/repos/pydata/xarray/issues/4395 IC_kwDOAMm_X85UG7g6 hmaarrfk 90008 2023-01-31T21:38:23Z 2023-01-31T21:38:23Z CONTRIBUTOR

I'm not sure. I decided not to use zarr (at least not now), so I lost interest, sorry.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Ensure that zarr.ZipStores are closed 689502005
1383192335 https://github.com/pydata/xarray/issues/7245#issuecomment-1383192335 https://api.github.com/repos/pydata/xarray/issues/7245 IC_kwDOAMm_X85ScdcP hmaarrfk 90008 2023-01-15T16:23:15Z 2023-01-15T16:23:15Z CONTRIBUTOR

Thank you for your explanation.

Do you think it is safe to "strip" encoding after "loading" the data? Or is it still used after the initial call to open_dataset?
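Editorial aside: a minimal sketch, assuming a hypothetical file name, of what "stripping" encoding after loading could look like; whether this is actually safe is exactly the question being asked above.

```python
import xarray as xr

ds = xr.open_dataset("data.nc")  # hypothetical file name
ds.load()  # make sure the data is actually read before dropping encoding

# Drop the on-disk encoding (including any "coordinates" entry) so it
# cannot leak into a later to_netcdf() call.
for var in ds.variables.values():
    var.encoding = {}
```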

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  coordinates not removed for variable encoding during reset_coords 1432388736
1369001951 https://github.com/pydata/xarray/issues/7245#issuecomment-1369001951 https://api.github.com/repos/pydata/xarray/issues/7245 IC_kwDOAMm_X85RmU_f hmaarrfk 90008 2023-01-02T14:41:45Z 2023-01-02T14:41:45Z CONTRIBUTOR

Kind bump

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  coordinates not removed for variable encoding during reset_coords 1432388736
1362322800 https://github.com/pydata/xarray/pull/7356#issuecomment-1362322800 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85RM2Vw hmaarrfk 90008 2022-12-22T02:40:59Z 2022-12-22T02:40:59Z CONTRIBUTOR

Any chance of a release? This is quite breaking for large datasets that can only be handled out of memory.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1346924547 https://github.com/pydata/xarray/pull/7356#issuecomment-1346924547 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85QSHAD hmaarrfk 90008 2022-12-12T17:27:47Z 2022-12-12T17:27:47Z CONTRIBUTOR

👍🏾

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1339624818 https://github.com/pydata/xarray/pull/7356#issuecomment-1339624818 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85P2Q1y hmaarrfk 90008 2022-12-06T16:19:19Z 2022-12-06T16:19:19Z CONTRIBUTOR

Yes, without chunks or anything.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1339624418 https://github.com/pydata/xarray/pull/7356#issuecomment-1339624418 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85P2Qvi hmaarrfk 90008 2022-12-06T16:18:59Z 2022-12-06T16:18:59Z CONTRIBUTOR

Very smart test!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1339457617 https://github.com/pydata/xarray/pull/7356#issuecomment-1339457617 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85P1oBR hmaarrfk 90008 2022-12-06T14:18:11Z 2022-12-06T14:18:11Z CONTRIBUTOR

The data is loaded from a NetCDF store through open_dataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1339452942 https://github.com/pydata/xarray/pull/7356#issuecomment-1339452942 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85P1m4O hmaarrfk 90008 2022-12-06T14:14:57Z 2022-12-06T14:14:57Z CONTRIBUTOR

No explicit test was added to ensure that the data wasn't loaded. I just experienced this bug enough (we would accidentally load 100GB files in our code base) that I knew exactly how to fix it.

If you want I can add a test to ensure that future optimizations to nbytes do not trigger a data load.

I was hoping the 1 line fix would be a shoo-in.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1336731702 https://github.com/pydata/xarray/pull/7356#issuecomment-1336731702 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85PrOg2 hmaarrfk 90008 2022-12-05T04:20:08Z 2022-12-05T04:20:08Z CONTRIBUTOR

It seems that checking hasattr on the _data variable achieves both purposes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1336711830 https://github.com/pydata/xarray/pull/7356#issuecomment-1336711830 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85PrJqW hmaarrfk 90008 2022-12-05T03:58:50Z 2022-12-05T03:58:50Z CONTRIBUTOR

I think that, at the very least, the current implementation works as well as the old one for arrays that are defined by the sparse package.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1336700669 https://github.com/pydata/xarray/pull/7356#issuecomment-1336700669 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85PrG79 hmaarrfk 90008 2022-12-05T03:36:31Z 2022-12-05T03:36:31Z CONTRIBUTOR

Looking into the history a little more. I seem to be proposing to revert: https://github.com/pydata/xarray/commit/60f8c3d3488d377b0b21009422c6121e1c8f1f70

I think this is important since many users have arrays that are larger than memory. For me, I found this bug when trying to access the number of bytes in a 16GB dataset that I'm trying to load on my wimpy laptop. Not fun to start swapping. I feel like others might be hitting this too.

xref: https://github.com/pydata/xarray/pull/6797 https://github.com/pydata/xarray/issues/4842

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1336696899 https://github.com/pydata/xarray/pull/7356#issuecomment-1336696899 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85PrGBD hmaarrfk 90008 2022-12-05T03:30:31Z 2022-12-05T03:30:31Z CONTRIBUTOR

I personally do not even think the hasattr is really that useful. You might as well use `size` and `itemsize`.
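Editorial aside: a rough sketch of the idea being discussed, not the actual xarray code — compute nbytes from size and itemsize when possible, so the lazy _data wrapper never has to be materialized.

```python
import numpy as np

class VariableSketch:
    """Toy stand-in for xarray's Variable, to show the nbytes idea."""

    def __init__(self, data):
        self._data = data

    @property
    def nbytes(self):
        # Prefer size * itemsize: lazy backends expose both without
        # loading the underlying array into memory.
        if hasattr(self._data, "itemsize"):
            return self._data.size * self._data.itemsize
        # Fall back to nbytes (e.g. sparse arrays define it directly).
        return self._data.nbytes

print(VariableSketch(np.arange(10)).nbytes)  # 80 when the default integer is 8 bytes
```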

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1320956883 https://github.com/pydata/xarray/issues/7259#issuecomment-1320956883 https://api.github.com/repos/pydata/xarray/issues/7259 IC_kwDOAMm_X85OvDPT hmaarrfk 90008 2022-11-19T19:51:27Z 2022-11-19T19:51:27Z CONTRIBUTOR

I'm really not sure. It seems to happen with a large swath of versions from my recent search.

Also, running from the Python REPL, I don't see the warning, which makes me feel like numpy/cython/netcdf4 are trying to suppress the harmless warning.

https://github.com/cython/cython/blob/0.29.x/Cython/Utility/ImportExport.c#L365

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  🐛 NetCDF4 RuntimeWarning if xarray is imported before netCDF4 1437481995
1320953994 https://github.com/pydata/xarray/issues/7259#issuecomment-1320953994 https://api.github.com/repos/pydata/xarray/issues/7259 IC_kwDOAMm_X85OvCiK hmaarrfk 90008 2022-11-19T19:33:37Z 2022-11-19T19:33:37Z CONTRIBUTOR

one or the other.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  🐛 NetCDF4 RuntimeWarning if xarray is imported before netCDF4 1437481995
1320953155 https://github.com/pydata/xarray/issues/7259#issuecomment-1320953155 https://api.github.com/repos/pydata/xarray/issues/7259 IC_kwDOAMm_X85OvCVD hmaarrfk 90008 2022-11-19T19:28:20Z 2022-11-19T19:28:53Z CONTRIBUTOR

I think it is a numpy thing

```bash
mamba create --name np numpy netcdf4 --channel conda-forge --override-channels
conda activate np
python -c "import numpy; import warnings; warnings.filterwarnings('error'); import netCDF4"
```

```python
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/mark/mambaforge/envs/np/lib/python3.11/site-packages/netCDF4/__init__.py", line 3, in <module>
    from ._netCDF4 import *
  File "src/netCDF4/_netCDF4.pyx", line 1, in init netCDF4._netCDF4
RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 16 from C header, got 96 from PyObject
```

```
$ mamba list
# packages in environment at /home/mark/mambaforge/envs/np:
#
# Name               Version   Build                 Channel
_libgcc_mutex        0.1       conda_forge           conda-forge
_openmp_mutex        4.5       2_gnu                 conda-forge
bzip2                1.0.8     h7f98852_4            conda-forge
c-ares               1.18.1    h7f98852_0            conda-forge
ca-certificates      2022.9.24 ha878542_0            conda-forge
cftime               1.6.2     py311h4c7f6c3_1       conda-forge
curl                 7.86.0    h2283fc2_1            conda-forge
hdf4                 4.2.15    h9772cbc_5            conda-forge
hdf5                 1.12.2    nompi_h4df4325_100    conda-forge
icu                  70.1      h27087fc_0            conda-forge
jpeg                 9e        h166bdaf_2            conda-forge
keyutils             1.6.1     h166bdaf_0            conda-forge
krb5                 1.19.3    h08a2579_0            conda-forge
ld_impl_linux-64     2.39      hc81fddc_0            conda-forge
libblas              3.9.0     16_linux64_openblas   conda-forge
libcblas             3.9.0     16_linux64_openblas   conda-forge
libcurl              7.86.0    h2283fc2_1            conda-forge
libedit              3.1.20191231 he28a2e2_2         conda-forge
libev                4.33      h516909a_1            conda-forge
libffi               3.4.2     h7f98852_5            conda-forge
libgcc-ng            12.2.0    h65d4601_19           conda-forge
libgfortran-ng       12.2.0    h69a702a_19           conda-forge
libgfortran5         12.2.0    h337968e_19           conda-forge
libgomp              12.2.0    h65d4601_19           conda-forge
libiconv             1.17      h166bdaf_0            conda-forge
liblapack            3.9.0     16_linux64_openblas   conda-forge
libnetcdf            4.8.1     nompi_h261ec11_106    conda-forge
libnghttp2           1.47.0    hff17c54_1            conda-forge
libnsl               2.0.0     h7f98852_0            conda-forge
libopenblas          0.3.21    pthreads_h78a6416_3   conda-forge
libsqlite            3.40.0    h753d276_0            conda-forge
libssh2              1.10.0    hf14f497_3            conda-forge
libstdcxx-ng         12.2.0    h46fd767_19           conda-forge
libuuid              2.32.1    h7f98852_1000         conda-forge
libxml2              2.10.3    h7463322_0            conda-forge
libzip               1.9.2     hc929e4a_1            conda-forge
libzlib              1.2.13    h166bdaf_4            conda-forge
ncurses              6.3       h27087fc_1            conda-forge
netcdf4              1.6.2     nompi_py311hc6fcf29_100 conda-forge
numpy                1.23.4    py311h7d28db0_1       conda-forge
openssl              3.0.7     h166bdaf_0            conda-forge
pip                  22.3.1    pyhd8ed1ab_0          conda-forge
python               3.11.0    ha86cf86_0_cpython    conda-forge
python_abi           3.11      2_cp311               conda-forge
readline             8.1.2     h0f457ee_0            conda-forge
setuptools           65.5.1    pyhd8ed1ab_0          conda-forge
tk                   8.6.12    h27826a3_0            conda-forge
tzdata               2022f     h191b570_0            conda-forge
wheel                0.38.4    pyhd8ed1ab_0          conda-forge
xz                   5.2.6     h166bdaf_0            conda-forge
zlib                 1.2.13    h166bdaf_4            conda-forge
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  🐛 NetCDF4 RuntimeWarning if xarray is imported before netCDF4 1437481995
1320950377 https://github.com/pydata/xarray/issues/7259#issuecomment-1320950377 https://api.github.com/repos/pydata/xarray/issues/7259 IC_kwDOAMm_X85OvBpp hmaarrfk 90008 2022-11-19T19:14:04Z 2022-11-19T19:14:04Z CONTRIBUTOR

```bash
mamba create --name xr numpy pandas xarray netcdf4 --channel conda-forge --override-channels
conda activate xr
python -c "import xarray; import warnings; warnings.filterwarnings('error'); import netCDF4"
```

```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/mark/mambaforge/envs/xr/lib/python3.11/site-packages/netCDF4/__init__.py", line 3, in <module>
    from ._netCDF4 import *
  File "src/netCDF4/_netCDF4.pyx", line 1, in init netCDF4._netCDF4
RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 16 from C header, got 96 from PyObject
```

`mamba list`:

```
# packages in environment at /home/mark/mambaforge/envs/xr:
#
# Name               Version   Build                 Channel
_libgcc_mutex        0.1       conda_forge           conda-forge
_openmp_mutex        4.5       2_gnu                 conda-forge
bzip2                1.0.8     h7f98852_4            conda-forge
c-ares               1.18.1    h7f98852_0            conda-forge
ca-certificates      2022.9.24 ha878542_0            conda-forge
cftime               1.6.2     py311h4c7f6c3_1       conda-forge
curl                 7.86.0    h2283fc2_1            conda-forge
hdf4                 4.2.15    h9772cbc_5            conda-forge
hdf5                 1.12.2    nompi_h4df4325_100    conda-forge
icu                  70.1      h27087fc_0            conda-forge
jpeg                 9e        h166bdaf_2            conda-forge
keyutils             1.6.1     h166bdaf_0            conda-forge
krb5                 1.19.3    h08a2579_0            conda-forge
ld_impl_linux-64     2.39      hc81fddc_0            conda-forge
libblas              3.9.0     16_linux64_openblas   conda-forge
libcblas             3.9.0     16_linux64_openblas   conda-forge
libcurl              7.86.0    h2283fc2_1            conda-forge
libedit              3.1.20191231 he28a2e2_2         conda-forge
libev                4.33      h516909a_1            conda-forge
libffi               3.4.2     h7f98852_5            conda-forge
libgcc-ng            12.2.0    h65d4601_19           conda-forge
libgfortran-ng       12.2.0    h69a702a_19           conda-forge
libgfortran5         12.2.0    h337968e_19           conda-forge
libgomp              12.2.0    h65d4601_19           conda-forge
libiconv             1.17      h166bdaf_0            conda-forge
liblapack            3.9.0     16_linux64_openblas   conda-forge
libnetcdf            4.8.1     nompi_h261ec11_106    conda-forge
libnghttp2           1.47.0    hff17c54_1            conda-forge
libnsl               2.0.0     h7f98852_0            conda-forge
libopenblas          0.3.21    pthreads_h78a6416_3   conda-forge
libsqlite            3.40.0    h753d276_0            conda-forge
libssh2              1.10.0    hf14f497_3            conda-forge
libstdcxx-ng         12.2.0    h46fd767_19           conda-forge
libuuid              2.32.1    h7f98852_1000         conda-forge
libxml2              2.10.3    h7463322_0            conda-forge
libzip               1.9.2     hc929e4a_1            conda-forge
libzlib              1.2.13    h166bdaf_4            conda-forge
ncurses              6.3       h27087fc_1            conda-forge
netcdf4              1.6.2     nompi_py311hc6fcf29_100 conda-forge
numpy                1.23.4    py311h7d28db0_1       conda-forge
openssl              3.0.7     h166bdaf_0            conda-forge
packaging            21.3      pyhd8ed1ab_0          conda-forge
pandas               1.5.1     py311h8b32b4d_1       conda-forge
pip                  22.3.1    pyhd8ed1ab_0          conda-forge
pyparsing            3.0.9     pyhd8ed1ab_0          conda-forge
python               3.11.0    ha86cf86_0_cpython    conda-forge
python-dateutil      2.8.2     pyhd8ed1ab_0          conda-forge
python_abi           3.11      2_cp311               conda-forge
pytz                 2022.6    pyhd8ed1ab_0          conda-forge
readline             8.1.2     h0f457ee_0            conda-forge
setuptools           65.5.1    pyhd8ed1ab_0          conda-forge
six                  1.16.0    pyh6c4a22f_0          conda-forge
tk                   8.6.12    h27826a3_0            conda-forge
tzdata               2022f     h191b570_0            conda-forge
wheel                0.38.4    pyhd8ed1ab_0          conda-forge
xarray               2022.11.0 pyhd8ed1ab_0          conda-forge
xz                   5.2.6     h166bdaf_0            conda-forge
zlib                 1.2.13    h166bdaf_4            conda-forge
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  🐛 NetCDF4 RuntimeWarning if xarray is imported before netCDF4 1437481995
1320945794 https://github.com/pydata/xarray/issues/7259#issuecomment-1320945794 https://api.github.com/repos/pydata/xarray/issues/7259 IC_kwDOAMm_X85OvAiC hmaarrfk 90008 2022-11-19T18:51:01Z 2022-11-19T18:51:01Z CONTRIBUTOR

It is also reproducible on binder:

It seems that the binder uses conda-forge, which is why I'm commenting here.

It is really strange in the sense that xarray doesn't compile anything.

https://github.com/conda-forge/xarray-feedstock/blob/main/recipe/meta.yaml#L16

So it must be something that gets lazy loaded that triggers things.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  🐛 NetCDF4 RuntimeWarning if xarray is imported before netCDF4 1437481995
1306327743 https://github.com/pydata/xarray/issues/2799#issuecomment-1306327743 https://api.github.com/repos/pydata/xarray/issues/2799 IC_kwDOAMm_X85N3Pq_ hmaarrfk 90008 2022-11-07T22:45:07Z 2022-11-07T22:45:07Z CONTRIBUTOR

As I've been recently going down this performance rabbit hole, I think the discussion around https://github.com/pydata/xarray/issues/7045 is relevant and provides some additional historical context as to "why" this performance penalty might be happening.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
1300527716 https://github.com/pydata/xarray/issues/7245#issuecomment-1300527716 https://api.github.com/repos/pydata/xarray/issues/7245 IC_kwDOAMm_X85NhHpk hmaarrfk 90008 2022-11-02T14:27:04Z 2022-11-02T14:27:04Z CONTRIBUTOR

While the above "fix" addresses the issues with renaming coordinates, I think there are plenty of usecases where we would still end up with strange, or unexpected results. For example.

  1. Load a dataset with many non-indexing coordinates.
  2. Dropping variables (that happen to be coordinates).
  3. Then adding back a variable with the same name.
  4. Upon save, encoding would dictate that it is a coordinate of a particular variable and will promote it to a coordinate instead of data.

We could apply the "fix" to the drop_vars method as well, but I think it may be hard (though not impossible) to hit all the cases.

I think a more "generic", albeit "breaking", fix would be to remove the "coordinates" entirely from encoding after the dataset has been loaded. That said, this only "works" if dataset['variable_name'].encoding['coordinates'] is considered a private variable. That is, users are not supposed to be adding to it at will.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  coordinates not removed for variable encoding during reset_coords 1432388736
1299492524 https://github.com/pydata/xarray/issues/7245#issuecomment-1299492524 https://api.github.com/repos/pydata/xarray/issues/7245 IC_kwDOAMm_X85NdK6s hmaarrfk 90008 2022-11-02T02:49:58Z 2022-11-02T02:57:37Z CONTRIBUTOR

And if you want to have a clean encoding dictionary, you may want to do the following:

```python
names = set(names)
for _, variable in obj._variables.items():
    if 'coordinates' in variable.encoding:
        coords_in_encoding = set(variable.encoding.get('coordinates').split(' '))
        remaining_coords = coords_in_encoding - names
        if len(remaining_coords) == 0:
            del variable.encoding['coordinates']
        else:
            variable.encoding['coordinates'] = ' '.join(remaining_coords)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  coordinates not removed for variable encoding during reset_coords 1432388736
1299369449 https://github.com/pydata/xarray/issues/7239#issuecomment-1299369449 https://api.github.com/repos/pydata/xarray/issues/7239 IC_kwDOAMm_X85Ncs3p hmaarrfk 90008 2022-11-01T23:54:07Z 2022-11-01T23:54:07Z CONTRIBUTOR

I think these are good alternatives.

From my experiments (and I'm still trying to create a minimum reproducible code that shows the real problem behind the slowdowns), reindexing can be quite expensive. We used to have many coordinates (to ensure that critical metadata stays with data_variables) and those coordinates were causing slowdowns on reindexing operations.

Thus the two calls update and expand_dims might cause two reindex merges to occur.

However, for this particular issue, I think that documenting the strategies proposed in the docstring is good enough. I have a feeling if one can get to the bottom of 7224, the performance concerns here will be mitigated too.

We can leave the performance discussion to: https://github.com/pydata/xarray/issues/7224

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  include/exclude lists in Dataset.expand_dims 1429172192
1296269381 https://github.com/pydata/xarray/pull/7238#issuecomment-1296269381 https://api.github.com/repos/pydata/xarray/issues/7238 IC_kwDOAMm_X85NQ4BF hmaarrfk 90008 2022-10-30T14:10:23Z 2022-10-30T14:10:23Z CONTRIBUTOR

> Hmm...I was kind of hoping we could avoid something like adding a _stacklevel_increment argument.

Right. Thank you for finding that example. I was going to try to construct one.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improve non-nanosecond warning 1428748922
1296006560 https://github.com/pydata/xarray/issues/7224#issuecomment-1296006560 https://api.github.com/repos/pydata/xarray/issues/7224 IC_kwDOAMm_X85NP32g hmaarrfk 90008 2022-10-29T22:39:39Z 2022-10-29T22:39:39Z CONTRIBUTOR

xref: https://github.com/pandas-dev/pandas/pull/49393

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Insertion speed of new dataset elements 1423948375
1296006402 https://github.com/pydata/xarray/issues/7224#issuecomment-1296006402 https://api.github.com/repos/pydata/xarray/issues/7224 IC_kwDOAMm_X85NP30C hmaarrfk 90008 2022-10-29T22:39:01Z 2022-10-29T22:39:01Z CONTRIBUTOR

Ok, I don't think I have the right tools to really get to the bottom of this. The Spyder profiler just seems to slow down code too much. Any other tools to recommend?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Insertion speed of new dataset elements 1423948375
1295999237 https://github.com/pydata/xarray/pull/7236#issuecomment-1295999237 https://api.github.com/repos/pydata/xarray/issues/7236 IC_kwDOAMm_X85NP2EF hmaarrfk 90008 2022-10-29T22:11:33Z 2022-10-29T22:11:33Z CONTRIBUTOR

Well now the benchmarks look like they make more sense:

```
[ 50.00%] ··· ==================== ========== ========== ========== ========== ==========
              --                                         count
              -------------------- ------------------------------------------------------
                    strategy           0          1          10        100       1000
              ==================== ========== ========== ========== ========== ==========
               dict_of_DataArrays   1.56±0ms   3.60±0ms   5.83±0ms   16.6±0ms   67.3±0ms
               dict_of_Variables    1.65±0ms   3.11±0ms   4.03±0ms   6.22±0ms   18.9±0ms
               dict_of_Tuples       2.42±0ms   3.11±0ms   982±0μs    5.17±0ms   17.2±0ms
              ==================== ========== ========== ========== ========== ==========
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expand benchmarks for dataset insertion and creation 1428274982
1295937569 https://github.com/pydata/xarray/pull/7236#issuecomment-1295937569 https://api.github.com/repos/pydata/xarray/issues/7236 IC_kwDOAMm_X85NPnAh hmaarrfk 90008 2022-10-29T18:58:35Z 2022-10-29T18:58:35Z CONTRIBUTOR

```
[ 50.00%] ··· ==================== ========== ========== ========== ========== ==========
              --                                         count
              -------------------- ------------------------------------------------------
                    strategy           0          1          10        100       1000
              ==================== ========== ========== ========== ========== ==========
               dict_of_DataArrays   1.65±0ms   3.83±0ms   4.03±0ms   6.14±0ms   16.6±0ms
               dict_of_Variables    3.04±0ms   3.24±0ms   3.38±0ms   4.04±0ms   9.91±0ms
               dict_of_Tuples       2.90±0ms   3.03±0ms   3.32±0ms   3.22±0ms   3.22±0ms
              ==================== ========== ========== ========== ========== ==========
```

As you thought, the numbers improve quite a bit.

I kinda want to understand why a no-op takes 1 ms! ^_^

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expand benchmarks for dataset insertion and creation 1428274982
1295937364 https://github.com/pydata/xarray/pull/7236#issuecomment-1295937364 https://api.github.com/repos/pydata/xarray/issues/7236 IC_kwDOAMm_X85NPm9U hmaarrfk 90008 2022-10-29T18:57:54Z 2022-10-29T18:57:54Z CONTRIBUTOR

What about just specifying "dims"?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expand benchmarks for dataset insertion and creation 1428274982
1295905591 https://github.com/pydata/xarray/pull/7236#issuecomment-1295905591 https://api.github.com/repos/pydata/xarray/issues/7236 IC_kwDOAMm_X85NPfM3 hmaarrfk 90008 2022-10-29T17:11:30Z 2022-10-29T17:11:30Z CONTRIBUTOR

With the right window size it looks like:

```
[ 50.00%] ··· ==================== ========== ========== ========== ========== ==========
              --                                         count
              -------------------- ------------------------------------------------------
                    strategy           0          1          10        100       1000
              ==================== ========== ========== ========== ========== ==========
               dict_of_DataArrays   1.32±0ms   5.87±0ms   7.58±0ms   18.7±0ms   98.6±0ms
               dict_of_Variables    2.70±0ms   2.91±0ms   3.01±0ms   3.91±0ms   7.04±0ms
               dict_of_Tuples       2.84±0ms   3.02±0ms   3.22±0ms   3.42±0ms   3.02±0ms
              ==================== ========== ========== ========== ========== ==========
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expand benchmarks for dataset insertion and creation 1428274982
1295852860 https://github.com/pydata/xarray/pull/7236#issuecomment-1295852860 https://api.github.com/repos/pydata/xarray/issues/7236 IC_kwDOAMm_X85NPSU8 hmaarrfk 90008 2022-10-29T14:28:25Z 2022-10-29T14:28:25Z CONTRIBUTOR

On the CI, it reports similar findings:

```
[ 67.73%] ··· ...dVariable.time_dict_of_dataarrays_to_dataset                   ok
[ 67.73%] ··· =================== =============
               existing_elements
              ------------------- -------------
                       0             269±0.9μs
                      10            2.21±0.01ms
                     100            16.5±0.07ms
                    1000             153±0.9ms
              =================== =============

[ 67.88%] ··· ...etAddVariable.time_dict_of_tuples_to_dataset                   ok
[ 67.88%] ··· =================== ===========
               existing_elements
              ------------------- -----------
                       0            269±0.6μs
                      10            289±0.4μs
                     100             293±1μs
                    1000            346±0.4μs
              =================== ===========

[ 68.02%] ··· ...ddVariable.time_dict_of_variables_to_dataset                   ok
[ 68.02%] ··· =================== =============
               existing_elements
              ------------------- -------------
                       0              270±1μs
                      10             329±0.6μs
                     100              636±1μs
                    1000            3.70±0.01ms
              =================== =============

[ 68.17%] ··· ...e.DatasetAddVariable.time_merge_two_datasets                   ok
[ 68.17%] ··· =================== =============
               existing_elements
              ------------------- -------------
                       0             104±0.5μs
                      10             235±0.6μs
                     100              1.05±0ms
                    1000            9.02±0.02ms
              =================== =============

[ 68.31%] ··· ...e.DatasetAddVariable.time_variable_insertion                   ok
[ 68.31%] ··· =================== =============
               existing_elements
              ------------------- -------------
                       0               119±1μs
                      10             225±0.7μs
                     100              1.04±0ms
                    1000            9.03±0.03ms
              =================== =============
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expand benchmarks for dataset insertion and creation 1428274982
1295843798 https://github.com/pydata/xarray/pull/7236#issuecomment-1295843798 https://api.github.com/repos/pydata/xarray/issues/7236 IC_kwDOAMm_X85NPQHW hmaarrfk 90008 2022-10-29T13:55:33Z 2022-10-29T13:55:33Z CONTRIBUTOR

```
$ asv run -E existing --quick --bench merge
· Discovering benchmarks
· Running 5 total benchmarks (1 commits * 1 environments * 5 benchmarks)
[  0.00%] ·· Benchmarking existing-py_home_mark_mambaforge_envs_mcam_dev_bin_python
[ 10.00%] ··· merge.DatasetAddVariable.time_dict_of_dataarrays_to_dataset       ok
[ 10.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0            762±0μs
                      10           7.18±0ms
                     100           12.6±0ms
                    1000           89.1±0ms
              =================== ==========

[ 20.00%] ··· merge.DatasetAddVariable.time_dict_of_tuples_to_dataset           ok
[ 20.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0            889±0μs
                      10           2.01±0ms
                     100           1.34±0ms
                    1000            605±0μs
              =================== ==========

[ 30.00%] ··· merge.DatasetAddVariable.time_dict_of_variables_to_dataset        ok
[ 30.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0           2.48±0ms
                      10           2.06±0ms
                     100           2.13±0ms
                    1000           2.38±0ms
              =================== ==========

[ 40.00%] ··· merge.DatasetAddVariable.time_merge_two_datasets                  ok
[ 40.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0            814±0μs
                      10            945±0μs
                     100           2.42±0ms
                    1000           5.23±0ms
              =================== ==========

[ 50.00%] ··· merge.DatasetAddVariable.time_variable_insertion                  ok
[ 50.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0           1.10±0ms
                      10            954±0μs
                     100           1.88±0ms
                    1000           5.29±0ms
              =================== ==========
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expand benchmarks for dataset insertion and creation 1428274982
1295257627 https://github.com/pydata/xarray/pull/7179#issuecomment-1295257627 https://api.github.com/repos/pydata/xarray/issues/7179 IC_kwDOAMm_X85NNBAb hmaarrfk 90008 2022-10-28T17:21:40Z 2022-10-28T17:21:40Z CONTRIBUTOR

Exciting improvements on usability for the next version!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Lazy Imports 1412019155
1293689561 https://github.com/pydata/xarray/pull/7222#issuecomment-1293689561 https://api.github.com/repos/pydata/xarray/issues/7222 IC_kwDOAMm_X85NHCLZ hmaarrfk 90008 2022-10-27T15:15:45Z 2022-10-27T15:15:45Z CONTRIBUTOR

> but that would be a lot of work especially for such a critical piece of code in Xarray.

Agreed. I'll take the small wins where I can :D.

Great! I think this will be a good addition with: https://github.com/pydata/xarray/pull/7223#discussion_r1007023769

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Actually make the fast code path return early for Aligner.align 1423321834
1292299499 https://github.com/pydata/xarray/pull/7223#issuecomment-1292299499 https://api.github.com/repos/pydata/xarray/issues/7223 IC_kwDOAMm_X85NBuzr hmaarrfk 90008 2022-10-26T16:24:56Z 2022-10-26T16:24:56Z CONTRIBUTOR

Ok, naming is always hard. I tried to pick a good name.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset insertion benchmark 1423916687
1291948502 https://github.com/pydata/xarray/pull/7221#issuecomment-1291948502 https://api.github.com/repos/pydata/xarray/issues/7221 IC_kwDOAMm_X85NAZHW hmaarrfk 90008 2022-10-26T12:19:49Z 2022-10-26T12:23:46Z CONTRIBUTOR

I know it is not comparable, but I was really curious what "dictionary insertion" costs, in order to be able to understand if my comparisons were fair:

Code:

```python
from tqdm import tqdm
import xarray as xr
from time import perf_counter
import numpy as np

N = 1000

# Everybody is lazy loading now, so lets force modules to get instantiated
dummy_dataset = xr.Dataset()
dummy_dataset['a'] = 1
dummy_dataset['b'] = 1
del dummy_dataset

time_elapsed = np.zeros(N)

# dataset = xr.Dataset()
dataset = {}
for i in tqdm(range(N)):
# for i in range(N):
    time_start = perf_counter()
    dataset[f"var{i}"] = i
    time_end = perf_counter()
    time_elapsed[i] = time_end - time_start

# %%
from matplotlib import pyplot as plt
plt.plot(np.arange(N), time_elapsed * 1E6, label='Time to add one variable')
plt.xlabel("Number of existing variables")
plt.ylabel("Time to add a variables (us)")
plt.ylim([0, 10])
plt.title("Dictionary insertion")
plt.grid(True)
```

I think xarray gives me 3 orders of magnitude of "thinking" benefit, so I'll take it! (`python --version`: Python 3.9.13)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove debugging slow assert statement 1423312198
1291894024 https://github.com/pydata/xarray/pull/7221#issuecomment-1291894024 https://api.github.com/repos/pydata/xarray/issues/7221 IC_kwDOAMm_X85NAL0I hmaarrfk 90008 2022-10-26T11:32:32Z 2022-10-26T11:32:32Z CONTRIBUTOR

Ok. I'll want to rethink them.

I know it looks like quadratic time, but I really would like to test n=1000 and I have an idea.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove debugging slow assert statement 1423312198
1291450556 https://github.com/pydata/xarray/pull/7221#issuecomment-1291450556 https://api.github.com/repos/pydata/xarray/issues/7221 IC_kwDOAMm_X85M-fi8 hmaarrfk 90008 2022-10-26T03:32:53Z 2022-10-26T03:32:53Z CONTRIBUTOR

I'm somewhat confused, I can run the benchmark locally:

```
[ 1.80%] ··· dataset_creation.Creation.time_dataset_creation   4.37±0s
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove debugging slow assert statement 1423312198
1291447746 https://github.com/pydata/xarray/pull/7221#issuecomment-1291447746 https://api.github.com/repos/pydata/xarray/issues/7221 IC_kwDOAMm_X85M-e3C hmaarrfk 90008 2022-10-26T03:27:36Z 2022-10-26T03:27:36Z CONTRIBUTOR

:/ Not fun, the benchmark is failing. Not sure why.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove debugging slow assert statement 1423312198
1291405225 https://github.com/pydata/xarray/pull/7222#issuecomment-1291405225 https://api.github.com/repos/pydata/xarray/issues/7222 IC_kwDOAMm_X85M-Uep hmaarrfk 90008 2022-10-26T02:19:23Z 2022-10-26T02:19:23Z CONTRIBUTOR

I think the rapid return helps by about 40%, which is still pretty good.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Actually make the fast code path return early for Aligner.align 1423321834
1291402576 https://github.com/pydata/xarray/pull/7222#issuecomment-1291402576 https://api.github.com/repos/pydata/xarray/issues/7222 IC_kwDOAMm_X85M-T1Q hmaarrfk 90008 2022-10-26T02:17:45Z 2022-10-26T02:17:45Z CONTRIBUTOR

Hmm, ok. It seems I can't blatantly avoid the copy like that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Actually make the fast code path return early for Aligner.align 1423321834
1291399714 https://github.com/pydata/xarray/pull/7221#issuecomment-1291399714 https://api.github.com/repos/pydata/xarray/issues/7221 IC_kwDOAMm_X85M-TIi hmaarrfk 90008 2022-10-26T02:14:40Z 2022-10-26T02:14:40Z CONTRIBUTOR

> Would be interesting to see whether this was covered by our existing asv benchmarks.

I wasn't able to find something that really benchmarked "large" datasets.

> Would be a good benchmark to add if we don't have one already.

Added one.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove debugging slow assert statement 1423312198
1291391647 https://github.com/pydata/xarray/pull/7222#issuecomment-1291391647 https://api.github.com/repos/pydata/xarray/issues/7222 IC_kwDOAMm_X85M-RKf hmaarrfk 90008 2022-10-26T02:03:41Z 2022-10-26T02:03:41Z CONTRIBUTOR

The reason this is a separate merge request is that I agree this is a more contentious change.

However, I will argue that Aligner should really not be a class.

Using ripgrep, you find that the only instances of `Aligner` exist internally:

```
xarray/core/dataset.py
2775:        aligner: alignment.Aligner,
2783:        """Callback called from ``Aligner`` to create a new reindexed Dataset."""

xarray/core/alignment.py
107:class Aligner(Generic[DataAlignable]):
114:        aligner = Aligner(objects, *kwargs)   <------- Example
767:    aligner = Aligner(                        <----------- Used and consumed for the method align
881:    aligner = Aligner(                        <----------- Used and consumed for the method reindex
909:    # This check is not performed in Aligner.

xarray/core/dataarray.py
1752:        aligner: alignment.Aligner,
1760:        """Callback called from Aligner to create a new reindexed DataArray."""
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Actually make the fast code path return early for Aligner.align 1423321834
1291389702 https://github.com/pydata/xarray/pull/7221#issuecomment-1291389702 https://api.github.com/repos/pydata/xarray/issues/7221 IC_kwDOAMm_X85M-QsG hmaarrfk 90008 2022-10-26T01:59:57Z 2022-10-26T01:59:57Z CONTRIBUTOR

> out of interest, how did you find this?

Spyder profiler

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove debugging slow assert statement 1423312198
1281117607 https://github.com/pydata/xarray/pull/7172#issuecomment-1281117607 https://api.github.com/repos/pydata/xarray/issues/7172 IC_kwDOAMm_X85MXE2n hmaarrfk 90008 2022-10-17T16:11:37Z 2022-10-17T16:11:37Z CONTRIBUTOR

Thank you all for taking the time to study, and worry about these improvements.

Now I have to figure out how my software went from 2 sec loading time to 12 ;) Totally unrelated to this. But one day I'll have benchmarking in place to monitor it :D.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Lazy import dask.distributed to reduce import time of xarray 1410575877
1280208522 https://github.com/pydata/xarray/pull/7172#issuecomment-1280208522 https://api.github.com/repos/pydata/xarray/issues/7172 IC_kwDOAMm_X85MTm6K hmaarrfk 90008 2022-10-17T02:59:41Z 2022-10-17T02:59:41Z CONTRIBUTOR

> Separate issue, but do these need to be imported into xarray/__init__.py

At this point removing testing and tutorial would be strange and break things. Stefan in the discussion linked above speaks about the reasoning behind importing submodules in the top level namespace.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Lazy import dask.distributed to reduce import time of xarray 1410575877
1280072309 https://github.com/pydata/xarray/issues/6726#issuecomment-1280072309 https://api.github.com/repos/pydata/xarray/issues/6726 IC_kwDOAMm_X85MTFp1 hmaarrfk 90008 2022-10-16T22:33:17Z 2022-10-16T22:33:17Z CONTRIBUTOR

In developing https://github.com/pydata/xarray/pull/7172, there are also some places where class types are used to check for features: https://github.com/pydata/xarray/blob/main/xarray/core/pycompat.py#L35

Dask and sparse are big contributors due to their need to resolve the class name in question.

Ultimately, I think it is important to maybe constrain the problem.

Are we ok with 100 ms over numpy + pandas? 20 ms?

On my machines, the 0.5 s that xarray is close to seems long... but every time I look at it, it seems to "just be a python problem".
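Editorial aside: a hypothetical sketch (not xarray's actual pycompat code) of how a feature check can avoid paying the import cost — inspect the object's module name instead of importing dask or sparse just to resolve the class object.

```python
def looks_like_dask_array(obj) -> bool:
    # Hypothetical duck check: decide based on the type's module name,
    # so dask never has to be imported just to answer the question.
    module = type(obj).__module__ or ""
    return module.split(".")[0] == "dask"


def looks_like_sparse_array(obj) -> bool:
    # Same idea for the sparse package.
    module = type(obj).__module__ or ""
    return module.split(".")[0] == "sparse"
```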

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Long import time 1284475176
1185946473 https://github.com/pydata/xarray/issues/6791#issuecomment-1185946473 https://api.github.com/repos/pydata/xarray/issues/6791 IC_kwDOAMm_X85GsBtp hmaarrfk 90008 2022-07-15T21:11:19Z 2022-09-12T22:48:50Z CONTRIBUTOR

I guess the code:

```python
import numpy as np
import xarray as xr

dataset = xr.Dataset()

my_variable = np.asarray(dataset.get('my_variable', np.asarray(1.0)))
```

coerces things as an array.

Talking things out made me find this one. Though it doesn't read very well.

Feel free to close.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  get_data or get_varibale method 1306457778
1213089164 https://github.com/pydata/xarray/pull/6910#issuecomment-1213089164 https://api.github.com/repos/pydata/xarray/issues/6910 IC_kwDOAMm_X85ITkWM hmaarrfk 90008 2022-08-12T13:04:19Z 2022-08-12T13:04:19Z CONTRIBUTOR

Are the functions you are considering using this for functions that never had keyword arguments before? When I wrote a similar decorator before, I had an explicit list of arguments that were allowed to be converted.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  decorator to deprecate positional arguments 1337166287
1213031961 https://github.com/pydata/xarray/issues/5531#issuecomment-1213031961 https://api.github.com/repos/pydata/xarray/issues/5531 IC_kwDOAMm_X85ITWYZ hmaarrfk 90008 2022-08-12T11:53:07Z 2022-08-12T11:53:07Z CONTRIBUTOR

These decorators are kinda fun to write and are quite tailored to a certain release philosophy.

It might be warranted to just write your own ;)
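Editorial aside: a minimal sketch of such a "write your own" decorator; the name deprecate_positional and the example signature are made up here, not xarray's API. It warns when callers pass more positional arguments than the allowed prefix.

```python
import functools
import inspect
import warnings


def deprecate_positional(n_allowed):
    """Warn when more than ``n_allowed`` positional arguments are passed."""

    def decorator(func):
        names = list(inspect.signature(func).parameters)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > n_allowed:
                bad = ", ".join(names[n_allowed:len(args)])
                warnings.warn(
                    f"passing {bad!r} positionally to {func.__name__} is "
                    "deprecated; pass them as keyword arguments instead",
                    FutureWarning,
                    stacklevel=2,
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecate_positional(n_allowed=1)
def where(cond, other=None, drop=False):
    return cond if other is None else other


where([True], [False], True)  # warns: pass 'other, drop' as keywords
```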

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Keyword only args for arguments like "drop" 929840699
1186019342 https://github.com/pydata/xarray/issues/6791#issuecomment-1186019342 https://api.github.com/repos/pydata/xarray/issues/6791 IC_kwDOAMm_X85GsTgO hmaarrfk 90008 2022-07-15T23:23:30Z 2022-07-15T23:23:30Z CONTRIBUTOR

Interesting.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  get_data or get_varibale method 1306457778
1102962705 https://github.com/pydata/xarray/issues/5531#issuecomment-1102962705 https://api.github.com/repos/pydata/xarray/issues/5531 IC_kwDOAMm_X85BveAR hmaarrfk 90008 2022-04-19T18:34:07Z 2022-04-19T18:34:07Z CONTRIBUTOR

I think in my readme I suggest vendoring the code.

Happy to give you a license for it so you don't need to credit me in addition to your own license.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Keyword only args for arguments like "drop" 929840699
1094086518 https://github.com/pydata/xarray/issues/6309#issuecomment-1094086518 https://api.github.com/repos/pydata/xarray/issues/6309 IC_kwDOAMm_X85BNm92 hmaarrfk 90008 2022-04-09T17:06:13Z 2022-04-09T17:06:13Z CONTRIBUTOR

@max-sixty unfortunately, I think the way hdf5 is designed, it doesn't try to be too smart about what would be the best fine tuning for your particular system. In some ways, this is the correct approach.

The current constructor pathway: https://github.com/pydata/xarray/blob/main/xarray/backends/h5netcdf_.py#L164

It doesn't provide a user with catch-all kwargs. I think this would be an acceptable solution.

I should say that the performance of the direct driver is terrible without aligned data: https://github.com/Unidata/netcdf-c/pull/2206#issuecomment-1054855769

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read/Write performance optimizations for netcdf files 1152047670
1052386013 https://github.com/pydata/xarray/issues/6309#issuecomment-1052386013 https://api.github.com/repos/pydata/xarray/issues/6309 IC_kwDOAMm_X84-uiLd hmaarrfk 90008 2022-02-26T17:57:33Z 2022-02-26T17:57:33Z CONTRIBUTOR

I have to elaborate that this may be even more important for users that READ the data back a lot. Reading with the standard Xarray operands hits other limits, but one limit that it definitely hits is that of the HDF5 driver used.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read/Write performance optimizations for netcdf files 1152047670
1009823872 https://github.com/pydata/xarray/pull/6154#issuecomment-1009823872 https://api.github.com/repos/pydata/xarray/issues/6154 IC_kwDOAMm_X848MLCA hmaarrfk 90008 2022-01-11T10:28:51Z 2022-01-11T10:28:51Z CONTRIBUTOR

Thanks for merging so quickly

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use base ImportError not MoudleNotFoundError when testing for plugins 1098924491
1009820092 https://github.com/pydata/xarray/issues/6153#issuecomment-1009820092 https://api.github.com/repos/pydata/xarray/issues/6153 IC_kwDOAMm_X848MKG8 hmaarrfk 90008 2022-01-11T10:24:37Z 2022-01-11T10:24:37Z CONTRIBUTOR

Thank you @kmuehlbauer for the explicit PR link.

I do plan on adding alignment features to h5py then to bring it toward h5netcdf. So I think something like this will be useful in the future.

Feature request link: https://github.com/h5py/h5py/issues/2034

{
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
  [FEATURE]: to_netcdf and additional keyword arguments 1098915891
1009802137 https://github.com/pydata/xarray/pull/6154#issuecomment-1009802137 https://api.github.com/repos/pydata/xarray/issues/6154 IC_kwDOAMm_X848MFuZ hmaarrfk 90008 2022-01-11T10:14:09Z 2022-01-11T10:14:09Z CONTRIBUTOR

ImportError is a superset of ModuleNotFoundError. https://github.com/python/cpython/blob/f4c03484da59049eb62a9bf7777b963e2267d187/Lib/test/exception_hierarchy.txt#L19

So it depends what question you care about asking:

  1. Does the Python package exist? You should test for ModuleNotFoundError
  2. Is the package usable? You should probably test for ImportError

I think question 2 is friendlier to xarray users.
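Editorial aside: a small illustration of the difference (the module name some_backend is hypothetical). Since ModuleNotFoundError is a subclass of ImportError, catching it first distinguishes "not installed" from "installed but unusable".

```python
try:
    import some_backend  # hypothetical plugin module
except ModuleNotFoundError:
    # Question 1: the package is simply not installed.
    some_backend = None
except ImportError:
    # Question 2: the package is installed but unusable
    # (e.g. a missing or incompatible shared library).
    some_backend = None
```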

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use base ImportError not MoudleNotFoundError when testing for plugins 1098924491
1008227895 https://github.com/pydata/xarray/issues/2347#issuecomment-1008227895 https://api.github.com/repos/pydata/xarray/issues/2347 IC_kwDOAMm_X848GFY3 hmaarrfk 90008 2022-01-09T04:28:49Z 2022-01-09T04:28:49Z CONTRIBUTOR

This is likely true. Thanks for looking back into this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Serialization of just coordinates 347962055
786813358 https://github.com/pydata/xarray/issues/2799#issuecomment-786813358 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDc4NjgxMzM1OA== hmaarrfk 90008 2021-02-26T18:19:28Z 2021-02-26T18:19:28Z CONTRIBUTOR

I hope the following can help users that struggle with the speed of xarray:

I've found that when doing numerical computation, I often use xarray to grab all the metadata relevant to my computation: scale, chromaticity, experimental information.

Eventually, I create a function that acts as a barrier:
- xarray input (high-level experimental data)
- computation parameters output (low-level, implementation-relevant information)

The low-level implementation can then operate on the fast numpy arrays. I've found this to be the struggle in creating high-level APIs that do things like sanitize inputs (xarray routines like _validate_indexers and _broadcast_indexes) and low-level APIs that are simply interested in moving and computing data.

For the example that @nbren12 brought up originally, it might be better to create xarray routines (if they don't exist already) that can create fast iterators for the underlying numpy arrays given a set of dimensions that the user cares about.
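
A minimal sketch of that barrier pattern, with illustrative names and an assumed scalar "scale" coordinate (none of this is xarray API, just a usage pattern):

```python
import numpy as np
import xarray as xr

def extract_computation_inputs(da: xr.DataArray):
    # High-level side: pull metadata and raw arrays out of xarray once.
    scale = float(da.coords["scale"]) if "scale" in da.coords else 1.0
    return da.values, scale  # plain numpy array + plain Python scalar

def low_level_kernel(data: np.ndarray, scale: float) -> np.ndarray:
    # Low-level side: only fast numpy operations, no per-call xarray overhead.
    return data * scale

da = xr.DataArray(np.random.rand(1024, 1024), dims=("y", "x"),
                  coords={"scale": 2.0})
data, scale = extract_computation_inputs(da)  # cross the barrier once
result = low_level_kernel(data, scale)        # hot path stays in numpy
```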

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
735759416 https://github.com/pydata/xarray/pull/4400#issuecomment-735759416 https://api.github.com/repos/pydata/xarray/issues/4400 MDEyOklzc3VlQ29tbWVudDczNTc1OTQxNg== hmaarrfk 90008 2020-11-30T12:33:33Z 2020-11-30T12:33:33Z CONTRIBUTOR

I think you should be able to define your own custom encoder if you want it to be a datetime.

But inevitably, you will have to define your own save and load functions.

Python, by definition of being such a loose language, allows you to do things that the original developers never really imagined.

This can sometimes lead to silent corruption, like the one you've experienced.
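
As a rough illustration of what "your own save and load functions" could look like for nanosecond times, here is a minimal sketch; it assumes a coordinate named "time" holding datetime64[ns], and the helper names are made up:

```python
import numpy as np
import xarray as xr

def save_with_ns_time(ds: xr.Dataset, path: str) -> None:
    ds = ds.copy()
    # Store raw int64 nanoseconds since the epoch instead of letting the
    # CF encoder round to coarser units.
    ns = ds["time"].values.astype("datetime64[ns]").astype("int64")
    ds["time"] = ("time", ns)
    ds["time"].attrs["note"] = "int64 nanoseconds since 1970-01-01 (custom encoding)"
    ds.to_netcdf(path)

def load_with_ns_time(path: str) -> xr.Dataset:
    ds = xr.load_dataset(path)
    ns = ds["time"].values.astype("timedelta64[ns]")
    ds["time"] = ("time", np.datetime64("1970-01-01", "ns") + ns)
    return ds
```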

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Support nano second time encoding. 690546795
735428830 https://github.com/pydata/xarray/issues/1672#issuecomment-735428830 https://api.github.com/repos/pydata/xarray/issues/1672 MDEyOklzc3VlQ29tbWVudDczNTQyODgzMA== hmaarrfk 90008 2020-11-29T17:34:44Z 2020-11-29T17:35:04Z CONTRIBUTOR

It isn't really part of any library. I don't really have plans of making it into a public library. I think the discussion is really around the xarray API, and what functions to implement at first.

Then somebody can take the code and integrate it into the decided upon API.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Append along an unlimited dimension to an existing netCDF file 269700511
735428578 https://github.com/pydata/xarray/pull/4400#issuecomment-735428578 https://api.github.com/repos/pydata/xarray/issues/4400 MDEyOklzc3VlQ29tbWVudDczNTQyODU3OA== hmaarrfk 90008 2020-11-29T17:32:37Z 2020-11-29T17:32:37Z CONTRIBUTOR

Yeah, I'm not too sure. I think the idea is that this breaks compatibility with netCDF times, so the resulting file is not standard.

For my application, µs (microsecond) timing is enough.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Support nano second time encoding. 690546795
685222909 https://github.com/pydata/xarray/issues/1672#issuecomment-685222909 https://api.github.com/repos/pydata/xarray/issues/1672 MDEyOklzc3VlQ29tbWVudDY4NTIyMjkwOQ== hmaarrfk 90008 2020-09-02T01:17:05Z 2020-09-02T01:17:05Z CONTRIBUTOR

Small prototype, but maybe it can help boost the development.

```python
import netCDF4
import xarray as xr


def _expand_variable(nc_variable, data, expanding_dim, nc_shape, added_size):
    # For time deltas, we must ensure that we use the same encoding as
    # what was previously stored.
    # We likely need to do this as well for variables that had custom
    # encodings too
    if hasattr(nc_variable, 'calendar'):
        data.encoding = {
            'units': nc_variable.units,
            'calendar': nc_variable.calendar,
        }
    data_encoded = xr.conventions.encode_cf_variable(data)  # , name=name)
    left_slices = data.dims.index(expanding_dim)
    right_slices = data.ndim - left_slices - 1
    nc_slice = ((slice(None),) * left_slices
                + (slice(nc_shape, nc_shape + added_size),)
                + (slice(None),) * right_slices)
    nc_variable[nc_slice] = data_encoded.data


def append_to_netcdf(filename, ds_to_append, unlimited_dims):
    if isinstance(unlimited_dims, str):
        unlimited_dims = [unlimited_dims]

    if len(unlimited_dims) != 1:
        # TODO: change this so it can support multiple expanding dims
        raise ValueError(
            "We only support one unlimited dim for now, "
            f"got {len(unlimited_dims)}.")

    unlimited_dims = list(set(unlimited_dims))
    expanding_dim = unlimited_dims[0]

    with netCDF4.Dataset(filename, mode='a') as nc:
        nc_dims = set(nc.dimensions.keys())

        nc_coord = nc[expanding_dim]
        nc_shape = len(nc_coord)

        added_size = len(ds_to_append[expanding_dim])
        variables, attrs = xr.conventions.encode_dataset_coordinates(ds_to_append)

        for name, data in variables.items():
            if expanding_dim not in data.dims:
                # Nothing to do, data assumed to be identical
                continue

            nc_variable = nc[name]
            _expand_variable(nc_variable, data, expanding_dim,
                             nc_shape, added_size)


from xarray.tests.test_dataset import create_append_test_data
from xarray.testing import assert_equal

ds, ds_to_append, ds_with_new_var = create_append_test_data()

filename = 'test_dataset.nc'
ds.to_netcdf(filename, mode='w', unlimited_dims=['time'])
append_to_netcdf('test_dataset.nc', ds_to_append, unlimited_dims='time')

loaded = xr.load_dataset('test_dataset.nc')
assert_equal(xr.concat([ds, ds_to_append], dim="time"), loaded)
```
{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Append along an unlimited dimension to an existing netCDF file 269700511
685200043 https://github.com/pydata/xarray/issues/4183#issuecomment-685200043 https://api.github.com/repos/pydata/xarray/issues/4183 MDEyOklzc3VlQ29tbWVudDY4NTIwMDA0Mw== hmaarrfk 90008 2020-09-02T00:13:30Z 2020-09-02T00:13:30Z CONTRIBUTOR

I ran into this problem trying to round-trip time to the nanosecond (even though I don't need it, sub-microsecond would be nice).

Unfortunately, you run into the fact that cftime doesn't support nanoseconds: https://github.com/Unidata/cftime/blob/master/cftime/_cftime.pyx

It seems they discussed a nanosecond issue a while back too: https://github.com/Unidata/cftime/issues/77

Their ultimate point was that there was little point in having precision down to the nanosecond given that Python datetime objects only have microseconds. I guess they are right.
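
For reference, a tiny illustration of the precision mismatch they point to (plain Python datetimes stop at microseconds, numpy's datetime64 does not):

```python
import datetime
import numpy as np

print(datetime.datetime.resolution)  # 0:00:00.000001 -> 1 microsecond at best
t = np.datetime64("2020-06-26T00:00:00.123456789", "ns")
print(t)                             # nanoseconds preserved
print(t.astype("datetime64[us]"))    # truncated to microseconds
```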

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unable to decode a date in nanoseconds 646038170
684833575 https://github.com/pydata/xarray/issues/1672#issuecomment-684833575 https://api.github.com/repos/pydata/xarray/issues/1672 MDEyOklzc3VlQ29tbWVudDY4NDgzMzU3NQ== hmaarrfk 90008 2020-09-01T12:58:52Z 2020-09-01T12:58:52Z CONTRIBUTOR

I think I got a basic prototype working.

That said, I think a real challenge lies in supporting the numerous backends and lazy arrays.

For example, I was only able to add data in peculiar fashions using the netCDF4 library, which may trigger complex computations many times.

Is this a use case that we must optimize for now?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Append along an unlimited dimension to an existing netCDF file 269700511
684064522 https://github.com/pydata/xarray/pull/4395#issuecomment-684064522 https://api.github.com/repos/pydata/xarray/issues/4395 MDEyOklzc3VlQ29tbWVudDY4NDA2NDUyMg== hmaarrfk 90008 2020-08-31T21:59:28Z 2020-08-31T21:59:28Z CONTRIBUTOR

I'm not too sure about this anymore.

With the way the test is written now, it is unclear to me if the store should be closed afterward.

I'm also unsure of how to deal with the case where the user passed in a ZipStore instead of a string. I will have to keep thinking.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Ensure that zarr.ZipStores are closed 689502005
680060278 https://github.com/pydata/xarray/issues/2803#issuecomment-680060278 https://api.github.com/repos/pydata/xarray/issues/2803 MDEyOklzc3VlQ29tbWVudDY4MDA2MDI3OA== hmaarrfk 90008 2020-08-25T14:29:18Z 2020-08-25T14:29:18Z CONTRIBUTOR

Sorry for the noise. It seems that 1D arrays are still supported.

I still had a 2D array lingering in my codebase.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Test failure with TestValidateAttrs.test_validating_attrs 417542619
679399158 https://github.com/pydata/xarray/issues/2803#issuecomment-679399158 https://api.github.com/repos/pydata/xarray/issues/2803 MDEyOklzc3VlQ29tbWVudDY3OTM5OTE1OA== hmaarrfk 90008 2020-08-24T22:31:09Z 2020-08-24T22:31:09Z CONTRIBUTOR

With the netcdf4 backend, I'm not able to save a dataset with a 1D array attribute.

I can save my dataset with the h5netcdf backend.
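
A minimal sketch of the round trip I am describing, with arbitrary file and attribute names, to check what each engine does with a 1D array attribute:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": ("x", np.arange(3))})
ds.attrs["versions"] = np.array([1, 2, 3])  # 1D array attribute

for engine in ("h5netcdf", "netcdf4"):
    path = f"attrs_{engine}.nc"
    ds.to_netcdf(path, engine=engine)
    print(engine, xr.load_dataset(path).attrs["versions"])
```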

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Test failure with TestValidateAttrs.test_validating_attrs 417542619
679348131 https://github.com/pydata/xarray/issues/2803#issuecomment-679348131 https://api.github.com/repos/pydata/xarray/issues/2803 MDEyOklzc3VlQ29tbWVudDY3OTM0ODEzMQ== hmaarrfk 90008 2020-08-24T20:26:49Z 2020-08-24T20:26:49Z CONTRIBUTOR

Sorry for posting on such an old thread.

Are attrs supposed to support 1D arrays?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Test failure with TestValidateAttrs.test_validating_attrs 417542619
604220931 https://github.com/pydata/xarray/pull/3888#issuecomment-604220931 https://api.github.com/repos/pydata/xarray/issues/3888 MDEyOklzc3VlQ29tbWVudDYwNDIyMDkzMQ== hmaarrfk 90008 2020-03-26T04:23:05Z 2020-03-26T04:23:05Z CONTRIBUTOR

xfail just gets forgotten, so I'll leave it for now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] [DEMO] Add tests for ZipStore for zarr 587398134
604181264 https://github.com/pydata/xarray/issues/3815#issuecomment-604181264 https://api.github.com/repos/pydata/xarray/issues/3815 MDEyOklzc3VlQ29tbWVudDYwNDE4MTI2NA== hmaarrfk 90008 2020-03-26T01:49:45Z 2020-03-26T01:49:45Z CONTRIBUTOR

And actually, zarr provides a `data` argument in `create_dataset` that encounters the same bug:

```python
import zarr
import numpy as np

name = 'hello'
data = np.array('world', dtype='<U5')

store = zarr.ZipStore('test_store.zip', mode='w')
root = zarr.open(store, mode='w')
zarr_array = root.create_dataset(name, data=data, shape=data.shape, dtype=data.dtype)
zarr_array[...]
```

I guess I can open an issue upstream in zarr, but I think for catching the zero-sized array case, it is probably best to use the `data` argument.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening from zarr.ZipStore fails to read (store???) unicode characters 573577844
604180511 https://github.com/pydata/xarray/issues/3815#issuecomment-604180511 https://api.github.com/repos/pydata/xarray/issues/3815 MDEyOklzc3VlQ29tbWVudDYwNDE4MDUxMQ== hmaarrfk 90008 2020-03-26T01:46:52Z 2020-03-26T01:46:52Z CONTRIBUTOR

I think the reason is that for zero sized arrays, you technically aren't allowed to write data to them.

https://github.com/pydata/xarray/blob/6378a711d50ba7f1ba9b2a451d4d1f5e1fb37353/xarray/backends/zarr.py#L449

This means that when you create the 0 sized array, you can't actually change the value.

Here is a reproducer without xarray:

```python
import zarr
import numpy as np

name = 'hello'
data = np.array('world', dtype='<U5')

store = zarr.ZipStore('test_store.zip', mode='w')
root = zarr.open(store, mode='w')
zarr_array = root.create_dataset(name, shape=data.shape, dtype=data.dtype)
root[name][...] = data
zarr_array[...]
```

Though the code path follows what xarray does in the backend.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening from zarr.ZipStore fails to read (store???) unicode characters 573577844
604169147 https://github.com/pydata/xarray/pull/3888#issuecomment-604169147 https://api.github.com/repos/pydata/xarray/issues/3888 MDEyOklzc3VlQ29tbWVudDYwNDE2OTE0Nw== hmaarrfk 90008 2020-03-26T01:05:05Z 2020-03-26T01:05:05Z CONTRIBUTOR

Alright, it probably makes more sense to reopen this when the issue gets fixed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] [DEMO] Add tests for ZipStore for zarr 587398134
604160143 https://github.com/pydata/xarray/pull/3888#issuecomment-604160143 https://api.github.com/repos/pydata/xarray/issues/3888 MDEyOklzc3VlQ29tbWVudDYwNDE2MDE0Mw== hmaarrfk 90008 2020-03-26T00:31:15Z 2020-03-26T00:31:15Z CONTRIBUTOR

Wouldn't this be a useful test to have? I think the ability to save things in a ZipStore is quite useful.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] [DEMO] Add tests for ZipStore for zarr 587398134
604009141 https://github.com/pydata/xarray/issues/3815#issuecomment-604009141 https://api.github.com/repos/pydata/xarray/issues/3815 MDEyOklzc3VlQ29tbWVudDYwNDAwOTE0MQ== hmaarrfk 90008 2020-03-25T18:26:58Z 2020-03-25T18:26:58Z CONTRIBUTOR

@jakirkham not sure if you have any thoughts on why the code above is bugging out.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening from zarr.ZipStore fails to read (store???) unicode characters 573577844
604008658 https://github.com/pydata/xarray/issues/3815#issuecomment-604008658 https://api.github.com/repos/pydata/xarray/issues/3815 MDEyOklzc3VlQ29tbWVudDYwNDAwODY1OA== hmaarrfk 90008 2020-03-25T18:26:10Z 2020-03-25T18:26:10Z CONTRIBUTOR

Honestly, I've found that coordinates are treated less strictly, so I've been using those.

Keeping things like:
* the version of the different software that was used

seems more like an attribute, but really, it is all data, so :/.

Regarding the original issue, it seems that you are right in the sense that a 0-dimensional string might be buggy in zarr itself.

I guess we (when we have time) will have to dig down to find a minimal example that reproduces the issue without xarray to submit to zarr.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening from zarr.ZipStore fails to read (store???) unicode characters 573577844
603921577 https://github.com/pydata/xarray/issues/3815#issuecomment-603921577 https://api.github.com/repos/pydata/xarray/issues/3815 MDEyOklzc3VlQ29tbWVudDYwMzkyMTU3Nw== hmaarrfk 90008 2020-03-25T15:54:37Z 2020-03-25T15:54:37Z CONTRIBUTOR

Hmm, interesting!

I've avoided attrs since they often get "lost" in computation, and don't get dragged along as rigorously as coordinates.

I do have some real coordinates that are stored as strings.

Thanks for the quick feedback.

Here is the reproducing code without using context managers (which auto-close things, as you know):

```python
import xarray as xr
import zarr

x = xr.Dataset()
x['hello'] = 'world'
x

with zarr.ZipStore('test_store.zip', mode='w') as store:
    x.to_zarr(store)

read_store = zarr.ZipStore('test_store.zip', mode='r')
x_read = xr.open_zarr(read_store).compute()

# The error will happen before this line is executed
read_store.close()
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening from zarr.ZipStore fails to read (store???) unicode characters 573577844
603608958 https://github.com/pydata/xarray/issues/3815#issuecomment-603608958 https://api.github.com/repos/pydata/xarray/issues/3815 MDEyOklzc3VlQ29tbWVudDYwMzYwODk1OA== hmaarrfk 90008 2020-03-25T02:45:12Z 2020-03-25T02:45:12Z CONTRIBUTOR

I will have to try the debugging things you mentioned at some later time :/

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening from zarr.ZipStore fails to read (store???) unicode characters 573577844
603608822 https://github.com/pydata/xarray/issues/3815#issuecomment-603608822 https://api.github.com/repos/pydata/xarray/issues/3815 MDEyOklzc3VlQ29tbWVudDYwMzYwODgyMg== hmaarrfk 90008 2020-03-25T02:44:40Z 2020-03-25T02:44:40Z CONTRIBUTOR

Not sure if the builds in https://github.com/pydata/xarray/pull/3888 help reproduce things or not?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening from zarr.ZipStore fails to read (store???) unicode characters 573577844
603601762 https://github.com/pydata/xarray/issues/3815#issuecomment-603601762 https://api.github.com/repos/pydata/xarray/issues/3815 MDEyOklzc3VlQ29tbWVudDYwMzYwMTc2Mg== hmaarrfk 90008 2020-03-25T02:16:29Z 2020-03-25T02:16:29Z CONTRIBUTOR

Hmm, I didn't realize this. I'm running from conda-forge + Linux.

Let me try on your CIs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening from zarr.ZipStore fails to read (store???) unicode characters 573577844
603556048 https://github.com/pydata/xarray/issues/3815#issuecomment-603556048 https://api.github.com/repos/pydata/xarray/issues/3815 MDEyOklzc3VlQ29tbWVudDYwMzU1NjA0OA== hmaarrfk 90008 2020-03-24T23:24:53Z 2020-03-24T23:24:53Z CONTRIBUTOR

See the ZipStore example in my first comment.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening from zarr.ZipStore fails to read (store???) unicode characters 573577844
603555953 https://github.com/pydata/xarray/issues/3815#issuecomment-603555953 https://api.github.com/repos/pydata/xarray/issues/3815 MDEyOklzc3VlQ29tbWVudDYwMzU1NTk1Mw== hmaarrfk 90008 2020-03-24T23:24:34Z 2020-03-24T23:24:34Z CONTRIBUTOR

I thought I provided it, but in any case, here is my traceback:

```python
In [3]: import xarray as xr
   ...: import zarr
   ...: x = xr.Dataset()
   ...: x['hello'] = 'world'
   ...: x
   ...: with zarr.ZipStore('test_store.zip', mode='w') as store:
   ...:     x.to_zarr(store)
   ...: with zarr.ZipStore('test_store.zip', mode='r') as store:
   ...:     x_read = xr.open_zarr(store).compute()
   ...:
---------------------------------------------------------------------------
BadZipFile                                Traceback (most recent call last)
<ipython-input-3-5ad5a0456766> in <module>
      7     x.to_zarr(store)
      8 with zarr.ZipStore('test_store.zip', mode='r') as store:
----> 9     x_read = xr.open_zarr(store).compute()
     10

~/miniconda3/envs/mcam_dev/lib/python3.7/site-packages/xarray/core/dataset.py in compute(self, **kwargs)
    805         """
    806         new = self.copy(deep=False)
--> 807         return new.load(**kwargs)
    808
    809     def _persist_inplace(self, **kwargs) -> "Dataset":

~/miniconda3/envs/mcam_dev/lib/python3.7/site-packages/xarray/core/dataset.py in load(self, **kwargs)
    657         for k, v in self.variables.items():
    658             if k not in lazy_data:
--> 659                 v.load()
    660
    661         return self

~/miniconda3/envs/mcam_dev/lib/python3.7/site-packages/xarray/core/variable.py in load(self, **kwargs)
    373             self._data = as_compatible_data(self._data.compute(**kwargs))
    374         elif not hasattr(self._data, "__array_function__"):
--> 375             self._data = np.asarray(self._data)
    376         return self
    377

~/miniconda3/envs/mcam_dev/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86
     87

~/miniconda3/envs/mcam_dev/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    555     def __array__(self, dtype=None):
    556         array = as_indexable(self.array)
--> 557         return np.asarray(array[self.key], dtype=None)
    558
    559     def transpose(self, order):

~/miniconda3/envs/mcam_dev/lib/python3.7/site-packages/xarray/backends/zarr.py in __getitem__(self, key)
     47         array = self.get_array()
     48         if isinstance(key, indexing.BasicIndexer):
---> 49             return array[key.tuple]
     50         elif isinstance(key, indexing.VectorizedIndexer):
     51             return array.vindex[

~/miniconda3/envs/mcam_dev/lib/python3.7/site-packages/zarr/core.py in __getitem__(self, selection)
    570
    571         fields, selection = pop_fields(selection)
--> 572         return self.get_basic_selection(selection, fields=fields)
    573
    574     def get_basic_selection(self, selection=Ellipsis, out=None, fields=None):

~/miniconda3/envs/mcam_dev/lib/python3.7/site-packages/zarr/core.py in get_basic_selection(self, selection, out, fields)
    693         if self._shape == ():
    694             return self._get_basic_selection_zd(selection=selection, out=out,
--> 695                                                 fields=fields)
    696         else:
    697             return self._get_basic_selection_nd(selection=selection, out=out,

~/miniconda3/envs/mcam_dev/lib/python3.7/site-packages/zarr/core.py in _get_basic_selection_zd(self, selection, out, fields)
    709             # obtain encoded data for chunk
    710             ckey = self._chunk_key((0,))
--> 711             cdata = self.chunk_store[ckey]
    712
    713         except KeyError:

~/miniconda3/envs/mcam_dev/lib/python3.7/site-packages/zarr/storage.py in __getitem__(self, key)
   1249         with self.mutex:
   1250             with self.zf.open(key) as f:  # will raise KeyError
-> 1251                 return f.read()
   1252
   1253     def __setitem__(self, key, value):

~/miniconda3/envs/mcam_dev/lib/python3.7/zipfile.py in read(self, n)
    914             self._offset = 0
    915             while not self._eof:
--> 916                 buf += self._read1(self.MAX_N)
    917             return buf
    918

~/miniconda3/envs/mcam_dev/lib/python3.7/zipfile.py in _read1(self, n)
   1018         if self._left <= 0:
   1019             self._eof = True
-> 1020         self._update_crc(data)
   1021         return data
   1022

~/miniconda3/envs/mcam_dev/lib/python3.7/zipfile.py in _update_crc(self, newdata)
    946         # Check the CRC if we're at the end of the file
    947         if self._eof and self._running_crc != self._expected_crc:
--> 948             raise BadZipFile("Bad CRC-32 for file %r" % self.name)
    949
    950     def read1(self, n):

BadZipFile: Bad CRC-32 for file 'hello/0'
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening from zarr.ZipStore fails to read (store???) unicode characters 573577844
603190621 https://github.com/pydata/xarray/issues/3815#issuecomment-603190621 https://api.github.com/repos/pydata/xarray/issues/3815 MDEyOklzc3VlQ29tbWVudDYwMzE5MDYyMQ== hmaarrfk 90008 2020-03-24T11:41:37Z 2020-03-24T11:41:37Z CONTRIBUTOR

My guess is that xarray might be trying to write to the store character by character?

Otherwise, not too sure.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening from zarr.ZipStore fails to read (store???) unicode characters 573577844
552652019 https://github.com/pydata/xarray/issues/2799#issuecomment-552652019 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDU1MjY1MjAxOQ== hmaarrfk 90008 2019-11-11T22:47:47Z 2019-11-11T22:47:47Z CONTRIBUTOR

Sure, I just wanted to note that this operation should be more or less constant time, as opposed to dependent on the size of the array. Somebody had mentioned it should increase with the size of the array.
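
A quick check of that claim, using two array sizes; if the overhead really is per-call rather than size-dependent, the timings should stay roughly flat, since basic slicing returns a view rather than copying data:

```python
import numpy as np
import xarray as xr
from timeit import timeit

for n in (256, 4096):
    arr = xr.DataArray(np.zeros((n, n)), dims=("y", "x"))
    t = timeit(lambda: arr[n // 4: n // 2, n // 4: n // 2], number=1000)
    print(n, t)
```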

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
552619589 https://github.com/pydata/xarray/issues/2799#issuecomment-552619589 https://api.github.com/repos/pydata/xarray/issues/2799 MDEyOklzc3VlQ29tbWVudDU1MjYxOTU4OQ== hmaarrfk 90008 2019-11-11T21:16:36Z 2019-11-11T21:16:36Z CONTRIBUTOR

Hmm, slicing should basically be a no-op.

The fact that xarray makes it about 100x slower is a real killer. It seems from this conversation that it might be hard to work around.

```python
import xarray as xr
import numpy as np

n = np.zeros(shape=(1024, 1024))
x = xr.DataArray(n, dims=('y', 'x'))
the_slice = np.s_[256:512, 256:512]

%timeit n[the_slice]
%timeit x[the_slice]
186 ns ± 0.778 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
70.3 µs ± 593 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Performance: numpy indexes small amounts of data 1000 faster than xarray 416962458
451767431 https://github.com/pydata/xarray/issues/2347#issuecomment-451767431 https://api.github.com/repos/pydata/xarray/issues/2347 MDEyOklzc3VlQ29tbWVudDQ1MTc2NzQzMQ== hmaarrfk 90008 2019-01-06T19:25:53Z 2019-01-06T19:25:53Z CONTRIBUTOR

Mind blown! Thanks for that pointer. I haven't touched my serialization code in a while (kinda scared to go back to it now), but I will keep that library in mind.

I saw Zarr a while back, looks cool. I hope to see it grow.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Serialization of just coordinates 347962055
451765999 https://github.com/pydata/xarray/issues/2347#issuecomment-451765999 https://api.github.com/repos/pydata/xarray/issues/2347 MDEyOklzc3VlQ29tbWVudDQ1MTc2NTk5OQ== hmaarrfk 90008 2019-01-06T19:06:53Z 2019-01-06T19:06:53Z CONTRIBUTOR

No need to be sorry. These two functions were easy enough for me to do myself in my own codebase.

There are a few issues that I've found doing this, though. Mainly, I can't find a good way to serialize numpy arrays in a round-trippable fashion. It is difficult to get back lists of arrays, or arrays of uint8. I don't know if you have a good way to solve this problem.
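
One way I could imagine making arrays round-trippable through JSON is to carry the dtype and shape alongside the values; a minimal sketch (key names are arbitrary):

```python
import json
import numpy as np

def encode_array(a: np.ndarray) -> dict:
    return {"dtype": str(a.dtype), "shape": a.shape, "data": a.ravel().tolist()}

def decode_array(d: dict) -> np.ndarray:
    return np.array(d["data"], dtype=d["dtype"]).reshape(d["shape"])

original = np.arange(6, dtype="uint8").reshape(2, 3)
restored = decode_array(json.loads(json.dumps(encode_array(original))))
assert restored.dtype == original.dtype and (restored == original).all()
```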

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Serialization of just coordinates 347962055
416994400 https://github.com/pydata/xarray/issues/2251#issuecomment-416994400 https://api.github.com/repos/pydata/xarray/issues/2251 MDEyOklzc3VlQ29tbWVudDQxNjk5NDQwMA== hmaarrfk 90008 2018-08-29T15:24:07Z 2018-08-29T15:24:07Z CONTRIBUTOR

@shoyer, @fmaussion thank you for your answers. I'm OK with this issue being closed. I'm no expert on netcdf4, so I don't know if I could express the issue in a concise manner there.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  netcdf roundtrip fails to preserve the shape of numpy arrays in attributes 335608017
410759337 https://github.com/pydata/xarray/pull/2344#issuecomment-410759337 https://api.github.com/repos/pydata/xarray/issues/2344 MDEyOklzc3VlQ29tbWVudDQxMDc1OTMzNw== hmaarrfk 90008 2018-08-06T16:02:09Z 2018-08-06T16:02:09Z CONTRIBUTOR

Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  FutureWarning: creation of DataArrays w/ coords Dataset 347712372
410575268 https://github.com/pydata/xarray/pull/2344#issuecomment-410575268 https://api.github.com/repos/pydata/xarray/issues/2344 MDEyOklzc3VlQ29tbWVudDQxMDU3NTI2OA== hmaarrfk 90008 2018-08-06T02:55:12Z 2018-08-06T02:55:12Z CONTRIBUTOR

Maybe the issue that I am facing is that I want to deal with the storage of my metadata and data separately.

I used to have my own library that was replicating much of xarray's functionality, but your code is much nicer than anything I would be able to write in a finite time. :smile:

Following the information here: http://xarray.pydata.org/en/stable/data-structures.html#coordinates-methods

Currently, my serialization pipeline is:

```python
import xarray as xr
import numpy as np

# Setup an array with coordinates
n = np.zeros(3)
coords = {'x': np.arange(3)}
m = xr.DataArray(n, dims=['x'], coords=coords)

coords_dataset_dict = m.coords.to_dataset().to_dict()
coords_dict = coords_dataset_dict['coords']

# Read/Write dictionary to JSON file

# This works, but I'm essentially creating an empty dataset for it
coords_set = xr.Dataset.from_dict(coords_dataset_dict)
coords2 = coords_set.coords  # so many coords :D
m2 = xr.DataArray(np.zeros(shape=m.shape), dims=m.dims, coords=coords2)

# I used to just pass the dataset to "coords"
m3 = xr.DataArray(np.zeros(shape=m.shape), dims=m.dims, coords=coords_set)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  FutureWarning: creation of DataArrays w/ coords Dataset 347712372
410572206 https://github.com/pydata/xarray/pull/2344#issuecomment-410572206 https://api.github.com/repos/pydata/xarray/issues/2344 MDEyOklzc3VlQ29tbWVudDQxMDU3MjIwNg== hmaarrfk 90008 2018-08-06T02:31:02Z 2018-08-06T02:31:02Z CONTRIBUTOR

Is there a better way to serialize coordinates only?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  FutureWarning: creation of DataArrays w/ coords Dataset 347712372
410572013 https://github.com/pydata/xarray/pull/2344#issuecomment-410572013 https://api.github.com/repos/pydata/xarray/issues/2344 MDEyOklzc3VlQ29tbWVudDQxMDU3MjAxMw== hmaarrfk 90008 2018-08-06T02:29:34Z 2018-08-06T02:29:34Z CONTRIBUTOR

It seems like this warning isn't benign, though. I will take your suggestion (coords=dataset.coords).

I feel like I'm not the only one who probably did this. Should you raise another warning explicitly?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  FutureWarning: creation of DataArrays w/ coords Dataset 347712372
410532428 https://github.com/pydata/xarray/pull/2344#issuecomment-410532428 https://api.github.com/repos/pydata/xarray/issues/2344 MDEyOklzc3VlQ29tbWVudDQxMDUzMjQyOA== hmaarrfk 90008 2018-08-05T16:45:27Z 2018-08-05T16:45:27Z CONTRIBUTOR

I came across this when serializing/deserializing my coordinates to a json file.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  FutureWarning: creation of DataArrays w/ coords Dataset 347712372
410488222 https://github.com/pydata/xarray/issues/2340#issuecomment-410488222 https://api.github.com/repos/pydata/xarray/issues/2340 MDEyOklzc3VlQ29tbWVudDQxMDQ4ODIyMg== hmaarrfk 90008 2018-08-05T01:15:39Z 2018-08-05T01:15:49Z CONTRIBUTOR

Finishing up this line of thought: without the assumption that the relative order of dimensions is maintained across arrays in a set, this feature is impossible to implement as a neat function call. You would have to specify exactly how to expand each of the coordinates, which can get pretty long.

I wrote some code that I think should have worked if relative ordering were a valid assumption. Here it is for reference: https://github.com/hmaarrfk/xarray/pull/1

To obtain the desired effect, you have to expand the dimensions of the coordinates individually:

```python
import xarray as xr
import numpy as np

# Setup an array with coordinates
n = np.arange(1, 13).reshape(3, 2, 2)
coords = {'y': np.arange(1, 4), 'x': np.arange(1, 3), 'xi': np.arange(2)}

# %%
z = xr.DataArray(n[..., 0] ** 2, dims=['y', 'x'])
a = xr.DataArray(n, dims=['y', 'x', 'xi'], coords={**coords, 'z': z})

sliced = a[0]
print("The original xarray")
print(a.z)
print("The sliced xarray")
print(sliced.z)

# %%
expanded = sliced.expand_dims('y', 0)
expanded['z'] = expanded.z.expand_dims('y', 0)
print(expanded)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  expand_dims erases named dim in the array's coordinates 347558405

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);