issues
58 rows where repo = 13221727, type = "issue" and "updated_at" is on date 2022-04-09 sorted by updated_at descending
This data as json, CSV (advanced)
Suggested facets: comments, author_association, created_at (date), updated_at (date), closed_at (date)
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at ▲ | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1177665302 | I_kwDOAMm_X85GMb8W | 6401 | Unnecessary warning when specifying `chunks` opening dataset with empty dimension | jaicher 4666753 | closed | 0 | 0 | 2022-03-23T06:38:25Z | 2022-04-09T20:27:40Z | 2022-04-09T20:27:40Z | CONTRIBUTOR | What happened?I receive unnecessary warnings when opening Zarr datasets with empty dimensions/arrays using the If an array has zero size (due to an empty dimension), it is saved as a single chunk regardless of Dask chunking on other dimensions (#5742). If the What did you expect to happen?I expect no warning to be raised when there is no data:
Minimal Complete Verifiable Example```Python import xarray as xr import numpy as np each
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6401/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1167883842 | I_kwDOAMm_X85FnH5C | 6352 | to_netcdf from subsetted Dataset with strings loaded from char array netCDF can sometimes fail | DocOtak 868027 | open | 0 | 0 | 2022-03-14T04:52:38Z | 2022-04-09T16:59:52Z | CONTRIBUTOR | What happened?Not quite sure what to actually title this, so feel free to edit it. I have some netcdf files modeled after the Argo _prof file format (CF Discrete sampling geometry incomplete multidimensional array representation). While working on splitting these into individual profiles, I would occasionally get exceptions thrown complaining about broadcasting. I eventually narrowed this down to some string variables we maintain for historic purposes. Depending on the row split apart, the string data in each cell could be shorter which would result in a stringN having some different N (e.g. string4 = 3 in the CDL). If while serializing, a different string variable is being encoded that actually has length 4, it would reuse the now incorrect string4 dim name. The above situation seems to only occur when a netCDF file is read back into xarray and the What did you expect to happen?Successful serialization to netCDF. Minimal Complete Verifiable Example```Python setupimport numpy as np import xarray as xr one_two = xr.DataArray(np.array(["a", "aa"], dtype="object"), dims=["dim0"]) two_two = xr.DataArray(np.array(["aa", "aa"], dtype="object"), dims=["dim0"]) ds = xr.Dataset({"var0": one_two, "var1": two_two}) ds.var0.encoding["dtype"] = "S1" ds.var1.encoding["dtype"] = "S1" need to write out and read back inds.to_netcdf("test.nc") only selecting the shorter string will failds1 = xr.load_dataset("test.nc") ds1[{"dim0": 1}].to_netcdf("ok.nc") ds1[{"dim0": 0}].to_netcdf("error.nc") will work if the char dim name is removed from encoding of the now shorter arrds1 = xr.load_dataset("test.nc") del ds1.var0.encoding["char_dim_name"] ds1[{"dim0": 0}].to_netcdf("will_work.nc") ``` Relevant log output```PythonIndexError Traceback (most recent call last) /var/folders/y1/63dlf4614h5d2cgr5g1t_5lh0000gn/T/ipykernel_64155/447008818.py in <module> 2 ds1 = xr.load_dataset("test.nc") 3 ds1[{"dim0": 1}].to_netcdf("ok.nc") ----> 4 ds1[{"dim0": 0}].to_netcdf("error.nc") ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 1899 from ..backends.api import to_netcdf 1900 -> 1901 return to_netcdf( 1902 self, 1903 path, ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1070 # TODO: allow this work (setting up the file for writing array data) 1071 # to be parallelized with dask -> 1072 dump_to_store( 1073 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims 1074 ) ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims) 1117 variables, attrs = encoder(variables, attrs) 1118 -> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) 1120 1121 ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims) 263 self.set_attributes(attributes) 264 self.set_dimensions(variables, unlimited_dims=unlimited_dims) --> 265 self.set_variables( 266 variables, check_encoding_set, writer, unlimited_dims=unlimited_dims 267 ) ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims) 305 ) 306 --> 307 writer.add(source, target) 308 309 def set_dimensions(self, variables, unlimited_dims=None): ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in add(self, source, target, region) 154 target[region] = source 155 else: --> 156 target[...] = source 157 158 def sync(self, compute=True): ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in setitem(self, key, value) 70 with self.datastore.lock: 71 data = self.get_array(needs_lock=False) ---> 72 data[key] = value 73 if self.datastore.autoclose: 74 self.datastore.close(needs_lock=False) src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.setitem() src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._put() IndexError: size of data array does not conform to slice ``` Anything else we need to know?I've been unable to recreate the specific error I'm getting in a minimal example. However, removing the When digging in the xarray issues, these looked maybe relevant: #2219 #2895 Actual traceback I get with my data```python --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /var/folders/y1/63dlf4614h5d2cgr5g1t_5lh0000gn/T/ipykernel_64155/3328648456.py in <module> ----> 1 ds[{"N_PROF": 0}].to_netcdf("test.nc") ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 1899 from ..backends.api import to_netcdf 1900 -> 1901 return to_netcdf( 1902 self, 1903 path, ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1070 # TODO: allow this work (setting up the file for writing array data) 1071 # to be parallelized with dask -> 1072 dump_to_store( 1073 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims 1074 ) ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims) 1117 variables, attrs = encoder(variables, attrs) 1118 -> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) 1120 1121 ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims) 263 self.set_attributes(attributes) 264 self.set_dimensions(variables, unlimited_dims=unlimited_dims) --> 265 self.set_variables( 266 variables, check_encoding_set, writer, unlimited_dims=unlimited_dims 267 ) ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims) 305 ) 306 --> 307 writer.add(source, target) 308 309 def set_dimensions(self, variables, unlimited_dims=None): ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/common.py in add(self, source, target, region) 154 target[region] = source 155 else: --> 156 target[...] = source 157 158 def sync(self, compute=True): ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in __setitem__(self, key, value) 70 with self.datastore.lock: 71 data = self.get_array(needs_lock=False) ---> 72 data[key] = value 73 if self.datastore.autoclose: 74 self.datastore.close(needs_lock=False) src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__setitem__() ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/netCDF4/utils.py in _StartCountStride(elem, shape, dimensions, grp, datashape, put, use_get_vars) 354 fullslice = False 355 if fullslice and datashape and put and not hasunlim: --> 356 datashape = broadcasted_shape(shape, datashape) 357 358 # pad datashape with zeros for dimensions not being sliced (issue #906) ~/.dotfiles/pyenv/versions/3.9.9/envs/jupyter/lib/python3.9/site-packages/netCDF4/utils.py in broadcasted_shape(shp1, shp2) 962 a = as_strided(x, shape=shp1, strides=[0] * len(shp1)) 963 b = as_strided(x, shape=shp2, strides=[0] * len(shp2)) --> 964 return np.broadcast(a, b).shape ValueError: shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (5,) and arg 1 with shape (6,). ```EnvironmentINSTALLED VERSIONScommit: None python: 3.9.9 (main, Jan 5 2022, 11:21:18) [Clang 13.0.0 (clang-1300.0.29.30)] python-bits: 64 OS: Darwin OS-release: 21.3.0 machine: arm64 processor: arm byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.13.0 libnetcdf: 4.8.1 xarray: 2022.3.0 pandas: 1.3.5 numpy: 1.22.0 scipy: None netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.5.2 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: 0.18 sparse: None setuptools: 58.1.0 pip: 21.2.4 conda: None pytest: 6.2.5 IPython: 7.31.0 sphinx: 4.4.0 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6352/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
558455147 | MDU6SXNzdWU1NTg0NTUxNDc= | 3740 | Error during slicing of a dataarray | ankitesh97 16163706 | closed | 0 | 1 | 2020-02-01T01:26:55Z | 2022-04-09T15:52:32Z | 2022-04-09T15:52:31Z | NONE | MCVE Code Sample```python Your code here``` loaded the dataset using ds = xr.open_mfdataset(in_fns, decode_times=False, decode_cf=False, concat_dim='time') Expected OutputProblem Descriptionthis my data array (da) <xarray.DataArray 'QAP' (time: 5184, lev: 30, lat: 64, lon: 128)> dask.array<concatenate, shape=(5184, 30, 64, 128), dtype=float32, chunksize=(48, 30, 64, 128), chunktype=numpy.ndarray> Coordinates: * lev (lev) float64 3.643 7.595 14.36 ... 957.5 976.3 992.6 * lon (lon) float64 0.0 2.812 5.625 8.438 ... 351.6 354.4 357.2 * lat (lat) float64 -87.86 -85.1 -82.31 ... 82.31 85.1 87.86 * time (time) float64 365.0 365.0 365.0 ... 707.9 708.0 708.0 Attributes: units: kg/kg long_name: Q after physics when I am trying to slice it via da[1:] it throws an error saying conflicting sizes for dimension 'time': length 96 on 'this-array' and length 5183 on 'time' Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3740/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
596606599 | MDU6SXNzdWU1OTY2MDY1OTk= | 3957 | Sort DataArray by data values along one dim | zxdawn 30388627 | closed | 0 | 10 | 2020-04-08T14:05:44Z | 2022-04-09T15:52:20Z | 2022-04-09T15:52:20Z | NONE |
MCVE Code Sample```python import xarray as xr import numpy as np x = 4 y = 2 z = 4 data = np.arange(xyz).reshape(z, y, x) 3d array with coordscld_1 = xr.DataArray(data, dims=['z', 'y', 'x'], coords={'z': np.arange(z)}) 2d array without coordscld_2 = xr.DataArray(np.arange(xy).reshape(y, x)1.5+1, dims=['y', 'x']) expand 2d to 3dcld_2 = cld_2.expand_dims(z=[4]) concatcld = xr.concat([cld_1, cld_2], dim='z') paired arraypair = cld.copy(data=np.arange(xy(z+1)).reshape(z+1, y, x)) print(cld) print(pair) ``` Output``` <xarray.DataArray (z: 5, y: 2, x: 4)> array([[[ 0. , 1. , 2. , 3. ], [ 4. , 5. , 6. , 7. ]],
Coordinates: * z (z) int64 0 1 2 3 4 Dimensions without coordinates: y, x <xarray.DataArray (z: 5, y: 2, x: 4)> array([[[ 0, 1, 2, 3], [ 4, 5, 6, 7]],
Coordinates: * z (z) int64 0 1 2 3 4 Dimensions without coordinates: y, x ``` Problem DescriptionI've tried
Coordinates: * z (z) int64 0 1 2 3 4 Dimensions without coordinates: y, x ``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3957/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
621177286 | MDU6SXNzdWU2MjExNzcyODY= | 4082 | "write to read-only" Error in xarray.open_mfdataset() with opendap datasets | EliT1626 65610153 | closed | 0 | 26 | 2020-05-19T18:00:58Z | 2022-04-09T15:51:46Z | 2022-04-09T15:51:46Z | NONE | Error in loading in data from a THREDDS server. Can't find any info on what might be causing it based on the error messages themselves. Code Sample ``` def list_dates(start, end): num_days = (end - start).days return [start + dt.timedelta(days=x) for x in range(num_days)] start_date = dt.date(2017, 3, 1) end_date = dt.date(2017, 3, 31) date_list = list_dates(start_date, end_date) window = dt.timedelta(days=5) url = 'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.0/AVHRR/{0:%Y%m}/avhrr-only-v2.{0:%Y%m%d}.nc' data = [] cur_date = start_date for cur_date in date_list:
dataf=xr.concat(data, dim=pd.DatetimeIndex(date_list, name='time')) ``` Expected Output No error with dataf containing a data array with the dates listed above. Error Description Error 1:
Error 2:
Versions python: 3.7.4 xarray: 0.15.0 pandas: 0.25.1 numpy: 1.16.5 scipy: 1.3.1 netcdf4: 1.5.3 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4082/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
643035732 | MDU6SXNzdWU2NDMwMzU3MzI= | 4169 | "write to read-only" Error in xarray.open_mfdataset() when trying to write to a netcdf file | EliT1626 65610153 | closed | 0 | 4 | 2020-06-22T12:35:57Z | 2022-04-09T15:50:51Z | 2022-04-09T15:50:51Z | NONE | Code Sample ``` xr.set_options(file_cache_maxsize=10) Assumes daily incrementsdef list_dates(start, end): num_days = (end - start).days return [start + dt.timedelta(days=x) for x in range(num_days)] def list_dates1(start, end): num_days = (end - start).days dates = [start + dt.timedelta(days=x) for x in range(num_days)] sorted_dates = sorted(dates, key=lambda date: (date.month, date.day)) grouped_dates = [list(g) for _, g in groupby(sorted_dates, key=lambda date: (date.month, date.day))] return grouped_dates start_date = dt.date(2010, 1, 1) end_date = dt.date(2019, 12, 31) date_list = list_dates1(start_date, end_date) window1 = dt.timedelta(days=5) window2 = dt.timedelta(days=6) url = 'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/{0:%Y%m}/oisst-avhrr-v02r01.{0:%Y%m%d}.nc' end_date2 = dt.date(2010, 1, 2) sst_mean=[] cur_date = start_date for cur_date in date_list:
sst_mean_calc = []
for i in cur_date:
date_window=list_dates(i - window1, i + window2)
url_list_window = [url.format(x) for x in date_window]
window_data=xr.open_mfdataset(url_list_window).sst
sst_mean_calc.append(window_data.mean('time')) sst_mean_climo_test=xr.concat(sst_mean, dim='time') sst_std=xr.concat(sst_std_calc, dim=pd.DatetimeIndex(date_list, name='time'))sst_min = xr.concat(sst_min_calc, dim=pd.DatetimeIndex(date_list, name='time'))sst_max = xr.concat(sst_max_calc, dim=pd.DatetimeIndex(date_list, name='time'))sst_mean_climo_test.to_netcdf(path='E:/Riskpulse_HD/SST_stuff/sst_mean_climo_test') ``` Explanation of Code This code (climatology for SSTs) creates a list of dates between the specified start and end dates that contains the same day number for every month through the year span. For example, date_list[0] contains 10 datetime dates that start with 1-1-2010, 1-1-2011...1-1-2019. I then request OISST data from an opendap server and take a centered mean of the date in question (this case I did it for the first and second of January). In other words, I am opening the files for Dec 27-Jan 6 and averaging all of them together. The final xarray dataset then contains two 'times', which is 10 years worth of data for Jan 1 and Jan 2. I want to then send this to a netcdf file such that I can save it on my local machine and use to create plots down the road. Hope this makes sense. Error Messages ``` KeyError Traceback (most recent call last) ~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock) 197 try: --> 198 file = self._cache[self._key] 199 except KeyError: ~\Anaconda3\lib\site-packages\xarray\backends\lru_cache.py in getitem(self, key) 52 with self._lock: ---> 53 value = self._cache[key] 54 self._cache.move_to_end(key) KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/201801/oisst-avhrr-v02r01.20180106.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))] During handling of the above exception, another exception occurred: RuntimeError Traceback (most recent call last) <ipython-input-3-f8395dcffb5e> in <module> 1 #xr.set_options(file_cache_maxsize=500) ----> 2 sst_mean_climo_test.to_netcdf(path='E:/Riskpulse_HD/SST_stuff/sst_mean_climo_test') ~\Anaconda3\lib\site-packages\xarray\core\dataarray.py in to_netcdf(self, args, kwargs) 2356 dataset = self.to_dataset() 2357 -> 2358 return dataset.to_netcdf(args, **kwargs) 2359 2360 def to_dict(self, data: bool = True) -> dict: ~\Anaconda3\lib\site-packages\xarray\core\dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 1552 unlimited_dims=unlimited_dims, 1553 compute=compute, -> 1554 invalid_netcdf=invalid_netcdf, 1555 ) 1556 ~\Anaconda3\lib\site-packages\xarray\backends\api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1095 return writer, store 1096 -> 1097 writes = writer.sync(compute=compute) 1098 1099 if path_or_file is None: ~\Anaconda3\lib\site-packages\xarray\backends\common.py in sync(self, compute) 202 compute=compute, 203 flush=True, --> 204 regions=self.regions, 205 ) 206 self.sources = [] ~\Anaconda3\lib\site-packages\dask\array\core.py in store(sources, targets, lock, regions, compute, return_stored, kwargs) 943 944 if compute: --> 945 result.compute(kwargs) 946 return None 947 else: ~\Anaconda3\lib\site-packages\dask\base.py in compute(self, kwargs) 164 dask.base.compute 165 """ --> 166 (result,) = compute(self, traverse=False, kwargs) 167 return result 168 ~\Anaconda3\lib\site-packages\dask\base.py in compute(args, kwargs) 442 postcomputes.append(x.dask_postcompute()) 443 --> 444 results = schedule(dsk, keys, kwargs) 445 return repack([f(r, a) for r, (f, a) in zip(results, postcomputes)]) 446 ~\Anaconda3\lib\site-packages\dask\threaded.py in get(dsk, result, cache, num_workers, pool, kwargs) 82 get_id=_thread_get_id, 83 pack_exception=pack_exception, ---> 84 kwargs 85 ) 86 ~\Anaconda3\lib\site-packages\dask\local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs) 484 _execute_task(task, data) # Re-execute locally 485 else: --> 486 raise_exception(exc, tb) 487 res, worker_id = loads(res_info) 488 state["cache"][key] = res ~\Anaconda3\lib\site-packages\dask\local.py in reraise(exc, tb) 314 if exc.traceback is not tb: 315 raise exc.with_traceback(tb) --> 316 raise exc 317 318 ~\Anaconda3\lib\site-packages\dask\local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception) 220 try: 221 task, data = loads(task_info) --> 222 result = _execute_task(task, data) 223 id = get_id() 224 result = dumps((result, id)) ~\Anaconda3\lib\site-packages\dask\core.py in _execute_task(arg, cache, dsk) 119 # temporaries by their reference count and can execute certain 120 # operations in-place. --> 121 return func(*(_execute_task(a, cache) for a in args)) 122 elif not ishashable(arg): 123 return arg ~\Anaconda3\lib\site-packages\dask\array\core.py in getter(a, b, asarray, lock) 98 c = a[b] 99 if asarray: --> 100 c = np.asarray(c) 101 finally: 102 if lock: ~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87 ~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 489 490 def array(self, dtype=None): --> 491 return np.asarray(self.array, dtype=dtype) 492 493 def getitem(self, key): ~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87 ~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 651 652 def array(self, dtype=None): --> 653 return np.asarray(self.array, dtype=dtype) 654 655 def getitem(self, key): ~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87 ~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 555 def array(self, dtype=None): 556 array = as_indexable(self.array) --> 557 return np.asarray(array[self.key], dtype=None) 558 559 def transpose(self, order): ~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87 ~\Anaconda3\lib\site-packages\xarray\coding\variables.py in array(self, dtype) 70 71 def array(self, dtype=None): ---> 72 return self.func(self.array) 73 74 def repr(self): ~\Anaconda3\lib\site-packages\xarray\coding\variables.py in _scale_offset_decoding(data, scale_factor, add_offset, dtype) 216 217 def _scale_offset_decoding(data, scale_factor, add_offset, dtype): --> 218 data = np.array(data, dtype=dtype, copy=True) 219 if scale_factor is not None: 220 data *= scale_factor ~\Anaconda3\lib\site-packages\xarray\coding\variables.py in array(self, dtype) 70 71 def array(self, dtype=None): ---> 72 return self.func(self.array) 73 74 def repr(self): ~\Anaconda3\lib\site-packages\xarray\coding\variables.py in _apply_mask(data, encoded_fill_values, decoded_fill_value, dtype) 136 ) -> np.ndarray: 137 """Mask all matching values in a NumPy arrays.""" --> 138 data = np.asarray(data, dtype=dtype) 139 condition = False 140 for fv in encoded_fill_values: ~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87 ~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 555 def array(self, dtype=None): 556 array = as_indexable(self.array) --> 557 return np.asarray(array[self.key], dtype=None) 558 559 def transpose(self, order): ~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in getitem(self, key) 71 def getitem(self, key): 72 return indexing.explicit_indexing_adapter( ---> 73 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem 74 ) 75 ~\Anaconda3\lib\site-packages\xarray\core\indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method) 835 """ 836 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support) --> 837 result = raw_indexing_method(raw_key.tuple) 838 if numpy_indices.tuple: 839 # index the loaded np.ndarray ~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in _getitem(self, key) 82 try: 83 with self.datastore.lock: ---> 84 original_array = self.get_array(needs_lock=False) 85 array = getitem(original_array, key) 86 except IndexError: ~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in get_array(self, needs_lock) 61 62 def get_array(self, needs_lock=True): ---> 63 ds = self.datastore._acquire(needs_lock) 64 variable = ds.variables[self.variable_name] 65 variable.set_auto_maskandscale(False) ~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in _acquire(self, needs_lock) 359 360 def _acquire(self, needs_lock=True): --> 361 with self._manager.acquire_context(needs_lock) as root: 362 ds = _nc4_require_group(root, self._group, self._mode) 363 return ds ~\Anaconda3\lib\contextlib.py in enter(self) 110 del self.args, self.kwds, self.func 111 try: --> 112 return next(self.gen) 113 except StopIteration: 114 raise RuntimeError("generator didn't yield") from None ~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in acquire_context(self, needs_lock) 184 def acquire_context(self, needs_lock=True): 185 """Context manager for acquiring a file.""" --> 186 file, cached = self._acquire_with_cache_info(needs_lock) 187 try: 188 yield file ~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock) 206 # ensure file doesn't get overriden when opened again 207 self._mode = "a" --> 208 self._cache[self._key] = file 209 return file, False 210 else: ~\Anaconda3\lib\site-packages\xarray\backends\lru_cache.py in setitem(self, key, value) 71 elif self._maxsize: 72 # make room if necessary ---> 73 self._enforce_size_limit(self._maxsize - 1) 74 self._cache[key] = value 75 elif self._on_evict is not None: ~\Anaconda3\lib\site-packages\xarray\backends\lru_cache.py in _enforce_size_limit(self, capacity) 61 key, value = self._cache.popitem(last=False) 62 if self._on_evict is not None: ---> 63 self._on_evict(key, value) 64 65 def setitem(self, key: K, value: V) -> None: ~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in <lambda>(k, v) 12 # Global cache for storing open files. 13 FILE_CACHE: LRUCache[str, io.IOBase] = LRUCache( ---> 14 maxsize=cast(int, OPTIONS["file_cache_maxsize"]), on_evict=lambda k, v: v.close() 15 ) 16 assert FILE_CACHE.maxsize, "file cache must be at least size one" netCDF4_netCDF4.pyx in netCDF4._netCDF4.Dataset.close() netCDF4_netCDF4.pyx in netCDF4._netCDF4.Dataset._close() netCDF4_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success() RuntimeError: NetCDF: HDF error ``` I also tried changing setting xr.set_options(file_cache_maxsize=500) outside of the loop before trying to create the netcdf file and received this error: ``` KeyError Traceback (most recent call last) ~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock) 197 try: --> 198 file = self._cache[self._key] 199 except KeyError: ~\Anaconda3\lib\site-packages\xarray\backends\lru_cache.py in getitem(self, key) 52 with self._lock: ---> 53 value = self._cache[key] 54 self._cache.move_to_end(key) KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/201512/oisst-avhrr-v02r01.20151231.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))] During handling of the above exception, another exception occurred: OSError Traceback (most recent call last) <ipython-input-4-474cdce51e60> in <module> 1 xr.set_options(file_cache_maxsize=500) ----> 2 sst_mean_climo_test.to_netcdf(path='E:/Riskpulse_HD/SST_stuff/sst_mean_climo_test') ~\Anaconda3\lib\site-packages\xarray\core\dataarray.py in to_netcdf(self, args, kwargs) 2356 dataset = self.to_dataset() 2357 -> 2358 return dataset.to_netcdf(args, **kwargs) 2359 2360 def to_dict(self, data: bool = True) -> dict: ~\Anaconda3\lib\site-packages\xarray\core\dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 1552 unlimited_dims=unlimited_dims, 1553 compute=compute, -> 1554 invalid_netcdf=invalid_netcdf, 1555 ) 1556 ~\Anaconda3\lib\site-packages\xarray\backends\api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1095 return writer, store 1096 -> 1097 writes = writer.sync(compute=compute) 1098 1099 if path_or_file is None: ~\Anaconda3\lib\site-packages\xarray\backends\common.py in sync(self, compute) 202 compute=compute, 203 flush=True, --> 204 regions=self.regions, 205 ) 206 self.sources = [] ~\Anaconda3\lib\site-packages\dask\array\core.py in store(sources, targets, lock, regions, compute, return_stored, kwargs) 943 944 if compute: --> 945 result.compute(kwargs) 946 return None 947 else: ~\Anaconda3\lib\site-packages\dask\base.py in compute(self, kwargs) 164 dask.base.compute 165 """ --> 166 (result,) = compute(self, traverse=False, kwargs) 167 return result 168 ~\Anaconda3\lib\site-packages\dask\base.py in compute(args, kwargs) 442 postcomputes.append(x.dask_postcompute()) 443 --> 444 results = schedule(dsk, keys, kwargs) 445 return repack([f(r, a) for r, (f, a) in zip(results, postcomputes)]) 446 ~\Anaconda3\lib\site-packages\dask\threaded.py in get(dsk, result, cache, num_workers, pool, kwargs) 82 get_id=_thread_get_id, 83 pack_exception=pack_exception, ---> 84 kwargs 85 ) 86 ~\Anaconda3\lib\site-packages\dask\local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs) 484 _execute_task(task, data) # Re-execute locally 485 else: --> 486 raise_exception(exc, tb) 487 res, worker_id = loads(res_info) 488 state["cache"][key] = res ~\Anaconda3\lib\site-packages\dask\local.py in reraise(exc, tb) 314 if exc.traceback is not tb: 315 raise exc.with_traceback(tb) --> 316 raise exc 317 318 ~\Anaconda3\lib\site-packages\dask\local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception) 220 try: 221 task, data = loads(task_info) --> 222 result = _execute_task(task, data) 223 id = get_id() 224 result = dumps((result, id)) ~\Anaconda3\lib\site-packages\dask\core.py in _execute_task(arg, cache, dsk) 119 # temporaries by their reference count and can execute certain 120 # operations in-place. --> 121 return func(*(_execute_task(a, cache) for a in args)) 122 elif not ishashable(arg): 123 return arg ~\Anaconda3\lib\site-packages\dask\array\core.py in getter(a, b, asarray, lock) 98 c = a[b] 99 if asarray: --> 100 c = np.asarray(c) 101 finally: 102 if lock: ~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87 ~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 489 490 def array(self, dtype=None): --> 491 return np.asarray(self.array, dtype=dtype) 492 493 def getitem(self, key): ~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87 ~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 651 652 def array(self, dtype=None): --> 653 return np.asarray(self.array, dtype=dtype) 654 655 def getitem(self, key): ~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87 ~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 555 def array(self, dtype=None): 556 array = as_indexable(self.array) --> 557 return np.asarray(array[self.key], dtype=None) 558 559 def transpose(self, order): ~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87 ~\Anaconda3\lib\site-packages\xarray\coding\variables.py in array(self, dtype) 70 71 def array(self, dtype=None): ---> 72 return self.func(self.array) 73 74 def repr(self): ~\Anaconda3\lib\site-packages\xarray\coding\variables.py in _scale_offset_decoding(data, scale_factor, add_offset, dtype) 216 217 def _scale_offset_decoding(data, scale_factor, add_offset, dtype): --> 218 data = np.array(data, dtype=dtype, copy=True) 219 if scale_factor is not None: 220 data *= scale_factor ~\Anaconda3\lib\site-packages\xarray\coding\variables.py in array(self, dtype) 70 71 def array(self, dtype=None): ---> 72 return self.func(self.array) 73 74 def repr(self): ~\Anaconda3\lib\site-packages\xarray\coding\variables.py in _apply_mask(data, encoded_fill_values, decoded_fill_value, dtype) 136 ) -> np.ndarray: 137 """Mask all matching values in a NumPy arrays.""" --> 138 data = np.asarray(data, dtype=dtype) 139 condition = False 140 for fv in encoded_fill_values: ~\Anaconda3\lib\site-packages\numpy\core_asarray.py in asarray(a, dtype, order) 83 84 """ ---> 85 return array(a, dtype, copy=False, order=order) 86 87 ~\Anaconda3\lib\site-packages\xarray\core\indexing.py in array(self, dtype) 555 def array(self, dtype=None): 556 array = as_indexable(self.array) --> 557 return np.asarray(array[self.key], dtype=None) 558 559 def transpose(self, order): ~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in getitem(self, key) 71 def getitem(self, key): 72 return indexing.explicit_indexing_adapter( ---> 73 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem 74 ) 75 ~\Anaconda3\lib\site-packages\xarray\core\indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method) 835 """ 836 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support) --> 837 result = raw_indexing_method(raw_key.tuple) 838 if numpy_indices.tuple: 839 # index the loaded np.ndarray ~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in _getitem(self, key) 82 try: 83 with self.datastore.lock: ---> 84 original_array = self.get_array(needs_lock=False) 85 array = getitem(original_array, key) 86 except IndexError: ~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in get_array(self, needs_lock) 61 62 def get_array(self, needs_lock=True): ---> 63 ds = self.datastore._acquire(needs_lock) 64 variable = ds.variables[self.variable_name] 65 variable.set_auto_maskandscale(False) ~\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py in _acquire(self, needs_lock) 359 360 def _acquire(self, needs_lock=True): --> 361 with self._manager.acquire_context(needs_lock) as root: 362 ds = _nc4_require_group(root, self._group, self._mode) 363 return ds ~\Anaconda3\lib\contextlib.py in enter(self) 110 del self.args, self.kwds, self.func 111 try: --> 112 return next(self.gen) 113 except StopIteration: 114 raise RuntimeError("generator didn't yield") from None ~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in acquire_context(self, needs_lock) 184 def acquire_context(self, needs_lock=True): 185 """Context manager for acquiring a file.""" --> 186 file, cached = self._acquire_with_cache_info(needs_lock) 187 try: 188 yield file ~\Anaconda3\lib\site-packages\xarray\backends\file_manager.py in _acquire_with_cache_info(self, needs_lock) 202 kwargs = kwargs.copy() 203 kwargs["mode"] = self._mode --> 204 file = self._opener(self._args, *kwargs) 205 if self._mode == "w": 206 # ensure file doesn't get overriden when opened again netCDF4_netCDF4.pyx in netCDF4._netCDF4.Dataset.init() netCDF4_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success() OSError: [Errno -37] NetCDF: Write to read only: b'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/201512/oisst-avhrr-v02r01.20151231.nc' ``` I believe these errors have something to do with a post that I created a couple weeks ago (https://github.com/pydata/xarray/issues/4082). I'm not sure if you can @ users on here, but @rsignell-usgs found out something about the caching before hand. It seems that this is some sort of Windows issue. Versions python: 3.7.4 xarray: 0.15.1 pandas: 1.0.3 numpy: 1.18.1 scipy: 1.4.1 netcdf4: 1.4.2 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4169/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
924002003 | MDU6SXNzdWU5MjQwMDIwMDM= | 5483 | Cannot interpolate on a multifile .grib array. Single file works fine. | Alexander-Serov 22743277 | closed | 0 | 1 | 2021-06-17T14:36:57Z | 2022-04-09T15:50:24Z | 2022-04-09T15:50:23Z | NONE | What happened:
I have multiple .grib files that I am able to successfully open using the What you expected to happen: Interpolate the mutlifile grib array along latitude and longitude. Minimal Complete Verifiable Example:
Result:
Anything else we need to know?: Since the files are too big, I am unable to share for the moment, but I suspect the issue might be reproducible on any multifile grib combination. Environment: INSTALLED VERSIONScommit: None python: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 06:25:23) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 158 Stepping 13, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: ('English_United Kingdom', '1252') libhdf5: 1.10.6 libnetcdf: 4.7.3 xarray: 0.18.2 pandas: 1.2.4 numpy: 1.20.3 scipy: 1.6.3 netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.8.3 cftime: 1.5.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.9.0 iris: None bottleneck: None dask: 2021.06.0 distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.1.2 conda: None pytest: 6.2.4 IPython: None sphinx: None |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5483/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
878481461 | MDU6SXNzdWU4Nzg0ODE0NjE= | 5276 | open_mfdataset: Not a valid ID | minhhg 11815787 | closed | 0 | 4 | 2021-05-07T05:34:02Z | 2022-04-09T15:49:50Z | 2022-04-09T15:49:50Z | NONE | I have about 601 NETCDF4 files saved using xarray. We try to use open_mfdataset to access these files. The main code calls this function many times. At the first few calls, it works fine, after for a while it throw the following error message "RuntimeError: NetCDF: Not a valid ID"
Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.6.8.final.0 python-bits: 64 OS: Linux OS-release: 5.4.0-1047-aws machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.11.0 pandas: 0.24.1 numpy: 1.15.4 scipy: 1.2.0 netCDF4: 1.4.2 h5netcdf: None h5py: 2.9.0 Nio: None zarr: None cftime: 1.0.3.4 PseudonetCDF: None rasterio: None iris: None bottleneck: 1.2.1 cyordereddict: None dask: 1.1.1 distributed: 1.25.3 matplotlib: 3.0.2 cartopy: None seaborn: 0.9.0 setuptools: 40.7.3 pip: 19.0.1 conda: None pytest: 4.2.0 IPython: 7.1.1 sphinx: 1.8.4This error also happens with xarray version 0.10.9 Error trace:
```python
2021-05-05 09:28:19,911, DEBUG 7621, sim_io.py:483 - load_unique_document(), xpa
th=/home/ubuntu/runs/20210331_001/nominal_dfs/uk
2021-05-05 09:28:42,774, ERROR 7621, run_gov_ret.py:33 - <module>(),
Unknown error=NetCDF: Not a valid ID
Traceback (most recent call last):
File "/home/ubuntu/dev/py36/python/ev/model/api3/run_gov_ret.py", line 31, in
<module>
res = govRet()
File "/home/ubuntu/dev/py36/python/ev/model/api3/returns.py", line 56, in __ca
ll__
decompose=self.decompose))
File "/home/ubuntu/dev/py36/python/ev/model/returns/returnsGenerator.py", line
70, in calc_returns
dfs_data = self.mongo_dfs.get_data(mats=[1,mat,mat-1])
File "/home/ubuntu/dev/py36/python/ev/model/api3/dfs.py", line 262, in get_dat
a
record = self.mdb.load_unique_document(self.dfs_collection_name, spec)
File "/home/ubuntu/dev/py36/python/ev/model/api3/sim_io.py", line 1109, in load_unique_document
return self.collections[collection].load_unique_document(query, *args, **kwargs)
File "/home/ubuntu/dev/py36/python/ev/model/api3/sim_io.py", line 501, in load_unique_document
doc['data'] = ar1.load().values
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/dataarray.py", line 631, in load
ds = self._to_temp_dataset().load(**kwargs)
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/dataset.py", line 494, in load
evaluated_data = da.compute(*lazy_data.values(), **kwargs)
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/base.py", line 398, in compute
results = schedule(dsk, keys, **kwargs)
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/threaded.py", line 76, in get
pack_exception=pack_exception, **kwargs)
pack_exception=pack_exception, **kwargs)
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/local
.py", line 459, in get_async
raise_exception(exc, tb)
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/compa
tibility.py", line 112, in reraise
raise exc
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/local
.py", line 230, in execute_task
result = _execute_task(task, data)
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/core.
py", line 119, in _execute_task
return func(*args2)
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/array
/core.py", line 82, in getter
c = np.asarray(c)
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/numpy/core
/numeric.py", line 501, in asarray
return array(a, dtype, copy=False, order=order)
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/cor
e/indexing.py", line 602, in __array__
return np.asarray(self.array, dtype=dtype)
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/numpy/core/numeric.py", line 501, in asarray
return array(a, dtype, copy=False, order=order)
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/indexing.py", line 508, in __array__
return np.asarray(array[self.key], dtype=None)
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 64, in __getitem__
self._getitem)
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/indexing.py", line 776, in explicit_indexing_adapter
result = raw_indexing_method(raw_key.tuple)
File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 76, in _getitem
array = getitem(original_array, key)
File "netCDF4/_netCDF4.pyx", line 4095, in netCDF4._netCDF4.Variable.__getitem__
File "netCDF4/_netCDF4.pyx", line 3798, in netCDF4._netCDF4.Variable.shape.__get__
File "netCDF4/_netCDF4.pyx", line 3746, in netCDF4._netCDF4.Variable._getdims
File "netCDF4/_netCDF4.pyx", line 1754, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Not a valid ID
```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5276/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
427644858 | MDU6SXNzdWU0Mjc2NDQ4NTg= | 2861 | WHERE function, problems with memory operations? | rpnaut 30219501 | closed | 0 | 8 | 2019-04-01T11:09:11Z | 2022-04-09T15:41:51Z | 2022-04-09T15:41:51Z | NONE | I am facing with the where-functionality in xarray. I have two datasets
and
Applying something like this:
gives me a dataarray of time length zero:
Problem descriptionThe problem seems to be that 'ref' and 'proof' are not entirely consistent somehow regarding coordinates. But if a subtract the coordinates from each other I do not get a difference. However, as I always fight with getting datasets consistent to each other for mathematical calculations with xarray, I have figured out following workarounds:
Maybe, here I deal with a problem of incomplete operations in memory? The printout between datasets is maybe consistent but still an additional operation on the datasets is required to make the datasets consistent in memory? Thanks in advance for your help |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2861/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
469440752 | MDU6SXNzdWU0Njk0NDA3NTI= | 3139 | Change the signature of DataArray to DataArray(data, dims, coords, ...)? | shoyer 1217238 | open | 0 | 1 | 2019-07-17T20:54:57Z | 2022-04-09T15:28:51Z | MEMBER | Currently, the signature of DataArray is In the long term, I think My original reasoning for this argument order was that The challenge in making any change here would be to have a smooth deprecation process, and that ideally avoids requiring users to rewrite all of their code and avoids loads of pointless/extraneous warnings. I'm not entirely sure this is possible. We could likely use heuristics to distinguish between An alternative that might achieve some of the convenience of this change would be to allow for passing lists of strings in the |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3139/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
208312826 | MDU6SXNzdWUyMDgzMTI4MjY= | 1273 | replace a dim with a coordinate from another dataset | rabernat 1197350 | open | 0 | 4 | 2017-02-17T02:15:36Z | 2022-04-09T15:26:20Z | MEMBER | I often want a function that takes a dataarray / dataset and replaces a dimension with a coordinate from a different dataset. @shoyer proposed the following simple solution. ```python def replace_dim(da, olddim, newdim): renamed = da.rename({olddim: newdim.name})
``` Is this of broad enough interest to add a build in method for? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1273/reactions", "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
995207525 | MDU6SXNzdWU5OTUyMDc1MjU= | 5790 | combining 2 arrays with xr.merge() causes temporary spike in memory usage ~3x the combined size of the arrays | zachglee 23262800 | closed | 0 | 6 | 2021-09-13T18:42:03Z | 2022-04-09T15:25:28Z | 2022-04-09T15:25:28Z | NONE | What happened:
When attempting to combine two arrays of sizes For small arrays this temporary spike in memory is fine, but for larger arrays this means we are essentially limited to combining arrays of total size below 1/3rd of an instance's memory limit. Anything above that and the temporary spike causes the instance to crash. What you expected to happen:
I expected there to be only a memory increase of Minimal Complete Verifiable Example: ```python Put your MCVE code hereimport numpy as np import xarray as xr import tracemalloc tracemalloc.start() print("(current, peak) memory at start:") print(tracemalloc.get_traced_memory()) create the test data (each is 100 by 100 by 10 array of random floats)Their A and B coordinates are completely matching. Their C coordinates are completely disjoint.data1 = np.random.rand(100, 100, 10) da1 = xr.DataArray( data1, dims=("A", "B", "C"), coords={ "A": [f"A{i}" for i in range(100)], "B": [f"B{i}" for i in range(100)], "C": [f"C{i}" for i in range(10)]}, ) da1.name = "da" data2 = np.random.rand(100, 100, 10) da2 = xr.DataArray( data2, dims=("A", "B", "C"), coords={ "A": [f"A{i}" for i in range(100)], "B": [f"B{i}" for i in range(100)], "C": [f"C{i+10}" for i in range(10)]}, ) da2.name = "da" print("(current, peak) memory after creation of arrays to be combined:") print(tracemalloc.get_traced_memory()) print(f"da1.nbytes = {da1.nbytes}") print(f"da2.nbytes = {da2.nbytes}") da_combined = xr.merge([da1, da2]).to_array() print("(current, peak) memory after merging. You should observe that the peak memory usage is now much higher.") print(tracemalloc.get_traced_memory()) print(f"da_combined.nbytes = {da_combined.nbytes}") print(da_combined) ``` Anything else we need to know?: Interestingly, when I try merging 3 arrays at once, (sizes If that's the case, it seems like it should be possible to make this operation more efficient by creating just one inflated array and adding the data from the input arrays to it in-place? Or is this an expected and unavoidable behavior with merging? (fwiw this also affects several other combination methods, presumably because they use Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 4.19.121-linuxkit machine: x86_64 processor: byteorder: little LC_ALL: None LANG: None LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.8.0 xarray: 0.17.0 pandas: 1.2.3 numpy: 1.19.5 scipy: 1.6.0 netCDF4: 1.5.6 pydap: None h5netcdf: 0.11.0 h5py: 3.3.0 Nio: None zarr: None cftime: 1.5.0 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.4.2 cartopy: None seaborn: None numbagg: None pint: 0.16.1 setuptools: 57.4.0 pip: 21.2.4 conda: None pytest: 6.2.2 IPython: 7.23.1 sphinx: 3.5.2 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5790/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
257070215 | MDU6SXNzdWUyNTcwNzAyMTU= | 1569 | Grouping with multiple levels | jjpr-mit 25231875 | closed | 0 | 6 | 2017-09-12T14:46:12Z | 2022-04-09T15:25:07Z | 2022-04-09T15:25:06Z | NONE | http://xarray.pydata.org/en/stable/groupby.html says:
but when I supply the TypeError: groupby() got an unexpected keyword argument 'level' ``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1569/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
438947247 | MDU6SXNzdWU0Mzg5NDcyNDc= | 2933 | Stack() & unstack() issues on Multindex | ray306 1559890 | closed | 0 | 4 | 2019-04-30T19:47:51Z | 2022-04-09T15:23:28Z | 2022-04-09T15:23:28Z | NONE | I would like to reshape the DataArray by one level in the Multindex, and I thought the Make a DataArray with Multindex:
Stack problem:I want a dimension merges into another one:
Unstack problem:Unstacking by the whole Multindex worked:
Coordinates:
* variable (variable) int32 0 1 2 3
* first (first) object 'bar' 'baz' 'foo'
* second (second) object 'one' 'two'
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2933/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
816540158 | MDU6SXNzdWU4MTY1NDAxNTg= | 4958 | to_zarr mode='a-', append_dim; if dim value exists raise error | ahuang11 15331990 | open | 0 | 1 | 2021-02-25T15:26:02Z | 2022-04-09T15:19:28Z | CONTRIBUTOR | If I have a ds with time, lat, lon and I call the same command twice:
Kind of like: ```python import numpy as np import xarray as xr ds = xr.tutorial.open_dataset('air_temperature') ds.to_zarr('test_air.zarr', append_dim='time') ds_tmp = xr.open_mfdataset('test_air.zarr', engine='zarr') overlap = np.intersect1d(ds['time'], ds_tmp['time']) if len(overlap) > 1: raise ValueError(f'Found overlapping values in datasets {overlap}') ds.to_zarr('test_air.zarr', append_dim='time') ``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4958/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
707571360 | MDU6SXNzdWU3MDc1NzEzNjA= | 4452 | Change default for concat_characters to False in open_* functions | eric-czech 6130352 | open | 0 | 2 | 2020-09-23T18:06:07Z | 2022-04-09T03:21:43Z | NONE | I wanted to propose that concat_characters be False for I also find it to be confusing behavior (e.g. https://github.com/pydata/xarray/issues/4405) since no other arrays are automatically transformed like this when deserialized. If submit a PR for this, would anybody object? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4452/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
903922477 | MDU6SXNzdWU5MDM5MjI0Nzc= | 5386 | Add xr.open_dataset("file.tif", engine="rasterio") to docs | raybellwaves 17162724 | closed | 0 | 1 | 2021-05-27T15:39:29Z | 2022-04-09T03:15:45Z | 2022-04-09T03:15:45Z | CONTRIBUTOR | Kind of related to https://github.com/pydata/xarray/issues/4697 I see https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html#rioxarray shows
This could be added to https://xarray.pydata.org/en/latest/user-guide/io.html#rasterio |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5386/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
312203596 | MDU6SXNzdWUzMTIyMDM1OTY= | 2042 | Anyone working on a to_tiff? Alternatively, how do you write an xarray to a geotiff? | ebo 601025 | closed | 0 | 31 | 2018-04-07T12:43:41Z | 2022-04-09T03:14:41Z | 2022-04-09T01:19:10Z | NONE | Matthew Rocklin wrote a gist https://gist.github.com/mrocklin/3df315e93d4bdeccf76db93caca2a9bd to demonstrate using XArray to read tiled GeoTIFF datasets, but I am still confused as to how to write them to a GeoTIFF. I can easily create a tiff with "rasterio.open(out, 'w', **src.profile)", but the following does not seem like the best/cleanest way to do this: ``` ds = xr.open_rasterio('myfile.tif', chunks={'band': 1, 'x': 2048, 'y': 2048}) with rasterio.open('myfile.tif', 'r') as src: with rasterio.open('new_myfile.tif', 'w', **src.profile) as dst: for i in range(1, src.count + 1): dst.write(ds.variable.data[i-1].compute(), i) ``` Also, if the profile and tags were propagated through open_rasterio, then the second open would not be necessary and would be generally useful. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2042/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
956259734 | MDU6SXNzdWU5NTYyNTk3MzQ= | 5649 | xr.merge bug? when using combine_attrs='drop_conflicts' | jbusecke 14314623 | open | 0 | keewis 14808389 | 3 | 2021-07-29T22:47:43Z | 2022-04-09T03:14:24Z | CONTRIBUTOR | What happened: I have recently encountered a situation where combining two datasets failed, due to the datatype of their attributes. This example illustrates the situation: ```python ds1 = xr.Dataset(attrs={'a':[5]}) ds2 = xr.Dataset(attrs={'a':6}) xr.merge([ds1, ds2], combine_attrs='drop_conflicts')
TypeError Traceback (most recent call last) <ipython-input-12-1c8e82be0882> in <module> 2 ds2 = xr.Dataset(attrs={'a':6}) 3 ----> 4 xr.merge([ds1, ds2], combine_attrs='drop_conflicts') /srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge(objects, compat, join, fill_value, combine_attrs) 898 dict_like_objects.append(obj) 899 --> 900 merge_result = merge_core( 901 dict_like_objects, 902 compat, /srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value) 654 ) 655 --> 656 attrs = merge_attrs( 657 [var.attrs for var in coerced if isinstance(var, (Dataset, DataArray))], 658 combine_attrs, /srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge_attrs(variable_attrs, combine_attrs, context) 544 } 545 ) --> 546 result = { 547 key: value 548 for key, value in result.items() /srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in <dictcomp>(.0) 547 key: value 548 for key, value in result.items() --> 549 if key not in attrs or equivalent(attrs[key], value) 550 } 551 dropped_keys |= {key for key in attrs if key not in result} /srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/utils.py in equivalent(first, second) 171 return duck_array_ops.array_equiv(first, second) 172 elif isinstance(first, list) or isinstance(second, list): --> 173 return list_equiv(first, second) 174 else: 175 return ( /srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/utils.py in list_equiv(first, second) 182 def list_equiv(first, second): 183 equiv = True --> 184 if len(first) != len(second): 185 return False 186 else: TypeError: object of type 'int' has no len() ``` Took me a while to find out what the root cause of this was with a fully populated dataset, since the error is less than obvious. What you expected to happen:
In my understanding this should just drop the attribute Is there a way to handle this case more elegantly? Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.4.89+ machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.1.dev8+gda99a566 pandas: 1.2.4 numpy: 1.20.2 scipy: 1.6.2 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.11.0 h5py: 3.2.1 Nio: None zarr: 2.7.1 cftime: 1.4.1 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: 1.2.2 cfgrib: 0.9.9.0 iris: None bottleneck: 1.3.2 dask: 2021.04.1 distributed: 2021.04.1 matplotlib: 3.4.1 cartopy: 0.19.0 seaborn: None numbagg: None pint: 0.17 setuptools: 49.6.0.post20210108 pip: 20.3.4 conda: None pytest: None IPython: 7.22.0 sphinx: None |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5649/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | |||||||
576502871 | MDU6SXNzdWU1NzY1MDI4NzE= | 3834 | encode_cf_datetime() casts dask arrays to NumPy arrays | andersy005 13301940 | open | 0 | 2 | 2020-03-05T20:11:37Z | 2022-04-09T03:10:49Z | MEMBER | Currently, when ```python In [46]: import numpy as np In [47]: import xarray as xr In [48]: import pandas as pd In [49]: times = pd.date_range("2000-01-01", "2001-01-01", periods=11) In [50]: time_bounds = np.vstack((times[:-1], times[1:])).T In [51]: arr = xr.DataArray(time_bounds).chunk() In [52]: arr In [53]: xr.coding.times.encode_cf_datetime(arr) ``` Cc @jhamman |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3834/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
606165039 | MDU6SXNzdWU2MDYxNjUwMzk= | 4000 | Add hook to get progress of long-running operations | cwerner 13906519 | closed | 0 | 3 | 2020-04-24T09:13:02Z | 2022-04-09T03:08:45Z | 2022-04-09T03:08:45Z | NONE | Hi. I currently work on a large dataframe that I convert to a Xarray dataset. It works, but takes quite some (unknown) amount of time. MCVE Code Sample
Expected OutputA progress report/ bar about the operation Problem DescriptionIt would be nice to have some hook or other functionality to tap into the xr.from_dataframe() and return a progress status that I then could pass to tqdm or something similar... Versions0.15.1 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4000/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
607718350 | MDU6SXNzdWU2MDc3MTgzNTA= | 4011 | missing empty group when iterate over groupby_bins | miniufo 9312831 | open | 0 | 4 | 2020-04-27T17:22:31Z | 2022-04-09T03:08:14Z | NONE | When I try to iterate over the object one of these bins will be emptybins = [0,4,5] grouped = array.groupby_bins('dim_0', bins) for i, group in enumerate(grouped): print(str(i)+' '+group) ``` When a bin contains no samples (bin of (4, 5]), the empty group will be dropped. Then how to iterate over the full bins even when some bins contain nothing? I've read this related issue #1019. But my case here need the correct order in grouped and empty groups need to be iterated over. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4011/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
666896781 | MDU6SXNzdWU2NjY4OTY3ODE= | 4279 | intersphinx looks for implementation modules | crusaderky 6213168 | open | 0 | 0 | 2020-07-28T08:55:12Z | 2022-04-09T03:03:30Z | MEMBER | This is a widespread issue caused by the pattern of defining objects in private module and then exposing them to the final user by importing them in the top-level Exact same issue in different projects: - https://github.com/aio-libs/aiohttp/issues/3714 - https://jira.mongodb.org/browse/MOTOR-338 - https://github.com/tkem/cachetools/issues/178 - https://github.com/AmphoraInc/xarray_mongodb/pull/22 - https://github.com/jonathanslenders/asyncio-redis/issues/143 If a project
1. uses xarray, intersphinx, and autodoc
2. subclasses any of the classes exposed by Then Sphinx emits a warning and fails to create a hyperlink, because intersphinx uses the WorkaroundIn conf.py:
SolutionPut the above hack in |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4279/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
667203487 | MDU6SXNzdWU2NjcyMDM0ODc= | 4282 | Values change when writing combined Dataset loaded with open_mfdataset | chpolste 11723107 | closed | 0 | 1 | 2020-07-28T16:20:09Z | 2022-04-09T03:00:55Z | 2022-04-09T03:00:55Z | NONE | What happened: Loading two netcdf files with What you expected to happen: That the written file contains the same values than the in-memory Minimal Complete Verifiable Example: ```python
The files contain wind data from the ERA5 reanalysis, downloaded from CDS. Anything else we need to know?: The issue might be related to the scale and offset values of the variable. Continuing the example: ```python
Data from the first file seems to be correct. When writing the combined dataset, the scale and offset from the first file are written to the combined file: ```python
``` Maybe the data from the second file is not adjusted to fit the new scaling and offset. Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 4.15.0-107-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.0 pandas: 1.0.4 numpy: 1.18.5 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: 0.9.8.2 iris: None bottleneck: None dask: 2.18.1 distributed: 2.21.0 matplotlib: 3.2.1 cartopy: 0.18.0 seaborn: None numbagg: None pint: 0.14 setuptools: 49.2.0.post20200712 pip: 20.1.1 conda: 4.8.3 pytest: None IPython: 7.16.1 sphinx: None |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4282/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
673682661 | MDU6SXNzdWU2NzM2ODI2NjE= | 4313 | Using Dependabot to manage doc build and CI versions | jthielen 3460034 | open | 0 | 4 | 2020-08-05T16:24:24Z | 2022-04-09T02:59:21Z | CONTRIBUTOR | As brought up on the bi-weekly community developers meeting, it sounds like Pandas v1.1.0 is breaking doc builds on RTD. One solution to the issues of frequent breakages in doc builds and CI due to upstream updates is having fixed version lists for all of these, which are then incrementally updated as new versions come out. @dopplershift has done a lot of great work in MetPy getting such a workflow set up with Dependabot (https://github.com/Unidata/MetPy/pull/1410) among other CI updates, and this could be adapted for use here in xarray. We've generally been quite happy with our updated CI configuration with Dependabot over the past couple weeks. The only major issue has been https://github.com/Unidata/MetPy/issues/1424 / https://github.com/dependabot/dependabot-core/issues/2198#issuecomment-649726022, which has required some contributors to have to delete and recreate their forks in order for Dependabot to not auto-submit PRs to the forked repos. Any thoughts that you had here @dopplershift would be appreciated! xref https://github.com/pydata/xarray/issues/4287, https://github.com/pydata/xarray/pull/4296 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4313/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
685739084 | MDU6SXNzdWU2ODU3MzkwODQ= | 4375 | allow using non-dimension coordinates in polyfit | mathause 10194086 | open | 0 | 1 | 2020-08-25T19:40:55Z | 2022-04-09T02:58:48Z | MEMBER |
Example: ```python da = xr.DataArray( [1, 3, 2], dims=["x"], coords=dict(x=["a", "b", "c"], y=("x", [0, 1, 2])) ) print(da) da.polyfit("y", 1)
KeyError Traceback (most recent call last) <ipython-input-80-9bb2dacf50f7> in <module> 5 print(da) 6 ----> 7 da.polyfit("y", 1) ~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/dataarray.py in polyfit(self, dim, deg, skipna, rcond, w, full, cov) 3507 """ 3508 return self._to_temp_dataset().polyfit( -> 3509 dim, deg, skipna=skipna, rcond=rcond, w=w, full=full, cov=cov 3510 ) 3511 ~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/dataset.py in polyfit(self, dim, deg, skipna, rcond, w, full, cov) 6005 skipna_da = skipna 6006 -> 6007 x = get_clean_interp_index(self, dim, strict=False) 6008 xname = "{}_".format(self[dim].name) 6009 order = int(deg) + 1 ~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/missing.py in get_clean_interp_index(arr, dim, use_coordinate, strict) 246 247 if use_coordinate is True: --> 248 index = arr.get_index(dim) 249 250 else: # string ~/.conda/envs/ipcc_ar6/lib/python3.7/site-packages/xarray/core/common.py in get_index(self, key) 378 """ 379 if key not in self.dims: --> 380 raise KeyError(key) 381 382 try: KeyError: 'y' ``` Describe the solution you'd like Would be nice if that worked. Describe alternatives you've considered One could just set the non-dimension coordinate as index, e.g.: Additional context Allowing this may be as easy as replacing by
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4375/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
792651098 | MDU6SXNzdWU3OTI2NTEwOTg= | 4840 | Opening a dataset doesn't display groups. | dklink 11861183 | open | 0 | 2 | 2021-01-23T21:16:32Z | 2022-04-09T02:31:03Z | NONE | ProblemI know xarray doesn't support netCDF4 Group functionality. That's fine, I bet it's incredibly thorny. My issue is, when you open the root group of a netCDF4 file which contains groups, xarray doesn't even tell you that there are groups; they are totally invisible. This seems like a big flaw; you've opened a file, shouldn't you at least be told what's in it? SolutionWhen you open a dataset with the netcdf4-python library, you get something like this:
"groups" shows up sort of like an auto-generated attribute. Surely xarray can do something similar:
WorkaroundThe workaround I am considering is to actually add an attribute to my root group which contains a list of the groups in the file, so people using xarray will see that there are more groups in the file. However, this is redundant considering the information is already in the netCDF file, and also brittle since there's no guarantee the attribute truly reflects the groups in the file. ConclusionConsidering that |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4840/reactions", "total_count": 4, "+1": 4, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
770006670 | MDU6SXNzdWU3NzAwMDY2NzA= | 4704 | Retries for rare failures | eric-czech 6130352 | open | 0 | 2 | 2020-12-17T13:06:51Z | 2022-04-09T02:30:16Z | NONE | I recently ran into several issues with gcsfs (https://github.com/dask/gcsfs/issues/316, https://github.com/dask/gcsfs/issues/315, and https://github.com/dask/gcsfs/issues/318) where errors are occasionally thrown, but only in large worfklows where enough http calls are made for them to become probable. @martindurant suggested forcing dask to retry tasks that may fail like this with Example Traceback``` Traceback (most recent call last): File "scripts/convert_phesant_data.py", line 100, in <module> fire.Fire() File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py", line 138, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire component, remaining_args = _CallAndUpdateTrace( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File "scripts/convert_phesant_data.py", line 96, in sort_zarr ds.to_zarr(fsspec.get_mapper(output_path), consolidated=True, mode="w") File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/core/dataset.py", line 1652, in to_zarr return to_zarr( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/api.py", line 1368, in to_zarr dump_to_store(dataset, zstore, writer, encoding=encoding) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/api.py", line 1128, in dump_to_store store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/zarr.py", line 417, in store self.set_variables( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/zarr.py", line 489, in set_variables writer.add(v.data, zarr_array, region=region) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/common.py", line 145, in add target[...] = source File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1115, in __setitem__ self.set_basic_selection(selection, value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1210, in set_basic_selection return self._set_basic_selection_nd(selection, value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1501, in _set_basic_selection_nd self._set_selection(indexer, value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1550, in _set_selection self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1664, in _chunk_setitem self._chunk_setitem_nosync(chunk_coords, chunk_selection, value, File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1729, in _chunk_setitem_nosync self.chunk_store[ckey] = cdata File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/mapping.py", line 151, in __setitem__ self.fs.pipe_file(key, maybe_convert(value)) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 121, in wrapper return maybe_sync(func, self, *args, **kwargs) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 100, in maybe_sync return sync(loop, func, *args, **kwargs) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync raise exc.with_traceback(tb) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 55, in f result[0] = await future File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 1007, in _pipe_file return await simple_upload( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 1523, in simple_upload j = await fs._call( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 525, in _call raise e File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 507, in _call self.validate_response(status, contents, json, path, headers) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 1228, in validate_response raise HttpError(error) gcsfs.utils.HttpError: Required ```Has there already been a discussion about how to address rare errors like this? Arguably, I could file the same issue with Zarr but it seemed more productive to start here at a higher level of abstraction. To be clear, the code for the example failure above typically succeeds and reproducing this failure is difficult. I have only seen it a couple times now like this, where the calling code does not include dask, but it did make me want to know if there were any plans to tolerate rare failures in Xarray as Dask does. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4704/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
517192343 | MDU6SXNzdWU1MTcxOTIzNDM= | 3482 | geo raster accessor | shaharkadmiel 6872529 | closed | 0 | 1 | 2019-11-04T14:34:27Z | 2022-04-09T02:28:38Z | 2022-04-09T02:28:25Z | NONE | Hi, I have put together a very simple package that provides a universal In addition, the package provides a I plan to also add reprojection and spatial resampling methods which will wrap either rasterio functionality or directly use gdal's api. I hope this is of interest to the geosciences community and perhaps even a broader community. Contributions and any other input from others is of course welcome. Have a quick look at the Demo section in the readme file to get some ideas as to what this package can do for you. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3482/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
532647948 | MDU6SXNzdWU1MzI2NDc5NDg= | 3593 | xr.open_dataset not reloading data in jupyter-notebook | lkroen 58510627 | closed | 0 | 1 | 2019-12-04T12:17:13Z | 2022-04-09T02:27:17Z | 2022-04-09T02:27:17Z | NONE | First, I reported this issue on Jupyter-Notebook and was told, that it might be an issue of xarry: https://github.com/jupyter/notebook/issues/5101 I load an .nc file and print it Cell 1
Cell 2
and I get the correct output:
Now I (re)move the data in a terminal so that does not exist under the same name
Cell 3
and I correctly get an error, that the file does not exist
Now I move the data in a terminal backwards so that it exits again under the correct name
Cell 4
Now I (re)emove the data in a terminal so that does not exist under the same name again
Cell 5
Now I expect again the error message that I follows after cell 3, which says that the file does not exist. But, I get the output as If the file would exist.
The same issue occurs, if I only change the file. Then the changed file isn't loaded anymore. Deleting the file is just a drastic example. The same issue occurs, if I just repeatedly run the cell which is supposed to load the file. Then the file change is not loaded anymore. This is a real issue since I would need to restart the kernel always which is just not practical.. Attached you'll find the simple .nc file. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3593/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
552987067 | MDU6SXNzdWU1NTI5ODcwNjc= | 3712 | [Documentation/API?] {DataArray,Dataset}.sortby is stable sort? | jaicher 4666753 | open | 0 | 0 | 2020-01-21T16:27:37Z | 2022-04-09T02:26:34Z | CONTRIBUTOR | I noticed that It is not explicitly stated in the docs that the sorting will be stable. If this function is meant to always be stable, I think the documentation should explicitly state this. If not, I think it would be helpful to have an optional argument to ensure that the sort is kept stable in case the implementation changes in the future. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3712/reactions", "total_count": 3, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 1 } |
xarray 13221727 | issue | ||||||||
559283550 | MDU6SXNzdWU1NTkyODM1NTA= | 3745 | groupby drops the variable used to group | malmans2 22245117 | open | 0 | 0 | 2020-02-03T19:25:06Z | 2022-04-09T02:25:17Z | CONTRIBUTOR | MCVE Code Sample
Seasonal meands_season = ds.groupby('time.season').mean() ds_season ``` <xarray.Dataset> Dimensions: (season: 4, x: 275, y: 205) Coordinates: yc (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51 xc (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91 * season (season) object 'DJF' 'JJA' 'MAM' 'SON' Dimensions without coordinates: x, y Data variables: Tair (season, y, x) float64 nan nan nan nan ... 23.13 22.06 21.72 21.94 ```python The seasons are ordered in alphabetical order.I want to sort them based on time.But time was dropped, so I have to do this:time_season = ds['time'].groupby('time.season').mean() ds_season.sortby(time_season) ``` <xarray.Dataset> Dimensions: (season: 4, x: 275, y: 205) Coordinates: yc (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51 xc (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91 * season (season) object 'SON' 'DJF' 'MAM' 'JJA' Dimensions without coordinates: x, y Data variables: Tair (season, y, x) float64 nan nan nan nan ... 29.27 28.39 27.94 28.05 Expected Output```python Why does groupby drop time?I would expect a dataset that looks like this:ds_season['time'] = time_season ds_season ``` <xarray.Dataset> Dimensions: (season: 4, x: 275, y: 205) Coordinates: yc (y, x) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51 xc (y, x) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91 * season (season) object 'DJF' 'JJA' 'MAM' 'SON' Dimensions without coordinates: x, y Data variables: Tair (season, y, x) float64 nan nan nan nan ... 23.13 22.06 21.72 21.94 time (season) object 1982-01-16 12:00:00 ... 1981-10-17 00:00:00 Problem DescriptionI often use Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3745/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
564240510 | MDU6SXNzdWU1NjQyNDA1MTA= | 3767 | ValueError when reading netCDF | jjm0022 16228337 | closed | 0 | 2 | 2020-02-12T20:08:45Z | 2022-04-09T02:24:48Z | 2022-04-09T02:24:48Z | NONE | MCVE Code Sample```python ds = xr.open_dataset('20090327_0600') ``` Problem DescriptionWhenever I try to read certain netCDF files it raises a The traceback looks like this: ```python KeyError Traceback (most recent call last) ~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock) 197 try: --> 198 file = self._cache[self._key] 199 except KeyError: ~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/lru_cache.py in getitem(self, key) 52 with self._lock: ---> 53 value = self._cache[key] 54 self._cache.move_to_end(key) KeyError: [<function _open_scipy_netcdf at 0x11c8fc160>, ('/Users/jmiller/data/madis/20090327_0600',), 'r', (('mmap', None), ('version', 2))] During handling of the above exception, another exception occurred: ValueError Traceback (most recent call last) <ipython-input-26-04ef422e5840> in <module> ----> 1 ds = xr.open_dataset('madis/20090327_0600') ~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime) 536 537 with close_on_error(store): --> 538 ds = maybe_decode_store(store) 539 540 # Ensure source filename always stored in dataset object (GH issue #2550) ~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/api.py in maybe_decode_store(store, lock) 444 445 def maybe_decode_store(store, lock=False): --> 446 ds = conventions.decode_cf( 447 store, 448 mask_and_scale=mask_and_scale, ~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime) 568 encoding = obj.encoding 569 elif isinstance(obj, AbstractDataStore): --> 570 vars, attrs = obj.load() 571 extra_coords = set() 572 file_obj = obj ~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/common.py in load(self) 121 """ 122 variables = FrozenDict( --> 123 (_decode_variable_name(k), v) for k, v in self.get_variables().items() 124 ) 125 attributes = FrozenDict(self.get_attrs()) ~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/scipy_.py in get_variables(self) 155 def get_variables(self): 156 return FrozenDict( --> 157 (k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items() 158 ) 159 ~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/scipy_.py in ds(self) 144 @property 145 def ds(self): --> 146 return self._manager.acquire() 147 148 def open_store_variable(self, name, var): ~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/file_manager.py in acquire(self, needs_lock)
178 An open file object, as returned by ~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock) 202 kwargs = kwargs.copy() 203 kwargs["mode"] = self._mode --> 204 file = self._opener(self._args, *kwargs) 205 if self._mode == "w": 206 # ensure file doesn't get overriden when opened again ~/miniconda3/envs/proc/lib/python3.8/site-packages/xarray/backends/scipy_.py in _open_scipy_netcdf(filename, mode, mmap, version) 81 82 try: ---> 83 return scipy.io.netcdf_file(filename, mode=mode, mmap=mmap, version=version) 84 except TypeError as e: # netcdf3 message is obscure in this case 85 errmsg = e.args[0] ~/miniconda3/envs/proc/lib/python3.8/site-packages/scipy/io/netcdf.py in init(self, filename, mode, mmap, version, maskandscale) 282 283 if mode in 'ra': --> 284 self._read() 285 286 def setattr(self, attr, value): ~/miniconda3/envs/proc/lib/python3.8/site-packages/scipy/io/netcdf.py in _read(self) 614 self._read_dim_array() 615 self._read_gatt_array() --> 616 self._read_var_array() 617 618 def _read_numrecs(self): ~/miniconda3/envs/proc/lib/python3.8/site-packages/scipy/io/netcdf.py in _read_var_array(self) 720 # Build rec array. 721 if self.use_mmap: --> 722 rec_array = self._mm_buf[begin:begin+self._recs*self._recsize].view(dtype=dtypes) 723 rec_array.shape = (self._recs,) 724 else: ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array. ``` Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3767/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
361237908 | MDU6SXNzdWUzNjEyMzc5MDg= | 2419 | Document ways to reshape a DataArray | dimitryx2017 9844249 | open | 0 | 5 | 2018-09-18T10:27:36Z | 2022-04-09T02:21:15Z | NONE | Code Sample, a copy-pastable example if possibleA "Minimal, Complete and Verifiable Example" will make it much easier for maintainers to help you: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports ```python Your code heredef xr_reshape(A, dim, newdims, coords): """ Reshape DataArray A to convert its dimension dim into sub-dimensions given by newdims and the corresponding coords. Example: Ar = xr_reshape(A, 'time', ['year', 'month'], [(2017, 2018), np.arange(12)]) """
``` Problem description[this should explain why the current behavior is a problem and why the expected output is a better solution.] It would be great to have the above function as a DataArray's method. Expected OutputA reshaped DataArray. In the example in the function comment it would correspond to an array like In[1] Ar.dims Out[1]: ('year', 'month', 'lat', 'lon') Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2419/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
414641120 | MDU6SXNzdWU0MTQ2NDExMjA= | 2789 | Appending to zarr with string dtype | davidbrochart 4711805 | open | 0 | 2 | 2019-02-26T14:31:42Z | 2022-04-09T02:18:05Z | CONTRIBUTOR | ```python import xarray as xr da = xr.DataArray(['foo']) ds = da.to_dataset(name='da') ds.to_zarr('ds') # no special encoding specified ds = xr.open_zarr('ds') print(ds.da.values) ``` The following code prints
The problem is that if I want to append to the zarr archive, like so: ```python import zarr ds = zarr.open('ds', mode='a') da_new = xr.DataArray(['barbar']) ds.da.append(da_new) ds = xr.open_zarr('ds') print(ds.da.values) ``` It prints If I want to specify the encoding with the maximum length, e.g:
It solves the length problem, but now my strings are kept as bytes:
It is not taken into account. The zarr encoding is The solution with |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2789/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
415192339 | MDU6SXNzdWU0MTUxOTIzMzk= | 2790 | Bug in xarray.open_dataset with variables/coordinates of dtype 'timedelta64[ns]' | SK-E 48060979 | closed | 0 | 1 | 2019-02-27T15:48:14Z | 2022-04-09T02:17:56Z | 2022-04-09T02:17:56Z | NONE | Code Sample, a copy-pastable example if possible```python import xarray as xr import pandas as pd Create array, coordinate time's dtype is timedelta64[ns]time = pd.timedelta_range(f"{2.0}s",f"{2.05}s",freq="10ms",name="time") data = range(len(time)) arr = xr.DataArray(data=psi,coords={"time":time},dims="time",name="psi") Save arraysavefile = "/path/to/file/BugXarray.nc" arr.to_netcdf(savefile) Load arrayarr_loaded = xr.open_dataset(savefile) Show time-coordinate on arr and arr_loadedprint(arr.time.values) Output: [2000000000 2010000000 2020000000 2030000000 2040000000 2050000000]print(arr_loaded.time.values) Output: [2000000000 2009999999 2020000000 2029999999 2040000000 2049999999]Same problem with pandas to_timedeltatimedelta = np.arange(200,206,1)/100 timedelta = pd.to_timedelta(timedelta,unit="s") Show time and timedeltaprint(time.values) Output: [2000000000 2010000000 2020000000 2030000000 2040000000 2050000000]print(timedelta.values) Output: [2000000000 2009999999 2020000000 2029999999 2040000000 2049999999]``` Problem descriptionOpening a netcdf-file that contains variables/coordinates with a dtype that is supposed to be 'timedelta64[ns]' might cause errors due to a loss in precision. I realized that the pandas-function pandas.to_timedelta shows the same misbehavior, though I don't know if xarray.open_dataset uses that function internally. Expected OutputIn the example above arr_loaded.time.values should equal arr.time.values! Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2790/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
428180638 | MDU6SXNzdWU0MjgxODA2Mzg= | 2863 | Memory Error for simple operations on NETCDF4 internally zipped files | rpnaut 30219501 | closed | 0 | 3 | 2019-04-02T11:48:01Z | 2022-04-09T02:15:45Z | 2022-04-09T02:15:45Z | NONE | Assuming you want to make easy computations with a data array loaded from internally zipped NETCDF4 files, you need at first to load a dataset:
Afterwards I have tried to do this: ``` In [4]: datarray=eobs["T_2M"]+273.15 MemoryError Traceback (most recent call last) <ipython-input-4-eaff3bff5e27> in <module>() ----> 1 datarray=eobs["T_2M"]+273.15 /sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/dataarray.py in func(self, other) 1539 1540 variable = (f(self.variable, other_variable) -> 1541 if not reflexive 1542 else f(other_variable, self.variable)) 1543 coords = self.coords._merge_raw(other_coords) /sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in func(self, other) 1139 if isinstance(other, (xr.DataArray, xr.Dataset)): 1140 return NotImplemented -> 1141 self_data, other_data, dims = _broadcast_compat_data(self, other) 1142 new_data = (f(self_data, other_data) 1143 if not reflexive /sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in _broadcast_compat_data(self, other) 1379 else: 1380 # rely on numpy broadcasting rules -> 1381 self_data = self.data 1382 other_data = other 1383 dims = self.dims /sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in data(self) 265 return self._data 266 else: --> 267 return self.values 268 269 @data.setter /sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in values(self) 306 def values(self): 307 """The variable's data as a numpy.ndarray""" --> 308 return _as_array_or_item(self._data) 309 310 @values.setter /sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in _as_array_or_item(data) 182 TODO: remove this (replace with np.asarray) once these issues are fixed 183 """ --> 184 data = np.asarray(data) 185 if data.ndim == 0: 186 if data.dtype.kind == 'M': /sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/numpy-1.11.2-py3.5-linux-x86_64.egg/numpy/core/numeric.py in asarray(a, dtype, order) 480 481 """ --> 482 return array(a, dtype, copy=False, order=order) 483 484 def asanyarray(a, dtype=None, order=None): /sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in array(self, dtype) 417 418 def array(self, dtype=None): --> 419 self._ensure_cached() 420 return np.asarray(self.array, dtype=dtype) 421 /sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in _ensure_cached(self) 414 def _ensure_cached(self): 415 if not isinstance(self.array, np.ndarray): --> 416 self.array = np.asarray(self.array) 417 418 def array(self, dtype=None): /sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/numpy-1.11.2-py3.5-linux-x86_64.egg/numpy/core/numeric.py in asarray(a, dtype, order) 480 481 """ --> 482 return array(a, dtype, copy=False, order=order) 483 484 def asanyarray(a, dtype=None, order=None): /sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in array(self, dtype) 398 399 def array(self, dtype=None): --> 400 return np.asarray(self.array, dtype=dtype) 401 402 def getitem(self, key): /sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/numpy-1.11.2-py3.5-linux-x86_64.egg/numpy/core/numeric.py in asarray(a, dtype, order) 480 481 """ --> 482 return array(a, dtype, copy=False, order=order) 483 484 def asanyarray(a, dtype=None, order=None): /sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in array(self, dtype) 373 def array(self, dtype=None): 374 array = orthogonally_indexable(self.array) --> 375 return np.asarray(array[self.key], dtype=None) 376 377 def getitem(self, key): /sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/conventions.py in getitem(self, key) 361 def getitem(self, key): 362 return mask_and_scale(self.array[key], self.fill_value, --> 363 self.scale_factor, self.add_offset, self._dtype) 364 365 def repr(self): /sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/conventions.py in mask_and_scale(array, fill_value, scale_factor, add_offset, dtype) 57 """ 58 # by default, cast to float to ensure NaN is meaningful ---> 59 values = np.array(array, dtype=dtype, copy=True) 60 if fill_value is not None and not np.all(pd.isnull(fill_value)): 61 if getattr(fill_value, 'size', 1) > 1: MemoryError: ``` I have uploaded the datafile to the following link: https://swiftbrowser.dkrz.de/public/dkrz_c0725fe8741c474b97f291aac57f268f/GregorMoeller/ Do I use the wrong netcdf-engine? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2863/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
441192361 | MDU6SXNzdWU0NDExOTIzNjE= | 2945 | Implicit conversion from int to float tampers with values when int is not representable as float | floogit 14000880 | closed | 0 | 1 | 2019-05-07T11:57:20Z | 2022-04-09T02:14:28Z | 2022-04-09T02:14:28Z | NONE | ```python ds = xr.Dataset() val = 95042027804193144 ds['var1'] = xr.DataArray(val) ds_1 = ds.where(ds.var1==val) print(ds_1.var1.dtype) dtype('float64') print(int(ds_1.var1)) 95042027804193152 ``` Problem descriptionAs described in #2183, int values are converted to float in Expected OutputI guess this is hard to fix. At a minimum, Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2945/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
326344778 | MDU6SXNzdWUzMjYzNDQ3Nzg= | 2183 | converting int vars to floats when I where the enclosing ds? | IvoCrnkovic 1778852 | open | 0 | 5 | 2018-05-25T00:48:43Z | 2022-04-09T02:14:23Z | NONE | Code Sample```python test_ds = xr.Dataset() test_ds['var1'] = xr.DataArray(np.arange(5)) test_ds['var2'] = xr.DataArray(np.ones(5)) assert(test_ds['var1'].dtype == np.int64) assert(test_ds.where(test_ds['var2'] == 1)['var1'].dtype == np.int64) ``` Problem descriptionSecond assert fails, which is a bit strange I think. Is that intended? If so, whats the reasoning? Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2183/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
446868198 | MDU6SXNzdWU0NDY4NjgxOTg= | 2978 | sel(method=x) is not propagated for MultiIndex | mschrimpf 5308236 | open | 0 | 3 | 2019-05-21T23:30:56Z | 2022-04-09T02:09:00Z | NONE | When passing a For a normal index, the This leads to an unexpected Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2978/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
447044177 | MDU6SXNzdWU0NDcwNDQxNzc= | 2980 | Jupyter Notebooks for Tutorials(USER GUIDE) | hdsingh 30382331 | open | 0 | 3 | 2019-05-22T10:01:26Z | 2022-04-09T02:07:55Z | NONE | This issue is more of a suggestion. A small issue that users reading documentation face is unavailability of jupyter notebooks for the tutorial docs User Guide. User constantly has to copy paste code from the documentation or Let's take example of 00 Setup — PyViz 0.10.0 documentation holoviews/examples/user_guide at master · pyviz/holoviews · GitHub Chatbot Tutorial — PyTorch Tutorials 1.1.0.dev20190507 documentation All of them provide option to download the tutorial in the form of |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2980/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
467736580 | MDU6SXNzdWU0Njc3MzY1ODA= | 3109 | In the contribution instructions, the py36.yml fails to set up | mmartini-usgs 23199378 | closed | 0 | 2 | 2019-07-13T15:55:23Z | 2022-04-09T02:05:48Z | 2022-04-09T02:05:48Z | NONE | Code Sample, a copy-pastable example if possibleconda env create -f ci/requirements/py36.yml Problem descriptionIn the contribution instructions, the py36.yml fails to set up, so the test environment does nto get created Expected OutputA test environment Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3109/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
484699415 | MDU6SXNzdWU0ODQ2OTk0MTU= | 3256 | .item() on a DataArray with dtype='datetime64[ns]' returns int | IvoCrnkovic 1778852 | open | 0 | 4 | 2019-08-23T20:29:50Z | 2022-04-09T02:03:43Z | NONE | MCVE Code Sample```python import datetime import xarray as xr test_da = xr.DataArray(datetime.datetime(2019, 1, 1, 1, 1)) test_da <xarray.DataArray ()>array('2019-01-01T01:01:00.000000000', dtype='datetime64[ns]')test_da.item() 1546304460000000000``` Expected OutputI would think it would be nice to get a Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3256/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
478398026 | MDU6SXNzdWU0NzgzOTgwMjY= | 3192 | Cloud Storage Buckets | pl-marasco 22492773 | closed | 0 | 1 | 2019-08-08T10:58:05Z | 2022-04-09T01:51:09Z | 2022-04-09T01:51:09Z | NONE | Following the instruction to create cloud storage here I stumbled with the fact that seems gcsfs doesn't anymore implement Is it the example correct or must be rewritten? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3192/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
60303760 | MDU6SXNzdWU2MDMwMzc2MA== | 364 | pd.Grouper support? | naught101 167164 | open | 0 | 24 | 2015-03-09T06:25:14Z | 2022-04-09T01:48:48Z | NONE | In pandas, you can pas a
AttributeError: 'TimeGrouper' object has no attribute 'ndim' ``` Not sure how this will work though, because pandas.TimeGrouper doesn't appear to work with multi-index dataframes yet anyway, so maybe there needs to be a feature request over there too, or maybe it's better to implement something from scratch... |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/364/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
400289716 | MDU6SXNzdWU0MDAyODk3MTY= | 2686 | Is `create_test_data()` public API? | TomNicholas 35968931 | open | 0 | 3 | 2019-01-17T14:00:20Z | 2022-04-09T01:48:14Z | MEMBER | We want to encourage people to use and extend xarray, and we already provide testing functions as public API to help with this. One function I keep using when writing code which uses xarray is Is there any reason why it shouldn't be public API? Is there something I should use instead? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2686/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
377096851 | MDU6SXNzdWUzNzcwOTY4NTE= | 2539 | Request: Add support for the ERDDAP griddap request | rmendels 1919031 | closed | 0 | 3 | 2018-11-03T21:56:10Z | 2022-04-09T01:47:28Z | 2022-04-09T01:47:28Z | NONE | xarray already supports OPenDAP requests, and the ERDDAP service is being installed in many places, and while an ERDDAP server can function as an OPeNDAP server, and its syntax is very close to the OpeNDAP syntax, ERDDAP/griddap has the advantage that requests can be made in coordinate space. Moreover, it would not have to be coded from scratch, ERDDAPy (https://github.com/pyoceans/erddapy) already has the code, it would be more of a question on how to integrate it. The ERDDAP service can return both netcdf or .dds files if that makes it easier to integrate. Thanks. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2539/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
818266159 | MDU6SXNzdWU4MTgyNjYxNTk= | 4973 | NetCDF encoded data not automatically decoded back into original dtype | chrism0dwk 625462 | closed | 0 | 2 | 2021-02-28T17:57:33Z | 2022-04-09T01:41:22Z | 2022-04-09T01:41:22Z | NONE | What happened: When reading in an encoded netCDF4 file, encoded variables are not transformed back to their original dtype in the resulting xarray. What you expected to happen:
As with the raw netCDF4 package, if an Minimal Complete Verifiable Example:
Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.7.7 (default, Mar 23 2020, 22:36:06) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.4.0-66-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 0.17.0 pandas: 1.1.5 numpy: 1.19.5 scipy: None netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.02.0 distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 49.6.0 pip: 20.2.2 conda: None pytest: None IPython: 7.21.0 sphinx: None |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4973/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
830040696 | MDU6SXNzdWU4MzAwNDA2OTY= | 5024 | xr.DataArray.sum() converts string objects into unicode | FabianHofmann 19226431 | open | 0 | 0 | 2021-03-12T11:47:06Z | 2022-04-09T01:40:09Z | CONTRIBUTOR | What happened: When summing over all axes of a DataArray with strings of dtype What you expected to happen: I expected the summation would preserve the dtype, meaning the one-size DataArray would be of dtype Minimal Complete Verifiable Example:
Output
On the other hand, when summing over one dimension only, the dtype is preserved
Output:
Anything else we need to know?: The problem becomes relevant as soon as dask is used in the workflow. Dask expects the aggregated DataArray to be of dtype Probably the behavior comes from creating a new DataArray after the reduction with Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.4.0-66-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.2 pandas: 1.2.1 numpy: 1.19.5 scipy: 1.6.0 netCDF4: 1.5.5.1 pydap: None h5netcdf: 0.7.4 h5py: 3.1.0 Nio: None zarr: 2.3.2 cftime: 1.3.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.0 cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.01.1 distributed: 2021.01.1 matplotlib: 3.3.3 cartopy: 0.18.0 seaborn: 0.11.1 numbagg: None pint: None setuptools: 52.0.0.post20210125 pip: 21.0 conda: 4.9.2 pytest: 6.2.2 IPython: 7.19.0 sphinx: 3.4.3 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5024/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
882105903 | MDU6SXNzdWU4ODIxMDU5MDM= | 5281 | 'Parallelized' apply_ufunc for scripy.interpolate.griddata | LJaksic 74414841 | open | 0 | 4 | 2021-05-09T10:08:46Z | 2022-04-09T01:39:13Z | NONE | Hi, I'm working with large files from an ocean model with an unstructered grid. For instance, variable flow velocity For smaller computational domains (smaller nFlowElement dimension) I ám still able to load the dataarray in my work memory. Then, the following code gives me the wanted result: ``` def interp_to_grid(u,xc,yc,xint,yint): print(u.shape,xc.shape,xint.shape) ug = griddata((xc,yc),u,(xint,yint), method='nearest', fill_value=np.nan) return ug uxg = xr.apply_ufunc(interp_to_grid,
ux, xc, yc, xint, yint,
dask = 'allowed',
input_core_dims=[['nFlowElem','time','laydim'],['nFlowElem'],['nFlowElem'],['dim_0','dim_1'],['dim_0','dim_1']],
output_core_dims=[['dim_0','dim_1','time','laydim']],
output_dtypes = [xr.DataArray]
)
However, for much larger spatial domains it is required to work with dask = 'parallelized', because these input dataarrays can nolonger be loaded into my working memory. I have tried to apply chunks over the time dimension, but also over the nFlowElement dimension. I am aware that it is not possible to chunk over core dimensions. This is one of my "parallel" attempts (with chunks along the time dim): Input ux:
File "interpnd.pyx", line 192, in scipy.interpolate.interpnd._check_init_shape ValueError: different number of values and points
Any advice is very welcome! Best Wishes, Luka |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5281/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
856900805 | MDU6SXNzdWU4NTY5MDA4MDU= | 5148 | Handling of non-string dimension names | bcbnz 367900 | open | 0 | 5 | 2021-04-13T12:13:44Z | 2022-04-09T01:36:19Z | CONTRIBUTOR | While working on a pull request (#5149) for #5146 I came across an inconsistency in allowed dimension names. If I try and create a DataArray with a non-string dimension, I get a TypeError: ```python console
But creating it with a string and renaming it works: ```python console
I can create a dataset via this renaming, but trying to get the repr value fails as ```python console
~/software/external/xarray/xarray/core/formatting.py in dim_summary(obj) 422 423 def dim_summary(obj): --> 424 elements = [f"{k}: {v}" for k, v in obj.sizes.items()] 425 return ", ".join(elements) 426 ~/software/external/xarray/xarray/core/formatting.py in <listcomp>(.0) 422 423 def dim_summary(obj): --> 424 elements = [f"{k}: {v}" for k, v in obj.sizes.items()] 425 return ", ".join(elements) 426 /usr/lib/python3.9/_collections_abc.py in iter(self) 847 848 def iter(self): --> 849 for key in self._mapping: 850 yield (key, self._mapping[key]) 851 ~/software/external/xarray/xarray/core/utils.py in iter(self) 437 438 def iter(self) -> Iterator[K]: --> 439 return iter(self.mapping) 440 441 def len(self) -> int: ~/software/external/xarray/xarray/core/utils.py in iter(self) 504 def iter(self) -> Iterator[K]: 505 # see #4571 for the reason of the type ignore --> 506 return iter(sorted(self.mapping)) # type: ignore[type-var] 507 508 def len(self) -> int: TypeError: '<' not supported between instances of 'str' and 'int' ``` The same thing happens if I call rename on the dataset rather than the array it is initialised with. If the initialiser requires the dimension names to be strings, and other code (which includes the HTML formatter I was looking at when I found this) assume that they are, then Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: 851d85b9203b49039237b447b3707b270d613db5 python: 3.9.2 (default, Feb 20 2021, 18:40:11) [GCC 10.2.0] python-bits: 64 OS: Linux OS-release: 5.11.13-arch1-1 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_NZ.UTF-8 LOCALE: en_NZ.UTF-8 libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 0.17.0 pandas: 1.2.3 numpy: 1.20.1 scipy: 1.6.2 netCDF4: 1.5.6 pydap: None h5netcdf: 0.10.0 h5py: 3.2.1 Nio: None zarr: None cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.2 cfgrib: None iris: None bottleneck: 1.3.2 dask: 2021.03.0 distributed: 2021.03.0 matplotlib: 3.4.1 cartopy: 0.18.0 seaborn: 0.11.1 numbagg: None pint: None setuptools: 54.2.0 pip: 20.3.1 conda: None pytest: 6.2.3 IPython: 7.22.0 sphinx: 3.5.4 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5148/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
910844095 | MDU6SXNzdWU5MTA4NDQwOTU= | 5434 | xarray.open_rasterio | ghost 10137 | closed | 0 | 2 | 2021-06-03T20:51:38Z | 2022-04-09T01:31:26Z | 2022-04-09T01:31:26Z | NONE | Could you please change |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5434/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1030768250 | I_kwDOAMm_X849cEZ6 | 5877 | Rolling() gives values different from pd.rolling() | chiaral 8453445 | open | 0 | 4 | 2021-10-19T21:41:42Z | 2022-04-09T01:29:07Z | CONTRIBUTOR | I am not sure this is a bug - but it clearly doesn't give the results the user would expect. The rolling sum of zeros gives me values that are not zeros ```python var = np.array([0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.31 , 0.91999996, 8.3 , 1.42 , 0.03 , 1.22 , 0.09999999, 0.14 , 0.13 , 0. , 0.12 , 0.03 , 2.53 , 0. , 0.19999999, 0.19999999, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ], dtype='float32') timet = np.array([ 43200000000000, 129600000000000, 216000000000000, 302400000000000, 388800000000000, 475200000000000, 561600000000000, 648000000000000, 734400000000000, 820800000000000, 907200000000000, 993600000000000, 1080000000000000, 1166400000000000, 1252800000000000, 1339200000000000, 1425600000000000, 1512000000000000, 1598400000000000, 1684800000000000, 1771200000000000, 1857600000000000, 1944000000000000, 2030400000000000, 2116800000000000, 2203200000000000, 2289600000000000, 2376000000000000, 2462400000000000, 2548800000000000, 2635200000000000, 2721600000000000, 2808000000000000, 2894400000000000, 2980800000000000], dtype='timedelta64[ns]') ds_ex = xr.Dataset(data_vars=dict( pr=(["time"], var), ), coords=dict( time=("time", timet) ), ) ds_ex.rolling(time=3).sum().pr.values ``` it gives me this result: array([ nan, nan, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 3.1000000e-01, 1.2300000e+00, 9.5300007e+00, 1.0640000e+01, 9.7500000e+00, 2.6700001e+00, 1.3500001e+00, 1.4600002e+00, 3.7000012e-01, 2.7000013e-01, 2.5000012e-01, 1.5000013e-01, 2.6800001e+00, 2.5600002e+00, 2.7300003e+00, 4.0000033e-01, 4.0000033e-01, 2.0000035e-01, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07], dtype=float32) Note the non zero values - the non zero value changes depending on whether i use float64 or float32 as precision of my data. So this seems to be a precision related issue (although the first values are correctly set to zero), in fact other sums of values are not exactly what they should be. The small difference at the 8th/9th decimal position can be expected due to precision, but the fact that the 0s become non zeros is problematic imho, especially if not documented. Oftentimes zero in geoscience data can mean a very specific thing (i.e. zero rainfall will be characterized differently than non-zero). in pandas this instead works:
array([[ nan, nan, 0. , 0. , 0. , 0. , 0. , 0.31 , 1.22999996, 9.53000015, 10.6400001 , 9.75000015, 2.66999999, 1.35000001, 1.46000002, 0.36999998, 0.27 , 0.24999999, 0.15 , 2.67999997, 2.55999997, 2.72999996, 0.39999998, 0.39999998, 0.19999999, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ]]) What you expected to happen: the sum of zeros should be zero. If this cannot be achieved/expected because of precision issues, it should be documented. Anything else we need to know?: I discovered this behavior in my old environments, but I created a new ad hoc environment with the latest versions, and it does the same thing. Environment: INSTALLED VERSIONScommit: None python: 3.9.7 (default, Sep 16 2021, 08:50:36) [Clang 10.0.0 ] python-bits: 64 OS: Darwin OS-release: 17.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 0.19.0 pandas: 1.3.3 numpy: 1.21.2 scipy: None netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None pint: None setuptools: 58.0.4 pip: 21.2.4 conda: None pytest: None IPython: 7.28.0 sphinx: None |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5877/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
653442225 | MDU6SXNzdWU2NTM0NDIyMjU= | 4209 | `xr.save_mfdataset()` doesn't honor `compute=False` argument | andersy005 13301940 | open | 0 | 4 | 2020-07-08T16:40:11Z | 2022-04-09T01:25:56Z | MEMBER | What happened: While using What you expected to happen: I expect the datasets to be written when I explicitly call Minimal Complete Verifiable Example: ```python In [2]: import xarray as xr In [3]: ds = xr.tutorial.open_dataset('rasm', chunks={}) In [4]: ds Out[4]: <xarray.Dataset> Dimensions: (time: 36, x: 275, y: 205) Coordinates: * time (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00 xc (y, x) float64 dask.array<chunksize=(205, 275), meta=np.ndarray> yc (y, x) float64 dask.array<chunksize=(205, 275), meta=np.ndarray> Dimensions without coordinates: x, y Data variables: Tair (time, y, x) float64 dask.array<chunksize=(36, 205, 275), meta=np.ndarray> Attributes: title: /workspace/jhamman/processed/R1002RBRxaaa01a/l... institution: U.W. source: RACM R1002RBRxaaa01a output_frequency: daily output_mode: averaged convention: CF-1.4 references: Based on the initial model of Liang et al., 19... comment: Output from the Variable Infiltration Capacity... nco_openmp_thread_number: 1 NCO: "4.6.0" history: Tue Dec 27 14:15:22 2016: ncatted -a dimension... In [5]: path = "test.nc" In [7]: ls -ltrh test.nc ls: cannot access test.nc: No such file or directory In [8]: tasks = xr.save_mfdataset(datasets=[ds], paths=[path], compute=False) In [9]: tasks Out[9]: Delayed('list-aa0b52e0-e909-4e65-849f-74526d137542') In [10]: ls -ltrh test.nc -rw-r--r-- 1 abanihi ncar 14K Jul 8 10:29 test.nc ``` Anything else we need to know?: Environment: Output of <tt>xr.show_versions()</tt>```python INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 3.10.0-693.21.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.4 xarray: 0.15.1 pandas: 0.25.3 numpy: 1.18.5 scipy: 1.5.0 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.2.0 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.20.0 distributed: 2.20.0 matplotlib: 3.2.1 cartopy: None seaborn: None numbagg: None setuptools: 49.1.0.post20200704 pip: 20.1.1 conda: None pytest: None IPython: 7.16.1 sphinx: None ``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4209/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
439875798 | MDU6SXNzdWU0Mzk4NzU3OTg= | 2937 | encoding of boolean dtype in zarr | rabernat 1197350 | open | 0 | 3 | 2019-05-03T03:53:27Z | 2022-04-09T01:22:42Z | MEMBER | I want to store an array with 1364688000 boolean values in zarr. I will have to read this array many times, so I am trying to do it as efficiently as possible. I have noticed that, if we try to write boolean data to zarr from xarray, zarr stores it as Example
So it seems like, during serialization of bool data, xarray is converting the data to int8 and then adding a Problem descriptionSince zarr is fully capable of storing bool data directly, we should not need to encode the data as i8. I think this happens in which calls So maybe we make the boolean encoding optional? Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2937/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
650549352 | MDU6SXNzdWU2NTA1NDkzNTI= | 4197 | Provide a "shrink" command to remove bounding nan/ whitespace of DataArray | cwerner 13906519 | open | 0 | 7 | 2020-07-03T11:55:05Z | 2022-04-09T01:22:31Z | NONE | I'm currently trying to come up with an elegant solution to remove extra whitespace/ nan-values along the edges of a 2D DataArray. I'm working with geographic data and search for an automatic way to shrink the extend to valid data only. Think a map of the EU, but remove all cols/ rows of the array (starting from the edges) that only contain nan. Describe the solution you'd like A shrink command that removes all nan rows/ cols at the edges of a DataArray. Describe alternatives you've considered I currently do this with NumPy operating on the raw data and creating a new DataArray afterwards |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4197/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
528168017 | MDU6SXNzdWU1MjgxNjgwMTc= | 3573 | rasterio test failure | dcherian 2448579 | closed | 0 | 1 | 2019-11-25T15:40:19Z | 2022-04-09T01:17:32Z | 2022-04-09T01:17:32Z | MEMBER | version
``` =================================== FAILURES =================================== ___ TestRasterio.testrasterio_vrt ____ self = <xarray.tests.test_backends.TestRasterio object at 0x7fc8355c8f60>
xarray/tests/test_backends.py:3966: /usr/share/miniconda/envs/xarray-tests/lib/python3.6/site-packages/rasterio/sample.py:43: in sample_gen data = read(indexes, window=window, masked=masked, boundless=True)
rasterio/_warp.pyx:978: ValueError ``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3573/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
504497403 | MDU6SXNzdWU1MDQ0OTc0MDM= | 3386 | add option to open_mfdataset for not using dask | sipposip 42270910 | closed | 0 | 6 | 2019-10-09T08:33:53Z | 2022-04-09T01:16:21Z | 2022-04-09T01:16:21Z | NONE | open_mfdataset only works with dask, whereas with open_dataset one can choose to use dask or not. It would be nice have an option (e.g. use_dask=False) to not use dask. My special use-case is the following: I use netcdf data as input for a tensorflow/keras application. I use parallel preprocessing threads in Keras. When using dask arrays, it gets complicated because both dask and tensorflow work with threads. I do not need any processing capability of dask/xarray, I only need a lazily loaded array that I can slice, and where the slices are loaded the moment they are accessed. So my application works nice with open_dataset (without defining chunks, and thus not using dask, but the data is accessed slice by slice, so it is never loaded as a whole into memory). However, it would be nice to have the same with open_mfdataset. Right now my workaround is to use netCDF4.MFDataset . (Obviously another workaround would be to concatenate my files into one and use open_dataset) Opening each file separately with open_dataset, and then concatenating them with xr.concat does not work, as this loads the data into memory. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3386/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue |
Advanced export
JSON shape: default, array, newline-delimited, object
CREATE TABLE [issues] ( [id] INTEGER PRIMARY KEY, [node_id] TEXT, [number] INTEGER, [title] TEXT, [user] INTEGER REFERENCES [users]([id]), [state] TEXT, [locked] INTEGER, [assignee] INTEGER REFERENCES [users]([id]), [milestone] INTEGER REFERENCES [milestones]([id]), [comments] INTEGER, [created_at] TEXT, [updated_at] TEXT, [closed_at] TEXT, [author_association] TEXT, [active_lock_reason] TEXT, [draft] INTEGER, [pull_request] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [state_reason] TEXT, [repo] INTEGER REFERENCES [repos]([id]), [type] TEXT ); CREATE INDEX [idx_issues_repo] ON [issues] ([repo]); CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]); CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]); CREATE INDEX [idx_issues_user] ON [issues] ([user]);