github: issues: 8 rows where repo = 13221727, state = "open" and user = 14314623 sorted by updated

8 rows where repo = 13221727, state = "open" and user = 14314623 sorted by updated_at descending

Search:

descending

id	node_id	number	title	user	state	assignee	comments	created_at	updated_at ▲	author_association	body	reactions	repo	type
977544678	MDU6SXNzdWU5Nzc1NDQ2Nzg=	5733	Shoudn't `assert_allclose` transpose datasets?	jbusecke 14314623	open		16	2021-08-23T22:55:12Z	2024-05-01T15:27:21Z	CONTRIBUTOR	I am trying to compare two datasets, one of which has possibly transposed dimensions on a data variable. `python import xarray as xr import numpy as np data = np.random.rand(4,6) da = xr.DataArray(data, dims=['x','y']) ds1 = xr.Dataset({'data':da}) ds2 = xr.Dataset({'data':da}).transpose('y','x')` What happened: In my mind this should pass `python xr.testing.assert_allclose(ds1, ds2)` but instead it fails ``` AssertionError Traceback (most recent call last) <ipython-input-7-58cd53174a1e> in <module> ----> 1 xr.testing.assert_allclose(ds1, ds2) `[... skipping hidden 1 frame]` /srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/testing.py in assert_allclose(a, b, rtol, atol, decode_bytes) 169 a.variables, b.variables, compat=compat_variable 170 ) --> 171 assert allclose, formatting.diff_dataset_repr(a, b, compat=equiv) 172 else: 173 raise TypeError("{} not supported by assertion comparison".format(type(a))) AssertionError: Left and right Dataset objects are not close Differing data variables: L data (x, y) float64 0.8589 0.09264 0.0264 ... 0.1039 0.3685 0.3983 R data (y, x) float64 0.8589 0.8792 0.8433 0.6952 ... 0.3664 0.2214 0.3983 ``` Simply transposing `ds2` to the same dimensions of `ds1` fixes this (since the data is the same after all) `python xr.testing.assert_allclose(ds1, ds2.transpose('x','y'))` Since most of the other xarray operations are 'transpose-safe' (`(ds1+ds2) = (ds1 + ds2.transpose('x','y')`), shouldnt this one be too? Environment: Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 \| packaged by conda-forge \| (default, Feb 20 2021, 16:22:27) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.4.109+ machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.0 pandas: 1.2.4 numpy: 1.20.2 scipy: 1.6.2 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.11.0 h5py: 3.2.1 Nio: None zarr: 2.7.1 cftime: 1.4.1 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: 1.2.2 cfgrib: 0.9.9.0 iris: None bottleneck: 1.3.2 dask: 2021.04.1 distributed: 2021.04.1 matplotlib: 3.4.1 cartopy: 0.19.0 seaborn: None numbagg: None pint: 0.17 setuptools: 49.6.0.post20210108 pip: 20.3.4 conda: None pytest: None IPython: 7.22.0 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5733/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray 13221727	issue
1755610168	I_kwDOAMm_X85opHw4	7918	xarray lazy indexing/loading is not sufficiently documented	jbusecke 14314623	open		1	2023-06-13T20:35:38Z	2023-06-13T21:29:53Z	CONTRIBUTOR	What is your issue? The default behavior of opening up datasets lazily instead of loading them into memory urgently needs more documentation or more extensive linking of existing docs. I have seen tons of example where the 'laziness' of the loading is not apparent to users. The workflow commonly looks something like this: 1. Open some 'larger-than-memory' dataset, e.g. from a cloud bucket with `xr.open_dataset('gs://your/bucket/store.zarr', engine='zarr')` 2. Printing the dataset repr and seeing no indication of the dataset not being in memory (I am not sure if there is somewhere to check this that I might have missed?) 3. Layering some calculations on it, and at some point a calculation is triggered and blows up the users machine/kernel.4. To start with, the docstring of `open_dataset` does not mention at all what is going to happen when the default `chunks=None` is used! This could be easily fixed if some more extensive text on the lazy loading exists. I was also not able to find any more descriptive docs on this feature, even though I might have missed something here. Up until a chat I had with @TomNicholas today, I honestly did not understand why this feature even existed. His explanation (below) was however very good, and if something similar is not in the docs yet, should probably be added. When you open a dataset from disk (/zarr), often the first thing you want to do is concatenate it and subset it by indexing. (the concatenation may even happen automatically in open_mfdataset). If you do not have dask (or choose not to use dask), opening a file would load all its data into a numpy array. You might then want to concatenate with other numpy arrays from opening other files, but then subset to only some time steps (for example). If everything is immediately loaded as numpy arrays this would be extremely wasteful - you would load all these values into memory even though you're about to drop them by slicing. This is why xarray has internal lazy indexing classes - they lazily do the indexing without actually loading the data as numpy arrays. If instead you load with dask, then because dask does everything lazily, you get basically the same features. But the implementation of those features is completely different (dask's delayed array objects vs xarray's lazy indexing internal classes). I think overall this is a giant pitfall, particularly for xarray beginners, and thus deserves some thought. While I am sure the choices made up to here might have some large functional upsides, I wonder three things: How can we improve the docs to at least make this behavior more obvious? Is the choice of setting the loading to lazy by default a required choice? Is there a way that we could indicate 'laziness' in the dataset repr? I do not know how to check whether a dataset is actually in memory at this point. Or maybe there is a clever way to raise a warning if the size of the dataset is larger than the system memory? Happy to work on this, since it is very relevant for many members of projects I work with. I first wanted to check if there is some existing docs that I missed.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/7918/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray 13221727	issue
950882492	MDU6SXNzdWU5NTA4ODI0OTI=	5629	Polyfit performance on large datasets - Suboptimal dask task graph	jbusecke 14314623	open		15	2021-07-22T17:19:52Z	2023-01-24T20:19:19Z	CONTRIBUTOR	What happened: I am trying to calculate a linear trend over a large climate model simulation. I use rechunker to chunk the data along the horizontal dimensions and make sure that the time dimension (along which I want to calculate the fit) is not chunked. In my realistic example, this blows up the memory of my workers. What you expected to happen: I expected this to work very smoothly because it should be embarassingly parallel (no information of sourrounding chunks is needed and the time dimension is complete in each chunk). Minimal Complete Verifiable Example: I think this minimal example shows that the task graph created is not ideal `python import xarray as xr import dask.array as dsa da = xr.DataArray(dsa.random.random((4,6, 100), chunks=(1,2,100)), dims=['x','y', 'time']) da` Note that the original data array has 12 chunks When I apply polyfit I get this `python poly = da.polyfit('time', 1) poly` Now the number of chunks has decreased to 4? I am not sure why, but this indicates to me that my problem might be related to #4554. When I look at the task graph it seems that this explains why for very large dataset the computation blows up: As I said before I would expect this calculation to be fully parallelizable, but there seems to be some aggregation/rechunking steps in the bottom layer. How 'bad' these get (e.g. how many input chunks get lumped together in the `rechunk/merge` step, depends on the chunking structure of the horizontal axes. cc @TomNicholas Anything else we need to know?: Environment: Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 \| packaged by conda-forge \| (default, Feb 20 2021, 16:22:27) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.4.89+ machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.17.0 pandas: 1.2.4 numpy: 1.20.2 scipy: 1.6.2 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.11.0 h5py: 3.2.1 Nio: None zarr: 2.7.1 cftime: 1.4.1 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: 1.2.2 cfgrib: 0.9.9.0 iris: None bottleneck: 1.3.2 dask: 2021.04.1 distributed: 2021.04.1 matplotlib: 3.4.1 cartopy: 0.19.0 seaborn: None numbagg: None pint: 0.17 setuptools: 49.6.0.post20210108 pip: 20.3.4 conda: None pytest: None IPython: 7.22.0 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5629/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray 13221727	issue
956259734	MDU6SXNzdWU5NTYyNTk3MzQ=	5649	xr.merge bug? when using combine_attrs='drop_conflicts'	jbusecke 14314623	open	keewis 14808389	3	2021-07-29T22:47:43Z	2022-04-09T03:14:24Z	CONTRIBUTOR	What happened: I have recently encountered a situation where combining two datasets failed, due to the datatype of their attributes. This example illustrates the situation: ```python ds1 = xr.Dataset(attrs={'a':[5]}) ds2 = xr.Dataset(attrs={'a':6}) xr.merge([ds1, ds2], combine_attrs='drop_conflicts') `give me this error:` TypeError Traceback (most recent call last) <ipython-input-12-1c8e82be0882> in <module> 2 ds2 = xr.Dataset(attrs={'a':6}) 3 ----> 4 xr.merge([ds1, ds2], combine_attrs='drop_conflicts') /srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge(objects, compat, join, fill_value, combine_attrs) 898 dict_like_objects.append(obj) 899 --> 900 merge_result = merge_core( 901 dict_like_objects, 902 compat, /srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value) 654 ) 655 --> 656 attrs = merge_attrs( 657 [var.attrs for var in coerced if isinstance(var, (Dataset, DataArray))], 658 combine_attrs, /srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in merge_attrs(variable_attrs, combine_attrs, context) 544 } 545 ) --> 546 result = { 547 key: value 548 for key, value in result.items() /srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/merge.py in <dictcomp>(.0) 547 key: value 548 for key, value in result.items() --> 549 if key not in attrs or equivalent(attrs[key], value) 550 } 551 dropped_keys \|= {key for key in attrs if key not in result} /srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/utils.py in equivalent(first, second) 171 return duck_array_ops.array_equiv(first, second) 172 elif isinstance(first, list) or isinstance(second, list): --> 173 return list_equiv(first, second) 174 else: 175 return ( /srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/utils.py in list_equiv(first, second) 182 def list_equiv(first, second): 183 equiv = True --> 184 if len(first) != len(second): 185 return False 186 else: TypeError: object of type 'int' has no len() ``` Took me a while to find out what the root cause of this was with a fully populated dataset, since the error is less than obvious. What you expected to happen: In my understanding this should just drop the attribute `a`. The example works like expected when both attributes are an integer or both are lists with an integer. The error is only triggered when the type is mixed. Is there a way to handle this case more elegantly? Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 \| packaged by conda-forge \| (default, Feb 20 2021, 16:22:27) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.4.89+ machine: x86_64 processor: x86_64 byteorder: little LC_ALL: C.UTF-8 LANG: C.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.1.dev8+gda99a566 pandas: 1.2.4 numpy: 1.20.2 scipy: 1.6.2 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.11.0 h5py: 3.2.1 Nio: None zarr: 2.7.1 cftime: 1.4.1 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: 1.2.2 cfgrib: 0.9.9.0 iris: None bottleneck: 1.3.2 dask: 2021.04.1 distributed: 2021.04.1 matplotlib: 3.4.1 cartopy: 0.19.0 seaborn: None numbagg: None pint: 0.17 setuptools: 49.6.0.post20210108 pip: 20.3.4 conda: None pytest: None IPython: 7.22.0 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5649/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray 13221727	issue
1185232773	I_kwDOAMm_X85GpTeF	6425	Expose `coord_mode` in `.pad()`	jbusecke 14314623	open		1	2022-03-29T18:06:35Z	2022-03-30T09:19:05Z	CONTRIBUTOR	Is your feature request related to a problem? I am experiencing some issues with the way coordinates are padded. Currently the coordinates are padded according to the `mode` parameter passed to `pad()` here. For my particular issue I want to be able to have different modes for the data, but effectively pin the mode that is used to pad the coordinate values. A simple example: ```python import xarray as xr import numpy as np da = xr.DataArray([1,2,3], dims=['x'], coords={'x':[4,5,6]}) da ``` If I pad with `constant` `python da.pad(x=(0,1), mode='constant')` I get this: which is the coordinate padding I need. But if I pad with `wrap`: `python da.pad(x=(0,1), mode='wrap')` which gives me repeated coordinate values Describe the solution you'd like I would like to be able to do something like this: `python da.pad(x=(0,1), mode='wrap', coord_pad_mode='constant')` and get something like this Since #3596 internally already defines `coord_pad_mode` it should be easy enough to expose this to the user? Possibly related (but I think not incompatible with the changes propsed here?): #3868 Happy to work with @TomNicholas on a PR, but wanted to get some feedback/comments first. cc @dcherian @mark-boer who worked on this code. Describe alternatives you've considered No response Additional context No response	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6425/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray 13221727	issue
830638672	MDU6SXNzdWU4MzA2Mzg2NzI=	5030	`missing_dims` option for aggregation methods like `mean` and `std`	jbusecke 14314623	open		5	2021-03-12T23:12:08Z	2022-03-03T22:37:16Z	CONTRIBUTOR	I work a lot with climate model output and often loop over several models, of which some have a 'member' dimension and others don't. I end up writing many lines like this: `python for ds in model_datasets: if 'member_id' in ds.dims: ds = ds.mean('member_id)` Which often makes for very lengthy code blocks. I recently noticed that `.isel()` actually has a nifty keyword argument 'missing_dims', which enables the user to apply `isel` and it just doesn't do anything when the dimension is not present. I'd love to be able to do: `python for ds in model_datasets: ds = ds.mean('member_id', missing_dims='ignore')` Is there a way to implement this generally for xarray aggregation methods (mean/max/min/std/...). Or is there a reason this should be avoided?	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5030/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray 13221727	issue
627600168	MDU6SXNzdWU2Mjc2MDAxNjg=	4112	Unexpected chunking behavior when using `xr.align` with `join='outer'`	jbusecke 14314623	open		6	2020-05-29T23:46:31Z	2020-10-06T20:20:34Z	CONTRIBUTOR	I just came across some unexpected behavior, when using `xr.align` with the option `join='outer'` on two Dataarrays which contain dask.arrays and have different dimension lengths. MCVE Code Sample ```python import numpy as np import xarray as xr short_time = xr.cftime_range('2000', periods=12) long_time = xr.cftime_range('2000', periods=120) data_short = np.random.rand(len(short_time)) data_long = np.random.rand(len(long_time)) a = xr.DataArray(data_short, dims=['time'], coords={'time':short_time}).chunk({'time':3}) b = xr.DataArray(data_long, dims=['time'], coords={'time':long_time}).chunk({'time':3}) a,b = xr.align(a,b, join = 'outer') ``` Expected Output As expected `a` is filled with missing values: `python a.plot() b.plot()` But the filled values do not replicate the chunking along the time dimension in `b`. Instead the padded values are in one single chunk, which can be substantially larger than the others. `python a.data` `python b.data` (Quick shoutout for the amazing html representation. This made diagnosing this problem super easy! 🥳 ) Problem Description I think for many problems it would be more appropriate if the padded portion of the array would have a chunking scheme like the longer array. A practical example (which brought me to this issue) is given in the CMIP6 data archive, where some models give output for several members, with some of them running longer than others, leading to problems when these are combined (see intake-esm/#225). Basically for that particular model, there are 5 members with a runtime of 100 years and one member with a runtime of 300 years. I think using `xr.align` leads immediately to a chunk that is 200 years long and blows up the memory on all systems I have tried this on. Is there a way to work around this, or is this behavior intended and I am missing something? cc'ing @dcherian @andersy005 Versions Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.2 \| packaged by conda-forge \| (default, Apr 24 2020, 08:20:52) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1127.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.4 xarray: 0.15.1 pandas: 1.0.3 numpy: 1.18.4 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: 0.8.0 h5py: 2.10.0 Nio: None zarr: 2.4.0 cftime: 1.1.2 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: 1.1.3 cfgrib: None iris: None bottleneck: None dask: 2.15.0 distributed: 2.15.2 matplotlib: 3.2.1 cartopy: 0.18.0 seaborn: None numbagg: None setuptools: 46.1.3.post20200325 pip: 20.1 conda: None pytest: 5.4.2 IPython: 7.14.0 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/4112/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray 13221727	issue
447361330	MDU6SXNzdWU0NDczNjEzMzA=	2982	cbar_kwargs are ignored for `plot.contourf`	jbusecke 14314623	open		3	2019-05-22T21:57:24Z	2019-05-30T00:36:17Z	CONTRIBUTOR	Code Sample, a copy-pastable example if possible I just stumbled on some weird behaviour, possibly a bug. When plotting with the `contourf` method, xarray ignores the `extend` value passed to `cbar_kwargs` ```python import numpy as np import xarray as xr %matplotlib inline test_data = np.random.rand(10, 10) da = xr.DataArray(test_data) h = da.plot.contourf(vmax=0.5, cbar_kwargs={"extend": "neither"}) print(h.colorbar.extend) ``` When doing a similar plot with matplotlib this happens ``` matplotlib example import matplotlib.pyplot as plt ha = plt.contourf(test_data, vmax=0.3) cb = plt.colorbar(ha, extend='neither') ``` Is xarray adjusting the colorbar to the `vmin` and `vmax` inputs but not carrying the `cbar_kwargs` for that step? This makes it especially hard to produce movies with (e.g. xmovie) because the colorbar `flickers`. If someone could point me to the right location in the code and tests, I would be happy to put together a PR. Output of `xr.show_versions()` INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 \| packaged by conda-forge \| (default, Feb 28 2019, 09:07:38) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 2.6.32-696.30.1.el6.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US LOCALE: en_US.ISO8859-1 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.12.0 pandas: 0.24.2 numpy: 1.16.2 scipy: 1.2.1 netCDF4: 1.5.0.1 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.3.1 cftime: 1.0.3.4 nc_time_axis: 1.2.0 PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 1.1.5 distributed: 1.26.1 matplotlib: 3.0.3 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 40.8.0 pip: 19.0.3 conda: None pytest: 4.4.2 IPython: 7.1.1 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/2982/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	xarray 13221727	issue

Advanced export

JSON shape: default, array, newline-delimited, object

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);

issues

8 rows where repo = 13221727, state = "open" and user = 14314623 sorted by updated_at descending

```

What is your issue?

Is your feature request related to a problem?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

MCVE Code Sample

Expected Output

Problem Description

Versions

Code Sample, a copy-pastable example if possible

matplotlib example

Output of xr.show_versions()

Advanced export

Output of `xr.show_versions()`