
issues

25 rows where comments = 5 and user = 1217238 sorted by updated_at descending

type 2

  • pull 14
  • issue 11

state 2

  • closed 21
  • open 4

repo 1

  • xarray 25
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
271043420 MDU6SXNzdWUyNzEwNDM0MjA= 1689 Roundtrip serialization of coordinate variables with spaces in their names shoyer 1217238 open 0     5 2017-11-03T16:43:20Z 2024-03-22T14:02:48Z   MEMBER      

If coordinates have spaces in their names, they get restored from netCDF files as data variables instead:

```
>>> xarray.open_dataset(xarray.Dataset(coords={'name with spaces': 1}).to_netcdf())
<xarray.Dataset>
Dimensions:           ()
Data variables:
    name with spaces  int32 1
```

This happens because the CF convention is to indicate coordinates as a space-separated string, e.g., `coordinates='latitude longitude'`.

Even though these aren't CF-compliant variable names (which cannot have spaces), it would be nice to have an ad-hoc convention for xarray that allows us to serialize/deserialize coordinates in all/most cases. Maybe we could use escape characters for spaces (e.g., `coordinates='name\ with\ spaces'`) or quote names if they have spaces (e.g., `coordinates='"name with spaces"'`)?

At the very least, we should issue a warning in these cases.
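For concreteness, here is a minimal sketch (hypothetical helpers, not part of xarray) of what the backslash-escaping convention could look like:

```python
import re

def encode_coordinates(names):
    # Escape literal spaces before joining names into a single
    # CF-style space-separated 'coordinates' attribute.
    return " ".join(name.replace(" ", r"\ ") for name in names)

def decode_coordinates(attr):
    # Split on spaces NOT preceded by a backslash, then unescape.
    return [part.replace(r"\ ", " ") for part in re.split(r"(?<!\\) ", attr)]

encoded = encode_coordinates(["name with spaces", "time"])
assert encoded == r"name\ with\ spaces time"
assert decode_coordinates(encoded) == ["name with spaces", "time"]
```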

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1689/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
325439138 MDU6SXNzdWUzMjU0MzkxMzg= 2171 Support alignment/broadcasting with unlabeled dimensions of size 1 shoyer 1217238 open 0     5 2018-05-22T19:52:21Z 2022-04-19T03:15:24Z   MEMBER      

Sometimes, it's convenient to include placeholder dimensions of size 1, which allows for removing any ambiguity related to the order of output dimensions.

Currently, this is not supported with xarray:

```
>>> xr.DataArray([1], dims='x') + xr.DataArray([1, 2, 3], dims='x')
ValueError: arguments without labels along dimension 'x' cannot be aligned because they have different dimension sizes: {1, 3}

>>> xr.Variable(('x',), [1]) + xr.Variable(('x',), [1, 2, 3])
ValueError: operands cannot be broadcast together with mismatched lengths for dimension 'x': (1, 3)
```

However, these operations aren't really ambiguous. With size-1 dimensions, we could logically do broadcasting like NumPy arrays, e.g.,

```
>>> np.array([1]) + np.array([1, 2, 3])
array([2, 3, 4])
```

This would be particularly convenient if we add keepdims=True to xarray operations (#2170).
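In the meantime, one workaround with the current API is to drop the size-1 dimension explicitly so that scalar broadcasting applies (a sketch, not a general solution):

```python
import xarray as xr

a = xr.DataArray([1], dims='x')
b = xr.DataArray([1, 2, 3], dims='x')

# Squeezing 'x' turns `a` into a 0-d array, which broadcasts against `b`.
result = a.squeeze('x', drop=True) + b
print(result.values)  # [2 3 4]
```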

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2171/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
237008177 MDU6SXNzdWUyMzcwMDgxNzc= 1460 groupby should still squeeze for non-monotonic inputs shoyer 1217238 open 0     5 2017-06-19T20:05:14Z 2022-03-04T21:31:41Z   MEMBER      

We can simply use argsort() to determine group_indices instead of np.arange(): https://github.com/pydata/xarray/blob/22ff955d53e253071f6e4fa849e5291d0005282a/xarray/core/groupby.py#L256
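For illustration, a minimal sketch (my reading of the suggestion, not the actual patch) of recovering per-group indices from non-monotonic labels with argsort():

```python
import numpy as np

labels = np.array([2, 0, 1, 0, 2])           # non-monotonic group labels
order = labels.argsort(kind="stable")        # positions, sorted by group
boundaries = np.flatnonzero(np.diff(labels[order])) + 1
group_indices = np.split(order, boundaries)  # one index array per group
# -> [array([1, 3]), array([2]), array([0, 4])]
```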

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1460/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
645062817 MDExOlB1bGxSZXF1ZXN0NDM5NTg4OTU1 4178 Fix min_deps_check; revert to support numpy=1.14 and pandas=0.24 shoyer 1217238 closed 0     5 2020-06-25T00:37:19Z 2021-02-27T21:46:43Z 2021-02-27T21:46:42Z MEMBER   1 pydata/xarray/pulls/4178

Fixes the issue noticed in: https://github.com/pydata/xarray/pull/4175#issuecomment-649135372

Let's see if this passes CI...

  • [x] Passes `isort -rc . && black . && mypy . && flake8`
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4178/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
314444743 MDU6SXNzdWUzMTQ0NDQ3NDM= 2059 How should xarray serialize bytes/unicode strings across Python/netCDF versions? shoyer 1217238 open 0     5 2018-04-15T19:36:55Z 2020-11-19T10:08:16Z   MEMBER      

netCDF string types

We have several options for storing strings in netCDF files:

  • NC_CHAR: netCDF's legacy character type. The closest match is NumPy's 'S1' dtype. In principle, it's supposed to be able to store arbitrary bytes. On HDF5, it uses a UTF-8 encoded string with a fixed size of 1 (but note that HDF5 does not complain about storing arbitrary bytes).
  • NC_STRING: netCDF's newer variable-length string type. It's only available on netCDF4 (not netCDF3). It corresponds to an HDF5 variable-length string with UTF-8 encoding.
  • NC_CHAR with an _Encoding attribute: xarray and netCDF4-Python support an ad-hoc convention for storing unicode strings in NC_CHAR data-types, by adding an attribute {'_Encoding': 'UTF-8'}. The data is still stored as fixed-width strings, but xarray (and netCDF4-Python) can decode them as unicode.

NC_STRING would seem like a clear win in cases where it's supported, but as @crusaderky points out in https://github.com/pydata/xarray/issues/2040, it actually results in much larger netCDF files in many cases than using character arrays, which are more easily compressed. Nonetheless, we currently default to storing unicode strings in NC_STRING, because it's the most portable option -- every tool that handles HDF5 and netCDF4 should be able to read it properly as unicode strings.

NumPy/Python string types

On the Python side, our options are perhaps even more confusing:

  • NumPy's dtype=np.string_ corresponds to fixed-length bytes. This is the default dtype for strings on Python 2, because on Python 2 strings are the same as bytes.
  • NumPy's dtype=np.unicode_ corresponds to fixed-length unicode. This is the default dtype for strings on Python 3, because on Python 3 strings are the same as unicode.
  • Strings are also commonly stored in numpy arrays with dtype=np.object_, as arrays of either bytes or unicode objects. This is a pragmatic choice, because otherwise NumPy has no support for variable-length strings. We also use this (like pandas) to mark missing values with np.nan.

Like pandas, we are pretty liberal with converting back and forth between fixed-length (np.string_/np.unicode_) and variable-length (object dtype) representations of strings as necessary. This works pretty well, though converting from object arrays in particular has downsides, since it cannot be done lazily with dask.

Current behavior of xarray

Currently, xarray uses the same behavior on Python 2/3. The priority was faithfully round-tripping data from a particular version of Python to netCDF and back, which the current serialization behavior achieves:

| Python version | NetCDF version | NumPy datatype | NetCDF datatype |
| -------------- | -------------- | -------------- | --------------- |
| Python 2 | NETCDF3 | np.string_ / str | NC_CHAR |
| Python 2 | NETCDF4 | np.string_ / str | NC_CHAR |
| Python 3 | NETCDF3 | np.string_ / bytes | NC_CHAR |
| Python 3 | NETCDF4 | np.string_ / bytes | NC_CHAR |
| Python 2 | NETCDF3 | np.unicode_ / unicode | NC_CHAR with UTF-8 encoding |
| Python 2 | NETCDF4 | np.unicode_ / unicode | NC_STRING |
| Python 3 | NETCDF3 | np.unicode_ / str | NC_CHAR with UTF-8 encoding |
| Python 3 | NETCDF4 | np.unicode_ / str | NC_STRING |
| Python 2 | NETCDF3 | object bytes/str | NC_CHAR |
| Python 2 | NETCDF4 | object bytes/str | NC_CHAR |
| Python 3 | NETCDF3 | object bytes | NC_CHAR |
| Python 3 | NETCDF4 | object bytes | NC_CHAR |
| Python 2 | NETCDF3 | object unicode | NC_CHAR with UTF-8 encoding |
| Python 2 | NETCDF4 | object unicode | NC_STRING |
| Python 3 | NETCDF3 | object unicode/str | NC_CHAR with UTF-8 encoding |
| Python 3 | NETCDF4 | object unicode/str | NC_STRING |

This can also be selected explicitly for most data-types by setting dtype in encoding:

  • 'S1' for NC_CHAR (with or without encoding)
  • str for NC_STRING (though I'm not 100% sure it works properly currently when given bytes)
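For example, a short sketch of forcing the fixed-width representation through the encoding argument (xarray adds the _Encoding attribute for unicode data so the values survive the round trip):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'name': ('x', np.array([u'foo', u'bar']))})
# dtype='S1' requests NC_CHAR storage instead of the NC_STRING default.
ds.to_netcdf('chars.nc', encoding={'name': {'dtype': 'S1'}})
```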

Script for generating table:

```python
from __future__ import print_function
import xarray as xr
import uuid
import netCDF4
import numpy as np
import sys

for dtype_name, value in [
        ('np.string_ / ' + type(b'').__name__, np.array([b'abc'])),
        ('np.unicode_ / ' + type(u'').__name__, np.array([u'abc'])),
        ('object bytes/' + type(b'').__name__, np.array([b'abc'], dtype=object)),
        ('object unicode/' + type(u'').__name__, np.array([u'abc'], dtype=object)),
]:
    for format in ['NETCDF3_64BIT', 'NETCDF4']:
        filename = str(uuid.uuid4()) + '.nc'
        xr.Dataset({'data': value}).to_netcdf(filename, format=format)
        with netCDF4.Dataset(filename) as f:
            var = f.variables['data']
            disk_dtype = var.dtype
            has_encoding = hasattr(var, '_Encoding')
            disk_dtype_name = (('NC_CHAR' if disk_dtype == 'S1' else 'NC_STRING')
                               + (' with UTF-8 encoding' if has_encoding else ''))
            print('|', 'Python %i' % sys.version_info[0], '|', format[:7],
                  '|', dtype_name, '|', disk_dtype_name, '|')
```

Potential alternatives

The main option I'm considering is switching to default to NC_CHAR with UTF-8 encoding for np.string_ / str and object bytes/str on Python 2. The current behavior could be explicitly toggled by setting an encoding of {'_Encoding': None}.

This would imply two changes:

  1. Attempting to serialize arbitrary bytes (on Python 2) would start raising an error -- anything that isn't ASCII would require explicitly disabling _Encoding.
  2. Strings read back from disk on Python 2 would come back as unicode instead of bytes.

This implicit conversion would be consistent with Python 2's general handling of bytes/unicode, and facilitate reading netCDF files on Python 3 that were written with Python 2.

The counter-argument is that it may not be worth changing this at this late point, given that we will be sunsetting Python 2 support by year's end.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2059/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
613546626 MDExOlB1bGxSZXF1ZXN0NDE0MjgwMDEz 4039 Revise pull request template shoyer 1217238 closed 0     5 2020-05-06T19:08:19Z 2020-06-18T05:45:11Z 2020-06-18T05:45:10Z MEMBER   0 pydata/xarray/pulls/4039

See below for the new language, to clarify that documentation is only necessary for "user visible changes."

I added "including notable bug fixes" to indicate that minor bug fixes may not be worth noting (I was thinking of test-suite only fixes in this category) but perhaps that is too confusing.

cc @pydata/xarray for opinions!

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] Passes `isort -rc . && black . && mypy . && flake8`
  • [ ] Fully documented, including whats-new.rst for user visible changes (including notable bug fixes) and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4039/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
612214951 MDExOlB1bGxSZXF1ZXN0NDEzMjIyOTEx 4028 Remove broken test for Panel with to_pandas() shoyer 1217238 closed 0     5 2020-05-04T22:41:42Z 2020-05-06T01:50:21Z 2020-05-06T01:50:21Z MEMBER   0 pydata/xarray/pulls/4028

We don't support creating a Panel with to_pandas() with any version of pandas at present, so this test was previously broken if pandas < 0.25 was installed.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4028/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
309136602 MDU6SXNzdWUzMDkxMzY2MDI= 2019 Appending to an existing netCDF file fails with scipy==1.0.1 shoyer 1217238 closed 0     5 2018-03-27T21:15:05Z 2020-03-09T07:18:07Z 2020-03-09T07:18:07Z MEMBER      

https://travis-ci.org/pydata/xarray/builds/359093748

Example failure:

```
_____________________ ScipyFilePathTest.test_append_write _____________________
self = <xarray.tests.test_backends.ScipyFilePathTest testMethod=test_append_write>

    def test_append_write(self):
        # regression for GH1215
        data = create_test_data()
>       with self.roundtrip_append(data) as actual:

xarray/tests/test_backends.py:786:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../miniconda/envs/test_env/lib/python3.6/contextlib.py:81: in __enter__
    return next(self.gen)
xarray/tests/test_backends.py:155: in roundtrip_append
    self.save(data[[key]], path, mode=mode, **save_kwargs)
xarray/tests/test_backends.py:162: in save
    **kwargs)
xarray/core/dataset.py:1131: in to_netcdf
    unlimited_dims=unlimited_dims)
xarray/backends/api.py:657: in to_netcdf
    unlimited_dims=unlimited_dims)
xarray/core/dataset.py:1068: in dump_to_store
    unlimited_dims=unlimited_dims)
xarray/backends/common.py:363: in store
    unlimited_dims=unlimited_dims)
xarray/backends/common.py:402: in set_variables
    self.writer.add(source, target)
xarray/backends/common.py:265: in add
    target[...] = source
xarray/backends/scipy_.py:61: in __setitem__
    data[key] = value
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <scipy.io.netcdf.netcdf_variable object at 0x7fe3eb3ec6a0>
index = Ellipsis, data = array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])

    def __setitem__(self, index, data):
        if self.maskandscale:
            missing_value = (
                self._get_missing_value() or
                getattr(data, 'fill_value', 999999))
            self._attributes.setdefault('missing_value', missing_value)
            self._attributes.setdefault('_FillValue', missing_value)
            data = ((data - self._attributes.get('add_offset', 0.0)) /
                    self._attributes.get('scale_factor', 1.0))
            data = np.ma.asarray(data).filled(missing_value)
            if self._typecode not in 'fd' and data.dtype.kind == 'f':
                data = np.round(data)

        # Expand data for record vars?
        if self.isrec:
            if isinstance(index, tuple):
                rec_index = index[0]
            else:
                rec_index = index
            if isinstance(rec_index, slice):
                recs = (rec_index.start or 0) + len(data)
            else:
                recs = rec_index + 1
            if recs > len(self.data):
                shape = (recs,) + self._shape[1:]
                # Resize in-place does not always work since
                # the array might not be single-segment
                try:
                    self.data.resize(shape)
                except ValueError:
                    self.__dict__['data'] = np.resize(self.data, shape).astype(self.data.dtype)
>       self.data[index] = data
E       ValueError: assignment destination is read-only
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2019/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
479914290 MDExOlB1bGxSZXF1ZXN0MzA2NzExNDYx 3210 sparse=True option for from_dataframe and from_series shoyer 1217238 closed 0     5 2019-08-13T01:09:19Z 2019-08-27T16:04:13Z 2019-08-27T08:54:26Z MEMBER   0 pydata/xarray/pulls/3210

Fixes https://github.com/pydata/xarray/issues/3206

Example usage:

In [3]: import pandas as pd
   ...: import numpy as np
   ...: import xarray
   ...: df = pd.DataFrame({
   ...:     'w': range(10),
   ...:     'x': list('abcdefghij'),
   ...:     'y': np.arange(0, 100, 10),
   ...:     'z': np.ones(10),
   ...: }).set_index(['w', 'x', 'y'])
   ...:

In [4]: ds = xarray.Dataset.from_dataframe(df, sparse=True)

In [5]: ds.z.data
Out[5]: <COO: shape=(10, 10, 10), dtype=float64, nnz=10, fill_value=nan>
  • [x] Closes #3206, Closes #2139
  • [x] Tests added
  • [x] Passes `black . && mypy . && flake8`
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3210/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
440233667 MDU6SXNzdWU0NDAyMzM2Njc= 2940 test_rolling_wrapped_dask is failing with dask-master shoyer 1217238 closed 0     5 2019-05-03T21:44:23Z 2019-06-28T16:49:04Z 2019-06-28T16:49:04Z MEMBER      

The test_rolling_wrapped_dask tests in test_dataarray.py are failing with dask master, e.g., as seen here: https://travis-ci.org/pydata/xarray/jobs/527936531

I reproduced this locally. git bisect identified the culprit as https://github.com/dask/dask/pull/4756.

The source of this issue on the xarray side appears to be these lines: https://github.com/pydata/xarray/blob/dd99b7d7d8576eefcef4507ae9eb36a144b60adf/xarray/core/rolling.py#L287-L291

In particular, we are currently passing `padded` as an xarray.DataArray object, not a dask array. Changing this to `padded.data` shows that passing an actual dask array to dask_array_ops.rolling_window results in failing tests.

@fujiisoup @jhamman any idea what's going on here?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2940/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
427451138 MDExOlB1bGxSZXF1ZXN0MjY2MDQ4MzEw 2858 Various fixes for explicit Dataset.indexes shoyer 1217238 closed 0     5 2019-03-31T21:48:47Z 2019-04-04T22:59:48Z 2019-04-04T21:58:24Z MEMBER   0 pydata/xarray/pulls/2858

I've added internal consistency checks to the uses of assert_equal in our test suite, so this shouldn't happen again.

  • [x] Closes #2856, closes #2854
  • [x] Tests added
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2858/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
365961291 MDExOlB1bGxSZXF1ZXN0MjE5NzUyOTE3 2458 WIP: sketch of resample support for CFTimeIndex shoyer 1217238 closed 0     5 2018-10-02T15:44:36Z 2019-02-03T03:21:52Z 2019-02-03T03:21:52Z MEMBER   0 pydata/xarray/pulls/2458

Example usage:

```
>>> import xarray
>>> times = xarray.cftime_range('2000', periods=30, freq='MS')
>>> da = xarray.DataArray(range(30), [('time', times)])
>>> da.resample(time='1AS').mean()
<xarray.DataArray (time: 3)>
array([ 5.5, 17.5, 26.5])
Coordinates:
  * time     (time) object 2001-01-01 00:00:00 ... 2003-01-01 00:00:00
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2458/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
388977754 MDExOlB1bGxSZXF1ZXN0MjM3MTAyNjYz 2595 Close files when CachingFileManager is garbage collected shoyer 1217238 closed 0     5 2018-12-09T01:53:50Z 2018-12-23T20:11:35Z 2018-12-23T20:11:32Z MEMBER   0 pydata/xarray/pulls/2595

This frees users from needing to worry about this.

Using `__del__` turned out to be easier than using weak references.
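A minimal sketch of the idea (hypothetical class, not the actual CachingFileManager code):

```python
class FileManager:
    def __init__(self, path):
        self._file = open(path)

    def __del__(self):
        # Close the underlying file when the manager is garbage collected,
        # so users no longer need to call close() themselves.
        try:
            self._file.close()
        except Exception:
            pass  # already closed, or interpreter is shutting down
```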

  • [x] Closes #2560
  • [x] Closes #2614
  • [x] Tests added
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2595/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
293345254 MDU6SXNzdWUyOTMzNDUyNTQ= 1875 roll doesn't handle periodic boundary conditions well shoyer 1217238 closed 0     5 2018-01-31T23:07:42Z 2018-08-15T08:11:29Z 2018-08-15T08:11:29Z MEMBER      

DataArray.roll() currently rolls both data variables and coordinates:

```
>>> arr = xr.DataArray(range(4), [('x', range(0, 360, 90))])
>>> arr.roll(x=2)
<xarray.DataArray (x: 4)>
array([2, 3, 0, 1])
Coordinates:
  * x        (x) int64 180 270 0 90
```

This sort of makes sense, but the labels are now all non-monotonic, so you can't even plot the data with xarray. In my experience, you probably want coordinate labels that either look like:

  1. The unrolled original coordinates: [0, 90, 180, 270]
  2. Shifted coordinates: [-180, -90, 0, 90]

It should be easier to accomplish this in xarray. I currently resort to using roll and manually fixing up coordinates after the fact.

I'm actually not sure if there are any use-cases for the current behavior. Choice (1) would have the virtue of being consistent with shift():

```
>>> arr.shift(x=2)
<xarray.DataArray (x: 4)>
array([nan, nan, 0., 1.])
Coordinates:
  * x        (x) int64 0 90 180 270
```

We could potentially add another optional argument for shifting labels, too, or require fixing that up by subtraction.

Note: you might argue that this is overly geoscience specific, and it would be, if this was only for handling a longitude coordinate. But periodic boundary conditions are common in many areas of the physical sciences.
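For reference, here's what the two label conventions look like in practice. This sketch assumes the roll_coords argument found in later xarray versions:

```python
import xarray as xr

arr = xr.DataArray(range(4), [('x', range(0, 360, 90))])

# Choice 1: roll the data but keep the original, unrolled labels.
print(arr.roll(x=2, roll_coords=False).x.values)  # [  0  90 180 270]

# Choice 2: shifted labels, fixed up by subtraction after rolling.
rolled = arr.roll(x=2, roll_coords=True)
rolled['x'] = rolled.x.where(rolled.x < 180, rolled.x - 360)
print(rolled.x.values)  # [-180  -90    0   90]
```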

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1875/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
323674056 MDU6SXNzdWUzMjM2NzQwNTY= 2137 0.10.4 release shoyer 1217238 closed 0     5 2018-05-16T15:31:57Z 2018-05-17T02:29:52Z 2018-05-17T02:29:52Z MEMBER      

Our last release was April 13 (just over a month ago), and we've had a number of features land, so I'd like to issue this shortly. Ideally within the next few days, or maybe even later today.

CC @pydata/xarray

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2137/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
305702311 MDU6SXNzdWUzMDU3MDIzMTE= 1993 DataArray.rolling().mean() is way slower than it should be shoyer 1217238 closed 0     5 2018-03-15T20:10:22Z 2018-03-18T08:56:27Z 2018-03-18T08:56:27Z MEMBER      

Code Sample, a copy-pastable example if possible

From @RayPalmerTech in https://github.com/kwgoodman/bottleneck/issues/186:

```python
import numpy as np
import pandas as pd
import time
import bottleneck as bn
import xarray
import matplotlib.pyplot as plt

N = 30000200       # Number of datapoints
Fs = 30000         # sample rate
T = 1 / Fs         # sample period
duration = N / Fs  # duration in s
t = np.arange(0, duration, T)  # time vector
DATA = np.random.randn(N,) + 5 * np.sin(2 * np.pi * 0.01 * t)  # Example noisy sine data
w = 330000         # window size

def using_bottleneck_mean(data, width):
    return bn.move_mean(a=data, window=width, min_count=1)

def using_pandas_rolling_mean(data, width):
    return np.asarray(pd.DataFrame(data).rolling(window=width, center=True, min_periods=1).mean()).ravel()

def using_xarray_mean(data, width):
    return xarray.DataArray(data, dims='x').rolling(x=width, min_periods=1, center=True).mean()

start = time.time()
A = using_bottleneck_mean(DATA, w)
print('Bottleneck: ', time.time() - start, 's')
start = time.time()
B = using_pandas_rolling_mean(DATA, w)
print('Pandas: ', time.time() - start, 's')
start = time.time()
C = using_xarray_mean(DATA, w)
print('Xarray: ', time.time() - start, 's')
```

This results in:

    Bottleneck:  0.0867006778717041 s
    Pandas:  0.563546895980835 s
    Xarray:  25.133142709732056 s

Somehow xarray is way slower than pandas and bottleneck, even though it's using bottleneck under the hood!

Problem description

Profiling shows that the majority of time is spent in xarray.core.rolling.DataArrayRolling._setup_windows. Monkey-patching that method with a dummy rectifies the issue:

    xarray.core.rolling.DataArrayRolling._setup_windows = lambda *args: None

Now we obtain:

    Bottleneck:  0.06775331497192383 s
    Pandas:  0.48262882232666016 s
    Xarray:  0.1723031997680664 s

The solution is to set up the windows lazily (in `__iter__`) instead of in the constructor.
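A minimal sketch of the lazy approach (hypothetical class, not xarray's actual implementation):

```python
class LazyRolling:
    def __init__(self, data, window):
        self.data = data
        self.window = window
        self._windows = None            # NOT built in the constructor

    def _setup_windows(self):
        # Expensive: builds one slice per window position.
        self._windows = [slice(max(0, i - self.window + 1), i + 1)
                         for i in range(len(self.data))]

    def __iter__(self):
        if self._windows is None:       # built lazily, only when iterating
            self._setup_windows()
        return iter((s, self.data[s]) for s in self._windows)
```

Reductions like mean() that never iterate would then skip the setup cost entirely.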

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.96+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.2
pandas: 0.22.0
numpy: 1.14.2
scipy: 0.19.1
netCDF4: None
h5netcdf: None
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: 0.7.1
setuptools: 36.2.7
pip: 9.0.1
conda: None
pytest: None
IPython: 5.5.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1993/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
299601789 MDExOlB1bGxSZXF1ZXN0MTcwOTM0ODg1 1936 Tweak stickler config: ignore Python files in the docs & disable fixer shoyer 1217238 closed 0     5 2018-02-23T05:18:29Z 2018-02-25T20:51:42Z 2018-02-25T20:49:15Z MEMBER   0 pydata/xarray/pulls/1936

It doesn't always make sense to lint these files fully.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1936/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
272460887 MDExOlB1bGxSZXF1ZXN0MTUxNTc2MzA1 1705 Make Indexer classes not inherit from tuple. shoyer 1217238 closed 0     5 2017-11-09T07:08:27Z 2017-11-17T16:33:40Z 2017-11-14T03:32:34Z MEMBER   0 pydata/xarray/pulls/1705

I'm not entirely sure this is a good idea. The advantage is that it ensures that all our indexing code is entirely explicit: everything that reaches a backend must be an ExplicitIndexer. The downside is that it removes a bit of internal flexibility: we can't just use tuples in place of basic indexers anymore. On the whole, I think this is probably worth it but I would appreciate feedback.
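To make the trade-off concrete, here's a hedged sketch of the wrapper approach (class names mirror the PR, but the bodies are illustrative):

```python
class ExplicitIndexer:
    """Wraps -- rather than inherits from -- a tuple key."""

    def __init__(self, key):
        if not isinstance(key, tuple):
            raise TypeError('key must be a tuple: {!r}'.format(key))
        self._key = key

    @property
    def tuple(self):
        return self._key


class BasicIndexer(ExplicitIndexer):
    """Integers and slices only; backends can rely on this guarantee."""


# A plain tuple no longer works where an ExplicitIndexer is required --
# exactly the explicitness (and lost flexibility) described above.
```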

@fujiisoup can you take a look?

  • [x] Tests added / passed
  • [x] Passes `git diff upstream/master **/*py | flake8 --diff`
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1705/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
112928260 MDExOlB1bGxSZXF1ZXN0NDg1MzUxMTA= 637 size and aspect arguments for plotting methods even without faceting shoyer 1217238 closed 0     5 2015-10-23T02:10:06Z 2016-12-20T10:08:35Z 2016-12-20T10:08:35Z MEMBER   0 pydata/xarray/pulls/637

I was finding myself writing plt.figure(figsize=(x, y)) way too often. This will be a convenient shortcut.

Still needs tests.
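Usage would look something like this (a sketch of the intended shortcut; size is the figure height in inches and aspect * size gives the width):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.randn(100, 200), dims=['y', 'x'])

# One call instead of plt.figure(figsize=(10, 5)) followed by da.plot():
da.plot(size=5, aspect=2)  # height 5 inches, width = aspect * size = 10
```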

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/637/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
180756013 MDU6SXNzdWUxODA3NTYwMTM= 1034 test_conventions.TestEncodeCFVariable failing on master for Appveyor Python 2.7 build shoyer 1217238 closed 0     5 2016-10-03T21:48:55Z 2016-10-22T00:49:53Z 2016-10-22T00:49:53Z MEMBER      

I have no idea what's going on here, but maybe somebody who knows Windows better has a guess:

```
================================== FAILURES ===================================
________________ TestEncodeCFVariable.test_missing_fillvalue _________________
self = <xarray.test.test_conventions.TestEncodeCFVariable testMethod=test_missing_fillvalue>

    def test_missing_fillvalue(self):
        v = Variable(['x'], np.array([np.nan, 1, 2, 3]))
        v.encoding = {'dtype': 'int16'}
        with self.assertWarns('floating point data as an integer'):
>           conventions.encode_cf_variable(v)

xarray\test\test_conventions.py:523:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
C:\Python27-conda32\lib\contextlib.py:24: in __exit__
    self.gen.next()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <xarray.test.test_conventions.TestEncodeCFVariable testMethod=test_missing_fillvalue>
message = 'floating point data as an integer'

    @contextmanager
    def assertWarns(self, message):
        with warnings.catch_warnings(record=True) as w:
            warnings.filterwarnings('always', message)
            yield
            assert len(w) > 0
>           assert all(message in str(wi.message) for wi in w)
E           AssertionError: NameError: all(<generator object <genexpr> at 0x0617D170>) << global name 'message' is not defined

xarray\test\__init__.py:140: AssertionError
============== 1 failed, 970 passed, 67 skipped in 70.58 seconds ==============
```

I could understand a warning failing to be raised, but the NameError is especially strange.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1034/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
173908126 MDExOlB1bGxSZXF1ZXN0ODMxOTM2NTI= 993 Coordinate -> IndexVariable and other deprecations shoyer 1217238 closed 0     5 2016-08-30T01:12:19Z 2016-09-01T21:56:07Z 2016-09-01T21:56:02Z MEMBER   0 pydata/xarray/pulls/993
  • Renamed the Coordinate class from xarray's low level API to IndexVariable. xref https://github.com/pydata/xarray/pull/947#issuecomment-238549129
  • Deprecated supplying coords as a dictionary to the DataArray constructor without also supplying an explicit dims argument. The old behavior encouraged relying on the iteration order of dictionaries, which is a bad practice (fixes #727).
  • Removed a number of methods deprecated since v0.7.0 or earlier: load_data, vars, drop_vars, dump, dumps and the variables keyword argument alias to Dataset.
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/993/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
89866276 MDU6SXNzdWU4OTg2NjI3Ng== 439 Display datetime64 arrays without showing local timezones shoyer 1217238 closed 0     5 2015-06-21T05:13:58Z 2016-04-21T15:43:27Z 2016-04-21T15:43:27Z MEMBER      

NumPy has an unfortunate way of adding local timezone offsets when printing datetime64 arrays:

    <xray.DataArray 'time' (time: 4000)>
    array(['1999-12-31T16:00:00.000000000-0800',
           '2000-01-01T16:00:00.000000000-0800',
           '2000-01-02T16:00:00.000000000-0800', ...,
           '2010-12-10T16:00:00.000000000-0800',
           '2010-12-11T16:00:00.000000000-0800',
           '2010-12-12T16:00:00.000000000-0800'], dtype='datetime64[ns]')
    Coordinates:
      * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...

We should use custom formatting code to remove the local timezone (to encourage folks just to use naive timezones/UTC).
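A minimal sketch of the custom-formatting idea, routing values through pandas.Timestamp, which prints naive (timezone-free) strings:

```python
import numpy as np
import pandas as pd

def format_datetime64(values):
    # str(pd.Timestamp(...)) renders '2000-01-01 00:00:00' with no UTC offset.
    return [str(pd.Timestamp(v)) for v in np.asarray(values).ravel()]

print(format_datetime64(np.array(['2000-01-01'], dtype='datetime64[ns]')))
# ['2000-01-01 00:00:00']
```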

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/439/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
98498103 MDExOlB1bGxSZXF1ZXN0NDEzOTQyNzY= 507 Add sel_points for point-wise indexing by label shoyer 1217238 closed 0     5 2015-08-01T01:52:52Z 2015-08-05T03:51:46Z 2015-08-05T03:51:44Z MEMBER   0 pydata/xarray/pulls/507

xref #475

Example usage:

```
In [1]: da = xray.DataArray(np.arange(56).reshape((7, 8)),
   ...:                     coords={'x': list('abcdefg'),
   ...:                             'y': 10 * np.arange(8)},
   ...:                     dims=['x', 'y'])

In [2]: da
Out[2]:
<xray.DataArray (x: 7, y: 8)>
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55]])
Coordinates:
  * y        (y) int64 0 10 20 30 40 50 60 70
  * x        (x) |S1 'a' 'b' 'c' 'd' 'e' 'f' 'g'

# we can index by position along each dimension
In [3]: da.isel_points(x=[0, 1, 6], y=[0, 1, 0], dim='points')
Out[3]:
<xray.DataArray (points: 3)>
array([ 0,  9, 48])
Coordinates:
    y        (points) int64 0 10 0
    x        (points) |S1 'a' 'b' 'g'
  * points   (points) int64 0 1 2

# or equivalently by label
In [4]: da.sel_points(x=['a', 'b', 'g'], y=[0, 10, 0], dim='points')
Out[4]:
<xray.DataArray (points: 3)>
array([ 0,  9, 48])
Coordinates:
    y        (points) int64 0 10 0
    x        (points) |S1 'a' 'b' 'g'
  * points   (points) int64 0 1 2
```

cc @jhamman

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/507/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
43442970 MDExOlB1bGxSZXF1ZXN0MjE1NjU3Mjg= 236 WIP: convert to/from cdms2 variables shoyer 1217238 closed 0   0.3.2 836999 5 2014-09-22T08:48:52Z 2014-12-19T09:11:42Z 2014-12-19T09:11:39Z MEMBER   0 pydata/xarray/pulls/236

Fixes #133

@DamienIrving am I missing anything obvious here?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/236/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
27625970 MDExOlB1bGxSZXF1ZXN0MTI1NzI5OTE= 12 Stephan's sprintbattical shoyer 1217238 closed 0     5 2014-02-14T21:23:09Z 2014-08-04T00:03:21Z 2014-02-21T00:36:53Z MEMBER   0 pydata/xarray/pulls/12
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/12/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);