issues
11 rows where comments = 5, type = "issue" and user = 1217238 sorted by updated_at descending
| id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 271043420 | MDU6SXNzdWUyNzEwNDM0MjA= | 1689 | Roundtrip serialization of coordinate variables with spaces in their names | shoyer 1217238 | open | 0 | 5 | 2017-11-03T16:43:20Z | 2024-03-22T14:02:48Z | MEMBER | If coordinates have spaces in their names, they get restored from netCDF files as data variables instead.
This happens because the CF convention is to indicate coordinates as a space-separated string in the coordinates attribute. Even though these aren't CF-compliant variable names (which cannot contain spaces), it would be nice to have an ad-hoc convention for xarray that allows us to serialize/deserialize coordinates in all/most cases. Maybe we could use escape characters for spaces. At the very least, we should issue a warning in these cases. |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/1689/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
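For concreteness, a minimal sketch of the reported roundtrip failure (the file name is illustrative, and exact behavior varies by xarray version; newer releases at least warn here):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'data': ('x', np.arange(3))},
                coords={'my coord': ('x', np.arange(3))})
ds.to_netcdf('roundtrip.nc')  # stores the name in a space-separated 'coordinates' attribute

restored = xr.open_dataset('roundtrip.nc')
# On read, the 'coordinates' attribute is split on whitespace, so the name
# 'my coord' is never matched and the variable comes back as a data variable:
print('my coord' in restored.coords)      # per the report: False
print('my coord' in restored.data_vars)   # per the report: True
```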
| 325439138 | MDU6SXNzdWUzMjU0MzkxMzg= | 2171 | Support alignment/broadcasting with unlabeled dimensions of size 1 | shoyer 1217238 | open | 0 | 5 | 2018-05-22T19:52:21Z | 2022-04-19T03:15:24Z | MEMBER | Sometimes, it's convenient to include placeholder dimensions of size 1, which removes any ambiguity about the order of output dimensions. Currently, xarray does not support this.
However, these operations aren't really ambiguous. With size-1 dimensions, we could logically broadcast like NumPy arrays do.
This would be particularly convenient if we add |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/2171/reactions",
"total_count": 4,
"+1": 4,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
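A minimal sketch of the gap being described, with plain NumPy broadcasting as the proposed semantics (the xarray lines reflect behavior at the time of the report):

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.zeros((1, 3)), dims=['x', 'y'])  # placeholder 'x' of size 1
b = xr.DataArray(np.arange(2), dims=['x'])           # 'x' of size 2

try:
    a + b  # xarray refuses: conflicting sizes for dimension 'x'
except ValueError as err:
    print(err)

# The NumPy analogue broadcasts the size-1 axis without complaint:
print((np.zeros((1, 3)) + np.arange(2)[:, np.newaxis]).shape)  # (2, 3)
```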
| 237008177 | MDU6SXNzdWUyMzcwMDgxNzc= | 1460 | groupby should still squeeze for non-monotonic inputs | shoyer 1217238 | open | 0 | 5 | 2017-06-19T20:05:14Z | 2022-03-04T21:31:41Z | MEMBER | We can simply use |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/1460/reactions",
"total_count": 1,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 314444743 | MDU6SXNzdWUzMTQ0NDQ3NDM= | 2059 | How should xarray serialize bytes/unicode strings across Python/netCDF versions? | shoyer 1217238 | open | 0 | 5 | 2018-04-15T19:36:55Z | 2020-11-19T10:08:16Z | MEMBER | **netCDF string types**

We have several options for storing strings in netCDF files:
-
**NumPy/Python string types**

On the Python side, our options are perhaps even more confusing:
- NumPy's

Like pandas, we are pretty liberal with converting back and forth between fixed-length (

**Current behavior of xarray**

Currently, xarray uses the same behavior on Python 2/3. The priority was faithfully round-tripping data from a particular version of Python to netCDF and back, which the current serialization behavior achieves:

| Python version | NetCDF version | NumPy datatype | NetCDF datatype |
| -------------- | -------------- | ------------------ | --------------- |
| Python 2 | NETCDF3 | np.string_ / str | NC_CHAR |
| Python 2 | NETCDF4 | np.string_ / str | NC_CHAR |
| Python 3 | NETCDF3 | np.string_ / bytes | NC_CHAR |
| Python 3 | NETCDF4 | np.string_ / bytes | NC_CHAR |
| Python 2 | NETCDF3 | np.unicode_ / unicode | NC_CHAR with UTF-8 encoding |
| Python 2 | NETCDF4 | np.unicode_ / unicode | NC_STRING |
| Python 3 | NETCDF3 | np.unicode_ / str | NC_CHAR with UTF-8 encoding |
| Python 3 | NETCDF4 | np.unicode_ / str | NC_STRING |
| Python 2 | NETCDF3 | object bytes/str | NC_CHAR |
| Python 2 | NETCDF4 | object bytes/str | NC_CHAR |
| Python 3 | NETCDF3 | object bytes | NC_CHAR |
| Python 3 | NETCDF4 | object bytes | NC_CHAR |
| Python 2 | NETCDF3 | object unicode | NC_CHAR with UTF-8 encoding |
| Python 2 | NETCDF4 | object unicode | NC_STRING |
| Python 3 | NETCDF3 | object unicode/str | NC_CHAR with UTF-8 encoding |
| Python 3 | NETCDF4 | object unicode/str | NC_STRING |

This can also be selected explicitly for most data-types by setting dtype in encoding:
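For example, a minimal sketch of such an explicit override (variable and file names are illustrative, not from the original issue):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'data': (('x',), np.array(['abc'], dtype=object))})

# Force fixed-width character storage (NC_CHAR) on disk:
ds.to_netcdf('chars.nc', encoding={'data': {'dtype': 'S1'}})

# Force variable-length string storage (NC_STRING; NETCDF4 format only):
ds.to_netcdf('strings.nc', format='NETCDF4', encoding={'data': {'dtype': str}})
```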
- Script for generating table:
```python
from __future__ import print_function
import xarray as xr
import uuid
import netCDF4
import numpy as np
import sys
for dtype_name, value in [
        ('np.string_ / ' + type(b'').__name__, np.array([b'abc'])),
        ('np.unicode_ / ' + type(u'').__name__, np.array([u'abc'])),
        ('object bytes/' + type(b'').__name__, np.array([b'abc'], dtype=object)),
        ('object unicode/' + type(u'').__name__, np.array([u'abc'], dtype=object)),
]:
    for format in ['NETCDF3_64BIT', 'NETCDF4']:
        filename = str(uuid.uuid4()) + '.nc'
        xr.Dataset({'data': value}).to_netcdf(filename, format=format)
        with netCDF4.Dataset(filename) as f:
            var = f.variables['data']
            disk_dtype = var.dtype
            has_encoding = hasattr(var, '_Encoding')
            disk_dtype_name = (('NC_CHAR' if disk_dtype == 'S1' else 'NC_STRING') +
                               (' with UTF-8 encoding' if has_encoding else ''))
            print('|', 'Python %i' % sys.version_info[0],
                  '|', format[:7],
                  '|', dtype_name,
                  '|', disk_dtype_name,
                  '|')
```
**Potential alternatives**

The main option I'm considering is switching to default to

This would imply two changes:

1. Attempting to serialize arbitrary bytes (on Python 2) would start raising an error -- anything that isn't ASCII would require explicitly disabling

This implicit conversion would be consistent with Python 2's general handling of bytes/unicode, and would facilitate reading netCDF files on Python 3 that were written with Python 2.

The counter-argument is that it may not be worth changing this at this late point, given that we will be sunsetting Python 2 support by year's end. |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/2059/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
xarray 13221727 | issue | ||||||||
| 309136602 | MDU6SXNzdWUzMDkxMzY2MDI= | 2019 | Appending to an existing netCDF file fails with scipy==1.0.1 | shoyer 1217238 | closed | 0 | 5 | 2018-03-27T21:15:05Z | 2020-03-09T07:18:07Z | 2020-03-09T07:18:07Z | MEMBER | https://travis-ci.org/pydata/xarray/builds/359093748 Example failure:
```
_______________ ScipyFilePathTest.test_append_write _______________

self = <xarray.tests.test_backends.ScipyFilePathTest testMethod=test_append_write>

    def test_append_write(self):
        # regression for GH1215
        data = create_test_data()

../../../miniconda/envs/test_env/lib/python3.6/contextlib.py:81: in __enter__
    return next(self.gen)
xarray/tests/test_backends.py:155: in roundtrip_append
    self.save(data[[key]], path, mode=mode, **save_kwargs)
xarray/tests/test_backends.py:162: in save
    **kwargs)
xarray/core/dataset.py:1131: in to_netcdf
    unlimited_dims=unlimited_dims)
xarray/backends/api.py:657: in to_netcdf
    unlimited_dims=unlimited_dims)
xarray/core/dataset.py:1068: in dump_to_store
    unlimited_dims=unlimited_dims)
xarray/backends/common.py:363: in store
    unlimited_dims=unlimited_dims)
xarray/backends/common.py:402: in set_variables
    self.writer.add(source, target)
xarray/backends/common.py:265: in add
    target[...] = source
xarray/backends/scipy_.py:61: in __setitem__
    data[key] = value

self = <scipy.io.netcdf.netcdf_variable object at 0x7fe3eb3ec6a0>
index = Ellipsis, data = array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])

    def __setitem__(self, index, data):
        if self.maskandscale:
            missing_value = (
                self._get_missing_value() or
                getattr(data, 'fill_value', 999999))
            self._attributes.setdefault('missing_value', missing_value)
            self._attributes.setdefault('_FillValue', missing_value)
            data = ((data - self._attributes.get('add_offset', 0.0)) /
                    self._attributes.get('scale_factor', 1.0))
            data = np.ma.asarray(data).filled(missing_value)
            if self._typecode not in 'fd' and data.dtype.kind == 'f':
                data = np.round(data)
```
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/2019/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 440233667 | MDU6SXNzdWU0NDAyMzM2Njc= | 2940 | test_rolling_wrapped_dask is failing with dask-master | shoyer 1217238 | closed | 0 | 5 | 2019-05-03T21:44:23Z | 2019-06-28T16:49:04Z | 2019-06-28T16:49:04Z | MEMBER | The I reproduced this locally. The source of this issue on the xarray side appears to be these lines: https://github.com/pydata/xarray/blob/dd99b7d7d8576eefcef4507ae9eb36a144b60adf/xarray/core/rolling.py#L287-L291 In particular, we are currently @fujiisoup @jhamman any idea what's going on here? |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/2940/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 293345254 | MDU6SXNzdWUyOTMzNDUyNTQ= | 1875 | roll doesn't handle periodic boundary conditions well | shoyer 1217238 | closed | 0 | 5 | 2018-01-31T23:07:42Z | 2018-08-15T08:11:29Z | 2018-08-15T08:11:29Z | MEMBER | DataArray.roll() currently rolls both data variables and coordinates.
This sort of makes sense, but the labels are now all non-monotonic, so you can't even plot the data with xarray. In my experience, you probably want coordinate labels that either look like:
It should be easier to accomplish this in xarray. I currently resort to using roll and manually fixing up coordinates after the fact. I'm actually not sure there are any use-cases for the current behavior. Choice (1) would have the virtue of being consistent with shift().
Note: you might argue that this is overly geoscience-specific, and it would be, if this were only for handling a longitude coordinate. But periodic boundary conditions are common in many areas of the physical sciences. |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/1875/reactions",
"total_count": 1,
"+1": 1,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
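For context, the contrast at issue, sketched with the roll_coords keyword that xarray later grew for exactly this reason (assuming a version that has it; at the time of the report, roll always rolled coordinates):

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(4), coords={'x': np.arange(4)}, dims='x')

# Original behavior: labels roll along with the data, becoming non-monotonic.
print(arr.roll(x=2, roll_coords=True).x.values)   # [2 3 0 1]

# Choice (1): roll the data only, keeping labels fixed, consistent with shift().
print(arr.roll(x=2, roll_coords=False).x.values)  # [0 1 2 3]
```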
| 323674056 | MDU6SXNzdWUzMjM2NzQwNTY= | 2137 | 0.10.4 release | shoyer 1217238 | closed | 0 | 5 | 2018-05-16T15:31:57Z | 2018-05-17T02:29:52Z | 2018-05-17T02:29:52Z | MEMBER | Our last release was April 13 (just over a month ago), and we've had a number of features land, so I'd like to issue this shortly. Ideally within the next few days, or maybe even later today. CC @pydata/xarray |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/2137/reactions",
"total_count": 3,
"+1": 3,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 305702311 | MDU6SXNzdWUzMDU3MDIzMTE= | 1993 | DataArray.rolling().mean() is way slower than it should be | shoyer 1217238 | closed | 0 | 5 | 2018-03-15T20:10:22Z | 2018-03-18T08:56:27Z | 2018-03-18T08:56:27Z | MEMBER | **Code Sample, a copy-pastable example if possible**

From @RayPalmerTech in https://github.com/kwgoodman/bottleneck/issues/186:

```python
import numpy as np
import pandas as pd
import time
import bottleneck as bn
import xarray
import matplotlib.pyplot as plt

N = 30000200     # Number of datapoints
Fs = 30000       # sample rate
T = 1/Fs         # sample period
duration = N/Fs  # duration in s
t = np.arange(0, duration, T)  # time vector
DATA = np.random.randn(N,) + 5*np.sin(2*np.pi*0.01*t)  # Example noisy sine data
w = 330000       # window size

def using_bottleneck_mean(data, width):
    return bn.move_mean(a=data, window=width, min_count=1)

def using_pandas_rolling_mean(data, width):
    return np.asarray(pd.DataFrame(data).rolling(window=width, center=True, min_periods=1).mean()).ravel()

def using_xarray_mean(data, width):
    return xarray.DataArray(data, dims='x').rolling(x=width, min_periods=1, center=True).mean()

start = time.time()
A = using_bottleneck_mean(DATA, w)
print('Bottleneck: ', time.time() - start, 's')

start = time.time()
B = using_pandas_rolling_mean(DATA, w)
print('Pandas: ', time.time() - start, 's')

start = time.time()
C = using_xarray_mean(DATA, w)
print('Xarray: ', time.time() - start, 's')
```

This results in:

Somehow xarray is way slower than pandas and bottleneck, even though it's using bottleneck under the hood!

**Problem description**

Profiling shows that the majority of time is spent in

Now we obtain:

The solution is to make setting up windows lazy (in

**Output of xr.show_versions()**
|
{
"url": "https://api.github.com/repos/pydata/xarray/issues/1993/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 180756013 | MDU6SXNzdWUxODA3NTYwMTM= | 1034 | test_conventions.TestEncodeCFVariable failing on master for Appveyor Python 2.7 build | shoyer 1217238 | closed | 0 | 5 | 2016-10-03T21:48:55Z | 2016-10-22T00:49:53Z | 2016-10-22T00:49:53Z | MEMBER | I have no idea what's going on here, but maybe somebody who knows Windows better has a guess:
```
================================== FAILURES ===================================
_____________ TestEncodeCFVariable.test_missing_fillvalue _____________

self = <xarray.test.test_conventions.TestEncodeCFVariable testMethod=test_missing_fillvalue>

    def test_missing_fillvalue(self):
        v = Variable(['x'], np.array([np.nan, 1, 2, 3]))
        v.encoding = {'dtype': 'int16'}
        with self.assertWarns('floating point data as an integer'):

C:\Python27-conda32\lib\contextlib.py:24: in __exit__
    self.gen.next()

self = <xarray.test.test_conventions.TestEncodeCFVariable testMethod=test_missing_fillvalue>
message = 'floating point data as an integer'

    @contextmanager
    def assertWarns(self, message):
        with warnings.catch_warnings(record=True) as w:
            warnings.filterwarnings('always', message)
            yield
            assert len(w) > 0
```
I could understand a warning failing to be raised, but the |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/1034/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue | ||||||
| 89866276 | MDU6SXNzdWU4OTg2NjI3Ng== | 439 | Display datetime64 arrays without showing local timezones | shoyer 1217238 | closed | 0 | 5 | 2015-06-21T05:13:58Z | 2016-04-21T15:43:27Z | 2016-04-21T15:43:27Z | MEMBER | NumPy has an unfortunate way of adding local timezone offsets when printing datetime64 arrays:
We should use custom formatting code to remove the local timezone (to encourage folks just to use naive timezones/UTC). |
{
"url": "https://api.github.com/repos/pydata/xarray/issues/439/reactions",
"total_count": 0,
"+1": 0,
"-1": 0,
"laugh": 0,
"hooray": 0,
"confused": 0,
"heart": 0,
"rocket": 0,
"eyes": 0
} |
completed | xarray 13221727 | issue |
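The kind of custom formatting suggested could look like this hypothetical helper (not xarray's actual implementation), which routes values through pandas to get naive ISO-8601 strings with no local offset attached:

```python
import numpy as np
import pandas as pd

def format_datetime64(values):
    # Render datetime64 values as naive ISO-8601 strings, without the
    # local-timezone suffix that older NumPy appended when printing.
    return [pd.Timestamp(v).isoformat() for v in np.asarray(values).ravel()]

print(format_datetime64(np.array(['2015-06-21T05:13:58'], dtype='datetime64[ns]')))
# ['2015-06-21T05:13:58']
```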
CREATE TABLE [issues] (
[id] INTEGER PRIMARY KEY,
[node_id] TEXT,
[number] INTEGER,
[title] TEXT,
[user] INTEGER REFERENCES [users]([id]),
[state] TEXT,
[locked] INTEGER,
[assignee] INTEGER REFERENCES [users]([id]),
[milestone] INTEGER REFERENCES [milestones]([id]),
[comments] INTEGER,
[created_at] TEXT,
[updated_at] TEXT,
[closed_at] TEXT,
[author_association] TEXT,
[active_lock_reason] TEXT,
[draft] INTEGER,
[pull_request] TEXT,
[body] TEXT,
[reactions] TEXT,
[performed_via_github_app] TEXT,
[state_reason] TEXT,
[repo] INTEGER REFERENCES [repos]([id]),
[type] TEXT
);
CREATE INDEX [idx_issues_repo]
ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
ON [issues] ([user]);
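For reference, a minimal sketch of reproducing this page's query against the schema above with Python's sqlite3 module (the database file name is an assumption):

```python
import sqlite3

conn = sqlite3.connect('github.db')  # assumed name of the database file
rows = conn.execute(
    """
    SELECT id, number, title, state, comments, updated_at
    FROM issues
    WHERE comments = 5 AND type = 'issue' AND [user] = 1217238
    ORDER BY updated_at DESC
    """
).fetchall()
for row in rows:
    print(row)
conn.close()
```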