
issues


4 rows where user = 1882397 sorted by updated_at descending

1761: Importing xarray fails if old version of bottleneck is installed

id: 279456192 · node_id: MDU6SXNzdWUyNzk0NTYxOTI= · user: aseyboldt (1882397) · state: closed · locked: 0 · comments: 5 · created_at: 2017-12-05T17:10:25Z · updated_at: 2020-02-09T21:39:48Z · closed_at: 2020-02-09T21:39:48Z · author_association: NONE

Importing version 0.11 of xarray fails if version 1.0.0 of Bottleneck is installed. Bottleneck is an optional dependency of xarray: at runtime xarray replaces functions with their bottleneck versions if bottleneck is installed, but it does not check whether the installed version of bottleneck is new enough to provide those functions.

The getattr here fails with an AttributeError in this case:

https://github.com/pydata/xarray/blob/b46fcd656391d786b8d25b0615f6d4bd30b524b7/xarray/core/ops.py#L361-L365

```
AttributeError: 'module' object has no attribute 'move_argmax'
```

`move_argmax` was added to bottleneck in version 1.1.0, so this can't work if version 1.0 is installed.

I saw this on Python 2.7, but I don't think that should matter.
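For illustration, a minimal sketch of a version-guarded lookup; the helper name and the version comparison below are my own, not xarray's actual fix:

```python
# Hypothetical sketch: only use a bottleneck function if bottleneck is
# installed, new enough, and actually provides it.
try:
    import bottleneck as bn
except ImportError:
    bn = None

def get_bottleneck_func(name, min_version=(1, 1)):
    """Return bottleneck.<name>, or None if unavailable."""
    if bn is None:
        return None
    version = tuple(int(p) for p in bn.__version__.split('.')[:2])
    if version < min_version:
        return None
    return getattr(bn, name, None)  # None instead of AttributeError

move_argmax = get_bottleneck_func('move_argmax')  # None on bottleneck 1.0
```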

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1761/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
2496: Incorrect conversion from sliced pd.MultiIndex

id: 372006204 · node_id: MDU6SXNzdWUzNzIwMDYyMDQ= · user: aseyboldt (1882397) · state: closed · locked: 0 · comments: 2 · created_at: 2018-10-19T15:25:38Z · updated_at: 2019-02-19T09:42:52Z · closed_at: 2019-02-19T09:42:51Z · author_association: NONE

If we convert a pandas DataFrame with a MultiIndex after slicing it to remove some entries from the index, the converted DataArray still contains the removed items in its coordinates (although the values are NaN).

```python
import numpy as np
import pandas as pd
import xarray as xr

# We create an example dataframe
idx = pd.MultiIndex.from_product([list('abc'), list('xyz')])
df = pd.DataFrame(data={'col': np.random.randn(len(idx))}, index=idx)
df.columns.name = 'cols'
df.index.names = ['idx1', 'idx2']
df2 = df.loc[['a', 'b']]
```

`df2` does not contain `c` in the first level:

```
>>> df2
cols            col
idx1 idx2
a    x    -0.844476
     y    -0.845998
     z     1.965143
b    x    -0.159293
     y     0.188163
     z    -1.076204
```

It still shows up in the converted xarray though:

```
>>> xr.DataArray(df2).unstack('dim_0')
<xarray.DataArray (cols: 1, idx1: 3, idx2: 3)>
array([[[-0.844476, -0.845998,  1.965143],
        [-0.159293,  0.188163, -1.076204],
        [      nan,       nan,       nan]]])
Coordinates:
  * cols     (cols) object 'col'
  * idx1     (idx1) object 'a' 'b' 'c'
  * idx2     (idx2) object 'x' 'y' 'z'
```

If the original dataframe is very sparse, this can lead to gigantic unnecessary memory usage.
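A possible workaround (my suggestion, not from the issue itself) is to drop the unused levels from the MultiIndex before handing the frame to xarray:

```python
# pandas keeps unused levels around after .loc slicing; remove the stale
# 'c' entries from the index's levels before conversion.
df2.index = df2.index.remove_unused_levels()
xr.DataArray(df2).unstack('dim_0')  # idx1 now only contains 'a' and 'b'
```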

#### Output of ``xr.show_versions()``

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_GB.UTF-8
LANG: None
LOCALE: en_GB.UTF-8

xarray: 0.10.9
pandas: 0.23.4
numpy: 1.15.2
scipy: 1.1.0
netCDF4: 1.4.1
h5netcdf: 0.6.2
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.0b1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.19.2
distributed: 1.23.2
matplotlib: 3.0.0
cartopy: None
seaborn: 0.9.0
setuptools: 40.4.3
pip: 18.0
conda: 4.5.11
pytest: 3.8.1
IPython: 7.0.1
sphinx: 1.8.1
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2496/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
2389: Large pickle overhead in ds.to_netcdf() involving dask.delayed functions

id: 355264812 · node_id: MDU6SXNzdWUzNTUyNjQ4MTI= · user: aseyboldt (1882397) · state: closed · locked: 0 · comments: 11 · created_at: 2018-08-29T17:43:28Z · updated_at: 2019-01-13T21:17:12Z · closed_at: 2019-01-13T21:17:12Z · author_association: NONE

If we write a dask array that doesn't involve dask.delayed functions using ds.to_netcdf, there is only a little overhead from pickle:

```python
import dask
import dask.array as da
import numpy as np
import xarray as xr

vals = da.random.random(500, chunks=(1,))
ds = xr.Dataset({'vals': (['a'], vals)})
write = ds.to_netcdf('file2.nc', compute=False)
%prun -stime -l10 write.compute()
```

```
         123410 function calls (104395 primitive calls) in 13.720 seconds

   Ordered by: internal time
   List reduced from 203 to 10 due to restriction <10>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         8   10.032    1.254   10.032    1.254 {method 'acquire' of '_thread.lock' objects}
      1001    2.939    0.003    2.950    0.003 {built-in method _pickle.dumps}
      1001    0.614    0.001    3.569    0.004 pickle.py:30(dumps)
 6504/1002    0.012    0.000    0.021    0.000 utils.py:803(convert)
11507/1002    0.010    0.000    0.019    0.000 utils_comm.py:144(unpack_remotedata)
      6013    0.009    0.000    0.009    0.000 utils.py:767(tokey)
 3002/1002    0.008    0.000    0.017    0.000 utils_comm.py:181(<listcomp>)
     11512    0.007    0.000    0.008    0.000 core.py:26(istask)
      1002    0.006    0.000    3.589    0.004 worker.py:788(dumps_task)
         1    0.005    0.005    0.007    0.007 core.py:273(<dictcomp>)
```

But if we use results from dask.delayed, pickle takes up most of the time:

```python
@dask.delayed
def make_data():
    return np.array(np.random.randn())

vals = da.stack([da.from_delayed(make_data(), (), np.float64) for _ in range(500)])
ds = xr.Dataset({'vals': (['a'], vals)})
write = ds.to_netcdf('file5.nc', compute=False)
%prun -stime -l10 write.compute()
```

```
         115045243 function calls (104115443 primitive calls) in 67.240 seconds

   Ordered by: internal time
   List reduced from 292 to 10 due to restriction <10>

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
8120705/501   17.597    0.000   59.036    0.118 pickle.py:457(save)
2519027/501    7.581    0.000   59.032    0.118 pickle.py:723(save_tuple)
          4    6.978    1.745    6.978    1.745 {method 'acquire' of '_thread.lock' objects}
    3082150    5.362    0.000    8.748    0.000 pickle.py:413(memoize)
   11474396    4.516    0.000    5.970    0.000 pickle.py:213(write)
    8121206    4.186    0.000    5.202    0.000 pickle.py:200(commit_frame)
   13747943    2.703    0.000    2.703    0.000 {method 'get' of 'dict' objects}
   17057538    1.887    0.000    1.887    0.000 {built-in method builtins.id}
    4568116    1.772    0.000    1.782    0.000 {built-in method _struct.pack}
    2762513    1.613    0.000    2.826    0.000 pickle.py:448(get)
```

This additional pickle overhead does not happen if we compute the dataset without writing it to a file.

Output of `%prun -stime -l10 ds.compute()` without `dask.delayed`:

```
         83856 function calls (73348 primitive calls) in 0.566 seconds

   Ordered by: internal time
   List reduced from 259 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4    0.441    0.110    0.441    0.110 {method 'acquire' of '_thread.lock' objects}
      502    0.013    0.000    0.013    0.000 {method 'send' of '_socket.socket' objects}
      500    0.011    0.000    0.011    0.000 {built-in method _pickle.dumps}
     1000    0.007    0.000    0.008    0.000 core.py:159(get_dependencies)
     3500    0.007    0.000    0.007    0.000 utils.py:767(tokey)
 3000/500    0.006    0.000    0.010    0.000 utils.py:803(convert)
      500    0.005    0.000    0.019    0.000 pickle.py:30(dumps)
        1    0.004    0.004    0.008    0.008 core.py:3826(concatenate3)
 4500/500    0.004    0.000    0.008    0.000 utils_comm.py:144(unpack_remotedata)
        1    0.004    0.004    0.017    0.017 order.py:83(order)
```

With `dask.delayed`:

```
         149376 function calls (139868 primitive calls) in 1.738 seconds

   Ordered by: internal time
   List reduced from 264 to 10 due to restriction <10>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         4    1.568    0.392    1.568    0.392 {method 'acquire' of '_thread.lock' objects}
         1    0.015    0.015    0.038    0.038 optimization.py:455(fuse)
       502    0.012    0.000    0.012    0.000 {method 'send' of '_socket.socket' objects}
      6500    0.010    0.000    0.010    0.000 utils.py:767(tokey)
 5500/1000    0.009    0.000    0.012    0.000 utils_comm.py:144(unpack_remotedata)
      2500    0.008    0.000    0.009    0.000 core.py:159(get_dependencies)
       500    0.007    0.000    0.009    0.000 client.py:142(__init__)
      1000    0.005    0.000    0.008    0.000 core.py:280(subs)
 2000/1000    0.005    0.000    0.008    0.000 utils.py:803(convert)
         1    0.004    0.004    0.022    0.022 order.py:83(order)
```

I am using dask.distributed. I haven't tested it with anything else.
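One way to sidestep the overhead (my sketch, not a fix from this thread) is to materialize the delayed-backed array before writing, so the graph shipped by `to_netcdf()` no longer embeds the pickled delayed closures:

```python
# With dask.distributed, persist() runs the delayed tasks on the cluster
# and replaces them with futures; the subsequent write graph only
# references those futures instead of re-pickling the closures.
vals = vals.persist()
ds = xr.Dataset({'vals': (['a'], vals)})
ds.to_netcdf('file5.nc')
```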

Software versions

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_GB.UTF-8
LANG: None
LOCALE: en_GB.UTF-8

xarray: 0.10.8
pandas: 0.23.4
numpy: 1.15.1
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: 0.6.2
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.2
distributed: 1.22.1
matplotlib: 2.2.2
cartopy: None
seaborn: 0.9.0
setuptools: 40.2.0
pip: 18.0
conda: 4.5.11
pytest: 3.7.3
IPython: 6.5.0
sphinx: 1.7.7
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2389/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
2299: Confusing behaviour with MultiIndex

id: 342426261 · node_id: MDU6SXNzdWUzNDI0MjYyNjE= · user: aseyboldt (1882397) · state: closed · locked: 0 · assignee: fujiisoup (6815844) · comments: 1 · created_at: 2018-07-18T17:41:12Z · updated_at: 2018-08-13T22:16:31Z · closed_at: 2018-08-13T22:16:31Z · author_association: NONE

Dataset allows assignment of new variables with dimension names that are used in a MultiIndex, even if the lengths do not match the existing coordinate.

```python
import pandas as pd
import xarray as xr

a = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).unstack('a')
a.index.names = ['dim0', 'dim1']
a.index.name = 'stacked_dim'

b = xr.Dataset(coords={'dim0': ['a', 'b'], 'dim1': [0, 1]})
b = b.stack(dim_stacked=['dim0', 'dim1'])
assert len(b.dim0) == 4

# This should raise an error because the length is != 4
b['c'] = (('dim0',), [10, 11])
b
```

Instead, it reports `dim0` as a new dimension without coordinates:

```
<xarray.Dataset>
Dimensions:      (dim0: 2, dim_stacked: 4)
Coordinates:
  * dim_stacked  (dim_stacked) MultiIndex
  - dim0         (dim_stacked) object 'a' 'a' 'b' 'b'
  - dim1         (dim_stacked) int64 0 1 0 1
Dimensions without coordinates: dim0
Data variables:
    c            (dim0) int64 10 11
```

Similar cases of coordinates that aren't used do raise an error:

```python
ds = xr.Dataset()
ds.coords['a'] = [1, 2, 3]
ds = ds.sel(a=1)
ds['b'] = (('a',), [1, 2])  # raises, as expected
ds
```
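The check the reporter expects would compare the new variable's length with the size of the existing `dim0` level; a hypothetical sketch of that guard (not xarray's actual validation code):

```python
# Hypothetical guard: 'dim0' is a MultiIndex level of length 4 here, so
# assigning a length-2 variable along it should be rejected.
new_values = [10, 11]
expected = b.indexes['dim_stacked'].get_level_values('dim0').size  # 4
if len(new_values) != expected:
    raise ValueError(
        f"conflicting sizes for dimension 'dim0': {len(new_values)} vs {expected}"
    )
```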

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_GB.UTF-8
LANG: None
LOCALE: en_GB.UTF-8

xarray: 0.10.7
pandas: 0.23.2
numpy: 1.14.5
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.1
distributed: 1.22.0
matplotlib: 2.2.2
cartopy: None
seaborn: 0.8.1
setuptools: 39.2.0
pip: 10.0.1
conda: 4.5.8
pytest: 3.6.2
IPython: 6.4.0
sphinx: 1.7.5
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2299/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);