
issues


4 rows where user = 1882397 sorted by updated_at descending

1761: Importing xarray fails if old version of bottleneck is installed

id: 279456192 · node_id: MDU6SXNzdWUyNzk0NTYxOTI= · user: aseyboldt (1882397) · state: closed · locked: 0 · comments: 5 · created_at: 2017-12-05T17:10:25Z · updated_at: 2020-02-09T21:39:48Z · closed_at: 2020-02-09T21:39:48Z · author_association: NONE

Importing version 0.11 of xarray fails if version 1.0.0 of Bottleneck is installed. Bottleneck is an optional dependency of xarray: at runtime xarray replaces functions with their bottleneck versions if bottleneck is installed, but it does not check whether the installed version of bottleneck is new enough to provide those functions.

The getattr here fails with an AttributeError in this case:

https://github.com/pydata/xarray/blob/b46fcd656391d786b8d25b0615f6d4bd30b524b7/xarray/core/ops.py#L361-L365

```
AttributeError: 'module' object has no attribute 'move_argmax'
```

`move_argmax` was added to bottleneck in version 1.1.0, so this can't work if version 1.0 is installed.

I saw this on Python 2.7, but I don't think that should matter.
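For illustration, a minimal sketch of a version-guarded lookup; the helper name and the version comparison below are my own, not xarray's actual fix:

```python
# Hypothetical sketch: only use a bottleneck function if bottleneck is
# installed, new enough, and actually provides it.
try:
    import bottleneck as bn
except ImportError:
    bn = None

def get_bottleneck_func(name, min_version=(1, 1)):
    """Return bottleneck.<name>, or None if unavailable."""
    if bn is None:
        return None
    version = tuple(int(p) for p in bn.__version__.split('.')[:2])
    if version < min_version:
        return None
    return getattr(bn, name, None)  # None instead of AttributeError

move_argmax = get_bottleneck_func('move_argmax')  # None on bottleneck 1.0
```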

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1761/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
2496: Incorrect conversion from sliced pd.MultiIndex

id: 372006204 · node_id: MDU6SXNzdWUzNzIwMDYyMDQ= · user: aseyboldt (1882397) · state: closed · locked: 0 · comments: 2 · created_at: 2018-10-19T15:25:38Z · updated_at: 2019-02-19T09:42:52Z · closed_at: 2019-02-19T09:42:51Z · author_association: NONE

If we convert a pandas DataFrame with a MultiIndex after slicing it to remove some entries from the index, the converted DataArray still contains the removed items in its coordinates (although the values are NaN).

```python
import numpy as np
import pandas as pd
import xarray as xr

# We create an example dataframe
idx = pd.MultiIndex.from_product([list('abc'), list('xyz')])
df = pd.DataFrame(data={'col': np.random.randn(len(idx))}, index=idx)
df.columns.name = 'cols'
df.index.names = ['idx1', 'idx2']
df2 = df.loc[['a', 'b']]
```

`df2` does not contain `c` in the first level:

```
>>> df2
cols            col
idx1 idx2
a    x    -0.844476
     y    -0.845998
     z     1.965143
b    x    -0.159293
     y     0.188163
     z    -1.076204
```

It still shows up in the converted xarray though:

```
>>> xr.DataArray(df2).unstack('dim_0')
<xarray.DataArray (cols: 1, idx1: 3, idx2: 3)>
array([[[-0.844476, -0.845998,  1.965143],
        [-0.159293,  0.188163, -1.076204],
        [      nan,       nan,       nan]]])
Coordinates:
  * cols     (cols) object 'col'
  * idx1     (idx1) object 'a' 'b' 'c'
  * idx2     (idx2) object 'x' 'y' 'z'
```

If the original dataframe is very sparse, this can lead to gigantic unnecessary memory usage.
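A possible workaround (my suggestion, not from the issue itself) is to drop the unused levels from the MultiIndex before handing the frame to xarray:

```python
# pandas keeps unused levels around after .loc slicing; remove the stale
# 'c' entries from the index's levels before conversion.
df2.index = df2.index.remove_unused_levels()
xr.DataArray(df2).unstack('dim_0')  # idx1 now only contains 'a' and 'b'
```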

#### Output of ``xr.show_versions()``

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_GB.UTF-8
LANG: None
LOCALE: en_GB.UTF-8

xarray: 0.10.9
pandas: 0.23.4
numpy: 1.15.2
scipy: 1.1.0
netCDF4: 1.4.1
h5netcdf: 0.6.2
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.0b1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.19.2
distributed: 1.23.2
matplotlib: 3.0.0
cartopy: None
seaborn: 0.9.0
setuptools: 40.4.3
pip: 18.0
conda: 4.5.11
pytest: 3.8.1
IPython: 7.0.1
sphinx: 1.8.1
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2496/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
2389: Large pickle overhead in ds.to_netcdf() involving dask.delayed functions

id: 355264812 · node_id: MDU6SXNzdWUzNTUyNjQ4MTI= · user: aseyboldt (1882397) · state: closed · locked: 0 · comments: 11 · created_at: 2018-08-29T17:43:28Z · updated_at: 2019-01-13T21:17:12Z · closed_at: 2019-01-13T21:17:12Z · author_association: NONE

If we write a dask array that doesn't involve dask.delayed functions using ds.to_netcdf, there is only a little overhead from pickle:

```python
import dask
import dask.array as da
import numpy as np
import xarray as xr

vals = da.random.random(500, chunks=(1,))
ds = xr.Dataset({'vals': (['a'], vals)})
write = ds.to_netcdf('file2.nc', compute=False)
%prun -stime -l10 write.compute()
```

```
         123410 function calls (104395 primitive calls) in 13.720 seconds

   Ordered by: internal time
   List reduced from 203 to 10 due to restriction <10>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         8   10.032    1.254   10.032    1.254 {method 'acquire' of '_thread.lock' objects}
      1001    2.939    0.003    2.950    0.003 {built-in method _pickle.dumps}
      1001    0.614    0.001    3.569    0.004 pickle.py:30(dumps)
 6504/1002    0.012    0.000    0.021    0.000 utils.py:803(convert)
11507/1002    0.010    0.000    0.019    0.000 utils_comm.py:144(unpack_remotedata)
      6013    0.009    0.000    0.009    0.000 utils.py:767(tokey)
 3002/1002    0.008    0.000    0.017    0.000 utils_comm.py:181(<listcomp>)
     11512    0.007    0.000    0.008    0.000 core.py:26(istask)
      1002    0.006    0.000    3.589    0.004 worker.py:788(dumps_task)
         1    0.005    0.005    0.007    0.007 core.py:273(<dictcomp>)
```

But if we use results from dask.delayed, pickle takes up most of the time:

```python
@dask.delayed
def make_data():
    return np.array(np.random.randn())

vals = da.stack([da.from_delayed(make_data(), (), np.float64) for _ in range(500)])
ds = xr.Dataset({'vals': (['a'], vals)})
write = ds.to_netcdf('file5.nc', compute=False)
%prun -stime -l10 write.compute()
```

```
         115045243 function calls (104115443 primitive calls) in 67.240 seconds

   Ordered by: internal time
   List reduced from 292 to 10 due to restriction <10>

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
8120705/501   17.597    0.000   59.036    0.118 pickle.py:457(save)
2519027/501    7.581    0.000   59.032    0.118 pickle.py:723(save_tuple)
          4    6.978    1.745    6.978    1.745 {method 'acquire' of '_thread.lock' objects}
    3082150    5.362    0.000    8.748    0.000 pickle.py:413(memoize)
   11474396    4.516    0.000    5.970    0.000 pickle.py:213(write)
    8121206    4.186    0.000    5.202    0.000 pickle.py:200(commit_frame)
   13747943    2.703    0.000    2.703    0.000 {method 'get' of 'dict' objects}
   17057538    1.887    0.000    1.887    0.000 {built-in method builtins.id}
    4568116    1.772    0.000    1.782    0.000 {built-in method _struct.pack}
    2762513    1.613    0.000    2.826    0.000 pickle.py:448(get)
```

This additional pickle overhead does not happen if we compute the dataset without writing it to a file.

Output of `%prun -stime -l10 ds.compute()` without `dask.delayed`:

```
         83856 function calls (73348 primitive calls) in 0.566 seconds

   Ordered by: internal time
   List reduced from 259 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4    0.441    0.110    0.441    0.110 {method 'acquire' of '_thread.lock' objects}
      502    0.013    0.000    0.013    0.000 {method 'send' of '_socket.socket' objects}
      500    0.011    0.000    0.011    0.000 {built-in method _pickle.dumps}
     1000    0.007    0.000    0.008    0.000 core.py:159(get_dependencies)
     3500    0.007    0.000    0.007    0.000 utils.py:767(tokey)
 3000/500    0.006    0.000    0.010    0.000 utils.py:803(convert)
      500    0.005    0.000    0.019    0.000 pickle.py:30(dumps)
        1    0.004    0.004    0.008    0.008 core.py:3826(concatenate3)
 4500/500    0.004    0.000    0.008    0.000 utils_comm.py:144(unpack_remotedata)
        1    0.004    0.004    0.017    0.017 order.py:83(order)
```

With `dask.delayed`:

```
         149376 function calls (139868 primitive calls) in 1.738 seconds

   Ordered by: internal time
   List reduced from 264 to 10 due to restriction <10>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         4    1.568    0.392    1.568    0.392 {method 'acquire' of '_thread.lock' objects}
         1    0.015    0.015    0.038    0.038 optimization.py:455(fuse)
       502    0.012    0.000    0.012    0.000 {method 'send' of '_socket.socket' objects}
      6500    0.010    0.000    0.010    0.000 utils.py:767(tokey)
 5500/1000    0.009    0.000    0.012    0.000 utils_comm.py:144(unpack_remotedata)
      2500    0.008    0.000    0.009    0.000 core.py:159(get_dependencies)
       500    0.007    0.000    0.009    0.000 client.py:142(__init__)
      1000    0.005    0.000    0.008    0.000 core.py:280(subs)
 2000/1000    0.005    0.000    0.008    0.000 utils.py:803(convert)
         1    0.004    0.004    0.022    0.022 order.py:83(order)
```

I am using dask.distributed. I haven't tested it with anything else.
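One way to sidestep the overhead (my sketch, not a fix from this thread) is to materialize the delayed-backed array before writing, so the graph shipped by `to_netcdf()` no longer embeds the pickled delayed closures:

```python
# With dask.distributed, persist() runs the delayed tasks on the cluster
# and replaces them with futures; the subsequent write graph only
# references those futures instead of re-pickling the closures.
vals = vals.persist()
ds = xr.Dataset({'vals': (['a'], vals)})
ds.to_netcdf('file5.nc')
```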

Software versions

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_GB.UTF-8
LANG: None
LOCALE: en_GB.UTF-8

xarray: 0.10.8
pandas: 0.23.4
numpy: 1.15.1
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: 0.6.2
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.2
distributed: 1.22.1
matplotlib: 2.2.2
cartopy: None
seaborn: 0.9.0
setuptools: 40.2.0
pip: 18.0
conda: 4.5.11
pytest: 3.7.3
IPython: 6.5.0
sphinx: 1.7.7
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2389/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
2299: Confusing behaviour with MultiIndex

id: 342426261 · node_id: MDU6SXNzdWUzNDI0MjYyNjE= · user: aseyboldt (1882397) · state: closed · locked: 0 · assignee: fujiisoup (6815844) · comments: 1 · created_at: 2018-07-18T17:41:12Z · updated_at: 2018-08-13T22:16:31Z · closed_at: 2018-08-13T22:16:31Z · author_association: NONE

Dataset allows assignment of new variables with dimension names that are used in a MultiIndex, even if the lengths do not match the existing coordinate.

```python
import pandas as pd
import xarray as xr

a = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).unstack('a')
a.index.names = ['dim0', 'dim1']
a.index.name = 'stacked_dim'

b = xr.Dataset(coords={'dim0': ['a', 'b'], 'dim1': [0, 1]})
b = b.stack(dim_stacked=['dim0', 'dim1'])
assert len(b.dim0) == 4

# This should raise an error because the length is != 4
b['c'] = (('dim0',), [10, 11])
b
```

Instead, it reports `dim0` as a new dimension without coordinates:

```
<xarray.Dataset>
Dimensions:      (dim0: 2, dim_stacked: 4)
Coordinates:
  * dim_stacked  (dim_stacked) MultiIndex
  - dim0         (dim_stacked) object 'a' 'a' 'b' 'b'
  - dim1         (dim_stacked) int64 0 1 0 1
Dimensions without coordinates: dim0
Data variables:
    c            (dim0) int64 10 11
```

Similar cases of coordinates that aren't used do raise an error:

```python
ds = xr.Dataset()
ds.coords['a'] = [1, 2, 3]
ds = ds.sel(a=1)
ds['b'] = (('a',), [1, 2])  # raises, as expected
ds
```
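The check the reporter expects would compare the new variable's length with the size of the existing `dim0` level; a hypothetical sketch of that guard (not xarray's actual validation code):

```python
# Hypothetical guard: 'dim0' is a MultiIndex level of length 4 here, so
# assigning a length-2 variable along it should be rejected.
new_values = [10, 11]
expected = b.indexes['dim_stacked'].get_level_values('dim0').size  # 4
if len(new_values) != expected:
    raise ValueError(
        f"conflicting sizes for dimension 'dim0': {len(new_values)} vs {expected}"
    )
```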

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_GB.UTF-8
LANG: None
LOCALE: en_GB.UTF-8

xarray: 0.10.7
pandas: 0.23.2
numpy: 1.14.5
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.1
distributed: 1.22.0
matplotlib: 2.2.2
cartopy: None
seaborn: 0.8.1
setuptools: 39.2.0
pip: 10.0.1
conda: 4.5.8
pytest: 3.6.2
IPython: 6.4.0
sphinx: 1.7.5
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2299/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);