github: issues: 2 rows where state = "closed" and user = 34276374 sorted by updated

2 rows where state = "closed" and user = 34276374 sorted by updated_at descending

id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at ▲	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
1197117301	I_kwDOAMm_X85HWo91	6456	Writing a a dataset to .zarr in a loop makes all the data NaNs	tbloch1 34276374	closed	0			11	2022-04-08T10:05:25Z	2023-10-14T20:30:49Z	2023-10-14T20:30:48Z	NONE				What happened? I have lots (61) of pickled pandas dataframes that I'm trying to convert from pickle/pandas to zarr/xarray. Since the dataframes are large (10000x2048), I can't load them all into memory. To get around this I'm (MCVE below) looping through the pickle files, reading them into dataframes, constructing DataArrays and then Datasets from the data, concatenating each dataset with the previous dataset, and updating the dataset variable to point to this new concatenated dataset. Since I didn't want to use up too much memory, I'm also periodically writing the Dataset to .zarr in the loop and reopening it (hoping to make use of dask storing data on disk?). When I do this, however, the final dataset ends up being all NaNs. What did you expect to happen? I expected the final dataset to contain all the concatenated data. 
Minimal Complete Verifiable Example
```Python
import pandas as pd
import numpy as np
import glob
import xarray as xr
from tqdm import tqdm

# Creating pkl files
[pd.DataFrame(np.random.randint(0,10, (1000,500))).astype(object).to_pickle('df{}.pkl'.format(i)) for i in range(4)]

fnames = glob.glob('*.pkl')

df = pd.read_pickle(fnames[0])
df.columns = np.arange(0,500).astype(object) # the real pkl files contain all objects
df.index = np.arange(0,1000).astype(object)
df = df.astype(np.float32)

ds = xr.DataArray(df.values, dims=['fname', 'res_dim'],
                  coords={'fname': df.index.values,
                          'res_dim': df.columns.values})
ds = ds.to_dataset(name='low_dim')

for idx, fname in enumerate(tqdm(fnames[1:])):
    df = pd.read_pickle(fname)
    df.columns = np.arange(0,500).astype(object)
    df.index = np.arange(0,1000).astype(object)
    df = df.astype(np.float32)

    ds2 = xr.DataArray(df.values, dims=['fname', 'res_dim'],
                       coords={'fname': df.index.values,
                               'res_dim': df.columns.values})
    ds2 = ds2.to_dataset(name='low_dim')

    ds = xr.concat([ds, ds2], dim='fname')
    ds['fname'] = ds.fname.astype(str)

    if (idx%2 == 0) & (idx !=0):
        ds.to_zarr('zarr_bug.zarr', mode='w')
        ds = xr.open_zarr('zarr_bug.zarr')

ds.to_zarr('zarr_bug.zarr', mode='w')
ds = xr.open_zarr('zarr_bug.zarr')

print(ds.low_dim.values)
```
Relevant log output
```Python
[[nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]]
```
Anything else we need to know? If I get rid of the loop saving, everything works normally. 
Environment INSTALLED VERSIONS commit: None python: 3.9.11 (main, Mar 28 2022, 10:10:35) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 5.11.0-27-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: None LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.0 libnetcdf: 4.7.4 xarray: 2022.3.0 pandas: 1.4.1 numpy: 1.21.0 scipy: 1.8.0 netCDF4: 1.5.8 pydap: installed h5netcdf: 1.0.0 h5py: 3.6.0 Nio: None zarr: 2.11.1 cftime: 1.6.0 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.10 cfgrib: 0.9.10.1 iris: None bottleneck: None dask: 2022.03.0 distributed: 2022.3.0 matplotlib: 3.5.1 cartopy: None seaborn: 0.11.2 numbagg: None fsspec: 2022.02.0 cupy: None pint: None sparse: None setuptools: 58.0.4 pip: 21.2.4 conda: None pytest: None IPython: 8.1.1 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6456/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		not_planned	xarray 13221727	issue
1674532233	I_kwDOAMm_X85jz1WJ	7767	Inconsistency between xr.where() and da.where()	tbloch1 34276374	closed	0			6	2023-04-19T09:30:02Z	2023-09-20T19:25:58Z	2023-09-20T19:25:58Z	NONE				What is your issue? `xr.where()` and `da.where()` behave in seemingly opposite ways. Example: `da = xr.DataArray(np.arange(10)); print(xr.where(da < 5, 0, da).values); print(da.where(da < 5, 0).values)` Output: `[0 0 0 0 0 5 6 7 8 9]` and `[0 1 2 3 4 0 0 0 0 0]`. It seems like these two methods with the same name should have the same functionality, but they give inverse results.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/7767/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed	xarray 13221727	issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
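
The filter described at the top of this page (state = "closed", user = 34276374, sorted by updated_at descending) can be reproduced against a table shaped like the schema above. The following is only a minimal sketch using Python's standard `sqlite3` module: the in-memory database, the trimmed column list, and the helper name `closed_issues_for_user` are assumptions made for illustration, and the two inserted rows copy only a few fields from the rows shown on this page.

```python
import sqlite3

# Assumption: an in-memory database with a trimmed subset of the [issues]
# columns; the real table has many more columns and foreign keys.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE [issues] ("
    " [id] INTEGER PRIMARY KEY, [number] INTEGER, [title] TEXT,"
    " [user] INTEGER, [state] TEXT, [updated_at] TEXT, [state_reason] TEXT)"
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

# A few fields copied from the two rows listed on this page.
rows = [
    (1674532233, 7767, "Inconsistency between xr.where() and da.where()",
     34276374, "closed", "2023-09-20T19:25:58Z", "completed"),
    (1197117301, 6456,
     "Writing a a dataset to .zarr in a loop makes all the data NaNs",
     34276374, "closed", "2023-10-14T20:30:49Z", "not_planned"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def closed_issues_for_user(connection, user_id):
    """Hypothetical helper: closed issues for one user, newest update first."""
    cursor = connection.execute(
        "SELECT [number], [updated_at], [state_reason] FROM [issues]"
        " WHERE [state] = ? AND [user] = ?"
        " ORDER BY [updated_at] DESC",
        ("closed", user_id),
    )
    return cursor.fetchall()


result = closed_issues_for_user(conn, 34276374)
for number, updated_at, state_reason in result:
    print(number, updated_at, state_reason)
```

Because `updated_at` is stored as ISO 8601 text in UTC, sorting the strings gives the same order as sorting the timestamps, so no date parsing is needed for the `ORDER BY`.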
