
issues


3 rows where user = 34276374 sorted by updated_at descending



state 2

  • closed 2
  • open 1

type 1

  • issue 3

repo 1

  • xarray 3
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1197117301 I_kwDOAMm_X85HWo91 6456 Writing a a dataset to .zarr in a loop makes all the data NaNs tbloch1 34276374 closed 0     11 2022-04-08T10:05:25Z 2023-10-14T20:30:49Z 2023-10-14T20:30:48Z NONE      

What happened?

I have many (61) pickled pandas dataframes that I'm trying to convert from pickle/pandas to zarr/xarray. Since the dataframes are large (10000x2048), I can't load them all into memory. To get around this (MCVE below), I loop through the pickle files, read each one into a dataframe, construct a DataArray and then a Dataset from the data, concatenate that Dataset with the previous one, and update the dataset variable to point to the concatenated result.

Since I didn't want to use up too much memory, I'm also periodically writing the Dataset to .zarr in the loop and reopening it (hoping to make use of dask storing data on disk?).

When I do this, however, the final dataset ends up being all NaNs.

What did you expect to happen?

I expected the final dataset to contain all the concatenated data.

Minimal Complete Verifiable Example

```Python
import pandas as pd
import numpy as np
import glob
import xarray as xr
from tqdm import tqdm

# Creating pkl files
[pd.DataFrame(np.random.randint(0, 10, (1000, 500))).astype(object).to_pickle('df{}.pkl'.format(i)) for i in range(4)]

fnames = glob.glob('*.pkl')

df = pd.read_pickle(fnames[0])
df.columns = np.arange(0, 500).astype(object)  # the real pkl files contain all objects
df.index = np.arange(0, 1000).astype(object)
df = df.astype(np.float32)

ds = xr.DataArray(df.values, dims=['fname', 'res_dim'],
                  coords={'fname': df.index.values, 'res_dim': df.columns.values})
ds = ds.to_dataset(name='low_dim')

for idx, fname in enumerate(tqdm(fnames[1:])):
    df = pd.read_pickle(fname)
    df.columns = np.arange(0, 500).astype(object)
    df.index = np.arange(0, 1000).astype(object)
    df = df.astype(np.float32)

    ds2 = xr.DataArray(df.values, dims=['fname', 'res_dim'],
                       coords={'fname': df.index.values, 'res_dim': df.columns.values})
    ds2 = ds2.to_dataset(name='low_dim')

    ds = xr.concat([ds, ds2], dim='fname')
    ds['fname'] = ds.fname.astype(str)
    if (idx % 2 == 0) & (idx != 0):
        ds.to_zarr('zarr_bug.zarr', mode='w')
        ds = xr.open_zarr('zarr_bug.zarr')

ds.to_zarr('zarr_bug.zarr', mode='w')
ds = xr.open_zarr('zarr_bug.zarr')

print(ds.low_dim.values)
```

Relevant log output

    [[nan nan nan ... nan nan nan]
     [nan nan nan ... nan nan nan]
     [nan nan nan ... nan nan nan]
     ...
     [nan nan nan ... nan nan nan]
     [nan nan nan ... nan nan nan]
     [nan nan nan ... nan nan nan]]

Anything else we need to know?

If I get rid of the loop saving, everything works normally.

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.11 (main, Mar 28 2022, 10:10:35) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.11.0-27-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.21.0
scipy: 1.8.0
netCDF4: 1.5.8
pydap: installed
h5netcdf: 1.0.0
h5py: 3.6.0
Nio: None
zarr: 2.11.1
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: 0.9.10.1
iris: None
bottleneck: None
dask: 2022.03.0
distributed: 2022.3.0
matplotlib: 3.5.1
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.02.0
cupy: None
pint: None
sparse: None
setuptools: 58.0.4
pip: 21.2.4
conda: None
pytest: None
IPython: 8.1.1
sphinx: None

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6456/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  not_planned xarray 13221727 issue
1674532233 I_kwDOAMm_X85jz1WJ 7767 Inconsistency between xr.where() and da.where() tbloch1 34276374 closed 0     6 2023-04-19T09:30:02Z 2023-09-20T19:25:58Z 2023-09-20T19:25:58Z NONE      

What is your issue?

xr.where() and da.where() behave in seemingly opposite ways.

Example:

    da = xr.DataArray(np.arange(10))
    print(xr.where(da < 5, 0, da).values)
    print(da.where(da < 5, 0).values)

    [0 0 0 0 0 5 6 7 8 9]
    [0 1 2 3 4 0 0 0 0 0]

It seems like these two methods with the same name should have the same functionality, but they give inverse results.
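The difference can be read from the argument order: `xr.where(cond, x, y)` selects `x` where `cond` is true and `y` elsewhere, while `da.where(cond, other)` keeps `da` where `cond` is true and substitutes `other` elsewhere. A minimal sketch, reusing the array from the example above (the variable names are illustrative):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(10))
cond = da < 5

# Function form: take 0 where cond is True, otherwise take the value from da.
picked = xr.where(cond, 0, da)

# Method form: keep da where cond is True, otherwise substitute 0.
kept = da.where(cond, 0)

# Negating the condition makes the method form agree with the function form.
kept_negated = da.where(~cond, 0)

print(picked.values)
print(kept.values)
print(bool((picked == kept_negated).all()))
```

Under this reading the two calls are not inverses of each other; they place the replacement value on opposite sides of the condition.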

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7767/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1318369110 I_kwDOAMm_X85OlLdW 6828 xarray.DataArray.str.cat() doesn't work on chunked data tbloch1 34276374 open 0     3 2022-07-26T14:58:16Z 2023-01-17T18:36:14Z   NONE      

What happened?

I was trying to concatenate some DataArrays of strings, and it kept just returning the first DataArray without any changes.

What did you expect to happen?

I was expecting it to return the strings concatenated together, with the separator between them.

Minimal Complete Verifiable Example

```Python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.zeros((2, 2)).astype(str),
    coords={'x': np.arange(2), 'y': np.arange(2)},
    dims=['x', 'y'])

dac = da.chunk()

# Compare in-memory, chunked, and chunked-then-computed results.
print((da == dac).values.all())
print((da.str.cat(da, sep='--') == dac.str.cat(dac, sep='--')).values.all())
print((da.str.cat(da, sep='--') == dac.compute().str.cat(dac.compute(), sep='--')).values.all())

True
False
True
```

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.11.0-27-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.0

xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.22.4
scipy: 1.8.1
netCDF4: 1.6.0
pydap: None
h5netcdf: 1.0.1
h5py: 3.6.0
Nio: None
zarr: 2.11.3
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2022.7.0
distributed: None
matplotlib: 3.5.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: None
sparse: None
setuptools: 62.3.2
pip: 22.1.2
conda: None
pytest: None
IPython: 8.3.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6828/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
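
The filter shown at the top of this page (rows where user = 34276374, sorted by updated_at descending) can be reproduced against a cut-down version of this schema. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; it keeps only five of the columns and loads the three rows listed on this page, so it is an illustration and not the query Datasette itself runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced copy of the [issues] table: only the columns needed for the filter.
conn.execute(
    """
    CREATE TABLE [issues] (
        [id] INTEGER PRIMARY KEY,
        [number] INTEGER,
        [user] INTEGER,
        [state] TEXT,
        [updated_at] TEXT
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

# The three rows shown on this page (id, number, user, state, updated_at).
rows = [
    (1197117301, 6456, 34276374, "closed", "2023-10-14T20:30:49Z"),
    (1674532233, 7767, 34276374, "closed", "2023-09-20T19:25:58Z"),
    (1318369110, 6828, 34276374, "open", "2023-01-17T18:36:14Z"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort chronologically as plain strings.
query = """
    SELECT [number], [state], [updated_at]
    FROM [issues]
    WHERE [user] = ?
    ORDER BY [updated_at] DESC
"""
result = conn.execute(query, (34276374,)).fetchall()
for row in result:
    print(row)
```

Because `updated_at` is stored as ISO 8601 text, ordering by the column as a string gives the same order as ordering by time.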
Powered by Datasette · Queries took 20.284ms · About: xarray-datasette