issues

4 rows where type = "issue" and user = 1797906 sorted by updated_at descending

issue #7005: Cannot re-index or align objects with conflicting indexes
opened by jamesstidard (1797906) · state: open · comments: 2 · created: 2022-09-07T16:22:46Z · updated: 2022-09-09T16:04:05Z · author_association: NONE · repo: xarray (13221727) · id: 1364911775 · node_id: I_kwDOAMm_X85RWuaf

What happened?

I'm looking to rename the values of the indices of an existing dataset, for both regular and multi-indexes, i.e. you might start with a dataset with an index [1, 2, 3] and want to rename those values to ["foo", "bar", "baz"].

I can rename a couple of coordinates using the function I've written, but renaming a second multi-index in the same xr.Dataset raises a ValueError that I am struggling to interpret:

cannot re-index or align objects with conflicting indexes
found for the following dimensions: 'x' (2 conflicting indexes)
Conflicting indexes may occur when
- they relate to different sets of coordinate and/or dimension names
- they don't have the same type
- they may be used to reindex data along common dimensions

What did you expect to happen?

I start with this xr.DataArray:

<xarray.DataArray (x: 6, y: 6, z: 3)>
array(...)
Coordinates:
  * x        (x) object MultiIndex
  * x_one    (x) object 'a' 'a' 'b' 'b' 'c' 'c'
  * x_two    (x) int64 0 1 0 1 0 1
  * y        (y) object MultiIndex
  * y_one    (y) object 'a' 'a' 'b' 'b' 'c' 'c'
  * y_two    (y) int64 0 1 0 1 0 1
  * z        (z) int64 0 1 2

And remap the z, x_one, and y_one values to:

<xarray.DataArray (x: 6, y: 6, z: 3)>
array(...)
Coordinates:
  * x        (x) object MultiIndex
  * x_one    (x) object 'aa' 'aa' 'bb' 'bb' 'cc' 'cc'
  * x_two    (x) int64 0 1 0 1 0 1
  * y        (y) object MultiIndex
  * y_one    (y) object 'aa' 'aa' 'bb' 'bb' 'cc' 'cc'
  * y_two    (y) int64 0 1 0 1 0 1
  * z        (z) <U4 'zero' 'one' 'two'

Minimal Complete Verifiable Example

```python
import numpy as np
import pandas as pd
import xarray as xr


def map_coords(ds, *, name, mapping):
    """
    Takes an xarray dataset's coordinate values and updates them
    with the provided mapping.

    Can handle both regular indices and multi-level indices.

    ds: the dataset
    name: name of the coordinate to update
    mapping: dictionary mapping old values to new values.
    """
    coord = ds.coords[name]
    old_values = coord.values.tolist()
    new_values = [mapping[v] for v in old_values]
    ds.coords[name] = xr.DataArray(new_values, coords=coord.coords)
    ds.coords[name].attrs = dict(coord.attrs)


midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("x_one", "x_two"))
midy = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("y_one", "y_two"))

mda = xr.DataArray(np.random.rand(6, 6, 3), [("x", midx), ("y", midy), ("z", range(3))])

map_coords(mda, name="z", mapping={0: "zero", 1: "one", 2: "two"})        # success
map_coords(mda, name="x_one", mapping={"a": "aa", "b": "bb", "c": "cc"})  # success
map_coords(mda, name="y_one", mapping={"a": "aa", "b": "bb", "c": "cc"})  # ValueError
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```python
Traceback (most recent call last):
  File "./main.py", line 30, in <module>
    map_coords(mda, name="y_one", mapping={"a": "aa", "b": "bb", "c": "cc"})
  File "./main.py", line 20, in map_coords
    ds.coords[name] = xr.DataArray(new_values, coords=coord.coords)
  File "./.venv/lib/python3.10/site-packages/xarray/core/coordinates.py", line 32, in __setitem__
    self.update({key: value})
  File "./.venv/lib/python3.10/site-packages/xarray/core/coordinates.py", line 162, in update
    coords, indexes = merge_coords(
  File "./.venv/lib/python3.10/site-packages/xarray/core/merge.py", line 561, in merge_coords
    aligned = deep_align(
  File "./.venv/lib/python3.10/site-packages/xarray/core/alignment.py", line 827, in deep_align
    aligned = align(
  File "./.venv/lib/python3.10/site-packages/xarray/core/alignment.py", line 764, in align
    aligner.align()
  File "./.venv/lib/python3.10/site-packages/xarray/core/alignment.py", line 550, in align
    self.assert_no_index_conflict()
  File "./.venv/lib/python3.10/site-packages/xarray/core/alignment.py", line 319, in assert_no_index_conflict
    raise ValueError(
ValueError: cannot re-index or align objects with conflicting indexes
found for the following dimensions: 'x' (2 conflicting indexes)
Conflicting indexes may occur when
- they relate to different sets of coordinate and/or dimension names
- they don't have the same type
- they may be used to reindex data along common dimensions
```

Anything else we need to know?

I may not be doing this remapping in the best way; this is just the easiest approach I've found, so part of the problem may be the approach itself. I'm open to alternative methods as well.
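For comparison, here is a sketch of one possible alternative (an editorial illustration, not from the issue thread; `remap_level` is a hypothetical helper): rebuild the dimension's backing pandas MultiIndex and assign it back in a single step, so xarray never ends up holding two conflicting indexes for the same dimension.

```python
import numpy as np
import pandas as pd
import xarray as xr


def remap_level(da, dim, level, mapping):
    # Hypothetical helper: rebuild the pandas MultiIndex behind `dim`
    # with one level's values remapped, then replace the whole index at once.
    old = da.indexes[dim]  # the pandas MultiIndex backing `dim`
    new_values = [mapping[v] for v in old.levels[old.names.index(level)]]
    # Assigning a pandas MultiIndex works in xarray 2022.6; newer releases
    # may prefer xr.Coordinates.from_pandas_multiindex for the same job.
    return da.assign_coords({dim: old.set_levels(new_values, level=level)})


midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("x_one", "x_two"))
midy = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("y_one", "y_two"))
mda = xr.DataArray(np.random.rand(6, 6, 3), [("x", midx), ("y", midy), ("z", range(3))])

mda = remap_level(mda, "x", "x_one", {"a": "aa", "b": "bb", "c": "cc"})
mda = remap_level(mda, "y", "y_one", {"a": "aa", "b": "bb", "c": "cc"})  # no ValueError
```

The difference from map_coords above is that the MultiIndex is replaced wholesale rather than one level coordinate at a time, which sidesteps the alignment check that raises the error.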

Thanks.

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.4 (main, Mar 28 2022, 15:33:01) [Clang 13.1.6 (clang-1316.0.21.2)]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None

xarray: 2022.6.0
pandas: 1.4.4
numpy: 1.23.2
scipy: 1.9.1
netCDF4: None
pydap: None
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.4.3
pip: 22.2.2
conda: None
pytest: None
IPython: None
sphinx: None
issue #1854: Drop coordinates on loading large dataset.
opened by jamesstidard (1797906) · state: closed (completed) · comments: 22 · created: 2018-01-24T19:35:46Z · updated: 2020-02-15T14:49:53Z · closed: 2020-02-15T14:49:53Z · author_association: NONE · repo: xarray (13221727) · id: 291332965 · node_id: MDU6SXNzdWUyOTEzMzI5NjU=

I've been struggling for quite a while to load a large dataset, so I thought it best to ask as I think I'm missing a trick. I've also looked through the existing issues; there are a fair few questions that seemed promising, but none quite solved this.

I have a number of *.nc files with variables across the coordinates latitude, longitude and time. Each file has the data for all the latitudes and longitudes of the world over some period of time, about two months.

The goal is to go through that data and get the full history of a single latitude/longitude coordinate, instead of the data for all latitudes and longitudes over short periods.

This is my current few lines of script:

```python
import numpy as np
import xarray as xr

# 127 is normally the size of the time dimension in each file
ds = xr.open_mfdataset('path/to/ncs/*.nc', chunks={'time': 127})
recs = ds.sel(latitude=10, longitude=10).to_dataframe().to_records()
np.savez('location.npz', recs)
```

However, this blows out the memory on my machine on the open_mfdataset call when I use the full dataset. I've tried a bunch of different ways of chunking the data (like {'latitude': 1, 'longitude': 1}) but have not been able to get past this stage.

I was wondering if there's a way to either determine a good chunk size, or to tell open_mfdataset to only keep values from the lat/lng coordinates I care about (the coords kwarg looked like it could've been it).
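One pattern that might help here (an editorial sketch, not from the thread; it relies on open_mfdataset's standard preprocess hook): reduce each file to the single location of interest as it is opened, so the combined dataset only ever holds that point's time series.

```python
import xarray as xr


def select_point(ds):
    # Keep only the one location we care about from each file,
    # before open_mfdataset concatenates the files along time.
    return ds.sel(latitude=10, longitude=10)


# preprocess is applied to each per-file dataset as it is opened,
# so the full lat/lon grid never has to fit in memory at once.
ds = xr.open_mfdataset('path/to/ncs/*.nc', preprocess=select_point)
recs = ds.to_dataframe().to_records()
```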

I'm using version 0.10.0 of xarray.

Would very much appreciate any help.

issue #1572: Modifying data set resulting in much larger file size
opened by jamesstidard (1797906) · state: closed (completed) · comments: 7 · created: 2017-09-13T14:24:06Z · updated: 2017-09-18T08:59:24Z · closed: 2017-09-13T17:12:28Z · author_association: NONE · repo: xarray (13221727) · id: 257400162 · node_id: MDU6SXNzdWUyNTc0MDAxNjI=

I'm loading a 130 MB .nc file and applying a where mask to it to remove a significant number of the floating-point values, replacing them with NaN. However, when I save the result, the file has grown to over 500 MB. If I load the original dataset and immediately save it, the file stays roughly the same size.

Here's how I'm applying the mask:

```python
import os
import xarray as xr

fp = 'ERA20c/swh_2010_01_05_05.nc'
ds = xr.open_dataset(fp)

ds = ds.where(ds.latitude > 50)

head, ext = os.path.splitext(fp)
xr.open_dataset(fp).to_netcdf('{}-duplicate{}'.format(head, ext))
ds.to_netcdf('{}-masked{}'.format(head, ext))
```

Is there a way to reduce the file size of the masked dataset? I'd expect it to be roughly the same size or smaller.
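One thing worth trying (an editorial sketch, not from the thread): where() promotes the data to a float dtype so NaN can be stored, and the masked copy may be written without the original file's compression, so explicitly requesting deflate compression in the encoding (zlib and complevel are standard netCDF4 backend options) can bring the size back down.

```python
import os
import xarray as xr

fp = 'ERA20c/swh_2010_01_05_05.nc'
ds = xr.open_dataset(fp)
masked = ds.where(ds.latitude > 50)

# Request deflate compression for every data variable on write;
# mostly-NaN arrays tend to compress very well.
encoding = {name: {'zlib': True, 'complevel': 4} for name in masked.data_vars}
head, ext = os.path.splitext(fp)
masked.to_netcdf('{}-masked-compressed{}'.format(head, ext), encoding=encoding)
```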

Thanks.

issue #1561: exit code 137 when using xarray.open_mfdataset
opened by jamesstidard (1797906) · state: closed (completed) · comments: 3 · created: 2017-09-07T16:31:50Z · updated: 2017-09-13T14:16:07Z · closed: 2017-09-13T14:16:06Z · author_association: NONE · repo: xarray (13221727) · id: 255997962 · node_id: MDU6SXNzdWUyNTU5OTc5NjI=

While using xarray.open_mfdataset I get an exit code 137 (SIGKILL, signal 9) killing my process. I do not get this while using a subset of the data, though, and I'm also providing a chunks argument.

Does anyone know what might be causing this? Could it be the computer completely running out of memory (RAM + swap + HDD)? I'm unsure what's causing it, as I get no stack trace, just the SIGKILL.
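For context (an editorial aside, not from the thread): exit codes above 128 conventionally mean 128 plus a signal number, so 137 is 128 + 9, i.e. the process was killed by SIGKILL, which on Linux is most often the kernel's out-of-memory killer rather than anything xarray raised. A minimal decoding sketch:

```python
import signal

exit_code = 137
if exit_code > 128:
    # POSIX shells report death-by-signal as 128 + signal number.
    sig = signal.Signals(exit_code - 128)
    print("killed by signal {} ({})".format(sig.value, sig.name))
    # -> killed by signal 9 (SIGKILL)
```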

Thanks.


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);