
issues


6 rows where repo = 13221727 and "updated_at" is on date 2022-05-01 sorted by updated_at descending


id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1222215528 I_kwDOAMm_X85I2Ydo 6555 sortby with ascending=False should create an index headtr1ck 43316012 closed 0     4 2022-05-01T16:57:51Z 2022-05-01T22:17:50Z 2022-05-01T22:17:50Z COLLABORATOR      

Is your feature request related to a problem?

When using sortby with ascending=False on a DataArray/Dataset without an explicit index, the data is correctly reversed, but it is no longer possible to tell which ordering the data has.

If an explicit index (like [0, 1, 2]) exists, it gets correctly reordered, which allows correct alignment.

Describe the solution you'd like

For consistency with alignment, xarray should create a new index that indicates that the data has been reordered, i.e. [2, 1, 0].

The only downside: this will break code that relies on the index not existing.
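A minimal sketch of the two situations described above (values and names are illustrative, not from the issue):

```python
import xarray as xr

# With an explicit index, sortby reorders the index, so the new
# ordering stays visible and alignment keeps working:
da = xr.DataArray([10, 20, 30], dims="x", coords={"x": [0, 1, 2]})
da.sortby("x", ascending=False)
# <xarray.DataArray (x: 3)>
# array([30, 20, 10])
# Coordinates:
#   * x        (x) int64 2 1 0

# Without an index on "x", sorting by another 1D variable still
# reverses the data, but nothing records the new ordering:
da2 = xr.DataArray([10, 20, 30], dims="x")
key = xr.DataArray([0, 1, 2], dims="x")
da2.sortby(key, ascending=False)  # values reversed; "x" still has no index
```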

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6555/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1216982208 PR_kwDOAMm_X8422vsw 6522 Update issue template to include a checklist max-sixty 5635139 closed 0     4 2022-04-27T08:19:49Z 2022-05-01T22:14:35Z 2022-05-01T22:14:32Z MEMBER   0 pydata/xarray/pulls/6522

This replaces https://github.com/pydata/xarray/pull/5787.

Please check out the previews in the most recent comment there.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6522/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
376370028 MDU6SXNzdWUzNzYzNzAwMjg= 2534 to_dataframe() excessive memory usage guygriffiths 1665346 closed 0     3 2018-11-01T12:20:39Z 2022-05-01T22:04:51Z 2022-05-01T22:04:43Z NONE      

Code Sample, a copy-pastable example if possible

```python
import xarray as xr
from glob import glob

# This refers to a large multi-file NetCDF dataset
file_list = sorted(glob('~/Data///*.nc'))

dataset = xr.open_mfdataset(file_list, decode_times=True, autoclose=True,
                            decode_cf=True, cache=False, concat_dim='time')

# At this point, the total RAM used by the python process is ~1.4G

# Select a timeseries at a single point.
# This is near instantaneous and uses no additional memory.
ts = dataset.sel({'lat': 10, 'lon': 10}, method='nearest')

# Convert that timeseries to a pandas dataframe.
# This is where the actual data reading happens, and it reads the data into memory.
df = ts.to_dataframe()

# At this point, the total RAM used by the python process is ~10.5G
```

Problem description

Despite the fact that the resulting dataframe only contains a single lat/lon point's worth of data, a huge amount of RAM is used. I can get (what appears to be) an identical pandas DataFrame by changing the final line to `df = (ts * 1.0).to_dataframe()`, which reduces the total RAM used to ~2.2G (i.e. 0.6G additional RAM for that single line vs 9G additional RAM). No type conversion is taking place (ts and ts * 1.0 have identical data types).
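A possible alternative sketch (my addition, not part of the original report): compute the lazy selection into memory first, so that to_dataframe() only ever sees the small result. Whether this avoids the overhead depends on the xarray/dask versions involved.

```python
# Force the selected point to be computed before conversion;
# to_dataframe() then operates on small in-memory arrays.
df = ts.compute().to_dataframe()
```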

Expected Output

I would expect to_dataframe() to require the same amount of memory whether or not the data was first multiplied by 1.0. I'm aware there could be a good reason for this, but it took me somewhat by surprise.

Output of xr.show_versions()

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
xarray: 0.10.7
pandas: 0.23.1
numpy: 1.13.3
scipy: 0.17.0
netCDF4: 1.4.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.19.0
distributed: None
matplotlib: 1.5.1
cartopy: None
seaborn: 0.8.1
setuptools: 20.7.0
pip: 18.0
conda: None
pytest: None
IPython: 2.4.1
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2534/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
822987300 MDU6SXNzdWU4MjI5ODczMDA= 5001 .min() doesn't work on np.datetime64 with a chunked Dataset ludwigVonKoopa 49512274 open 0     2 2021-03-05T11:12:19Z 2022-05-01T16:11:48Z   NONE      

Hi all,

If an xr.Dataset is chunked, I cannot call ds.time.min(); I get the error: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]'). I don't know if this is expected. Moreover, ds2.time.mean() works.

Thanks

What happened:

A UFuncTypeError was raised: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')

What you expected to happen:

compute the min & max on a chunked datetime64 xarray.DataArray

Minimal Complete Verifiable Example:

```python
import xarray as xr
import numpy as np

obs = 200
t0 = np.datetime64("2010-01-01T00:00:00")
tn = t0 + np.timedelta64(123 * 4, "D")

ds2 = xr.Dataset(
    {"time": (["obs"], np.arange(t0, tn, (tn - t0) / obs))},
    coords={"obs": (["obs"], np.arange(obs))},
).chunk({"obs": 100})

ds2.time.min()
```

Anything else we need to know?:

ds2.time.mean() works; max and min raise an exception.
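Two possible workaround sketches (my addition, not from the issue), both sidestepping the chunked datetime64 reduction path:

```python
# Option 1: load the time coordinate into memory, then reduce there.
tmin = ds2.time.compute().min()

# Option 2: reduce the underlying int64 nanoseconds, then cast back.
tmin = ds2.time.astype("int64").min().astype("datetime64[ns]")
```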

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-133-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.16.2
pandas: 1.2.1
numpy: 1.19.5
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.6.1
cftime: 1.3.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.01.1
distributed: 2021.01.1
matplotlib: 3.3.4
cartopy: None
seaborn: None
numbagg: None
pint: 0.16.1
setuptools: 52.0.0.post20210125
pip: 20.3.3
conda: None
pytest: 6.2.2
IPython: 7.20.0
sphinx: 3.5.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5001/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1221885425 I_kwDOAMm_X85I1H3x 6549 Improved Dataset broadcasting headtr1ck 43316012 open 0     3 2022-04-30T17:51:37Z 2022-05-01T14:37:43Z   COLLABORATOR      

Is your feature request related to a problem?

I am a bit puzzled about how xarray broadcasts Datasets. It seems to always add all dimensions to all variables. Is this what you want in general?

See this example:

```python
import xarray as xr

da = xr.DataArray([[1, 2, 3]], dims=("x", "y"))
# <xarray.DataArray (x: 1, y: 3)>
# array([[1, 2, 3]])

ds = xr.Dataset({"a": ("x", [1]), "b": ("z", [2, 3])})
# <xarray.Dataset>
# Dimensions:  (x: 1, z: 2)
# Dimensions without coordinates: x, z
# Data variables:
#     a        (x) int32 1
#     b        (z) int32 2 3

ds.broadcast_like(da)
# returns:
# <xarray.Dataset>
# Dimensions:  (x: 1, y: 3, z: 2)
# Dimensions without coordinates: x, y, z
# Data variables:
#     a        (x, y, z) int32 1 1 1 1 1 1
#     b        (x, y, z) int32 2 3 2 3 2 3
#
# I think it should return:
# <xarray.Dataset>
# Dimensions:  (x: 1, y: 3, z: 2)
# Dimensions without coordinates: x, y, z
# Data variables:
#     a        (x, y) int32 1 1 1  # notice: without the "z" dim
#     b        (x, y, z) int32 2 3 2 3 2 3
```

Describe the solution you'd like

I would like broadcasting to behave the same way as, e.g., a simple addition. In the example above, da + ds produces the dimensions that I want.

Describe alternatives you've considered

ds + xr.zeros_like(da) works, but seems more like a "dirty hack".

Additional context

Maybe one can add an option to broadcasting that controls this behavior?
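For illustration, a hedged sketch of the per-variable behavior requested above (the helper name is hypothetical, and the resulting dimension order may differ from the printed example):

```python
import xarray as xr

def broadcast_like_per_variable(ds: xr.Dataset, other: xr.DataArray) -> xr.Dataset:
    # Broadcast each data variable against `other` individually, so
    # variables never pick up dimensions from their siblings.
    return ds.map(lambda v: v.broadcast_like(other))

da = xr.DataArray([[1, 2, 3]], dims=("x", "y"))
ds = xr.Dataset({"a": ("x", [1]), "b": ("z", [2, 3])})
broadcast_like_per_variable(ds, da)
# "a" gains only (x, y); "b" keeps "z" and gains (x, y).
```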

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6549/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
264321376 MDU6SXNzdWUyNjQzMjEzNzY= 1621 Undesired decoding to timedelta64 (was: units of "seconds" translated to time coordinate) pacioos 4701070 open 0     16 2017-10-10T17:58:45Z 2022-05-01T08:49:43Z   NONE      

When using open_dataset(), xarray translates data variables with units of "seconds" into time values, for example measurements of wave period. I don't believe xarray should treat variables as times unless their units are of the form "seconds since ...". I have noticed that changing my units to "second", "sec", or "s" prevents xarray from translating the variable to timedelta64 and keeps it float64, as desired. More details and an OPeNDAP example are posted on Stack Overflow: https://stackoverflow.com/questions/46552078/xarray-wave-period-in-seconds-ingested-as-timedelta64
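For reference, newer xarray versions (0.16.1+) expose a decode_timedelta argument on open_dataset that turns this conversion off; a short sketch with an illustrative file name:

```python
import xarray as xr

# decode_timedelta=False keeps variables with units like "seconds"
# as plain floats instead of decoding them to timedelta64.
ds = xr.open_dataset("wave_period.nc", decode_timedelta=False)  # hypothetical file
```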

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1621/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
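A sketch of the query behind this page, run with Python's sqlite3 module against a local copy of the database (the file name is illustrative, and Datasette's generated SQL may differ in detail):

```python
import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical local copy
rows = conn.execute(
    """
    SELECT id, number, title, state, updated_at
    FROM issues
    WHERE repo = 13221727
      AND date(updated_at) = '2022-05-01'
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # 6 for this snapshot
```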