issues

6 rows where user = 2418513 sorted by updated_at descending


Facets: state (open 4, closed 2) · type (issue 6) · repo (xarray 6)
Issue #2857 (427410885): Quadratic slowdown when saving multiple datasets to the same h5 file (h5netcdf) · aldanor (2418513) · state: closed · 24 comments · created 2019-03-31T15:47:40Z · updated 2022-01-12T07:19:06Z · closed 2022-01-12T07:19:06Z · author association: NONE

I can't quite work out what's wrong on my side of the code, and I'm wondering whether this kind of slowdown is expected or not.

Basically, what I'm doing is something like this:

```python
with h5py.File('file.h5', 'w') as f:
    f.flush()  # reset the file
for i, ds in enumerate(datasets):
    ds.to_netcdf('file.h5', group=str(i), engine='h5netcdf', mode='a')
```

And here's the log for saving 20 datasets, the listed times are for each dataset independently. Instead of the expected 10 sec (which is already kind of slow, but whatever), I get 2 minutes. The time to save each dataset seems to increase linearly, which leads to a quadratic overall slowdown:

```
saving dataset... 00:00:00.559135
saving dataset... 00:00:00.924617
saving dataset... 00:00:01.351670
saving dataset... 00:00:01.818111
saving dataset... 00:00:02.356307
saving dataset... 00:00:02.971077
saving dataset... 00:00:03.685565
saving dataset... 00:00:04.375104
saving dataset... 00:00:04.575837
saving dataset... 00:00:05.179975
saving dataset... 00:00:05.793876
saving dataset... 00:00:06.517916
saving dataset... 00:00:07.190257
saving dataset... 00:00:07.993795
saving dataset... 00:00:08.786421
saving dataset... 00:00:09.414821
saving dataset... 00:00:10.729006
saving dataset... 00:00:11.584044
saving dataset... 00:00:14.160655
saving dataset... 00:00:14.460564

CPU times: user 1min 49s, sys: 12.8 s, total: 2min 2s
Wall time: 2min 4s
```
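The arithmetic is consistent with the log: when the cost of saving dataset `i` grows roughly linearly (about `c * i`), the total for `n` datasets is the sum of an arithmetic series, `c * n * (n + 1) / 2`, i.e. quadratic in `n`. A quick sanity check against the times above (rounded to ~10 ms):

```python
# Per-dataset save times from the log above, rounded to ~10 ms.
per_step = [0.56, 0.92, 1.35, 1.82, 2.36, 2.97, 3.69, 4.38, 4.58, 5.18,
            5.79, 6.52, 7.19, 7.99, 8.79, 9.41, 10.73, 11.58, 14.16, 14.46]

# Each step costs roughly c * i, so the total for n steps is ~ c * n * (n + 1) / 2:
# quadratic overall even though every individual step only grows linearly.
total = sum(per_step)
print(f"total: {total:.1f} s")  # ~124 s, matching the ~2 min wall time reported
```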

Reactions: none (https://api.github.com/repos/pydata/xarray/issues/2857/reactions)
state_reason: completed · repo: xarray (13221727) · type: issue
Issue #2836 (423749397): xarray.concat() with compat='identical' fails for DataArray attrs · aldanor (2418513) · state: open · 9 comments · created 2019-03-21T14:11:29Z · updated 2021-07-08T17:42:52Z · author association: NONE

Not sure if it was ever supposed to work with numpy arrays, but it actually does :thinking::

```python
attr = np.array([[3, 4]])
d1 = xr.Dataset({'z': 1}, attrs={'y': attr})
d2 = xr.Dataset({'z': 2}, attrs={'y': attr.copy()})
xr.concat([d1, d2], dim='z', compat='identical')
```

However, it fails if you use DataArray attrs:

```python
attr = xr.DataArray([3, 4], {'x': [1, 2]}, 'x')
d1 = xr.Dataset({'z': 1}, attrs={'y': attr})
d2 = xr.Dataset({'z': 2}, attrs={'y': attr.copy()})
xr.concat([d1, d2], dim='z', compat='identical')
# ValueError: The truth value of an array with more than one element is
# ambiguous. Use a.any() or a.all()
```

Given that the check is simply `(a is b) or (a == b)`, should it try to do something smarter for array-like attrs?
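A minimal sketch of what such a smarter check could look like (the `attrs_equal` helper name is hypothetical, not xarray's actual implementation): fall back to an element-wise "all equal" check whenever truth-testing the result of `==` is ambiguous.

```python
def attrs_equal(a, b):
    """Compare two attribute values, tolerating array-likes whose __eq__
    returns an element-wise result (hypothetical helper, not xarray API)."""
    if a is b:
        return True
    eq = a == b
    try:
        return bool(eq)  # plain scalars / ordinary objects
    except ValueError:
        # Array-like: the truth value is ambiguous, so require all
        # elements to compare equal instead.
        all_method = getattr(eq, "all", None)
        if all_method is not None:
            return bool(all_method())
        return all(eq)
```

With something along these lines, both the numpy-array and the DataArray attrs above would compare element-wise instead of raising.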

Reactions: none (https://api.github.com/repos/pydata/xarray/issues/2836/reactions)
repo: xarray (13221727) · type: issue
Issue #2824 (423016453): Dataset.from_records()? · aldanor (2418513) · state: open · 4 comments · created 2019-03-20T00:46:19Z · updated 2021-05-13T20:20:52Z · author association: NONE

Currently, to create a Dataset from an existing numpy recarray (not a DataArray, which is bugged anyway with recarrays due to #1434), I couldn't find an easier way than

```python
df = xr.Dataset.from_dataframe(pd.DataFrame(my_recarray).set_index('foo'))
```

(which is kind of dumb since it allocates the memory twice)

It would definitely be nice to be able to do just this (perhaps with extra arguments to set index on the fly etc):

```python
df = xr.Dataset.from_records(my_recarray, ...)
```

(Apologies if I'm missing something obvious.)
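A rough sketch of what such a constructor could do internally, splitting the record array into per-field variables so that no intermediate DataFrame is allocated (the `dataset_from_records` name and the index-handling convention are assumptions for illustration, not an existing API):

```python
import numpy as np

def dataset_from_records(recarray, index=None):
    """Split a structured/record array into per-field 1-D arrays, the way a
    hypothetical xr.Dataset.from_records could, without a DataFrame detour."""
    names = recarray.dtype.names
    dim = index if index is not None else "index"
    coords = {}
    if index is not None:
        # Promote the chosen field to a coordinate along the shared dimension.
        coords[index] = recarray[index]
        names = [n for n in names if n != index]
    # Each remaining field becomes a 1-D variable along that dimension;
    # these dicts could then be fed to xr.Dataset(data_vars, coords) directly.
    data_vars = {name: (dim, recarray[name]) for name in names}
    return data_vars, coords
```

Field access on a structured array returns views, so this avoids the extra copy that the DataFrame round trip makes.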

Reactions: 1 (+1: 1) (https://api.github.com/repos/pydata/xarray/issues/2824/reactions)
repo: xarray (13221727) · type: issue
Issue #3695 (549712566): mypy --strict fails on scripts/packages depending on xarray; __all__ required · aldanor (2418513) · state: closed · assignee: crusaderky (6213168) · 3 comments · created 2020-01-14T17:27:44Z · updated 2020-01-17T20:42:25Z · closed 2020-01-17T20:42:25Z · author association: NONE

Checked this with both 0.14.1 and master branch.

Create foo.py:

```python
from xarray import DataArray
```

and run:

```sh
$ mypy --strict foo.py
```

which results in

```
foo.py:1: error: Module 'xarray' has no attribute 'DataArray'
Found 1 error in 1 file (checked 1 source file)
```

I did a bit of digging trying to make it work; it looks like what makes the above script pass mypy is adding

```python
__all__ = ('DataArray',)
```

to `xarray/__init__.py`; otherwise mypy treats those imports as "private" (and is correct in doing so).

Should `__all__` be added to the root `__init__.py`? To the `__init__.py` of each subpackage as well?
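As a rough illustration of the distinction mypy draws, here is a small stdlib-only check (the `check_all` helper is hypothetical) that verifies every name a module lists in `__all__` actually exists on the module, i.e. is a deliberate re-export rather than a "private" import:

```python
import importlib

def check_all(module_name):
    """Return the names listed in a module's __all__ that don't actually
    exist on the module (hypothetical helper, for illustration only)."""
    mod = importlib.import_module(module_name)
    return [name for name in getattr(mod, "__all__", ()) if not hasattr(mod, name)]

# e.g. the stdlib 'collections' package declares __all__ consistently:
print(check_all("collections"))  # []
```

Running such a check over `xarray/__init__.py` (once it grows an `__all__`) would catch entries that drift out of sync with the actual imports.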

Reactions: none (https://api.github.com/repos/pydata/xarray/issues/3695/reactions)
state_reason: completed · repo: xarray (13221727) · type: issue
Issue #2837 (423774214): DataArray plotting: pyplot compat and passing the style · aldanor (2418513) · state: open · 6 comments · created 2019-03-21T14:57:12Z · updated 2019-04-11T16:25:49Z · author association: NONE

These are really two unrelated issues in one, which I noticed while trying to plot things directly from DataArray objects.


The following works as expected (by converting the DataArray to pandas first):

```python
arr.to_series().plot(style='.-')
arr.to_series().plot.line(style='.-')
```

Passing Series to pyplot.plot() directly also works and retains index:

```python
plt.plot(arr.to_series(), '.-')
```

Trying to set the style directly when plotting from a DataArray doesn't work:

```python
arr.plot(style='.-')
# AttributeError: Unknown property style
arr.plot.line(style='.-')
# AttributeError: Unknown property style
```

Passing DataArray to pyplot.plot() loses index:

```python
plt.plot(arr, '.-')
# works but loses coords; same as plt.plot(arr.values, '.-')
```

Reactions: none (https://api.github.com/repos/pydata/xarray/issues/2837/reactions)
repo: xarray (13221727) · type: issue
Issue #2825 (423023519): KeyError on selecting empty time slice from a datetime-indexed Dataset · aldanor (2418513) · state: open · 4 comments · created 2019-03-20T01:21:56Z · updated 2019-03-20T17:58:24Z · author association: NONE

(xarray version: 0.11.3)

Just wanted to confirm this is expected behaviour: sel() with a date that would yield an empty selection throws an exception (I would naturally expect it to return a zero-length dataarray/dataset instead):

```python
foo = xr.DataArray(
    np.array([1, 2, 3]),
    {'t': pd.to_datetime(['2018-01-01', '2018-02-02T01:01', '2018-02-02T02:02'])},
    dims=['t'],
)

foo.sel(t='2018-01-01').size  # 1
foo.sel(t='2018-02-02').size  # 2
foo.sel(t='2018-03-03').size  # expected 0?
```

The last call raises instead:

```
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
TypeError: an integer is required

During handling of the above exception, another exception occurred:

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine._date_check_type()
KeyError: '2018-03-03'

During handling of the above exception, another exception occurred:

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
TypeError: an integer is required

During handling of the above exception, another exception occurred:

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine._date_check_type()
KeyError: '2018-03-03'

During handling of the above exception, another exception occurred:

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 1520035200000000000

During handling of the above exception, another exception occurred:

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
KeyError: Timestamp('2018-03-03 00:00:00')

During handling of the above exception, another exception occurred:

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 1520035200000000000

During handling of the above exception, another exception occurred:

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()
KeyError: Timestamp('2018-03-03 00:00:00')

...
```

Reactions: none (https://api.github.com/repos/pydata/xarray/issues/2825/reactions)
repo: xarray (13221727) · type: issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 2878.085ms · About: xarray-datasette