issues

7 rows where state = "closed" and user = 743508 sorted by updated_at descending

type 2

  • issue 6
  • pull 1

state 1

  • closed · 7

repo 1

  • xarray 7
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
163267018 MDU6SXNzdWUxNjMyNjcwMTg= 893 'Warm start' for open_mfdataset? mangecoeur 743508 closed 0     3 2016-06-30T21:05:46Z 2023-05-29T13:35:32Z 2023-05-29T13:35:32Z CONTRIBUTOR      

I'm using xarray in IPython to do interactive/exploratory analysis on large multi-file datasets. To avoid having too many files open, I'm wrapping my file-open code in a with block. However, this means that every time I re-run the code the multi-file dataset is re-initialised, causing xarray to re-scan every input data file to construct the Dataset.

It would be good to have some kind of 'warm start' or caching mechanism to make it easier to re-open multifile datasets without having to re-scan the input files, but equally without having to keep the dataset open which keeps all the file handles open (I've hit the OS max file limit because of this).

I'm not sure what API would suit this, since while it's a useful use case, it's also a bit weird. Something like open_cached_mfdataset could close the input files after initialisation but cache the information collected, and simply assume that files don't move or change between accesses.
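
A minimal sketch of how such a cache might work, under stated assumptions: `WarmStartCache` and `file_fingerprint` are hypothetical names, not part of xarray, and `scan_file` stands in for whatever per-file metadata scan is expensive. Instead of assuming that files never change, this version re-scans a file only when its size or modification time differs from the cached entry.

```python
import json
import os
from pathlib import Path


def file_fingerprint(path):
    """Return a cheap change-detection key: [size in bytes, mtime in ns]."""
    st = os.stat(path)
    return [st.st_size, st.st_mtime_ns]


class WarmStartCache:
    """Hypothetical on-disk cache of per-file metadata (not an xarray API)."""

    def __init__(self, cache_path, scan_file):
        self.cache_path = Path(cache_path)
        self.scan_file = scan_file  # callable: path -> JSON-serialisable metadata
        self.scans = 0  # number of files actually scanned by this instance
        if self.cache_path.exists():
            self.entries = json.loads(self.cache_path.read_text())
        else:
            self.entries = {}

    def metadata(self, paths):
        """Return metadata per path, scanning only new or changed files."""
        result = {}
        for p in map(str, paths):
            fingerprint = file_fingerprint(p)
            entry = self.entries.get(p)
            if entry is None or entry["fingerprint"] != fingerprint:
                entry = {"fingerprint": fingerprint, "meta": self.scan_file(p)}
                self.entries[p] = entry
                self.scans += 1
            result[p] = entry["meta"]
        # Persist the cache so a later session can start warm.
        self.cache_path.write_text(json.dumps(self.entries))
        return result
```

No file handles stay open between calls: each file is opened only inside `scan_file`, and only on a cache miss.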

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/893/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
231061878 MDU6SXNzdWUyMzEwNjE4Nzg= 1424 Huge memory use when using FacetGrid mangecoeur 743508 closed 0     6 2017-05-24T14:35:16Z 2019-06-29T02:58:33Z 2019-06-29T02:58:33Z CONTRIBUTOR      

When plotting a time series of maps using faceting, my memory use jumps by over 3x, from about 4GB to 14GB.

Using macOS, Python 3.6, xarray 0.9.5, jupyter notebook.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1424/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
161068483 MDU6SXNzdWUxNjEwNjg0ODM= 887 Perf: use Scipy engine by default for netcdf3? mangecoeur 743508 closed 0     2 2016-06-19T11:27:56Z 2019-02-26T12:51:17Z 2019-02-26T12:51:17Z CONTRIBUTOR      

Not really a bug, but I'm finding that the scipy backend is considerably faster than the netCDF backend for netCDF 3 files (using dataset: http://rda.ucar.edu/datasets/ds093.1/). I'm using Anaconda Python with MKL. I'm not sure if this is always faster, but if it is, perhaps xarray should default to the scipy backend for netCDF 3 files?
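
One way to check whether the difference holds on other machines is to time both backends on the same file. The helper below is a generic sketch using only the standard library; `best_time` is a hypothetical name, and the issue does not say how the original comparison was measured.

```python
import time


def best_time(func, repeats=5):
    """Call func `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    # The minimum is less sensitive to background noise than the mean.
    return min(timings)
```

With xarray installed, the two candidates could be passed as `lambda: xr.open_dataset(path, engine="scipy").load()` and the same call with `engine="netcdf4"`, where `path` is a netCDF 3 file of your own.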

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/887/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
157886730 MDU6SXNzdWUxNTc4ODY3MzA= 864 TypeError: invalid type promotion when reading multi-file dataset mangecoeur 743508 closed 0     3 2016-06-01T11:44:49Z 2019-01-27T21:54:49Z 2019-01-27T21:54:49Z CONTRIBUTOR      

I'm trying to select data from a collection of weather files. Xarray opens the multifile dataset perfectly, but when I try the following selection:

``` python

import numpy as np
import xarray as xr

cfsr_new = xr.open_mfdataset('*.grb2.nc')

# Pick a few coordinate values by position, then select by label
lon_sel = np.array(cfsr_new.lon[np.array([3, 4, 8])])
lat_sel = np.array(cfsr_new.lat[np.array([2, 3, 4])])
time_sel = cfsr_new.time[100:200]

selection = cfsr_new.sel(lon=lon_sel, lat=lat_sel, time=time_sel)
selection.to_array()

```

I get:

```

TypeError Traceback (most recent call last)
<ipython-input-38-3f04c6458da2> in <module>()
----> 1 selection.to_array()

/Users/<user>/anaconda/lib/python3.5/site-packages/xarray/core/dataset.py in to_array(self, dim, name)
   1847 data_vars = [self.variables[k] for k in self.data_vars]
   1848 broadcast_vars = broadcast_variables(*data_vars)
-> 1849 data = ops.stack([b.data for b in broadcast_vars], axis=0)
   1850
   1851 coords = dict(self.coords)

/Users/<user>//anaconda/lib/python3.5/site-packages/xarray/core/ops.py in f(args, kwargs)
     65 else:
     66 module = eager_module
---> 67 return getattr(module, name)(args, kwargs)
     68 else:
     69 def f(data, *args, kwargs):

/Users/<user>//anaconda/lib/python3.5/site-packages/dask/array/core.py in stack(seq, axis)
   1754
   1755 if all(a._dtype is not None for a in seq):
-> 1756 dt = reduce(np.promote_types, [a._dtype for a in seq])
   1757 else:
   1758 dt = None

/Users/<user>//anaconda/lib/python3.5/site-packages/toolz/functoolz.py in call(self, args, kwargs)
    217 def call(self, args, kwargs):
    218 try:
--> 219 return self._partial(*args, kwargs)
    220 except TypeError:
    221 # If there was a genuine TypeError

TypeError: invalid type promotion

```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/864/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
142675134 MDU6SXNzdWUxNDI2NzUxMzQ= 799 Support for pathlib.Path mangecoeur 743508 closed 0     2 2016-03-22T14:53:48Z 2017-09-01T15:31:52Z 2017-09-01T15:31:52Z CONTRIBUTOR      

pathlib.Path is, IMHO, one of the best additions to Python. It would be nice to be able to open files from a Path without having to cast to str.
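
As a rough illustration of the request, a library can accept path-like objects by normalising them at its entry points. This is a sketch only: `normalize_path` is a hypothetical helper, not xarray's implementation, and it relies on `os.fspath` and `os.PathLike`, which are available from Python 3.6 onward.

```python
import os
from pathlib import Path


def normalize_path(filename_or_obj):
    """Convert path-like objects to str; pass anything else through unchanged."""
    if isinstance(filename_or_obj, os.PathLike):
        return os.fspath(filename_or_obj)
    # Plain strings and already-open file objects are left for the caller to handle.
    return filename_or_obj


data_file = Path("data") / "weather.nc"
as_text = normalize_path(data_file)
```

Checking for `os.PathLike` rather than `pathlib.Path` means any object that implements `__fspath__` is accepted, not just the standard library class.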

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/799/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
195050684 MDU6SXNzdWUxOTUwNTA2ODQ= 1161 Generated Dask graph is huge - performance issue? mangecoeur 743508 closed 0     8 2016-12-12T18:35:12Z 2017-01-23T20:21:14Z 2017-01-23T20:21:14Z CONTRIBUTOR      

I've been trying to get around some performance issues when subsetting a set of netCDF files opened with open_mfdataset. I managed to print out the generated dask graph for one variable and it doesn't seem right: it's huge, 5000 elements, and seems to have a getitem entry for every requested element for that variable.

The code that generates this selection looks roughly like:

```python

paths = WEATHER_MET['latlon'].glob('_resampled.nc')
dataset = xr.open_mfdataset([str(p) for p in paths])
selection = dataset.sel(time=time_sel).sel_points(method='nearest', tolerance=0.1, lon=lon, lat=lat)
selection = weights
```

and the graph for one variable in the select (the irradiance value) looks like this:

mydask.pdf

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1161/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
195125296 MDExOlB1bGxSZXF1ZXN0OTc2NjMxMTg= 1162 #1161 WIP to vectorize isel_points mangecoeur 743508 closed 0     15 2016-12-13T00:19:46Z 2017-01-23T20:20:51Z 2017-01-23T20:20:47Z CONTRIBUTOR   0 pydata/xarray/pulls/1162

WIP to use dask vindex for point-based selection

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1162/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
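
The filter described at the top of this page (state = "closed", user = 743508, sorted by updated_at descending) can be reproduced against this schema with Python's built-in `sqlite3` module. The sketch below assumes a trimmed copy of the table with only a subset of the columns, and loads three of the rows listed above into an in-memory database; it is not the query Datasette itself generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed version of the [issues] table: only the columns the query needs.
conn.execute(
    "CREATE TABLE [issues] ("
    " [id] INTEGER PRIMARY KEY, [number] INTEGER, [title] TEXT,"
    " [user] INTEGER, [state] TEXT, [updated_at] TEXT, [type] TEXT)"
)
rows = [
    (163267018, 893, "'Warm start' for open_mfdataset?", 743508, "closed",
     "2023-05-29T13:35:32Z", "issue"),
    (231061878, 1424, "Huge memory use when using FacetGrid", 743508, "closed",
     "2019-06-29T02:58:33Z", "issue"),
    (195125296, 1162, "#1161 WIP to vectorize isel_points", 743508, "closed",
     "2017-01-23T20:20:51Z", "pull"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

query = """
    SELECT [number], [title], [type]
    FROM [issues]
    WHERE [state] = :state AND [user] = :user
    ORDER BY [updated_at] DESC
"""
result = conn.execute(query, {"state": "closed", "user": 743508}).fetchall()
numbers = [row[0] for row in result]
```

Because `updated_at` is stored as ISO 8601 text in a single time zone, plain string ordering matches chronological ordering, so no date parsing is needed for the sort.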
Powered by Datasette · Queries took 19.818ms · About: xarray-datasette