
issues


17 rows where user = 731499 sorted by updated_at descending


type 2

  • issue 10
  • pull 7

state 2

  • closed 15
  • open 2

repo 1

  • xarray 17
Columns: id, node_id, number, title, user, state, locked, assignee, milestone, comments, created_at, updated_at, closed_at, author_association, active_lock_reason, draft, pull_request, body, reactions, performed_via_github_app, state_reason, repo, type
#6736 better handling of invalid files in open_mfdataset · vnoel (731499) · open · 2 comments · created 2022-06-29T08:00:18Z · updated 2023-07-09T23:49:36Z · CONTRIBUTOR · id 1288323549 · node_id I_kwDOAMm_X85MykHd

Is your feature request related to a problem?

Suppose I'm trying to read a large number of netCDF files with open_mfdataset.

Now suppose that one of those files is for some reason incorrect -- for instance there was a problem during the creation of that particular file, and its file size is zero, or it is not valid netCDF. The file exists, but it is invalid.

Currently open_mfdataset raises an exception with the message

```
ValueError: did not find a match in any of xarray's currently installed IO backends
```

As far as I can tell, there is currently no way to identify which of the files being read is the source of the problem. If there are several hundred files, finding the problematic ones is a task in itself, even though xarray probably knows which they are.

Describe the solution you'd like

It would be most useful to this particular user if the error message could somehow identify the file(s) responsible for the exception.

Apart from better reporting, I would find it very useful if I could pass to open_mfdataset some kind of argument that would make it ignore invalid files altogether (ignore_invalid=False comes to mind).
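Until such an argument exists, one workaround is to pre-filter the file list before handing it to open_mfdataset. This is a minimal sketch, not xarray API: `partition_readable` and `try_open` are hypothetical names, and the opener callable is whatever validity check you trust (e.g. a thin wrapper around `xr.open_dataset`).

```python
from typing import Callable, Iterable, List, Tuple


def partition_readable(paths: Iterable[str],
                       try_open: Callable[[str], object]) -> Tuple[List[str], List[str]]:
    """Split paths into (readable, broken) by attempting to open each one.

    try_open is any callable that raises on an invalid file -- for instance
    a wrapper that opens and immediately closes the dataset (hypothetical
    usage, not part of xarray itself).
    """
    readable, broken = [], []
    for path in paths:
        try:
            try_open(path)
            readable.append(path)
        except Exception:
            # Collecting the failing paths is exactly the diagnostic
            # information the issue asks the error message to carry.
            broken.append(path)
    return readable, broken
```

One could then call something like `partition_readable(files, lambda p: xr.open_dataset(p).close())` and pass only the readable paths to open_mfdataset, logging the broken ones.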

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6736/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#6758 simple groupby_bins 10x slower than numpy · vnoel (731499) · closed · 8 comments · created 2022-07-06T14:36:26Z · updated 2022-07-07T08:26:26Z · closed_at 2022-07-06T17:24:27Z · CONTRIBUTOR · id 1295939038 · node_id I_kwDOAMm_X85NPnXe

I am finding that groupby_bins is 10x slower than numpy in what I consider to be a simple implementation.

In the screenshot below, you can see me opening a netCDF file containing two variables with the same single dimension. One variable is the latitude. I want to aggregate (sum) the other variable in bins of latitude. The xarray approach using groupby_bins takes ~314ms per loop, the numpy approach less than 30ms per loop.

I need to do this kind of computation on many more variables, on data spanning several years, and following the xarray approach leads to many more hours of processing :-/

Am I doing something wrong here?
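The screenshot itself is not reproduced here, but the numpy approach described (summing one variable in bins of latitude) can be done in a single weighted-histogram pass. This is a hedged reconstruction with synthetic stand-in data, not the exact code from the issue:

```python
import numpy as np

# Synthetic stand-ins for the two 1-D variables in the netCDF file.
rng = np.random.default_rng(0)
lat = rng.uniform(-90, 90, 100_000)
var = rng.random(100_000)

# 2-degree latitude bins, matching the "bins of latitude" idea.
bins = np.arange(-90.0, 91.0, 2.0)

# np.histogram with weights sums `var` per latitude bin in one vectorized pass,
# which is the kind of operation groupby_bins(...).sum() expresses in xarray.
sums, edges = np.histogram(lat, bins=bins, weights=var)
```

The xarray equivalent would be roughly `ds['var'].groupby_bins('latitude', bins).sum()`; the histogram form trades labeled output for raw speed.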

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6758/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
#1249 confusing dataset creation process · vnoel (731499) · open · 6 comments · created 2017-02-05T09:52:44Z · updated 2022-06-26T15:07:59Z · CONTRIBUTOR · id 205414496 · node_id MDU6SXNzdWUyMDU0MTQ0OTY=

In another issue I create a simple dataset like so:

```python
lat = np.random.rand(50000) * 180 - 90
lon = np.random.rand(50000) * 360 - 180
d = xr.Dataset({'latitude': lat, 'longitude': lon})
```

I expected d to contain two variables (latitude and longitude) with no coordinates. Instead d appears to contain two coordinates and no variables:

```
In [5]: d
Out[5]:
<xarray.Dataset>
Dimensions:    (latitude: 50000, longitude: 50000)
Coordinates:
  * latitude   (latitude) float64 -76.0 -84.36 26.69 66.44 -37.85 50.13 ...
  * longitude  (longitude) float64 -148.7 -74.82 18.37 117.7 80.63 12.25 ...
Data variables:
    *empty*
```

Is this desired behavior?
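For context, passing a bare 1-D array keyed by name makes xarray treat that name as a new dimension, which is why each array becomes its own dimension coordinate. Naming a shared dimension explicitly keeps both arrays as plain data variables; a minimal sketch (the dimension name `points` is an arbitrary choice):

```python
import numpy as np
import xarray as xr

lat = np.random.rand(100) * 180 - 90
lon = np.random.rand(100) * 360 - 180

# Supplying (dims, data) tuples pins both variables to one shared
# dimension, so they stay data variables instead of becoming coordinates.
d = xr.Dataset({'latitude': ('points', lat), 'longitude': ('points', lon)})
```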

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1249/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#1201 pass projection argument to plt.subplot when faceting with cartopy transform · vnoel (731499) · closed · 10 comments · created 2017-01-12T13:18:52Z · updated 2020-03-29T16:30:29Z · closed_at 2020-03-29T16:30:29Z · CONTRIBUTOR · id 200364693 · node_id MDU6SXNzdWUyMDAzNjQ2OTM=

I have a data 3D DataArray with Time, Latitude and Longitude coordinates.

I want to plot maps of this dataset, faceted by Time. The following code

```python
import cartopy.crs as ccrs
proj = ccrs.PlateCarree()
data.plot(transform=proj, col='Time', col_wrap=3, robust=True)
```

fails with

```
ValueError: Axes should be an instance of GeoAxes, got <class 'matplotlib.axes._subplots.AxesSubplot'>
```

This is because plotting with a transform requires the axes to be a GeoAxes, which is normally arranged with something like plt.subplot(111, projection=proj); the implicit subplotting done when faceting does not do that. To make the faceting work, I had to do

```python
import cartopy.crs as ccrs
proj = ccrs.PlateCarree()
data.plot(transform=proj, col='Time', col_wrap=3, robust=True,
          subplot_kws={'projection': proj})
```

I propose that, when plot faceting is requested with a transform kw, the content of that keyword be passed to the subplot function as the projection argument automatically by default. If a projection is provided explicitly, as in the call above, use that one instead.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1201/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
#1719 open_mfdataset crashes when files are present with datasets of dimension = 0 · vnoel (731499) · closed · 2 comments · created 2017-11-15T20:44:17Z · updated 2020-03-09T00:50:17Z · closed_at 2020-03-09T00:50:17Z · CONTRIBUTOR · id 274298111 · node_id MDU6SXNzdWUyNzQyOTgxMTE=

I have a bunch of netCDF files that I want to read through open_mfdataset. Each file was created with xarray through to_netcdf() and contains several variables with a single time dimension. Most files look like this:

```
netcdf CAL_LID_L2_05kmCLay-Standard-V4-10.2008-07-01T01-54-46ZD.hdf_extract {
dimensions:
    time = 112 ;
variables:
    float lat(time) ;
        lat:_FillValue = NaNf ;
    float lon(time) ;
        lon:_FillValue = NaNf ;
    float elev(time) ;
        elev:_FillValue = NaNf ;
    double daynight(time) ;
        daynight:_FillValue = NaN ;
    double surf(time) ;
        surf:_FillValue = NaN ;

// global attributes:
        :_NCProperties = "version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17" ;
}
```

If one of the files is empty, i.e. the length of the 'time' dimension is zero:

```
netcdf CAL_LID_L2_05kmCLay-Standard-V4-10.2008-01-01T00-37-48ZD.hdf_extract {
dimensions:
    time = UNLIMITED ; // (0 currently)
variables:
    float lat(time) ;
        lat:_FillValue = NaNf ;
    float lon(time) ;
        lon:_FillValue = NaNf ;
    float elev(time) ;
        elev:_FillValue = NaNf ;
    double daynight(time) ;
        daynight:_FillValue = NaN ;
    double surf(time) ;
        surf:_FillValue = NaN ;

// global attributes:
        :_NCProperties = "version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17" ;
}
```

then open_mfdataset crashes with

```python
File "./test_map_elev_month_gl2.py", line 22, in main
    data = xr.open_mfdataset(files, concat_dim='time', autoclose=True)
File "/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/backends/api.py", line 505, in open_mfdataset
    **kwargs) for p in paths]
File "/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/backends/api.py", line 505, in <listcomp>
    **kwargs) for p in paths]
File "/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/backends/api.py", line 301, in open_dataset
    return maybe_decode_store(store, lock)
File "/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/backends/api.py", line 243, in maybe_decode_store
    lock=lock)
File "/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/core/dataset.py", line 1094, in chunk
    for k, v in self.variables.items()])
File "/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/core/dataset.py", line 1094, in <listcomp>
    for k, v in self.variables.items()])
File "/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/core/dataset.py", line 1089, in maybe_chunk
    return var.chunk(chunks, name=name2, lock=lock)
File "/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/core/variable.py", line 540, in chunk
    data = da.from_array(data, chunks, name=name, lock=lock)
File "/home/noel/.conda/envs/python3/lib/python3.6/site-packages/dask/array/core.py", line 1798, in from_array
    chunks = normalize_chunks(chunks, x.shape)
File "/home/noel/.conda/envs/python3/lib/python3.6/site-packages/dask/array/core.py", line 1758, in normalize_chunks
    for s, c in zip(shape, chunks)), ())
File "/home/noel/.conda/envs/python3/lib/python3.6/site-packages/dask/array/core.py", line 1758, in <genexpr>
    for s, c in zip(shape, chunks)), ())
File "/home/noel/.conda/envs/python3/lib/python3.6/site-packages/dask/array/core.py", line 881, in blockdims_from_blockshape
    for d, bd in zip(shape, chunks))
File "/home/noel/.conda/envs/python3/lib/python3.6/site-packages/dask/array/core.py", line 881, in <genexpr>
    for d, bd in zip(shape, chunks))
ZeroDivisionError: integer division or modulo by zero
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1719/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
#1205 transfer projection to implied subplots when faceting · vnoel (731499) · closed · 8 comments · created 2017-01-13T10:16:30Z · updated 2019-07-13T20:54:14Z · closed_at 2019-07-13T20:54:14Z · CONTRIBUTOR · PR pydata/xarray/pulls/1205 · id 200593854 · node_id MDExOlB1bGxSZXF1ZXN0MTAxNDIwNzky

This catches the transform kw passed to plot() when faceting and passes the associated projection to the subplots, as suggested in issue #1201.

(sorry for the irrelevant change to reshaping.rst, it seems I've stuck myself in a git hole and can't get out)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1205/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#1203 add info in doc on how to facet with cartopy · vnoel (731499) · closed · 7 comments · created 2017-01-12T14:13:32Z · updated 2019-03-28T12:48:09Z · closed_at 2017-01-13T16:29:14Z · CONTRIBUTOR · PR pydata/xarray/pulls/1203 · id 200376941 · node_id MDExOlB1bGxSZXF1ZXN0MTAxMjY2MTM5

This change explains:
  • how to pass projection arguments to faceted subplots
  • how to access the Axes of the created subplots (not relevant only to cartopy)
Relevant to issue #1202.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1203/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#1247 numpy function very slow on DataArray compared to DataArray.values · vnoel (731499) · closed · 5 comments · created 2017-02-03T17:12:08Z · updated 2019-01-23T17:34:22Z · closed_at 2019-01-23T17:34:22Z · CONTRIBUTOR · id 205215815 · node_id MDU6SXNzdWUyMDUyMTU4MTU=

First I create some fake latitude and longitude points. I stash them in a dataset, and compute a 2d histogram on those.

```python
#!/usr/bin/env python

import xarray as xr
import numpy as np

lat = np.random.rand(50000) * 180 - 90
lon = np.random.rand(50000) * 360 - 180
d = xr.Dataset({'latitude': lat, 'longitude': lon})

latbins = np.r_[-90:90:2.]
lonbins = np.r_[-180:180:2.]
h, xx, yy = np.histogram2d(d['longitude'], d['latitude'], bins=(lonbins, latbins))
```

When I run this I get some underwhelming performance:

```
$ time ./test_with_xarray.py

real    0m28.152s
user    0m27.201s
sys     0m0.630s
```

If I change the last line to

```python
h, xx, yy = np.histogram2d(d['longitude'].values, d['latitude'].values, bins=(lonbins, latbins))
```

(i.e. I pass the numpy arrays directly to the histogram2d function), things are very different:

```
$ time ./test_with_xarray.py

real    0m0.996s
user    0m0.569s
sys     0m0.253s
```

It's ~28 times slower to call histogram2d on the DataArrays than on the underlying numpy arrays. I ran into this issue while histogramming quite large lon/lat vectors from multiple netCDF files: I got tired of waiting for the computation to end, added .values to the call, and it went through very quickly.

It seems problematic that using xarray can slow down your code by 28 times with no real way for you to know about it...

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1247/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
#2216 let the user specify figure dpi when plotting · vnoel (731499) · closed · 3 comments · created 2018-06-05T14:28:52Z · updated 2019-01-13T01:40:11Z · closed_at 2019-01-13T01:40:11Z · CONTRIBUTOR · id 329483009 · node_id MDU6SXNzdWUzMjk0ODMwMDk=

When using a DataArray plot function, it is already possible to specify the figure size and aspect ratio.

I think it would make sense to also be able to specify the dpi, e.g. x.plot(dpi=109). It would save one call to plt.figure().
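The extra call the issue refers to can be sketched as follows: create the figure yourself at the desired dpi, then plot into one of its axes (with xarray, something like `x.plot(ax=ax)`). This is the current workaround, not a demonstration of a `dpi=` argument:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, for illustration only
import matplotlib.pyplot as plt

# The one extra step the proposal would remove: making the figure by hand
# just to control its dpi.
fig = plt.figure(figsize=(8, 4), dpi=109)
ax = fig.add_subplot(111)
ax.plot(range(10))  # stand-in for x.plot(ax=ax)
```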

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2216/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
#1279 Rolling window operation does not work with dask arrays · vnoel (731499) · closed · 12 comments · created 2017-02-20T14:59:59Z · updated 2017-09-14T17:19:51Z · closed_at 2017-09-14T17:19:51Z · CONTRIBUTOR · id 208903781 · node_id MDU6SXNzdWUyMDg5MDM3ODE=

As the title says :-)

This would be very useful to downsample long time series read from multiple consecutive netcdf files.

Note that I was able to apply the rolling window by converting my variable to a pandas series with to_series(). I then could use panda's own rolling window methods. I guess that when converting to a pandas series the dask array is read in memory?
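The pandas detour described above can be sketched like this; the series here is synthetic (with xarray one would get it from the variable via `to_series()`, which does pull the dask-backed data into memory):

```python
import numpy as np
import pandas as pd

# Stand-in for da_variable.to_series() on a long concatenated time series.
series = pd.Series(np.sin(np.linspace(0, 20, 1000)))

# pandas' own rolling-window machinery, used as the workaround:
# a centered 24-point rolling mean, e.g. to smooth/downsample.
smoothed = series.rolling(window=24, center=True).mean()
```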

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1279/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
#1291 Guess the complementary dimension when only one is passed to pcolormesh · vnoel (731499) · closed · 10 comments · created 2017-03-02T13:31:16Z · updated 2017-03-07T15:46:58Z · closed_at 2017-03-07T14:56:13Z · CONTRIBUTOR · PR pydata/xarray/pulls/1291 · id 211391408 · node_id MDExOlB1bGxSZXF1ZXN0MTA4NzU5MzIw
  • [x] closes #1290
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1291/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#1290 guess the second coordinate when only one is passed to pcolormesh() · vnoel (731499) · closed · 0 comments · created 2017-03-02T12:42:37Z · updated 2017-03-07T14:56:13Z · closed_at 2017-03-07T14:56:13Z · CONTRIBUTOR · id 211380196 · node_id MDU6SXNzdWUyMTEzODAxOTY=

Say I have a DataArray z with dimensions ('x', 'y'), in that order. If I call z.plot(), it will create a pcolormesh with x as the vertical dimension and y as the horizontal one. If I want to invert the axes, I need to call z.plot(x='x', y='y'). If I supply only one of the dimensions, i.e. z.plot(x='x'), I get the error message

```
ValueError: cannot supply only one of x and y
```

I think that when calling the plot function on a 2d DataArray and passing only one coordinate, as in z.plot(x='x'), xarray could guess we want the unpassed coordinate to be the other one. I don't see why this would be unsafe.
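The guessing logic being proposed is small; here is a standalone sketch of it (the function name `infer_plot_dims` is hypothetical, not anything in xarray):

```python
def infer_plot_dims(dims, x=None, y=None):
    """Given a 2-D array's dimension names, fill in the unspecified plot axis.

    If exactly one of x/y is given, the other is inferred as the remaining
    dimension -- the behavior the issue proposes for z.plot(x='x').
    """
    if len(dims) != 2:
        raise ValueError("only defined for 2-D arrays")
    if x is not None and y is None:
        y = next(d for d in dims if d != x)
    elif y is not None and x is None:
        x = next(d for d in dims if d != y)
    return x, y
```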

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1290/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
#1202 way to apply functions to subplots when faceting · vnoel (731499) · closed · 2 comments · created 2017-01-12T13:39:07Z · updated 2017-01-13T18:48:32Z · closed_at 2017-01-13T18:48:32Z · CONTRIBUTOR · id 200369077 · node_id MDU6SXNzdWUyMDAzNjkwNzc=

When plotting maps with cartopy, it is common to request plotting additional information over the map, e.g. coastlines using ax.coastlines().

When faceting maps (as in issue #1201), AFAICS there is no way to add coastlines to each of the faceted subplots. I do not know if or how this can be done, but in my view not being able to do it severely limits the interest of faceting when dealing with maps.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1202/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
#1196 small typo · vnoel (731499) · closed · 1 comment · created 2017-01-10T12:32:51Z · updated 2017-01-10T18:10:21Z · closed_at 2017-01-10T18:10:19Z · CONTRIBUTOR · PR pydata/xarray/pulls/1196 · id 199809433 · node_id MDExOlB1bGxSZXF1ZXN0MTAwODY0NzY0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1196/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#1088 missing return value in sample function calls (I think) · vnoel (731499) · closed · 3 comments · created 2016-11-07T09:25:40Z · updated 2016-11-16T02:14:33Z · closed_at 2016-11-16T02:14:28Z · CONTRIBUTOR · PR pydata/xarray/pulls/1088 · id 187661575 · node_id MDExOlB1bGxSZXF1ZXN0OTI1NDc5ODg=

sorry if I messed up, I'm not a github master

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1088/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#1096 fix typo in doc · vnoel (731499) · closed · 1 comment · created 2016-11-08T13:22:31Z · updated 2016-11-08T15:55:32Z · closed_at 2016-11-08T15:55:28Z · CONTRIBUTOR · PR pydata/xarray/pulls/1096 · id 187990259 · node_id MDExOlB1bGxSZXF1ZXN0OTI3NzY0NjI=
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1096/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#1089 fix typo · vnoel (731499) · closed · 1 comment · created 2016-11-07T09:26:56Z · updated 2016-11-07T13:59:10Z · closed_at 2016-11-07T13:59:10Z · CONTRIBUTOR · PR pydata/xarray/pulls/1089 · id 187661822 · node_id MDExOlB1bGxSZXF1ZXN0OTI1NDgxNTI=
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1089/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 60.379ms · About: xarray-datasette