
issues

6 rows where user = 1117224 sorted by updated_at descending

Facets: type (issue 4, pull 2) · state (closed 4, open 2) · repo (xarray 6)
Columns: id, node_id, number, title, user, state, locked, assignee, milestone, comments, created_at, updated_at, closed_at, author_association, active_lock_reason, draft, pull_request, body, reactions, performed_via_github_app, state_reason, repo, type
id: 481761508 · node_id: MDU6SXNzdWU0ODE3NjE1MDg= · number: 3223 · title: Feature request for multiple tolerance values when using nearest method and sel() · user: NicWayand (1117224) · state: open · locked: 0 · comments: 4 · created_at: 2019-08-16T19:53:31Z · updated_at: 2024-04-29T23:21:04Z · author_association: NONE

```python
import xarray as xr
import numpy as np
import pandas as pd

# Create test data
ds = xr.Dataset()
ds.coords['lon'] = np.arange(-120, -60)
ds.coords['lat'] = np.arange(30, 50)
ds.coords['time'] = pd.date_range('2018-01-01', '2018-01-30')
ds['AirTemp'] = xr.DataArray(np.ones((ds.lat.size, ds.lon.size, ds.time.size)),
                             dims=['lat', 'lon', 'time'])

target_lat = [36.83]
target_lon = [-110]
target_time = [np.datetime64('2019-06-01')]

# Nearest pulls a date too far away
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest')

# Adding tolerance for lat/lon, but it is also applied to time
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest',
       tolerance=0.5)

# Ideally tolerance could accept a dictionary, but this currently fails
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest',
       tolerance={'lat': 0.5, 'lon': 0.5, 'time': np.timedelta64(1, 'D')})
```

Expected Output

A dataset with the nearest values within the given tolerance for each dim.

Problem Description

I would like to add the ability for tolerance to accept a dictionary of tolerance values, one per dimension. Before I try implementing it, I wanted to 1) check that this doesn't already exist and that no one is already working on it, and 2) get suggestions for how to proceed.
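A per-dimension tolerance can be approximated with the current API by chaining separate .sel() calls, each with its own tolerance. The following is only a sketch of that workaround using the test data above, not the requested feature:

```python
# Select the numeric dims and the time dim in separate calls,
# so each call can use an appropriate tolerance.
nearest = ds.sel(lat=target_lat, lon=target_lon, method='nearest', tolerance=0.5)
nearest = nearest.sel(time=target_time, method='nearest',
                      tolerance=np.timedelta64(1, 'D'))
# With the out-of-range target_time above, the second call raises a KeyError
# instead of silently returning a date a year away.
```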

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Feb 20 2019, 02:51:38) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.9.184-0.1.ac.235.83.329.metal1.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.11.3
pandas: 0.24.1
numpy: 1.15.4
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: 1.5.5
zarr: 2.2.0
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
cyordereddict: None
dask: 1.1.2
distributed: 1.26.0
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: None
IPython: 7.3.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3223/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
id: 309100522 · node_id: MDU6SXNzdWUzMDkxMDA1MjI= · number: 2018 · title: MemoryError when using save_mfdataset() · user: NicWayand (1117224) · state: closed · locked: 0 · comments: 1 · created_at: 2018-03-27T19:22:28Z · updated_at: 2020-03-28T07:51:17Z · closed_at: 2020-03-28T07:51:17Z · author_association: NONE

Code Sample, a copy-pastable example if possible

```python
import xarray as xr
import dask.array

# Dummy data that on disk is about ~200GB
da = xr.DataArray(
    dask.array.random.normal(0, 1, size=(12, 408, 1367, 304, 448),
                             chunks=(1, 1, 1, 304, 448)),
    dims=('ensemble', 'init_time', 'fore_time', 'x', 'y'))

# Perform some calculation on the dask data
da_sum = da.sum(dim='x').sum(dim='y') * (25 * 25) / (10**6)

# Write to multiple files
c_e, datasets = zip(*da_sum.to_dataset(name='sic').groupby('ensemble'))
paths = ['file_%s.nc' % e for e in c_e]
xr.save_mfdataset(datasets, paths)
```

Problem description

This results in a MemoryError, even though dask should be able to write this out-of-memory DataArray to multiple netCDF files that each fit in memory. Related SO post here.

Expected Output

12 netcdf files (grouped by the ensemble dim).
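A lower-memory alternative (a sketch only, using the c_e/datasets pairs from the code above; it is not a fix discussed in this issue) is to write each ensemble member separately, so that only one group's dask graph is evaluated at a time:

```python
# Write one ensemble member at a time instead of handing all 12 datasets
# to save_mfdataset at once.
for e, dset in zip(c_e, datasets):
    dset.to_netcdf('file_%s.nc' % e)
```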

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.12
machine: x86_64
processor:
byteorder: little
LC_ALL: C
LANG: C
LOCALE: None.None
xarray: 0.10.2
pandas: 0.22.0
numpy: 1.14.1
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: 0.5.0
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.1
distributed: 1.21.1
matplotlib: 2.2.2
cartopy: None
seaborn: 0.8.1
setuptools: 38.5.1
pip: 9.0.1
conda: None
pytest: None
IPython: 6.2.1
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2018/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
id: 225536793 · node_id: MDU6SXNzdWUyMjU1MzY3OTM= · number: 1391 · title: Adding Example/Tutorial of importing data to Xarray (Merge/conact/etc) · user: NicWayand (1117224) · state: open · locked: 0 · assignee: rabernat (1197350) · comments: 11 · created_at: 2017-05-01T21:50:33Z · updated_at: 2019-07-12T19:43:30Z · author_association: NONE

I love xarray for analysis, but getting my data into xarray often takes a lot more time than I think it should. I am a hydrologist, and very often hydro data is poorly stored/formatted, which means I need to do multiple merge/concat/combine_first operations etc. to get to a nice xarray dataset format. I think having more examples of importing different types of data would be helpful (for me and possibly others), instead of my current approach, which often entails trial and error.
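To make the kind of wrangling described above concrete, here is a minimal sketch with made-up station data (the names and values are invented for illustration), using combine_first to fill gaps and concat to stack stations:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two gappy records of the same variable from different sources
time = pd.date_range('2017-01-01', periods=4)
gauge = xr.DataArray([1.0, np.nan, 3.0, np.nan], dims='time',
                     coords={'time': time}, name='streamflow')
model = xr.DataArray([np.nan, 2.0, np.nan, 4.0], dims='time',
                     coords={'time': time}, name='streamflow')

# Prefer the gauge record, fall back to the model where the gauge has gaps
filled = gauge.combine_first(model)

# Stack two stations along a new 'station' dimension and collect into a Dataset
ds = xr.concat([filled, filled + 1.0],
               dim=pd.Index(['A', 'B'], name='station')).to_dataset()
```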

I can start by providing an example of importing funky hydrology data that hopefully would be general enough for others to use. Maybe we can compile other examples as well, with the end goal of adding them to the readthedocs.

@klapo @jhamman

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1391/reactions",
    "total_count": 7,
    "+1": 7,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
id: 186326698 · node_id: MDExOlB1bGxSZXF1ZXN0OTE2Mzk0OTY= · number: 1070 · title: Feature/rasterio · user: NicWayand (1117224) · state: closed · locked: 0 · comments: 11 · created_at: 2016-10-31T16:14:55Z · updated_at: 2017-05-22T08:47:40Z · closed_at: 2017-05-22T08:47:40Z · author_association: NONE · draft: 0 · pull_request: pydata/xarray/pulls/1070

@jhamman started a backend for rasterio that I have been working on. There are two issues I am stuck on that I could use some help with:

1) Lat/long coords are not being decoded correctly (they are missing from the output dataset). The lat/lon projection coordinates are correctly calculated and added here (https://github.com/NicWayand/xray/blob/feature/rasterio/xarray/backends/rasterio_.py#L117), but it appears (with my limited knowledge of xarray) that the lat/long coords contained within obj are lost at this line (https://github.com/NicWayand/xray/blob/feature/rasterio/xarray/conventions.py#L930).

2) Lazy loading needs to be enabled. How can I set up and test this? Are there examples from other backends I could follow?
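As a rough illustration of what "lazy loading works" usually means for a backend (a sketch under assumptions: open_rasterio_dataset and the band_data variable are stand-ins, not the API of this PR):

```python
ds = open_rasterio_dataset('example.tif')   # hypothetical entry point
sub = ds['band_data'][0, :10, :10]          # indexing should not read the whole file
sub.load()                                  # data should only be read from disk here
```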

#790

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1070/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
id: 170688064 · node_id: MDExOlB1bGxSZXF1ZXN0ODA5ODgxNzA= · number: 961 · title: Update time-series.rst · user: NicWayand (1117224) · state: closed · locked: 0 · comments: 3 · created_at: 2016-08-11T16:26:58Z · updated_at: 2017-04-03T05:31:06Z · closed_at: 2017-04-03T05:31:06Z · author_association: NONE · draft: 0 · pull_request: pydata/xarray/pulls/961

Thought it would be helpful to users to know that timezones are not handled here, rather than googling and finding this: https://github.com/pydata/xarray/issues/552

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/961/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
id: 171504099 · node_id: MDU6SXNzdWUxNzE1MDQwOTk= · number: 970 · title: Multiple preprocessing functions in open_mfdataset? · user: NicWayand (1117224) · state: closed · locked: 0 · comments: 3 · created_at: 2016-08-16T20:01:22Z · updated_at: 2016-08-17T07:01:02Z · closed_at: 2016-08-16T21:46:43Z · author_association: NONE

I would like to have multiple functions applied during an open_mfdataset call.

Using one works great:

```python
ds = xr.open_mfdataset(files, concat_dim='time', engine='pynio',
                       preprocess=lambda x: x.load())
```

Does the current behavior support multiple preprocess functions? (Apologies if this is documented somewhere; I couldn't find any examples with multiple calls.)

Something like:

```python
ds = xr.open_mfdataset(files, concat_dim='time', engine='pynio',
                       preprocess=[lambda x: x.load(), lambda y: y['time']=100])
```
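For reference, the same effect is available with the existing API by composing the steps into a single callable, since preprocess takes one function. A minimal sketch (the body of the helper is a stand-in for the two lambdas above):

```python
def preprocess(ds):
    # Apply several per-file steps before concatenation.
    ds = ds.load()
    # ...further per-file fixes would go here (e.g. adjusting the time coordinate)...
    return ds

ds = xr.open_mfdataset(files, concat_dim='time', engine='pynio',
                       preprocess=preprocess)
```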

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/970/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
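For completeness, the filtered view above ("6 rows where user = 1117224 sorted by updated_at descending") corresponds to a simple query against this schema. A sketch using Python's sqlite3 module, assuming a local copy of the database saved as github.db:

```python
import sqlite3

conn = sqlite3.connect('github.db')   # hypothetical local copy of the database
rows = conn.execute(
    "SELECT number, title, state, comments, updated_at "
    "FROM issues WHERE [user] = 1117224 ORDER BY updated_at DESC"
).fetchall()
for row in rows:
    print(row)
conn.close()
```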