
issue_comments


10 rows where user = 7933853 sorted by updated_at descending


issue 6

  • concat result not correct for particular dataset 3
  • How to efficiently use DataArrays with Cartopy's add_cyclic_point utility? 2
  • BUG: Resample on PeriodIndex not working? 2
  • How to avoid the auto convert variable dtype from float32 to float64 when read netCDF file use open_dataset? 1
  • Facetgrid: colors beyond range (extend) not saturated 1
  • to_netcdf() doesn't work with multiprocessing scheduler 1

user 1

  • lvankampenhout · 10

author_association 1

  • NONE 10
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sort key, descending), author_association, body, reactions, performed_via_github_app, issue
702348129 https://github.com/pydata/xarray/issues/3781#issuecomment-702348129 https://api.github.com/repos/pydata/xarray/issues/3781 MDEyOklzc3VlQ29tbWVudDcwMjM0ODEyOQ== lvankampenhout 7933853 2020-10-01T19:24:48Z 2020-10-01T20:00:27Z NONE

I think I ran into a similar problem when combining dask-chunked Datasets (originating from open_mfdataset) with Python's native multiprocessing package. I get no error message, and the headers of the files are created, but then the script hangs indefinitely. The use case is combining and resampling variables into ~1000 different NetCDF files, which I want to distribute over multiple processes using multiprocessing.

MCVE Code Sample

```python
import xarray as xr
from multiprocessing import Pool
import os

if False:
    """ Load data without using dask """
    ds = xr.open_dataset("http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.1960.nc")
else:
    """ Load data using dask """
    ds = xr.open_dataset("http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.1960.nc",
                         chunks={})

print(ds.nbytes / 1e6, 'MB')

print('chunks', ds.air.chunks)  # chunks is empty without dask

outdir = '/glade/scratch/lvank'  # change this to some temporary directory on your system

def do_work(n):
    print(n)
    ds.to_netcdf(os.path.join(outdir, f'{n}.nc'))

tasks = range(10)

with Pool(processes=2) as pool:
    pool.map(do_work, tasks)

print('done')
```

Expected Output

The NetCDF copies in outdir, named 0.nc to 9.nc, should be created for both cases (with and without Dask).

Problem Description

In the case with Dask, i.e. when the if-condition evaluates to False and the else branch runs, the files are not created and the program hangs.
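One possible workaround, sketched below under the assumption that the hang comes from dask/HDF5 state inherited across fork() (this is not from the original report): load the dask-backed dataset into memory before creating the pool, so the workers only write plain in-memory data.

```python
# Workaround sketch (assumption: the hang is caused by dask/HDF5 state
# inherited across fork). Loading the data in the parent process means
# the forked workers never trigger dask computation themselves.
import os
from multiprocessing import Pool

import xarray as xr

url = "http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.1960.nc"
ds = xr.open_dataset(url, chunks={})
ds = ds.load()  # force computation here, before any fork happens

outdir = "/tmp"  # adjust to a writable directory

def do_work(n):
    # each worker writes a plain in-memory dataset; no dask involved
    ds.to_netcdf(os.path.join(outdir, f"{n}.nc"))

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        pool.map(do_work, range(10))
```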

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1127.13.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.7.3

xarray: 0.16.1
pandas: 1.1.1
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.27.0
distributed: 2.28.0
matplotlib: 3.3.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20200925
pip: 20.2.2
conda: None
pytest: None
IPython: 7.18.1
sphinx: None
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf() doesn't work with multiprocessing scheduler 567678992
573042086 https://github.com/pydata/xarray/issues/3681#issuecomment-573042086 https://api.github.com/repos/pydata/xarray/issues/3681 MDEyOklzc3VlQ29tbWVudDU3MzA0MjA4Ng== lvankampenhout 7933853 2020-01-10T13:48:09Z 2020-01-10T13:49:24Z NONE

Unfortunately, join='override' raises an IndexError.

```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-90-462267a327eb> in <module>
----> 1 ds6 = xr.concat((ds1,ds2), dim='time',join='override')

~/anaconda3/lib/python3.6/site-packages/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    131             "objects, got %s" % type(first_obj)
    132         )
--> 133     return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    134
    135

~/anaconda3/lib/python3.6/site-packages/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join)
    299     datasets = [ds.copy() for ds in datasets]
    300     datasets = align(
--> 301         *datasets, join=join, copy=False, exclude=[dim], fill_value=fill_value
    302     )
    303

~/anaconda3/lib/python3.6/site-packages/xarray/core/alignment.py in align(join, copy, indexes, exclude, fill_value, *objects)
    269
    270     if join == "override":
--> 271         objects = _override_indexes(objects, all_indexes, exclude)
    272
    273     # We don't reindex over dimensions with all equal indexes for two reasons:

~/anaconda3/lib/python3.6/site-packages/xarray/core/alignment.py in _override_indexes(objects, all_indexes, exclude)
     53         for dim in obj.dims:
     54             if dim not in exclude:
---> 55                 new_indexes[dim] = all_indexes[dim][0]
     56         objects[idx + 1] = obj._overwrite_indexes(new_indexes)
     57

IndexError: list index out of range
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  concat result not correct for particular dataset 548029687
573040403 https://github.com/pydata/xarray/issues/3681#issuecomment-573040403 https://api.github.com/repos/pydata/xarray/issues/3681 MDEyOklzc3VlQ29tbWVudDU3MzA0MDQwMw== lvankampenhout 7933853 2020-01-10T13:43:41Z 2020-01-10T13:44:15Z NONE

Thanks Tom. This indeed gives a dataset with the correct dimensions, but there is missing data.

I've also tried join='override' but this raises an IndexError.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  concat result not correct for particular dataset 548029687
573026476 https://github.com/pydata/xarray/issues/3681#issuecomment-573026476 https://api.github.com/repos/pydata/xarray/issues/3681 MDEyOklzc3VlQ29tbWVudDU3MzAyNjQ3Ng== lvankampenhout 7933853 2020-01-10T13:01:46Z 2020-01-10T13:01:46Z NONE

Good point: np.array_equal(ds1.lat, ds2.lat) yields False, whereas np.allclose() reported True.

How can compat='override' be used in this case? I tried ds3 = xr.concat((ds1, ds2), dim='time', compat='override', coords='minimal'), but this didn't work.
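One way around this, sketched below under the assumption (consistent with the np.allclose check above) that the two datasets' latitudes differ only by floating-point noise: overwrite one dataset's coordinate with the other's so the indexes become bit-identical before concatenating. The toy ds1/ds2 here are illustrative, not the original data.

```python
# Sketch: force bit-identical coordinates before concat.
import numpy as np
import xarray as xr

lat = np.linspace(-90, 90, 96)
ds1 = xr.Dataset({'t': ('lat', np.zeros(96))}, coords={'lat': lat, 'time': 0})
ds2 = xr.Dataset({'t': ('lat', np.ones(96))}, coords={'lat': lat + 1e-12, 'time': 1})

assert np.allclose(ds1.lat, ds2.lat)          # differences are noise-level only
ds2 = ds2.assign_coords(lat=ds1.lat.values)   # copy ds1's lat onto ds2
ds3 = xr.concat((ds1, ds2), dim='time')       # concat now sees equal indexes
```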

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  concat result not correct for particular dataset 548029687
488279851 https://github.com/pydata/xarray/issues/2932#issuecomment-488279851 https://api.github.com/repos/pydata/xarray/issues/2932 MDEyOklzc3VlQ29tbWVudDQ4ODI3OTg1MQ== lvankampenhout 7933853 2019-05-01T13:19:40Z 2019-05-01T13:19:40Z NONE

Thanks, I've implemented your suggestion as a workaround, but it fails with the following error:

```
NameError: name '_process_cmap_cbar_kwargs' is not defined
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Facetgrid: colors beyond range (extend) not saturated 438694589
470546895 https://github.com/pydata/xarray/issues/1005#issuecomment-470546895 https://api.github.com/repos/pydata/xarray/issues/1005 MDEyOklzc3VlQ29tbWVudDQ3MDU0Njg5NQ== lvankampenhout 7933853 2019-03-07T14:29:53Z 2019-03-07T14:29:53Z NONE

Stephan, thanks a lot for your code snippet from December; this is an elegant solution to the problem. One minor correction, though: I found that it fails to infer the period if none is given. The divide should be a multiplication, I believe, i.e.

```python
import xarray
import numpy as np

def add_cyclic_point(xarray_obj, dim, period=None):
    if period is None:
        period = xarray_obj.sizes[dim] * xarray_obj.coords[dim][:2].diff(dim).item()
    first_point = xarray_obj.isel({dim: slice(1)})
    first_point.coords[dim] = first_point.coords[dim] + period
    return xarray.concat([xarray_obj, first_point], dim=dim)
```
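For illustration, a usage sketch (the DataArray da and its degree-valued lon coordinate are assumed here, not taken from the thread):

```python
# Hypothetical usage: close the longitudinal gap of a global field on a
# 'lon' coordinate in degrees by appending a wrapped copy of the first point.
da_cyclic = add_cyclic_point(da, dim='lon', period=360.0)
```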

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How to efficiently use DataArrays with Cartopy's add_cyclic_point utility? 177484162
447034484 https://github.com/pydata/xarray/issues/1005#issuecomment-447034484 https://api.github.com/repos/pydata/xarray/issues/1005 MDEyOklzc3VlQ29tbWVudDQ0NzAzNDQ4NA== lvankampenhout 7933853 2018-12-13T16:34:13Z 2018-12-13T16:34:29Z NONE

Any update on this issue? It would be great if add_cyclic_point could be applied to all variables automatically.

Just for other people's reference, I now have this workaround, creating erai_jja_cy, a 'cyclic' version of erai_jja:

```python
dd, ll = add_cyclic_point(erai_jja.values, erai_jja.lon)
erai_jja_cy = xr.DataArray(dd, coords={'lat': erai_jja.lat, 'lon': ll}, dims=('lat', 'lon'))
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How to efficiently use DataArrays with Cartopy's add_cyclic_point utility? 177484162
392682701 https://github.com/pydata/xarray/issues/1270#issuecomment-392682701 https://api.github.com/repos/pydata/xarray/issues/1270 MDEyOklzc3VlQ29tbWVudDM5MjY4MjcwMQ== lvankampenhout 7933853 2018-05-29T07:41:53Z 2018-05-29T07:41:53Z NONE

Thanks for your elaborate response, @spencerkclark.

> Do you happen to be using a PeriodIndex because of pandas Timestamp limitations?

Yes, the main limitation being the limited range of years (~584), whereas my dataset spans 1800 years. Note that in glaciology, which deals with ice-sheet responses over multiple millennia, this is considered a short period.

I elaborated a bit more on my problem in this issue, which is in an unofficial repo, I realized too late.

Anyway, your code using cftime solves my problem 😄; resampling to 'AS-JUN' is indeed what I was looking for. Still, it would be nice to have better support for PeriodIndex in the future. It has cost me a lot of time to figure out what's going on and to learn the details of all the different date & time implementations, which is a waste in the end.
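For reference, a minimal sketch of that approach (the setup is assumed, not the original data: a synthetic 1800-year monthly series on a cftime 'noleap' calendar), resampling to June-start annual means:

```python
# Sketch: June–May annual means over a long cftime-indexed series.
import numpy as np
import xarray as xr

times = xr.cftime_range('0001-01-01', periods=1800 * 12, freq='MS',
                        calendar='noleap')
da = xr.DataArray(np.random.rand(len(times)),
                  coords={'time': times}, dims='time')

annual = da.resample(time='AS-JUN').mean()  # annual means, years starting in June
```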

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  BUG: Resample on PeriodIndex not working? 207862981
390892554 https://github.com/pydata/xarray/issues/1270#issuecomment-390892554 https://api.github.com/repos/pydata/xarray/issues/1270 MDEyOklzc3VlQ29tbWVudDM5MDg5MjU1NA== lvankampenhout 7933853 2018-05-22T07:36:40Z 2018-05-22T07:36:40Z NONE

+1 to this issue. I'm struggling big time with an 1800-year climate model dataset that I need to resample in order to make different annual means (June-May).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  BUG: Resample on PeriodIndex not working? 207862981
376810608 https://github.com/pydata/xarray/issues/1008#issuecomment-376810608 https://api.github.com/repos/pydata/xarray/issues/1008 MDEyOklzc3VlQ29tbWVudDM3NjgxMDYwOA== lvankampenhout 7933853 2018-03-28T08:49:24Z 2018-03-28T08:49:49Z NONE

I stumbled across the same problem in xarray 0.9.1, and updating to 0.10.2 solved it. Perhaps this issue can be closed?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How to avoid the auto convert variable dtype from float32 to float64 when read netCDF file use open_dataset? 177754433


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
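For reference, the row selection behind this page (10 rows where user = 7933853, sorted by updated_at descending) can be reproduced against a local copy of the database. A sketch using Python's sqlite3; the database filename is hypothetical:

```python
# Sketch: run this page's query against a local SQLite copy of the data.
import sqlite3

conn = sqlite3.connect('github.db')  # hypothetical local database file
rows = conn.execute(
    """
    SELECT id, issue_url, created_at, updated_at
    FROM issue_comments
    WHERE user = ?
    ORDER BY updated_at DESC
    LIMIT 10
    """,
    (7933853,),
).fetchall()
for row in rows:
    print(row)
conn.close()
```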