issues

7 rows where state = "closed" and user = 206773 sorted by updated_at descending

258500654 · #1576 · Variable of dtype int8 casted to float64
forman · xarray · closed (completed) · 11 comments · created 2017-09-18T14:28:32Z · updated 2020-11-09T07:06:31Z · closed 2020-11-09T07:06:30Z

I'm using a CF-compliant dataset from the ESA Land Cover CCI Project that contains a variable lccs_class with dtype=int8 and the attribute _Unsigned='true'. Its values are class numbers in the range 1 to 220. When I open the dataset with default options, the resulting dtype of that variable is float64. As the Land Cover maps are quite large (global, 300 m grid cells, 129600 x 64800), this produces considerable memory overhead.

```python
>>> ds = xr.open_dataset(path)
>>> ds['lccs_class'].dtype
dtype('float64')
```

If I switch off CF decoding I get the original data type.

```python
>>> ds = xr.open_dataset(path, decode_cf=False)
>>> ds['lccs_class'].dtype
dtype('int8')
```

I'd actually expect it to be converted to uint8 or int16 so that values above 127 are represented correctly.
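A minimal sketch of such a manual conversion, assuming the dataset was opened with decode_cf=False as above (path is a placeholder):

```python
import numpy as np
import xarray as xr

ds = xr.open_dataset(path, decode_cf=False)
# Reinterpret the signed bytes as unsigned, following the _Unsigned='true'
# convention; casting int8 -> uint8 in NumPy wraps modulo 256, so e.g.
# -126 becomes 130.
lccs_class = ds['lccs_class'].astype(np.uint8)
```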

The dataset is available here: ftp://anon-ftp.ceda.ac.uk/neodc/esacci/land_cover/data/land_cover_maps/v1.6.1/ESACCI-LC-L4-LCCS-Map-300m-P5Y-2010-v1.6.1.nc. Note the file is ~3 GB.

Btw, the attributes of the variable are

```python
>>> ds['lccs_class'].attrs
OrderedDict([('long_name', 'Land cover class defined in LCCS'),
             ('standard_name', 'land_cover_lccs'),
             ('flag_values',
              array([   0,   10,   11,   12,   20,   30,   40,   50,   60,   61,   62,
                       70,   71,   72,   80,   81,   82,   90,  100,  110,  120,  121,
                      122, -126, -116, -106, -104, -103,  -96,  -86,  -76,  -66,  -56,
                      -55,  -54,  -46,  -36], dtype=int8)),
             ('flag_meanings',
              'no_data cropland_rainfed cropland_rainfed_herbaceous_cover cropland_rainfed_tree_or_shrub_cover ...'),
             ('valid_min', array([1])),
             ('valid_max', array([220])),
             ('_Unsigned', 'true'),
             ('_FillValue', array([0], dtype=int8)),
             ('ancillary_variables',
              'processed_flag current_pixel_state observation_count algorithmic_confidence_level')])
```

146287030 · #819 · N-D rolling
forman · xarray · closed (completed) · 5 comments · created 2016-04-06T11:42:42Z · updated 2019-02-27T17:48:20Z · closed 2019-02-27T17:48:20Z

Dear xarray Team,

We just discovered xarray, and it seems to be a fantastic candidate to serve as a core library for the climate data toolbox we are about to implement. While investigating the API, we noticed that the windows kwargs in

```python
DataArray.rolling(min_periods=None, center=False, **windows)
```

is limited to a single dim=window_size entry. Are there any plans to support rolling in N-D? This could be very useful for efficient gap filling, filtering, or other methods that use grid-cell neighbourhoods in multiple dimensions.
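For reference, later xarray releases did add rolling over multiple dimensions; a minimal sketch with synthetic data (dimension names and sizes are illustrative):

```python
import numpy as np
import xarray as xr

# Synthetic 2-D field standing in for a lat/lon grid.
da = xr.DataArray(np.random.rand(100, 100), dims=("lat", "lon"))

# Rolling over multiple dimensions at once: a 3x3 neighbourhood mean,
# e.g. for simple smoothing or gap filling.
smoothed = da.rolling(lat=3, lon=3, center=True, min_periods=1).mean()
```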

Actually, I also asked myself why the groupby and resample methods don't take an N-D dim argument. This would allow performing not only temporal resampling but also spatial resampling in the lat/lon plane, or even spatio-temporal resampling (including up- and downsampling in either dimension).

Anyway, thanks for xarray!

Regards Norman

165540933 · #899 · Let open_mfdataset() respect cell boundary variables
forman · xarray · closed (completed) · 5 comments · created 2016-07-14T11:36:49Z · updated 2019-02-25T19:28:23Z · closed 2019-02-25T19:28:23Z

I recently faced a problem with open_mfdataset(): it concatenates variables that are actually used as auxiliary coordinate variables, namely the cell boundary variables 'time_bnds', 'lat_bnds' and 'lon_bnds' (see CF Conventions, 7.1. Cell Boundaries). open_mfdataset() will attach an extra 'time' dimension to 'lat_bnds' and 'lon_bnds' because they are seen as data variables rather than coordinate variables.

We could solve the problem by using the preprocess argument and turning these data variables into coordinate variables with ds.set_coords('lat_bnds', inplace=True).
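A minimal sketch of that preprocess workaround (the glob pattern is a placeholder; the variable names follow the files described above):

```python
import xarray as xr

def promote_bounds(ds):
    # Mark CF cell-boundary variables as coordinates so that
    # open_mfdataset() does not concatenate them along 'time'.
    present = [v for v in ("time_bnds", "lat_bnds", "lon_bnds") if v in ds]
    return ds.set_coords(present)

ds = xr.open_mfdataset("*.nc", preprocess=promote_bounds)
```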

However, it would be nice to prevent concatenation of variables that don't have the concat_dim, e.g. via a keyword argument such as selective_concat or respect_cell_bnds_vars.

146975644 · #822 · value scaling wrong in special cases
forman · xarray · closed (completed) · 13 comments · created 2016-04-08T16:29:33Z · updated 2019-02-19T02:11:31Z · closed 2019-02-19T02:11:31Z

For the same netCDF file as used in #821, the value scaling seems to be applied incorrectly when computing float64 surface temperature values from the (signed) short variable analysed_sst:

```
short analysed_sst(time=1, lat=3600, lon=7200);
  :_FillValue = -32768S; // short
  :units = "kelvin";
  :scale_factor = 0.01f; // float
  :add_offset = 273.15f; // float
  :long_name = "analysed sea surface temperature";
  :valid_min = -300S; // short
  :valid_max = 4500S; // short
  :standard_name = "sea_water_temperature";
  :depth = "20 cm";
  :source = "ATSR<1,2>-ESACCI-L3U-v1.0, AATSR-ESACCI-L3U-v1.0, AVHRR<12,14,15,16,17,18>_G-ESACCI-L2P-v1.0, AVHRRMTA-ESACCI-L2P-v1.0";
  :comment = "SST analysis produced for ESA SST CCI project using the OSTIA system in reanalysis mode.";
  :_ChunkSizes = 1, 1196, 2393; // int
```

Values are roughly -50 to 600 Kelvin instead of 270 to 310 Kelvin. It seems like the problem arises from misinterpreting the signed short raw values in the netCDF file.
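For reference, the CF unpacking rule these attributes imply is unpacked = scale_factor * packed + add_offset; a quick sanity check (the packed sample value is illustrative):

```python
import numpy as np

# CF unpacking: unpacked = scale_factor * packed + add_offset.
scale_factor = np.float32(0.01)
add_offset = np.float32(273.15)

packed = np.int16(2000)                    # illustrative raw short value
print(scale_factor * packed + add_offset)  # ~293.15 K, a plausible SST
```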

Here is a notebook that better explains the issue: https://github.com/CCI-Tools/sandbox/blob/4c7a98a4efd1ba55152d2799b499cb27027c2b45/notebooks/norman/xarray-sst-issues.ipynb

321553778 · #2109 · Dataset.expand_dims() not lazy
forman · xarray · closed (completed) · 2 comments · created 2018-05-09T12:39:44Z · updated 2018-05-09T15:45:31Z · closed 2018-05-09T15:45:31Z

The following won't come back for a very long time or will fail with an out-of-memory error:

```python
>>> ds = xr.open_dataset("D:\EOData\LC-CCI\ESACCI-LC-L4-LCCS-Map-300m-P1Y-2015-v2.0.8.nc")
>>> ds
<xarray.Dataset>
Dimensions:              (lat: 64800, lon: 129600)
Coordinates:
  * lat                  (lat) float32 89.9986 89.9958 89.9931 89.9903 ...
  * lon                  (lon) float32 -179.999 -179.996 -179.993 -179.99 ...
Data variables:
    change_count         (lat, lon) int8 ...
    crs                  int32 ...
    current_pixel_state  (lat, lon) int8 ...
    observation_count    (lat, lon) int16 ...
    processed_flag       (lat, lon) int8 ...
    lccs_class           (lat, lon) uint8 ...
Attributes:
    title:       ESA CCI Land Cover Map
    summary:     This dataset contains the global ESA CCI land...
    type:        ESACCI-LC-L4-LCCS-Map-300m-P1Y
    id:          ESACCI-LC-L4-LCCS-Map-300m-P1Y-2015-v2.0.7
    project:     Climate Change Initiative - European Space Ag...
    references:  http://www.esa-landcover-cci.org/
    ...
>>> ds_with_time = ds.expand_dims('time')
# Zzzzzzz...
```

Problem description

When I call Dataset.expand_dims('time') on one of my ~2 GB datasets (compressed), it seems to load all data into memory; memory consumption grows beyond 12 GB, eventually ending in an out-of-memory exception.

Expected Output

Dataset.expand_dims should execute lazily and quickly, without requiring considerable memory: adding a scalar time dimension should only affect indexing, not an array's memory layout. Array data should not be loaded into memory (through dask, zarr, etc.).
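A minimal sketch of a workaround consistent with that expectation: opening with dask chunks keeps expand_dims a metadata-only operation (the chunk sizes are illustrative, not tuned):

```python
import xarray as xr

# Open lazily via dask so expand_dims only touches metadata.
ds = xr.open_dataset(
    "ESACCI-LC-L4-LCCS-Map-300m-P1Y-2015-v2.0.8.nc",
    chunks={"lat": 3600, "lon": 3600},
)
ds_with_time = ds.expand_dims("time")  # lazy: adds a size-1 dimension
```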

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

xarray: 0.10.2
pandas: 0.20.3
numpy: 1.13.1
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: 0.5.0
h5py: 2.7.1
Nio: None
zarr: 2.2.0
bottleneck: 1.2.1
cyordereddict: None
dask: 0.15.2
distributed: 1.19.1
matplotlib: 2.1.1
cartopy: 0.16.0
seaborn: None
setuptools: 36.3.0
pip: 9.0.1
conda: None
pytest: 3.1.3
IPython: None
sphinx: None
```

258744901 · #1579 · Support for unsigned data
forman · xarray · closed (completed) · 3 comments · created 2017-09-19T08:57:15Z · updated 2017-09-21T15:46:30Z · closed 2017-09-20T13:15:36Z

The "old" NetCDF 3 format doesn't have explicit support for unsigned integer types and therefore a recommendation/convention exists to set the variable attribute _Unsigned='true', see NetCDF docs, section Unsigned Data.

Are there any plans to interpret the _Unsigned attribute?

I'd really like to help out, but I fear I still don't know enough about dask to provide an efficient PR for that.

My workaround is to manually convert the variables in question, which are of type int8 (same data as mentioned in #1576):

```python
unsigned_var = signed_int8_var & 0xff
```

This results in an int16, which is OK but still one byte per value more than the desired uint8.
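A minimal sketch of an alternative that avoids the int16 intermediate (the sample values are taken from the flag_values listed in #1576):

```python
import numpy as np

# Reinterpret the raw signed bytes as uint8 instead of masking to int16;
# the byte pattern is unchanged, so -126 -> 130 and -36 -> 220.
signed = np.array([0, 10, -126, -36], dtype=np.int8)
unsigned = signed.view(np.uint8)  # array([  0,  10, 130, 220], dtype=uint8)
```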

146908323 · #821 · datetime units interpretation wrong in special cases
forman · xarray · closed (completed) · 3 comments · created 2016-04-08T11:55:44Z · updated 2016-04-09T16:55:10Z · closed 2016-04-09T16:54:10Z

Hi there,

I have a datetime issue with a certain type of (CF-compliant!) netCDF files originating from the ESA CCI Sea Surface Temperature project. With other climate data, everything seems fine.

When I open such a netCDF file, the datetime value(s) of the time dimension seem to be wrong. If I do

```python
ds = xr.open_dataset(nc_path)
ds.analysed_sst
```

I get

```
<xarray.DataArray 'analysed_sst' (time: 1, lat: 3600, lon: 7200)>
[25920000 values with dtype=float64]
Coordinates:
  * time     (time) datetime64[ns] 1947-05-12T09:58:14
  * lat      (lat) float32 -89.975 -89.925 -89.875 -89.825 -89.775 -89.725 ...
  * lon      (lon) float32 -179.975 -179.925 -179.875 -179.825 -179.775 ...
Attributes:
    units: kelvin
    ...
```

The time dimension is

```
int time(time=1);
  :units = "seconds since 1981-01-01 00:00:00";
  :standard_name = "time";
  :axis = "T";
  :calendar = "gregorian";
  :bounds = "time_bnds";
  :comment = "";
  :long_name = "reference time of sst file";
  :_ChunkSizes = 1; // int
```

and the time value is 915192000. The correctly interpreted time value must therefore be 2010-01-01T12:00:00, i.e. 1981-01-01 00:00:00 plus 915192000 seconds.
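A quick check of that arithmetic:

```python
import numpy as np

# 915192000 seconds after the epoch from the units attribute.
epoch = np.datetime64("1981-01-01T00:00:00")
print(epoch + np.timedelta64(915192000, "s"))  # 2010-01-01T12:00:00
```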

Here is the link to the data: ftp://anon-ftp.ceda.ac.uk/neodc/esacci/sst/data/lt/Analysis/L4/v01.1/2010/01/01/20100101120000-ESACCI-L4_GHRSST-SSTdepth-OSTIA-GLOB_LT-v02.0-fv01.1.nc

I'm not sure whether this is actually a CF-specific issue that xarray doesn't want to deal with. If so, could you please give some advice on how to get around this? I'm sure other xarray lovers will face this issue sooner or later.

Thanks! -- Norman

