## #1440: If a NetCDF file is chunked on disk, open it with compatible dask chunks

*pydata/xarray · opened 2017-06-03 by a contributor · closed 2023-09-11 as completed · 26 comments · 2 👍*

NetCDF4 data can be saved as chunks on disk, which [has several benefits](https://www.unidata.ucar.edu/blogs/developer/entry/chunking_data_why_it_matters) including efficient reads when using a compatible chunk shape. This is particularly important for files with chunk-based compression (i.e. all nc4 files with compression) or on HPC and parallel file systems ([e.g.](http://anusf.anu.edu.au/~jxa900/pres/COMP4300-2015/jxa900-Parallel-IO-reduced.pdf)), where IO is typically dominated by the number of reads and chunks-from-disk are often cached. Caches are also common in network data backends such as Thredds OPeNDAP, in which case using disk-compatible chunks will reduce cache pressure as well as latency.

Xarray *can* use chunks, of course, but as of v0.9 the chunk size has to be specified manually - and the easiest way to discover it is to open the file and look at the `_ChunkSizes` attribute for each variable. I propose that `xr.open_dataset` (and `open_dataarray`, and `open_mfdataset`) change their default behaviour: if Dask is available and `chunks=None` (the default), `chunks` should be taken from the file on disk. This may lead to a chunked or unchunked dataset. To force an un-chunked load, users can specify `chunks={}`, or simply `.load()` the dataset after opening it.
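A minimal sketch of what this proposal would automate, done by hand with today's APIs: the `disk_chunks` helper and the `sample.nc` path are hypothetical, while netCDF4-python's `Variable.chunking()` is the real call that exposes the on-disk chunk shape.

```python
import netCDF4
import xarray as xr

def disk_chunks(path):
    """Map each dimension to an on-disk-compatible dask chunk length."""
    chunks = {}
    with netCDF4.Dataset(path) as nc:
        for var in nc.variables.values():
            sizes = var.chunking()  # per-dimension chunk lengths, or "contiguous"
            if isinstance(sizes, list):
                for dim, size in zip(var.dimensions, sizes):
                    # If variables disagree, fall back to the smallest chunk seen
                    chunks[dim] = min(size, chunks.get(dim, size))
    return chunks

# What chunks=None could do automatically under this proposal:
ds = xr.open_dataset("sample.nc", chunks=disk_chunks("sample.nc"))
```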
## #1846: Add a suite of property-based tests with Hypothesis

*pydata/xarray · opened 2018-01-21 by a contributor · still open · 3 comments*

[Hypothesis](https://hypothesis.readthedocs.io/en/master/) is a library for writing property-based tests in Python: you describe input data and make assertions that should be true for all examples, then Hypothesis tries to find a counterexample. This came up in #1840, because `data == decode(encode(data))` is a classic property.

We could add an (initially small) suite of property-based tests, to complement the traditional example-based tests Xarray is already using. Keeping them in independent files will ensure that they run in CI but the dependency on Hypothesis remains optional for local development.

I have moved jobs and don't have time to do this myself, but I'd be very happy to help anyone who does 😄
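For concreteness, here is a sketch of the round-trip property from #1840, assuming `xr.conventions.encode_cf_variable`/`decode_cf_variable` as the encode/decode pair; a real suite would also draw attrs, encodings, and more dtypes than plain float64.

```python
import numpy as np
import xarray as xr
from hypothesis import given
from hypothesis.extra import numpy as npst

# Property: CF-encoding then CF-decoding a variable is lossless.
@given(npst.arrays(dtype=np.float64, shape=npst.array_shapes(max_dims=3)))
def test_cf_encode_decode_roundtrip(data):
    original = xr.Variable([f"dim_{i}" for i in range(data.ndim)], data)
    encoded = xr.conventions.encode_cf_variable(original)
    decoded = xr.conventions.decode_cf_variable("var", encoded)
    xr.testing.assert_identical(original, decoded)
```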
## #2773: Feature request: show units in dataset overview

*pydata/xarray · opened 2019-02-18 by a contributor · closed 2021-05-14 as completed · 5 comments*

Here's a hypothetical dataset:

```
Dimensions:  (time: 3, x: 988, y: 822)
Coordinates:
  * x         (x) float64 ...
  * y         (y) float64 ...
  * time      (time) datetime64[ns] ...
Data variables:
    rainfall  (time, y, x) float32 ...
    max_temp  (time, y, x) float32 ...
```

It would be really nice if the units of the coordinates and of the data variables were shown in the `Dataset` repr, for example as:

```
Dimensions:  (time: 3, x: 988, y: 822)
Coordinates:
  * x, in metres        (x) float64 ...
  * y, in metres        (y) float64 ...
  * time                (time) datetime64[ns] ...
Data variables:
    rainfall, in mm     (time, y, x) float32 ...
    max_temp, in deg C  (time, y, x) float32 ...
```
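Under CF conventions the unit already travels in each variable's `attrs`, so a units-aware repr would only need to read it back. An illustrative setup (the names and values here are invented for the example):

```python
import numpy as np
import xarray as xr

# Units live in attrs; a units-aware repr would surface
# ds[name].attrs.get("units") next to each entry.
ds = xr.Dataset(
    {"rainfall": (("time", "y", "x"),
                  np.zeros((3, 822, 988), dtype=np.float32),
                  {"units": "mm"})},
    coords={"x": ("x", np.arange(988.0), {"units": "metres"})},
)
print(ds["rainfall"].attrs["units"])  # -> mm
```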
## #1566: When reporting errors, note what value was invalid and why

*pydata/xarray · opened 2017-09-11 by a contributor · closed 2019-08-19 as completed · 3 comments*

I've regularly had to debug problems with unusual or slightly broken data (or my misunderstanding of various layers of the software stack), and I can't be the only one. For example:

- `open_mfdataset` tries to open an invalid file. Which file? Why is it invalid? (one was truncated when the download crashed - I had to find it by size)
- Xarray can't convert a dtype. What dtype couldn't it convert? And of what variable? (it was a boolean mask)

And of course there are many more examples. [This manifesto](http://www.drmaciver.com/2013/03/a-rewritten-manifesto-for-error-reporting/) has some good advice, but in essence:

- Think about the information a new user will need to understand what has gone wrong and fix their code. It's good to be verbose here, because new users need this information most and experienced users won't see it anyway (or might be glad it's there on occasion!).
- Report:
  - The value that was invalid (or a summary that rules out validity; e.g. shape and dtype for arrays)
  - The operation that was attempted
  - Why the value was invalid in this operation
  - If possible, what the user can do to fix this

This is quite an open-ended issue; as well as the code changes it probably requires some process changes to ensure that new errors are equally helpful. Ultimately, the goal is for errors to become a positive aid to learning rather than a frustrating barrier.
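As an illustration of that checklist (the variable name, shapes, and suggested fix below are all invented), an error following it might read:

```python
# Hypothetical error following the checklist: it names the offending
# variable, summarises the value, states the operation that failed,
# explains why, and suggests a concrete fix.
raise TypeError(
    "Could not safely cast variable 'cloud_mask' (dtype bool, shape "
    "(4000, 4000)) to a NetCDF-compatible dtype during to_netcdf(). "
    "Boolean arrays have no native NetCDF equivalent; cast explicitly "
    "first, e.g. ds['cloud_mask'].astype('int8')."
)
```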
## #2775: Improved inference of names when concatenating arrays

*pydata/xarray · opened 2019-02-19 by a contributor · closed 2019-03-04 as completed · 1 comment*

#### Problem description

Using the name of the first element to concatenate as the name of the concatenated array is only correct if all names are identical. When names vary, using a clear placeholder name or the name of the new dimension would avoid misleading data users.

This came up for me recently when stacking several bands of a satellite image to produce a faceted plot - the resulting colorbar was labelled "blue", even though that was clearly incorrect. A similar process is probably also desirable for aggregation of units across concatenated arrays - use the first if identical, otherwise discard or error depending on the `compat` argument.

#### Code Sample, a copy-pastable example if possible

```python
ds = xr.Dataset({
    k: xr.DataArray(np.random.random((2, 2)), dims="x y".split(), name=k)
    for k in "blue green red".split()
})
# arr.name == "blue", could be "band" or "concat_dim"
arr = xr.concat([ds.blue, ds.green, ds.red], dim="band")
# label of colorbar is "blue", which is meaningless
arr.plot.imshow(col="band")
```

![image](https://user-images.githubusercontent.com/12229877/52989142-3276f780-3456-11e9-940a-dd97778736cb.png)

One implementation that would certainly be nice for this use-case (though perhaps not generally) is that concatenating `DataArray`s along an entirely new dimension, with unique array names and `dim` passed a string, could create a new Index as well, as `pd.Index([a.name for a in objs], name=dim)`.

#### Output of `xr.show_versions()`

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
libhdf5: 1.10.3
libnetcdf: 4.4.1.1
xarray: 0.11.2
pandas: 0.23.1
numpy: 1.14.5
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: 3.0.2
cartopy: None
seaborn: 0.9.0
setuptools: 40.6.2
pip: 10.0.1
conda: None
pytest: 4.2.0
IPython: 6.4.0
sphinx: 1.8.0
```

I'd be happy to write a PR for this if it would be accepted.
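Until names are inferred automatically, the labelling can be fixed at the call site by passing a `pandas.Index` as `dim`, which is essentially the implementation suggested above. This sketch reuses `ds` and the imports from the code sample; the `reflectance` name is just a placeholder.

```python
import pandas as pd

# Build the new dimension's index from the arrays' own names, so the
# "band" coordinate carries blue/green/red instead of being lost.
objs = [ds.blue, ds.green, ds.red]
arr = xr.concat(objs, dim=pd.Index([a.name for a in objs], name="band"))
arr.name = "reflectance"  # placeholder name instead of the misleading "blue"
arr.plot.imshow(col="band")
```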
## #1780: DataArray.plot raises exception if contents are all NaN

*pydata/xarray · opened 2017-12-14 by a contributor · closed 2017-12-15 as completed · 7 comments*

#### Code Sample, a copy-pastable example if possible

```python
xr.DataArray(np.full((2, 2), 0)).plot.imshow()       # works
xr.DataArray(np.full((2, 2), np.nan)).plot.imshow()  # doesn't
```

#### Problem description

If you try to plot a `DataArray` which is entirely filled with `NaN`, you get an exception. This is *really, really* annoying for people doing satellite image analysis, especially with timeseries where just one step might be very cloudy.

#### Expected Output

Plot of the array extent, entirely in the missing-value colour as for partially-missing data.

#### Output of `xr.show_versions()`

Confirmed on Windows/Linux/OSX, Xarray 0.9.6 and 0.10.0; sample `show_versions` below. It's a pretty obvious error though, in failing to handle the no-non-missing-data case when determining the data range.

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.10.3.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8
xarray: 0.10.0
pandas: 0.20.3
numpy: 1.13.3
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: None
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.15.3
matplotlib: 2.1.0
cartopy: None
seaborn: 0.8.0
setuptools: 36.5.0.post20170921
pip: 9.0.1
conda: 4.3.30
pytest: 3.2.1
IPython: 6.1.0
sphinx: 1.6.3
```
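The failure is presumably in the default colour-limit calculation. A hedged sketch of the guard (the function name is invented, not xarray's actual internals):

```python
import numpy as np

def calc_color_limits(data):
    """Return (vmin, vmax), falling back to a dummy range when every
    value is NaN instead of letting nanmin/nanmax raise."""
    finite = data[np.isfinite(data)]
    if finite.size == 0:
        return 0.0, 1.0  # arbitrary range; everything renders as missing
    return float(finite.min()), float(finite.max())
```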
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1780/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 237710101,MDU6SXNzdWUyMzc3MTAxMDE=,1462,Dataset.to_dataframe loads dask arrays into memory,12229877,closed,0,,,2,2017-06-22T01:46:30Z,2017-10-13T02:15:47Z,2017-10-13T02:15:47Z,CONTRIBUTOR,,,,"`to_dataframe` should return a Dask Dataframe, instead of eagerly loading data. This is probably pretty easy to implement ([thanks to dask](http://dask.pydata.org/en/latest/dataframe-api.html#dask.dataframe.from_dask_array)), but will require some care to ensure that no intermediate results (or indices!) are loaded. We should also check the `to_series` method.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1462/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 216329175,MDU6SXNzdWUyMTYzMjkxNzU=,1319,Truncate long lines in repr of Dataset.attrs,12229877,closed,0,,,5,2017-03-23T07:21:01Z,2017-04-03T00:47:45Z,2017-04-03T00:47:45Z,CONTRIBUTOR,,,,"When loading from NetCDF, `Dataset.attrs` often has a few long strings, which may even have embedded newlines (eg a multi-paragraph `summary` or `references` section). It's lovely that these are available, but they tend to make the repr very long and poorly formatted - to the point that many Jupyter notebooks begin by discarding the `attrs`, which makes it rather pointless to store or display metadata at all! Given that these values are already truncated at 500 characters (including the indicative `...`, but not the start point), I propose that they should instead be truncated to 80 characters including the indentation and key (as `values` are). For the sake of pretty-printing, this should also replace newlines or tabs with spaces and truncate early if an empty line is encountered. Another solution would be add appropriate indentation following newlines or wrapping, so that the structure remains clear. However, I think that it is better to print a fairly minimal representation of the metadata by default. ``` >>> xr.open_dataset('http://dapds00.nci.org.au/thredds/dodsC/uc0/rs0_dev/20170215-stacked_sample/LS7_ETM_NBART_3577_15_-40.ncml') Dimensions: (time: 246, x: 4000, y: 4000) Coordinates: * y (y) float64 -3.9e+06 -3.9e+06 -3.9e+06 -3.9e+06 -3.9e+06 ... * x (x) float64 1.5e+06 1.5e+06 1.5e+06 1.5e+06 1.5e+06 1.5e+06 ... * time (time) datetime64[ns] 1999-07-16T23:49:39 1999-07-25T23:43:07 ... Data variables: crs int32 ... blue (time, y, x) float64 ... green (time, y, x) float64 ... red (time, y, x) float64 ... nir (time, y, x) float64 ... swir1 (time, y, x) float64 ... swir2 (time, y, x) float64 ... 
## #1319: Truncate long lines in repr of Dataset.attrs

*pydata/xarray · opened 2017-03-23 by a contributor · closed 2017-04-03 as completed · 5 comments · 1 👍*

When loading from NetCDF, `Dataset.attrs` often has a few long strings, which may even have embedded newlines (e.g. a multi-paragraph `summary` or `references` section). It's lovely that these are available, but they tend to make the repr very long and poorly formatted - to the point that many Jupyter notebooks begin by discarding the `attrs`, which makes it rather pointless to store or display metadata at all!

Given that these values are already truncated at 500 characters (including the indicative `...`, but not the start point), I propose that they should instead be truncated to 80 characters including the indentation and key (as `values` are). For the sake of pretty-printing, this should also replace newlines or tabs with spaces and truncate early if an empty line is encountered.

Another solution would be to add appropriate indentation following newlines or wrapping, so that the structure remains clear. However, I think that it is better to print a fairly minimal representation of the metadata by default.

```
>>> xr.open_dataset('http://dapds00.nci.org.au/thredds/dodsC/uc0/rs0_dev/20170215-stacked_sample/LS7_ETM_NBART_3577_15_-40.ncml')
Dimensions:  (time: 246, x: 4000, y: 4000)
Coordinates:
  * y        (y) float64 -3.9e+06 -3.9e+06 -3.9e+06 -3.9e+06 -3.9e+06 ...
  * x        (x) float64 1.5e+06 1.5e+06 1.5e+06 1.5e+06 1.5e+06 1.5e+06 ...
  * time     (time) datetime64[ns] 1999-07-16T23:49:39 1999-07-25T23:43:07 ...
Data variables:
    crs      int32 ...
    blue     (time, y, x) float64 ...
    green    (time, y, x) float64 ...
    red      (time, y, x) float64 ...
    nir      (time, y, x) float64 ...
    swir1    (time, y, x) float64 ...
    swir2    (time, y, x) float64 ...
Attributes:
    date_created: 2017-03-07T11:57:26.511217
    Conventions: CF-1.6, ACDD-1.3
    history: 2017-03-07T11:57:26.511307+11:00 adh547 datacube-ncml (1.2.2+23.gd1f3512.dirty) ls7_nbart_albers.yaml, 1.0.6a, /short/v10/datacube/002/LS7_ETM_NBART/LS7_ETM_NBART_3577_15_-40.ncml, (15, -40) # Created NCML file to aggregate multiple NetCDF files along the time dimension
    geospatial_bounds: POLYGON ((148.49626113888138 -34.828378308133452,148.638689676063308 -35.720318326735864,149.734176111491877 -35.599556747691196,149.582601578289143 -34.708911907843387,148.49626113888138 -34.828378308133452))
    geospatial_bounds_crs: EPSG:4326
    geospatial_lat_min: -35.7203183267
    geospatial_lat_max: -34.7089119078
    geospatial_lat_units: degrees_north
    geospatial_lon_min: 148.496261139
    geospatial_lon_max: 149.734176111
    geospatial_lon_units: degrees_east
    comment: - Ground Control Points (GCP): new GCP chips released by USGS in Dec 2015 are used for re-processing - Geometric QA: each product undergoes geometric assessment and the assessment result will be recorded within v2 AGDC for filtering/masking purposes. - Processing parameter settings: the minimum number of GCPs for Ortho-rectified product generation has been reduced from 30 to 10. - DEM: 1 second SRTM DSM is used for Ortho-rectification. - Updated Calibration Parameter File (CPF): the latest/cu...
    product_suite: Surface Reflectance NBAR+T 25m
    publisher_email: earth.observation@ga.gov.au
    keywords_vocabulary: GCMD
    product_version: 2
    cdm_data_type: Grid
    references: - Berk, A., Anderson, G.P., Acharya, P.K., Hoke, M.L., Chetwynd, J.H., Bernstein, L.S., Shettle, E.P., Matthew, M.W., and Adler-Golden, S.M. (2003) Modtran 4 Version 3 Revision 1 User s manual. Airforce Research Laboratory, Hanscom, MA, USA. - Chander, G., Markham, B.L., and Helder, D.L. (2009) Summary of current radiometric calibration coefficients for Landsat MSS, TM, ETM+, and EO-1 ALI sensors. Remote Sensing of Environment 113, 893-903. - Edberg, R., and Oliver, S. (2013) Projection-Indep...
    platform: LANDSAT-7
    keywords: AU/GA,NASA/GSFC/SED/ESD/LANDSAT,REFLECTANCE,ETM+,TM,OLI,EARTH SCIENCE
    publisher_name: Section Leader, Operations Section, NEMO, Geoscience Australia
    institution: Commonwealth of Australia (Geoscience Australia)
    acknowledgment: Landsat data is provided by the United States Geological Survey (USGS) through direct reception of the data at Geoscience Australias satellite reception facility or download.
    license: CC BY Attribution 4.0 International License
    title: Surface Reflectance NBAR+T 25 v2
    summary: Surface Reflectance (SR) is a suite of Earth Observation (EO) products from GA. The SR product suite provides standardised optical surface reflectance datasets using robust physical models to correct for variations in image radiance values due to atmospheric properties, and sun and sensor geometry. The resulting stack of surface reflectance grids are consistent over space and time which is instrumental in identifying and quantifying environmental change. SR is based on radiance data from the...
    instrument: ETM
    source: LANDSAT 7 ETM+ surface observation
    publisher_url: http://www.ga.gov.au
```
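A sketch of the proposed truncation rule (the helper name and exact width handling are illustrative, not xarray's formatting internals):

```python
def summarize_attr(key, value, width=80):
    """One-line summary of an attribute, truncated to `width` characters
    including the indentation and key, as proposed above."""
    # Truncate early at the first blank line, then collapse any
    # remaining newlines/tabs into single spaces.
    first_para = str(value).split("\n\n")[0]
    flat = " ".join(first_para.split())
    line = f"    {key}: {flat}"
    return line if len(line) <= width else line[: width - 3] + "..."
```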