issues


21 rows where user = 12229877 sorted by updated_at descending


type

  • pull 13
  • issue 8

state

  • closed 20
  • open 1

repo

  • xarray 21
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
233350060 MDU6SXNzdWUyMzMzNTAwNjA= 1440 If a NetCDF file is chunked on disk, open it with compatible dask chunks Zac-HD 12229877 closed 0     26 2017-06-03T06:24:38Z 2023-09-12T14:55:37Z 2023-09-11T23:05:50Z CONTRIBUTOR      

NetCDF4 data can be saved as chunks on disk, which has several benefits, including efficient reads when using a compatible chunk shape. This is particularly important for files with chunk-based compression (i.e. all nc4 files with compression) and on HPC or parallel file systems, where IO is typically dominated by the number of reads and chunks read from disk are often cached. Caches are also common in network data backends such as THREDDS OPeNDAP, in which case using disk-compatible chunks will reduce cache pressure as well as latency.

Xarray can use chunks, of course, but as of v0.9 the chunk size has to be specified manually - and the easiest way to discover it is to open the file and look at the _ChunkSizes attribute of each variable. I propose that xr.open_dataset (and open_dataarray, and open_mfdataset) change their default behaviour.

If Dask is available and chunks=None (the default), chunks should be taken from the file on disk. This may lead to a chunked or unchunked dataset. To force an un-chunked load, users can specify chunks={}, or simply .load() the dataset after opening it.
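
In the meantime, the on-disk chunk shape can be looked up by hand and passed through. A minimal sketch, assuming a hypothetical data.nc with a variable precip, and dask installed:

```python
import netCDF4
import xarray as xr

# Look up the on-disk chunk shape of one variable, then reuse it as the
# dask chunking when opening the dataset. File and variable names are
# hypothetical.
with netCDF4.Dataset("data.nc") as nc:
    var = nc.variables["precip"]
    sizes = var.chunking()  # a list of ints, or the string "contiguous"
    chunks = (dict(zip(var.dimensions, sizes))
              if sizes != "contiguous" else None)

ds = xr.open_dataset("data.nc", chunks=chunks)
```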

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1440/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
290244473 MDU6SXNzdWUyOTAyNDQ0NzM= 1846 Add a suite of property-based tests with Hypothesis Zac-HD 12229877 open 0     3 2018-01-21T03:46:42Z 2022-08-12T17:47:13Z   CONTRIBUTOR      

Hypothesis is a library for writing property-based tests in Python: you describe input data and make assertions that should be true for all examples, then Hypothesis tries to find a counterexample. This came up in #1840, because data == decode(encode(data)) is a classic property.
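
As a sketch of what such a test could look like, here is the round-trip property with a toy scale/offset coder standing in for xarray's CF coders (the real suite would target those instead):

```python
import numpy as np
import hypothesis.strategies as st
from hypothesis import given
from hypothesis.extra.numpy import arrays

# Toy coder: exact in float64 for all int16 inputs, since the scale is a
# power of two. Stands in for the real CF encode/decode functions.
def encode(data):
    return (data - 5.0) / 2.0

def decode(data):
    return data * 2.0 + 5.0

@given(arrays(dtype=np.int16, shape=st.integers(min_value=1, max_value=10)))
def test_roundtrip(data):
    # The property: decoding an encoded array reproduces the original.
    np.testing.assert_array_equal(decode(encode(data)), data)
```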

We could add an (initially small) suite of property-based tests to complement the traditional example-based tests Xarray already uses. Keeping them in independent files will ensure that they run in CI while the dependency on Hypothesis remains optional for local development.

I have moved jobs and don't have time to do this myself, but I'd be very happy to help anyone who does 😄

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1846/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
411365882 MDU6SXNzdWU0MTEzNjU4ODI= 2773 Feature request: show units in dataset overview Zac-HD 12229877 closed 0     5 2019-02-18T08:57:44Z 2021-05-14T21:16:04Z 2021-05-14T21:16:04Z CONTRIBUTOR      

Here's a hypothetical dataset:

```
<xarray.Dataset>
Dimensions:  (time: 3, x: 988, y: 822)
Coordinates:
  * x         (x) float64 ...
  * y         (y) float64 ...
  * time      (time) datetime64[ns] ...
Data variables:
    rainfall  (time, y, x) float32 ...
    max_temp  (time, y, x) float32 ...
```

It would be really nice if the units of the coordinates and of the data variables were shown in the Dataset repr, for example as:

```
<xarray.Dataset>
Dimensions:  (time: 3, x: 988, y: 822)
Coordinates:
  * x, in metres        (x) float64 ...
  * y, in metres        (y) float64 ...
  * time                (time) datetime64[ns] ...
Data variables:
    rainfall, in mm     (time, y, x) float32 ...
    max_temp, in deg C  (time, y, x) float32 ...
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2773/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
256557897 MDU6SXNzdWUyNTY1NTc4OTc= 1566 When reporting errors, note what value was invalid and why Zac-HD 12229877 closed 0     3 2017-09-11T01:25:44Z 2019-08-19T06:50:15Z 2019-08-19T06:50:15Z CONTRIBUTOR      

I've regularly had to debug problems with unusual or slightly broken data - or with my misunderstanding of various layers of the software stack - and I can't be the only one. For example:

  • open_mfdataset tries to open an invalid file. Which file? Why is it invalid?
    (one was truncated when the download crashed - I had to find it by size)
  • Xarray can't convert a dtype. What dtype couldn't it convert? And of what variable? (it was a boolean mask)

And of course there are many more examples. This manifesto has some good advice, but in essence:

  • Think about the information a new user will need to understand what has gone wrong and fix their code. It's good to be verbose here, because new users need this information most and experienced users won't see it anyway (or might be glad it's there on occasion!).
  • Report:
      • the value that was invalid (or a summary that rules out validity, e.g. shape and dtype for arrays)
      • the operation that was attempted
      • why the value was invalid for this operation
      • if possible, what the user can do to fix this

This is quite an open-ended issue; as well as the code changes it probably requires some process changes to ensure that new errors are equally helpful. Ultimately, the goal is for errors to become a positive aid to learning rather than a frustrating barrier.
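
To make the checklist concrete, here is a sketch of a report for the dtype example above; the function name and message wording are hypothetical, not anything xarray currently raises:

```python
import numpy as np

# Report the invalid value, the attempted operation, why it failed, and a fix.
def require_numeric(name, arr):
    if not np.issubdtype(arr.dtype, np.number):
        raise TypeError(
            f"Cannot convert variable {name!r}: expected a numeric dtype, "
            f"got {arr.dtype} with shape {arr.shape}. If this is a boolean "
            "mask, cast it explicitly first, e.g. arr.astype('u1')."
        )
```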

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1566/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
411734784 MDU6SXNzdWU0MTE3MzQ3ODQ= 2775 Improved inference of names when concatenating arrays Zac-HD 12229877 closed 0     1 2019-02-19T04:01:03Z 2019-03-04T05:39:21Z 2019-03-04T05:39:21Z CONTRIBUTOR      

Problem description

Using the name of the first element to concatenate as the name of the concatenated array is only correct if all names are identical. When names vary, using a clear placeholder name or the name of the new dimension would avoid misleading data users.

This came up for me recently when stacking several bands of a satellite image to produce a faceted plot - the resulting colorbar was labelled "blue", even though that was clearly incorrect.

A similar process is probably also desirable for aggregating units across concatenated arrays - use the first if all are identical, otherwise discard them or raise an error depending on the compat argument.

Code Sample, a copy-pastable example if possible

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({
    k: xr.DataArray(np.random.random((2, 2)), dims="x y".split(), name=k)
    for k in "blue green red".split()
})

# arr.name == "blue", but could be "band" or "concat_dim"
arr = xr.concat([ds.blue, ds.green, ds.red], dim="band")

# label of colorbar is "blue", which is meaningless
arr.plot.imshow(col="band")
```

One implementation that would certainly be nice for this use-case (though perhaps not generally) is that concatenating DataArrays along an entirely new dimension, with unique array names and dim passed as a string, could also create a new index: pd.Index([a.name for a in objs], name=dim).

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
libhdf5: 1.10.3
libnetcdf: 4.4.1.1
xarray: 0.11.2
pandas: 0.23.1
numpy: 1.14.5
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: 3.0.2
cartopy: None
seaborn: 0.9.0
setuptools: 40.6.2
pip: 10.0.1
conda: None
pytest: 4.2.0
IPython: 6.4.0
sphinx: 1.8.0
```

I'd be happy to write a PR for this if it would be accepted.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2775/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
411755105 MDExOlB1bGxSZXF1ZXN0MjU0MTIyNTUw 2777 Improved default behavior when concatenating DataArrays Zac-HD 12229877 closed 0     14 2019-02-19T05:43:44Z 2019-03-03T22:20:01Z 2019-03-03T22:20:01Z CONTRIBUTOR   0 pydata/xarray/pulls/2777
  • [x] Closes #2775
  • [x] Tests added
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

This is really nice to have when producing faceted plots of satellite observations in various bands, and should be somewhere between useful and harmless in other cases.

Example code:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({
    k: xr.DataArray(np.random.random((2, 2)), dims="x y".split(), name=k)
    for k in "blue green red".split()
})
xr.concat([ds.blue, ds.green, ds.red], dim="band").plot.imshow(col="band")
```

Before - facets have an index, colorbar has misleading label:

After - facets have meaningful labels, colorbar has no label:

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2777/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
364247513 MDExOlB1bGxSZXF1ZXN0MjE4NDgxMjkz 2442 Use Hypothesis profile mechanism, not no-op mutation Zac-HD 12229877 closed 0     2 2018-09-26T23:14:33Z 2018-09-27T00:35:46Z 2018-09-26T23:47:27Z CONTRIBUTOR   0 pydata/xarray/pulls/2442

Closes #2441 - Hypothesis 3.72.0 turned a common no-op into an explicit error. Apparently this was such a common misunderstanding that I had done it too :disappointed:

Anyway: while the deadline setting hasn't actually been applied until now, I've translated it into the correct form rather than deleting it, to avoid flaky tests if the Travis VM is slow.
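
For reference, the profile mechanism looks like this sketch; the profile name and deadline value are illustrative:

```python
from hypothesis import settings

# Register a named settings profile and activate it, instead of mutating
# settings in place (the no-op that Hypothesis 3.72.0 now rejects).
settings.register_profile("ci", deadline=500)  # deadline in milliseconds
settings.load_profile("ci")
```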

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2442/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
289853579 MDExOlB1bGxSZXF1ZXN0MTYzODc5NTc3 1840 Read small integers as float32, not float64 Zac-HD 12229877 closed 0     4 2018-01-19T03:40:51Z 2018-04-19T02:50:25Z 2018-01-23T20:15:29Z CONTRIBUTOR   0 pydata/xarray/pulls/1840
  • [x] Closes #1842
  • [x] Tests added
  • [x] Tests passed
  • [x] Passes flake8 xarray (now part of tests)
  • [x] Fully documented, including whats-new.rst for all changes

Most satellites produce images with color depth in the range of eight to sixteen bits, which are therefore often stored as unsigned integers (with the quality mask in another variable). If you're lucky, they also have a scale_factor attribute and Xarray can automatically convert the integers to floats representing albedo.

This is fantastically convenient, and avoids all the bit-depth bugs from misremembered specifications. However, loading data as float64 when float32 is sufficient doubles memory usage in IO (even on multi-TB datasets...). While immediately downcasting helps, it's no substitute for doing the right thing first.

So this patch does some conservative checks, and if we can be sure float32 is safe we use that instead.
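
As a sketch of the kind of conservative check involved (not necessarily the exact rule in this patch): integers of up to 16 bits fit exactly in float32's 24-bit significand, so promoting them to float32 loses nothing:

```python
import numpy as np

# Hypothetical helper: pick the smallest safe float dtype for decoded values.
def choose_float_dtype(dtype):
    if dtype.kind in "iu" and dtype.itemsize <= 2:
        return np.float32  # 8- and 16-bit ints are exactly representable
    return np.float64      # anything wider stays float64 to be safe
```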

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1840/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
303103716 MDExOlB1bGxSZXF1ZXN0MTczNDU1NzQz 1972 Starter property-based test suite Zac-HD 12229877 closed 0     15 2018-03-07T13:45:07Z 2018-03-20T12:51:28Z 2018-03-20T12:40:12Z CONTRIBUTOR   0 pydata/xarray/pulls/1972
  • [x] Closes #1846
  • [x] Tests added - you bet
  • [x] Tests passed - well, the code under test hasn't changed...

This is a small property-based test suite, to give two examples of the kinds of tests that we could write for Xarray using Hypothesis.

  1. For any array, encoding and decoding it with a CF coder outputs an identical array. As you would hope, these tests pass.
  2. For any 2D array, you can call the 2D plotting methods without raising an exception. Alas, this is not the case, and Hypothesis will show you the failing inputs (and matplotlib-related tracebacks) to prove it. (Contributing a very small feature to matplotlib was shockingly painful, so I'm not planning to take a similar suite upstream myself unless something changes)

Things that I would like to know:

  • Have I build-wrangled something reasonable here?
  • Will anyone else contribute property-based tests? I'm happy to help people debug or work out how to test something, but I simply don't have the time to write another test suite for free.
  • Is this something you want?
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1972/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
302695966 MDExOlB1bGxSZXF1ZXN0MTczMTU0MTQ5 1967 Fix RGB imshow with X or Y dim of size one Zac-HD 12229877 closed 0     7 2018-03-06T13:14:04Z 2018-03-09T01:49:08Z 2018-03-08T23:51:45Z CONTRIBUTOR   0 pydata/xarray/pulls/1967
  • [x] Closes #1966
  • [x] Tests added (for all bug fixes or enhancements)
  • [x] Tests passed (for all non-documentation changes)
  • [x] Fully documented, including whats-new.rst for all changes

Not much more to say, really. Thanks to @fmaussion for pinging me - definitely faster to track down when you know the code!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1967/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
295055292 MDExOlB1bGxSZXF1ZXN0MTY3NjMyMDY1 1893 Use correct dtype for RGB image alpha channel Zac-HD 12229877 closed 0     4 2018-02-07T09:00:33Z 2018-02-14T05:42:15Z 2018-02-12T22:12:13Z CONTRIBUTOR   0 pydata/xarray/pulls/1893
  • [x] Closes #1880
  • [x] Tests added (for all bug fixes or enhancements)
  • [ ] Tests passed (for all non-documentation changes)
  • [x] Fully documented (bugfix for earlier change, no additional note)

The cause of the bug in #1880 was that I had forgotten to specify the dtype when creating an alpha channel, so concatenating it cast all the data to float64. I've fixed that, corrected the alpha value for integer arrays, and avoided a pointless copy to save memory.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1893/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
282369945 MDExOlB1bGxSZXF1ZXN0MTU4NTU5OTM4 1787 Include units (if set) in plot labels Zac-HD 12229877 closed 0     7 2017-12-15T09:40:16Z 2018-02-05T04:01:16Z 2018-02-05T04:01:16Z CONTRIBUTOR   0 pydata/xarray/pulls/1787
  • [x] Closes #1630
  • [x] Tests passed
  • [x] Passes git diff upstream/master **/*py | flake8 --diff
  • [x] Fully documented, including whats-new.rst for all changes - details of label not previously documented
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1787/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
287747803 MDExOlB1bGxSZXF1ZXN0MTYyMzUzNzQ4 1819 Normalisation for RGB imshow Zac-HD 12229877 closed 0     6 2018-01-11T11:09:12Z 2018-01-19T05:01:19Z 2018-01-19T05:01:07Z CONTRIBUTOR   0 pydata/xarray/pulls/1819

Follow-up to #1796, where normalisation and clipping of RGB[A] values were deferred so that we could match any upstream API. matplotlib/matplotlib#10220 implements clipping to the valid range, but a strong consensus against RGB normalisation in matplotlib has emerged.

This pull therefore implements normalisation, and clips values only where our normalisation has pushed them out of range.
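
A sketch of the normalise-then-clip step, with hypothetical vmin/vmax parameters (the real code lives in the plotting internals):

```python
import numpy as np

# Rescale data to [0, 1] from (vmin, vmax), then clip only the values that
# the rescaling itself pushed out of range.
def normalize_rgb(arr, vmin, vmax):
    arr = (np.asarray(arr, dtype=np.float64) - vmin) / (vmax - vmin)
    return np.clip(arr, 0, 1)
```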

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1819/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
288322322 MDExOlB1bGxSZXF1ZXN0MTYyNzc2ODAx 1824 Make `flake8 xarray` pass Zac-HD 12229877 closed 0     3 2018-01-13T11:37:43Z 2018-01-14T23:10:01Z 2018-01-14T20:49:20Z CONTRIBUTOR   0 pydata/xarray/pulls/1824

Closes #1741 by @mrocklin (who did most of the work I'm presenting here). I had an evening free, so I rebased the previous pull on master, fixed the conflicts, and then made everything pass with flake8's default settings (including line length). My condolences to whoever gets to review this diff!

The single change any non-pedant will notice: Travis now fails if there is a flake8 warning anywhere. My experience in other projects is that this is the only way to actually keep flake8 passing - it's just unrealistic to expect perfect attention to detail from every contributor, but "make the build green before we merge" is widely understood 😄

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1824/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
283566613 MDExOlB1bGxSZXF1ZXN0MTU5NDE5NjYw 1796 Support RGB[A] arrays in plot.imshow() Zac-HD 12229877 closed 0     16 2017-12-20T13:43:16Z 2018-01-11T03:20:02Z 2018-01-11T03:14:36Z CONTRIBUTOR   0 pydata/xarray/pulls/1796
  • [x] Tests added (for all bug fixes or enhancements)
  • [x] Tests passed (for all non-documentation changes)
  • [x] Passes git diff upstream/master **/*py | flake8 --diff
  • [x] Fully documented, including whats-new.rst for all changes

This patch brings xarray.plot.imshow up to parity with matplotlib.pyplot.imshow:

  • As well as 2D images (greyscale / luminance, using a colormap), it now supports a third dimension for RGB or RGBA channels. For consistency with 2D arrays, missing data is plotted as transparent pixels
  • Being Xarray, users need not care about the order of their dimensions - we infer the right one for color, and warn if it's ambiguous.
  • ~~Using robust=True for easy saturation is really nice. Having it adjust each channel and facet in the same way is essential for this to work, which it does.~~
  • ~~Matplotlib wraps out-of-range colors, leading to crazy maps and serious interpretation problems if it's only a small region. Xarray clips (ie saturates) to the valid range instead.~~

I'm going to implement clip-to-range and color normalization upstream in matplotlib, then open a second PR here so that Xarray can use the same interface.

And that's the commit log! It's not really a big feature, but each of the parts can be fiddly so I've broken the commits up logically 😄

Finally, a motivating example: visible-light Landsat data before, during (top-right), and after a fire at Sampson's Flat, Australia:

```python
arr = ds['red green blue'.split()].to_array(dim='band') / (2 ** 12)
arr.plot.imshow(col='time', col_wrap=5, robust=True)
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1796/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
282087995 MDExOlB1bGxSZXF1ZXN0MTU4MzQ3NTU2 1782 Plot nans Zac-HD 12229877 closed 0     3 2017-12-14T12:43:01Z 2017-12-15T21:10:13Z 2017-12-15T17:31:39Z CONTRIBUTOR   0 pydata/xarray/pulls/1782
  • [x] Closes #1780
  • [x] Tests added (for all bug fixes or enhancements)
  • [x] Tests passed (for all non-documentation changes)
  • [x] Passes git diff upstream/master **/*py | flake8 --diff (remove if you did not edit any Python files)
  • [x] Fully documented, including whats-new.rst for all changes

CC @fmaussion for review; @BexDunn for interest

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1782/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
282000017 MDU6SXNzdWUyODIwMDAwMTc= 1780 DataArray.plot raises exception if contents are all NaN Zac-HD 12229877 closed 0     7 2017-12-14T06:58:38Z 2017-12-15T17:31:39Z 2017-12-15T17:31:39Z CONTRIBUTOR      

Code Sample, a copy-pastable example if possible

```python
import numpy as np
import xarray as xr

xr.DataArray(np.full((2, 2), 0)).plot.imshow()       # works
xr.DataArray(np.full((2, 2), np.nan)).plot.imshow()  # doesn't
```

Problem description

If you try to plot a DataArray which is entirely filled with NaN, you get an exception. This is really, really annoying for people doing satellite image analysis, especially with timeseries where just one step might be very cloudy.

Expected Output

Plot of the array extent, entirely in the missing-value colour as for partially-missing data.

Output of xr.show_versions()

Confirmed on Windows/Linux/OSX, with Xarray 0.9.6 and 0.10.0; sample show_versions output below. It's a pretty obvious error, though: the data-range calculation fails to handle the case where no data is non-missing.

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.10.3.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8
xarray: 0.10.0
pandas: 0.20.3
numpy: 1.13.3
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: None
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.15.3
matplotlib: 2.1.0
cartopy: None
seaborn: 0.8.0
setuptools: 36.5.0.post20170921
pip: 9.0.1
conda: 4.3.30
pytest: 3.2.1
IPython: 6.1.0
sphinx: 1.6.3
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1780/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
268011986 MDExOlB1bGxSZXF1ZXN0MTQ4MzgxNzE1 1653 Minor documentation fixes Zac-HD 12229877 closed 0     1 2017-10-24T12:28:07Z 2017-10-25T03:47:25Z 2017-10-25T03:47:18Z CONTRIBUTOR   0 pydata/xarray/pulls/1653

This pull updates the comparison between Xarray and Pandas ND-Panels, fixes the zenodo links, and improves our configuration for the docs build. Closes #1541.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1653/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
237710101 MDU6SXNzdWUyMzc3MTAxMDE= 1462 Dataset.to_dataframe loads dask arrays into memory Zac-HD 12229877 closed 0     2 2017-06-22T01:46:30Z 2017-10-13T02:15:47Z 2017-10-13T02:15:47Z CONTRIBUTOR      

to_dataframe should return a Dask DataFrame instead of eagerly loading data. This is probably pretty easy to implement (thanks to dask), but it will require some care to ensure that no intermediate results (or indices!) are loaded. We should also check the to_series method.
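
A minimal sketch of the behaviour being reported, assuming dask is installed:

```python
import numpy as np
import xarray as xr

# Even a chunked (lazy, dask-backed) dataset is computed as soon as
# to_dataframe() is called; the result is a fully in-memory pandas object.
ds = xr.Dataset({"a": ("x", np.arange(10))}).chunk({"x": 5})
df = ds.to_dataframe()  # loads the dask array eagerly
```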

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1462/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
216611104 MDExOlB1bGxSZXF1ZXN0MTEyMzY1ODc0 1322 Shorter repr for attributes Zac-HD 12229877 closed 0     6 2017-03-24T00:26:26Z 2017-04-03T00:50:28Z 2017-04-03T00:47:45Z CONTRIBUTOR   0 pydata/xarray/pulls/1322

NetCDF files often have tens of attributes, including multi-paragraph summaries or the full modification history of the file. It's great to have this available in the .attrs, but we can truncate it substantially in the repr! Hopefully this will stop people writing data.attrs = {} and discarding metadata in interactive workflows for the sake of cleaner output.

  • [x] closes #1319
  • [x] test data adjusted
  • [x] passes git diff upstream/master | flake8 --diff
  • [x] whatsnew entry
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1322/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
216329175 MDU6SXNzdWUyMTYzMjkxNzU= 1319 Truncate long lines in repr of Dataset.attrs Zac-HD 12229877 closed 0     5 2017-03-23T07:21:01Z 2017-04-03T00:47:45Z 2017-04-03T00:47:45Z CONTRIBUTOR      

When loading from NetCDF, Dataset.attrs often has a few long strings, which may even have embedded newlines (e.g. a multi-paragraph summary or references section). It's lovely that these are available, but they tend to make the repr very long and poorly formatted - to the point that many Jupyter notebooks begin by discarding the attrs, which makes it rather pointless to store or display metadata at all!

Given that these values are already truncated at 500 characters (including the indicative ..., but not the start point), I propose that they should instead be truncated to 80 characters including the indentation and key (as values are). For the sake of pretty-printing, this should also replace newlines or tabs with spaces and truncate early if an empty line is encountered.

Another solution would be add appropriate indentation following newlines or wrapping, so that the structure remains clear. However, I think that it is better to print a fairly minimal representation of the metadata by default.
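
As a sketch of the proposed rule (the helper name and exact limit here are illustrative):

```python
# Truncate one attribute line to a fixed width, collapsing newlines and tabs.
def summarize_attr(key, value, maxlen=80):
    text = " ".join(str(value).split())  # collapse all whitespace runs
    line = f"    {key}: {text}"
    return line if len(line) <= maxlen else line[: maxlen - 3] + "..."
```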

```
xr.open_dataset('http://dapds00.nci.org.au/thredds/dodsC/uc0/rs0_dev/20170215-stacked_sample/LS7_ETM_NBART_3577_15_-40.ncml')

<xarray.Dataset>
Dimensions:  (time: 246, x: 4000, y: 4000)
Coordinates:
  * y        (y) float64 -3.9e+06 -3.9e+06 -3.9e+06 -3.9e+06 -3.9e+06 ...
  * x        (x) float64 1.5e+06 1.5e+06 1.5e+06 1.5e+06 1.5e+06 1.5e+06 ...
  * time     (time) datetime64[ns] 1999-07-16T23:49:39 1999-07-25T23:43:07 ...
Data variables:
    crs      int32 ...
    blue     (time, y, x) float64 ...
    green    (time, y, x) float64 ...
    red      (time, y, x) float64 ...
    nir      (time, y, x) float64 ...
    swir1    (time, y, x) float64 ...
    swir2    (time, y, x) float64 ...
Attributes:
    date_created: 2017-03-07T11:57:26.511217
    Conventions: CF-1.6, ACDD-1.3
    history: 2017-03-07T11:57:26.511307+11:00 adh547 datacube-ncml (1.2.2+23.gd1f3512.dirty) ls7_nbart_albers.yaml, 1.0.6a, /short/v10/datacube/002/LS7_ETM_NBART/LS7_ETM_NBART_3577_15_-40.ncml, (15, -40) # Created NCML file to aggregate multiple NetCDF files along the time dimension
    geospatial_bounds: POLYGON ((148.49626113888138 -34.828378308133452,148.638689676063308 -35.720318326735864,149.734176111491877 -35.599556747691196,149.582601578289143 -34.708911907843387,148.49626113888138 -34.828378308133452))
    geospatial_bounds_crs: EPSG:4326
    geospatial_lat_min: -35.7203183267
    geospatial_lat_max: -34.7089119078
    geospatial_lat_units: degrees_north
    geospatial_lon_min: 148.496261139
    geospatial_lon_max: 149.734176111
    geospatial_lon_units: degrees_east
    comment: - Ground Control Points (GCP): new GCP chips released by USGS in Dec 2015 are used for re-processing - Geometric QA: each product undergoes geometric assessment and the assessment result will be recorded within v2 AGDC for filtering/masking purposes. - Processing parameter settings: the minimum number of GCPs for Ortho-rectified product generation has been reduced from 30 to 10. - DEM: 1 second SRTM DSM is used for Ortho-rectification. - Updated Calibration Parameter File (CPF): the latest/cu...
    product_suite: Surface Reflectance NBAR+T 25m
    publisher_email: earth.observation@ga.gov.au
    keywords_vocabulary: GCMD
    product_version: 2
    cdm_data_type: Grid
    references: - Berk, A., Anderson, G.P., Acharya, P.K., Hoke, M.L., Chetwynd, J.H., Bernstein, L.S., Shettle, E.P., Matthew, M.W., and Adler-Golden, S.M. (2003) Modtran 4 Version 3 Revision 1 User s manual. Airforce Research Laboratory, Hanscom, MA, USA. - Chander, G., Markham, B.L., and Helder, D.L. (2009) Summary of current radiometric calibration coefficients for Landsat MSS, TM, ETM+, and EO-1 ALI sensors. Remote Sensing of Environment 113, 893-903. - Edberg, R., and Oliver, S. (2013) Projection-Indep...
    platform: LANDSAT-7
    keywords: AU/GA,NASA/GSFC/SED/ESD/LANDSAT,REFLECTANCE,ETM+,TM,OLI,EARTH SCIENCE
    publisher_name: Section Leader, Operations Section, NEMO, Geoscience Australia
    institution: Commonwealth of Australia (Geoscience Australia)
    acknowledgment: Landsat data is provided by the United States Geological Survey (USGS) through direct reception of the data at Geoscience Australias satellite reception facility or download.
    license: CC BY Attribution 4.0 International License
    title: Surface Reflectance NBAR+T 25 v2
    summary: Surface Reflectance (SR) is a suite of Earth Observation (EO) products from GA. The SR product suite provides standardised optical surface reflectance datasets using robust physical models to correct for variations in image radiance values due to atmospheric properties, and sun and sensor geometry. The resulting stack of surface reflectance grids are consistent over space and time which is instrumental in identifying and quantifying environmental change. SR is based on radiance data from the...
    instrument: ETM
    source: LANDSAT 7 ETM+ surface observation
    publisher_url: http://www.ga.gov.au
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1319/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);