issues


30 rows where type = "pull" and user = 1197350 sorted by updated_at descending


state 2

  • closed 27
  • open 3

type 1

  • pull 30

repo 1

  • xarray 30
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
421064313 MDExOlB1bGxSZXF1ZXN0MjYxMjAyMDU2 2813 [WIP] added protect_dataset_variables_inplace to open_zarr rabernat 1197350 open 0     3 2019-03-14T14:50:15Z 2024-03-25T14:05:24Z   MEMBER   0 pydata/xarray/pulls/2813

This adds to open_zarr the same call to _protect_dataset_variables_inplace that we already make in open_dataset. It wraps the arrays in indexing.MemoryCachedArray.

As far as I can tell, it does not work, in the sense that nothing is cached.
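
For reference, a rough sketch of the wrapping this PR attempts. _protect_dataset_variables_inplace and MemoryCachedArray are internal xarray names, so the exact signatures may vary across versions; this is an illustration, not the actual diff.

```python
import xarray as xr
from xarray.core import indexing

def protect_variables_inplace(dataset: xr.Dataset, cache: bool = True) -> None:
    # Mirror open_dataset's behavior: wrap each variable's lazy array so
    # that values read once are kept in memory afterwards.
    for variable in dataset.variables.values():
        if cache:
            variable._data = indexing.MemoryCachedArray(variable._data)
```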

  • [ ] One possible way to close #2812
  • [ ] Tests added
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2813/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
421070999 MDExOlB1bGxSZXF1ZXN0MjYxMjA3MTYz 2814 [WIP] Use zarr internal LRU caching rabernat 1197350 open 0     2 2019-03-14T15:01:06Z 2024-03-25T14:00:50Z   MEMBER   0 pydata/xarray/pulls/2814

Alternative way to close #2812. This uses zarr's own caching.

In contrast to #2813, this does work.
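
A minimal sketch of the approach, assuming zarr v2's LRUStoreCache (the exact plumbing inside the PR may differ):

```python
import xarray as xr
import zarr

# Wrap the underlying store in zarr's own in-memory LRU cache before
# opening it, so repeated reads of a chunk hit memory, not storage.
store = zarr.DirectoryStore("example.zarr")               # any zarr v2 store
cached = zarr.LRUStoreCache(store, max_size=256 * 2**20)  # cap at 256 MB

ds = xr.open_zarr(cached)
ds.load()  # chunks are fetched from storage once, then served from the cache
```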

  • [ ] Closes #2812
  • [ ] Tests added
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2814/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1983894219 PR_kwDOAMm_X85e8V31 8428 Add mode='a-': Do not overwrite coordinates when appending to Zarr with `append_dim` rabernat 1197350 closed 0     3 2023-11-08T15:41:58Z 2023-12-01T04:21:57Z 2023-12-01T03:58:54Z MEMBER   0 pydata/xarray/pulls/8428

This implements the 1b option described in #8427.
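
For illustration, a sketch of the intended usage (the store path is hypothetical; see #8427 for the full semantics):

```python
import numpy as np
import pandas as pd
import xarray as xr

path = "timeseries.zarr"  # hypothetical store

ds = xr.Dataset({"temp": ("time", np.random.rand(3))},
                coords={"time": pd.date_range("2023-01-01", periods=3)})
ds.to_zarr(path, mode="w")

later = xr.Dataset({"temp": ("time", np.random.rand(3))},
                   coords={"time": pd.date_range("2023-01-04", periods=3)})
# mode="a-" appends along time without rewriting the values of
# coordinates and variables that already exist in the store.
later.to_zarr(path, mode="a-", append_dim="time")
```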

  • [x] Closes #8427
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8428/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
357808970 MDExOlB1bGxSZXF1ZXN0MjEzNzM2NTAx 2405 WIP: don't create indexes on multidimensional dimensions rabernat 1197350 closed 0     7 2018-09-06T20:13:11Z 2023-07-19T18:33:17Z 2023-07-19T18:33:17Z MEMBER   0 pydata/xarray/pulls/2405
  • [x] Closes #2368, Closes #2233
  • [ ] Tests added (for all bug fixes or enhancements)
  • [ ] Tests passed (for all non-documentation changes)
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)

This is just a start to the solution proposed in #2368. A surprisingly small number of tests broke in my local environment.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2405/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
467908830 MDExOlB1bGxSZXF1ZXN0Mjk3NDQ1NDc3 3131 WIP: tutorial on merging datasets rabernat 1197350 open 0 TomNicholas 35968931   10 2019-07-15T01:28:25Z 2022-06-09T14:50:17Z   MEMBER   0 pydata/xarray/pulls/3131
  • [x] Closes #1391
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API

This is a start on a tutorial about merging / combining datasets.
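
For a taste of the topic, here is a minimal illustration of the two core operations such a tutorial would cover:

```python
import numpy as np
import xarray as xr

# Two datasets sharing coordinates but holding different variables
# can be merged into one:
ds_t = xr.Dataset({"temp": ("x", np.array([10.0, 11.0]))}, coords={"x": [1, 2]})
ds_p = xr.Dataset({"precip": ("x", np.array([0.5, 0.2]))}, coords={"x": [1, 2]})
merged = xr.merge([ds_t, ds_p])  # one dataset with both temp and precip

# Datasets covering different slices of a shared dimension can instead
# be stitched back together along that dimension:
ds_a = ds_t.isel(x=[0])
ds_b = ds_t.isel(x=[1])
combined = xr.combine_by_coords([ds_a, ds_b])  # reassembles along x
```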

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3131/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
837243943 MDExOlB1bGxSZXF1ZXN0NTk3NjA4NTg0 5065 Zarr chunking fixes rabernat 1197350 closed 0     32 2021-03-22T01:35:22Z 2021-04-26T16:37:43Z 2021-04-26T16:37:43Z MEMBER   0 pydata/xarray/pulls/5065
  • [x] Closes #2300, closes #5056
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

This PR contains two small, related updates to how Zarr chunks are handled.

  1. We now delete the encoding attribute at the Variable level whenever chunk is called. The persistence of chunk encoding has been the source of lots of confusion (see #2300, #4046, #4380, https://github.com/dcs4cop/xcube/issues/347)
  2. Added a new option called safe_chunks in to_zarr which allows for bypassing the requirement of the many-to-one relationship between Zarr chunks and Dask chunks (see #5056).

Both these touch the internal logic for how chunks are handled, so I thought it was easiest to tackle them with a single PR.
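
A sketch of both behaviors (the tutorial dataset is used only for illustration):

```python
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature")  # sample data

# (1) Rechunking now drops stale chunk encoding instead of carrying it along:
rechunked = ds.chunk({"time": 100})
print(rechunked["air"].encoding.get("chunks"))  # no leftover chunk encoding

# (2) safe_chunks=True (the default) enforces the many-to-one relationship
# between Dask chunks and Zarr chunks; safe_chunks=False opts out:
rechunked.to_zarr("air.zarr", mode="w", safe_chunks=False)
```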

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5065/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
614814400 MDExOlB1bGxSZXF1ZXN0NDE1MjkyMzM3 4047 Document Xarray zarr encoding conventions rabernat 1197350 closed 0     3 2020-05-08T15:29:14Z 2020-05-22T21:59:09Z 2020-05-20T17:04:02Z MEMBER   0 pydata/xarray/pulls/4047

When we implemented the Zarr backend, we made some ad hoc choices about how to encode NetCDF data in Zarr. At this stage, it would be useful to explicitly document this encoding. I decided to put it on the "Xarray Internals" page, but I'm open to moving if folks feel it fits better elsewhere.
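
As a concrete taste of the convention being documented, using only public zarr and xarray calls:

```python
import numpy as np
import xarray as xr
import zarr

ds = xr.Dataset({"foo": (("y", "x"), np.ones((2, 3)))})
ds.to_zarr("encoded.zarr", mode="w")

# Peek under the hood with zarr itself: xarray records each array's NetCDF
# dimension names in a hidden "_ARRAY_DIMENSIONS" attribute, which the
# xarray reader strips back out.
grp = zarr.open_group("encoded.zarr", mode="r")
print(dict(grp["foo"].attrs))  # {'_ARRAY_DIMENSIONS': ['y', 'x']}
```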

cc @jeffdlb, @WardF, @DennisHeimbigner

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4047/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
467776251 MDExOlB1bGxSZXF1ZXN0Mjk3MzU0NTEx 3121 Allow other tutorial filename extensions rabernat 1197350 closed 0     3 2019-07-13T23:27:44Z 2019-07-14T01:07:55Z 2019-07-14T01:07:51Z MEMBER   0 pydata/xarray/pulls/3121
  • [x] Closes #3118
  • [ ] Tests added
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API

Together with https://github.com/pydata/xarray-data/pull/15, this allows us to generalize our tutorial datasets to non-netCDF files. But it is backwards compatible: if there is no file suffix, .nc is appended.
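
For illustration, the suffix fallback amounts to something like this hypothetical helper (default_to_netcdf is not the actual function name in the patch):

```python
from pathlib import Path

def default_to_netcdf(name: str) -> str:
    """Keep an explicit extension if present, otherwise assume netCDF."""
    return name if Path(name).suffix else name + ".nc"

assert default_to_netcdf("rasm") == "rasm.nc"         # backwards compatible
assert default_to_netcdf("era5.grib") == "era5.grib"  # other extensions pass through
```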

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3121/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
467674875 MDExOlB1bGxSZXF1ZXN0Mjk3MjgyNzA1 3106 Replace sphinx_gallery with notebook rabernat 1197350 closed 0     3 2019-07-13T05:35:34Z 2019-07-13T14:03:20Z 2019-07-13T14:03:19Z MEMBER   0 pydata/xarray/pulls/3106

Today @jhamman and I discussed how to refactor our somewhat fragmented "examples". We decided to basically copy the approach of the dask-examples repo, but have it live here in the main xarray repo. Basically this approach is:

  • all examples are notebooks
  • examples are rendered during doc build by nbsphinx
  • we will eventually have a binder that works with all of the same examples

This PR removes the dependency on sphinx_gallery and replaces the existing gallery with a standalone notebook called visualization_gallery.ipynb. However, not all of the links that worked in the gallery work here, since we are now using nbsphinx to render the notebooks (see https://github.com/spatialaudio/nbsphinx/issues/308).

Really important to get @dcherian's feedback on this, as he was the one who originally introduced the gallery. My view is that having everything as notebooks makes examples easier to maintain. But I'm curious to hear other views.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3106/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
467658326 MDExOlB1bGxSZXF1ZXN0Mjk3MjcwNjYw 3105 Switch doc examples to use nbsphinx rabernat 1197350 closed 0     4 2019-07-13T02:28:34Z 2019-07-13T04:53:09Z 2019-07-13T04:52:52Z MEMBER   0 pydata/xarray/pulls/3105

This is the beginning of the docs refactor we have in mind for the sprint tomorrow.

We will merge things first to the scipy19-docs branch so we can make sure things build on RTD.

http://xarray.pydata.org/en/scipy19-docs

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3105/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
431199282 MDExOlB1bGxSZXF1ZXN0MjY4OTI3MjU0 2881 decreased pytest verbosity rabernat 1197350 closed 0     1 2019-04-09T21:12:50Z 2019-04-09T23:36:01Z 2019-04-09T23:34:22Z MEMBER   0 pydata/xarray/pulls/2881

This removes the --verbose flag from py.test in .travis.yml.

  • [x] Closes #2880
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2881/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
396501063 MDExOlB1bGxSZXF1ZXN0MjQyNjY4ODEw 2659 to_dict without data rabernat 1197350 closed 0     14 2019-01-07T14:09:25Z 2019-02-12T21:21:13Z 2019-01-21T23:25:56Z MEMBER   0 pydata/xarray/pulls/2659

This PR provides the ability to export Datasets and DataArrays to a dictionary without the actual data. This could be useful for generating indexes of dataset contents to expose to search services or other automated data discovery tools.

In the process of doing this, I refactored the core dictionary export function to live in the Variable class, since the same code was duplicated in several places.
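
The new behavior looks like this (a minimal sketch):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"temp": ("x", np.random.rand(3))}, coords={"x": [10, 20, 30]})

# Export structure and metadata only; array values are replaced by a
# lightweight schema-style description.
d = ds.to_dict(data=False)
print(d["data_vars"]["temp"])  # dims, attrs, dtype, shape -- but no values
```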

  • [x] Closes #2656
  • [x] Tests added
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2659/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
362866468 MDExOlB1bGxSZXF1ZXN0MjE3NDYzMTU4 2430 WIP: revise top-level package description rabernat 1197350 closed 0     10 2018-09-22T15:35:47Z 2019-01-07T01:04:19Z 2019-01-06T00:31:57Z MEMBER   0 pydata/xarray/pulls/2430

I have often complained that xarray's top-level package description assumes that the user knows all about pandas. I think this alienates many new users.

This is a first draft at revising that top-level description. Feedback from the community is very much needed here.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2430/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
382497709 MDExOlB1bGxSZXF1ZXN0MjMyMTkwMjg5 2559 Zarr consolidated rabernat 1197350 closed 0     19 2018-11-20T04:39:41Z 2018-12-05T14:58:58Z 2018-12-04T23:51:00Z MEMBER   0 pydata/xarray/pulls/2559

This PR adds support for reading and writing of consolidated metadata in zarr stores.
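
A minimal usage sketch:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"foo": ("x", np.arange(5))})

# Writing with consolidated=True copies all array and group metadata into
# a single key in the store...
ds.to_zarr("consolidated.zarr", mode="w", consolidated=True)

# ...so opening needs one metadata read instead of one per array, which
# matters on high-latency stores like S3 or GCS.
ds2 = xr.open_zarr("consolidated.zarr", consolidated=True)
```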

  • [x] Closes #2558 (remove if there is no corresponding issue, which should only be the case for minor changes)
  • [x] Tests added (for all bug fixes or enhancements)
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2559/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
312986662 MDExOlB1bGxSZXF1ZXN0MTgwNjUwMjc5 2047 Fix decode cf with dask rabernat 1197350 closed 0     1 2018-04-10T15:56:20Z 2018-04-12T23:38:02Z 2018-04-12T23:38:02Z MEMBER   0 pydata/xarray/pulls/2047
  • [x] Closes #1372
  • [x] Tests added
  • [x] Tests passed
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

This was a very simple fix for an issue that has vexed me for quite a while. Am I missing something obvious here?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2047/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
253136694 MDExOlB1bGxSZXF1ZXN0MTM3ODE5MTA0 1528 WIP: Zarr backend rabernat 1197350 closed 0     103 2017-08-27T02:38:01Z 2018-02-13T21:35:03Z 2017-12-14T02:11:36Z MEMBER   0 pydata/xarray/pulls/1528
  • [x] Closes #1223
  • [x] Tests added / passed
  • [x] Passes git diff upstream/master | flake8 --diff
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

I think that a zarr backend could be the ideal storage format for xarray datasets, overcoming many of the frustrations associated with netcdf and enabling optimal performance on cloud platforms.

This is a very basic start to implementing a zarr backend (as proposed in #1223); however, I am taking a somewhat different approach. I store the whole dataset in a single zarr group. I encode the extra metadata needed by xarray (so far just dimension information) as attributes within the zarr group and child arrays. I hide these special attributes from the user by wrapping the attribute dictionaries in a "HiddenKeyDict", so that they can't be viewed or modified.
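
To illustrate the hiding idea, here is a minimal, self-contained sketch of such a mapping (not xarray's actual implementation):

```python
from collections.abc import MutableMapping

class HiddenKeyDict(MutableMapping):
    """A mapping view that makes selected keys invisible and untouchable."""

    def __init__(self, data, hidden_keys):
        self._data = data
        self._hidden = set(hidden_keys)

    def _check(self, key):
        if key in self._hidden:
            raise KeyError(f"{key!r} is hidden")

    def __getitem__(self, key):
        self._check(key)
        return self._data[key]

    def __setitem__(self, key, value):
        self._check(key)
        self._data[key] = value

    def __delitem__(self, key):
        self._check(key)
        del self._data[key]

    def __iter__(self):
        return (k for k in self._data if k not in self._hidden)

    def __len__(self):
        return sum(1 for _ in self)

attrs = HiddenKeyDict({'_ARRAY_DIMENSIONS': ['y', 'x'], 'units': 'm'},
                      hidden_keys=['_ARRAY_DIMENSIONS'])
print(list(attrs))  # ['units'] -- the special attribute is invisible to users
```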

I have no tests yet (:flushed:), but the following code works.

```python
from xarray.backends.zarr import ZarrStore
import xarray as xr
import numpy as np

ds = xr.Dataset(
    {'foo': (('y', 'x'), np.ones((100, 200)), {'myattr1': 1, 'myattr2': 2}),
     'bar': (('x',), np.zeros(200))},
    {'y': (('y',), np.arange(100)), 'x': (('x',), np.arange(200))},
    {'some_attr': 'copana'}
).chunk({'y': 50, 'x': 40})

zs = ZarrStore(store='zarr_test')
ds.dump_to_store(zs)
ds2 = xr.Dataset.load_store(zs)
assert ds2.equals(ds)
```

There is a very long way to go here, but I thought I would just get a PR started. Some questions that would help me move forward.

  1. What is "encoding" at the variable level? (I have never understood this part of xarray.) How should encoding be handled with zarr?
  2. Should we encode / decode CF for zarr stores?
  3. Do we want to always automatically align dask chunks with the underlying zarr chunks?
  4. What sort of public API should the zarr backend have? Should you be able to load zarr stores via open_dataset? Or do we need a new method? I think .to_zarr() would be quite useful.
  5. zarr arrays are extensible along all axes. What does this imply for unlimited dimensions?
  6. Is any autoclose logic needed? As far as I can tell, zarr objects don't need to be closed.
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1528/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
287569331 MDExOlB1bGxSZXF1ZXN0MTYyMjI0MTg2 1817 fix rasterio chunking with s3 datasets rabernat 1197350 closed 0     11 2018-01-10T20:37:45Z 2018-01-24T09:33:07Z 2018-01-23T16:33:28Z MEMBER   0 pydata/xarray/pulls/1817
  • [x] Closes #1816 (remove if there is no corresponding issue, which should only be the case for minor changes)
  • [x] Tests added (for all bug fixes or enhancements)
  • [x] Tests passed (for all non-documentation changes)
  • [x] Passes git diff upstream/master **/*py | flake8 --diff (remove if you did not edit any Python files)
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)

This is a simple fix for token generation of non-filename targets for rasterio.
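
A sketch of the idea (not the exact patch):

```python
import os
from dask.base import tokenize

def rasterio_token(target):
    # For a real local file, fold the mtime into the dask token so edits
    # invalidate cached chunks; for non-filename targets such as s3://
    # URLs, fall back to hashing the target string itself.
    if os.path.exists(target):
        return tokenize(target, os.path.getmtime(target))
    return tokenize(target)
```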

The problem is that I have no idea how to test it without actually hitting s3 (which requires boto and aws credentials).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1817/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
229474101 MDExOlB1bGxSZXF1ZXN0MTIxMTQyODkw 1413 concat prealigned objects rabernat 1197350 closed 0     11 2017-05-17T20:16:00Z 2017-07-17T21:53:53Z 2017-07-17T21:53:40Z MEMBER   0 pydata/xarray/pulls/1413
  • [x] Closes #1385
  • [ ] Tests added / passed
  • [ ] Passes git diff upstream/master | flake8 --diff
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API

This is an initial PR to bypass index alignment and coordinate checking when concatenating datasets.
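
For reference, current xarray spells the same need through concat's coords/compat/join options; a minimal sketch, assuming a recent release:

```python
import numpy as np
import xarray as xr

parts = [
    xr.Dataset({"u": (("time", "x"), np.random.rand(2, 3))},
               coords={"x": [0, 1, 2], "time": [t, t + 1]})
    for t in (0, 2)
]

combined = xr.concat(parts, dim="time",
                     coords="minimal",    # don't compare non-concatenated coords
                     compat="override",   # skip equality checks; take the first
                     join="override")     # skip index alignment entirely
```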

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1413/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
229138906 MDExOlB1bGxSZXF1ZXN0MTIwOTAzMjY5 1411 fixed dask prefix naming rabernat 1197350 closed 0     6 2017-05-16T19:10:30Z 2017-05-22T20:39:01Z 2017-05-22T20:38:56Z MEMBER   0 pydata/xarray/pulls/1411
  • [x] Closes #1343
  • [x] Tests added / passed
  • [x] Passes git diff upstream/master | flake8 --diff
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

I am starting a new PR for this since the original one (#1345) was not branched of my own fork.

As the discussion there stood, @shoyer suggested that dataset.chunk should also be updated to match the latest conventions in dask naming. The relevant code is here

```python
def maybe_chunk(name, var, chunks):
    chunks = selkeys(chunks, var.dims)
    if not chunks:
        chunks = None
    if var.ndim > 0:
        token2 = tokenize(name, token if token else var._data)
        name2 = '%s%s-%s' % (name_prefix, name, token2)
        return var.chunk(chunks, name=name2, lock=lock)
    else:
        return var

variables = OrderedDict([(k, maybe_chunk(k, v, chunks))
                         for k, v in self.variables.items()])
```

Currently, chunk has an optional keyword argument name_prefix='xarray-'. Do we want to keep this optional?

IMO, the current naming logic in chunk is not a problem for dask and will not cause problems for the distributed bokeh dashboard (as open_dataset did).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1411/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
218368855 MDExOlB1bGxSZXF1ZXN0MTEzNTU0Njk4 1345 new dask prefix rabernat 1197350 closed 0     2 2017-03-31T00:56:24Z 2017-05-21T09:45:39Z 2017-05-16T19:11:13Z MEMBER   0 pydata/xarray/pulls/1345
  • [x] closes #1343
  • [ ] tests added / passed
  • [ ] passes git diff upstream/master | flake8 --diff
  • [ ] whatsnew entry
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1345/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
225482023 MDExOlB1bGxSZXF1ZXN0MTE4NDA4NDc1 1390 Fix groupby bins tests rabernat 1197350 closed 0     1 2017-05-01T17:46:41Z 2017-05-01T21:52:14Z 2017-05-01T21:52:14Z MEMBER   0 pydata/xarray/pulls/1390
  • [x] closes #1386
  • [x] tests added / passed
  • [x] passes git diff upstream/master | flake8 --diff
  • [x] whatsnew entry
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1390/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
188537472 MDExOlB1bGxSZXF1ZXN0OTMxNzEyODE= 1104 add optimization tips rabernat 1197350 closed 0     1 2016-11-10T15:26:25Z 2016-11-10T16:49:13Z 2016-11-10T16:49:06Z MEMBER   0 pydata/xarray/pulls/1104

This adds some dask optimization tips from the mailing list (closes #1103).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1104/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
180536861 MDExOlB1bGxSZXF1ZXN0ODc2NDc0MDk= 1027 Groupby bins empty groups rabernat 1197350 closed 0     7 2016-10-02T21:31:32Z 2016-10-03T15:22:18Z 2016-10-03T15:22:15Z MEMBER   0 pydata/xarray/pulls/1027

This PR fixes a bug in groupby_bins in which empty bins were dropped from the grouped results. Now groupby_bins restores any empty bins automatically. To recover the old behavior, one could apply dropna after a groupby operation.
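
A minimal sketch of the new behavior:

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0, 4.0], dims="x",
                  coords={"x": [0.1, 0.2, 0.3, 0.9]})

# The (0.5, 0.75] bin contains no points; it now shows up as NaN
# instead of silently disappearing from the result.
binned = da.groupby_bins("x", bins=[0.0, 0.25, 0.5, 0.75, 1.0]).mean()

# Recover the old behavior by dropping the empty bins afterwards:
binned.dropna("x_bins")
```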

Fixes #1019

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1027/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
146182176 MDExOlB1bGxSZXF1ZXN0NjU0MDc4NzA= 818 Multidimensional groupby rabernat 1197350 closed 0     61 2016-04-06T04:14:37Z 2016-07-31T23:02:59Z 2016-07-08T01:50:38Z MEMBER   0 pydata/xarray/pulls/818

Many datasets have a two dimensional coordinate variable (e.g. longitude) which is different from the logical grid coordinates (e.g. nx, ny). (See #605.) For plotting purposes, this is solved by #608. However, we still might want to split / apply / combine over such coordinates. That has not been possible, because groupby only supports creating groups on one-dimensional arrays.

This PR overcomes that issue by using stack to collapse multiple dimensions in the group variable. A minimal example of the new functionality is

```python
da = xr.DataArray([[0, 1], [2, 3]],
                  coords={'lon': (['ny', 'nx'], [[30, 40], [40, 50]]),
                          'lat': (['ny', 'nx'], [[10, 10], [20, 20]])},
                  dims=['ny', 'nx'])
da.groupby('lon').sum()
<xarray.DataArray (lon: 3)>
array([0, 3, 3])
Coordinates:
  * lon      (lon) int64 30 40 50
```

This feature could have broad applicability for many realistic datasets (particularly model output on irregular grids): for example, averaging non-rectangular grids zonally (i.e. in latitude), binning in temperature, etc.

If you think this is worth pursuing, I would love some feedback.

The PR is not complete. Some items to address are:

  • [x] Create a specialized grouper to allow coarser bins. By default, if no grouper is specified, the GroupBy object uses all unique values to define the groups. With a high resolution dataset, this could balloon to a huge number of groups. With the latitude example, we would like to be able to specify e.g. 1-degree bins. Usage would be da.groupby('lon', bins=range(-90,90)).
  • [ ] Allow specification of which dims to stack. For example, stack in space but keep time dimension intact. (Currently it just stacks all the dimensions of the group variable.)
  • [x] A nice example for the docs.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/818/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
162974170 MDExOlB1bGxSZXF1ZXN0NzU2ODI3NzM= 892 fix printing of unicode attributes rabernat 1197350 closed 0     2 2016-06-29T16:47:27Z 2016-07-24T02:57:13Z 2016-07-24T02:57:13Z MEMBER   0 pydata/xarray/pulls/892

fixes #834

I would welcome a suggestion of how to test this in a way that works with both python 2 and 3. This is somewhat outside my expertise.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/892/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
100055216 MDExOlB1bGxSZXF1ZXN0NDIwMTYyMDg= 524 Option for closing files with scipy backend rabernat 1197350 closed 0     6 2015-08-10T12:49:23Z 2016-06-24T17:45:07Z 2016-06-24T17:45:07Z MEMBER   0 pydata/xarray/pulls/524

This is the same as #468, which was accidentally closed. I just copied and pasted my comment below.

This addresses issue #463, in which open_mfdataset failed when trying to open a list of files longer than my system's ulimit. I tried to find a solution in which the underlying netcdf file objects are kept closed by default and only reopened "when needed".

I ended up subclassing scipy.io.netcdf_file and overwriting the variable attribute with a property which first checks whether the file is open or closed and opens it if needed. That was the easy part. The hard part was figuring out when to close them. The problem is that a couple of different parts of the code (e.g. each individual variable and also the datastore object itself) keep references to the netcdf_file object. In the end I used the debugger to find out when during initialization the variables were actually being read and added some calls to close() in various different places. It is relatively easy to close the files up at the end of the initialization, but it was much harder to make sure that the whole array of files is never open at the same time. I also had to disable mmap when this option is active.
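
Below is a minimal sketch of that reopen-on-access pattern; the class name and plumbing are simplified for illustration, while the real patch wires this into xarray's scipy backend:

```python
import scipy.io

class LazyNetcdfFile:
    """Keep the file closed by default and reopen it whenever a variable
    is touched. mmap must stay disabled, since the mapped buffer dies
    each time the file is closed."""

    def __init__(self, filename):
        self.filename = filename
        self._file = None

    @property
    def variables(self):
        if self._file is None or self._file.fp.closed:
            self._file = scipy.io.netcdf_file(self.filename, mmap=False)
        return self._file.variables

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None
```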

This solution is messy and, moreover, extremely slow. There is a factor of ~100 performance penalty during initialization for reopening and closing the files all the time (but only a factor of 10 for the actual calculation). I am sure this could be reduced if someone who understands the code better found some judicious points at which to call close() on the netcdf_file. The loss of mmap also sucks.

This option can be accessed with the close_files keyword, which I added to the api.

Timing for loading and doing a calculation with close_files=True:

```python
count_open_files()
%time mfds = xray.open_mfdataset(ddir + '/dt_global_allsat_msla_uv_2014101*.nc', engine='scipy', close_files=True)
count_open_files()
%time print float(mfds.variables['u'].mean())
count_open_files()
```

output:

```
3 open files
CPU times: user 11.1 s, sys: 17.5 s, total: 28.5 s
Wall time: 27.7 s
2 open files
0.0055650632367
CPU times: user 649 ms, sys: 974 ms, total: 1.62 s
Wall time: 633 ms
2 open files
```

Timing for loading and doing a calculation with close_files=False (default, should revert to old behavior):

```python
count_open_files()
%time mfds = xray.open_mfdataset(ddir + '/dt_global_allsat_msla_uv_2014101*.nc', engine='scipy', close_files=False)
count_open_files()
%time print float(mfds.variables['u'].mean())
count_open_files()
```

```
3 open files
CPU times: user 264 ms, sys: 85.3 ms, total: 349 ms
Wall time: 291 ms
22 open files
0.0055650632367
CPU times: user 174 ms, sys: 141 ms, total: 315 ms
Wall time: 56 ms
22 open files
```

This is not a very serious pull request, but I spent all day on it, so I thought I would share. Maybe you can see some obvious way to improve it...

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/524/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
101719623 MDExOlB1bGxSZXF1ZXN0NDI3MzE1NDg= 538 Fix contour color rabernat 1197350 closed 0     25 2015-08-18T18:24:36Z 2015-09-01T17:48:12Z 2015-09-01T17:20:56Z MEMBER   0 pydata/xarray/pulls/538

This fixes #537 by adding a check for the presence of the colors kwarg.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/538/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
99847237 MDExOlB1bGxSZXF1ZXN0NDE5NjI5MDg= 523 Fix datetime decoding when time units are 'days since 0000-01-01 00:00:00' rabernat 1197350 closed 0     22 2015-08-09T00:12:00Z 2015-08-14T17:22:02Z 2015-08-14T17:22:02Z MEMBER   0 pydata/xarray/pulls/523

This fixes #521 using the workaround described in Unidata/netcdf4-python#442.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/523/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
94508580 MDExOlB1bGxSZXF1ZXN0Mzk3NTI1MTQ= 468 Option for closing files with scipy backend rabernat 1197350 closed 0     7 2015-07-11T21:24:24Z 2015-08-10T12:50:45Z 2015-08-09T00:04:12Z MEMBER   0 pydata/xarray/pulls/468

This addresses issue #463, in which open_mfdataset failed when trying to open a list of files longer than my system's ulimit. I tried to find a solution in which the underlying netcdf file objects are kept closed by default and only reopened "when needed".

I ended up subclassing scipy.io.netcdf_file and overwriting the variable attribute with a property which first checks whether the file is open or closed and opens it if needed. That was the easy part. The hard part was figuring out when to close them. The problem is that a couple of different parts of the code (e.g. each individual variable and also the datastore object itself) keep references to the netcdf_file object. In the end I used the debugger to find out when during initialization the variables were actually being read and added some calls to close() in various different places. It is relatively easy to close the files up at the end of the initialization, but it was much harder to make sure that the whole array of files is never open at the same time. I also had to disable mmap when this option is active.

This solution is messy and, moreover, extremely slow. There is a factor of ~100 performance penalty during initialization for reopening and closing the files all the time (but only a factor of 10 for the actual calculation). I am sure this could be reduced if someone who understands the code better found some judicious points at which to call close() on the netcdf_file. The loss of mmap also sucks.

This option can be accessed with the close_files keyword, which I added to the api.

Timing for loading and doing a calculation with close_files=True:

```python
count_open_files()
%time mfds = xray.open_mfdataset(ddir + '/dt_global_allsat_msla_uv_2014101*.nc', engine='scipy', close_files=True)
count_open_files()
%time print float(mfds.variables['u'].mean())
count_open_files()
```

output:

```
3 open files
CPU times: user 11.1 s, sys: 17.5 s, total: 28.5 s
Wall time: 27.7 s
2 open files
0.0055650632367
CPU times: user 649 ms, sys: 974 ms, total: 1.62 s
Wall time: 633 ms
2 open files
```

Timing for loading and doing a calculation with close_files=False (default, should revert to old behavior):

```python
count_open_files()
%time mfds = xray.open_mfdataset(ddir + '/dt_global_allsat_msla_uv_2014101*.nc', engine='scipy', close_files=False)
count_open_files()
%time print float(mfds.variables['u'].mean())
count_open_files()
```

```
3 open files
CPU times: user 264 ms, sys: 85.3 ms, total: 349 ms
Wall time: 291 ms
22 open files
0.0055650632367
CPU times: user 174 ms, sys: 141 ms, total: 315 ms
Wall time: 56 ms
22 open files
```

This is not a very serious pull request, but I spent all day on it, so I thought I would share. Maybe you can see some obvious way to improve it...

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/468/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
99844089 MDExOlB1bGxSZXF1ZXN0NDE5NjI0NDM= 522 Fix datetime decoding when time units are 'days since 0000-01-01 00:00:00' rabernat 1197350 closed 0     1 2015-08-08T23:26:07Z 2015-08-09T00:10:18Z 2015-08-09T00:06:49Z MEMBER   0 pydata/xarray/pulls/522

This fixes #521 using the workaround described in Unidata/netcdf4-python#442.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/522/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);