
issues


17 rows where repo = 13221727, state = "open" and user = 2443309 sorted by updated_at descending




Facets: type (issue 15, pull 2) · state (open 17) · repo (xarray 17)
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
602256880 MDU6SXNzdWU2MDIyNTY4ODA= 3981 [Proposal] Expose Variable without Pandas dependency jhamman 2443309 open 0     23 2020-04-17T22:00:10Z 2024-04-24T17:19:55Z   MEMBER      

This issue proposes exposing Xarray's Variable class as a stand-alone array class with named axes (dims) and arbitrary metadata (attrs) but without coordinates (indexes). Yes, this already exists, but the Variable class is currently inseparable from our Pandas dependency, despite not utilizing any of its functionality. What would this entail?

The biggest change would be in making Pandas an optional dependency and isolating any imports. This change could be confined to the Variable object or could be propagated further as the Explicit Indexes work proceeds (#1603).
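For context, a minimal sketch of how Variable can already be used as a named array today (assuming only numpy and xarray are installed; this is not part of the proposal itself):

```python
import numpy as np
import xarray as xr

# A Variable is an n-dimensional array with named dimensions and attrs,
# but no coordinates/indexes (and hence no need for Pandas at this level).
v = xr.Variable(dims=("time", "space"), data=np.arange(6).reshape(2, 3),
                attrs={"units": "m"})
print(v.dims)               # ('time', 'space')
print(v.mean(dim="time"))   # reductions accept dimension names
```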

Why?

Within Xarray, the Variable class is a vital building block for many of our internal data structures. Recently, the utility of a simple array with named dimensions has been highlighted by a few potential user communities:

  • Scikit-learn: https://github.com/scikit-learn/enhancement_proposals/pull/18
  • PyTorch: (https://pytorch.org/tutorials/intermediate/named_tensor_tutorial.html, http://nlp.seas.harvard.edu/NamedTensor)

An example from the SLEP linked above of why users may not want Pandas as a dependency of Xarray:

> @amueller: ...If we go this route, I think we need to make xarray, and therefore pandas, a mandatory dependency...
>
> @adrinjalali: ...And we still do have the option of making a NamedArray. xarray uses the pandas' index classes for the indexing and stuff, which is something we really don't need...

Since we already have a class developed that meets these applications' use cases, it seems only prudent to evaluate the feasibility of exposing Variable as a low-level API object.

In conclusion, I'm not sure this is currently worth the effort, but it's probably worth exploring at this point.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3981/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
606530049 MDU6SXNzdWU2MDY1MzAwNDk= 4001 [community] Bi-weekly community developers meeting jhamman 2443309 open 0     14 2020-04-24T19:22:01Z 2024-03-27T15:33:28Z   MEMBER      

Hello Xarray Community and @pydata/xarray,

Starting next week, we will be hosting a bi-weekly 30-minute community/developers meeting. The goal of this meeting is to help coordinate Xarray development efforts and better connect the user/developer community.

When

Every other Wednesday at 8:30a PT (11:30a ET) beginning April 29th, 2020.

Calendar options:
  • Google Calendar
  • iCal format

Where

https://us02web.zoom.us/j/87503265754?pwd=cEFJMzFqdTFaS3BMdkx4UkNZRk1QZz09

Rolling agenda and meeting notes

We'll keep a rolling agenda and set of meeting notes:
  • Through Sept. 2022
  • Starting October 2022 (requires sign-in)
  • Starting March 2024

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4001/reactions",
    "total_count": 5,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 5,
    "eyes": 0
}
    xarray 13221727 issue
1564627108 I_kwDOAMm_X85dQlCk 7495 Deprecate open_zarr in favor of open_dataset(..., engine='zarr') jhamman 2443309 open 0     2 2023-01-31T16:21:07Z 2023-12-12T18:00:15Z   MEMBER      

What is your issue?

We have discussed many times deprecating xarray.open_zarr in favor of xarray.open_dataset(..., engine='zarr'). This issue tracks that process and is a place for us to discuss any issues that may arise as a result of the change.

xref: https://github.com/pydata/xarray/issues/2812, https://github.com/pydata/xarray/issues/7293
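For illustration, the two spellings in question (the store path is a placeholder; note that open_zarr returns dask-backed variables by default, so a chunks argument such as chunks={} may be needed to match that behaviour with open_dataset):

```python
import xarray as xr

# Current spelling, slated for deprecation:
ds = xr.open_zarr("example_store.zarr")

# Preferred spelling going forward:
ds = xr.open_dataset("example_store.zarr", engine="zarr", chunks={})
```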

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7495/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1564661430 PR_kwDOAMm_X85I7qzk 7496 deprecate open_zarr jhamman 2443309 open 0     13 2023-01-31T16:40:38Z 2023-10-27T05:14:02Z   MEMBER   0 pydata/xarray/pulls/7496

This PR deprecates open_zarr in favor of open_dataset(..., engine='zarr').

  • [x] Closes #7495
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7496/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1673579421 I_kwDOAMm_X85jwMud 7765 Revisiting Xarray's Minimum dependency versions policy jhamman 2443309 open 0     9 2023-04-18T17:46:03Z 2023-09-19T15:54:09Z   MEMBER      

What is your issue?

We have recently had a few reports expressing frustration with our minimum dependency version policy. This issue aims to discuss if changes to our policy are needed.

Background

  1. Our current minimum dependency versions policy reads:

    Minimum dependency versions

    Xarray adopts a rolling policy regarding the minimum supported version of its dependencies:

    • Python: 24 months (NEP-29)
    • numpy: 18 months (NEP-29)
    • all other libraries: 12 months

    This means the latest minor (X.Y) version from N months prior. Patch versions (x.y.Z) are not pinned, and only the latest available at the moment of publishing the xarray release is guaranteed to work.

    You can see the actual minimum tested versions:

    pydata/xarray

  2. We have a script that checks versions and dates and advises us on when to bump minimum versions.

    https://github.com/pydata/xarray/blob/main/ci/min_deps_check.py

Diagnosis

  1. Our policy and min_deps_check.py script have greatly reduced our deliberations on which versions to support and the maintenance burden of supporting outdated versions of dependencies.
  2. We likely need to update our policy and min_deps_check.py script to properly account for Python's SEMVER bugfix releases. Depending on how you interpret the policy, we may have prematurely dropped Python 3.8 (see below for a potential action item).

Discussion questions

  1. Is the policy working as designed, and are the support windows documented above still appropriate for where Xarray is today?
  2. Is this policy still in line with how our peer libraries are operating?

Action items

  1. There is likely a bug in the patch-version comparison for the minimum Python version. Moreover, we don't differentiate between bugfix and security releases. I suggest we adopt a special policy for our minimum supported Python version that reads something like (illustrated below):

     > Python: 24 months from the last bugfix release (security releases are not considered).
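As a purely hypothetical illustration of that proposed rule (the helper name is made up; Python 3.8's last bugfix release, 3.8.10, was on 2021-05-03):

```python
from datetime import date, timedelta

def python_within_support_window(last_bugfix_release: date, today: date) -> bool:
    # Proposed rule: keep supporting a Python minor version for 24 months
    # after its final bugfix release (security-only releases don't count).
    return today <= last_bugfix_release + timedelta(days=730)

# At the time this issue was opened (2023-04-18), the proposed rule would
# still include Python 3.8, consistent with the concern above.
print(python_within_support_window(date(2021, 5, 3), date(2023, 4, 18)))  # True
```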

xref: https://github.com/pydata/xarray/issues/4179, https://github.com/pydata/xarray/pull/7461

Moderator's note: I suspect a number of folks will want to comment on this issue with "Please support Python 3.8 for longer...". If that is the nature of your comment, please just give this a ❤️ reaction rather than filling up the discussion.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7765/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  reopened xarray 13221727 issue
95114700 MDU6SXNzdWU5NTExNDcwMA== 475 API design for pointwise indexing jhamman 2443309 open 0     39 2015-07-15T06:04:47Z 2023-08-23T12:37:23Z   MEMBER      

There have been a number of threads discussing possible improvements/extensions to xray indexing. The current indexing behavior for isel is orthogonal indexing - in other words, each coordinate is treated independently (see #214 and #411 for more discussion).

So the question: what is the best way to incorporate diagonal or pointwise indexing in xray? I see two main goals / applications:

  1. support a simple form of numpy-style integer array indexing
  2. support pointwise array indexing along coordinates via computation of nearest-neighbor indexes (I think this can also be thought of as a form of resampling)
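To make the distinction concrete, a small sketch contrasting today's orthogonal isel behaviour with numpy-style pointwise indexing (written with the modern xarray spelling; values are illustrative only):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(16).reshape(4, 4), dims=("x", "y"))

# Orthogonal indexing (current isel behaviour): the outer product of the
# indexers is selected, giving a 2x2 block.
print(da.isel(x=[0, 2], y=[1, 3]).shape)   # (2, 2)

# Pointwise (diagonal) indexing, as in numpy fancy indexing, would instead
# select just the pairs (0, 1) and (2, 3).
print(da.values[[0, 2], [1, 3]])           # [ 1 11]
```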

Input from @WeatherGod, @wholmgren, and @shoyer would be great.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/475/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1383037028 I_kwDOAMm_X85Sb3hk 7071 Should Xarray have a read_csv method? jhamman 2443309 open 0     5 2022-09-22T21:28:46Z 2023-06-13T01:45:33Z   MEMBER      

Is your feature request related to a problem?

Most users of Xarray/Pandas start with an IO call of some sort. In Xarray, our open_dataset(..., engine=engine) interface provides an extensible interface to more complex backends (NetCDF, Zarr, GRIB, etc.). For tabular data types, we have traditionally pointed users to Pandas. While this works for users who are comfortable with Pandas, it is an added hurdle for users getting started with Xarray.

Describe the solution you'd like

It should be easy and obvious how a user can get a CSV (or other tabular data) into Xarray. Ideally, we don't force the user to use a third-party library.

Describe alternatives you've considered

I can think of three possible solutions:

  1. We expose a new function read_csv; it may do something like this:

```python
def read_csv(filepath_or_buffer, **kwargs):
    df = pd.read_csv(filepath_or_buffer, **kwargs)
    ds = xr.Dataset.from_dataframe(df)
    return ds
```

  2. We develop a storage backend to support reading CSV-like data:

```python
ds = open_dataset(filepath, engine='csv')
```

  3. We copy (1) as an example and put it in Xarray's documentation, explicitly showing how you would use Pandas to produce a Dataset from a CSV (see the sketch below).
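A minimal sketch of what that documentation example (option 3) might show, using only existing pandas/xarray APIs (the file name and column are placeholders):

```python
import pandas as pd
import xarray as xr

# Read tabular data with pandas, then convert it to an xarray Dataset.
df = pd.read_csv("observations.csv", index_col="time", parse_dates=True)
ds = xr.Dataset.from_dataframe(df)   # equivalently: df.to_xarray()
```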
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7071/reactions",
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1575494367 I_kwDOAMm_X85d6CLf 7515 Aesara as an array backend in Xarray jhamman 2443309 open 0     11 2023-02-08T05:15:35Z 2023-05-01T14:40:39Z   MEMBER      

Is your feature request related to a problem?

I recently learned about a meta-tensor library called Aesara which got me wondering if it would be a good array backend for Xarray.

Aesara is a Python library that allows you to define, optimize/rewrite, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It is composed of different parts:

  • Symbolic representation of mathematical operations on arrays
  • Speed and stability optimization
  • Efficient symbolic differentiation
  • Powerful rewrite system to programmatically modify your models
  • Extendable backends. Aesara currently compiles to C, Jax and Numba.

xref: https://github.com/aesara-devs/aesara/issues/352, @OriolAbril, @twiecki

Has anyone looked into this yet?
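For readers unfamiliar with Aesara, a minimal sketch of its Theano-style symbolic workflow (based on my understanding of its public API; this is not xarray code):

```python
import aesara
import aesara.tensor as at

# Define a symbolic expression...
x = at.dmatrix("x")
y = (x ** 2).sum()

# ...then compile it (to the C, JAX, or Numba backend) and evaluate it.
f = aesara.function([x], [y, aesara.grad(y, x)])
value, gradient = f([[1.0, 2.0], [3.0, 4.0]])
```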

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7515/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
    xarray 13221727 issue
110820316 MDU6SXNzdWUxMTA4MjAzMTY= 620 Don't squeeze DataArray before plotting jhamman 2443309 open 0     5 2015-10-10T22:26:51Z 2023-04-08T17:20:50Z   MEMBER      

As was discussed in #608, we should honor the shape of the DataArray when selecting plot methods. Currently, we're squeezing the DataArray before plotting. This ends up plotting a line plot for a DataArray with shape (N, 1). We should find a way to plot a pcolormesh or imshow plot in this case. The trick will be figuring out what to do in _infer_interval_breaks.
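A minimal reproduction of the behaviour described above (the data is random and purely illustrative):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(10, 1), dims=("x", "y"))
# The (10, 1) array is squeezed to 1-D before the plot method is selected, so
# this draws a line plot; honoring the shape would suggest pcolormesh/imshow.
da.plot()
```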

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/620/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1651243130 PR_kwDOAMm_X85Nclrx 7708 deprecate encoding setters jhamman 2443309 open 0     0 2023-04-03T02:59:15Z 2023-04-03T22:12:31Z   MEMBER   0 pydata/xarray/pulls/7708
  • [x] Toward #6323
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7708/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
681291824 MDU6SXNzdWU2ODEyOTE4MjQ= 4348 maximum recursion with dask and pydap backend jhamman 2443309 open 0     2 2020-08-18T19:47:26Z 2022-12-15T18:47:38Z   MEMBER      

What happened:

I'm getting a maximum recursion error when using the Pydap backend with Dask distributed. It seems we're failing to successfully pickle the pydap backend store.

What you expected to happen:

Successful parallel loading of an OPeNDAP dataset.

Minimal Complete Verifiable Example:

```python
import xarray as xr
from dask.distributed import Client

client = Client()

ds = xr.open_dataset(
    'http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_terraclimate_pet_1958_CurrentYear_GLOBE.nc',
    engine='pydap', chunks={'lat': 1024, 'lon': 1024, 'time': 12}).load()
```

yields:

Killed worker on the client:

--------------------------------------------------------------------------- KilledWorker Traceback (most recent call last) <ipython-input-4-713e4114ee96> in <module> 4 client = Client() 5 ----> 6 ds = xr.open_dataset('http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_terraclimate_pet_1958_CurrentYear_GLOBE.nc', 7 engine='pydap', chunks={'lat': 1024, 'lon': 1024, 'time': 12}).load() ~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/xarray/core/dataset.py in load(self, **kwargs) 652 653 # evaluate all the dask arrays simultaneously --> 654 evaluated_data = da.compute(*lazy_data.values(), **kwargs) 655 656 for k, data in zip(lazy_data, evaluated_data): ~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/dask/base.py in compute(*args, **kwargs) 435 keys = [x.__dask_keys__() for x in collections] 436 postcomputes = [x.__dask_postcompute__() for x in collections] --> 437 results = schedule(dsk, keys, **kwargs) 438 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)]) 439 ~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs) 2594 should_rejoin = False 2595 try: -> 2596 results = self.gather(packed, asynchronous=asynchronous, direct=direct) 2597 finally: 2598 for f in futures.values(): ~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/client.py in gather(self, futures, errors, direct, asynchronous) 1886 else: 1887 local_worker = None -> 1888 return self.sync( 1889 self._gather, 1890 futures, ~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs) 775 return future 776 else: --> 777 return sync( 778 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs 779 ) ~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs) 346 if error[0]: 347 typ, exc, tb = error[0] --> 348 raise exc.with_traceback(tb) 349 else: 350 return result[0] ~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/utils.py in f() 330 if callback_timeout is not None: 331 future = asyncio.wait_for(future, callback_timeout) --> 332 result[0] = yield future 333 except Exception as exc: 334 error[0] = sys.exc_info() ~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/tornado/gen.py in run(self) 733 734 try: --> 735 value = future.result() 736 except Exception: 737 exc_info = sys.exc_info() ~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker) 1751 exc = CancelledError(key) 1752 else: -> 1753 raise exception.with_traceback(traceback) 1754 raise exc 1755 if errors == "skip": KilledWorker: ('open_dataset-54c87cd25bf4e9df37cb3030e6602974pet-d39db76f8636f3803611948183e52c13', <Worker 'tcp://127.0.0.1:57343', name: 0, memory: 0, processing: 1>)

and the above mentioned recursion error on the workers:

distributed.worker - INFO - ------------------------------------------------- distributed.worker - INFO - Registered to: tcp://127.0.0.1:57334 distributed.worker - INFO - ------------------------------------------------- distributed.worker - ERROR - maximum recursion depth exceeded Traceback (most recent call last): File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/worker.py", line 931, in handle_scheduler await self.handle_stream( File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/core.py", line 455, in handle_stream msgs = await comm.read() File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/comm/tcp.py", line 211, in read msg = await from_frames( File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/comm/utils.py", line 75, in from_frames res = _from_frames() File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/comm/utils.py", line 60, in _from_frames return protocol.loads( File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/protocol/core.py", line 130, in loads value = _deserialize(head, fs, deserializers=deserializers) File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/protocol/serialize.py", line 269, in deserialize return loads(header, frames) File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/protocol/serialize.py", line 59, in pickle_loads return pickle.loads(b"".join(frames)) File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 59, in loads return pickle.loads(x) File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/pydap/model.py", line 235, in __getattr__ return self.attributes[attr] File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/pydap/model.py", line 235, in __getattr__ return self.attributes[attr] File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/pydap/model.py", line 235, in __getattr__ return self.attributes[attr] [Previous line repeated 973 more times] RecursionError: maximum recursion depth exceeded distributed.worker - INFO - Connection to scheduler broken. Reconnecting...

Anything else we need to know?:

I've found this to be reproducible with a few kinds of Dask clusters. Setting Client(processes=False) does correct the problem at the expense of multiprocessing.
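The workaround mentioned above, for completeness (it keeps everything in one process, so the pydap store never has to be pickled):

```python
from dask.distributed import Client

# Threads-only client: avoids serializing the pydap-backed store between
# processes, at the cost of losing multiprocessing parallelism.
client = Client(processes=False)
```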

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.8.2 | packaged by conda-forge | (default, Mar 5 2020, 16:54:44) [Clang 9.0.1 ] python-bits: 64 OS: Darwin OS-release: 19.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.3 xarray: 0.15.1 pandas: 1.0.3 numpy: 1.18.1 scipy: 1.4.1 netCDF4: 1.5.3 pydap: installed h5netcdf: 0.8.0 h5py: 2.10.0 Nio: None zarr: 2.4.0 cftime: 1.1.1.2 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: 1.0.28 cfgrib: None iris: None bottleneck: 1.3.2 dask: 2.13.0 distributed: 2.13.0 matplotlib: 3.2.1 cartopy: 0.17.0 seaborn: 0.10.0 numbagg: installed setuptools: 46.1.3.post20200325 pip: 20.0.2 conda: installed pytest: 5.4.1 IPython: 7.13.0 sphinx: 3.1.1
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4348/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
110807626 MDU6SXNzdWUxMTA4MDc2MjY= 619 Improve plot aspect handling when using cartopy jhamman 2443309 open 0     5 2015-10-10T17:43:55Z 2021-01-03T16:17:29Z   MEMBER      

This applies to single plots and FacetGrids.

The current plotting behavior when using a projection that changes the plot aspect is as follows:

```python
from xray.tutorial import load_dataset

ds = load_dataset('air_temperature')

ax = plt.subplot(projection=ccrs.LambertConformal())
ds.air.isel(time=0).plot(transform=ccrs.PlateCarree())
ax.coastlines()
ax.gridlines()
```

```python
fg = ds.air.isel(time=slice(0, 9)).plot(col='time', col_wrap=3,
                                        transform=ccrs.PlateCarree(),
                                        subplot_kws=dict(projection=ccrs.LambertConformal()))
for ax in fg.axes.flat:
    ax.coastlines()
    ax.gridlines()
```

There are two problems here; I think both are related to the aspect of the subplot:

  1. In the single case, the subplot aspect is correct but the colorbar is not scaled appropriately.
  2. In the FacetGrid case, the subplot aspects are not correct but the colorbar is.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/619/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
140264913 MDU6SXNzdWUxNDAyNjQ5MTM= 792 ENH: Don't infer pcolormesh interval breaks for unevenly spaced coordiantes jhamman 2443309 open 0     7 2016-03-11T19:06:30Z 2020-12-29T17:50:33Z   MEMBER      

Based on discussion in #781 and #782, it seems like a bad idea to infer (guess) the spacing of coordinates when they are unevenly spaced. As @ocefpaf points out:

> guessing should be an active user choice, not the automatic behavior.

So the options moving forward are to:

  1. never infer the interval breaks and be okay with pcolormesh and imshow producing dissimilar plots, or
  2. only infer the interval breaks when the coordinates are evenly spaced (a sketch of such a check follows below).
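A hypothetical sketch of the evenness check that option 2 implies (the helper name and tolerance are made up for illustration):

```python
import numpy as np

def _is_evenly_spaced(coord: np.ndarray, rtol: float = 1e-5) -> bool:
    # Option 2 would only infer interval breaks when this returns True.
    steps = np.diff(coord)
    return bool(np.allclose(steps, steps[0], rtol=rtol))

print(_is_evenly_spaced(np.array([0.0, 1.0, 2.0, 3.0])))   # True
print(_is_evenly_spaced(np.array([0.0, 1.0, 2.5, 3.0])))   # False
```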

cc @clarkfitzg

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/792/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
302806158 MDU6SXNzdWUzMDI4MDYxNTg= 1970 API Design for Xarray Backends jhamman 2443309 open 0     9 2018-03-06T18:02:05Z 2020-10-06T06:15:56Z   MEMBER      

The time has come to formalize the API for Xarray backends. We now have the following backends implemented in xarray:

| Backend | Read | Write |
|----------------|------|-------|
| netcdf4-python | x | x |
| h5netcdf | x | x |
| pydap | x | |
| pynio | x | |
| scipy | x | x |
| rasterio* | x | |
| zarr | x | x |

* currently does not inherit from backends.AbstractDatastore

And there are conversations about adding additional backends, for example:

  • TileDB: https://github.com/pangeo-data/storage-benchmarks/issues/6
  • PseudoNetCDF: #1905

However, as anyone who has worked on implementing or optimizing any of our current backends can attest, the existing DataStore API is not particularly user/developer friendly. @shoyer asked me to open an issue to discuss what a more user-friendly backend API would look like, so that is what this issue will be. I have left out a thorough description of the current API because, well, I don't think it can be done in a succinct manner (that's the problem).

Note that @shoyer started down an API refactor some time ago in #1087, but that effort has stalled, presumably because we don't have a well-defined set of development goals here.
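To seed the discussion, a purely hypothetical sketch of what a slimmer backend interface could look like (names are invented for illustration; this is not xarray's current DataStore API):

```python
from abc import ABC, abstractmethod

class MinimalBackend(ABC):
    """Hypothetical read-only backend: just enough to build a Dataset."""

    @abstractmethod
    def get_variables(self) -> dict:
        """Return a mapping of variable name -> (dims, data, attrs)."""

    @abstractmethod
    def get_attrs(self) -> dict:
        """Return global (dataset-level) attributes."""

    def close(self) -> None:
        """Release any resources (optional for in-memory backends)."""
```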

cc @pydata/xarray

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1970/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
264049503 MDU6SXNzdWUyNjQwNDk1MDM= 1614 Rules for propagating attrs and encoding jhamman 2443309 open 0     15 2017-10-09T22:56:02Z 2020-04-05T19:12:10Z   MEMBER      

We need to come up with some clear rules for when and how xarray should propagate metadata (attrs/encoding). This has come up routinely (e.g. #25, #138, #442, #688, #828, #988, #1009, #1271, #1297, #1586) and we don't have a clear direction as to when to keep/drop metadata.

I'll take a first cut:

| operation | attrs | encoding | status |
|------------|------------|------------|------------|
| reduce | drop | drop | |
| arithmetic | drop | drop | implemented |
| copy | keep | keep | |
| concat | keep first | keep first | implemented |
| slice | keep | drop | |
| where | keep | keep | |
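A quick illustration of two of the rows above, using current xarray behaviour as I understand it (arithmetic drops attrs by default; slicing keeps them):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(3.0), dims="x", attrs={"units": "m"})

print((da + 1).attrs)              # {} - arithmetic drops attrs by default
print(da.isel(x=slice(2)).attrs)   # {'units': 'm'} - slicing keeps attrs
```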

cc @shoyer (following up on https://github.com/pydata/xarray/issues/1586#issuecomment-334954046)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1614/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
132774456 MDU6SXNzdWUxMzI3NzQ0NTY= 757 Ordered Groupby Keys jhamman 2443309 open 0     6 2016-02-10T18:05:08Z 2019-11-20T16:12:41Z   MEMBER      

The current behavior of xarray's Groupby.groups property is to provide a standard (unordered) dictionary. This is fine for most cases but leads to odd orderings in use cases like this one, where I am using xarray's FacetGrid plotting:

```python
plot_kwargs = dict(col='season', vmin=15, vmax=35, levels=12, extend='both')

da_obs = ds_obs.SALT.isel(depth=0).groupby('time.season').mean('time')
da_obs.plot(**plot_kwargs)
```

Note that MAM and JJA are out of order.

I think this could be easily fixed by using an OrderedDict in xarray.core.Groupby.groups.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/757/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
280385592 MDU6SXNzdWUyODAzODU1OTI= 1769 Extend to_masked_array to support dask MaskedArrays jhamman 2443309 open 0     5 2017-12-08T06:22:56Z 2019-11-08T17:19:44Z   MEMBER      

Following @shoyer's comment, it will be pretty straightforward to support creating dask masked arrays within the to_masked_array method. My thought would be that data arrays that use dask would be converted to dask masked arrays, rather than to numpy arrays as they are currently.

Two kinks:

1) The dask masked array feature requires dask 0.15.3 or newer.
2) I'm not sure how to test if an object is a dask.array.ma.MaskedArray (Dask doesn't have a MaskedArray class). @mrocklin - thoughts?
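On the second point, a small sketch of why the type check is awkward (dask masked arrays are ordinary dask Arrays whose blocks happen to be numpy masked arrays):

```python
import dask.array as da
import numpy as np

x = da.from_array(np.array([[1.0, np.nan, 3.0]]), chunks=1)
xm = da.ma.masked_invalid(x)

# Still a plain dask Array - there is no separate dask MaskedArray class,
# so isinstance() checks can't distinguish masked from unmasked dask arrays.
print(type(xm))
```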

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1769/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);