issues

2 rows where user = 41797673 sorted by updated_at descending

issue 4261: to_zarr with append_dim behavior changed in 0.16.0 release
id: 664611511 · node_id: MDU6SXNzdWU2NjQ2MTE1MTE= · user: maximemorariu (41797673)
state: closed · state_reason: completed · locked: 0 · comments: 4 · author_association: NONE
created_at: 2020-07-23T16:26:16Z · updated_at: 2020-11-19T15:19:48Z · closed_at: 2020-11-19T15:19:48Z
repo: xarray (13221727) · type: issue

What happened: In version 0.15.1, calling `to_zarr` on a Dataset with a given `append_dim` would create a new zarr store if one did not already exist. In version 0.16.0 this is no longer the case: the call fails if the store does not exist.

Minimal Complete Verifiable Example:

```python
import xarray as xr

a = xr.DataArray([1, 2], {"t": [1, 2]}, ("t",))
ds = xr.Dataset({"v": a})
ds.to_zarr("CHOOSE_PATH", append_dim="t")
```
Environment:

Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.7 (default, May 7 2020, 21:25:33) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-177.el8.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.16.0
pandas: 1.0.5
numpy: 1.18.1
scipy: 1.5.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.3.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.14.0
distributed: 2.14.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.2.0.post20200714
pip: 20.1.1
conda: None
pytest: 5.4.3
IPython: None
sphinx: None
```
reactions:

```json
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4261/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
```
issue 4241: Parallel tasks on subsets of a dask array wrapped in an xarray Dataset
id: 662982199 · node_id: MDU6SXNzdWU2NjI5ODIxOTk= · user: maximemorariu (41797673)
state: closed · state_reason: completed · locked: 0 · comments: 5 · author_association: NONE
created_at: 2020-07-21T12:47:41Z · updated_at: 2020-07-27T08:18:13Z · closed_at: 2020-07-27T08:18:13Z
repo: xarray (13221727) · type: issue

I have a large xarray.Dataset stored as a zarr. I want to perform some custom operations on it that cannot be done with the numpy-like functions a Dask cluster handles automatically. I therefore partition the dataset into small subsets and, for each subset, submit to my Dask cluster a task of the form:

```python
def my_task(zarr_path, subset_index):
    # this returns an xarray.Dataset containing a dask.array
    ds = xarray.open_zarr(zarr_path)
    sel = ds.sel(subset_index)
    sel = sel.load()  # I want to get the data into memory
    # then do my custom operations
    ...
```

However, I have noticed that this creates a "task within a task": when a worker receives my_task, it in turn submits tasks to the cluster to load the relevant part of the dataset. To avoid this and ensure that the full task is executed within the worker, I am instead submitting:

```python
def my_task_2(zarr_path, subset_index):
    with dask.config.set(scheduler="threading"):
        my_task(zarr_path, subset_index)
```

Is this the best way to do this? What is the best practice for this kind of situation?
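
One possible way around the nested graph, sketched here as an editorial assumption rather than an answer from the thread: open the store without dask chunking, so the worker reads the subset directly and no inner tasks are submitted to the cluster. The function name `my_task_eager` is hypothetical.

```python
import xarray as xr

def my_task_eager(zarr_path, subset_index):
    # chunks=None avoids dask entirely: zarr data is read lazily into
    # numpy-backed variables, so selecting and loading the subset happens
    # inside this worker with no nested task graph.
    ds = xr.open_zarr(zarr_path, chunks=None)
    sel = ds.sel(subset_index).load()
    # ... custom operations on the in-memory data ...
    return sel
```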

I have already posted this on stackoverflow but did not get any answer, so I am adding this here hoping it increases visibility. Apologies if this is considered "pollution". https://stackoverflow.com/questions/62874267/parallel-tasks-on-subsets-of-a-dask-array-wrapped-in-an-xarray-dataset

reactions:

```json
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4241/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
```

Table schema:

```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```
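
For reference, a minimal sketch of reproducing the view above ("2 rows where user = 41797673 sorted by updated_at descending") with Python's built-in sqlite3 module; the database filename github.db is an assumption.

```python
import sqlite3

# "github.db" is a hypothetical filename for a database using the schema above.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    "SELECT id, number, title, state, updated_at "
    "FROM issues WHERE user = ? ORDER BY updated_at DESC",
    (41797673,),
).fetchall()
for row in rows:
    print(row)
```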