
issues


6 rows where comments = 7 and user = 1197350 sorted by updated_at descending

Facets:
  • type: issue 3, pull 3
  • state: closed 4, open 2
  • repo: xarray 6
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
396806015 MDU6SXNzdWUzOTY4MDYwMTU= 2660 DataArrays to/from Zarr Arrays rabernat 1197350 open 0     7 2019-01-08T08:56:05Z 2023-10-27T14:00:20Z   MEMBER      

Right now, open_zarr and Dataset.to_zarr only work with Zarr groups. Zarr Groups can contain multiple Array objects.

It would be nice if we could open Zarr Arrays directly as xarray DataArrays and write xarray DataArrays directly to Zarr Arrays.

However, this might not make sense, because, unlike xarray DataArrays, zarr Arrays can't hold any coordinates.

Just raising this idea for discussion.
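
For context, a rough sketch of the round trip as it stands versus the manual workaround; the store name example.zarr and variable name foo are made up for illustration, and this is not an endorsement of any particular API.

    import numpy as np
    import xarray as xr
    import zarr

    # What the issue describes as the current situation: a DataArray has to
    # go through a Dataset, i.e. a Zarr group.
    da = xr.DataArray(np.arange(10), dims="x", name="foo")
    da.to_dataset().to_zarr("example.zarr", mode="w")

    # Manual workaround on the read side: wrap a bare zarr Array yourself.
    # Dimension names and coordinates are not stored on the Array, so they
    # must be supplied by hand -- exactly the caveat raised above.
    arr = zarr.open_array("example.zarr/foo", mode="r")
    da_again = xr.DataArray(arr[:], dims=["x"], name="foo")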

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2660/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  reopened xarray 13221727 issue
357808970 MDExOlB1bGxSZXF1ZXN0MjEzNzM2NTAx 2405 WIP: don't create indexes on multidimensional dimensions rabernat 1197350 closed 0     7 2018-09-06T20:13:11Z 2023-07-19T18:33:17Z 2023-07-19T18:33:17Z MEMBER   0 pydata/xarray/pulls/2405
  • [x] Closes #2368, Closes #2233
  • [ ] Tests added (for all bug fixes or enhancements)
  • [ ] Tests passed (for all non-documentation changes)
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)

This is just a start to the solution proposed in #2368. A surprisingly small number of tests broke in my local environment.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2405/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1047608434 I_kwDOAMm_X84-cTxy 5954 Writeable backends via entrypoints rabernat 1197350 open 0     7 2021-11-08T15:47:12Z 2021-11-09T16:28:59Z   MEMBER      

The backend refactor has gone a long way towards making it easier to implement custom backend readers via entry points. However, it is still not clear how to implement a writeable backend from a third party package as an entry point. Some of the reasons for this are:

  • While our reading function (open_dataset) has a generic name, our writing functions (Dataset.to_netcdf / Dataset.to_zarr) are still format-specific. (Related to https://github.com/pydata/xarray/issues/3638). I propose we introduce a generic Dataset.to method and deprecate the others.
  • The BackendEntrypoint base class does not have a writing method, just open_dataset: https://github.com/pydata/xarray/blob/e0deb9cf0a5cd5c9e3db033fd13f075added9c1e/xarray/backends/common.py#L356-L370 (Related to https://github.com/pydata/xarray/issues/1970)
  • As a result, writing is implemented ad hoc for each backend.
  • This makes it impossible for a third-party package to implement writing.

We should fix this situation! Here are the steps I would take.

  • [ ] Decide on the desired API for writeable backends.
  • [ ] Formalize this in the BackendEntrypoint base class (see the sketch after this list).
  • [ ] Refactor the existing writeable backends (netcdf4-python, h5netcdf, scipy, Zarr) to use this API.
  • [ ] Maybe deprecate to_zarr and to_netcdf (or at least refactor them into shallow calls to a generic method).
  • [ ] Encourage third-party implementors to try it (e.g. TileDB).
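
A minimal sketch of the shape this could take, assuming a hypothetical save_dataset hook on BackendEntrypoint and a generic Dataset.to(..., engine=...) method; neither exists in xarray, and the names here are illustrative only.

    from xarray.backends import BackendEntrypoint

    class MyFormatBackendEntrypoint(BackendEntrypoint):
        def open_dataset(self, filename_or_obj, *, drop_variables=None, **kwargs):
            ...  # existing read-side hook

        # Hypothetical write-side hook this issue argues for:
        def save_dataset(self, dataset, filename_or_obj, **kwargs):
            ...  # serialize `dataset` to the target store

    # Hypothetical generic call, dispatching on the engine keyword:
    # ds.to("out.myformat", engine="myformat")
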
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5954/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
    xarray 13221727 issue
193657418 MDU6SXNzdWUxOTM2NTc0MTg= 1154 netCDF reading is not prominent in the docs rabernat 1197350 closed 0     7 2016-12-06T01:18:40Z 2019-02-02T06:33:44Z 2019-02-02T06:33:44Z MEMBER      

Just opening an issue to highlight what I think is a problem with the docs.

For me, the primary use of xarray is to read and process existing netCDF data files. @shoyer's popular blog post illustrates this use case extremely well.

However, when I open the docs, I have to dig quite deep before I can see how to read a netCDF file. This could be turning away many potential users. The stuff about netCDF reading is hidden under "Serialization and IO". Many potential users will have no idea what either of these words means.

IMO the solution to this is to reorganize the docs to make reading netCDF much more prominent and obvious.
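
For illustration, the kind of one-liner a new user is typically looking for (the file name is hypothetical):

    import xarray as xr

    ds = xr.open_dataset("example.nc")  # read an existing netCDF file
    print(ds)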

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1154/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
180536861 MDExOlB1bGxSZXF1ZXN0ODc2NDc0MDk= 1027 Groupby bins empty groups rabernat 1197350 closed 0     7 2016-10-02T21:31:32Z 2016-10-03T15:22:18Z 2016-10-03T15:22:15Z MEMBER   0 pydata/xarray/pulls/1027

This PR fixes a bug in groupby_bins in which empty bins were dropped from the grouped results. Now groupby_bins restores any empty bins automatically. To recover the old behavior, one could apply dropna after a groupby operation.

Fixes #1019
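
A small sketch of the behavior described above; the data and bin edges are made up, and dropna on the binned dimension approximates the pre-fix result.

    import numpy as np
    import xarray as xr

    da = xr.DataArray(np.arange(4.0), dims="x", coords={"x": [0.5, 1.5, 7.5, 8.5]})
    binned = da.groupby_bins("x", bins=[0, 3, 6, 9]).mean()
    print(binned)                    # with this fix, the empty (3, 6] bin is kept (as NaN)
    print(binned.dropna("x_bins"))   # roughly recovers the old behavior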

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1027/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
94508580 MDExOlB1bGxSZXF1ZXN0Mzk3NTI1MTQ= 468 Option for closing files with scipy backend rabernat 1197350 closed 0     7 2015-07-11T21:24:24Z 2015-08-10T12:50:45Z 2015-08-09T00:04:12Z MEMBER   0 pydata/xarray/pulls/468

This addresses issue #463, in which open_mfdataset failed when trying to open a list of files longer than my system's ulimit. I tried to find a solution in which the underlying netcdf file objects are kept closed by default and only reopened "when needed".

I ended up subclassing scipy.io.netcdf_file and overwriting the variable attribute with a property which first checks whether the file is open or closed and opens it if needed. That was the easy part. The hard part was figuring out when to close them. The problem is that a couple of different parts of the code (e.g. each individual variable and also the datastore object itself) keep references to the netcdf_file object. In the end I used the debugger to find out when during initialization the variables were actually being read and added some calls to close() in various different places. It is relatively easy to close the files up at the end of the initialization, but it was much harder to make sure that the whole array of files is never open at the same time. I also had to disable mmap when this option is active.
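
As a generic illustration of the "keep it closed, reopen on demand via a property" idea described above (simplified and hypothetical, not the actual code in this PR):

    class LazyFile:
        """Keep the underlying file closed until it is actually needed."""

        def __init__(self, path):
            self._path = path
            self._fp = None

        @property
        def fp(self):
            # Reopen lazily on first access or after an explicit close().
            if self._fp is None or self._fp.closed:
                self._fp = open(self._path, "rb")
            return self._fp

        def close(self):
            if self._fp is not None and not self._fp.closed:
                self._fp.close()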

This solution is messy and, moreover, extremely slow. There is a factor of ~100 performance penalty during initialization for reopening and closing the files all the time (but only a factor of 10 for the actual calculation). I am sure this could be reduced if someone who understands the code better found some judicious points at which to call close() on the netcdf_file. The loss of mmap also sucks.

This option can be accessed with the close_files keyword, which I added to the API.

Timing for loading and doing a calculation with close_files=True:

    count_open_files()
    %time mfds = xray.open_mfdataset(ddir + '/dt_global_allsat_msla_uv_2014101*.nc', engine='scipy', close_files=True)
    count_open_files()
    %time print float(mfds.variables['u'].mean())
    count_open_files()

output:

    3 open files
    CPU times: user 11.1 s, sys: 17.5 s, total: 28.5 s
    Wall time: 27.7 s
    2 open files
    0.0055650632367
    CPU times: user 649 ms, sys: 974 ms, total: 1.62 s
    Wall time: 633 ms
    2 open files

Timing for loading and doing a calculation with close_files=False (default, should revert to old behavior):

    count_open_files()
    %time mfds = xray.open_mfdataset(ddir + '/dt_global_allsat_msla_uv_2014101*.nc', engine='scipy', close_files=False)
    count_open_files()
    %time print float(mfds.variables['u'].mean())
    count_open_files()

    3 open files
    CPU times: user 264 ms, sys: 85.3 ms, total: 349 ms
    Wall time: 291 ms
    22 open files
    0.0055650632367
    CPU times: user 174 ms, sys: 141 ms, total: 315 ms
    Wall time: 56 ms
    22 open files

This is not a very serious pull request, but I spent all day on it, so I thought I would share. Maybe you can see some obvious way to improve it...

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/468/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
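
For reference, a sketch of reproducing this page's query ("comments = 7 and user = 1197350 sorted by updated_at descending") against the schema above using Python's sqlite3 module; the database file name github.db is an assumption.

    import sqlite3

    con = sqlite3.connect("github.db")
    rows = con.execute(
        """
        SELECT number, title, state, comments, updated_at
        FROM issues
        WHERE comments = 7 AND [user] = 1197350
        ORDER BY updated_at DESC
        """
    ).fetchall()
    for number, title, state, comments, updated_at in rows:
        print(number, title, state, comments, updated_at)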