issues
16 rows where repo = 13221727, state = "open" and user = 1197350 sorted by updated_at descending
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
421064313 | MDExOlB1bGxSZXF1ZXN0MjYxMjAyMDU2 | 2813 | [WIP] added protect_dataset_variables_inplace to open_zarr | rabernat 1197350 | open | 0 | 3 | 2019-03-14T14:50:15Z | 2024-03-25T14:05:24Z | MEMBER | 0 | pydata/xarray/pulls/2813 | This adds the same `protect_dataset_variables_inplace` call to `open_zarr`. As far as I can tell, it does not work, in the sense that nothing is cached.
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2813/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | ||||||
421070999 | MDExOlB1bGxSZXF1ZXN0MjYxMjA3MTYz | 2814 | [WIP] Use zarr internal LRU caching | rabernat 1197350 | open | 0 | 2 | 2019-03-14T15:01:06Z | 2024-03-25T14:00:50Z | MEMBER | 0 | pydata/xarray/pulls/2814 | Alternative way to close #2812. This uses zarr's own caching. In contrast to #2813, this does work.
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2814/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | ||||||
663148659 | MDU6SXNzdWU2NjMxNDg2NTk= | 4242 | Expose xarray's h5py serialization capabilities as public API? | rabernat 1197350 | open | 0 | 5 | 2020-07-21T16:27:45Z | 2024-03-20T13:33:15Z | MEMBER | Xarray has a magic ability to serialize h5py datasets. We should expose this somehow and allow it to be used outside of xarray. Consider the following example:

```python
import s3fs
import h5py
import dask.array as dsa
import xarray as xr
import cloudpickle

url = 'noaa-goes16/ABI-L2-RRQPEF/2020/001/00/OR_ABI-L2-RRQPEF-M6_G16_s20200010000216_e20200010009524_c20200010010034.nc'
fs = s3fs.S3FileSystem(anon=True)
f = fs.open(url)
ds = h5py.File(f, mode='r')
data = dsa.from_array(ds['RRQPE'])
_ = cloudpickle.dumps(data)
```

This raises a `TypeError` (h5py objects cannot be pickled). However, if I read the file with xarray, it works just fine. This has come up in several places (e.g. https://github.com/dask/s3fs/issues/337, https://github.com/dask/distributed/issues/2787). It seems like the ability to pickle these arrays is broadly useful, beyond xarray.
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4242/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
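For comparison with #4242 above, a minimal sketch of the xarray route that does pickle, using the same public bucket; the `h5netcdf` engine choice is an assumption:

```python
import s3fs
import xarray as xr
import cloudpickle

url = 'noaa-goes16/ABI-L2-RRQPEF/2020/001/00/OR_ABI-L2-RRQPEF-M6_G16_s20200010000216_e20200010009524_c20200010010034.nc'
fs = s3fs.S3FileSystem(anon=True)
f = fs.open(url)

# xarray wraps the underlying h5py objects in its file manager machinery,
# so the resulting object pickles without error
ds = xr.open_dataset(f, engine='h5netcdf')
_ = cloudpickle.dumps(ds)
```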
224553135 | MDU6SXNzdWUyMjQ1NTMxMzU= | 1385 | slow performance with open_mfdataset | rabernat 1197350 | open | 0 | 52 | 2017-04-26T18:06:32Z | 2024-03-14T01:31:21Z | MEMBER | We have a dataset stored across multiple netCDF files. We are getting very slow performance with `open_mfdataset`. Each individual netCDF file looks like this:

```
<xarray.Dataset>
Dimensions:  (npart: 8192000, time: 1)
Coordinates:
  * time     (time) datetime64[ns] 1993-01-01
  * npart    (npart) int32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
Data variables:
    z        (time, npart) float32 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 ...
    vort     (time, npart) float32 -9.71733e-10 -9.72858e-10 -9.73001e-10 ...
    u        (time, npart) float32 0.000545563 0.000544884 0.000544204 ...
    v        (time, npart) float32 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    x        (time, npart) float32 180.016 180.047 180.078 180.109 180.141 ...
    y        (time, npart) float32 -79.9844 -79.9844 -79.9844 -79.9844 ...
```

As shown above, a single data file opens in ~60 ms. When I call `open_mfdataset` on all 49 files, I get:

```
<xarray.Dataset>
Dimensions:  (npart: 8192000, time: 49)
Coordinates:
  * npart    (npart) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
  * time     (time) datetime64[ns] 1993-01-01 1993-01-02 1993-01-03 ...
Data variables:
    z        (time, npart) float64 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 -0.5 ...
    vort     (time, npart) float64 -9.717e-10 -9.729e-10 -9.73e-10 -9.73e-10 ...
    u        (time, npart) float64 0.0005456 0.0005449 0.0005442 0.0005437 ...
    v        (time, npart) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    x        (time, npart) float64 180.0 180.0 180.1 180.1 180.1 180.2 180.2 ...
    y        (time, npart) float64 -79.98 -79.98 -79.98 -79.98 -79.98 -79.98 ...
```

It takes over 2 minutes to open the dataset. Here is the profiling output:

```
   748994 function calls (724222 primitive calls) in 142.160 seconds

   Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        49   62.455    1.275   62.458    1.275 {method 'get_indexer' of 'pandas.index.IndexEngine' objects}
        49   47.207    0.963   47.209    0.963 base.py:1067(is_unique)
       196    7.198    0.037    7.267    0.037 {operator.getitem}
        49    4.632    0.095    4.687    0.096 netCDF4_.py:182(_open_netcdf4_group)
       240    3.189    0.013    3.426    0.014 numeric.py:2476(array_equal)
        98    1.937    0.020    1.937    0.020 {numpy.core.multiarray.arange}
 4175/3146    1.867    0.000    9.296    0.003 {numpy.core.multiarray.array}
        49    1.525    0.031  119.144    2.432 alignment.py:251(reindex_variables)
        24    1.065    0.044    1.065    0.044 {method 'cumsum' of 'numpy.ndarray' objects}
        12    1.010    0.084    1.010    0.084 {method 'sort' of 'numpy.ndarray' objects}
 5227/4035    0.660    0.000    1.688    0.000 collections.py:50(__init__)
        12    0.600    0.050    3.238    0.270 core.py:2761(insert)
12691/7497    0.473    0.000    0.875    0.000 indexing.py:363(shape)
    110728    0.425    0.000    0.663    0.000 {isinstance}
        12    0.413    0.034    0.413    0.034 {method 'flatten' of 'numpy.ndarray' objects}
        12    0.341    0.028    0.341    0.028 {numpy.core.multiarray.where}
         2    0.333    0.166    0.333    0.166 {pandas._join.outer_join_indexer_int64}
         1    0.331    0.331  142.164  142.164 <string>:1(<module>)
```

It looks like most of the time is being spent on `reindex_variables`. Is there any obvious way I could improve the load time? For example, can I give a hint to xarray that this alignment step is unnecessary, since the `npart` coordinate is identical in every file? Possibly related to #1301 and #1340. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1385/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
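A hedged sketch of how current xarray can skip the alignment cost profiled in #1385, assuming every file really does share an identical `npart` coordinate (the file pattern is hypothetical):

```python
import glob
import xarray as xr

# hypothetical file list; one time step per file, identical npart everywhere
files = sorted(glob.glob('trajectories.*.nc'))

# compat='override' + coords='minimal' skip the per-file index comparison
# (the get_indexer / is_unique calls) that dominates the profile above
ds = xr.open_mfdataset(
    files,
    combine='nested',
    concat_dim='time',
    coords='minimal',
    compat='override',
)
```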
396806015 | MDU6SXNzdWUzOTY4MDYwMTU= | 2660 | DataArrays to/from Zarr Arrays | rabernat 1197350 | open | 0 | 7 | 2019-01-08T08:56:05Z | 2023-10-27T14:00:20Z | MEMBER | Right now, the zarr backend only reads and writes zarr Groups, via `open_zarr` and `Dataset.to_zarr`. It would be nice if we could open Zarr Arrays directly as xarray DataArrays and write xarray DataArrays directly to Zarr Arrays. However, this might not make sense, because, unlike xarray DataArrays, zarr Arrays can't hold any coordinates. Just raising this idea for discussion. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2660/reactions", "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
reopened | xarray 13221727 | issue | |||||||
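A minimal sketch of the Group-based workaround that #2660 wants to shortcut; the array name and store path are hypothetical:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(10), dims='x', name='foo')

# today's workaround: promote the DataArray to a Dataset (a zarr Group),
# write it, then pull the single variable back out on read
da.to_dataset().to_zarr('foo.zarr', mode='w')
roundtripped = xr.open_zarr('foo.zarr')['foo']
```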
527323165 | MDU6SXNzdWU1MjczMjMxNjU= | 3564 | DOC: from examples to tutorials | rabernat 1197350 | open | 0 | 14 | 2019-11-22T17:30:14Z | 2023-02-21T20:01:05Z | MEMBER | It's awesome to see the work we did at Scipy2019 finally hit the live docs! Thanks @keewis and @dcherian for pushing it through. Now that we have these more detailed, realistic examples, let's think about how we can take our documentation to the next level. I think we need TUTORIALS. The examples are a good start. I think we can build on these to create tutorials which walk through most of xarray's core features with domain-specific datasets. We could have different tutorials for different fields.
Each tutorial would cover the same core elements (loading data, indexing, aligning, grouping, computations, plotting, etc.), but using a familiar, real dataset rather than the generic, made-up ones in our current docs. Yes, this would be a lot of work, but I think it would have a huge impact. Just raising here for discussion. xref #2980 #2378 #3131 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3564/reactions", "total_count": 6, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 6, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
750985364 | MDU6SXNzdWU3NTA5ODUzNjQ= | 4610 | Add histogram method | rabernat 1197350 | open | 0 | 21 | 2020-11-25T17:05:02Z | 2023-02-16T21:17:57Z | MEMBER | On today's dev call, we discussed the possible role that numpy_groupies could play in xarray (#4540). I noted that many of the use cases for advanced grouping overlap significantly with histogram-type operations. A major need that we have is to take [weighted] histograms over some, but not all, axes of DataArrays. Since groupby doesn't allow this (see #1013), we started the standalone xhistogram package. Given the broad usefulness of this feature, I suggested that we might want to deprecate xhistogram and move the histogram function to xarray. We may want to also reimplement it using numpy_groupies, which I think is smarter than our implementation in xhistogram. I've opened this issue to keep track of the idea. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4610/reactions", "total_count": 9, "+1": 9, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
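A short sketch of the existing xhistogram API that #4610 proposes to fold into xarray; the array, dims, and bins here are illustrative:

```python
import numpy as np
import xarray as xr
from xhistogram.xarray import histogram

temp = xr.DataArray(np.random.randn(100, 50), dims=['time', 'npart'], name='temp')
bins = np.linspace(-4, 4, 21)

# histogram over 'npart' only; 'time' survives as a dimension,
# which plain groupby-based approaches can't express
h = histogram(temp, bins=[bins], dim=['npart'])
```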
421029352 | MDU6SXNzdWU0MjEwMjkzNTI= | 2812 | expose zarr caching from xarray | rabernat 1197350 | open | 0 | 12 | 2019-03-14T13:50:16Z | 2022-09-14T01:33:03Z | MEMBER | Zarr has its own internal mechanism for caching, described here:

- https://zarr.readthedocs.io/en/stable/tutorial.html#distributed-cloud-storage
- https://zarr.readthedocs.io/en/stable/api/storage.html#zarr.storage.LRUStoreCache

However, this capability is currently inaccessible from xarray. I propose to add a new keyword argument to `open_zarr` to enable this caching. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2812/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
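A minimal sketch of the zarr (v2) cache that #2812 wants to expose, wired up by hand today; the store path is hypothetical:

```python
import xarray as xr
import zarr

# wrap any zarr store in zarr's own LRU cache before handing it to xarray
store = zarr.storage.DirectoryStore('example.zarr')          # hypothetical path
cached = zarr.storage.LRUStoreCache(store, max_size=2**28)   # ~256 MB cache
ds = xr.open_zarr(cached)
```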
710357592 | MDU6SXNzdWU3MTAzNTc1OTI= | 4470 | xarray / vtk integration | rabernat 1197350 | open | 0 | 21 | 2020-09-28T15:14:32Z | 2022-06-22T18:20:39Z | MEMBER | I just had a great chat with @aashish24 and @banesullivan of Kitware about how we could improve interoperability between xarray and the VTK stack. They also made me aware of pyvista, which looks very cool. As a user of both tools, I can see it would be great if I could quickly drop into VTK from xarray for advanced 3D visualization. A low-hanging fruit would be to simply be able to round-trip data between vtk and xarray in memory, much like we do with pandas. This should be doable because vtk already has a netCDF file reader. Rather than reading the data from a file, vtk could initialize its objects from an xarray dataset which, in principle, should contain all the same data / metadata. Beyond this, there are many possibilities for deeper integration around the treatment of finite-volume cells, structured / unstructured meshes, etc. Possibly related to https://github.com/pydata/xarray/issues/4222. I just thought I would open this issue to track the general topic of xarray / vtk integration. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4470/reactions", "total_count": 23, "+1": 13, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 5, "rocket": 5, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
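A rough sketch of the in-memory round trip discussed in #4470, using pyvista's rectilinear grid; the dataset is made up, and real integration would also need to handle meshes and cell data:

```python
import numpy as np
import pyvista as pv
import xarray as xr

# hypothetical dataset with 1-D x/y/z coordinates
ds = xr.Dataset(
    {'temp': (('z', 'y', 'x'), np.random.rand(4, 5, 6))},
    coords={'x': np.arange(6.0), 'y': np.arange(5.0), 'z': np.arange(4.0)},
)

grid = pv.RectilinearGrid(ds.x.values, ds.y.values, ds.z.values)
# VTK wants point data with x varying fastest; ravelling a (z, y, x)
# array in C order produces exactly that ordering
grid['temp'] = ds.temp.values.ravel()
```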
467908830 | MDExOlB1bGxSZXF1ZXN0Mjk3NDQ1NDc3 | 3131 | WIP: tutorial on merging datasets | rabernat 1197350 | open | 0 | TomNicholas 35968931 | 10 | 2019-07-15T01:28:25Z | 2022-06-09T14:50:17Z | MEMBER | 0 | pydata/xarray/pulls/3131 |
This is a start on a tutorial about merging / combining datasets. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3131/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
208312826 | MDU6SXNzdWUyMDgzMTI4MjY= | 1273 | replace a dim with a coordinate from another dataset | rabernat 1197350 | open | 0 | 4 | 2017-02-17T02:15:36Z | 2022-04-09T15:26:20Z | MEMBER | I often want a function that takes a dataarray / dataset and replaces a dimension with a coordinate from a different dataset. @shoyer proposed the following simple solution:

```python
def replace_dim(da, olddim, newdim):
    renamed = da.rename({olddim: newdim.name})
    # overriding the coordinate values skips alignment along this dimension
    renamed.coords[newdim.name] = newdim
    return renamed
```

Is this of broad enough interest to add a built-in method for? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1273/reactions", "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
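Hypothetical usage of the snippet above, with made-up datasets `ds_a` and `ds_b`:

```python
# replace ds_a's 'npart' dim with the 'particle' coordinate from ds_b
z_new = replace_dim(ds_a.z, 'npart', ds_b.particle)
```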
439875798 | MDU6SXNzdWU0Mzk4NzU3OTg= | 2937 | encoding of boolean dtype in zarr | rabernat 1197350 | open | 0 | 3 | 2019-05-03T03:53:27Z | 2022-04-09T01:22:42Z | MEMBER | I want to store an array with 1364688000 boolean values in zarr. I will have to read this array many times, so I am trying to do it as efficiently as possible. I have noticed that, if we try to write boolean data to zarr from xarray, zarr stores it as `i8`.
So it seems like, during serialization of bool data, xarray is converting the data to int8 and then adding a `dtype` attribute so it can be decoded back to bool on read. Problem description: since zarr is fully capable of storing bool data directly, we should not need to encode the data as i8. I think this happens in xarray's CF encoding of the variable. So maybe we should make the boolean encoding optional? Output of `xr.show_versions()`:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2937/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
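A minimal sketch reproducing the asymmetry described in #2937, assuming zarr v2; the store path is hypothetical:

```python
import numpy as np
import xarray as xr
import zarr

# zarr itself stores booleans natively...
z = zarr.zeros(10, dtype=bool)
print(z.dtype)  # bool

# ...but the same data round-tripped through xarray lands on disk as int8
ds = xr.Dataset({'mask': ('x', np.ones(10, dtype=bool))})
ds.to_zarr('mask.zarr', mode='w')
print(zarr.open('mask.zarr')['mask'].dtype)  # int8
```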
517338735 | MDU6SXNzdWU1MTczMzg3MzU= | 3484 | Need documentation on sparse / cupy integration | rabernat 1197350 | open | 0 | 6 | 2019-11-04T18:57:05Z | 2022-02-24T17:12:21Z | MEMBER | In https://github.com/pydata/xarray/issues/1375#issuecomment-526432439, @fjanoos asked:
@dcherian:
If we want people to take advantage of this cool new capability, we need to document it! I'm at pydata NYC and want to share something about this, but it's hard to know where to start without docs. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3484/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
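The kind of minimal example the docs requested in #3484 could start from, sketched with `sparse.COO`:

```python
import numpy as np
import sparse
import xarray as xr

# a sparse.COO array can back a DataArray directly; most xarray
# operations dispatch to the duck array's own methods
coo = sparse.COO.from_numpy(np.eye(4))
da = xr.DataArray(coo, dims=('x', 'y'))
print(type(da.data))  # a sparse COO array, not numpy
```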
1047608434 | I_kwDOAMm_X84-cTxy | 5954 | Writeable backends via entrypoints | rabernat 1197350 | open | 0 | 7 | 2021-11-08T15:47:12Z | 2021-11-09T16:28:59Z | MEMBER | The backend refactor has gone a long way towards making it easier to implement custom backend readers via entry points. However, it is still not clear how to implement a writeable backend from a third party package as an entry point. Some of the reasons for this are:
We should fix this situation! Here are the steps I would take.
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5954/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 1 } |
xarray 13221727 | issue | ||||||||
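For context on #5954, a sketch of the read-side hook that already exists; the class and file format here are hypothetical, and the point of the issue is that no equivalent write-side hook is documented:

```python
from xarray.backends import BackendEntrypoint

class MyBackendEntrypoint(BackendEntrypoint):
    """Read side of a hypothetical third-party backend, registered
    under the 'xarray.backends' entry-point group."""

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        ...  # build and return an xr.Dataset from filename_or_obj

    def guess_can_open(self, filename_or_obj):
        return str(filename_or_obj).endswith('.myformat')
```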
1006588071 | I_kwDOAMm_X847_1Cn | 5816 | Link API docs to user guide and other examples | rabernat 1197350 | open | 0 | 3 | 2021-09-24T15:34:31Z | 2021-10-10T16:39:18Z | MEMBER | Noting down a comment by @danjonesocean on Twitter: https://twitter.com/DanJonesOcean/status/1441392596362874882
Our API docs are generated from the function docstrings, and these are usually the first thing users hit when they search for functions. However, these docstrings uniformly lack examples, often leaving users stuck. I see two ways to mitigate this:

- Add examples directly to the docstrings (suggested by @jklymak)
- Cross-reference other examples from the user guide or other tutorials |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5816/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 1 } |
xarray 13221727 | issue | ||||||||
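A sketch of what the first suggestion in #5816 looks like in practice, a numpydoc-style `Examples` section (the function shown is hypothetical):

```python
def example_mean(da, dim):
    """Take the mean of a DataArray along a dimension.

    Examples
    --------
    >>> da = xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"))
    >>> da.mean(dim="x")
    <xarray.DataArray (y: 2)>
    array([2., 3.])
    Dimensions without coordinates: y
    """
    return da.mean(dim=dim)
```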
403359614 | MDU6SXNzdWU0MDMzNTk2MTQ= | 2712 | improve docs on zarr + cloud storage | rabernat 1197350 | open | 0 | 1 | 2019-01-25T22:35:08Z | 2020-12-26T14:34:37Z | MEMBER | In the Pangeo gitter chat, @birdsarah helped identify some shortcomings in the documentation about zarr cloud storage (https://github.com/pydata/xarray/blob/master/doc/io.rst#cloud-storage-buckets). We don't mention s3fs or how to use authentication. A more detailed set of examples would probably help people struggling to make the pieces fit together. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2712/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue |
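A minimal sketch of the kind of s3fs + zarr example #2712 asks for; the bucket name is hypothetical, and authentication is exactly the part the docs should spell out:

```python
import s3fs
import xarray as xr

# credentials come from the environment here; pass anon=True for public data
fs = s3fs.S3FileSystem(anon=False)
store = s3fs.S3Map(root='my-bucket/my-dataset.zarr', s3=fs)
ds = xr.open_zarr(store)
```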
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);