issues
12 rows where repo = 13221727, state = "open" and user = 6213168 sorted by updated_at descending
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
309686915 | MDU6SXNzdWUzMDk2ODY5MTU= | 2027 | square-bracket slice a Dataset with a DataArray | crusaderky 6213168 | open | 0 | 4 | 2018-03-29T09:39:57Z | 2022-04-18T03:51:25Z | MEMBER | Given this:
```
ds = xarray.Dataset(
    data_vars={
        'vote': ('pupil', [5, 7, 8]),
        'age': ('pupil', [15, 14, 16])
    },
    coords={
        'pupil': ['Alice', 'Bob', 'Charlie']
    })

<xarray.Dataset>
Dimensions:  (pupil: 3)
Coordinates:
  * pupil    (pupil) <U7 'Alice' 'Bob' 'Charlie'
Data variables:
    vote     (pupil) int64 5 7 8
    age      (pupil) int64 15 14 16
```
Why does this work:
```
ds.age[ds.vote >= 6]

<xarray.DataArray 'age' (pupil: 2)>
array([14, 16])
Coordinates:
  * pupil    (pupil) <U7 'Bob' 'Charlie'
```
But this doesn't?
```
ds[ds.vote >= 6]

KeyError: False
```
Workaround:
```
ds.sel(pupil=ds.vote >= 6)

<xarray.Dataset>
Dimensions:  (pupil: 2)
Coordinates:
  * pupil    (pupil) <U7 'Bob' 'Charlie'
Data variables:
    vote     (pupil) int64 7 8
    age      (pupil) int64 14 16
```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2027/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
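The request above is for `ds[boolean DataArray]` to behave like the DataArray and `.sel()` cases that already work. A minimal sketch of the dispatch this would imply, assuming a 1-D boolean mask over a single dimension; the helper `getitem_bool` is hypothetical, not xarray API:

```python
import numpy as np
import xarray as xr

def getitem_bool(ds: xr.Dataset, key: xr.DataArray) -> xr.Dataset:
    # Hypothetical helper: what ds[key] could do when key is a 1-D boolean
    # DataArray -- select positionally along key's single dimension.
    if key.dtype != bool or key.ndim != 1:
        raise TypeError("expected a 1-D boolean DataArray")
    return ds.isel({key.dims[0]: np.flatnonzero(key.values)})

# getitem_bool(ds, ds.vote >= 6) matches the ds.sel(pupil=ds.vote >= 6) workaround
```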
666896781 | MDU6SXNzdWU2NjY4OTY3ODE= | 4279 | intersphinx looks for implementation modules | crusaderky 6213168 | open | 0 | 0 | 2020-07-28T08:55:12Z | 2022-04-09T03:03:30Z | MEMBER | This is a widespread issue caused by the pattern of defining objects in private modules and then exposing them to the final user by importing them in the top-level package. Exact same issue in different projects:
- https://github.com/aio-libs/aiohttp/issues/3714
- https://jira.mongodb.org/browse/MOTOR-338
- https://github.com/tkem/cachetools/issues/178
- https://github.com/AmphoraInc/xarray_mongodb/pull/22
- https://github.com/jonathanslenders/asyncio-redis/issues/143

If a project
1. uses xarray, intersphinx, and autodoc
2. subclasses any of the classes exposed by xarray

then Sphinx emits a warning and fails to create a hyperlink, because intersphinx looks the base class up under its implementation module (e.g. xarray.core.dataarray.DataArray), which is not where the documentation declares it.

Workaround: patch the objects' module paths in conf.py (a sketch follows below).

Solution: apply the same fix in xarray itself, so that downstream projects don't need the conf.py hack. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4279/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
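The conf.py snippet for the workaround mentioned in issue 4279 did not survive the export. The following is a rough sketch of the kind of hack commonly used for this problem, under the assumption that re-pointing each public class's `__module__` at the top-level package is acceptable; it is not necessarily the author's exact code:

```python
# conf.py -- hedged sketch of a typical workaround
import xarray

for name in dir(xarray):
    obj = getattr(xarray, name)
    if isinstance(obj, type) and obj.__module__.startswith("xarray."):
        # Make Sphinx/intersphinx resolve e.g. xarray.core.dataarray.DataArray
        # under its public, documented name: xarray.DataArray
        obj.__module__ = "xarray"
```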
193294569 | MDU6SXNzdWUxOTMyOTQ1Njk= | 1151 | Scalar coords vs. concat | crusaderky 6213168 | open | 0 | 11 | 2016-12-03T15:42:18Z | 2021-07-08T17:42:18Z | MEMBER | Why does this work: ```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1151/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
305757822 | MDU6SXNzdWUzMDU3NTc4MjI= | 1995 | apply_ufunc support for chunks on input_core_dims | crusaderky 6213168 | open | 0 | 13 | 2018-03-15T23:50:22Z | 2021-05-17T18:59:18Z | MEMBER | I am trying to optimize the following function:
where a and b are xarray.DataArrays, both with dimension x and both with a dask backend. I successfully obtained a 5.5x speedup with the following:
The problem is that this introduces a (quite problematic, in my case) constraint that a and b can't be chunked on dimension x, which is theoretically avoidable as long as the kernel function doesn't need interaction between x[i] and x[j] (e.g. it can't work for an interpolator, which would require relying on dask ghosting). Proposal: add a parameter to apply_ufunc, e.g. reduce_func; my use case above would simply become:
So if I have 2 chunks in a and b on dimension x, apply_ufunc will internally apply the kernel to each pair of chunks and then combine the two partial results with reduce_func.
Note that reduce_func will be invoked only in the presence of dask='parallelized' and when there's chunking on one or more of the input_core_dims. If reduce_func is left as None, apply_ufunc will keep crashing like it does now. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1995/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
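For context on issue 1995 above: until such a reduce_func parameter exists, the usual approach is to rechunk the core dimension into a single chunk before calling apply_ufunc. A minimal sketch, where `kernel` and `pairwise_stat` are hypothetical stand-ins for the author's real functions:

```python
import xarray as xr

def kernel(a, b):
    # Hypothetical stand-in for the expensive numpy kernel; apply_ufunc moves
    # the core dimension "x" to the last axis before calling it.
    return (a * b).sum(axis=-1)

def pairwise_stat(a: xr.DataArray, b: xr.DataArray) -> xr.DataArray:
    # dask='parallelized' currently requires each core dim to be a single
    # chunk, hence the explicit rechunk of "x" before the call.
    a = a.chunk({"x": -1})
    b = b.chunk({"x": -1})
    return xr.apply_ufunc(
        kernel, a, b,
        input_core_dims=[["x"], ["x"]],
        dask="parallelized",
        output_dtypes=[a.dtype],
    )
```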
417356439 | MDU6SXNzdWU0MTczNTY0Mzk= | 2801 | NaN-sized chunks | crusaderky 6213168 | open | 0 | 2 | 2019-03-05T15:30:14Z | 2021-04-24T02:41:34Z | MEMBER | It would be nice to have support for NaN-sized dask chunks, i.e. data whose chunk sizes are only known at compute time. Today such data cannot be put into an xarray object:
```
ValueError: replacement data must match the Variable's shape
```
I didn't investigate, but I suspect it should be trivial to fix. I'm not sure why there is a check at all; any such sanity check should live in dask only, IMHO. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2801/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
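A guess at the kind of reproduction issue 2801 refers to (an assumed example, not taken from the report itself): boolean indexing of a dask array yields NaN-sized chunks, and feeding the result back into an xarray object trips the shape check:

```python
import dask.array as da
import xarray as xr

arr = xr.DataArray(da.arange(6, chunks=3), dims="x")

# Boolean indexing of a dask array produces chunks of unknown (NaN) size
masked = arr.data[arr.data > 2]

out = arr.copy()
out.data = masked  # ValueError: replacement data must match the Variable's shape
```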
311573817 | MDU6SXNzdWUzMTE1NzM4MTc= | 2039 | open_mfdataset: skip loading for indexes and coordinates from all but the first file | crusaderky 6213168 | open | 0 | 1 | 2018-04-05T11:32:02Z | 2021-01-27T17:49:21Z | MEMBER | This is a follow-up from #1521. When invoking open_mfdataset, very frequently the user knows in advance that all of their coords that aren't on the concat_dim are already aligned, and may be willing to blindly trust that assumption in exchange for a huge performance boost. My production data: 200 NetCDF files on a not very performant NFS file system, concatenated on the "scenario" dimension:
```
xarray.open_mfdataset('cube.*.nc', engine='h5netcdf', concat_dim='scenario')

<xarray.Dataset>
Dimensions:      (attribute: 1, fx_id: 40, instr_id: 10765, scenario: 500001, timestep: 1)
Coordinates:
  * attribute    (attribute) object 'THEO/Value'
    currency     (instr_id) object 'ZAR' 'EUR' 'EUR' 'EUR' 'EUR' 'EUR' 'GBP' ...
  * fx_id        (fx_id) object 'GBP' 'USD' 'EUR' 'JPY' 'ARS' 'AUD' 'BRL' ...
  * instr_id     (instr_id) object 'S01626556_ZAE000204921' '537805_1275' ...
  * timestep     (timestep) datetime64[ns] 2016-12-31
    type         (instr_id) object 'American' 'Bond Future' 'Bond Future' ...
  * scenario     (scenario) object 'Base Scenario' 'SSMC_1' 'SSMC_2' ...
Data variables:
    FX           (fx_id, timestep, scenario) float64 dask.array<shape=(40, 1, 500001), chunksize=(40, 1, 2501)>
    instruments  (instr_id, attribute, timestep, scenario) float64 dask.array<shape=(10765, 1, 1, 500001), chunksize=(10765, 1, 1, 2501)>

CPU times: user 19.6 s, sys: 981 ms, total: 20.6 s
Wall time: 24.4 s
```
If I skip loading and comparing the non-index coords from all 200 files:
```
xarray.open_mfdataset('cube.*.nc', engine='h5netcdf', concat_dim='scenario', coords='all')

<xarray.Dataset>
Dimensions:      (attribute: 1, fx_id: 40, instr_id: 10765, scenario: 500001, timestep: 1)
Coordinates:
  * attribute    (attribute) object 'THEO/Value'
  * fx_id        (fx_id) object 'GBP' 'USD' 'EUR' 'JPY' 'ARS' 'AUD' 'BRL' ...
  * instr_id     (instr_id) object 'S01626556_ZAE000204921' '537805_1275' ...
  * timestep     (timestep) datetime64[ns] 2016-12-31
    currency     (scenario, instr_id) object dask.array<shape=(500001, 10765), chunksize=(2501, 10765)>
  * scenario     (scenario) object 'Base Scenario' 'SSMC_1' 'SSMC_2' ...
    type         (scenario, instr_id) object dask.array<shape=(500001, 10765), chunksize=(2501, 10765)>
Data variables:
    FX           (fx_id, timestep, scenario) float64 dask.array<shape=(40, 1, 500001), chunksize=(40, 1, 2501)>
    instruments  (instr_id, attribute, timestep, scenario) float64 dask.array<shape=(10765, 1, 1, 500001), chunksize=(10765, 1, 1, 2501)>

CPU times: user 12.7 s, sys: 305 ms, total: 13 s
Wall time: 14.8 s
```
If I skip loading and comparing also the index coords from all 200 files:
```
cube = xarray.open_mfdataset(sh.resolve_env(f'{dynamic}/mtf/{cubename}/nc/cube.*.nc'),
    engine='h5netcdf', concat_dim='scenario',
    drop_variables=['attribute', 'fx_id', 'instr_id', 'timestep', 'currency', 'type'])

<xarray.Dataset>
Dimensions:      (attribute: 1, fx_id: 40, instr_id: 10765, scenario: 500001, timestep: 1)
Coordinates:
  * scenario     (scenario) object 'Base Scenario' 'SSMC_1' 'SSMC_2' ...
Dimensions without coordinates: attribute, fx_id, instr_id, timestep
Data variables:
    FX           (fx_id, timestep, scenario) float64 dask.array<shape=(40, 1, 500001), chunksize=(40, 1, 2501)>
    instruments  (instr_id, attribute, timestep, scenario) float64 dask.array<shape=(10765, 1, 1, 500001), chunksize=(10765, 1, 1, 2501)>

CPU times: user 7.31 s, sys: 61 ms, total: 7.37 s
Wall time: 9.05 s
```
Proposed design: add a new optional parameter to open_mfdataset that skips loading the indexes and coordinates from all but the first file. Algorithm:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2039/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
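Until issue 2039's proposed parameter exists, a rough workaround sketch along the lines of the drop_variables trick shown above: read the coordinates from the first file only and graft them onto the concatenated result. The names `paths`, `first`, `skip`, and `cube` are illustrative, and `combine='nested'` is assumed for recent xarray versions:

```python
import glob
import xarray as xr

paths = sorted(glob.glob("cube.*.nc"))

# Load coords once, from the first file only
first = xr.open_dataset(paths[0], engine="h5netcdf")
skip = [name for name in first.variables if "scenario" not in first[name].dims]

# Don't read or compare those variables in the remaining files
cube = xr.open_mfdataset(
    paths, engine="h5netcdf", concat_dim="scenario", combine="nested",
    drop_variables=skip,
)

# Graft the trusted coords back on, taken verbatim from the first file
cube = cube.assign_coords({name: first[name] for name in skip})
```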
272004812 | MDU6SXNzdWUyNzIwMDQ4MTI= | 1699 | apply_ufunc(dask='parallelized') output_dtypes for datasets | crusaderky 6213168 | open | 0 | 8 | 2017-11-07T22:18:23Z | 2020-04-06T15:31:17Z | MEMBER | When a Dataset has variables with different dtypes, there's no way to tell apply_ufunc that the same function applied to different variables will produce different dtypes:
```
ds1 = xarray.Dataset(data_vars={'a': ('x', [1, 2]), 'b': ('x', [3.0, 4.5])}).chunk()
ds2 = xarray.apply_ufunc(lambda x: x + 1, ds1, dask='parallelized', output_dtypes=[float])

ds2
<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    a        (x) float64 dask.array<shape=(2,), chunksize=(2,)>
    b        (x) float64 dask.array<shape=(2,), chunksize=(2,)>

ds2.compute()
<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    a        (x) int64 2 3
    b        (x) float64 4.0 5.5
```
Proposed solution: when the output is a dataset, apply_ufunc could accept either a single dtype (applied to all variables) or a mapping from variable name to dtype. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1699/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
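A hedged workaround for issue 1699 above, using only existing API: apply the kernel variable by variable via Dataset.map, so each call can declare its own output dtype. The helper name and the mapping argument are hypothetical:

```python
import xarray as xr

def apply_ufunc_per_variable(func, ds: xr.Dataset, output_dtypes: dict) -> xr.Dataset:
    # output_dtypes: hypothetical mapping {variable name: dtype}
    return ds.map(
        lambda var: xr.apply_ufunc(
            func, var,
            dask="parallelized",
            output_dtypes=[output_dtypes[var.name]],
        )
    )

# With ds1 from the example above:
# ds2 = apply_ufunc_per_variable(lambda x: x + 1, ds1, {"a": int, "b": float})
```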
503983776 | MDU6SXNzdWU1MDM5ODM3NzY= | 3382 | Improve indexing performance benchmarks | crusaderky 6213168 | open | 0 | 0 | 2019-10-08T11:20:39Z | 2019-11-14T15:52:33Z | MEMBER | As discussed in #3375 - FYI @jhamman
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3382/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
272002705 | MDU6SXNzdWUyNzIwMDI3MDU= | 1698 | apply_ufunc(dask='parallelized') to infer output_dtypes | crusaderky 6213168 | open | 0 | 3 | 2017-11-07T22:11:11Z | 2019-10-22T08:33:38Z | MEMBER | If one doesn't provide the output_dtypes argument, apply_ufunc with dask='parallelized' could infer the output dtypes automatically instead of requiring them to be spelled out up front. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1698/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
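One way the inference requested in issue 1698 could plausibly work (an assumption, not a design the issue spells out): evaluate the kernel on zero-sized arrays that mimic the inputs, much like dask's meta inference, and read the dtypes off the result:

```python
import numpy as np

def infer_output_dtypes(func, *args):
    # Hypothetical helper: call the kernel on zero-sized arrays that mimic the
    # inputs' dtype and dimensionality, then read off the result dtypes.
    # Kernels that cannot handle empty input would still need explicit dtypes.
    samples = [np.empty((0,) * max(a.ndim, 1), dtype=a.dtype) for a in args]
    result = func(*samples)
    results = result if isinstance(result, tuple) else (result,)
    return [np.asarray(r).dtype for r in results]

# e.g. infer_output_dtypes(lambda x: x + 1, some_dataarray) returns the dtype of x + 1
```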
485708282 | MDU6SXNzdWU0ODU3MDgyODI= | 3268 | Stateful user-defined accessors | crusaderky 6213168 | open | 0 | 15 | 2019-08-27T09:54:28Z | 2019-10-08T11:13:25Z | MEMBER | If anybody decorates a stateful class with register_dataarray_accessor:
```python
In [1]: @xarray.register_dataarray_accessor('foo')
   ...: class Foo:
   ...:     def __init__(self, obj):
   ...:         self.obj = obj
   ...:         self.x = 1
   ...:

In [2]: a = xarray.DataArray()

In [3]: a.foo.x

In [4]: a.foo.x = 2

In [5]: a.foo.x

In [6]: a.roll().foo.x

In [7]: a.copy(deep=False).foo.x
```
then whether the modified x survives each of these operations is inconsistent. This issue is so glaring that it makes me strongly suspect that nobody saves any state in accessor classes. This kind of use would also be problematic in practical terms, as the accessor object would have a hard time realising when its own state is no longer coherent with the referenced DataArray/Dataset.

This design also carries the problem that it introduces a circular reference in the DataArray/Dataset. This means that, after someone invokes an accessor method on their DataArray/Dataset, the whole object - including the numpy buffers! - won't be instantly collected when it's dereferenced by the user, and will instead have to wait for the next pass of the garbage collector.

Finally, with https://github.com/pydata/xarray/pull/3250/, this statefulness forces us to increase the RAM usage of all datasets and dataarrays by an extra slot, for all users, even if this feature is quite niche.

Proposed solution: get rid of accessor caching altogether, and just recreate the accessor object from scratch every time it is invoked. In the documentation, clarify that the accessor instance is rebuilt on every access and therefore should not hold state. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3268/reactions", "total_count": 4, "+1": 4, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
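A minimal sketch of the non-caching behaviour proposed in issue 3268 above; `_UncachedAccessor` is a hypothetical descriptor, not the one xarray currently registers:

```python
class _UncachedAccessor:
    """Hedged sketch of the proposal: no caching, so a brand-new accessor
    instance is built on every attribute access and no state or circular
    reference is ever stored on the DataArray/Dataset."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, cls):
        if obj is None:  # accessed on the class, not an instance
            return self._accessor_cls
        return self._accessor_cls(obj)  # rebuilt from scratch each time
```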
311578894 | MDU6SXNzdWUzMTE1Nzg4OTQ= | 2040 | to_netcdf() to automatically switch to fixed-length strings for compressed variables | crusaderky 6213168 | open | 0 | 2 | 2018-04-05T11:50:16Z | 2019-01-13T01:42:03Z | MEMBER | When you have fixed-length numpy arrays of unicode characters (<U...) in a dataset, and you invoke to_netcdf() without any particular encoding, they are automatically stored as variable-length strings, unless you explicitly specify a dtype in the encoding. Is this in order to save disk space in case strings vary wildly in size? I may be able to see the point in this case. However, this approach is disastrous if variables are compressed, as any compression algorithm will reduce the zero-padding at the end of the strings to a negligible size. My test data: a dataset with ~50 variables, of which half are strings of 10~100 English characters and the other half are floats, all on a single dimension with 12k points. Test 1:
Test 2:
Test 3:
Proposal: in case of string variables, if no dtype is explicitly defined, to_netcdf() should dynamically assign it to S1 if compression is enabled, str if disabled. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2040/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
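For issue 2040 above, the explicit encoding that the proposal would make automatic looks like this today (file names are illustrative):

```python
import xarray as xr

ds = xr.Dataset({"name": ("x", ["Alice", "Bob", "Charlie"])})

# Default: stored as variable-length strings
ds.to_netcdf("vlen.nc")

# Explicit fixed-length encoding: the zero padding at the end of each
# string is then reduced to a negligible size by zlib
ds.to_netcdf("fixed.nc", encoding={"name": {"dtype": "S1", "zlib": True}})
```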
330473082 | MDU6SXNzdWUzMzA0NzMwODI= | 2219 | to_netcdf broken encoding: dtype='S1' + chunksizes | crusaderky 6213168 | open | 0 | 2 | 2018-06-07T23:46:13Z | 2019-01-13T01:38:51Z | MEMBER |
```
xarray.Dataset({'x': ['foo', 'bar', 'baz']}).to_netcdf(
    'foo.nc', engine='h5netcdf',
    encoding={'x': {'dtype': 'S1', 'zlib': True, 'chunksizes': (2,)}})

ValueError: "chunks" must have same rank as dataset shape
```
The workaround is to omit chunksizes or set it to True. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2219/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue |
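The workaround for issue 2219 above, spelled out as a minimal sketch: drop the explicit chunksizes (or pass True) and let the backend choose the chunk layout:

```python
import xarray as xr

xr.Dataset({"x": ["foo", "bar", "baz"]}).to_netcdf(
    "foo.nc", engine="h5netcdf",
    encoding={"x": {"dtype": "S1", "zlib": True}},  # no 'chunksizes'
)
```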
```
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
```