issues
50 rows where state = "closed", type = "issue" and user = 6213168 sorted by updated_at descending
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at ▲ | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1678587031 | I_kwDOAMm_X85kDTSX | 7777 | xarray minimum versions policy is more aggressive than NEP-29 | crusaderky 6213168 | closed | 0 | 1 | 2023-04-21T14:06:15Z | 2023-05-01T22:26:57Z | 2023-05-01T22:26:57Z | MEMBER | What is your issue? In #4179 / #4907, the xarray policy around the minimum supported versions of dependencies was changed, with the reasoning that the previous policy (based on NEP-29) was too aggressive. Ironically, this caused xarray to drop Python 3.8 on Jan 26th (#7461), 3 months earlier than NEP-29 recommends (Apr 14th). This is hard to defend - and in fact it sparked discontent (see late comments in #7461). Regardless of what policy xarray decides to use internally, it should never be more aggressive than NEP-29. The xarray documentation is also incorrect, as it states "Python: 24 months (NEP-29)", which is not, in fact, what NEP-29 says. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7777/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
309691307 | MDU6SXNzdWUzMDk2OTEzMDc= | 2028 | slice using non-index coordinates | crusaderky 6213168 | closed | 0 | 21 | 2018-03-29T09:53:33Z | 2023-02-08T19:47:22Z | 2022-10-03T10:38:57Z | MEMBER | It should be relatively straightforward to allow slicing on coordinates that are not backed by an IndexVariable, or in other words coordinates that are on a dimension with a different name, as long as they are 1-dimensional (unsure about the multidimensional case). E.g. given this array: ``` a = xarray.DataArray( [10, 20, 30], dims=['country'], coords={ 'country': ['US', 'Germany', 'France'], 'currency': ('country', ['USD', 'EUR', 'EUR']) }) <xarray.DataArray (country: 3)> array([10, 20, 30]) Coordinates: * country (country) <U7 'US' 'Germany' 'France' currency (country) <U3 'USD' 'EUR' 'EUR' ``` This is currently not possible: ``` a.sel(currency='EUR') ValueError: dimensions or multi-index levels ['currency'] do not exist ``` It should be interpreted as a shorthand for: ``` a.sel(country=a.currency == 'EUR') <xarray.DataArray (country: 2)> array([20, 30]) Coordinates: * country (country) <U7 'Germany' 'France' currency (country) <U3 'EUR' 'EUR' ``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2028/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
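A runnable version of the boolean-mask shorthand spelled out in the issue above (this is the existing workaround, not the requested new syntax):

```python
import xarray

a = xarray.DataArray(
    [10, 20, 30],
    dims=['country'],
    coords={'country': ['US', 'Germany', 'France'],
            'currency': ('country', ['USD', 'EUR', 'EUR'])})

# Select on the non-index coordinate by masking the dimension it lives on.
eur_only = a.sel(country=(a.currency == 'EUR'))
print(eur_only)
```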
166441031 | MDU6SXNzdWUxNjY0NDEwMzE= | 907 | unstack() treats string coords as objects | crusaderky 6213168 | closed | 0 | 7 | 2016-07-19T21:33:28Z | 2022-09-27T12:11:36Z | 2022-09-27T12:11:35Z | MEMBER | unstack() should be smart enough to recognise that all labels in a coord are strings, and convert them to numpy strings. This is particularly relevant e.g. if you want to dump the xarray to netcdf and then read it with a non-python library. ``` python import xarray a = xarray.DataArray([[1,2],[3,4]], dims=['x', 'y'], coords={'x': ['x1', 'x2'], 'y': ['y1', 'y2']}) a ```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/907/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
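A short sketch of the behaviour reported above plus a manual cast as a stop-gap; the assign_coords/astype workaround is illustrative, not the fix the issue proposes:

```python
import xarray

a = xarray.DataArray([[1, 2], [3, 4]], dims=['x', 'y'],
                     coords={'x': ['x1', 'x2'], 'y': ['y1', 'y2']})

stacked = a.stack(z=['x', 'y'])
roundtrip = stacked.unstack('z')
print(roundtrip.x.dtype)  # per the report: object, not a fixed-width '<U2'

# Stop-gap: cast the coordinate back to a numpy string dtype by hand.
roundtrip = roundtrip.assign_coords(x=roundtrip.x.astype('U2'))
```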
264509098 | MDU6SXNzdWUyNjQ1MDkwOTg= | 1624 | Improve documentation and error validation for set_options(arithmetic_join) | crusaderky 6213168 | closed | 0 | 7 | 2017-10-11T09:05:49Z | 2022-06-25T20:01:07Z | 2022-06-25T20:01:07Z | MEMBER | The documentation for set_options laconically says:
leaving the user wondering what the other options are. Also, the set_options code does not perform any domain check on the possible values. By scanning the code I gathered that the valid values (and their meanings) should be the same as align(join=...), but I'd like confirmation on that... |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1624/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
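For context, a minimal sketch of the option being discussed; the accepted values mirror align(join=...), i.e. 'inner' (the default), 'outer', 'left', 'right' and 'exact' in current xarray. That list is taken from today's documentation, not from the issue itself:

```python
import xarray

a = xarray.DataArray([1, 2], dims=['x'], coords={'x': [0, 1]})
b = xarray.DataArray([10, 20], dims=['x'], coords={'x': [1, 2]})

# 'inner' keeps only shared labels; 'outer' keeps the union and pads with
# NaN; 'left'/'right' follow one operand's index; 'exact' raises on mismatch.
with xarray.set_options(arithmetic_join='outer'):
    print(a + b)
```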
502130982 | MDU6SXNzdWU1MDIxMzA5ODI= | 3370 | Hundreds of Sphinx errors | crusaderky 6213168 | closed | 0 | 14 | 2019-10-03T15:17:09Z | 2022-04-17T20:33:05Z | 2022-04-17T20:33:05Z | MEMBER | sphinx-build emits a ton of errors that need to be polished out: https://readthedocs.org/projects/xray/builds/ -> latest -> open last step Options for the long term: - Change the "Docs" azure pipelines job to crash if there are new failures. From past experience though, this should come together with a sensible way to whitelist errors that can't be fixed. This will severely slow down development as PRs will systematically fail on such a check. - Add a task in the release process where, immediately before closing a release, the maintainer needs to manually go through the sphinx-build log and fix any new issues. This would be a major extra piece of work for the maintainer. I am honestly not excited by either of the above. Alternative suggestions are welcome. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3370/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
505550120 | MDU6SXNzdWU1MDU1NTAxMjA= | 3391 | map_blocks doesn't work when dask isn't installed | crusaderky 6213168 | closed | 0 | 1 | 2019-10-10T22:53:55Z | 2021-11-24T17:25:24Z | 2021-11-24T17:25:24Z | MEMBER | Iterative improvement on #3276 @dcherian map_blocks crashes with ImportError if dask isn't installed, even if it's legal to run it on a DataArray/Dataset without any dask variables. This forces writers of extension libraries to either not use map_blocks, add dask as a strict requirement, or write a switch in their own code. Please change the code so that it works without dask (you'll need to write a stub of |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3391/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
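A sketch of the guarded-import pattern the issue above asks for; the stub below is illustrative and not necessarily the one xarray adopted (dask.base.is_dask_collection is a real dask helper):

```python
try:
    from dask.base import is_dask_collection  # available when dask is installed
except ImportError:
    def is_dask_collection(obj):
        # Without dask, nothing can be a dask collection, so map_blocks-style
        # code can fall through to the plain in-memory path.
        return False
```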
502082831 | MDU6SXNzdWU1MDIwODI4MzE= | 3369 | Define a process to test the readthedocs CI before merging into master | crusaderky 6213168 | closed | 0 | 3 | 2019-10-03T13:56:02Z | 2020-01-22T15:40:34Z | 2020-01-22T15:40:33Z | MEMBER | This is an offshoot of #3358. The readthedocs CI has a bad habit of failing even after the Azure Pipelines job "Docs" has succeeded. After major changes that impact the documentation, and before merging everything into master, it would be advisable to explicitly verify that RTD builds correctly. So far I tried to 1. create my own readthedocs project, https://readthedocs.org/projects/crusaderky-xarray/ 2. point it to my fork https://github.com/crusaderky/xarray/ 3. enable build for the branch I want to merge This is currently failing because of an issue with versioneer, which incorrectly sets In the master RTD project https://readthedocs.org/projects/xray/, I can instead read So far the only workaround I could find was to downgrade pandas to 0.24 in |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3369/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
510915725 | MDU6SXNzdWU1MTA5MTU3MjU= | 3434 | v0.14.1 Release | crusaderky 6213168 | closed | 0 | 18 | 2019-10-22T21:08:15Z | 2019-11-19T23:44:52Z | 2019-11-19T23:44:52Z | MEMBER | I think with the multiple recent breakages we've just had due to dependency upgrades, we should push out a patch release with some haste. Please comment/add/object Must have
Nice to have
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3434/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
329251342 | MDU6SXNzdWUzMjkyNTEzNDI= | 2214 | Simplify graph of DataArray.chunk() | crusaderky 6213168 | closed | 0 | 2 | 2018-06-04T23:30:19Z | 2019-11-10T04:34:58Z | 2019-11-10T04:34:58Z | MEMBER | ```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2214/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
506885041 | MDU6SXNzdWU1MDY4ODUwNDE= | 3397 | "How Do I..." formatting issues | crusaderky 6213168 | closed | 0 | 4 | 2019-10-14T21:32:27Z | 2019-10-16T21:41:06Z | 2019-10-16T21:41:06Z | MEMBER | @dcherian The new page http://xarray.pydata.org/en/stable/howdoi.html (#3357) is somewhat painful to read on readthedocs. The table runs off the screen and one is forced to scroll left and right non-stop. Maybe a better alternative would be Sphinx definition-list syntax (which allows for automatic reflowing)?
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3397/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
481250429 | MDU6SXNzdWU0ODEyNTA0Mjk= | 3222 | Minimum versions for optional libraries | crusaderky 6213168 | closed | 0 | 12 | 2019-08-15T17:18:16Z | 2019-10-08T21:23:47Z | 2019-10-08T21:23:47Z | MEMBER | In CI there are:
There are no tests for legacy versions of the optional libraries. Today I tried downgrading dask in the py37 environment to dask=1.1.2, which is 6 months old... ...it's a bloodbath. 383 errors of the most diverse kind. In the codebase I found mentions of much older minimum versions: installing.rst mentions dask >=0.16.1, and Dataset.chunk() even asks for dask>=0.9. I think we should add CI tests for old versions of the optional dependencies. What policy should we adopt when we find an incompatibility? How old should a library be before we stop fixing bugs and simply require a newer version? I personally would go for an aggressive 6 months' worth of backwards compatibility; less if the time it takes to fix the issues is excessive. The tests should run on py36 because py35 builds are becoming very scarce in anaconda. This has the outlook of being an exercise in extreme frustration. I'm afraid I personally hold zero interest in packages older than the latest available in the anaconda official repo, so I'm not volunteering for this one (sorry). I'd like to hear other people's opinions and/or offers of self-immolation... :) |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3222/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
470714103 | MDU6SXNzdWU0NzA3MTQxMDM= | 3154 | pynio causes dependency conflicts in py36 CI build | crusaderky 6213168 | closed | 0 | 9 | 2019-07-20T21:00:43Z | 2019-10-03T15:22:17Z | 2019-10-03T15:22:17Z | MEMBER | On Saturday night, all Python 3.6 CI builds started failing. Python 3.7 is unaffected. See https://dev.azure.com/xarray/xarray/_build/results?buildId=362&view=logs MacOSX py36:
Linux py36:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3154/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
501461397 | MDU6SXNzdWU1MDE0NjEzOTc= | 3366 | CI offline? | crusaderky 6213168 | closed | 0 | 2 | 2019-10-02T12:35:00Z | 2019-10-02T17:32:03Z | 2019-10-02T17:32:03Z | MEMBER | Azure pipelines is not being triggered by PRs this morning. See https://github.com/pydata/xarray/pull/3358 and https://github.com/pydata/xarray/pull/3365. Last run was 12 hours ago. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3366/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
478343417 | MDU6SXNzdWU0NzgzNDM0MTc= | 3191 | DataArray.chunk() from sparse array produces malformed dask array | crusaderky 6213168 | closed | 0 | 1 | 2019-08-08T09:08:56Z | 2019-08-12T21:02:24Z | 2019-08-12T21:02:24Z | MEMBER | #3117 by @nvictus introduces support for sparse in plain xarray; dask already supports it. Running with: - xarray git head - dask 2.2.0 - numpy 1.16.4 - sparse 0.7.0 - NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1 ```python
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3191/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
202423683 | MDU6SXNzdWUyMDI0MjM2ODM= | 1224 | fast weighted sum | crusaderky 6213168 | closed | 0 | 5 | 2017-01-23T00:29:19Z | 2019-08-09T08:36:11Z | 2019-08-09T08:36:11Z | MEMBER | In my project I'm struggling with weighted sums of 2000-4000 dask-based xarrays. The time to reach the final dask-based array, the size of the final dask dict, and the time to compute the actual result are horrendous. So I wrote the below which - as laborious as it may look - gives a performance boost nothing short of miraculous. At the bottom you'll find some benchmarks as well. https://gist.github.com/crusaderky/62832a5ffc72ccb3e0954021b0996fdf In my project, this deflated the size of the final dask dict from 5.2 million keys to 3.3 million and cut 30% from the time required to define it. I think it's generic enough to be a good addition to the core xarray module. Impressions? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1224/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
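The optimized implementation lives in the linked gist; as a plain baseline for comparison, a weighted sum of several DataArrays can be written with concat plus a reduction (names and sizes below are illustrative):

```python
import numpy as np
import xarray

# Three equally-shaped addends and their weights; sizes are illustrative.
arrays = [xarray.DataArray(np.random.rand(4), dims=['x']) for _ in range(3)]
weights = xarray.DataArray([0.2, 0.3, 0.5], dims=['addend'])

# Stack the addends along a new dimension, then contract it away.
stacked = xarray.concat(arrays, dim='addend')
weighted_sum = (stacked * weights).sum('addend')
```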
466750687 | MDU6SXNzdWU0NjY3NTA2ODc= | 3092 | black formatting | crusaderky 6213168 | closed | 0 | 14 | 2019-07-11T08:43:55Z | 2019-08-08T22:34:53Z | 2019-08-08T22:34:53Z | MEMBER | I, like many others, have irreversibly fallen in love with black. Can we apply it to the existing codebase and as an enforced CI test? The only (big) problem is that developers will need to manually apply it to any open branches and then merge from master - and even then, merging likely won't be trivial. How did the dask project tackle the issue? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3092/reactions", "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
475599589 | MDU6SXNzdWU0NzU1OTk1ODk= | 3174 | CI failure downloading external data | crusaderky 6213168 | closed | 0 | 2 | 2019-08-01T10:21:36Z | 2019-08-07T08:41:13Z | 2019-08-07T08:41:13Z | MEMBER | The 'Docs' ci project is failing because http://naciscdn.org is unresponsive: Excerpt: ``` /usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/cartopy/io/init.py:260: DownloadWarning: Downloading: http://naciscdn.org/naturalearth/110m/physical/ne_110m_coastline.zip warnings.warn('Downloading: {}'.format(url), DownloadWarning) Exception occurred: File "/usr/share/miniconda/envs/xarray-tests/lib/python3.7/urllib/request.py", line 1319, in do_open raise URLError(err) urllib.error.URLError: <urlopen error [Errno 110] Connection timed out> The full traceback has been saved in /tmp/sphinx-err-nq73diee.log, if you want to report the issue to the developers. Please also report this if it was a user error, so that a better error message can be provided next time. A bug report can be filed in the tracker at https://github.com/sphinx-doc/sphinx/issues. Thanks! [error]Bash exited with code '2'.[section]Finishing: Build HTML docs``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3174/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
466815556 | MDU6SXNzdWU0NjY4MTU1NTY= | 3094 | REGRESSION: copy(deep=True) casts unicode indices to object | crusaderky 6213168 | closed | 0 | 3 | 2019-07-11T10:46:28Z | 2019-08-02T14:02:50Z | 2019-08-02T14:02:50Z | MEMBER | Dataset.copy(deep=True) and DataArray.copy(deep=True/False) accidentally cast IndexVariables with dtype='<U*' to object. The same applies to copy.copy() and copy.deepcopy(). This is a regression in xarray >= 0.12.2; xarray 0.12.1 and earlier are unaffected. ``` In [1]: ds = xarray.Dataset( ...: coords={'x': ['foo'], 'y': ('x', ['bar'])}, ...: data_vars={'z': ('x', ['baz'])}) In [2]: ds In [3]: ds.copy() In [4]: ds.copy(deep=True) In [5]: ds.z In [6]: ds.z.copy() In [7]: ds.z.copy(deep=True) |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3094/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
475244610 | MDU6SXNzdWU0NzUyNDQ2MTA= | 3171 | distributed.Client.compute fails on DataArray | crusaderky 6213168 | closed | 0 | 2 | 2019-07-31T16:33:01Z | 2019-08-01T21:43:11Z | 2019-08-01T21:43:11Z | MEMBER | As of - dask 2.1.0 - distributed 2.1.0 - xarray 0.12.1 or git head (didn't try older versions): ```python
KeyError                                  Traceback (most recent call last)
<ipython-input-8-2dbfe1b2ff17> in <module>
----> 1 client.compute(ds.d).result()

/anaconda3/lib/python3.7/site-packages/distributed/client.py in result(self, timeout)
    226         result = self.client.sync(self._result, callback_timeout=timeout, raiseit=False)
    227         if self.status == "error":
--> 228             six.reraise(*result)
    229         elif self.status == "cancelled":
    230             raise result

/anaconda3/lib/python3.7/site-packages/six.py in reraise(tp, value, tb)
    690             value = tp()
    691         if value.__traceback__ is not tb:
--> 692             raise value.with_traceback(tb)
    693         raise value
    694     finally:

~/PycharmProjects/xarray/xarray/core/dataarray.py in _dask_finalize()
    706     def _dask_finalize(results, func, args, name):
    707         ds = func(results, *args)
--> 708         variable = ds._variables.pop(_THIS_ARRAY)
    709         coords = ds._variables
    710         return DataArray(variable, coords, name=name, fastpath=True)
```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3171/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
252548859 | MDU6SXNzdWUyNTI1NDg4NTk= | 1524 | (trivial) xarray.quantile silently resolves dask arrays | crusaderky 6213168 | closed | 0 | 9 | 2017-08-24T09:54:11Z | 2019-07-23T00:18:06Z | 2017-08-28T17:31:57Z | MEMBER | In variable.py, line 1116, you're missing a raise statement:
Currently looking into extending dask.percentile() to support more than 1D arrays, and then use it in xarray too. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1524/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
465984161 | MDU6SXNzdWU0NjU5ODQxNjE= | 3089 | Python 3.5.0-3.5.1 support | crusaderky 6213168 | closed | 0 | 5 | 2019-07-09T21:04:28Z | 2019-07-13T21:58:31Z | 2019-07-13T21:58:31Z | MEMBER | Python 3.5.0 has gone out of the conda-forge repository. 3.5.1 is still there... for now. The anaconda repository starts directly from 3.5.4. 3.5.0 and 3.5.1 are a colossal pain in the back for typing support. Is this a good time to increase the requirement to >= 3.5.2? I honestly can't think how anybody could be unable to upgrade to the latest available 3.5 with minimal effort... |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3089/reactions", "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
264517839 | MDU6SXNzdWUyNjQ1MTc4Mzk= | 1625 | Option for arithmetics to ignore nans created by alignment | crusaderky 6213168 | closed | 0 | 3 | 2017-10-11T09:33:34Z | 2019-07-11T09:48:07Z | 2019-07-11T09:48:07Z | MEMBER | Can anybody tell me if there is anybody who benefits from this behaviour? I can't think of any good use cases. ``` wallet = xarray.DataArray([50, 70], dims=['currency'], coords={'currency': ['EUR', 'USD']}) restaurant_bill = xarray.DataArray([30], dims=['currency'], coords={'currency': ['USD']}) with xarray.set_options(arithmetic_join="outer"): print(wallet - restaurant_bill) <xarray.DataArray (currency: 2)> array([ nan, 40.]) Coordinates: * currency (currency) object 'EUR' 'USD' ``` While it is fairly clear why it can be desirable to have Proposal:
- add a parameter to In theory the setting could be left as an opt-in as |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1625/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
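Until such an option exists, the behaviour the issue asks for can be emulated by aligning first and filling the NaNs by hand; a minimal sketch reusing the example above:

```python
import xarray

wallet = xarray.DataArray([50, 70], dims=['currency'],
                          coords={'currency': ['EUR', 'USD']})
restaurant_bill = xarray.DataArray([30], dims=['currency'],
                                   coords={'currency': ['USD']})

# Outer-align, then treat labels missing from either operand as zero.
w, b = xarray.align(wallet, restaurant_bill, join='outer')
print(w.fillna(0) - b.fillna(0))   # EUR: 50.0, USD: 40.0
```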
341355638 | MDU6SXNzdWUzNDEzNTU2Mzg= | 2289 | DataArray.to_csv() | crusaderky 6213168 | closed | 0 | 6 | 2018-07-15T21:56:20Z | 2019-03-12T15:01:18Z | 2019-03-12T15:01:18Z | MEMBER | I'm using xarray to aggregate 38 GB worth of NetCDF data into a bunch of CSV reports. I have two problems:
To solve both problems, I wrote a new function: http://xarray-extras.readthedocs.io/en/latest/api/csv.html And now my high level wrapper code looks like this: ```
# DataSet from 200 .nc files, with a total of 500000 points on the 'row' dimension
nc = xarray.open_mfdataset('inputs..nc')

reports = [
    # DataArrays with shape (500000, 2000), with the rows split in 200 chunks
    gen_report0(nc),
    gen_report1(nc),
    ....
    gen_report39(nc),
]

futures = [
    # dask.delayed objects
    to_csv(reports[0], 'report0.csv.gz', compression='gzip'),
    to_csv(reports[1], 'report1.csv.gz', compression='gzip'),
    ....
    to_csv(reports[39], 'report39.csv.gz', compression='gzip'),
]

dask.compute(futures)
```
The function is currently production quality in xarray-extras, but it would be very easy to refactor it as a method of xarray.DataArray in the main library. Opinions? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2289/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
166439490 | MDU6SXNzdWUxNjY0Mzk0OTA= | 906 | unstack() sorts data alphabetically | crusaderky 6213168 | closed | 0 | 14 | 2016-07-19T21:25:26Z | 2019-02-23T12:47:00Z | 2019-02-23T12:47:00Z | MEMBER | DataArray.unstack() sorts the data alphabetically by label. Besides being poor for performance, this is very problematic whenever the order matters, and the labels are not in alphabetical order to begin with. ``` python import xarray import pandas index = [ ['x1', 'first' ], ['x1', 'second'], ['x1', 'third' ], ['x1', 'fourth'], ['x0', 'first' ], ['x0', 'second'], ['x0', 'third' ], ['x0', 'fourth'], ] index = pandas.MultiIndex.from_tuples(index, names=['x', 'count']) s = pandas.Series(list(range(8)), index) a = xarray.DataArray(s) a ```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/906/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
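A sketch of a reindex-based workaround for the ordering problem above, assuming (as in the example) that the MultiIndex levels are named x and count:

```python
import pandas
import xarray

index = pandas.MultiIndex.from_tuples(
    [('x1', 'first'), ('x1', 'second'), ('x0', 'first'), ('x0', 'second')],
    names=['x', 'count'])
a = xarray.DataArray(pandas.Series(range(4), index))

# unstack() orders the new dimensions alphabetically; reindex restores
# whatever label order is actually wanted.
b = a.unstack().reindex(x=['x1', 'x0'], count=['first', 'second'])
```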
168469112 | MDU6SXNzdWUxNjg0NjkxMTI= | 926 | stack() on dask array produces inefficient chunking | crusaderky 6213168 | closed | 0 | 4 | 2016-07-30T14:12:34Z | 2019-02-01T16:04:43Z | 2019-02-01T16:04:43Z | MEMBER | Whe the stack() method is used on a xarray with dask backend, one would expect that every output chunk is produced by exactly 1 input chunk. This is not the case, as stack() actually produces an extremely fragmented dask array: https://gist.github.com/crusaderky/07991681d49117bfbef7a8870e3cba67 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/926/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
193294729 | MDU6SXNzdWUxOTMyOTQ3Mjk= | 1152 | Scalar coords seep into index coords | crusaderky 6213168 | closed | 0 | 8 | 2016-12-03T15:43:53Z | 2019-02-01T16:02:12Z | 2019-02-01T16:02:12Z | MEMBER | Is this by design? I can't make any sense of it. ```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1152/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
172291585 | MDU6SXNzdWUxNzIyOTE1ODU= | 979 | align() should align chunks | crusaderky 6213168 | closed | 0 | 4 | 2016-08-20T21:25:01Z | 2019-01-24T17:19:30Z | 2019-01-24T17:19:30Z | MEMBER | In the xarray docs I read
While chunk auto-alignment could be done within the dask library, that would be limited to arrays with the same dimensionality and same dims order. For example it would not be possible to have a dask library call to align the chunks on xarrays with the following dims: - (time, latitude, longitude) - (time) - (longitude, latitude) even if it makes perfect sense in xarray. I think xarray.align() should take care of it automatically. A safe algorithm would be to always scale down the chunksize when in conflict. This would prevent having chunks larger than expected, and should minimise (in a greedy way) the number of operations. It's also a good idea on dask.distributed, where merging two chunks could cause one of them to travel on the network - which is very expensive. e.g. to reconcile chunksizes a: (5, 10, 6) b: (5, 7, 9) the algorithm would rechunk both arrays to (5, 7, 3, 6). Finally, when served with a numpy-based array and a dask-based array, align() should convert the numpy array to dask. The critical use case that would benefit from this behaviour is when align() is invoked inside a broadcast() between a tiny constant you just loaded from csv/pandas/pure python list/whatever - e.g. dims=(time, ) shape=(100, ) - and a huge dask-backed array e.g. dims=(time, scenario) shape=(100, 2**30) chunks=(25, 2**20). |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/979/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
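A standalone sketch of the reconciliation rule described above: split at the union of both chunkings' boundaries (assuming both arrays have the same length along the dimension), so no chunk ever grows:

```python
from itertools import accumulate

def common_chunks(*chunk_tuples):
    """Merge the cumulative chunk boundaries of all chunkings along one dimension."""
    boundaries = set()
    for chunks in chunk_tuples:
        boundaries.update(accumulate(chunks))
    edges = [0] + sorted(boundaries)
    return tuple(b - a for a, b in zip(edges, edges[1:]))

print(common_chunks((5, 10, 6), (5, 7, 9)))  # -> (5, 7, 3, 6), as in the example above
```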
296927704 | MDU6SXNzdWUyOTY5Mjc3MDQ= | 1909 | Failure in test_cross_engine_read_write_netcdf3 | crusaderky 6213168 | closed | 0 | 3 | 2018-02-13T23:48:44Z | 2019-01-13T20:56:14Z | 2019-01-13T20:56:14Z | MEMBER | Two unit tests are failing in the latest git master: - GenericNetCDFDataTest.test_cross_engine_read_write_netcdf3 - GenericNetCDFDataTestAutocloseTrue.test_cross_engine_read_write_netcdf3 Both with the message: ``` xarray/tests/test_backends.py:1558: xarray/backends/api.py:286: in open_dataset autoclose=autoclose) xarray/backends/netCDF4_.py:275: in open ds = opener() xarray/backends/netCDF4_.py:199: in _open_netcdf4_group ds = nc4.Dataset(filename, mode=mode, **kwargs) netCDF4/_netCDF4.pyx:2015: in netCDF4._netCDF4.Dataset.init ???
netCDF4/_netCDF4.pyx:1636: OSError ``` Attaching conda list: conda.txt |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1909/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
339611449 | MDU6SXNzdWUzMzk2MTE0NDk= | 2273 | to_netcdf uses deprecated and unnecessary dask call | crusaderky 6213168 | closed | 0 | 4 | 2018-07-09T21:20:20Z | 2018-07-31T20:03:41Z | 2018-07-31T19:42:20Z | MEMBER | ```
Stack trace: ```
@shoyer opinion? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2273/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
324040111 | MDU6SXNzdWUzMjQwNDAxMTE= | 2149 | [REGRESSION] to_netcdf doesn't accept dtype=S1 encoding anymore | crusaderky 6213168 | closed | 0 | 5 | 2018-05-17T14:09:15Z | 2018-06-01T01:09:38Z | 2018-06-01T01:09:38Z | MEMBER | In xarray 0.10.4, the dtype encoding in to_netcdf has stopped working, for all engines: ```
xarray/backends/netCDF4_.py in _extract_nc4_variable_encoding(variable, raise_on_invalid, lsd_okay, h5py_okay, backend, unlimited_dims)
    196     if invalid:
    197         raise ValueError('unexpected encoding parameters for %r backend: '
--> 198                          ' %r' % (backend, invalid))
    199     else:
    200         for k in list(encoding):

ValueError: unexpected encoding parameters for 'netCDF4' backend: ['dtype']
```
I'm still trying to figure out how the regression tests didn't pick it up and what change introduced it. @shoyer I'm working on this as my top priority. Do you agree this is serious enough for an emergency re-release? (0.10.4.1 or 0.10.5, your choice) |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2149/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
324410381 | MDU6SXNzdWUzMjQ0MTAzODE= | 2161 | Regression: Dataset.update(Dataset) | crusaderky 6213168 | closed | 0 | 0 | 2018-05-18T13:26:58Z | 2018-05-29T04:34:47Z | 2018-05-29T04:34:47Z | MEMBER | Dataset().update(Dataset()) FutureWarning: iteration over an xarray.Dataset will change in xarray v0.11 to only include data variables, not coordinates. Iterate over the Dataset.variables property instead to preserve existing behavior in a forwards compatible manner. This is a regression in xarray 0.10.4. @shoyer this isn't serious enough to warrant an immediate release on its own, but we're already doing one so we might as well include it. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2161/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
324409064 | MDU6SXNzdWUzMjQ0MDkwNjQ= | 2160 | pandas-0.23 breaks stack with duplicated indices | crusaderky 6213168 | closed | 0 | 3 | 2018-05-18T13:23:26Z | 2018-05-26T03:29:46Z | 2018-05-26T03:29:46Z | MEMBER | In this script: ``` import pandas import xarray df = pandas.DataFrame( [[1, 2], [3, 4]], index=['foo', 'foo'], columns=['bar', 'baz']) print(df.stack()) a = xarray.DataArray(df) print(a.stack(s=a.dims)) ``` The first part works with both pandas 0.22 and 0.23. The second part works in xarray 0.10.4 + pandas 0.22, and crashes with pandas 0.23:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2160/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
253476466 | MDU6SXNzdWUyNTM0NzY0NjY= | 1536 | Better compression algorithms for NetCDF | crusaderky 6213168 | closed | 0 | 28 | 2017-08-28T22:35:31Z | 2018-05-08T02:25:40Z | 2018-05-08T02:25:40Z | MEMBER | As of today, I already tested that, once you manage to write to disk with LZF (using h5netcdf directly), Options:
- write a new engine for |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1536/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
317421267 | MDU6SXNzdWUzMTc0MjEyNjc= | 2079 | New feature: interp1d | crusaderky 6213168 | closed | 0 | 8 | 2018-04-24T22:45:03Z | 2018-05-06T19:30:32Z | 2018-05-06T19:30:32Z | MEMBER | I've written a series of wrappers for the 1-dimensional scipy interpolators. Prototype code and colourful demo plots: https://gist.github.com/crusaderky/b0aa6b8fdf6e036cb364f6f40476cc67 Features
Design hacks
Limitations
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2079/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
316618290 | MDU6SXNzdWUzMTY2MTgyOTA= | 2074 | xarray.dot() dask problems | crusaderky 6213168 | closed | 0 | 10 | 2018-04-22T22:18:10Z | 2018-05-04T21:51:00Z | 2018-05-04T21:51:00Z | MEMBER | xarray.dot() has comparable performance with numpy.einsum. However, when it uses a dask backend, it's much slower than the new dask.array.einsum function (https://github.com/dask/dask/pull/3412). The performance gap widens when the dimension upon which you are reducing is chunked. Also, for some reason The proposed solution is to simply wait for https://github.com/dask/dask/pull/3412 to reach the next release and then reimplement xarray.dot to use dask.array.einsum. This means that dask users will lose the ability to use xarray.dot if they upgrade xarray version but not dask version, but I believe it shouldn't be a big problem for most? ``` import numpy import dask.array import xarray def bench(tchunk, a_by_a, dims, iis): print(f"\nbench({tchunk}, {a_by_a}, {dims}, {iis})")
bench(100, False, ['t'], '...i,...i')
bench( 20, False, ['t'], '...i,...i')
bench(100, True, ['t'], '...i,...i')
bench( 20, True, ['t'], '...i,...i')
bench(100, True, ['s', 't'], '...ij,...ij')
bench( 20, True, ['s', 't'], '...ij,...ij')
bench(20, False, ['t'], ...i,...i) xarray.dot(numpy backend): 297 ms ± 16.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) numpy.einsum: 254 ms ± 15.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) xarray.dot(dask backend): 732 ms ± 74.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) dask.array.einsum: 274 ms ± 12.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) bench(100, True, ['t'], ...i,...i) xarray.dot(numpy backend): 438 ms ± 43.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) numpy.einsum: 415 ms ± 17.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) xarray.dot(dask backend): 633 ms ± 31.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) dask.array.einsum: 431 ms ± 17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) bench(20, True, ['t'], ...i,...i)
xarray.dot(numpy backend):
457 ms ± 17.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
numpy.einsum:
463 ms ± 24.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
xarray.dot(dask backend):
dimension 't' on 0th function argument to apply_ufunc with dask='parallelized' consists of multiple chunks, but is also a core dimension. To fix, rechunk into a single dask array chunk along this dimension, i.e., bench(100, True, ['s', 't'], ...ij,...ij) xarray.dot(numpy backend): 418 ms ± 14.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) numpy.einsum: 444 ms ± 43.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) xarray.dot(dask backend): 384 ms ± 57.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) dask.array.einsum: 415 ms ± 19.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) bench(20, True, ['s', 't'], ...ij,...ij) xarray.dot(numpy backend): 489 ms ± 2.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) numpy.einsum: 443 ms ± 3.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) xarray.dot(dask backend): 585 ms ± 64.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) dask.array.einsum: 455 ms ± 13.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2074/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
320104170 | MDU6SXNzdWUzMjAxMDQxNzA= | 2103 | An elegant way to guarantee single chunk along dim | crusaderky 6213168 | closed | 0 | 2 | 2018-05-03T22:40:48Z | 2018-05-04T20:11:30Z | 2018-05-04T20:10:50Z | MEMBER | Algorithms that are wrapped by I have lost count of how many times I prefixed my invocations of apply_ufunc on a DataArray with the same blurb, over and over again:
I can see a few possible solutions to the problem: Design 1Change DataArray.chunk etc. to accept a special chunk size, e.g. -1, which means "whatever the size of that dim is". The above would become:
Design 2Add properties to DataArray and Variable, The above would become:
Design 3Change The above would become:
Design 4Introduce a convenience method for DataArray, Dataset, and Variable,
``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2103/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
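For reference, Design 1 matches what dask's rechunking already understands today: a chunk size of -1 means one chunk spanning the whole dimension. A minimal sketch (requires dask installed):

```python
import numpy as np
import xarray

a = xarray.DataArray(np.arange(100), dims=['t']).chunk({'t': 10})

# Collapse dimension 't' into a single chunk, e.g. before apply_ufunc
# treats it as a core dimension.
a = a.chunk({'t': -1})
assert a.chunks == ((100,),)
```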
271998358 | MDU6SXNzdWUyNzE5OTgzNTg= | 1697 | apply_ufunc(dask='parallelized') won't accept scalar *args | crusaderky 6213168 | closed | 0 | 0.10 2415632 | 1 | 2017-11-07T21:56:11Z | 2017-11-10T16:46:26Z | 2017-11-10T16:46:26Z | MEMBER | As of xarray-0.10-rc1: Works: ``` import xarray import scipy.stats a = xarray.DataArray([1,2], dims=['x']) xarray.apply_ufunc(scipy.stats.norm.cdf, a, 0, 1) <xarray.DataArray (x: 2)> array([ 0.841345, 0.97725 ]) Dimensions without coordinates: x ``` Broken: ``` xarray.apply_ufunc( scipy.stats.norm.cdf, a.chunk(), 0, 1, dask='parallelized', output_dtypes=[a.dtype] ).compute() IndexError Traceback (most recent call last) <ipython-input-35-1d4025e1ebdb> in <module>() ----> 1 xarray.apply_ufunc(scipy.stats.norm.cdf, a.chunk(), 0, 1, dask='parallelized', output_dtypes=[a.dtype]).compute() ~/anaconda3/lib/python3.6/site-packages/xarray/core/computation.py in apply_ufunc(func, args, kwargs) 913 join=join, 914 exclude_dims=exclude_dims, --> 915 keep_attrs=keep_attrs) 916 elif any(isinstance(a, Variable) for a in args): 917 return variables_ufunc(args) ~/anaconda3/lib/python3.6/site-packages/xarray/core/computation.py in apply_dataarray_ufunc(func, args, kwargs) 210 211 data_vars = [getattr(a, 'variable', a) for a in args] --> 212 result_var = func(data_vars) 213 214 if signature.num_outputs > 1: ~/anaconda3/lib/python3.6/site-packages/xarray/core/computation.py in apply_variable_ufunc(func, args, kwargs) 561 raise ValueError('unknown setting for dask array handling in ' 562 'apply_ufunc: {}'.format(dask)) --> 563 result_data = func(input_data) 564 565 if signature.num_outputs > 1: ~/anaconda3/lib/python3.6/site-packages/xarray/core/computation.py in <lambda>(arrays) 555 func = lambda arrays: _apply_with_dask_atop( 556 numpy_func, arrays, input_dims, output_dims, signature, --> 557 output_dtypes, output_sizes) 558 elif dask == 'allowed': 559 pass ~/anaconda3/lib/python3.6/site-packages/xarray/core/computation.py in _apply_with_dask_atop(func, args, input_dims, output_dims, signature, output_dtypes, output_sizes) 624 for element in (arg, dims[-getattr(arg, 'ndim', 0):])] 625 return da.atop(func, out_ind, *atop_args, dtype=dtype, concatenate=True, --> 626 new_axes=output_sizes) 627 628 ~/anaconda3/lib/python3.6/site-packages/dask/array/core.py in atop(func, out_ind, args, kwargs) 2231 raise ValueError("Must specify dtype of output array") 2232 -> 2233 chunkss, arrays = unify_chunks(args) 2234 for k, v in new_axes.items(): 2235 chunkss[k] = (v,) ~/anaconda3/lib/python3.6/site-packages/dask/array/core.py in unify_chunks(args, *kwargs) 2117 chunks = tuple(chunkss[j] if a.shape[n] > 1 else a.shape[n] 2118 if not np.isnan(sum(chunkss[j])) else None -> 2119 for n, j in enumerate(i)) 2120 if chunks != a.chunks and all(a.chunks): 2121 arrays.append(a.rechunk(chunks)) ~/anaconda3/lib/python3.6/site-packages/dask/array/core.py in <genexpr>(.0) 2117 chunks = tuple(chunkss[j] if a.shape[n] > 1 else a.shape[n] 2118 if not np.isnan(sum(chunkss[j])) else None -> 2119 for n, j in enumerate(i)) 2120 if chunks != a.chunks and all(a.chunks): 2121 arrays.append(a.rechunk(chunks)) IndexError: tuple index out of range ``` Workaround: ``` xarray.apply_ufunc( scipy.stats.norm.cdf, a, kwargs={'loc': 0, 'scale': 1}, dask='parallelized', output_dtypes=[a.dtype]).compute() <xarray.DataArray (x: 2)> array([ 0.841345, 0.97725 ]) Dimensions without coordinates: x ``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1697/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | |||||
252541496 | MDU6SXNzdWUyNTI1NDE0OTY= | 1521 | open_mfdataset reads coords from disk multiple times | crusaderky 6213168 | closed | 0 | 14 | 2017-08-24T09:29:57Z | 2017-10-09T21:15:31Z | 2017-10-09T21:15:31Z | MEMBER | I have 200x of the below dataset, split on the 'scenario' axis:
I individually dump them to disk with Dataset.to_netcdf(fname, engine='h5netcdf'). Then I try loading them back up with open_mfdataset, but it's mortally slow: ``` %%time xarray.open_mfdataset('*.nc', engine='h5netcdf') Wall time: 30.3 s ``` The problem is caused by the coords being read from disk multiple times. Workaround:
Proposed solutions: 1. Implement the above workaround directly inside open_mfdataset() 2. change open_dataset() to always eagerly load the coords to memory, regardless of the chunks parameter. Is there any valid use case where lazy coords are actually desirable? An additional, more radical observation is that, very frequently, a user knows in advance that all coords are aligned. In this use case, the user could explicitly request xarray to blindly trust this assumption, and thus skip loading the coords not based on concat_dim in all datasets beyond the first. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1521/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
259935100 | MDU6SXNzdWUyNTk5MzUxMDA= | 1586 | Dataset.copy() drops encoding | crusaderky 6213168 | closed | 0 | 6 | 2017-09-22T20:58:30Z | 2017-10-08T16:01:20Z | 2017-10-08T16:01:20Z | MEMBER |
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1586/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
253279298 | MDU6SXNzdWUyNTMyNzkyOTg= | 1531 | @requires_pynio mass disables unrelated tests | crusaderky 6213168 | closed | 0 | 3 | 2017-08-28T09:45:29Z | 2017-10-04T23:12:48Z | 2017-10-04T23:12:48Z | MEMBER | I think I'm losing my sanity here. I have an anaconda3 Python 3.6 environment with all required and optional dependencies of xarray installed and updated to the latest available version, except pyNio. If I run test.py on the latest xarray package from the git tip, the vast majority of the tests in test_backends.py are skipped - including those that have nothing to do with pyNio! e.g.
If I comment out line 1462: ``` @requires_scipy @requires_pynio class TestPyNio(CFEncodedDataTest, Only32BitTypes, TestCase): ``` Then magically everything starts working again!
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1531/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
260097045 | MDU6SXNzdWUyNjAwOTcwNDU= | 1588 | concat() loads dask arrays if the first array is numpy | crusaderky 6213168 | closed | 0 | 0 | 2017-09-24T16:29:09Z | 2017-09-25T00:55:36Z | 2017-09-25T00:55:36Z | MEMBER |
``` xarray.concat([ xarray.DataArray([1]).chunk(), xarray.DataArray([1]), ], dim='dim_0') Out[1]: <xarray.DataArray (dim_0: 2)> dask.array<shape=(2,), dtype=int64, chunksize=(1,)> Dimensions without coordinates: dim_0 xarray.concat([ xarray.DataArray([1]), xarray.DataArray([1]).chunk(), ], dim='dim_0') Out[2]: <xarray.DataArray (dim_0: 2)> array([1, 1]) Dimensions without coordinates: dim_0 ``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1588/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
252543868 | MDU6SXNzdWUyNTI1NDM4Njg= | 1522 | Dataset.__repr__ computes dask variables | crusaderky 6213168 | closed | 0 | 8 | 2017-08-24T09:37:12Z | 2017-09-21T20:55:43Z | 2017-09-21T20:55:43Z | MEMBER | DataArray.__repr__ and Variable.__repr__ print a placeholder if the data uses the dask backend. Not so Dataset.__repr__, which tries computing the data before printing a tiny preview of it. This issue is extremely annoying when working in Jupyter, and particularly acute if the chunks are very big or are at the end of a very long chain of computation. For data variables, the expected behaviour is to print a placeholder just like DataArray does. For coords, we could either - print a placeholders (same treatment as data variables) - automatically invoke load() when the coord is added to the dataset - see #1521 for discussion. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1522/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
252547273 | MDU6SXNzdWUyNTI1NDcyNzM= | 1523 | Pass arguments to dask.compute() | crusaderky 6213168 | closed | 0 | 5 | 2017-08-24T09:48:14Z | 2017-09-05T19:55:46Z | 2017-09-05T19:55:46Z | MEMBER | I work with a very large dask-based algorithm in xarray, and I do my optimization by hand before hitting compute(). In other cases, I need using multiple dask schedulers at once (e.g. a multithreaded one for numpy-based work and a multiprocessing one for pure python work). This change proposal (which I'm happy to do) is about accepting *args, **kwds parameters in all .compute(), .load(), and .persist() xarray methods and pass them verbatim to the underlying dask compute() and persist() functions. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1523/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
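In current xarray this is how it works: .compute(), .load() and .persist() forward keyword arguments to the corresponding dask functions. A small sketch (requires dask installed):

```python
import numpy as np
import xarray

ds = xarray.Dataset({'z': (('x',), np.arange(10))}).chunk({'x': 5})

# Keyword arguments are handed through to dask.compute, e.g. to pick a
# scheduler for this one call only.
result = ds.compute(scheduler='synchronous')
```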
184722754 | MDU6SXNzdWUxODQ3MjI3NTQ= | 1058 | shallow copies become deep copies when pickling | crusaderky 6213168 | closed | 0 | 10 | 2016-10-23T23:12:03Z | 2017-02-05T21:13:41Z | 2017-01-17T01:53:18Z | MEMBER | Whenever xarray performs a shallow copy of any object (DataArray, Dataset, Variable), it creates a view of the underlying numpy arrays. This design fails when the object is pickled. Whenever a numpy view is pickled, it becomes a regular array: ```
This has devastating effects in my use case. I start from a dask-backed DataArray with a dimension of 500,000 elements and no coord, so the coord is auto-assigned by xarray as an incremental integer. Then, I perform ~3000 transformations and dump the resulting dask-backed array with pickle. However, I have to dump all intermediate steps for audit purposes as well. This means that xarray invokes numpy.arange to create (500k * 4 bytes) ~ 2MB worth of coord, then creates 3000 views of it, which the moment they're pickled expand to several GBs as they become 3000 independent copies. I see a few possible solutions to this: 1. Implement pandas range indexes in xarray. This would be nice as a general thing and would solve my specific problem, but anybody who does not fall in my very specific use case won't benefit from it. 2. Do not auto-generate a coord with numpy.arange() if the user doesn't explicitly ask for it; just leave a None and maybe generate it on the fly when requested. Again, this would solve my specific problem but not other people's. 3. Force the coord to be a dask.array.arange. Actually supporting unconverted dask arrays as coordinates would take a considerable amount of work; they would get converted to numpy several times, and other issues. Again it wouldn't solve the general problem. 4. Fix the issue upstream in numpy. I didn't look into it yet and it's definitely worth investigating, but I found about it as early as 2012, so I suspect there might be some pretty good reason why it works like that... 5. Whenever xarray performs a shallow copy, take the numpy array instead of creating a view. I implemented (5) as a workaround in my getstate method. Before:
Workaround: ```
def get_base(array):
if not isinstance(array, numpy.ndarray):
return array for v in cache.values(): if isinstance(v, xarray.DataArray): v.data = get_base(v.data) for coord in v.coords.values(): coord.data = get_base(coord.data) elif isinstance(v, xarray.Dataset): for var in v.variables(): var.data = get_base(var.data) ``` After:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1058/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
172290413 | MDU6SXNzdWUxNzIyOTA0MTM= | 978 | broadcast() broken on dask backend | crusaderky 6213168 | closed | 0 | 4 | 2016-08-20T20:56:33Z | 2016-12-09T20:28:42Z | 2016-12-09T20:28:42Z | MEMBER | ``` python
The problem is actually somewhere in the constructor of DataArray.
In alignment.py:362, we have After that, however, there's a new issue: whenever broadcast adds a dimension to an array, it creates it in a single chunk, as opposed to copying the chunking of the other arrays. This can easily cause a host to go out of memory, and makes it harder to work with the arrays afterwards because chunks won't match. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/978/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
188395497 | MDU6SXNzdWUxODgzOTU0OTc= | 1102 | full_like, zeros_like, ones_like | crusaderky 6213168 | closed | 0 | 2 | 2016-11-10T01:12:58Z | 2016-11-28T03:42:39Z | 2016-11-28T03:42:39Z | MEMBER | I'd like to add the following top-level functions to xarray: ```
def const_like(array, value=0):
    """Return a new array with the same shape of array and the given constant value.
    If array is dask-backed, return a new dask-backed array with the same chunks.

def zeros_like(array):
    return const_like(array, 0)

def ones_like(array):
    return const_like(array, 1)
```
The above would need to be expanded to support Dataset and Variable objects. In Datasets, the data_vars would be constants whereas all other variables would be copied verbatim. Thoughts? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1102/reactions", "total_count": 4, "+1": 4, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
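These helpers exist in today's xarray as the top-level functions full_like, zeros_like and ones_like (full_like takes the constant explicitly); a quick usage sketch:

```python
import numpy as np
import xarray

a = xarray.DataArray(np.arange(6).reshape(2, 3), dims=['x', 'y'],
                     coords={'x': [10, 20]})

z = xarray.zeros_like(a)     # same dims, coords and dtype, filled with 0
o = xarray.ones_like(a)      # filled with 1
f = xarray.full_like(a, 42)  # arbitrary constant; dask inputs keep their chunking
```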
166287789 | MDU6SXNzdWUxNjYyODc3ODk= | 902 | Pickle and .value vs. dask backend | crusaderky 6213168 | closed | 0 | 6 | 2016-07-19T09:34:30Z | 2016-11-14T16:56:44Z | 2016-11-14T16:56:44Z | MEMBER | Pickling a xarray.DataArray with dask backend will cause it to resolve the .data to a numpy array. This is not desirable, as there are legitimate use cases where you may want to e.g. save a computation for later, or send it somewhere across the network. Analogously, auto-converting a dask xarray to a numpy xarray as soon as you invoke the .value property is probably nice when you are working on a jupyter terminal, but not in a general purpose situation, particularly when xarray is used at the foundation of a very complex framework. Most of my headaches so far have been caused by trying to figure out when, where and why the dask backend was replaced with numpy. IMHO a module-wide switch to disable implicit dask->numpy conversion would be a nice solution. A new method, compute(), could explicitly convert in place from dask to numpy. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/902/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
168470276 | MDU6SXNzdWUxNjg0NzAyNzY= | 927 | align() and broadcast() before concat() | crusaderky 6213168 | closed | 0 | 9 | 2016-07-30T14:35:33Z | 2016-08-21T01:00:27Z | 2016-08-21T01:00:27Z | MEMBER | I have two arrays with misaligned dimensions x and y, and I want to concatenate them on dimension y. I can't seem to find any way to do it, because: 1. If I do not invoke align(), it will fail complaining that dimension x is not aligned 2. if I invoke align(), it will create unwanted elements on dimension y See example: https://gist.github.com/crusaderky/a96db5b59396d94fe1e22694bc091d55 Am I missing something obvious?
Possibly align() should accept an optional parameter e.g. Thanks in advance |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/927/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
166286097 | MDU6SXNzdWUxNjYyODYwOTc= | 901 | Pickle xarray.ufuncs | crusaderky 6213168 | closed | 0 | 3 | 2016-07-19T09:26:06Z | 2016-08-02T17:34:15Z | 2016-08-02T17:34:15Z | MEMBER | It's currently impossible to pickle xarray.ufuncs. import xarray.ufuncs, pickle pickle.dumps(xarray.ufuncs.maximum) AttributeError: Can't pickle local object '_create_op.<locals>.func' |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/901/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
159117442 | MDU6SXNzdWUxNTkxMTc0NDI= | 876 | xarray.ufuncs.maximum() between constant and dask array | crusaderky 6213168 | closed | 0 | 1 | 2016-06-08T09:23:01Z | 2016-07-20T05:51:02Z | 2016-07-20T05:51:02Z | MEMBER | Take a dask-backed array:
In the second case, xarray.ufuncs.maximum is resolving the dask array - in other words, it's doing numpy.maximum(0, a.values) |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/876/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue |
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);