html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/5631#issuecomment-885927432,https://api.github.com/repos/pydata/xarray/issues/5631,885927432,IC_kwDOAMm_X840zi4I,1328158,2021-07-23T21:39:54Z,2021-07-23T21:39:54Z,NONE,"Thanks to all for your help. Installing typing-extensions did solve the problem; thanks for the heads-up, @rhkleijn.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,951644054
https://github.com/pydata/xarray/issues/2507#issuecomment-433789634,https://api.github.com/repos/pydata/xarray/issues/2507,433789634,MDEyOklzc3VlQ29tbWVudDQzMzc4OTYzNA==,1328158,2018-10-29T05:07:06Z,2018-10-29T05:07:06Z,NONE,"You're a wizard, Stephan. That was my bug. I really appreciate your help!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373646673
https://github.com/pydata/xarray/issues/2507#issuecomment-433768856,https://api.github.com/repos/pydata/xarray/issues/2507,433768856,MDEyOklzc3VlQ29tbWVudDQzMzc2ODg1Ng==,1328158,2018-10-29T02:20:30Z,2018-10-29T02:20:30Z,NONE,"Any guidance as to where I should start when looking into this further? At this point, all I've been able to surmise is that the arrays returned by the applied function are present, but as a list of arrays rather than as a tuple. That's where things go wonky in computation.py, where it checks for a tuple instance. Is xarray responsible for putting the arrays into a tuple upon function completion, and if so, where should I look into that?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373646673
https://github.com/pydata/xarray/issues/2507#issuecomment-433458618,https://api.github.com/repos/pydata/xarray/issues/2507,433458618,MDEyOklzc3VlQ29tbWVudDQzMzQ1ODYxOA==,1328158,2018-10-26T16:03:26Z,2018-10-26T16:03:26Z,NONE,"Thanks, Stephan. I don't think this is related to numba, as I'm running this with the environment variable `NUMBA_DISABLE_JIT=1` (I do this when debugging my code, since the numba JIT prevents stepping into and inspecting JIT-annotated code once it's compiled). In any event, I can comment out the `@numba.jit` annotations and report any discrepancies.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373646673
https://github.com/pydata/xarray/issues/2499#issuecomment-432846749,https://api.github.com/repos/pydata/xarray/issues/2499,432846749,MDEyOklzc3VlQ29tbWVudDQzMjg0Njc0OQ==,1328158,2018-10-24T22:14:08Z,2018-10-24T22:14:08Z,NONE,"I have had some success using `apply_ufunc` in tandem with `multiprocessing`.
Apparently, I can't (seamlessly) use dask arrays in place of numpy arrays within the functions where I am performing my computations, as [it's not possible to assign values into dask arrays using integer indexing](https://stackoverflow.com/questions/52933553/dask-assignment-error-when-updating-a-value-in-a-dask-array-using-typical-numpy).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372244156
https://github.com/pydata/xarray/issues/2499#issuecomment-431684522,https://api.github.com/repos/pydata/xarray/issues/2499,431684522,MDEyOklzc3VlQ29tbWVudDQzMTY4NDUyMg==,1328158,2018-10-21T16:49:35Z,2018-10-21T19:43:27Z,NONE,"Thanks, Zac. I have used various options with the `chunks` argument, e.g. `chunks={'lat': 10, 'lon': 10}`, all of which appear to have a similar effect. Maybe I just haven't yet hit upon the sweet-spot chunk sizes? Is there a rule-of-thumb approach to determining chunk sizes for a dataset? Perhaps before setting the chunk sizes I could open the dataset to poll the dimensions of the variables and, based on those, come up with reasonable chunk sizes (or none at all if the dataset is reasonably small). My computations typically use a full time series per lat/lon point, so my assumption has been that I don't want to chunk along the time dimension -- is this correct? I have been testing this code using two versions of a precipitation dataset: the full-resolution version is (time=1481, lat=596, lon=1385) and the low-resolution version (for faster tests) is (time=1466, lat=38, lon=87). Results of `ncdump` and `repr(xr.open_dataset(netcdf_precip))` are below.
```
$ ncdump -h nclimgrid_prcp.nc
netcdf nclimgrid_prcp {
dimensions:
	time = UNLIMITED ; // (1481 currently)
	lat = 596 ;
	lon = 1385 ;
variables:
	int time(time) ;
		time:long_name = ""Time, in monthly increments"" ;
		time:standard_name = ""time"" ;
		time:calendar = ""gregorian"" ;
		time:units = ""days since 1800-01-01 00:00:00"" ;
		time:axis = ""T"" ;
	float lat(lat) ;
		lat:standard_name = ""latitude"" ;
		lat:long_name = ""Latitude"" ;
		lat:units = ""degrees_north"" ;
		lat:axis = ""Y"" ;
		lat:valid_min = 24.56253f ;
		lat:valid_max = 49.3542f ;
	float lon(lon) ;
		lon:standard_name = ""longitude"" ;
		lon:long_name = ""Longitude"" ;
		lon:units = ""degrees_east"" ;
		lon:axis = ""X"" ;
		lon:valid_min = -124.6875f ;
		lon:valid_max = -67.02084f ;
	float prcp(time, lat, lon) ;
		prcp:_FillValue = NaNf ;
		prcp:least_significant_digit = 3LL ;
		prcp:valid_min = 0.f ;
		prcp:coordinates = ""time lat lon"" ;
		prcp:long_name = ""Precipitation, monthly total"" ;
		prcp:standard_name = ""precipitation_amount"" ;
		prcp:references = ""GHCN-Monthly Version 3 (Vose et al. 2011), NCEI/NOAA, https://www.ncdc.noaa.gov/ghcnm/v3.php"" ;
		prcp:units = ""millimeter"" ;
		prcp:valid_max = 2000.f ;

// global attributes:
		:date_created = ""2018-02-15 10:29:25.485927"" ;
		:date_modified = ""2018-02-15 10:29:25.486042"" ;
		:Conventions = ""CF-1.6, ACDD-1.3"" ;
		:ncei_template_version = ""NCEI_NetCDF_Grid_Template_v2.0"" ;
		:title = ""nClimGrid"" ;
		:naming_authority = ""gov.noaa.ncei"" ;
		:standard_name_vocabulary = ""Standard Name Table v35"" ;
		:institution = ""National Centers for Environmental Information (NCEI), NOAA, Department of Commerce"" ;
		:geospatial_lat_min = 24.56253f ;
		:geospatial_lat_max = 49.3542f ;
		:geospatial_lon_min = -124.6875f ;
		:geospatial_lon_max = -67.02084f ;
		:geospatial_lat_units = ""degrees_north"" ;
		:geospatial_lon_units = ""degrees_east"" ;
}

/* repr(ds) below: */

<xarray.Dataset>
Dimensions:  (lat: 596, lon: 1385, time: 1481)
Coordinates:
  * time     (time) datetime64[ns] 1895-01-01 1895-02-01 ... 2018-05-01
  * lat      (lat) float32 49.3542 49.312534 49.270866 ... 24.6042 24.562532
  * lon      (lon) float32 -124.6875 -124.645836 ... -67.0625 -67.020836
Data variables:
    prcp     (time, lat, lon) float32 ...
Attributes:
    date_created:              2018-02-15 10:29:25.485927
    date_modified:             2018-02-15 10:29:25.486042
    Conventions:               CF-1.6, ACDD-1.3
    ncei_template_version:     NCEI_NetCDF_Grid_Template_v2.0
    title:                     nClimGrid
    naming_authority:          gov.noaa.ncei
    standard_name_vocabulary:  Standard Name Table v35
    institution:               National Centers for Environmental Information...
    geospatial_lat_min:        24.562532
    geospatial_lat_max:        49.3542
    geospatial_lon_min:        -124.6875
    geospatial_lon_max:        -67.020836
    geospatial_lat_units:      degrees_north
    geospatial_lon_units:      degrees_east
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372244156
https://github.com/pydata/xarray/issues/585#issuecomment-249059201,https://api.github.com/repos/pydata/xarray/issues/585,249059201,MDEyOklzc3VlQ29tbWVudDI0OTA1OTIwMQ==,1328158,2016-09-22T23:39:41Z,2017-03-07T05:32:04Z,NONE,"This is good news for me, as the functions I will apply take an ndarray as input and return a corresponding ndarray as output. Once this is available in xarray I'll be eager to give it a whirl...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,107424151
https://github.com/pydata/xarray/issues/585#issuecomment-248969870,https://api.github.com/repos/pydata/xarray/issues/585,248969870,MDEyOklzc3VlQ29tbWVudDI0ODk2OTg3MA==,1328158,2016-09-22T17:23:22Z,2016-09-22T17:23:22Z,NONE,"I'm adding this note to express an interest in the functionality described in Stephan's original description, i.e. a `parallel_apply` method/function which would apply a function in parallel utilizing multiple CPUs. I have (finally) worked out how to use `groupby` and `apply` for my application, but it would be much more useful if I could apply functions in parallel to take advantage of multiple CPUs. What's the expected effort to make something like this available in xarray? Several months ago I worked on doing this sort of thing without xarray, using the multiprocessing module and a shared-memory object, and I may revisit that soon; however, I expect that a solution using xarray will be more elegant, so if such a thing is coming in the foreseeable future then I may wait on that and focus on other tasks. Can anyone advise?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,107424151 https://github.com/pydata/xarray/issues/873#issuecomment-248409634,https://api.github.com/repos/pydata/xarray/issues/873,248409634,MDEyOklzc3VlQ29tbWVudDI0ODQwOTYzNA==,1328158,2016-09-20T19:37:07Z,2016-09-20T19:37:07Z,NONE,"Thanks for this clarification, Stephan. Apparently I didn't read the API documentation closely enough, as I was assuming that the function is applied to the underlying ndarray rather than to all data variables of a Dataset object. Now that I've taken the approach you suggested I'm cooking with gas, and it's very encouraging. I really appreciate your help. ​--James ​ On Tue, Sep 20, 2016 at 11:54 AM, Stephan Hoyer notifications@github.com wrote: > GroupBy is working as intended here. ds.groupby('something').apply(func) > calls func on objects of the same type as ds. If you group a Dataset, > each time you apply to a Dataset, too. > > You can certainly still use np.convolve, but you'll need to manually > apply it to numpy arrays extracted from a Dataset and then rebuild another > Dataset or DataArray. > > — > You are receiving this because you authored the thread. > Reply to this email directly, view it on GitHub > https://github.com/pydata/xarray/issues/873#issuecomment-248345053, or mute > the thread > https://github.com/notifications/unsubscribe-auth/ABREHkSB-DvW5OD3DHuYxYzP7l7yfjL2ks5qsAGcgaJpZM4IwE4g > . ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,158958801 https://github.com/pydata/xarray/issues/873#issuecomment-248216388,https://api.github.com/repos/pydata/xarray/issues/873,248216388,MDEyOklzc3VlQ29tbWVudDI0ODIxNjM4OA==,1328158,2016-09-20T06:42:53Z,2016-09-20T06:42:53Z,NONE,"Thanks, Stephan. My code uses numpy.convolve() in several key places, so if that function is a deal breaker for using xarray then I'll hold off until that's fixed. In the meantime if there's anything else I can do to help you work this out then please let me know. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,158958801 https://github.com/pydata/xarray/issues/873#issuecomment-242535724,https://api.github.com/repos/pydata/xarray/issues/873,242535724,MDEyOklzc3VlQ29tbWVudDI0MjUzNTcyNA==,1328158,2016-08-25T20:48:45Z,2016-08-25T20:48:45Z,NONE,"Thanks, Stephan. In general things appear to be working much more as expected now, probably (hopefully) this is just an edge case/nuance that won't be too difficult for you guys to address. If so and if I don't run across any other issues then my code will be dramatically simplified by leveraging xarray rather than writing code to enable shared memory objects for the multiprocessing side of things (my assumption being that you guys have done a better job of that than I can). A gist with example code and a smallish data file attached to the comment is here: https://gist.github.com/monocongo/e8e883c2355f7a92bb0b9d24db5407a8 Please let me know if I can do anything else to help you help me. Godspeed! --James On Tue, Aug 23, 2016 at 12:42 AM, Stephan Hoyer notifications@github.com wrote: > Could you please share a data file and/or code which I can run to > reproduce each of these issues? > > — > You are receiving this because you authored the thread. 
> Reply to this email directly, view it on GitHub > https://github.com/pydata/xarray/issues/873#issuecomment-241625354, or mute > the thread > https://github.com/notifications/unsubscribe-auth/ABREHhoGvwv30D2Qk858lHB-U5oWtRQnks5qinpDgaJpZM4IwE4g > . ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,158958801 https://github.com/pydata/xarray/issues/873#issuecomment-241540585,https://api.github.com/repos/pydata/xarray/issues/873,241540585,MDEyOklzc3VlQ29tbWVudDI0MTU0MDU4NQ==,1328158,2016-08-22T20:32:20Z,2016-08-22T20:32:20Z,NONE,"I get the following error now when I try to run the gist code referenced in the original message above: ``` $ python -u xarray_gist.py /dev/shm/nclimgrid_prcp_reduced.nc nclimgrid_prcp_doubled.nc Traceback (most recent call last): File ""xarray_gist.py"", line 45, in encoding = {variable_name: {'_FillValue': np.nan, 'dtype': 'float32'}}) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/core/dataset.py"", line 782, in to_netcdf engine=engine, encoding=encoding) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/api.py"", line 354, in to_netcdf dataset.dump_to_store(store, sync=sync, encoding=encoding) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/core/dataset.py"", line 728, in dump_to_store store.store(variables, attrs, check_encoding) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/common.py"", line 234, in store check_encoding_set) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/common.py"", line 209, in store self.set_variables(variables, check_encoding_set) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/common.py"", line 219, in set_variables target, source = self.prepare_variable(name, v, check) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/netCDF4_.py"", line 266, in prepare_variable raise_on_invalid=check_encoding) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/netCDF4_.py"", line 167, in _extract_nc4_encoding ' %r' % (backend, invalid)) ValueError: unexpected encoding parameters for 'netCDF4' backend: ['dtype'] ``` Additionally I see the following errors when I run some other code which uses the same dataset.groupby().apply() technique (the trouble appears to show up within numpy.convolve()): ``` Traceback (most recent call last): File ""C:\home\git\indices\src\main\python\indices\spi_gamma_xarray.py"", line 46, in dataset = dataset.groupby('grid_cells').apply(function_to_be_applied) File ""C:\Anaconda3\lib\site-packages\xarray\core\groupby.py"", line 567, in apply combined = self._concat(applied) File ""C:\Anaconda3\lib\site-packages\xarray\core\groupby.py"", line 572, in _concat applied_example, applied = peek_at(applied) File ""C:\Anaconda3\lib\site-packages\xarray\core\utils.py"", line 90, in peek_at peek = next(gen) File ""C:\Anaconda3\lib\site-packages\xarray\core\groupby.py"", line 566, in applied = (func(ds, **kwargs) for ds in self._iter_grouped()) File ""C:\home\git\indices\src\main\python\indices\spi_gamma_xarray.py"", line 27, in function_to_be_applied valid_max) File ""C:\Anaconda3\lib\site-packages\numpy\core\numeric.py"", line 1005, in convolve return multiarray.correlate(a, v[::-1], mode) TypeError: Cannot cast array data from dtype('float64') to dtype(' dataset = dataset.groupby('grid_cells').apply(function_to_be_applied) File 
""C:\Anaconda3\lib\site-packages\xarray\core\groupby.py"", line 567, in apply combined = self._concat(applied) File ""C:\Anaconda3\lib\site-packages\xarray\core\groupby.py"", line 572, in _concat applied_example, applied = peek_at(applied) File ""C:\Anaconda3\lib\site-packages\xarray\core\utils.py"", line 90, in peek_at peek = next(gen) File ""C:\Anaconda3\lib\site-packages\xarray\core\groupby.py"", line 566, in applied = (func(ds, **kwargs) for ds in self._iter_grouped()) File ""C:\home\git\indices\src\main\python\indices\spi_gamma_xarray.py"", line 27, in function_to_be_applied valid_max) File ""C:\Anaconda3\lib\site-packages\numpy\core\numeric.py"", line 1005, in convolve return multiarray.correlate(a, v[::-1], mode) TypeError: Cannot cast array data from dtype('float64') to dtype(' I want to be able to run a scikit-learn model over a bunch of variables in > a 3D (lat/lon/time) dataset, and return values for each coordinate point. > Is something like this multi-dimensional groupby required (I'm thinking > groupby(lat, lon) => 2D matrices that can be fed straight into > scikit-learn), or is there already some other mechanism that could achieve > something like this? Or is the best way at the moment just to create a null > dataset, and loop over lat/lon and fill in the blanks as you go? > > — > You are receiving this because you are subscribed to this thread. > Reply to this email directly or view it on GitHub > https://github.com/pydata/xarray/pull/818#issuecomment-218372591 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176