html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/5631#issuecomment-885927432,https://api.github.com/repos/pydata/xarray/issues/5631,885927432,IC_kwDOAMm_X840zi4I,1328158,2021-07-23T21:39:54Z,2021-07-23T21:39:54Z,NONE,"Thanks to all for your help. Installing typing-extensions did solve the problem; thanks for the heads-up, @rhkleijn.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,951644054
https://github.com/pydata/xarray/issues/2507#issuecomment-433789634,https://api.github.com/repos/pydata/xarray/issues/2507,433789634,MDEyOklzc3VlQ29tbWVudDQzMzc4OTYzNA==,1328158,2018-10-29T05:07:06Z,2018-10-29T05:07:06Z,NONE,"You're a wizard, Stephan. That was my bug. I really appreciate your help!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373646673
https://github.com/pydata/xarray/issues/2507#issuecomment-433768856,https://api.github.com/repos/pydata/xarray/issues/2507,433768856,MDEyOklzc3VlQ29tbWVudDQzMzc2ODg1Ng==,1328158,2018-10-29T02:20:30Z,2018-10-29T02:20:30Z,NONE,"Any guidance as to where I should start when looking into this further? At this point, all I've been able to surmise is that the arrays returned by the applied function are present, but as a list of arrays rather than as a tuple. That's where things go wonky in computation.py, where it checks for a tuple instance. Is xarray responsible for putting the arrays into a tuple upon function completion, and if so, where should I look into that?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373646673
https://github.com/pydata/xarray/issues/2507#issuecomment-433458618,https://api.github.com/repos/pydata/xarray/issues/2507,433458618,MDEyOklzc3VlQ29tbWVudDQzMzQ1ODYxOA==,1328158,2018-10-26T16:03:26Z,2018-10-26T16:03:26Z,NONE,"Thanks, Stephan. I don't think this is related to numba, as I'm running this with the environment variable `NUMBA_DISABLE_JIT=1` (I do this when debugging my code, since the numba JIT prevents stepping into and inspecting JIT-annotated code once it's compiled). In any event, I can comment out the `@numba.jit` annotations and report any discrepancies.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373646673
https://github.com/pydata/xarray/issues/2499#issuecomment-432846749,https://api.github.com/repos/pydata/xarray/issues/2499,432846749,MDEyOklzc3VlQ29tbWVudDQzMjg0Njc0OQ==,1328158,2018-10-24T22:14:08Z,2018-10-24T22:14:08Z,NONE,"I have had some success using `apply_ufunc` in tandem with `multiprocessing`.
Apparently, I can't (seamlessly) use dask arrays in place of numpy arrays within the functions where I am performing my computations, as [it's not possible to assign values into dask arrays using integer indexing](https://stackoverflow.com/questions/52933553/dask-assignment-error-when-updating-a-value-in-a-dask-array-using-typical-numpy).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372244156
https://github.com/pydata/xarray/issues/2499#issuecomment-431684522,https://api.github.com/repos/pydata/xarray/issues/2499,431684522,MDEyOklzc3VlQ29tbWVudDQzMTY4NDUyMg==,1328158,2018-10-21T16:49:35Z,2018-10-21T19:43:27Z,NONE,"Thanks, Zac. I have used various options with the `chunks` argument, e.g. `chunks={'lat': 10, 'lon': 10}`, all of which appear to have a similar effect. Maybe I just haven't yet hit upon the sweet-spot chunk sizes? Is there a rule-of-thumb approach to determining chunk sizes for a dataset? Perhaps before setting the chunk sizes I could open the dataset to poll the dimensions of the variables and, based on those, come up with reasonable chunk sizes (or none at all if the dataset is reasonably small). My computations typically use a full time series per lat/lon point, so my assumption has been that I don't want to chunk along the time dimension -- is this correct? I have been testing this code using two versions of a precipitation dataset: the full-resolution version is (time=1481, lat=596, lon=1385) and the low-resolution version (for faster tests) is (time=1466, lat=38, lon=87). Results of `ncdump` and `repr(xr.open_dataset(netcdf_precip))` are below.
```
$ ncdump -h nclimgrid_prcp.nc
netcdf nclimgrid_prcp {
dimensions:
	time = UNLIMITED ; // (1481 currently)
	lat = 596 ;
	lon = 1385 ;
variables:
	int time(time) ;
		time:long_name = ""Time, in monthly increments"" ;
		time:standard_name = ""time"" ;
		time:calendar = ""gregorian"" ;
		time:units = ""days since 1800-01-01 00:00:00"" ;
		time:axis = ""T"" ;
	float lat(lat) ;
		lat:standard_name = ""latitude"" ;
		lat:long_name = ""Latitude"" ;
		lat:units = ""degrees_north"" ;
		lat:axis = ""Y"" ;
		lat:valid_min = 24.56253f ;
		lat:valid_max = 49.3542f ;
	float lon(lon) ;
		lon:standard_name = ""longitude"" ;
		lon:long_name = ""Longitude"" ;
		lon:units = ""degrees_east"" ;
		lon:axis = ""X"" ;
		lon:valid_min = -124.6875f ;
		lon:valid_max = -67.02084f ;
	float prcp(time, lat, lon) ;
		prcp:_FillValue = NaNf ;
		prcp:least_significant_digit = 3LL ;
		prcp:valid_min = 0.f ;
		prcp:coordinates = ""time lat lon"" ;
		prcp:long_name = ""Precipitation, monthly total"" ;
		prcp:standard_name = ""precipitation_amount"" ;
		prcp:references = ""GHCN-Monthly Version 3 (Vose et al. 2011), NCEI/NOAA, https://www.ncdc.noaa.gov/ghcnm/v3.php"" ;
		prcp:units = ""millimeter"" ;
		prcp:valid_max = 2000.f ;

// global attributes:
		:date_created = ""2018-02-15 10:29:25.485927"" ;
		:date_modified = ""2018-02-15 10:29:25.486042"" ;
		:Conventions = ""CF-1.6, ACDD-1.3"" ;
		:ncei_template_version = ""NCEI_NetCDF_Grid_Template_v2.0"" ;
		:title = ""nClimGrid"" ;
		:naming_authority = ""gov.noaa.ncei"" ;
		:standard_name_vocabulary = ""Standard Name Table v35"" ;
		:institution = ""National Centers for Environmental Information (NCEI), NOAA, Department of Commerce"" ;
		:geospatial_lat_min = 24.56253f ;
		:geospatial_lat_max = 49.3542f ;
		:geospatial_lon_min = -124.6875f ;
		:geospatial_lon_max = -67.02084f ;
		:geospatial_lat_units = ""degrees_north"" ;
		:geospatial_lon_units = ""degrees_east"" ;
}

/* repr(ds) below: */

<xarray.Dataset>
Dimensions:  (lat: 596, lon: 1385, time: 1481)
Coordinates:
  * time     (time) datetime64[ns] 1895-01-01 1895-02-01 ... 2018-05-01
  * lat      (lat) float32 49.3542 49.312534 49.270866 ... 24.6042 24.562532
  * lon      (lon) float32 -124.6875 -124.645836 ... -67.0625 -67.020836
Data variables:
    prcp     (time, lat, lon) float32 ...
Attributes:
    date_created:              2018-02-15 10:29:25.485927
    date_modified:             2018-02-15 10:29:25.486042
    Conventions:               CF-1.6, ACDD-1.3
    ncei_template_version:     NCEI_NetCDF_Grid_Template_v2.0
    title:                     nClimGrid
    naming_authority:          gov.noaa.ncei
    standard_name_vocabulary:  Standard Name Table v35
    institution:               National Centers for Environmental Information...
    geospatial_lat_min:        24.562532
    geospatial_lat_max:        49.3542
    geospatial_lon_min:        -124.6875
    geospatial_lon_max:        -67.020836
    geospatial_lat_units:      degrees_north
    geospatial_lon_units:      degrees_east
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372244156
https://github.com/pydata/xarray/issues/585#issuecomment-249059201,https://api.github.com/repos/pydata/xarray/issues/585,249059201,MDEyOklzc3VlQ29tbWVudDI0OTA1OTIwMQ==,1328158,2016-09-22T23:39:41Z,2017-03-07T05:32:04Z,NONE,"This is good news for me, as the functions I will apply take an ndarray as input and return a corresponding ndarray as output. Once this is available in xarray I'll be eager to give it a whirl...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,107424151
https://github.com/pydata/xarray/issues/585#issuecomment-248969870,https://api.github.com/repos/pydata/xarray/issues/585,248969870,MDEyOklzc3VlQ29tbWVudDI0ODk2OTg3MA==,1328158,2016-09-22T17:23:22Z,2016-09-22T17:23:22Z,NONE,"I'm adding this note to express an interest in the functionality described in Stephan's original description, i.e. a `parallel_apply` method/function which would apply a function in parallel utilizing multiple CPUs. I have (finally) worked out how to use `groupby` and `apply` for my application, but it would be much more useful if I could apply functions in parallel to take advantage of multiple CPUs. What's the expected effort to make something like this available in xarray? Several months ago I worked on doing this sort of thing without xarray, using the multiprocessing module and a shared-memory object, and I may revisit that soon; however, I expect that a solution using xarray will be more elegant, so if such a thing is coming in the foreseeable future then I may wait on that and focus on other tasks. Can anyone advise?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,107424151 https://github.com/pydata/xarray/issues/873#issuecomment-248409634,https://api.github.com/repos/pydata/xarray/issues/873,248409634,MDEyOklzc3VlQ29tbWVudDI0ODQwOTYzNA==,1328158,2016-09-20T19:37:07Z,2016-09-20T19:37:07Z,NONE,"Thanks for this clarification, Stephan. Apparently I didn't read the API documentation closely enough, as I was assuming that the function is applied to the underlying ndarray rather than to all data variables of a Dataset object. Now that I've taken the approach you suggested I'm cooking with gas, and it's very encouraging. I really appreciate your help. ​--James ​ On Tue, Sep 20, 2016 at 11:54 AM, Stephan Hoyer notifications@github.com wrote: > GroupBy is working as intended here. ds.groupby('something').apply(func) > calls func on objects of the same type as ds. If you group a Dataset, > each time you apply to a Dataset, too. > > You can certainly still use np.convolve, but you'll need to manually > apply it to numpy arrays extracted from a Dataset and then rebuild another > Dataset or DataArray. > > — > You are receiving this because you authored the thread. > Reply to this email directly, view it on GitHub > https://github.com/pydata/xarray/issues/873#issuecomment-248345053, or mute > the thread > https://github.com/notifications/unsubscribe-auth/ABREHkSB-DvW5OD3DHuYxYzP7l7yfjL2ks5qsAGcgaJpZM4IwE4g > . ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,158958801 https://github.com/pydata/xarray/issues/873#issuecomment-248216388,https://api.github.com/repos/pydata/xarray/issues/873,248216388,MDEyOklzc3VlQ29tbWVudDI0ODIxNjM4OA==,1328158,2016-09-20T06:42:53Z,2016-09-20T06:42:53Z,NONE,"Thanks, Stephan. My code uses numpy.convolve() in several key places, so if that function is a deal breaker for using xarray then I'll hold off until that's fixed. In the meantime if there's anything else I can do to help you work this out then please let me know. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,158958801 https://github.com/pydata/xarray/issues/873#issuecomment-242535724,https://api.github.com/repos/pydata/xarray/issues/873,242535724,MDEyOklzc3VlQ29tbWVudDI0MjUzNTcyNA==,1328158,2016-08-25T20:48:45Z,2016-08-25T20:48:45Z,NONE,"Thanks, Stephan. In general things appear to be working much more as expected now, probably (hopefully) this is just an edge case/nuance that won't be too difficult for you guys to address. If so and if I don't run across any other issues then my code will be dramatically simplified by leveraging xarray rather than writing code to enable shared memory objects for the multiprocessing side of things (my assumption being that you guys have done a better job of that than I can). A gist with example code and a smallish data file attached to the comment is here: https://gist.github.com/monocongo/e8e883c2355f7a92bb0b9d24db5407a8 Please let me know if I can do anything else to help you help me. Godspeed! --James On Tue, Aug 23, 2016 at 12:42 AM, Stephan Hoyer notifications@github.com wrote: > Could you please share a data file and/or code which I can run to > reproduce each of these issues? > > — > You are receiving this because you authored the thread. 
> Reply to this email directly, view it on GitHub > https://github.com/pydata/xarray/issues/873#issuecomment-241625354, or mute > the thread > https://github.com/notifications/unsubscribe-auth/ABREHhoGvwv30D2Qk858lHB-U5oWtRQnks5qinpDgaJpZM4IwE4g > . ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,158958801 https://github.com/pydata/xarray/issues/873#issuecomment-241540585,https://api.github.com/repos/pydata/xarray/issues/873,241540585,MDEyOklzc3VlQ29tbWVudDI0MTU0MDU4NQ==,1328158,2016-08-22T20:32:20Z,2016-08-22T20:32:20Z,NONE,"I get the following error now when I try to run the gist code referenced in the original message above: ``` $ python -u xarray_gist.py /dev/shm/nclimgrid_prcp_reduced.nc nclimgrid_prcp_doubled.nc Traceback (most recent call last): File ""xarray_gist.py"", line 45, in encoding = {variable_name: {'_FillValue': np.nan, 'dtype': 'float32'}}) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/core/dataset.py"", line 782, in to_netcdf engine=engine, encoding=encoding) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/api.py"", line 354, in to_netcdf dataset.dump_to_store(store, sync=sync, encoding=encoding) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/core/dataset.py"", line 728, in dump_to_store store.store(variables, attrs, check_encoding) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/common.py"", line 234, in store check_encoding_set) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/common.py"", line 209, in store self.set_variables(variables, check_encoding_set) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/common.py"", line 219, in set_variables target, source = self.prepare_variable(name, v, check) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/netCDF4_.py"", line 266, in prepare_variable raise_on_invalid=check_encoding) File ""/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/netCDF4_.py"", line 167, in _extract_nc4_encoding ' %r' % (backend, invalid)) ValueError: unexpected encoding parameters for 'netCDF4' backend: ['dtype'] ``` Additionally I see the following errors when I run some other code which uses the same dataset.groupby().apply() technique (the trouble appears to show up within numpy.convolve()): ``` Traceback (most recent call last): File ""C:\home\git\indices\src\main\python\indices\spi_gamma_xarray.py"", line 46, in dataset = dataset.groupby('grid_cells').apply(function_to_be_applied) File ""C:\Anaconda3\lib\site-packages\xarray\core\groupby.py"", line 567, in apply combined = self._concat(applied) File ""C:\Anaconda3\lib\site-packages\xarray\core\groupby.py"", line 572, in _concat applied_example, applied = peek_at(applied) File ""C:\Anaconda3\lib\site-packages\xarray\core\utils.py"", line 90, in peek_at peek = next(gen) File ""C:\Anaconda3\lib\site-packages\xarray\core\groupby.py"", line 566, in applied = (func(ds, **kwargs) for ds in self._iter_grouped()) File ""C:\home\git\indices\src\main\python\indices\spi_gamma_xarray.py"", line 27, in function_to_be_applied valid_max) File ""C:\Anaconda3\lib\site-packages\numpy\core\numeric.py"", line 1005, in convolve return multiarray.correlate(a, v[::-1], mode) TypeError: Cannot cast array data from dtype('float64') to dtype(' dataset = dataset.groupby('grid_cells').apply(function_to_be_applied) File 
""C:\Anaconda3\lib\site-packages\xarray\core\groupby.py"", line 567, in apply combined = self._concat(applied) File ""C:\Anaconda3\lib\site-packages\xarray\core\groupby.py"", line 572, in _concat applied_example, applied = peek_at(applied) File ""C:\Anaconda3\lib\site-packages\xarray\core\utils.py"", line 90, in peek_at peek = next(gen) File ""C:\Anaconda3\lib\site-packages\xarray\core\groupby.py"", line 566, in applied = (func(ds, **kwargs) for ds in self._iter_grouped()) File ""C:\home\git\indices\src\main\python\indices\spi_gamma_xarray.py"", line 27, in function_to_be_applied valid_max) File ""C:\Anaconda3\lib\site-packages\numpy\core\numeric.py"", line 1005, in convolve return multiarray.correlate(a, v[::-1], mode) TypeError: Cannot cast array data from dtype('float64') to dtype(' I want to be able to run a scikit-learn model over a bunch of variables in > a 3D (lat/lon/time) dataset, and return values for each coordinate point. > Is something like this multi-dimensional groupby required (I'm thinking > groupby(lat, lon) => 2D matrices that can be fed straight into > scikit-learn), or is there already some other mechanism that could achieve > something like this? Or is the best way at the moment just to create a null > dataset, and loop over lat/lon and fill in the blanks as you go? > > — > You are receiving this because you are subscribed to this thread. > Reply to this email directly or view it on GitHub > https://github.com/pydata/xarray/pull/818#issuecomment-218372591 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,146182176