issue_comments


13 rows where author_association = "NONE" and user = 1328158 sorted by updated_at descending


issue 6

  • Broadcast error when dataset is recombined after a stack/groupby/apply/unstack sequence 4
  • Error when applying a function with apply_ufunc() when using a function that returns multiple arrays 3
  • Parallel map/apply powered by dask.array 2
  • Tremendous slowdown when using dask integration 2
  • Multidimensional groupby 1
  • NameError: name '_DType_co' is not defined 1

user 1

  • monocongo · 13

author_association 1

  • NONE · 13
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
885927432 https://github.com/pydata/xarray/issues/5631#issuecomment-885927432 https://api.github.com/repos/pydata/xarray/issues/5631 IC_kwDOAMm_X840zi4I monocongo 1328158 2021-07-23T21:39:54Z 2021-07-23T21:39:54Z NONE

Thanks to all for your help. Installing typing-extensions did solve the problem; thanks for the heads up, @rhkleijn.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  NameError: name '_DType_co' is not defined 951644054
433789634 https://github.com/pydata/xarray/issues/2507#issuecomment-433789634 https://api.github.com/repos/pydata/xarray/issues/2507 MDEyOklzc3VlQ29tbWVudDQzMzc4OTYzNA== monocongo 1328158 2018-10-29T05:07:06Z 2018-10-29T05:07:06Z NONE

You're a wizard, Stephan. That was my bug. I really appreciate your help!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when applying a function with apply_ufunc() when using a function that returns multiple arrays 373646673
433768856 https://github.com/pydata/xarray/issues/2507#issuecomment-433768856 https://api.github.com/repos/pydata/xarray/issues/2507 MDEyOklzc3VlQ29tbWVudDQzMzc2ODg1Ng== monocongo 1328158 2018-10-29T02:20:30Z 2018-10-29T02:20:30Z NONE

Any guidance as to where I should start when looking into this further?

At this point, all I've been able to surmise is that the arrays returned by the applied function are present, but as a list of arrays rather than as a tuple. That's where things go wonky in computation.py, which checks for a tuple instance. Is xarray responsible for packing the arrays into a tuple upon function completion, and if so, where should I look into that?
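For illustration, a minimal sketch of a multi-output function used with apply_ufunc (the function and names here are hypothetical); returning a list instead of a tuple from such a function is exactly what trips the tuple check described above:

```
import numpy as np
import xarray as xr

def demean_and_mean(values):
    # apply_ufunc moves core dims to the end, so "time" is the last axis;
    # multiple outputs must come back as a *tuple* of arrays, not a list
    mean = values.mean(axis=-1, keepdims=True)
    return values - mean, np.broadcast_to(mean, values.shape)

da = xr.DataArray(np.random.rand(5, 10), dims=["cell", "time"])

anomalies, means = xr.apply_ufunc(
    demean_and_mean,
    da,
    input_core_dims=[["time"]],
    output_core_dims=[["time"], ["time"]],  # one entry per returned array
)
```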

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when applying a function with apply_ufunc() when using a function that returns multiple arrays 373646673
433458618 https://github.com/pydata/xarray/issues/2507#issuecomment-433458618 https://api.github.com/repos/pydata/xarray/issues/2507 MDEyOklzc3VlQ29tbWVudDQzMzQ1ODYxOA== monocongo 1328158 2018-10-26T16:03:26Z 2018-10-26T16:03:26Z NONE

Thanks, Stephan. I don't think this is related to numba, as I'm running this with the environment variable NUMBA_DISABLE_JIT=1 (I do this when debugging my code, since numba's JIT compilation otherwise prevents stepping into and inspecting JIT-annotated code). In any event, I can comment out the @numba.jit annotations and report any discrepancies.
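A minimal sketch of that debugging setup (the decorated function is hypothetical); setting the variable before importing numba is the safe order:

```
import os

# disable JIT compilation before numba is imported
os.environ["NUMBA_DISABLE_JIT"] = "1"

import numba

@numba.jit
def add(a, b):
    return a + b

# with the JIT disabled this executes as plain Python, so a debugger
# can step into the function body and inspect its locals
print(add(2, 3))  # 5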

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when applying a function with apply_ufunc() when using a function that returns multiple arrays 373646673
432846749 https://github.com/pydata/xarray/issues/2499#issuecomment-432846749 https://api.github.com/repos/pydata/xarray/issues/2499 MDEyOklzc3VlQ29tbWVudDQzMjg0Njc0OQ== monocongo 1328158 2018-10-24T22:14:08Z 2018-10-24T22:14:08Z NONE

I have had some success using apply_ufunc in tandem with multiprocessing. Apparently, I can't (seamlessly) use dask arrays in place of numpy arrays within the functions where I am performing my computations, as it's not possible to assign values into dask arrays using integer indexing.
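A minimal sketch of the limitation, assuming dask versions contemporary with this thread (newer dask releases support some slice assignment); building a new graph with da.where is one functional workaround:

```
import numpy as np
import dask.array as da

arr = np.zeros((4, 4))
arr[2, 3] = 1.0          # numpy: in-place item assignment works

lazy = da.zeros((4, 4), chunks=(2, 2))
try:
    lazy[2, 3] = 1.0     # dask at the time of this thread: not supported
except (NotImplementedError, TypeError):
    # construct a new array instead of mutating the old one
    lazy = da.where(
        (da.arange(4)[:, None] == 2) & (da.arange(4)[None, :] == 3),
        1.0,
        lazy,
    )
```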

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Tremendous slowdown when using dask integration 372244156
431684522 https://github.com/pydata/xarray/issues/2499#issuecomment-431684522 https://api.github.com/repos/pydata/xarray/issues/2499 MDEyOklzc3VlQ29tbWVudDQzMTY4NDUyMg== monocongo 1328158 2018-10-21T16:49:35Z 2018-10-21T19:43:27Z NONE

Thanks, Zac.

I have used various options with the chunks argument, e.g. chunks={'lat': 10, 'lon': 10}, all of which appear to have a similar effect. Maybe I just haven't yet hit upon the sweet spot chunk sizes?

Is there a rule-of-thumb approach to determining the chunk sizes for a dataset? Perhaps before setting the chunk sizes I could open the dataset to poll the dimensions of the variables and based on that come up with reasonable chunk sizes, or none at all if the dataset is reasonably small?

My computations typically use a full time series per lat/lon point, so my assumption has been that I don't want to use chunking on the time dimension -- is this correct?

I have been testing this code using two versions of a precipitation dataset, the full resolution is (time=1481, lat=596, lon=1385) and the low-resolution version (for faster tests) is (time=1466, lat=38, lon=87). Results of ncdump and repr(xr.open_dataset(netcdf_precip)) are below.

```
$ ncdump -h nclimgrid_prcp.nc
netcdf nclimgrid_prcp {
dimensions:
    time = UNLIMITED ; // (1481 currently)
    lat = 596 ;
    lon = 1385 ;
variables:
    int time(time) ;
        time:long_name = "Time, in monthly increments" ;
        time:standard_name = "time" ;
        time:calendar = "gregorian" ;
        time:units = "days since 1800-01-01 00:00:00" ;
        time:axis = "T" ;
    float lat(lat) ;
        lat:standard_name = "latitude" ;
        lat:long_name = "Latitude" ;
        lat:units = "degrees_north" ;
        lat:axis = "Y" ;
        lat:valid_min = 24.56253f ;
        lat:valid_max = 49.3542f ;
    float lon(lon) ;
        lon:standard_name = "longitude" ;
        lon:long_name = "Longitude" ;
        lon:units = "degrees_east" ;
        lon:axis = "X" ;
        lon:valid_min = -124.6875f ;
        lon:valid_max = -67.02084f ;
    float prcp(time, lat, lon) ;
        prcp:_FillValue = NaNf ;
        prcp:least_significant_digit = 3LL ;
        prcp:valid_min = 0.f ;
        prcp:coordinates = "time lat lon" ;
        prcp:long_name = "Precipitation, monthly total" ;
        prcp:standard_name = "precipitation_amount" ;
        prcp:references = "GHCN-Monthly Version 3 (Vose et al. 2011), NCEI/NOAA, https://www.ncdc.noaa.gov/ghcnm/v3.php" ;
        prcp:units = "millimeter" ;
        prcp:valid_max = 2000.f ;

// global attributes:
        :date_created = "2018-02-15 10:29:25.485927" ;
        :date_modified = "2018-02-15 10:29:25.486042" ;
        :Conventions = "CF-1.6, ACDD-1.3" ;
        :ncei_template_version = "NCEI_NetCDF_Grid_Template_v2.0" ;
        :title = "nClimGrid" ;
        :naming_authority = "gov.noaa.ncei" ;
        :standard_name_vocabulary = "Standard Name Table v35" ;
        :institution = "National Centers for Environmental Information (NCEI), NOAA, Department of Commerce" ;
        :geospatial_lat_min = 24.56253f ;
        :geospatial_lat_max = 49.3542f ;
        :geospatial_lon_min = -124.6875f ;
        :geospatial_lon_max = -67.02084f ;
        :geospatial_lat_units = "degrees_north" ;
        :geospatial_lon_units = "degrees_east" ;
}

/* repr(ds) below: */
<xarray.Dataset>
Dimensions:  (lat: 596, lon: 1385, time: 1481)
Coordinates:
  * time     (time) datetime64[ns] 1895-01-01 1895-02-01 ... 2018-05-01
  * lat      (lat) float32 49.3542 49.312534 49.270866 ... 24.6042 24.562532
  * lon      (lon) float32 -124.6875 -124.645836 ... -67.0625 -67.020836
Data variables:
    prcp     (time, lat, lon) float32 ...
Attributes:
    date_created:              2018-02-15 10:29:25.485927
    date_modified:             2018-02-15 10:29:25.486042
    Conventions:               CF-1.6, ACDD-1.3
    ncei_template_version:     NCEI_NetCDF_Grid_Template_v2.0
    title:                     nClimGrid
    naming_authority:          gov.noaa.ncei
    standard_name_vocabulary:  Standard Name Table v35
    institution:               National Centers for Environmental Information...
    geospatial_lat_min:        24.562532
    geospatial_lat_max:        49.3542
    geospatial_lon_min:        -124.6875
    geospatial_lon_max:        -67.020836
    geospatial_lat_units:      degrees_north
    geospatial_lon_units:      degrees_east
```
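A minimal sketch of the chunking heuristic discussed above, with guessed chunk sizes: since the applied function needs a full time series per grid cell, leave "time" in a single chunk and split only the spatial dimensions.

```
import xarray as xr

# dimensions absent from the chunks dict stay in one chunk, so "time"
# remains whole while lat/lon are split into ~100-wide blocks
ds = xr.open_dataset("nclimgrid_prcp.nc", chunks={"lat": 100, "lon": 100})
print(ds["prcp"].chunks)  # time: one chunk of 1481; lat/lon: blocks of 100
```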

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Tremendous slowdown when using dask integration 372244156
249059201 https://github.com/pydata/xarray/issues/585#issuecomment-249059201 https://api.github.com/repos/pydata/xarray/issues/585 MDEyOklzc3VlQ29tbWVudDI0OTA1OTIwMQ== monocongo 1328158 2016-09-22T23:39:41Z 2017-03-07T05:32:04Z NONE

This is good news for me, as the functions I will apply take an ndarray as input and return a corresponding ndarray as output. Once this is available in xarray I'll be eager to give it a whirl...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel map/apply powered by dask.array 107424151
248969870 https://github.com/pydata/xarray/issues/585#issuecomment-248969870 https://api.github.com/repos/pydata/xarray/issues/585 MDEyOklzc3VlQ29tbWVudDI0ODk2OTg3MA== monocongo 1328158 2016-09-22T17:23:22Z 2016-09-22T17:23:22Z NONE

I'm adding this note to express interest in the functionality described in Stephan's original description, i.e. a parallel_apply method/function that would apply a function in parallel across multiple CPUs. I have (finally) worked out how to use groupby and apply for my application, but it would be much more useful if I could apply functions in parallel to take advantage of multiple CPUs. What's the expected effort to make something like this available in xarray? Several months ago I worked on this sort of thing without xarray, using the multiprocessing module and a shared-memory object, and I may revisit that soon. However, I expect that a solution using xarray will be more elegant, so if such a thing is coming in the foreseeable future I may wait for it and focus on other tasks. Can anyone advise?
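xarray did later grow this capability via apply_ufunc's dask support; a minimal sketch under a current xarray/dask stack (the per-cell function is hypothetical):

```
import numpy as np
import xarray as xr

def spi_like_index(ts):
    # stand-in for a per-cell time-series computation
    return (ts - ts.mean()) / ts.std()

da = xr.DataArray(
    np.random.rand(120, 50, 60), dims=["time", "lat", "lon"]
).chunk({"lat": 10, "lon": 10})   # core dim "time" stays unchunked

result = xr.apply_ufunc(
    spi_like_index,
    da,
    input_core_dims=[["time"]],
    output_core_dims=[["time"]],
    vectorize=True,
    dask="parallelized",          # each chunk becomes a parallel task
    output_dtypes=[da.dtype],
)
result.compute()                  # dask schedules chunk-wise work across cores
```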

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Parallel map/apply powered by dask.array 107424151
248409634 https://github.com/pydata/xarray/issues/873#issuecomment-248409634 https://api.github.com/repos/pydata/xarray/issues/873 MDEyOklzc3VlQ29tbWVudDI0ODQwOTYzNA== monocongo 1328158 2016-09-20T19:37:07Z 2016-09-20T19:37:07Z NONE

Thanks for this clarification, Stephan. Apparently I didn't read the API documentation closely enough, as I was assuming that the function is applied to the underlying ndarray rather than to all data variables of a Dataset object. Now that I've taken the approach you suggested I'm cooking with gas, and it's very encouraging. I really appreciate your help.

--James

On Tue, Sep 20, 2016 at 11:54 AM, Stephan Hoyer notifications@github.com wrote:

GroupBy is working as intended here. ds.groupby('something').apply(func) calls func on objects of the same type as ds. If you group a Dataset, then func is applied to a Dataset each time, too.

You can certainly still use np.convolve, but you'll need to manually apply it to numpy arrays extracted from a Dataset and then rebuild another Dataset or DataArray.

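A minimal sketch of the approach Stephan describes, with made-up sizes and kernel: extract the numpy array, convolve along the time axis, and rebuild a DataArray.

```
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(120, 6, 8), dims=["time", "lat", "lon"])

kernel = np.ones(3) / 3.0  # simple 3-step moving average
# apply np.convolve along axis 0 (time) of the extracted numpy array
smoothed = np.apply_along_axis(
    lambda ts: np.convolve(ts, kernel, mode="same"), 0, da.values
)
# rebuild a DataArray with the original dims and coordinates
result = xr.DataArray(smoothed, coords=da.coords, dims=da.dims)
```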

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Broadcast error when dataset is recombined after a stack/groupby/apply/unstack sequence 158958801
248216388 https://github.com/pydata/xarray/issues/873#issuecomment-248216388 https://api.github.com/repos/pydata/xarray/issues/873 MDEyOklzc3VlQ29tbWVudDI0ODIxNjM4OA== monocongo 1328158 2016-09-20T06:42:53Z 2016-09-20T06:42:53Z NONE

Thanks, Stephan. My code uses numpy.convolve() in several key places, so if that function is a deal breaker for using xarray then I'll hold off until that's fixed. In the meantime if there's anything else I can do to help you work this out then please let me know.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Broadcast error when dataset is recombined after a stack/groupby/apply/unstack sequence 158958801
242535724 https://github.com/pydata/xarray/issues/873#issuecomment-242535724 https://api.github.com/repos/pydata/xarray/issues/873 MDEyOklzc3VlQ29tbWVudDI0MjUzNTcyNA== monocongo 1328158 2016-08-25T20:48:45Z 2016-08-25T20:48:45Z NONE

Thanks, Stephan. In general, things appear to be working much more as expected now; probably (hopefully) this is just an edge case/nuance that won't be too difficult for you guys to address. If so, and if I don't run across any other issues, my code will be dramatically simplified by leveraging xarray rather than writing my own shared-memory-object code for the multiprocessing side of things (my assumption being that you guys have done a better job of that than I can).

A gist with example code and a smallish data file attached to the comment is here: https://gist.github.com/monocongo/e8e883c2355f7a92bb0b9d24db5407a8

Please let me know if I can do anything else to help you help me. Godspeed!

--James

On Tue, Aug 23, 2016 at 12:42 AM, Stephan Hoyer notifications@github.com wrote:

Could you please share a data file and/or code which I can run to reproduce each of these issues?


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Broadcast error when dataset is recombined after a stack/groupby/apply/unstack sequence 158958801
241540585 https://github.com/pydata/xarray/issues/873#issuecomment-241540585 https://api.github.com/repos/pydata/xarray/issues/873 MDEyOklzc3VlQ29tbWVudDI0MTU0MDU4NQ== monocongo 1328158 2016-08-22T20:32:20Z 2016-08-22T20:32:20Z NONE

I get the following error now when I try to run the gist code referenced in the original message above:

```
$ python -u xarray_gist.py /dev/shm/nclimgrid_prcp_reduced.nc nclimgrid_prcp_doubled.nc
Traceback (most recent call last):
  File "xarray_gist.py", line 45, in <module>
    encoding = {variable_name: {'FillValue': np.nan, 'dtype': 'float32'}})
  File "/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/core/dataset.py", line 782, in to_netcdf
    engine=engine, encoding=encoding)
  File "/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/api.py", line 354, in to_netcdf
    dataset.dump_to_store(store, sync=sync, encoding=encoding)
  File "/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/core/dataset.py", line 728, in dump_to_store
    store.store(variables, attrs, check_encoding)
  File "/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/common.py", line 234, in store
    check_encoding_set)
  File "/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/common.py", line 209, in store
    self.set_variables(variables, check_encoding_set)
  File "/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/common.py", line 219, in set_variables
    target, source = self.prepare_variable(name, v, check)
  File "/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/netCDF4.py", line 266, in prepare_variable
    raise_on_invalid=check_encoding)
  File "/home/james.adams/anaconda3/lib/python3.5/site-packages/xarray/backends/netCDF4_.py", line 167, in _extract_nc4_encoding
    ' %r' % (backend, invalid))
ValueError: unexpected encoding parameters for 'netCDF4' backend: ['dtype']
```
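A minimal sketch of an encoding dict this backend accepts (filenames and data are made up): the traceback above shows this xarray version rejecting the 'dtype' key, so casting the variable beforehand is one workaround, and the conventional fill-value key is '_FillValue'.

```
import numpy as np
import xarray as xr

ds = xr.Dataset({"prcp": ("time", np.random.rand(10))})
# cast up front instead of passing "dtype" through encoding
ds["prcp"] = ds["prcp"].astype("float32")
ds.to_netcdf("out.nc", encoding={"prcp": {"_FillValue": np.nan}})
```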

Additionally, I see the following error when I run some other code that uses the same dataset.groupby().apply() technique (the trouble appears to show up within numpy.convolve()):

Traceback (most recent call last):
  File "C:\home\git\indices\src\main\python\indices\spi_gamma_xarray.py", line 46, in <module>
    dataset = dataset.groupby('grid_cells').apply(function_to_be_applied)
  File "C:\Anaconda3\lib\site-packages\xarray\core\groupby.py", line 567, in apply
    combined = self._concat(applied)
  File "C:\Anaconda3\lib\site-packages\xarray\core\groupby.py", line 572, in _concat
    applied_example, applied = peek_at(applied)
  File "C:\Anaconda3\lib\site-packages\xarray\core\utils.py", line 90, in peek_at
    peek = next(gen)
  File "C:\Anaconda3\lib\site-packages\xarray\core\groupby.py", line 566, in <genexpr>
    applied = (func(ds, **kwargs) for ds in self._iter_grouped())
  File "C:\home\git\indices\src\main\python\indices\spi_gamma_xarray.py", line 27, in function_to_be_applied
    valid_max)
  File "C:\Anaconda3\lib\site-packages\numpy\core\numeric.py", line 1005, in convolve
    return multiarray.correlate(a, v[::-1], mode)
TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'

Please advise if I can provide any further information which might help work this out, or if I have made wrong assumptions as to how this feature should be used. Thanks.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Broadcast error when dataset is recombined after a stack/groupby/apply/unstack sequence 158958801
219231028 https://github.com/pydata/xarray/pull/818#issuecomment-219231028 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIxOTIzMTAyOA== monocongo 1328158 2016-05-14T16:56:37Z 2016-05-14T16:56:37Z NONE

I would also like to do what is described below but so far have had little success using xarray.

I have time series data (x years of monthly values) at each lat/lon point of a grid (x*12 times, lons, lats). I want to apply a function f() against the time series to return a corresponding time series of values, then write those values to an output NetCDF that corresponds to the input NetCDF in terms of dimensions and coordinate variables. So instead of looping over every lat and every lon, I want to apply f() in a vectorized manner, such as what's described for xarray's groupby (in order to gain the expected performance from using xarray for the split-apply-combine pattern), but it needs to work over more than the single dimension that groupby currently supports.

Has anyone done what is described above using xarray? What sort of performance gains can be expected using your approach?

Thanks in advance for any help with this topic. My apologies if there is a more appropriate forum for this sort of discussion (please redirect if so), as this may not be applicable to the original issue...

--James

On Wed, May 11, 2016 at 2:24 AM, naught101 notifications@github.com wrote:

I want to be able to run a scikit-learn model over a bunch of variables in a 3D (lat/lon/time) dataset, and return values for each coordinate point. Is something like this multi-dimensional groupby required (I'm thinking groupby(lat, lon) => 2D matrices that can be fed straight into scikit-learn), or is there already some other mechanism that could achieve something like this? Or is the best way at the moment just to create a null dataset, and loop over lat/lon and fill in the blanks as you go?

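A minimal sketch of the stack/groupby/apply/unstack pattern this thread converged on (the per-series function f() is hypothetical); in the xarray of that era the .map call below was spelled .apply.

```
import numpy as np
import xarray as xr

def f(ts):
    # stand-in per-time-series function: returns a series the same
    # shape as its 1-D input
    return (ts - ts.mean()) / ts.std()

da = xr.DataArray(np.random.rand(24, 3, 4), dims=["time", "lat", "lon"])

stacked = da.stack(point=("lat", "lon"))   # collapse lat/lon into one dim
# each group is a single cell's time series; unstack restores lat/lon
result = stacked.groupby("point").map(f).unstack("point")
```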

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
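A minimal sketch of reproducing this page's query against a local copy of the database (the filename "github.db" is an assumption):

```
import sqlite3

conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, updated_at, body
    FROM issue_comments
    WHERE author_association = 'NONE' AND [user] = 1328158
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # 13 rows, matching the count shown above
```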