
issue_comments


21 rows where author_association = "NONE" and user = 17701232 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
316381473 https://github.com/pydata/xarray/issues/1483#issuecomment-316381473 https://api.github.com/repos/pydata/xarray/issues/1483 MDEyOklzc3VlQ29tbWVudDMxNjM4MTQ3Mw== byersiiasa 17701232 2017-07-19T13:12:03Z 2017-07-19T13:12:03Z NONE

@darothen yes, you are right - this is definitely not a good way to apply a mean. I was just using mean as a (poor) example, trying not to over-complicate or distract from the issue. But, as you suggest, this is what I do when I need to apply customised functions, e.g. from scipy, which can end up being slow.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Loss of coordinate information from groupby.apply() on a stacked object 244016361
316363418 https://github.com/pydata/xarray/issues/1483#issuecomment-316363418 https://api.github.com/repos/pydata/xarray/issues/1483 MDEyOklzc3VlQ29tbWVudDMxNjM2MzQxOA== byersiiasa 17701232 2017-07-19T12:00:42Z 2017-07-19T12:00:42Z NONE

Maybe not an issue for others, or I am missing something... Or perhaps this is intended behaviour? Thanks for clarification!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Loss of coordinate information from groupby.apply() on a stacked object 244016361
315782686 https://github.com/pydata/xarray/issues/1480#issuecomment-315782686 https://api.github.com/repos/pydata/xarray/issues/1480 MDEyOklzc3VlQ29tbWVudDMxNTc4MjY4Ng== byersiiasa 17701232 2017-07-17T15:04:56Z 2017-07-17T15:04:56Z NONE

As far as I know, this is the intended functionality.

The examples given in the documentation seem to have a different behaviour. That is, the timestamps are retained and the first date of each month is used.

I cannot find where this is the case, apart from when using .resample. Could you put a link to the doc page?

I cannot find where this is the case, apart from when using .resample. Could you put a link to the doc page?

The issue is perhaps more with the example that you present (of only 1 year data) and expected behaviour. Normally groupby('time.month') would be applied to multiple years of data. i.e. group data by month and find the monthly averages for Jan-Dec for 30 years of data, e.g. a climatology.

And so in this case it absolutely makes sense to keep the months as 1 to 12, or something similar (perhaps 'Jan', 'Feb', etc.). Applying a datestring of the first day of the month wouldn't make sense, because which year would you choose when you have 30 years of data?

If you do want a time series of monthly means, then .resample is the function you want and it will give you the datestamps in the format that you desire.
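
[Editor's note: a minimal sketch of the distinction described above; the data and variable names are invented for illustration.]

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two full years of daily data, so each month group pools values across years.
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
da = xr.DataArray(np.arange(len(time), dtype=float),
                  coords={"time": time}, dims="time")

# groupby('time.month') collapses both years into 12 climatological groups;
# the resulting coordinate is just the integers 1..12.
clim = da.groupby("time.month").mean()

# resample keeps a time axis: one timestamp per calendar month.
monthly = da.resample(time="MS").mean()
```

Here clim has a 'month' coordinate of length 12, while monthly keeps a 'time' coordinate with 24 monthly timestamps.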

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Time Dimension, Big problem with methods 'groupby' and 'to_netcdf' 243270042
303025395 https://github.com/pydata/xarray/pull/1070#issuecomment-303025395 https://api.github.com/repos/pydata/xarray/issues/1070 MDEyOklzc3VlQ29tbWVudDMwMzAyNTM5NQ== byersiiasa 17701232 2017-05-22T07:50:55Z 2017-05-22T07:50:55Z NONE

@gidden you might be interested in this

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/rasterio 186326698
286382540 https://github.com/pydata/xarray/issues/1306#issuecomment-286382540 https://api.github.com/repos/pydata/xarray/issues/1306 MDEyOklzc3VlQ29tbWVudDI4NjM4MjU0MA== byersiiasa 17701232 2017-03-14T10:34:53Z 2017-03-14T10:34:53Z NONE

Just to add another complication: previously the package was xray, and the shorthand typically used was (and still is) xr, e.g. xr.open_dataset(). Have there been any thoughts or discussions on whether xa would be more fitting?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray vs Xarray vs XArray 213426608
286381505 https://github.com/pydata/xarray/issues/1026#issuecomment-286381505 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjM4MTUwNQ== byersiiasa 17701232 2017-03-14T10:30:24Z 2017-03-14T10:30:24Z NONE

Thanks - this is working well.

Reverting back to xarray 0.8.2 and dask 0.10.1 seems to be a combination that worked well for this particular task using delayed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286171415 https://github.com/pydata/xarray/issues/1026#issuecomment-286171415 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjE3MTQxNQ== byersiiasa 17701232 2017-03-13T16:58:06Z 2017-03-13T16:58:06Z NONE

@shoyer No chunking, as the dataset was quite small (360x720x30). Also, the calculation is along the time dimension, so that dimension effectively disappears for each lat/lon. Hence my initial surprise that it was coming up with this chunk/reshape issue, since I thought all it had to do was unstack 'allpoints'.

If I print one of the dask arrays from within the function:

print sT
dask.array<from-va..., shape=(11L,), dtype=float64, chunksize=(11L,)>

This is 11L because the calculation returns 11 values per point to an xr.Dataset.

Others have no chunks because they are single values (for each point):

print p_value
dask.array<from-va..., shape=(), dtype=float64, chunksize=()>

The object returned (xr.Dataset) from the .apply function comes out with chunks:

mle.chunks
Frozen(SortedKeysDict({'allpoints': (1, 1, 1, 1, 1, ...(allpoints)..., 1, 1), 'T': (11L,)}))

and looks like:

<xarray.Dataset>
Dimensions:            (T: 11, allpoints: 259200)
Coordinates:
  * T                  (T) int32 1 5 10 15 20 25 30 40 50 75 100
  * allpoints          (allpoints) MultiIndex
  - allpoints_level_0  (allpoints) float64 40.25 40.25 40.25 40.25 40.25 ...
  - allpoints_level_1  (allpoints) float64 22.75 23.25 23.75 24.25 24.75 ...
Data variables:
    xi                 (allpoints) float64 -0.6906 -0.6906 -0.6906 -0.6906 ...
    mu                 (allpoints) float64 9.969e+36 9.969e+36 9.969e+36 ...
    sT                 (allpoints, T) float64 9.969e+36 9.969e+36 9.969e+36 ...
    KS_p_value         (allpoints) float64 3.8e-12 3.8e-12 3.8e-12 3.8e-12 ...
    sigma              (allpoints) float64 5.297e-24 5.297e-24 5.297e-24 ...
    KS_statistic       (allpoints) float64 0.6321 0.6321 0.6321 0.6321 ...
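
[Editor's note: the stack/apply/unstack pattern under discussion can be sketched on a toy grid. np.mean stands in for the custom per-point function (combogev in this thread), and in current xarray GroupBy.map plays the role of the older .apply.]

```python
import numpy as np
import xarray as xr

# Toy (time, lat, lon) field standing in for the real dataset.
da = xr.DataArray(
    np.arange(24.0).reshape(4, 3, 2),
    dims=("time", "lat", "lon"),
    coords={"lat": [10.0, 20.0, 30.0], "lon": [100.0, 110.0]},
)

# Collapse lat/lon into one 'allpoints' dimension, reduce along time for
# each point, then restore the 2-D grid.
stacked = da.stack(allpoints=("lat", "lon"))
applied = stacked.groupby("allpoints").map(lambda pt: pt.mean("time"))
unstacked = applied.unstack("allpoints")
```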

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286152988 https://github.com/pydata/xarray/issues/1026#issuecomment-286152988 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjE1Mjk4OA== byersiiasa 17701232 2017-03-13T16:00:39Z 2017-03-13T16:00:39Z NONE

So, not sure if this is helpful but I'll leave these notes here just in case.

  • 0.11.0 - similar problem to @rabernat above
  • 0.10.1 - seems to work fine for what I wanted (delayed)
  • 0.9.0 - appeared to work OK, but actually I'm not convinced it was parallelising the tasks, and it also resulted in massive memory issues
  • 0.14.0 - another problem; can't remember what, but an issue to do with delayed, I think.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286144002 https://github.com/pydata/xarray/issues/1026#issuecomment-286144002 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjE0NDAwMg== byersiiasa 17701232 2017-03-13T15:33:25Z 2017-03-13T15:33:25Z NONE

I have been re-running that script you helped me with in Google groups: https://groups.google.com/forum/#!searchin/xarray/combogev%7Csort:relevance/xarray/nfNh40Zt3sU/WfhavtXgCAAJ

Do you mean the delayed object from within the function? Perhaps

<bound method Array.visualize of dask.array<from-va..., shape=(11L,), dtype=float64, chunksize=(11L,)>>

or perhaps

Delayed('fit-3767d9ad6cfa517555b5800b3b5f4e41')

I am going to keep trying with different versions of dask, since this 0.9.0 doesn't seem to behave as it did previously.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
286062113 https://github.com/pydata/xarray/issues/1026#issuecomment-286062113 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NjA2MjExMw== byersiiasa 17701232 2017-03-13T09:57:04Z 2017-03-13T09:57:04Z NONE

<xarray.DataArray 'dis' (time: 30, allpoints: 259200)>
array([[ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36],
       [ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36],
       [ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36],
       ...,
       [ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36],
       [ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36],
       [ 9.969210e+36, 9.969210e+36, 9.969210e+36, ..., 9.969210e+36, 9.969210e+36, 9.969210e+36]])
Coordinates:
  * time       (time) datetime64[ns] 1971-01-01 1972-01-01 1973-01-01 ...
  * allpoints  (allpoints) MultiIndex
  - lon        (allpoints) float64 -179.8 -179.8 -179.8 -179.8 -179.8 -179.8 ...
  - lat        (allpoints) float64 89.75 89.25 88.75 88.25 87.75 87.25 86.75 ...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
285851059 https://github.com/pydata/xarray/issues/1026#issuecomment-285851059 https://api.github.com/repos/pydata/xarray/issues/1026 MDEyOklzc3VlQ29tbWVudDI4NTg1MTA1OQ== byersiiasa 17701232 2017-03-11T07:51:57Z 2017-03-12T14:53:35Z NONE

Hi @rabernat and @shoyer. I have come across the same issue while re-running some old code, now using xarray 0.9.1 / dask 0.11.0. Was there any workaround or solution?

The issue occurs for me when trying to unstack 'allpoints', e.g.

mle = stacked.dis.groupby('allpoints').apply(combogev)
dsmle = mle.unstack('allpoints')

Thanks

Also works with dask 0.9.0

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multidim groupby on dask arrays: dask.array.reshape error 180516114
285925097 https://github.com/pydata/xarray/issues/1282#issuecomment-285925097 https://api.github.com/repos/pydata/xarray/issues/1282 MDEyOklzc3VlQ29tbWVudDI4NTkyNTA5Nw== byersiiasa 17701232 2017-03-12T06:18:37Z 2017-03-12T06:18:37Z NONE

I agree as I was in this situation of jumping straight into xarray (and Python) having never used pandas. As for other key points that could be emphasised:

  • The concept of label-based indexing was new to me and may be something you want to add more emphasis on in the Page 1 description? (I see it is already nicely explained in the paper in reference to np.ndarrays.)
  • The automatic plotting with Matplotlib is super
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  description of xarray assumes knowledge of pandas 209561985
251238760 https://github.com/pydata/xarray/issues/1019#issuecomment-251238760 https://api.github.com/repos/pydata/xarray/issues/1019 MDEyOklzc3VlQ29tbWVudDI1MTIzODc2MA== byersiiasa 17701232 2016-10-03T21:54:22Z 2016-10-03T21:54:38Z NONE

@rabernat @shoyer thank you very much - (at least for my purposes) this appears to be working well.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby_bins: exclude bin or assign bin with nan when bin has no values 179969119
250773817 https://github.com/pydata/xarray/issues/1019#issuecomment-250773817 https://api.github.com/repos/pydata/xarray/issues/1019 MDEyOklzc3VlQ29tbWVudDI1MDc3MzgxNw== byersiiasa 17701232 2016-09-30T15:24:31Z 2016-09-30T15:24:31Z NONE

Thanks @shoyer and @rabernat . @gidden and I may have a go next week. Otherwise if someone wants to jump in, I made a notebook to test/demonstrate the issue. groupby_bins_test_nb.zip

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby_bins: exclude bin or assign bin with nan when bin has no values 179969119
250496630 https://github.com/pydata/xarray/issues/1019#issuecomment-250496630 https://api.github.com/repos/pydata/xarray/issues/1019 MDEyOklzc3VlQ29tbWVudDI1MDQ5NjYzMA== byersiiasa 17701232 2016-09-29T15:15:44Z 2016-09-29T15:15:44Z NONE

0.8.2 updated from conda a few days ago. I'll try the master. Thanks

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby_bins: exclude bin or assign bin with nan when bin has no values 179969119
250492101 https://github.com/pydata/xarray/issues/1019#issuecomment-250492101 https://api.github.com/repos/pydata/xarray/issues/1019 MDEyOklzc3VlQ29tbWVudDI1MDQ5MjEwMQ== byersiiasa 17701232 2016-09-29T15:00:02Z 2016-09-29T15:00:02Z NONE

@rabernat I don't have much capability to help, but if any changes are made I am happy to help test this particular case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby_bins: exclude bin or assign bin with nan when bin has no values 179969119
250486102 https://github.com/pydata/xarray/issues/1019#issuecomment-250486102 https://api.github.com/repos/pydata/xarray/issues/1019 MDEyOklzc3VlQ29tbWVudDI1MDQ4NjEwMg== byersiiasa 17701232 2016-09-29T14:40:43Z 2016-09-29T14:40:43Z NONE

So if I plot the current output as a bar chart/histogram, that bin interval will be skipped. For example if I did:

plt.plot(binns[0:-1], binned)  # using the left edges of the bins

I would get an error if a bin present in binns has been skipped in binned.

I guess that perhaps there is a cleverer way of plotting the output data than this.

This leads to more important questions:

1. Do you know the logic behind the ordering of the binned data and the bin objects? In this example, the bins input is monotonically increasing, but the bin object does not correspond, e.g.

binns = [-100, -50, 0, 50, 50.00001, 100]
array(['(0, 50]', '(-50, 0]', '(51, 100]', '(-100, -50]'], dtype=object)

2. Does the order of output values in the summed array (binned) correspond to the input bins or the output bin object? If the latter, how do I reorder the data more in line with the monotonically increasing input bins array?
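
[Editor's note: one way to sidestep the interval-string ordering question is to pass explicit labels to groupby_bins, so the output coordinate is numeric and follows the input edge order. A sketch with invented data; empty bins are still dropped from the output, which is the behaviour this issue is about.]

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.array([-80.0, -20.0, 10.0, 60.0]), dims="x", name="val")

edges = [-100, -50, 0, 50, 100]
# Label each bin by its left edge instead of the default '(a, b]' strings.
binned = da.groupby_bins(da, edges, labels=edges[:-1]).sum()
```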

Thanks

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby_bins: exclude bin or assign bin with nan when bin has no values 179969119
226420773 https://github.com/pydata/xarray/issues/681#issuecomment-226420773 https://api.github.com/repos/pydata/xarray/issues/681 MDEyOklzc3VlQ29tbWVudDIyNjQyMDc3Mw== byersiiasa 17701232 2016-06-16T08:27:33Z 2016-06-16T08:27:33Z NONE

Turns out to be 1.2.2! I am using the Anaconda installation and for some reason it is not updating to the latest version. @shoyer Many thanks for your ongoing support on xarray btw; it works well and has been fairly painless to use.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf on Python 3: "string" qualifier on attributes  122776511
226162435 https://github.com/pydata/xarray/issues/681#issuecomment-226162435 https://api.github.com/repos/pydata/xarray/issues/681 MDEyOklzc3VlQ29tbWVudDIyNjE2MjQzNQ== byersiiasa 17701232 2016-06-15T11:38:04Z 2016-06-15T11:38:04Z NONE

Is this still possibly an issue? We have been writing out netCDFs using xarray but having trouble opening them in other software, e.g. ArcGIS, when using NETCDF4 format. However, it works fine when using NETCDF4_CLASSIC.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf on Python 3: "string" qualifier on attributes  122776511
220542297 https://github.com/pydata/xarray/issues/851#issuecomment-220542297 https://api.github.com/repos/pydata/xarray/issues/851 MDEyOklzc3VlQ29tbWVudDIyMDU0MjI5Nw== byersiiasa 17701232 2016-05-20T08:08:44Z 2016-05-20T08:08:44Z NONE

Thank you Stephan - very useful.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.concat and xr.to_netcdf new filesize 155741762
220337889 https://github.com/pydata/xarray/issues/851#issuecomment-220337889 https://api.github.com/repos/pydata/xarray/issues/851 MDEyOklzc3VlQ29tbWVudDIyMDMzNzg4OQ== byersiiasa 17701232 2016-05-19T14:17:49Z 2016-05-19T14:17:49Z NONE

Not sure! I don't have NCO, but I will try.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.concat and xr.to_netcdf new filesize 155741762

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 644.524ms · About: xarray-datasette