issues

3 rows where comments = 13, repo = 13221727, and user = 5635139, sorted by updated_at descending


Issue #8233: numbagg & flox
id: 1913983402 · node_id: I_kwDOAMm_X85yFRGq · user: max-sixty (5635139) · state: closed · comments: 13
created_at: 2023-09-26T17:33:32Z · updated_at: 2023-10-15T07:48:56Z · closed_at: 2023-10-09T15:40:29Z · author_association: MEMBER

What is your issue?

I've been doing some work recently on our old friend numbagg, improving the ewm routines & adding some more.

I'm keen to get numbagg back in shape, doing the things that it does best, and trimming anything it doesn't. I notice that it has grouped calcs. Am I correct to think that flox does this better? I haven't been up with the latest. flox looks like it's particularly focused on dask arrays, whereas numpy_groupies, one of the inspirations for this, was applicable to numpy arrays too.

At least from the xarray perspective, are we OK to deprecate these numbagg functions, and direct folks to flox?
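For context, the kind of grouped calculation in question — reducing values by group label — can be sketched in plain numpy, in the bincount style that numpy_groupies (and later flox) build on. This is an illustrative sketch only, not numbagg's or flox's actual API:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
labels = np.array([0, 1, 0, 1, 2])  # group label per element

# Grouped sum and mean via bincount: one pass over the data,
# no Python-level loop over groups.
sums = np.bincount(labels, weights=values)   # per-group sums
counts = np.bincount(labels)                 # per-group sizes
means = sums / counts                        # per-group means
```

The libraries differ in how they scale this up (flox targets dask arrays in particular), but the core idea — a label array driving a vectorized reduction — is the same.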

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8233/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
Issue #2459: Stack + to_array before to_xarray is much faster than a simple to_xarray
id: 365973662 · node_id: MDU6SXNzdWUzNjU5NzM2NjI= · user: max-sixty (5635139) · state: closed · comments: 13
created_at: 2018-10-02T16:13:26Z · updated_at: 2020-07-02T20:39:01Z · closed_at: 2020-07-02T20:39:01Z · author_association: MEMBER

I was seeing some slow performance from to_xarray() on MultiIndexed series, and found that unstacking one of the dimensions before running to_xarray(), then restacking with to_array(), was ~30x faster. The time difference holds at larger data sizes.

To reproduce:

Create a series with a MultiIndex, ensuring the MultiIndex isn't a simple product:

```python
import numpy as np
import pandas as pd

s = pd.Series(
    np.random.rand(100000),
    index=pd.MultiIndex.from_product([
        list('abcdefhijk'),
        list('abcdefhijk'),
        pd.DatetimeIndex(start='2000-01-01', periods=1000, freq='B'),
    ]))

cropped = s[::3]
cropped.index = pd.MultiIndex.from_tuples(cropped.index, names=list('xyz'))

cropped.head()
```

```
x  y  z
a  a  2000-01-03    0.993989
      2000-01-06    0.850518
      2000-01-11    0.068944
      2000-01-14    0.237197
      2000-01-19    0.784254
dtype: float64
```

Two approaches for getting this into xarray.

1 - Simple .to_xarray():

```python
current_version = cropped.to_xarray()

current_version
```

```
<xarray.DataArray (x: 10, y: 10, z: 1000)>
array([[[0.993989,      nan, ...,      nan, 0.721663],
        [     nan,      nan, ..., 0.58224 ,      nan],
        ...,
        [     nan, 0.369382, ...,      nan,      nan],
        [0.98558 ,      nan, ...,      nan, 0.403732]],

       [[     nan,      nan, ..., 0.493711,      nan],
        [     nan, 0.126761, ...,      nan,      nan],
        ...,
        [0.976758,      nan, ...,      nan, 0.816612],
        [     nan,      nan, ..., 0.982128,      nan]],

       ...,

       [[     nan, 0.971525, ...,      nan,      nan],
        [0.146774,      nan, ...,      nan, 0.419806],
        ...,
        [     nan,      nan, ..., 0.700764,      nan],
        [     nan, 0.502058, ...,      nan,      nan]],

       [[0.246768,      nan, ...,      nan, 0.079266],
        [     nan,      nan, ..., 0.802297,      nan],
        ...,
        [     nan, 0.636698, ...,      nan,      nan],
        [0.025195,      nan, ...,      nan, 0.629305]]])
Coordinates:
  * x        (x) object 'a' 'b' 'c' 'd' 'e' 'f' 'h' 'i' 'j' 'k'
  * y        (y) object 'a' 'b' 'c' 'd' 'e' 'f' 'h' 'i' 'j' 'k'
  * z        (z) datetime64[ns] 2000-01-03 2000-01-04 ... 2003-10-30 2003-10-31
```

This takes 536 ms

2 - Unstack in pandas first, then use to_array to do the equivalent of a restack:

```python
proposed_version = (
    cropped
    .unstack('y')
    .to_xarray()
    .to_array('y')
)
```

This takes 17.3 ms

To confirm these are identical:

```python
proposed_version_adj = (
    proposed_version
    .assign_coords(y=proposed_version['y'].astype(object))
    .transpose(*current_version.dims)
)

proposed_version_adj.equals(current_version)
# True
```

Problem description

A default operation is much slower than a (potentially) equivalent operation that's not the default.

I need to look more at what's causing the issues. I think it's to do with the .reindex(full_idx), but I'm unclear why it's so much faster in the alternative route, and whether there's a fix that we can make to make the default path fast.
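To make that suspicion concrete, here is a tiny sketch (my reading of the slow path, not xarray's actual code) of what the default route effectively does: reindex the series onto the full cartesian product of the index levels, filling missing combinations with NaN, before reshaping into a dense array:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([list('ab'), [0, 1, 2]], names=['x', 'y'])
s = pd.Series(np.arange(6.0), index=idx)[::2]  # drop rows so it's not a full product

# Reindex onto the full product of the levels; missing combinations
# become NaN. On a large, sparse MultiIndex this reindex is the
# expensive step.
full_idx = pd.MultiIndex.from_product(s.index.levels, names=s.index.names)
dense = s.reindex(full_idx)
grid = dense.to_numpy().reshape(2, 3)  # the dense (x, y) array
```

The unstack-then-to_array route avoids building the full product index in one go, which may be why it is so much faster, but that's exactly the part that needs confirming.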

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.93-linuxkit-aufs
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.utf8
LOCALE: None.None

xarray: 0.10.9
pandas: 0.23.4
numpy: 1.15.2
scipy: 1.1.0
netCDF4: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.2.3
cartopy: 0.16.0
seaborn: 0.9.0
setuptools: 40.4.3
pip: 18.0
conda: None
pytest: 3.8.1
IPython: 5.8.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2459/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
Issue #645: Display of PeriodIndex
id: 115210260 · node_id: MDU6SXNzdWUxMTUyMTAyNjA= · user: max-sixty (5635139) · state: closed · comments: 13
created_at: 2015-11-05T05:01:22Z · updated_at: 2015-12-30T05:59:05Z · closed_at: 2015-12-30T05:59:05Z · author_association: MEMBER

Not the greatest issue, but: while coordinates given as PeriodIndexes are stored in that form, their integer representation is shown in the DataArray repr, which adds an extra step every time we want to see which dates we're dealing with.

Or correct me if I'm making some basic mistake.

``` python
In [23]: data_array = xray.DataArray(
    ...:     data=pd.Series(
    ...:         np.random.rand(20),
    ...:         index=pd.period_range(start='2000', periods=20, name='Date')))

In [23]: data_array
Out[23]:
<xray.DataArray (Date: 20)>
array([ 0.95861189,  0.3607297 ,  0.9890032 ,  0.77674314,  0.39461886,
        0.98425749,  0.79044973,  0.81376587,  0.07091318,  0.02757213,
        0.87366025,  0.0496346 ,  0.45433931,  0.3339866 ,  0.67261248,
        0.91684965,  0.60889737,  0.33469611,  0.94966724,  0.50328461])
Coordinates:
  * Date     (Date) int64 10957 10958 10959 10960 10961 10962 10963 10964 ...

In [25]: data_array.to_series()
Out[25]:
Date
2000-01-01    0.958612
2000-01-02    0.360730
2000-01-03    0.989003
2000-01-04    0.776743
2000-01-05    0.394619
2000-01-06    0.984257
2000-01-07    0.790450
2000-01-08    0.813766
2000-01-09    0.070913
2000-01-10    0.027572
2000-01-11    0.873660
2000-01-12    0.049635
2000-01-13    0.454339
2000-01-14    0.333987
2000-01-15    0.672612
2000-01-16    0.916850
2000-01-17    0.608897
2000-01-18    0.334696
2000-01-19    0.949667
2000-01-20    0.503285
Freq: D, dtype: float64
```
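One workaround sketch (my suggestion, not from the issue thread): convert the PeriodIndex to timestamps before handing the series over, so the coordinate shows dates rather than int64 ordinals. The snippet uses plain pandas, with freq='D' made explicit:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(20),
              index=pd.period_range(start='2000-01-01', periods=20,
                                    freq='D', name='Date'))

# PeriodIndex stores periods as int64 ordinals under the hood, which is
# what leaks into the DataArray repr. Converting to timestamps first
# keeps the dates readable (a workaround, not a fix for the repr):
s_dates = s.copy()
s_dates.index = s.index.to_timestamp()
```

Feeding s_dates rather than s into the DataArray constructor should then display a datetime64 coordinate.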

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/645/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue


The [issues] table schema:

```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```
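The filter described at the top of the page (comments = 13, repo = 13221727, user = 5635139, sorted by updated_at descending) corresponds to a plain SQL query against this schema. A minimal sketch using Python's sqlite3 with a trimmed version of the table (columns the query doesn't touch are omitted):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE issues (
        id INTEGER PRIMARY KEY,
        comments INTEGER,
        user INTEGER,
        repo INTEGER,
        updated_at TEXT
    )""")

# The three rows shown on this page, plus one row the filter should exclude.
conn.executemany(
    "INSERT INTO issues VALUES (?, ?, ?, ?, ?)",
    [(1913983402, 13, 5635139, 13221727, '2023-10-15T07:48:56Z'),
     (365973662, 13, 5635139, 13221727, '2020-07-02T20:39:01Z'),
     (115210260, 13, 5635139, 13221727, '2015-12-30T05:59:05Z'),
     (42, 7, 5635139, 13221727, '2024-01-01T00:00:00Z')])

# ISO 8601 timestamps sort lexicographically, so ORDER BY on the TEXT
# column gives correct chronological order.
rows = conn.execute(
    "SELECT id FROM issues "
    "WHERE comments = 13 AND repo = 13221727 AND user = 5635139 "
    "ORDER BY updated_at DESC").fetchall()
```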
Powered by Datasette · Queries took 2400.212ms · About: xarray-datasette