github: issues: 4 rows where repo = 13221727, type = "issue" and user = 21049064 sorted by updated

4 rows where repo = 13221727, type = "issue" and user = 21049064 sorted by updated_at descending

Search:

descending

id	node_id	number	title	user	state	comments	created_at	updated_at ▲	closed_at	author_association	body	reactions	state_reason	repo	type
469633509	MDU6SXNzdWU0Njk2MzM1MDk=	3141	calculating cumsums on a groupby object	tommylees112 21049064	closed	6	2019-07-18T08:21:57Z	2022-07-20T01:31:37Z	2022-07-20T01:31:37Z	NONE	How do I go about calculating cumsums on a groupby object? I have a `Dataset` that looks as the following: ```python lat = np.linspace(-5.175003, 5.9749985, 224) lon = np.linspace(33.524994, 42.274994, 176) time = pd.date_range(start='1981-01-31', end='2019-04-30', freq='M') data = np.random.randn(len(time), len(lat), len(lon)) dims = ['time', 'lat', 'lon'] coords = {'time': time, 'lat': lat, 'lon': lon} ds = xr.Dataset({'precip': (dims, data)}, coords=coords) Out[]: <xarray.Dataset> Dimensions: (lat: 224, lon: 176, time: 460) Coordinates: * time (time) datetime64[ns] 1981-01-31 1981-02-28 ... 2019-04-30 * lat (lat) float64 -5.175 -5.125 -5.075 -5.025 ... 5.875 5.925 5.975 * lon (lon) float64 33.52 33.57 33.62 33.67 ... 42.12 42.17 42.22 42.27 Data variables: precip (time, lat, lon) float64 0.006328 0.2969 1.564 ... 0.6675 2.32 ``` I need to `groupby` year and calculate the cumsum for each year. That way I will have a value for each `month` (timestep) and each pixel (`lat` - `lon` pair). But the cumsum operation doesn't work on a `groupby` object ```python ds.groupby('time.year').cumsum(dim='time') Out[]: AttributeError Traceback (most recent call last) <ipython-input-12-dceee5f5647c> in <module> 9 display(ds_) 10 ---> 11 ds_.groupby('time.year').cumsum(dim='time') AttributeError: 'DatasetGroupBy' object has no attribute 'cumsum' ``` Is there a work around? ``` INSTALLED VERSIONS commit: None python: 3.7.0 \| packaged by conda-forge \| (default, Nov 12 2018, 12:34:36) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 18.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.12.2 pandas: 0.24.2 numpy: 1.16.4 scipy: 1.3.0 netCDF4: 1.5.1.2 pydap: None h5netcdf: None h5py: 2.9.0 Nio: None zarr: None cftime: 1.0.3.4 nc_time_axis: None PseudonetCDF: None rasterio: 1.0.17 cfgrib: 0.9.7 iris: None bottleneck: 1.2.1 dask: 1.2.2 distributed: 1.28.1 matplotlib: 3.1.0 cartopy: 0.17.0 seaborn: 0.9.0 numbagg: None setuptools: 41.0.1 pip: 19.1 conda: None pytest: 4.5.0 IPython: 7.1.1 sphinx: 2.0.1 ```	{ "url": "https://api.github.com/repos/pydata/xarray/issues/3141/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
453576041	MDU6SXNzdWU0NTM1NzYwNDE=	3004	assign values from `xr.groupby_bins` to new `variable`	tommylees112 21049064	closed	8	2019-06-07T15:38:01Z	2019-07-07T12:17:46Z	2019-07-07T12:17:45Z	NONE	Code Sample, a copy-pastable example if possible A "Minimal, Complete and Verifiable Example" will make it much easier for maintainers to help you: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports ```python Your code here import pandas as pd import numpy as np import xarray as xr time = pd.date_range('2010-01-01','2011-12-31',freq='M') lat = np.linspace(-5.175003, -4.7250023, 10) lon = np.linspace(33.524994, 33.97499, 10) precip = np.random.normal(0, 1, size=(len(time), len(lat), len(lon))) ds = xr.Dataset( {'precip': (['time', 'lat', 'lon'], precip)}, coords={ 'lon': lon, 'lat': lat, 'time': time, } ) variable = 'precip' calculate a cumsum over some window size rolling_window = 3 ds_window = ( ds.rolling(time=rolling_window, center=True) .sum() .dropna(dim='time', how='all') ) construct a cumulative frequency distribution ranking the precip values per month rank_norm_list = [] for mth in range(1, 13): ds_mth = ( ds_window .where(ds_window['time.month'] == mth) .dropna(dim='time', how='all') ) rank_norm_mth = ( (ds_mth.rank(dim='time') - 1) / (ds_mth.time.size - 1.0) * 100.0 ) rank_norm_mth = rank_norm_mth.rename({variable: 'rank_norm'}) rank_norm_list.append(rank_norm_mth) rank_norm = xr.merge(rank_norm_list).sortby('time') assign bins to variable xarray bins = [20., 40., 60., 80., np.Inf] decile_index_gpby = rank_norm.groupby_bins('rank_norm', bins=bins) out = decile_index_gpby.assign() # assign_coords() ``` Problem description [this should explain why the current behavior is a problem and why the expected output is a better solution.] I want to calculate the Decile Index - see the `ex1-Calculate Decile Index (DI) with Python.ipynb`. The `pandas` implementation is simple enough but I need help with applying the bin labels to a new `variable` / `coordinate`. Expected Output ``` <xarray.Dataset> Dimensions: (lat: 10, lon: 10, time: 24) Coordinates: * time (time) datetime64[ns] 2010-01-31 2010-02-28 ... 2011-12-31 * lat (lat) float32 -5.175003 -5.125 -5.075001 ... -4.7750015 -4.7250023 * lon (lon) float32 33.524994 33.574997 33.625 ... 33.925003 33.97499 Data variables: precip (time, lat, lon) float32 4.6461554 4.790813 ... 7.3063064 7.535994 rank_bin (lat, lon, time) int64 1 3 3 0 1 4 2 3 0 1 ... 0 4 0 1 3 1 2 2 3 1 ``` Output of `xr.show_versions()` # Paste the output here xr.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.7.0 \| packaged by conda-forge \| (default, Nov 12 2018, 12:34:36) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 18.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.12.1 pandas: 0.24.2 numpy: 1.16.4 scipy: 1.3.0 netCDF4: 1.5.1.2 pydap: None h5netcdf: None h5py: 2.9.0 Nio: None zarr: None cftime: 1.0.3.4 nc_time_axis: None PseudonetCDF: None rasterio: 1.0.17 cfgrib: 0.9.7 iris: None bottleneck: 1.2.1 dask: 1.2.2 distributed: 1.28.1 matplotlib: 3.1.0 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 41.0.1 pip: 19.1 conda: None pytest: 4.5.0 IPython: 7.1.1 sphinx: 2.0.1	{ "url": "https://api.github.com/repos/pydata/xarray/issues/3004/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
462010865	MDU6SXNzdWU0NjIwMTA4NjU=	3053	How to select using `.where()` on a timestamp `coordinate` for forecast data with 5 dimensions	tommylees112 21049064	closed	6	2019-06-28T12:28:05Z	2019-07-07T12:17:12Z	2019-07-07T12:17:12Z	NONE	I am working with a multi-dimensional object (more than just time,lat,lon which is what i am used to). It is a forecast produced from a weather model and so has the complexity of having an `initialisation_date` as well as a `forecast_horizon`. I have a `valid_time` `coordinate` which defines the true time that the forecast refers to. I want to be able to select these `valid_time` objects from my `xr.Dataset` object but I don't know how to select from a `coordinate` that is not also a `dimension`. MCVE Code Sample ```python import pandas as pd import numpy as np import xarray as xr initialisation_date = pd.date_range('2018-01-01','2018-12-31',freq='M') number = [i for i in range(0, 51)] # corresponds to model number (ensemble of model runs) lat = np.linspace(-5.175003, -5.202, 36) lon = np.linspace(33.5, 42.25, 45) forecast_horizon = np.array( [ 2419200000000000, 2592000000000000, 2678400000000000, 5097600000000000, 5270400000000000, 5356800000000000, 7689600000000000, 7776000000000000, 7862400000000000, 7948800000000000, 10368000000000000, 10454400000000000, 10540800000000000, 10627200000000000, 12960000000000000, 13046400000000000, 13219200000000000, 15638400000000000, 15724800000000000, 15811200000000000, 15897600000000000, 18316800000000000, 18489600000000000, 18576000000000000 ], dtype='timedelta64[ns]' ) valid_time = initialisation_date[:, np.newaxis] + forecast_horizon precip = np.random.normal( 0, 1, size=(len(number), len(initialisation_date), len(forecast_horizon), len(lat), len(lon)) ) ds = xr.Dataset( {'precip': (['number', 'initialisation_date', 'forecast_horizon', 'lat', 'lon'], precip)}, coords={ 'lon': lon, 'lat': lat, 'initialisation_date': initialisation_date, 'number': number, 'forecast_horizon': forecast_horizon, 'valid_time': (['initialisation_date', 'step'], valid_time) } ) Out[]: <xarray.Dataset> Dimensions: (forecast_horizon: 24, initialisation_date: 12, lat: 36, lon: 45, number: 51, step: 24) Coordinates: * lon (lon) float64 33.5 33.7 33.9 34.1 ... 41.85 42.05 42.25 * lat (lat) float64 -5.175 -5.176 -5.177 ... -5.201 -5.202 * initialisation_date (initialisation_date) datetime64[ns] 2018-01-31 ... 2018-12-31 * number (number) int64 0 1 2 3 4 5 6 7 ... 44 45 46 47 48 49 50 * forecast_horizon (forecast_horizon) timedelta64[ns] 28 days ... 215 days valid_time (initialisation_date, step) datetime64[ns] 2018-02-28 ... 2019-08-03 Dimensions without coordinates: step Data variables: precip (number, initialisation_date, forecast_horizon, lat, lon) float64 1.373 ... 1.138 ``` I try to select all the March, April, May months from the `valid_time` coordinate using `.sel[]` ```python select March April May from the valid_time ds.sel(valid_time=np.isin(ds['valid_time.month'], [3,4,5])) ``` Error Message: ``` <ipython-input-151-132375b92854> in <module> ----> 1 ds.sel(valid_time=np.isin(ds['valid_time.month'], [3,4,5])) ~/miniconda3/envs/crop/lib/python3.7/site-packages/xarray/core/dataset.py in sel(self, indexers, method, tolerance, drop, indexers_kwargs) 1729 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, 'sel') 1730 pos_indexers, new_indexes = remap_label_indexers( -> 1731 self, indexers=indexers, method=method, tolerance=tolerance) 1732 result = self.isel(indexers=pos_indexers, drop=drop) 1733 return result._overwrite_indexes(new_indexes) ~/miniconda3/envs/crop/lib/python3.7/site-packages/xarray/core/coordinates.py in remap_label_indexers(obj, indexers, method, tolerance, indexers_kwargs) 315 316 pos_indexers, new_indexes = indexing.remap_label_indexers( --> 317 obj, v_indexers, method=method, tolerance=tolerance 318 ) 319 # attach indexer's coordinate to pos_indexers ~/miniconda3/envs/crop/lib/python3.7/site-packages/xarray/core/indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance) 237 new_indexes = {} 238 --> 239 dim_indexers = get_dim_indexers(data_obj, indexers) 240 for dim, label in dim_indexers.items(): 241 try: ~/miniconda3/envs/crop/lib/python3.7/site-packages/xarray/core/indexing.py in get_dim_indexers(data_obj, indexers) 205 if invalid: 206 raise ValueError("dimensions or multi-index levels %r do not exist" --> 207 % invalid) 208 209 level_indexers = defaultdict(dict) ValueError: dimensions or multi-index levels ['valid_time'] do not exist ``` I have also tried creating a new variable of the month object from `valid_time` and copying it to the same shape as the `precip` variable ```python create months array of shape (51, 12, 24, 36, 45) months = ds['valid_time.month'].values m = np.repeat(months[np.newaxis, :, :], 51, axis=0) m = np.repeat(m[:, :, :, np.newaxis], 36, axis=3) m = np.repeat(m[:, :, :, :, np.newaxis], 45, axis=4) assert (m[0, :, :, 0, 0] == m[50, :, :, 4, 2]).all(), f"The matrices have not been copied to the correct dimensions" ds['month'] = (['number', 'initialisation_date', 'forecast_horizon', 'lat', 'lon'], m) ds.where(np.isin(ds['month'], [3,4,5])).dropna(how='all', dim='forecast_horizon') ``` Problem Description I want to be able to select all of the forecasts that correspond to the `valid_time` I select. I think that an issue might be that the result from that query will be an irregular grid, because we will have different `initialisation_date` and `forecast_horizon` combinations that match the query. Is that the case I want to select from a `coordinate` object that isn't a dimension. How can I go about doing this? Expected Output For example. I want to return the lat-lon arrays for the `valid_time` = `2018-04-01`. The returning combinations should be 51 realisations of a (36 x 45) (`lat, lon`) grid of forecast values. So 3 possible forecasts matching this criteria: - `initialisation_date`=`2018-01-01` at a `forecast_horizon`=`3 months` - `initialisation_date`=`2018-02-01` at a `forecast_horizon`=`2 months` - `initialisation_date`=`2018-03-01` at a `forecast_horizon`=`1 months` Output of `xr.show_versions()` # Paste the output here xr.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.7.0 \| packaged by conda-forge \| (default, Nov 12 2018, 12:34:36) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 18.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.12.1 pandas: 0.24.2 numpy: 1.16.4 scipy: 1.3.0 netCDF4: 1.5.1.2 pydap: None h5netcdf: None h5py: 2.9.0 Nio: None zarr: None cftime: 1.0.3.4 nc_time_axis: None PseudonetCDF: None rasterio: 1.0.17 cfgrib: 0.9.7 iris: None bottleneck: 1.2.1 dask: 1.2.2 distributed: 1.28.1 matplotlib: 3.1.0 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 41.0.1 pip: 19.1 conda: None pytest: 4.5.0 IPython: 7.1.1 sphinx: 2.0.1 Thanks so much for your package! Working with these multidimensional datasets would be a pain without `xarray`. I really appreciate the help that I have got from `github` / `stackExchange` by the community in the past. Thanks team!	{ "url": "https://api.github.com/repos/pydata/xarray/issues/3053/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
377947810	MDU6SXNzdWUzNzc5NDc4MTA=	2547	How do I copy my array forwards in time?	tommylees112 21049064	closed	14	2018-11-06T17:12:00Z	2019-05-19T02:49:14Z	2019-05-19T02:49:14Z	NONE	I have this `xr.Dataset`: `<xarray.Dataset> Dimensions: (time: 1, x: 200, y: 200) Coordinates: * time (time) float64 9.505e+17 Dimensions without coordinates: x, y Data variables: Rg (time, y, x) float32 ... latitude (y, x) float64 ... longitude (y, x) float64 ... time datetime64[ns] 2000-02-14 Attributes: Conventions: CF-1.0 content: HARMONIZED WORLD SOIL DATABASE; first it was aggregated ... scaling_factor: 20` I want to copy it through time, adding the time dimension at a given daterange Something like this: ```python times = pd.date_range('2000-01-01', '2000-12-31', name='time') ds.time = times[0] all_data = [ds] for i, time in enumerate(times[1:]): ds_t1 = ds.copy() ds_t1.time = time `all_data.append(ds) ds = ds_t1` ds = xr.concat(all_data) ``` So I should have output data like: `<xarray.Dataset> Dimensions: (time: 366, x: 200, y: 200) Coordinates: * time (time) float64 ... Dimensions without coordinates: x, y Data variables: Rg (time, y, x) float32 ... latitude (y, x) float64 ... longitude (y, x) float64 ... time datetime64[ns] 2000-02-14 Attributes: Conventions: CF-1.0 content: HARMONIZED WORLD SOIL DATABASE; first it was aggregated ... scaling_factor: 20`	{ "url": "https://api.github.com/repos/pydata/xarray/issues/2547/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue

Advanced export

JSON shape: default, array, newline-delimited, object

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);

issues

4 rows where repo = 13221727, type = "issue" and user = 21049064 sorted by updated_at descending

Out[]:

Code Sample, a copy-pastable example if possible

Your code here

calculate a cumsum over some window size

construct a cumulative frequency distribution ranking the precip values

per month

assign bins to variable xarray

Problem description

Expected Output

Output of `xr.show_versions()`

MCVE Code Sample

select March April May from the valid_time

create months array of shape (51, 12, 24, 36, 45)

Problem Description

Expected Output

Output of `xr.show_versions()`

Advanced export

issues

4 rows where repo = 13221727, type = "issue" and user = 21049064 sorted by updated_at descending

Out[]:

Code Sample, a copy-pastable example if possible

Your code here

calculate a cumsum over some window size

construct a cumulative frequency distribution ranking the precip values

per month

assign bins to variable xarray

Problem description

Expected Output

Output of xr.show_versions()

MCVE Code Sample

select March April May from the valid_time

create months array of shape (51, 12, 24, 36, 45)

Problem Description

Expected Output

Output of xr.show_versions()

Advanced export

Output of `xr.show_versions()`

Output of `xr.show_versions()`