
issues


6 rows where repo = 13221727 and "updated_at" is on date 2022-05-01 sorted by updated_at descending


id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1222215528 I_kwDOAMm_X85I2Ydo 6555 sortby with ascending=False should create an index headtr1ck 43316012 closed 0     4 2022-05-01T16:57:51Z 2022-05-01T22:17:50Z 2022-05-01T22:17:50Z COLLABORATOR      

Is your feature request related to a problem?

When using sortby with ascending=False on a DataArray/Dataset without an explicit index, the data is correctly reversed, but it is no longer possible to tell which ordering the data has.

If an explicit index (like [0, 1, 2]) exists, it gets correctly reordered, which allows correct alignment.

Describe the solution you'd like

For consistency with alignment, xarray should create a new index that indicates that the data has been reordered, i.e. [2, 1, 0].

The only downside: this will break code that relies on the index not existing.
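A minimal sketch of the two situations described above (values and names are illustrative, not from the issue):

```python
import xarray as xr

# With an explicit index, sortby reorders the index, so the new
# ordering stays visible and alignment keeps working:
da = xr.DataArray([10, 20, 30], dims="x", coords={"x": [0, 1, 2]})
da.sortby("x", ascending=False)
# <xarray.DataArray (x: 3)>
# array([30, 20, 10])
# Coordinates:
#   * x        (x) int64 2 1 0

# Without an index on "x", sorting by another 1D variable still
# reverses the data, but nothing records the new ordering:
da2 = xr.DataArray([10, 20, 30], dims="x")
key = xr.DataArray([0, 1, 2], dims="x")
da2.sortby(key, ascending=False)  # values reversed; "x" still has no index
```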

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6555/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1216982208 PR_kwDOAMm_X8422vsw 6522 Update issue template to include a checklist max-sixty 5635139 closed 0     4 2022-04-27T08:19:49Z 2022-05-01T22:14:35Z 2022-05-01T22:14:32Z MEMBER   0 pydata/xarray/pulls/6522

This replaces https://github.com/pydata/xarray/pull/5787.

Please check out the previews in the most recent comment there.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6522/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
376370028 MDU6SXNzdWUzNzYzNzAwMjg= 2534 to_dataframe() excessive memory usage guygriffiths 1665346 closed 0     3 2018-11-01T12:20:39Z 2022-05-01T22:04:51Z 2022-05-01T22:04:43Z NONE      

Code Sample, a copy-pastable example if possible

```python
import xarray as xr
from glob import glob

# This refers to a large multi-file NetCDF dataset
file_list = sorted(glob('~/Data///*.nc'))

dataset = xr.open_mfdataset(file_list, decode_times=True, autoclose=True,
                            decode_cf=True, cache=False, concat_dim='time')

# At this point, the total RAM used by the python process is ~1.4G

# Select a timeseries at a single point.
# This is near instantaneous and uses no additional memory.
ts = dataset.sel({'lat': 10, 'lon': 10}, method='nearest')

# Convert that timeseries to a pandas dataframe.
# This is where the actual data reading happens, and it reads the data into memory.
df = ts.to_dataframe()

# At this point, the total RAM used by the python process is ~10.5G
```

Problem description

Despite the fact that the resulting dataframe only contains a single lat/lon point's worth of data, a huge amount of RAM is used. I can get (what appears to be) an identical pandas DataFrame by changing the final line to `df = (ts * 1.0).to_dataframe()`, which reduces the total RAM used to ~2.2G (i.e. 0.6G additional RAM for that single line vs 9G additional RAM). No type conversion is taking place (ts and ts * 1.0 have identical data types).
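A possible alternative sketch (my addition, not part of the original report): compute the lazy selection into memory first, so that to_dataframe() only ever sees the small result. Whether this avoids the overhead depends on the xarray/dask versions involved.

```python
# Force the selected point to be computed before conversion;
# to_dataframe() then operates on small in-memory arrays.
df = ts.compute().to_dataframe()
```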

Expected Output

I would expect to_dataframe() to require the same amount of memory whether or not the data was first multiplied by 1.0. I'm aware there could be a good reason for this, but it took me somewhat by surprise.

Output of xr.show_versions()

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
xarray: 0.10.7
pandas: 0.23.1
numpy: 1.13.3
scipy: 0.17.0
netCDF4: 1.4.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.19.0
distributed: None
matplotlib: 1.5.1
cartopy: None
seaborn: 0.8.1
setuptools: 20.7.0
pip: 18.0
conda: None
pytest: None
IPython: 2.4.1
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2534/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
822987300 MDU6SXNzdWU4MjI5ODczMDA= 5001 .min() doesn't work on np.datetime64 with a chunked Dataset ludwigVonKoopa 49512274 open 0     2 2021-03-05T11:12:19Z 2022-05-01T16:11:48Z   NONE      

Hi all,

If an xr.Dataset is chunked, I cannot call ds.time.min(); I get the error: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]'). I don't know if this is expected. Moreover, ds2.time.mean() works.

Thanks

What happened:

A UFuncTypeError was raised: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')

What you expected to happen:

compute the min & max on a chunked datetime64 xarray.DataArray

Minimal Complete Verifiable Example:

```python
import xarray as xr
import numpy as np

obs = 200
t0 = np.datetime64("2010-01-01T00:00:00")
tn = t0 + np.timedelta64(123 * 4, "D")

ds2 = xr.Dataset(
    {"time": (["obs"], np.arange(t0, tn, (tn - t0) / obs))},
    coords={"obs": (["obs"], np.arange(obs))},
).chunk({"obs": 100})

ds2.time.min()
```

Anything else we need to know?:

ds2.time.mean() works; max and min raise an exception.
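Two possible workaround sketches (my addition, not from the issue), both sidestepping the chunked datetime64 reduction path:

```python
# Option 1: load the time coordinate into memory, then reduce there.
tmin = ds2.time.compute().min()

# Option 2: reduce the underlying int64 nanoseconds, then cast back.
tmin = ds2.time.astype("int64").min().astype("datetime64[ns]")
```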

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-133-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.16.2
pandas: 1.2.1
numpy: 1.19.5
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.6.1
cftime: 1.3.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.01.1
distributed: 2021.01.1
matplotlib: 3.3.4
cartopy: None
seaborn: None
numbagg: None
pint: 0.16.1
setuptools: 52.0.0.post20210125
pip: 20.3.3
conda: None
pytest: 6.2.2
IPython: 7.20.0
sphinx: 3.5.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5001/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1221885425 I_kwDOAMm_X85I1H3x 6549 Improved Dataset broadcasting headtr1ck 43316012 open 0     3 2022-04-30T17:51:37Z 2022-05-01T14:37:43Z   COLLABORATOR      

Is your feature request related to a problem?

I am a bit puzzled about how xarray broadcasts Datasets. It seems to always add all dimensions to all variables. Is this what you want in general?

See this example:

```python
import xarray as xr

da = xr.DataArray([[1, 2, 3]], dims=("x", "y"))
# <xarray.DataArray (x: 1, y: 3)>
# array([[1, 2, 3]])

ds = xr.Dataset({"a": ("x", [1]), "b": ("z", [2, 3])})
# <xarray.Dataset>
# Dimensions:  (x: 1, z: 2)
# Dimensions without coordinates: x, z
# Data variables:
#     a        (x) int32 1
#     b        (z) int32 2 3

ds.broadcast_like(da)
# returns:
# <xarray.Dataset>
# Dimensions:  (x: 1, y: 3, z: 2)
# Dimensions without coordinates: x, y, z
# Data variables:
#     a        (x, y, z) int32 1 1 1 1 1 1
#     b        (x, y, z) int32 2 3 2 3 2 3
#
# I think it should return:
# <xarray.Dataset>
# Dimensions:  (x: 1, y: 3, z: 2)
# Dimensions without coordinates: x, y, z
# Data variables:
#     a        (x, y) int32 1 1 1  # notice: without the "z" dim
#     b        (x, y, z) int32 2 3 2 3 2 3
```

Describe the solution you'd like

I would like broadcasting to behave the same way as, e.g., a simple addition. In the example above, da + ds produces the dimensions that I want.

Describe alternatives you've considered

ds + xr.zeros_like(da) works, but seems more like a "dirty hack".

Additional context

Maybe one can add an option to broadcasting that controls this behavior?
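For illustration, a hedged sketch of the per-variable behavior requested above (the helper name is hypothetical, and the resulting dimension order may differ from the printed example):

```python
import xarray as xr

def broadcast_like_per_variable(ds: xr.Dataset, other: xr.DataArray) -> xr.Dataset:
    # Broadcast each data variable against `other` individually, so
    # variables never pick up dimensions from their siblings.
    return ds.map(lambda v: v.broadcast_like(other))

da = xr.DataArray([[1, 2, 3]], dims=("x", "y"))
ds = xr.Dataset({"a": ("x", [1]), "b": ("z", [2, 3])})
broadcast_like_per_variable(ds, da)
# "a" gains only (x, y); "b" keeps "z" and gains (x, y).
```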

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6549/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
264321376 MDU6SXNzdWUyNjQzMjEzNzY= 1621 Undesired decoding to timedelta64 (was: units of "seconds" translated to time coordinate) pacioos 4701070 open 0     16 2017-10-10T17:58:45Z 2022-05-01T08:49:43Z   NONE      

When using open_dataset(), xarray translates data variables with units of "seconds" into time values, for example measurements of wave period. I don't believe xarray should treat variables as times unless their units are of the form "seconds since ...". I have noticed that changing my units to "second", "sec", or "s" prevents xarray from translating the variable to timedelta64 and keeps it float64, as desired. More details and an OPeNDAP example are posted on Stack Overflow: https://stackoverflow.com/questions/46552078/xarray-wave-period-in-seconds-ingested-as-timedelta64
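For reference, newer xarray versions (0.16.1+) expose a decode_timedelta argument on open_dataset that turns this conversion off; a short sketch with an illustrative file name:

```python
import xarray as xr

# decode_timedelta=False keeps variables with units like "seconds"
# as plain floats instead of decoding them to timedelta64.
ds = xr.open_dataset("wave_period.nc", decode_timedelta=False)  # hypothetical file
```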

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1621/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
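A sketch of the query behind this page, run with Python's sqlite3 module against a local copy of the database (the file name is illustrative, and Datasette's generated SQL may differ in detail):

```python
import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical local copy
rows = conn.execute(
    """
    SELECT id, number, title, state, updated_at
    FROM issues
    WHERE repo = 13221727
      AND date(updated_at) = '2022-05-01'
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # 6 for this snapshot
```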