issues


4 rows where repo = 13221727 and "updated_at" is on date 2020-01-20 sorted by updated_at descending

Row 1 of 4:

id: 548475127 · node_id: MDU6SXNzdWU1NDg0NzUxMjc= · number: 3686
title: Different data values from xarray open_mfdataset when using chunks
user: abarciauskas-bgse (15016780) · state: closed · comments: 7 · author_association: NONE
created_at: 2020-01-11T20:15:12Z · updated_at: 2020-01-20T20:35:48Z · closed_at: 2020-01-20T20:35:47Z

MCVE Code Sample

You will first need to download the data from PO.DAAC (or mount PO.DAAC's drive), which requires credentials:

```bash
curl -u USERNAME:PASSWORD --create-dirs -o data/mursst_netcdf/152/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2002/152/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc

curl -u USERNAME:PASSWORD --create-dirs -o data/mursst_netcdf/153/20020602090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2002/153/20020602090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc

curl -u USERNAME:PASSWORD --create-dirs -o data/mursst_netcdf/154/20020603090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2002/154/20020603090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc

curl -u USERNAME:PASSWORD --create-dirs -o data/mursst_netcdf/155/20020604090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2002/155/20020604090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc

curl -u USERNAME:PASSWORD --create-dirs -o data/mursst_netcdf/156/20020605090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc https://podaac-tools.jpl.nasa.gov/drive/files/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2002/156/20020605090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
```

Then run the following code:

```python
from datetime import datetime
import glob

import xarray as xr


def generate_file_list(start_doy, end_doy):
    """Given a start day and an end day, generate a list of file locations.

    Assumes 'prefix' and 'year' variables have already been defined.
    'prefix' should be a local directory or an http url and path.
    'year' should be a 4 digit year.
    """
    days_of_year = list(range(start_doy, end_doy))
    fileObjs = []
    for doy in days_of_year:
        if doy < 10:
            doy = f"00{doy}"
        elif doy < 100:
            doy = f"0{doy}"
        file = glob.glob(f"{prefix}/{doy}/*.nc")[0]
        fileObjs.append(file)
    return fileObjs


# Invariants - but could be made configurable
year = 2002
prefix = "data/mursst_netcdf"
chunks = {'time': 1, 'lat': 1799, 'lon': 3600}

# Create a list of files
start_doy = 152
num_days = 5
end_doy = start_doy + num_days
fileObjs = generate_file_list(start_doy, end_doy)

# will use this time slice in a query later on
time_slice = slice(datetime.strptime(f"{year}-06-02", '%Y-%m-%d'),
                   datetime.strptime(f"{year}-06-04", '%Y-%m-%d'))

print("results from unchunked dataset")
ds_unchunked = xr.open_mfdataset(fileObjs, combine='by_coords')
print(ds_unchunked.analysed_sst[1, :, :].sel(lat=slice(20, 50), lon=slice(-170, -110)).mean().values)
print(ds_unchunked.analysed_sst.sel(time=time_slice).mean().values)

print(f"results from chunked dataset using {chunks}")
ds_chunked = xr.open_mfdataset(fileObjs, combine='by_coords', chunks=chunks)
print(ds_chunked.analysed_sst[1, :, :].sel(lat=slice(20, 50), lon=slice(-170, -110)).mean().values)
print(ds_chunked.analysed_sst.sel(time=time_slice).mean().values)

print("results from chunked dataset using 'auto'")
ds_chunked = xr.open_mfdataset(fileObjs, combine='by_coords', chunks={'time': 'auto', 'lat': 'auto', 'lon': 'auto'})
print(ds_chunked.analysed_sst[1, :, :].sel(lat=slice(20, 50), lon=slice(-170, -110)).mean().values)
print(ds_chunked.analysed_sst.sel(time=time_slice).mean().values)
```

Note: these are just a few examples; I tried a variety of other chunk options and got similar discrepancies between the unchunked and chunked datasets.

Output:

```
results from unchunked dataset
290.13754
286.7869
results from chunked dataset using {'time': 1, 'lat': 1799, 'lon': 3600}
290.13757
286.81107
results from chunked dataset using 'auto'
290.1377
286.8118
```

Expected Output

Values output from queries of the chunked and unchunked xarray datasets should be equal.

Problem Description

I want to understand how to chunk or query the data so that I can verify that data opened with chunks produces the same output as data opened without chunking. I would ultimately like to store the data in Zarr, so verifying data integrity is critical.
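For context (an illustrative sketch, not part of the original report, and independent of xarray): small discrepancies like those above are typically floating-point summation-order effects, and chunking changes the order in which float32 values are accumulated. A minimal numpy demonstration:

```python
import numpy as np

# A million float32 values; the computed mean depends on accumulation order.
rng = np.random.default_rng(0)
data = rng.random(1_000_000).astype("float32")

# One pass over the whole array, accumulating in float32.
full = data.sum(dtype="float32") / data.size

# Emulate a chunked reduction: sum each half separately, then combine.
chunked = (data[:500_000].sum(dtype="float32")
           + data[500_000:].sum(dtype="float32")) / data.size

# Both are ~0.5, but they can disagree in the low-order digits.
print(full, chunked)
```

The discrepancies in the report (e.g. 286.7869 vs 286.81107) are larger than pure rounding noise would suggest, so this sketch only illustrates the general mechanism, not necessarily the whole story.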

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.1 | packaged by conda-forge | (default, Jan 5 2020, 20:58:18) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.14.154-99.181.amzn1.x86_64
machine: x86_64
processor:
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.14.1
pandas: 0.25.3
numpy: 1.17.3
scipy: None
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.3.2
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.9.1
distributed: 2.9.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 44.0.0.post20200102
pip: 19.3.1
conda: None
pytest: None
IPython: 7.11.1
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3686/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
Row 2 of 4:

id: 539821504 · node_id: MDExOlB1bGxSZXF1ZXN0MzU0NzMwNzI5 · number: 3642
title: Make datetime_to_numeric more robust to overflow errors
user: huard (81219) · state: closed · comments: 1 · author_association: CONTRIBUTOR
created_at: 2019-12-18T17:34:41Z · updated_at: 2020-01-20T19:21:49Z · closed_at: 2020-01-20T19:21:49Z
draft: 0 · pull_request: pydata/xarray/pulls/3642
  • [x] Closes #3641
  • [x] Tests added
  • [x] Passes black . && mypy . && flake8
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API

This is likely only safe with NumPy>=1.17 though.
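For context (a sketch of the overflow class of problem, not the PR's actual code): casting a large timedelta64 to a fine unit can exceed the int64 range, whereas dividing by the target unit returns a float and stays finite:

```python
import numpy as np

# 100 million days (~270,000 years) is representable as timedelta64[D]...
td = np.timedelta64(10**8, "D")

# ...but casting it to nanoseconds would overflow int64 (the ns unit can
# only represent about +/-292 years). Dividing by the target unit instead
# performs true division and returns a float, avoiding the overflow.
hours = td / np.timedelta64(1, "h")
print(hours)  # 2400000000.0
```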

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3642/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
Row 3 of 4:

id: 550964139 · node_id: MDExOlB1bGxSZXF1ZXN0MzYzNzcyNzE3 · number: 3699
title: Feature/align in dot
user: mathause (10194086) · state: closed · comments: 4 · author_association: MEMBER
created_at: 2020-01-16T17:55:38Z · updated_at: 2020-01-20T12:55:51Z · closed_at: 2020-01-20T12:09:27Z
draft: 0 · pull_request: pydata/xarray/pulls/3699
  • [x] Closes #3694
  • [x] Tests added
  • [x] Passes black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

Happy to get feedback @fujiisoup @shoyer

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3699/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
Row 4 of 4:

id: 549679475 · node_id: MDU6SXNzdWU1NDk2Nzk0NzU= · number: 3694
title: xr.dot requires equal indexes (join="exact")
user: mathause (10194086) · state: closed · comments: 5 · author_association: MEMBER
created_at: 2020-01-14T16:28:15Z · updated_at: 2020-01-20T12:09:27Z · closed_at: 2020-01-20T12:09:27Z

MCVE Code Sample

```python
import numpy as np
import xarray as xr

d1 = xr.DataArray(np.arange(4), dims=["a"], coords=dict(a=[0, 1, 2, 3]))
d2 = xr.DataArray(np.arange(4), dims=["a"], coords=dict(a=[0, 1, 2, 3]))

# note: different coords
d3 = xr.DataArray(np.arange(4), dims=["a"], coords=dict(a=[1, 2, 3, 4]))

(d1 * d2).sum()  # -> array(14)
xr.dot(d1, d2)   # -> array(14)

(d2 * d3).sum()  # -> array(8)
xr.dot(d2, d3)   # -> ValueError
```

Expected Output

```python
<xarray.DataArray ()>
array(8)
```

Problem Description

The last statement raises `ValueError: indexes along dimension 'a' are not equal` because `xr.apply_ufunc` defaults to `join='exact'`. However, I think this should work - but maybe there is a good reason for it to fail?

This is a problem for #2922 (weighted operations) - I think it is fine for the weights and data to not align.

Fixing this may be as easy as specifying join='inner' in

https://github.com/pydata/xarray/blob/e0fd48052dbda34ee35d2491e4fe856495c9621b/xarray/core/computation.py#L1181-L1187
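In the meantime, a user-side workaround (a sketch, reusing the arrays from the MCVE above) is to align explicitly before calling `xr.dot`, so it only ever sees equal indexes:

```python
import numpy as np
import xarray as xr

d2 = xr.DataArray(np.arange(4), dims=["a"], coords=dict(a=[0, 1, 2, 3]))
d3 = xr.DataArray(np.arange(4), dims=["a"], coords=dict(a=[1, 2, 3, 4]))

# Align to the common labels (a = [1, 2, 3]) first; xr.dot then succeeds.
d2a, d3a = xr.align(d2, d3, join="inner")
result = xr.dot(d2a, d3a)
print(result.values)  # 8  (1*0 + 2*1 + 3*2)
```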

@fujiisoup

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: 5afc6f32b18f5dbb9a89e30f156b626b0a83597d
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.12.14-lp151.28.36-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.2
xarray: 0.14.0+164.g5afc6f32.dirty
pandas: 0.25.2
numpy: 1.17.3
scipy: 1.3.1
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: 1.5.5
zarr: 2.3.2
cftime: 1.0.4.2
nc_time_axis: 1.2.0
PseudoNetCDF: installed
rasterio: 1.1.0
cfgrib: 0.9.7.2
iris: 2.2.0
bottleneck: 1.2.1
dask: 2.6.0
distributed: 2.6.0
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: installed
setuptools: 41.6.0.post20191029
pip: 19.3.1
conda: None
pytest: 5.2.2
IPython: 7.9.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3694/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
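As a sketch of how the page's row filter maps onto this schema (using Python's stdlib sqlite3 with an in-memory copy of an abbreviated `issues` table; the inserted rows are taken from the page above):

```python
import sqlite3

# Abbreviated issues table matching the schema above.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE issues (
    id INTEGER PRIMARY KEY,
    number INTEGER,
    title TEXT,
    updated_at TEXT,
    repo INTEGER,
    type TEXT
)""")
rows = [
    (548475127, 3686, "Different data values from xarray open_mfdataset when using chunks",
     "2020-01-20T20:35:48Z", 13221727, "issue"),
    (539821504, 3642, "Make datetime_to_numeric more robust to overflow errors",
     "2020-01-20T19:21:49Z", 13221727, "pull"),
]
conn.executemany("INSERT INTO issues VALUES (?, ?, ?, ?, ?, ?)", rows)

# The filter this page applies: repo = 13221727, updated_at on 2020-01-20,
# sorted by updated_at descending.
found = conn.execute(
    "SELECT number FROM issues "
    "WHERE repo = ? AND date(updated_at) = ? "
    "ORDER BY updated_at DESC",
    (13221727, "2020-01-20"),
).fetchall()
print(found)  # [(3686,), (3642,)]
```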
Powered by Datasette · About: xarray-datasette