
issues


4 rows where user = 463809 sorted by updated_at descending




id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
789410367 MDU6SXNzdWU3ODk0MTAzNjc= 4826 Reading and writing a zarr dataset multiple times casts bools to int8 amatsukawa 463809 closed 0     10 2021-01-19T22:02:15Z 2023-04-10T09:26:27Z 2023-04-10T09:26:27Z CONTRIBUTOR      

What happened:

Reading and writing a zarr dataset multiple times into different paths changes bool-dtype arrays to int8. I think this issue is related to #2937.

What you expected to happen:

My array's dtype in numpy/dask should not change, even if certain storage backends store dtypes a certain way.

Minimal Complete Verifiable Example:

```python
import xarray as xr
import numpy as np

ds = xr.Dataset({
    "bool_field": xr.DataArray(
        np.random.randn(5) < 0.5,
        dims=('g'),
        coords={'g': np.arange(5)}
    )
})
ds.to_zarr('test.zarr', mode="w")

d2 = xr.open_zarr('test.zarr')
print(d2.bool_field.dtype)
print(d2.bool_field.encoding)
d2.to_zarr("test2.zarr", mode="w")

d3 = xr.open_zarr('test2.zarr')
print(d3.bool_field.dtype)
```

The above snippet prints the following. In `d3`, the dtype of `bool_field` is `int8`, presumably because `d3` inherited `d2`'s `encoding`, which says `int8`, despite the array having a `bool` dtype.

```
bool
{'chunks': (5,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'dtype': dtype('int8')}
int8
```

Anything else we need to know?:

The current workaround is to set the encoding explicitly. This fixes the problem:

```python
encoding = {k: {"dtype": d2[k].dtype} for k in d2}
d2.to_zarr('test2.zarr', mode="w", encoding=encoding)
```
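An alternative to spelling out every dtype is to drop the inherited on-disk `dtype` entry from each variable's encoding before the second write, so the writer re-derives it from the in-memory dtype. A minimal sketch (the helper name `drop_dtype_encoding` is mine, not xarray API; it assumes only that each variable exposes a mutable `.encoding` dict, as xarray variables do):

```python
def drop_dtype_encoding(dataset):
    """Remove inherited 'dtype' entries so the writer re-derives the
    on-disk dtype from each variable's in-memory dtype."""
    for name in dataset.variables:
        dataset[name].encoding.pop("dtype", None)
    return dataset
```

Usage would then be, e.g., `drop_dtype_encoding(d2).to_zarr("test2.zarr", mode="w")`.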

Environment:

Output of <tt>xr.show_versions()</tt>

```
# I'll update with the full output of xr.show_versions() soon.
In [4]: xr.__version__
Out[4]: '0.16.2'

In [2]: zarr.__version__
Out[2]: '2.6.1'
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4826/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
484098286 MDU6SXNzdWU0ODQwOTgyODY= 3242 An `asfreq` method without `resample`, and clarify or improve resample().asfreq() behavior for down-sampling amatsukawa 463809 open 0     2 2019-08-22T16:33:32Z 2022-04-18T16:01:07Z   CONTRIBUTOR      

MCVE Code Sample

```python
import numpy as np
import xarray as xr
import pandas as pd

data = np.random.random(300)

# Make a time grid that doesn't start exactly on the hour.
time = pd.date_range('2019-01-01', periods=300, freq='T') + pd.Timedelta('3T')
time
# DatetimeIndex(['2019-01-01 00:03:00', '2019-01-01 00:04:00',
#                '2019-01-01 00:05:00', '2019-01-01 00:06:00',
#                '2019-01-01 00:07:00', '2019-01-01 00:08:00',
#                '2019-01-01 00:09:00', '2019-01-01 00:10:00',
#                '2019-01-01 00:11:00', '2019-01-01 00:12:00',
#                ...
#                '2019-01-01 04:53:00', '2019-01-01 04:54:00',
#                '2019-01-01 04:55:00', '2019-01-01 04:56:00',
#                '2019-01-01 04:57:00', '2019-01-01 04:58:00',
#                '2019-01-01 04:59:00', '2019-01-01 05:00:00',
#                '2019-01-01 05:01:00', '2019-01-01 05:02:00'],
#               dtype='datetime64[ns]', length=300, freq='T')

da = xr.DataArray(data, dims=['time'], coords={'time': time})
resampled = da.resample(time='H').asfreq()
resampled
# <xarray.DataArray (time: 6)>
# array([0.478601, 0.488425, 0.496322, 0.479256, 0.523395, 0.201718])
# Coordinates:
#   * time     (time) datetime64[ns] 2019-01-01 ... 2019-01-01T05:00:00

# The value is actually the mean over the time window, e.g. the third value is:
da.loc['2019-01-01T02:00:00':'2019-01-01T02:59:00'].mean()
# <xarray.DataArray ()>
# array(0.496322)
```

Expected Output

The docs say: "Return values of original object at the new up-sampling frequency; essentially a re-index with new times set to NaN."

I suppose this doc is not technically wrong, since upon careful reading, I realize it does not define a behavior for down-sampling. But it's easy to: (1) assume the same behavior (reindexing) for down-sampling and up-sampling and/or (2) expect behavior similar to df.asfreq() in pandas.

Problem Description

I would argue for an `asfreq` method without resampling that matches the pandas behavior, which, AFAIK, is to reindex starting at the first timestamp at the specified interval.

```python
df = pd.DataFrame(da, index=time)
df.asfreq('H')
#                             0
# 2019-01-01 00:03:00  0.065304
# 2019-01-01 01:03:00  0.325814
# 2019-01-01 02:03:00  0.841201
# 2019-01-01 03:03:00  0.610266
# 2019-01-01 04:03:00  0.613906
```

This can already be achieved easily, so it's not a blocker:

```python
da.reindex(time=pd.date_range(da.time[0].values, da.time[-1].values, freq='H'))
# <xarray.DataArray (time: 5)>
# array([0.065304, 0.325814, 0.841201, 0.610266, 0.613906])
# Coordinates:
#   * time     (time) datetime64[ns] 2019-01-01T00:03:00 ... 2019-01-01T04:03:00
```

The reason I argue for `asfreq` functionality outside of resampling is that `asfreq(freq)` in pandas is purely a reindex, compared to e.g. `resample(freq).first()`, which would give you a different time index.
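The claimed pandas equivalence can be checked directly: `Series.asfreq(freq)` produces the same result as an explicit reindex onto a `date_range` anchored at the first timestamp. A sketch in plain pandas (the synthetic series is my own; modern-pandas aliases `min`/`h` are used in place of the deprecated `T`/`H`):

```python
import numpy as np
import pandas as pd

# A time grid that doesn't start exactly on the hour, as in the example above.
time = pd.date_range('2019-01-01', periods=300, freq='min') + pd.Timedelta(minutes=3)
s = pd.Series(np.arange(300.0), index=time)

# asfreq is purely a reindex anchored at the first timestamp...
hourly = s.asfreq('h')

# ...equivalent to an explicit reindex onto an hourly grid.
manual = s.reindex(pd.date_range(s.index[0], s.index[-1], freq='h'))
assert hourly.equals(manual)
```

Note the resulting index starts at 00:03 and steps hourly, unlike `resample('h')`, whose bins are anchored on the hour.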

Output of xr.show_versions()

I'm still on Python 2.7, where `show_versions` actually throws an exception because some HDF5 library doesn't have a magic property. I don't think that detail is relevant here, though.

```
>>> xr.__version__
u'0.11.3'
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3242/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
520507183 MDExOlB1bGxSZXF1ZXN0MzM5MDg0MjUz 3504 Allow appending datetime & boolean variables to zarr stores amatsukawa 463809 closed 0     5 2019-11-09T20:09:29Z 2019-11-13T18:47:42Z 2019-11-13T15:55:33Z CONTRIBUTOR   0 pydata/xarray/pulls/3504
  • [x] Closes #3480
  • [x] Tests added
  • [x] Passes `black . && mypy . && flake8`
  • [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API

AFAICT, the type checking in `_validate_datatypes_for_zarr_append` is simply too strict, and relaxing it seems to work fine. But this is my first time digging into the xarray source code, so please let me know if this issue is more complex.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3504/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
516725099 MDU6SXNzdWU1MTY3MjUwOTk= 3480 Allow appending non-numerical types to zarr arrays. amatsukawa 463809 closed 0     0 2019-11-02T21:20:53Z 2019-11-13T15:55:33Z 2019-11-13T15:55:33Z CONTRIBUTOR      

MCVE Code Sample

Zarr itself allows appending `np.datetime64` and `np.bool` types:

```python
path = 'tmp/test.zarr'
z1 = zarr.open(path, mode='w', shape=(10,), chunks=(10,), dtype='M8[D]')
z1[:] = '1990-01-01'
z2 = zarr.open(path, mode='a')
a = np.array(['1992-01-01'] * 10, dtype='datetime64[D]')
z2.append(a)
# (20,)
z2
# <zarr.core.Array (20,) datetime64[D]>
```

But its equivalent in xarray throws an error:

```
>>> ds = xr.Dataset(
...     {'y': (('x',), np.array(['1991-01-01'] * 10, dtype='datetime64[D]'))}
... )
>>> ds.to_zarr('tmp/test_xr.zarr', mode='w')
<xarray.backends.zarr.ZarrStore object at 0x31f403170>
>>> ds2 = xr.Dataset(
...     {'y': (('x',), np.array(['1992-01-01'] * 10, dtype='datetime64[D]'))}
... )
>>> ds2.to_zarr('tmp/test_xr.zarr', mode='a', append_dim='x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/core/dataset.py", line 1616, in to_zarr
    append_dim=append_dim,
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 1304, in to_zarr
    _validate_datatypes_for_zarr_append(dataset)
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 1249, in _validate_datatypes_for_zarr_append
    check_dtype(k)
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 1245, in check_dtype
    "unicode string or an object".format(var)
ValueError: Invalid dtype for data variable: <xarray.DataArray 'y' (x: 10)>
array(['1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
Dimensions without coordinates: x
dtype must be a subtype of number, a fixed sized string, a fixed size
unicode string or an object
```

Expected Output

The append should succeed.

Problem Description

This function in `xarray/backends/api.py` is too strict on types:

```python
def _validate_datatypes_for_zarr_append(dataset):
    """DataArray.name and Dataset keys must be a string or None"""

    def check_dtype(var):
        if (
            not np.issubdtype(var.dtype, np.number)
            and not coding.strings.is_unicode_dtype(var.dtype)
            and not var.dtype == object
        ):
            # and not re.match('^bytes[1-9]+$', var.dtype.name)):
            raise ValueError(
                "Invalid dtype for data variable: {} "
                "dtype must be a subtype of number, "
                "a fixed sized string, a fixed size "
                "unicode string or an object".format(var)
            )

    for k in dataset.data_vars.values():
        check_dtype(k)
```

`np.datetime64[.]` and `np.bool` are not numbers:

```python
>>> np.issubdtype(np.dtype('datetime64[D]'), np.number)
False
>>> np.issubdtype(np.dtype('bool'), np.number)
False
```
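A relaxed check would additionally admit these kinds. A hedged sketch of such a predicate (my own illustration, not the actual xarray code; string/unicode kinds are tested via `dtype.kind` instead of xarray's `coding.strings` helper):

```python
import numpy as np

def is_appendable_dtype(dtype):
    """Accept numeric, datetime64/timedelta64, bool, fixed-width
    string/unicode, and object dtypes; reject everything else."""
    return (
        np.issubdtype(dtype, np.number)
        or np.issubdtype(dtype, np.datetime64)
        or np.issubdtype(dtype, np.timedelta64)
        or np.issubdtype(dtype, np.bool_)
        or dtype == object
        or dtype.kind in ("S", "U")
    )
```

With this predicate, the datetime64 and bool appends above would pass validation while structured dtypes would still be rejected.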

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 15:17:50) [Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None
xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: 4.7.12
pytest: 5.2.1
IPython: 7.8.0
sphinx: 2.2.0
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3480/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 23.258ms · About: xarray-datasette