
issues


3 rows where type = "issue" and user = 463809 sorted by updated_at descending

id 789410367 · node_id MDU6SXNzdWU3ODk0MTAzNjc= · number 4826 · title: Reading and writing a zarr dataset multiple times casts bools to int8 · user: amatsukawa (463809) · state: closed · comments: 10 · created_at: 2021-01-19T22:02:15Z · updated_at: 2023-04-10T09:26:27Z · closed_at: 2023-04-10T09:26:27Z · author_association: CONTRIBUTOR

What happened:

Reading and writing a zarr dataset multiple times into different paths changes bool-dtype arrays to int8. I think this issue is related to #2937.

What you expected to happen:

My array's dtype in numpy/dask should not change, even if certain storage backends store dtypes a certain way.

Minimal Complete Verifiable Example:

```python
import xarray as xr
import numpy as np

ds = xr.Dataset({
    "bool_field": xr.DataArray(
        np.random.randn(5) < 0.5,
        dims=('g'),
        coords={'g': np.arange(5)}
    )
})
ds.to_zarr('test.zarr', mode="w")

d2 = xr.open_zarr('test.zarr')
print(d2.bool_field.dtype)
print(d2.bool_field.encoding)
d2.to_zarr("test2.zarr", mode="w")

d3 = xr.open_zarr('test2.zarr')
print(d3.bool_field.dtype)
```

The above snippet prints the following. In `d3`, the dtype of `bool_field` is `int8`, presumably because `d3` inherited `d2`'s `encoding`, which says `int8`, despite the array having a `bool` dtype.

```
bool
{'chunks': (5,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'dtype': dtype('int8')}
int8
```

Anything else we need to know?:

The current workaround is to set encodings explicitly, which fixes the problem:

```python
encoding = {k: {"dtype": d2[k].dtype} for k in d2}
d2.to_zarr('test2.zarr', mode="w", encoding=encoding)
```
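As an alternative to spelling out the target dtypes, one could drop the stale `dtype` entry from each variable's encoding before re-writing, so that the on-disk dtype is derived from the in-memory array. This is a sketch: the helper name `strip_dtype_encoding` is illustrative, not xarray API; only the per-variable `.encoding` dict it operates on is documented xarray behavior.

```python
def strip_dtype_encoding(encodings):
    """Remove any stale 'dtype' entry from per-variable encodings so a
    subsequent to_zarr() infers the dtype from the in-memory array."""
    return {
        var: {k: v for k, v in enc.items() if k != "dtype"}
        for var, enc in encodings.items()
    }

# Plain dicts standing in for {name: d2[name].encoding} from the example:
encodings = {"bool_field": {"chunks": (5,), "dtype": "int8", "filters": None}}
cleaned = strip_dtype_encoding(encodings)
```

In practice this could be applied as `encoding=strip_dtype_encoding({k: d2[k].encoding for k in d2})`, or equivalently by deleting the `'dtype'` key from each variable's `.encoding` before calling `to_zarr`.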

Environment:

Output of <tt>xr.show_versions()</tt>

```
# I'll update with the full output of xr.show_versions() soon.
In [4]: xr.__version__
Out[4]: '0.16.2'

In [2]: zarr.__version__
Out[2]: '2.6.1'
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4826/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
id 484098286 · node_id MDU6SXNzdWU0ODQwOTgyODY= · number 3242 · title: An `asfreq` method without `resample`, and clarify or improve resample().asfreq() behavior for down-sampling · user: amatsukawa (463809) · state: open · comments: 2 · created_at: 2019-08-22T16:33:32Z · updated_at: 2022-04-18T16:01:07Z · author_association: CONTRIBUTOR

MCVE Code Sample

```python
import numpy as np
import xarray as xr
import pandas as pd

data = np.random.random(300)

# Make a time grid that doesn't start exactly on the hour.
time = pd.date_range('2019-01-01', periods=300, freq='T') + pd.Timedelta('3T')
time
# DatetimeIndex(['2019-01-01 00:03:00', '2019-01-01 00:04:00',
#                '2019-01-01 00:05:00', '2019-01-01 00:06:00',
#                '2019-01-01 00:07:00', '2019-01-01 00:08:00',
#                '2019-01-01 00:09:00', '2019-01-01 00:10:00',
#                '2019-01-01 00:11:00', '2019-01-01 00:12:00',
#                ...
#                '2019-01-01 04:53:00', '2019-01-01 04:54:00',
#                '2019-01-01 04:55:00', '2019-01-01 04:56:00',
#                '2019-01-01 04:57:00', '2019-01-01 04:58:00',
#                '2019-01-01 04:59:00', '2019-01-01 05:00:00',
#                '2019-01-01 05:01:00', '2019-01-01 05:02:00'],
#               dtype='datetime64[ns]', length=300, freq='T')

da = xr.DataArray(data, dims=['time'], coords={'time': time})
resampled = da.resample(time='H').asfreq()
resampled
# <xarray.DataArray (time: 6)>
# array([0.478601, 0.488425, 0.496322, 0.479256, 0.523395, 0.201718])
# Coordinates:
#   * time     (time) datetime64[ns] 2019-01-01 ... 2019-01-01T05:00:00
```

The value is actually the mean over the time window; e.g., the third value is:

```python
da.loc['2019-01-01T02:00:00':'2019-01-01T02:59:00'].mean()
# <xarray.DataArray ()>
# array(0.496322)
```

Expected Output

The docs say: "Return values of original object at the new up-sampling frequency; essentially a re-index with new times set to NaN."

I suppose this doc is not technically wrong, since upon careful reading, I realize it does not define a behavior for down-sampling. But it's easy to (1) assume the same behavior (reindexing) for down-sampling as for up-sampling, and/or (2) expect behavior similar to df.asfreq() in pandas.

Problem Description

I would argue for an `asfreq` method without resampling that matches the pandas behavior, which, AFAIK, is to reindex starting at the first timestamp at the specified interval.

```python
df = pd.DataFrame(da, index=time)
df.asfreq('H')
#                             0
# 2019-01-01 00:03:00  0.065304
# 2019-01-01 01:03:00  0.325814
# 2019-01-01 02:03:00  0.841201
# 2019-01-01 03:03:00  0.610266
# 2019-01-01 04:03:00  0.613906
```

This can currently easily be achieved, so it's not a blocker.

```python
da.reindex(time=pd.date_range(da.time[0].values, da.time[-1].values, freq='H'))
# <xarray.DataArray (time: 5)>
# array([0.065304, 0.325814, 0.841201, 0.610266, 0.613906])
# Coordinates:
#   * time     (time) datetime64[ns] 2019-01-01T00:03:00 ... 2019-01-01T04:03:00
```

The reason I argue for asfreq functionality outside of resampling is that asfreq(freq) in pandas is purely a reindex, compared to e.g. resample(freq).first(), which would give you a different time index.
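The "purely a reindex" claim can be sketched directly in pandas (a sketch assuming an hourly grid anchored at the first timestamp, mirroring the `df.asfreq('H')` output above, with modern `min`/`h` frequency aliases):

```python
import numpy as np
import pandas as pd

time = pd.date_range("2019-01-01 00:03", periods=300, freq="min")
s = pd.Series(np.arange(300.0), index=time)

# Anchor the hourly grid at the first timestamp and reindex -- no binning.
grid = pd.date_range(time[0], time[-1], freq="h")
reindexed = s.reindex(grid)

# pandas' asfreq is exactly this reindex: same labels, same values.
assert reindexed.equals(s.asfreq("h"))
# Every label keeps the off-the-hour :03 offset of the first timestamp.
assert list(reindexed.index.minute) == [3, 3, 3, 3, 3]
```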

Output of xr.show_versions()

Still on Python 2.7, show_versions actually throws an exception because some HDF5 library doesn't have a magic property. I don't think that detail is relevant here, though.

```
>>> xr.__version__
u'0.11.3'
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3242/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
id 516725099 · node_id MDU6SXNzdWU1MTY3MjUwOTk= · number 3480 · title: Allow appending non-numerical types to zarr arrays. · user: amatsukawa (463809) · state: closed · comments: 0 · created_at: 2019-11-02T21:20:53Z · updated_at: 2019-11-13T15:55:33Z · closed_at: 2019-11-13T15:55:33Z · author_association: CONTRIBUTOR

MCVE Code Sample

Zarr itself allows appending np.datetime64 and np.bool types.

```python
path = 'tmp/test.zarr'
z1 = zarr.open(path, mode='w', shape=(10,), chunks=(10,), dtype='M8[D]')
z1[:] = '1990-01-01'
z2 = zarr.open(path, mode='a')
a = np.array(['1992-01-01'] * 10, dtype='datetime64[D]')
z2.append(a)
# (20,)
z2
# <zarr.core.Array (20,) datetime64[D]>
```

But its equivalent in xarray throws an error:

```python
ds = xr.Dataset(
    {'y': (('x',), np.array(['1991-01-01'] * 10, dtype='datetime64[D]'))}
)
ds.to_zarr('tmp/test_xr.zarr', mode='w')
# <xarray.backends.zarr.ZarrStore object at 0x31f403170>

ds2 = xr.Dataset(
    {'y': (('x',), np.array(['1992-01-01'] * 10, dtype='datetime64[D]'))}
)
ds2.to_zarr('tmp/test_xr.zarr', mode='a', append_dim='x')
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/core/dataset.py", line 1616, in to_zarr
    append_dim=append_dim,
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 1304, in to_zarr
    _validate_datatypes_for_zarr_append(dataset)
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 1249, in _validate_datatypes_for_zarr_append
    check_dtype(k)
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 1245, in check_dtype
    "unicode string or an object".format(var)
ValueError: Invalid dtype for data variable: <xarray.DataArray 'y' (x: 10)>
array(['1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
Dimensions without coordinates: x
dtype must be a subtype of number, a fixed sized string, a fixed size
unicode string or an object
```

Expected Output

The append should succeed.

Problem Description

This function in xarray/backends/api.py is too strict on types:

```python
def _validate_datatypes_for_zarr_append(dataset):
    """DataArray.name and Dataset keys must be a string or None"""

    def check_dtype(var):
        if (
            not np.issubdtype(var.dtype, np.number)
            and not coding.strings.is_unicode_dtype(var.dtype)
            and not var.dtype == object
        ):
            # and not re.match('^bytes[1-9]+$', var.dtype.name)):
            raise ValueError(
                "Invalid dtype for data variable: {} "
                "dtype must be a subtype of number, "
                "a fixed sized string, a fixed size "
                "unicode string or an object".format(var)
            )

    for k in dataset.data_vars.values():
        check_dtype(k)
```

np.datetime64[.] and np.bool are not numbers:

```python
>>> np.issubdtype(np.dtype('datetime64[D]'), np.number)
False
>>> np.issubdtype(np.dtype('bool'), np.number)
False
```
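A relaxed check along these lines might additionally whitelist datetime64, timedelta64 and bool, which zarr itself can append. This is a sketch only; `is_zarr_appendable` is an illustrative name, not the fix that actually landed in xarray:

```python
import numpy as np

def is_zarr_appendable(dtype):
    """Accept the dtypes zarr itself can append: numbers, datetimes,
    timedeltas, booleans, fixed-size strings/bytes, and object."""
    dtype = np.dtype(dtype)
    return (
        np.issubdtype(dtype, np.number)
        or np.issubdtype(dtype, np.datetime64)
        or np.issubdtype(dtype, np.timedelta64)
        or dtype == np.bool_
        or dtype.kind in ("U", "S", "O")
    )

# The two dtypes rejected by the current check now pass:
assert is_zarr_appendable("datetime64[D]")
assert is_zarr_appendable(bool)
```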

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 15:17:50) [Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None
xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: 4.7.12
pytest: 5.2.1
IPython: 7.8.0
sphinx: 2.2.0
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3480/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue

```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```
Powered by Datasette · Queries took 23.286ms · About: xarray-datasette