issue_comments

4 rows where issue = 768981497 and user = 13301940 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
765561903 https://github.com/pydata/xarray/pull/4700#issuecomment-765561903 https://api.github.com/repos/pydata/xarray/issues/4700 MDEyOklzc3VlQ29tbWVudDc2NTU2MTkwMw== andersy005 13301940 2021-01-22T17:13:39Z 2021-01-22T17:14:52Z MEMBER

> Yes, I'd say go ahead. (I just hope it's not too big of a performance hit for normal use cases.)

@mathause, I am noticing a performance hit even for the special use cases. Here's how I am doing the sampling:

```python
sample_indices = np.random.choice(array.size, size=min(20, array.size), replace=False)
native_dtypes = set(np.vectorize(type, otypes=[object])(array.ravel()[sample_indices]))
```

and here's the code snippet I tested this on:

```python
In [1]: import xarray as xr, numpy as np

In [2]: x = np.asarray(list("abcdefghijklmnopqrstuvwxyz"), dtype="object")

In [3]: array = np.repeat(x, 5_000_000)

In [4]: array.size
Out[4]: 130000000

In [5]: array.dtype
Out[5]: dtype('O')
```

Without sampling

```python
In [6]: %timeit xr.conventions._infer_dtype(array, "test")
7.63 s ± 515 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

With sampling

```python
In [15]: %timeit xr.conventions._infer_dtype(array, "test")
8.31 s ± 395 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

I could be wrong, but the sampling doesn't seem to be worth it.
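A minimal sketch of another way to draw the sample, assuming NumPy 1.17 or later for `np.random.default_rng`. The helper name `sampled_native_types` and its parameters are hypothetical and not part of xarray. As far as I know, the legacy `np.random.choice(n, size=k, replace=False)` permutes the whole population before taking `k` entries, so its cost grows with `array.size`. Whether that accounts for the timings above was not measured here.

```python
import numpy as np


def sampled_native_types(array, sample_size=20, seed=None):
    """Return the set of Python types seen in a random sample of `array`.

    Hypothetical helper for illustration only; not part of xarray.
    """
    flat = array.ravel()
    if flat.size == 0:
        return set()
    rng = np.random.default_rng(seed)
    k = min(sample_size, flat.size)
    # Draw k distinct flat indices without replacement.
    indices = rng.choice(flat.size, size=k, replace=False)
    return {type(item) for item in flat[indices]}


homogeneous = np.asarray(list("abcdefghijklmnopqrstuvwxyz"), dtype=object)
mixed = np.array(["x", 1, "y", 2], dtype=object)

print(sampled_native_types(homogeneous, seed=0))
print(sampled_native_types(mixed, seed=0))
```

Because `mixed` has fewer elements than `sample_size`, every element is inspected and both types are reported. For larger arrays a sample can still miss a rare type.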

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Raise an informative error message when object array has mixed types 768981497
764028548 https://github.com/pydata/xarray/pull/4700#issuecomment-764028548 https://api.github.com/repos/pydata/xarray/issues/4700 MDEyOklzc3VlQ29tbWVudDc2NDAyODU0OA== andersy005 13301940 2021-01-20T23:36:43Z 2021-01-20T23:36:43Z MEMBER

> Also an array of this size is likely a dask array and there is already a performance warning on this. So I'd say go ahead.

@mathause, just to make sure I am not misinterpreting your comment, is this a go ahead to sampling the array to determine the types? :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Raise an informative error message when object array has mixed types 768981497
747220457 https://github.com/pydata/xarray/pull/4700#issuecomment-747220457 https://api.github.com/repos/pydata/xarray/issues/4700 MDEyOklzc3VlQ29tbWVudDc0NzIyMDQ1Nw== andersy005 13301940 2020-12-17T05:44:55Z 2020-12-17T05:44:55Z MEMBER

> Alternatives, not ideal ones, would be to wait until the main error is raised, or only test a subset of the values.

I thought of taking a random sample from the array and checking the types on the sample only, but I wasn't so confident about how representative this sample would be and/or how to deal with misleading, skewed samples. If anyone has thoughts on this, please let me know.
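To put a number on the representativeness concern, here is a small sketch that computes the chance that a uniform random sample, drawn without replacement, contains none of the elements of a minority type. The function name `miss_probability` and the example figures are illustrative assumptions, not values from this thread.

```python
from math import comb


def miss_probability(n_total, n_odd, sample_size):
    """Probability that a sample without replacement holds none of the
    `n_odd` differently-typed elements among `n_total` elements."""
    k = min(sample_size, n_total)
    if n_odd <= 0:
        return 1.0  # nothing to find, so a "miss" is certain
    if k > n_total - n_odd:
        return 0.0  # sample is too large to avoid every odd element
    # Hypergeometric probability of drawing zero odd elements.
    return comb(n_total - n_odd, k) / comb(n_total, k)


# Hypothetical case: 10 odd-typed values among 1000, sample of 20.
print(miss_probability(1000, 10, 20))
```

In this hypothetical case the sample misses the minority type most of the time, so a small fixed-size sample gives weak guarantees when mixed types are rare.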

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Raise an informative error message when object array has mixed types 768981497
746446912 https://github.com/pydata/xarray/pull/4700#issuecomment-746446912 https://api.github.com/repos/pydata/xarray/issues/4700 MDEyOklzc3VlQ29tbWVudDc0NjQ0NjkxMg== andersy005 13301940 2020-12-16T15:11:12Z 2020-12-16T15:18:18Z MEMBER

Before

```python
In [2]: data = np.array([["x", 1], ["y", 2]], dtype="object")

In [3]: xr.conventions._infer_dtype(data, 'test')
Out[3]: dtype('O')
```

As pointed out in #2620, this doesn't seem problematic until the user tries writing the xarray object to disk. This results in a very cryptic error message:

```python
In [7]: ds.to_netcdf('test.nc', engine='netcdf4')

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__setitem__()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._put()

TypeError: expected bytes, int found
```

After

```python
In [2]: data = np.array([["x", 1], ["y", 2]], dtype="object")

In [3]: xr.conventions._infer_dtype(data, 'test')

ValueError                                Traceback (most recent call last)
<ipython-input-3-addaab43c03a> in <module>
----> 1 xr.conventions._infer_dtype(data, 'test')

~/devel/pydata/xarray/xarray/conventions.py in _infer_dtype(array, name)
    142         native_dtypes = set(map(lambda x: type(x), array.flatten()))
    143         if len(native_dtypes) > 1:
--> 144             raise ValueError(
    145                 "unable to infer dtype on variable {!r}; object array "
    146                 "contains mixed native types: {}".format(

ValueError: unable to infer dtype on variable 'test'; object array contains mixed native types: str,int
```

During I/O, the user gets:

```python
...
~/devel/pydata/xarray/xarray/conventions.py in ensure_dtype_not_object(var, name)
    223                 data[missing] = fill_value
    224             else:
--> 225                 data = _copy_with_dtype(data, dtype=_infer_dtype(data, name))
    226
    227         assert data.dtype.kind != "O" or data.dtype.metadata

~/devel/pydata/xarray/xarray/conventions.py in _infer_dtype(array, name)
    142         native_dtypes = set(map(lambda x: type(x), array.flatten()))
    143         if len(native_dtypes) > 1:
--> 144             raise ValueError(
    145                 "unable to infer dtype on variable {!r}; object array "
    146                 "contains mixed native types: {}".format(

ValueError: unable to infer dtype on variable 'test'; object array contains mixed native types: str,int
```
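For readers who want to try the check outside xarray, the sketch below is a stand-alone approximation of the lines visible in the traceback above. The function name `check_object_array_types` is hypothetical, and the real `_infer_dtype` does more than this; only the type scan and the error message are mirrored here.

```python
import numpy as np


def check_object_array_types(array, name):
    """Raise ValueError if an object array holds more than one Python type.

    Hypothetical stand-alone approximation; not xarray's implementation.
    """
    if array.dtype.kind != "O":
        return  # only object arrays can hold mixed native types
    # Full scan: collect the Python type of every element.
    native_types = {type(item) for item in array.ravel()}
    if len(native_types) > 1:
        type_names = ", ".join(sorted(t.__name__ for t in native_types))
        raise ValueError(
            f"unable to infer dtype on variable {name!r}; object array "
            f"contains mixed native types: {type_names}"
        )


data = np.array([["x", 1], ["y", 2]], dtype="object")
try:
    check_object_array_types(data, "test")
except ValueError as err:
    print(err)
```

The scan visits every element, which is the cost that the sampling discussion in the later comments tries to avoid.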

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Raise an informative error message when object array has mixed types 768981497

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);