html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/4700#issuecomment-765561903,https://api.github.com/repos/pydata/xarray/issues/4700,765561903,MDEyOklzc3VlQ29tbWVudDc2NTU2MTkwMw==,13301940,2021-01-22T17:13:39Z,2021-01-22T17:14:52Z,MEMBER,"> Yes, I'd say go ahead. (I just hope it's not too big of a performance hit for normal use cases.)

@mathause, I am noticing a performance hit even for the special use cases. Here's how I am doing the sampling:

```python
# Draw at most 20 distinct flat indices, then collect the Python types found there.
sample_indices = np.random.choice(array.size, size=min(20, array.size), replace=False)
native_dtypes = set(np.vectorize(type, otypes=[object])(array.ravel()[sample_indices]))
```

and here's the code snippet I tested this on:


```python
In [1]: import xarray as xr, numpy as np

In [2]: x = np.asarray(list(""abcdefghijklmnopqrstuvwxyz""), dtype=""object"")

In [3]: array = np.repeat(x, 5_000_000)

In [4]: array.size
Out[4]: 130000000

In [5]: array.dtype
Out[5]: dtype('O')
```

### Without sampling


```python
In [6]: %timeit xr.conventions._infer_dtype(array, ""test"")
7.63 s ± 515 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```


### With sampling


```python
In [15]: %timeit xr.conventions._infer_dtype(array, ""test"")
8.31 s ± 395 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
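
One possible contributor to the overhead, offered as an assumption rather than a profiled result: the legacy `np.random.choice(..., replace=False)` permutes the whole index range before keeping the first few entries, which is costly for 130 million elements. Below is a minimal sketch of the same sampling with the newer `np.random.default_rng().choice`, which should not need the full permutation for a small sample. `sample_native_types` and its parameters are hypothetical names, not part of xarray.

```python
import numpy as np


def sample_native_types(array, max_samples=20, seed=None):
    # Hypothetical helper for illustration only; not xarray's implementation.
    rng = np.random.default_rng(seed)
    flat = array.ravel()  # a view for contiguous arrays, so the data is not copied
    k = min(max_samples, flat.size)
    # Draw k distinct flat indices without replacement.
    idx = rng.choice(flat.size, size=k, replace=False)
    # Collect the Python types of the sampled elements.
    return {type(value) for value in flat[idx]}


# Smaller array than in the timing test above, same structure.
x = np.asarray(list('abcdefghijklmnopqrstuvwxyz'), dtype='object')
array = np.repeat(x, 1_000)
print(sample_native_types(array, seed=0))
```

Whether this closes the gap would need to be timed on the 130-million-element array; the sketch only shows the alternative call.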

I could be wrong, but the sampling doesn't seem to be worth it: in this test it was slightly slower (8.31 s vs. 7.63 s per loop).
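
Speed aside, a fixed-size sample can also miss rare types. The back-of-the-envelope sketch below gives the chance that a 20-element sample contains no element of a type that makes up a given fraction of the array. It assumes independent draws, which approximates sampling without replacement from a very large array, and `miss_probability` is a hypothetical helper, not part of xarray.

```python
def miss_probability(fraction, sample_size=20):
    # Probability that none of sample_size independent draws hits a type
    # that makes up `fraction` of the elements (with-replacement approximation).
    return (1.0 - fraction) ** sample_size


for fraction in (0.5, 0.1, 0.01):
    print(f'{fraction:>5}: {miss_probability(fraction):.3f}')
```

For a type that makes up 1% of the elements this comes out to roughly 82%, so a clean sample says little about rare mixed-in values.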
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,768981497
https://github.com/pydata/xarray/pull/4700#issuecomment-764028548,https://api.github.com/repos/pydata/xarray/issues/4700,764028548,MDEyOklzc3VlQ29tbWVudDc2NDAyODU0OA==,13301940,2021-01-20T23:36:43Z,2021-01-20T23:36:43Z,MEMBER,"> Also an array of this size is likely a dask array and there is already a performance warning on this. So I'd say go ahead.

@mathause, just to make sure I am not misinterpreting your comment, is this a go-ahead to sample the array to determine the types? :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,768981497
https://github.com/pydata/xarray/pull/4700#issuecomment-747220457,https://api.github.com/repos/pydata/xarray/issues/4700,747220457,MDEyOklzc3VlQ29tbWVudDc0NzIyMDQ1Nw==,13301940,2020-12-17T05:44:55Z,2020-12-17T05:44:55Z,MEMBER,"> Alternatives — not ideal ones — would be to wait until the main error is raised, or only test a subset of the values. 

I thought about taking a random sample from the array and checking the types on the sample only, but I wasn't confident about how representative the sample would be or how to deal with misleading, skewed samples. If anyone has thoughts on this, please let me know.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,768981497
https://github.com/pydata/xarray/pull/4700#issuecomment-746446912,https://api.github.com/repos/pydata/xarray/issues/4700,746446912,MDEyOklzc3VlQ29tbWVudDc0NjQ0NjkxMg==,13301940,2020-12-16T15:11:12Z,2020-12-16T15:18:18Z,MEMBER,"# Before

```python
In [2]: data = np.array([[""x"", 1], [""y"", 2]], dtype=""object"")

In [3]: xr.conventions._infer_dtype(data, 'test')
Out[3]: dtype('O')
```

As pointed out in #2620, this doesn't look problematic until the user tries to write the xarray object to disk, which fails with a very cryptic error message:

```python
In [7]: ds.to_netcdf('test.nc', engine='netcdf4')
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__setitem__()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._put()

TypeError: expected bytes, int found
```

# After

```python
In [2]: data = np.array([[""x"", 1], [""y"", 2]], dtype=""object"")

In [3]: xr.conventions._infer_dtype(data, 'test')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-addaab43c03a> in <module>
----> 1 xr.conventions._infer_dtype(data, 'test')

~/devel/pydata/xarray/xarray/conventions.py in _infer_dtype(array, name)
    142     native_dtypes = set(map(lambda x: type(x), array.flatten()))
    143     if len(native_dtypes) > 1:
--> 144         raise ValueError(
    145             ""unable to infer dtype on variable {!r}; object array ""
    146             ""contains mixed native types: {}"".format(

ValueError: unable to infer dtype on variable 'test'; object array contains mixed native types: str,int
```
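
To try the idea outside xarray, here is a minimal standalone sketch of the mixed-type check. `check_single_native_type` is a hypothetical stand-in that mirrors the check in the traceback above, not xarray's actual code; it uses `ravel()` rather than `flatten()` to avoid copying contiguous arrays, and it sorts the type names so the message is deterministic.

```python
import numpy as np


def check_single_native_type(array, name):
    # Hypothetical stand-in for illustration; not xarray's implementation.
    # Collect the Python type of every element in the object array.
    native_types = {type(value) for value in array.ravel()}
    if len(native_types) > 1:
        type_names = ', '.join(sorted(t.__name__ for t in native_types))
        raise ValueError(
            f'unable to infer dtype on variable {name!r}; object array '
            f'contains mixed native types: {type_names}'
        )


data = np.array([['x', 1], ['y', 2]], dtype='object')
try:
    check_single_native_type(data, 'test')
except ValueError as err:
    print(err)
```

This scans every element, so its cost grows with the array size.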

During I/O, the user now gets the same error:

```python
...
~/devel/pydata/xarray/xarray/conventions.py in ensure_dtype_not_object(var, name)
    223             data[missing] = fill_value
    224         else:
--> 225             data = _copy_with_dtype(data, dtype=_infer_dtype(data, name))
    226 
    227         assert data.dtype.kind != ""O"" or data.dtype.metadata

~/devel/pydata/xarray/xarray/conventions.py in _infer_dtype(array, name)
    142     native_dtypes = set(map(lambda x: type(x), array.flatten()))
    143     if len(native_dtypes) > 1:
--> 144         raise ValueError(
    145             ""unable to infer dtype on variable {!r}; object array ""
    146             ""contains mixed native types: {}"".format(

ValueError: unable to infer dtype on variable 'test'; object array contains mixed native types: str,int
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,768981497