issue_comments: 1410253782

html_url: https://github.com/pydata/xarray/pull/7494#issuecomment-1410253782
issue_url: https://api.github.com/repos/pydata/xarray/issues/7494
id: 1410253782
node_id: IC_kwDOAMm_X85UDsPW
user: 5179430
created_at: 2023-01-31T12:22:02Z
updated_at: 2023-01-31T12:26:37Z
author_association: CONTRIBUTOR
body:

> Thanks for the PR. However, does that actually make a difference? To me it looks like _contains_cftime_datetimes also only considers one element of the array.
>
> https://github.com/pydata/xarray/blob/b4515582ffc8b7f63632bfccd109d19889d00384/xarray/core/common.py#L1779-L1780

That line of code isn't what causes the performance bottleneck. The problem is the access to `var.data` in the function call, as I explained in the issue thread. You can verify this yourself by running this simple example before and after applying the changes in this PR:

```python
import numpy as np
import xarray as xr

str_array = np.arange(100000000).astype(str)
ds = xr.DataArray(dims=('x',), data=str_array).to_dataset(name='str_array')
ds = ds.chunk(x=10000)
ds['str_array'] = ds.str_array.astype('O')  # Needs to actually be object dtype to show the problem
ds.to_zarr('str_array.zarr')

%time xr.open_zarr('str_array.zarr')
```
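To show the point in isolation, here is a minimal sketch that does not use xarray at all. `LazyVariable`, `first_is_str_eager` and `first_is_str_lazy` are hypothetical names made up for this illustration, and the read counter only stands in for the cost of loading an array from storage. It is not how xarray implements this check. It only shows why inspecting a single element is cheap when you index first, but expensive when the full `.data` is materialised first.

```python
class LazyVariable:
    """Hypothetical lazily loaded variable; not an xarray class."""

    def __init__(self, values):
        self._values = list(values)  # pretend these values live on disk
        self.elements_read = 0       # counts simulated reads from storage

    @property
    def data(self):
        # Materialises every element, like loading a whole array into memory.
        self.elements_read += len(self._values)
        return list(self._values)

    def first(self):
        # Reads only the first element from storage.
        self.elements_read += 1
        return self._values[0]


def first_is_str_eager(var):
    # Loads the full array, then inspects a single element.
    return isinstance(var.data[0], str)


def first_is_str_lazy(var):
    # Inspects a single element without loading the rest.
    return isinstance(var.first(), str)


eager_var = LazyVariable(str(i) for i in range(100_000))
lazy_var = LazyVariable(str(i) for i in range(100_000))

print(first_is_str_eager(eager_var), eager_var.elements_read)  # True 100000
print(first_is_str_lazy(lazy_var), lazy_var.elements_read)     # True 1
```

Both functions give the same answer, but the eager version pays for every element just to look at one.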

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 1563270549