html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7065#issuecomment-1260899163,https://api.github.com/repos/pydata/xarray/issues/7065,1260899163,IC_kwDOAMm_X85LJ8tb,12760310,2022-09-28T13:16:13Z,2022-09-28T13:16:13Z,NONE,"Hey @benbovy, sorry to resurrect this post again, but today I'm seeing the same issue, and for the life of me I cannot understand what difference in this dataset is causing the latitude and longitude arrays to be duplicated...
 
<img width=""743"" alt=""Screen Shot 2022-09-28 at 15 13 27"" src=""https://user-images.githubusercontent.com/12760310/192788105-524f6d18-4b95-4b2e-bd5a-0513f99c4f39.png"">

If I try to merge these two datasets, I get one with `lat` and `lon` doubled in size.
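
For context, a hypothetical reproduction of this kind of doubling (made-up data; assuming a coordinate dtype mismatch like the one discussed further down this thread):

```python
import numpy as np
import xarray as xr

# identical-looking coordinates stored with different dtypes
a = xr.Dataset(coords={'lat': np.array([45.8, 45.9], dtype=np.float32)})
b = xr.Dataset(coords={'lat': np.array([45.8, 45.9], dtype=np.float64)})

# the float32 values don't round-trip to the same float64 labels, so the
# default outer join sees four distinct coordinates and lat doubles in size
merged = xr.merge([a, b])
print(merged.sizes['lat'])  # 4
```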
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1381955373
https://github.com/pydata/xarray/issues/7065#issuecomment-1255092548,https://api.github.com/repos/pydata/xarray/issues/7065,1255092548,IC_kwDOAMm_X85KzzFE,12760310,2022-09-22T14:17:07Z,2022-09-22T14:17:17Z,NONE,"> Actually there's another conversion when you reuse an xarray dimension coordinate in array-like computations:
> 
> ```python
> ds = xr.Dataset(coords={""x"": np.array([1.2, 1.3, 1.4], dtype=np.float16)})
> 
> # coordinate data is a wrapper around a pandas.Index object
> # (it keeps track of the original array dtype)
> ds.variables[""x""]._data
> # PandasIndexingAdapter(array=Float64Index([1.2001953125, 1.2998046875, 1.400390625], dtype='float64', name='x'), dtype=dtype('float16'))
> 
> # This coerces the pandas.Index back as a numpy array
> np.asarray(ds.x)
> # array([1.2, 1.3, 1.4], dtype=float16)
> 
> # which is equivalent to
> ds.variables[""x""]._data.__array__()
> # array([1.2, 1.3, 1.4], dtype=float16)
> ```
> 
> The round-trip conversion preserves the original dtype so different execution times may be expected.
> 
> I can't tell much why the results are different (how much are they different?), but I wouldn't be surprised if it's caused by rounding errors accumulated through the computation of a complex formula like haversine.

The differences are larger than I would expect (on the order of 0.1 in some variables), but they could be related to the fact that, when using different precisions, the closest grid points to the target point can change. This would eventually lead to a different value of the variable extracted from the original dataset.

Unfortunately I didn't have time to verify whether that was the case, but I think it is the only valid explanation, because the variables of the dataset themselves are untouched.

It is still puzzling: since the target points have a precision of e.g. (45.820497820, 13.003510004), I would expect the cast of the dataset coordinates from e.g. (45.8, 13.0) to preserve the trailing zeros (45.800000000, 13.000000000), so that the closest point should not change.
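
(A quick check suggests that assumption doesn't hold in `float16`, though: 45.8 has no exact representation at that precision.)

```python
import numpy as np

# 45.8 is not representable in float16; the stored value is off by ~0.0125
print(float(np.float16(45.8)))  # 45.8125
print(float(np.float16(13.0)))  # 13.0 (exact)
```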

Anyway, I think we're getting off-topic, thanks for the help :) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1381955373
https://github.com/pydata/xarray/issues/7065#issuecomment-1255026304,https://api.github.com/repos/pydata/xarray/issues/7065,1255026304,IC_kwDOAMm_X85Kzi6A,12760310,2022-09-22T13:28:17Z,2022-09-22T13:28:31Z,NONE,"Mmmm, that's weird, because the execution time is really different, and that would be hard to explain if all the arrays are cast to the same `dtype`.

Yeah, for the nearest lookup I already implemented ""my version"" of `BallTree`, but I thought the `sel` method was already using that under the hood... no? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1381955373
https://github.com/pydata/xarray/issues/7065#issuecomment-1254985357,https://api.github.com/repos/pydata/xarray/issues/7065,1254985357,IC_kwDOAMm_X85KzY6N,12760310,2022-09-22T12:56:35Z,2022-09-22T12:56:35Z,NONE,"Sorry, that brings me to another question that I never even considered.

As my latitude and longitude arrays in both datasets have a resolution of 0.1 degrees, wouldn't it make sense to use `np.float16` for both arrays?

From this dataset I'm extracting the closest points to a station inside a user-defined radius, doing something similar to

```python
# haversine computes the great-circle distance from each grid point to the station
ds['distances'] = haversine(station['lon'], station['lat'], ds.lon, ds.lat)
nearest = ds.where(ds['distances'] < 20, drop=True).copy()
```
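
For reference, `haversine` is nothing xarray-specific here; a minimal numpy sketch (assuming degrees in and kilometres out) would be something like:

```python
import numpy as np

def haversine(lon1, lat1, lon2, lat2, r=6371.0):
    # great-circle distance in km; works elementwise on numpy/xarray arrays
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))
```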
In theory, using 16-bit precision for the longitude and latitude arrays shouldn't change much, as the original coordinates are not supposed to have more than 0.1-degree precision, but the final results are still quite different...
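
(Note, though, that `float16` is coarser than it looks at these magnitudes; a quick check:)

```python
import numpy as np

# the gap between adjacent float16 values around 45 degrees is 0.03125,
# which is not negligible next to a 0.1-degree grid
print(np.spacing(np.float16(45.8)))  # 0.03125
```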

The thing is, if I use `float16` I can bring the computation time down from 6-7 seconds to 2 seconds.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1381955373
https://github.com/pydata/xarray/issues/7065#issuecomment-1254941693,https://api.github.com/repos/pydata/xarray/issues/7065,1254941693,IC_kwDOAMm_X85KzOP9,12760310,2022-09-22T12:17:10Z,2022-09-22T12:17:10Z,NONE,"@benbovy you have no idea how much time I spent trying to understand what the difference between the two datasets was... and I completely missed the `dtype` difference.
That could definitely explain the problem. 

The problem is that I tried to merge with `join='override'`, but it was still taking a long time. Probably I wasn't using the right argument order.
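
For reference, a minimal sketch of what I mean (hypothetical stand-ins for the two datasets being merged):

```python
import numpy as np
import xarray as xr

ds1 = xr.Dataset(coords={'lat': np.array([45.8, 45.9], dtype=np.float64)})
ds2 = xr.Dataset(coords={'lat': np.array([45.8, 45.9], dtype=np.float32)})

# with join='override', index coordinates are taken from the first object,
# so the argument order decides whose lat array (and dtype) wins
merged = xr.merge([ds1, ds2], join='override')
print(merged.lat.dtype)  # float64, from ds1
```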

Before closing, just out of curiosity: in this corner case, shouldn't `xarray` automatically cast the `lat`/`lon` coordinate arrays to the same `dtype`, or is that a dangerous assumption?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1381955373
https://github.com/pydata/xarray/issues/6904#issuecomment-1210383450,https://api.github.com/repos/pydata/xarray/issues/6904,1210383450,IC_kwDOAMm_X85IJPxa,12760310,2022-08-10T09:07:00Z,2022-08-10T09:07:00Z,NONE,"This is the most minimal working example I could come up with. You can try opening any netCDF file that you have.
I tested on a small one and it didn't reproduce the error, so it definitely only happens with large datasets, when the arrays are not loaded into memory. Unfortunately, as you need a large file, I cannot really attach one here.

```python
import xarray as xr
from tqdm.contrib.concurrent import process_map
import pprint

def main():
    # module-level global so the worker processes spawned by process_map
    # inherit the same (lazily loaded) dataset handle
    global ds
    ds = xr.open_dataset('input.nc')
    it = range(0, 5)
    results = []
    for i in it:
        results.append(compute(i))
    print(""------------Serial results-----------------"")
    pprint.pprint(results)
    results = process_map(compute, it, max_workers=6, chunksize=1, disable=True)
    print(""------------Parallel results-----------------"")
    pprint.pprint(results)


def compute(station):
    # every task reads the same grid point, so serial and parallel results
    # should be identical
    ds_point = ds.isel(lat=0, lon=0)
    return station, ds_point.t_2m_max.mean().item(), ds_point.t_2m_min.mean().item(), ds_point.lon.min().item(), ds_point.lat.min().item()


if __name__ == ""__main__"":
    main()
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1333650265
https://github.com/pydata/xarray/issues/6904#issuecomment-1210349031,https://api.github.com/repos/pydata/xarray/issues/6904,1210349031,IC_kwDOAMm_X85IJHXn,12760310,2022-08-10T08:38:31Z,2022-08-10T08:38:31Z,NONE,"> Re nearest, does it replicate with exact lookups?

OK, it seems to fail with exact lookups as well o.O
This is extremely weird.

I'm using 
```python
def compute():
    ds_point = ds.isel(lat=0, lon=0)
    return (ds_point.t_2m_med.mean().item(), ds_point.t_2m_min.mean().item(),
            ds_point.lon.min().item(), ds_point.lat.min().item())
```

Result for the serial version 

```python
[(10.469047546386719, 6.5044121742248535, 6.0, 48.0),
 (10.469047546386719, 6.5044121742248535, 6.0, 48.0),
 (10.469047546386719, 6.5044121742248535, 6.0, 48.0),
 (10.469047546386719, 6.5044121742248535, 6.0, 48.0),
 (10.469047546386719, 6.5044121742248535, 6.0, 48.0)]
```
As you would expect, all values are the same.

And for the parallel version with EXACTLY the same code

```python
[(7.968084812164307, 6.948009967803955, 6.0, 48.0),
 (7.825599193572998, 6.995675563812256, 6.0, 48.0),
 (8.894186019897461, 6.849221706390381, 6.0, 48.0),
 (8.901763916015625, 6.69615364074707, 6.0, 48.0),
 (9.164983749389648, 6.484694480895996, 6.0, 48.0)]
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1333650265
https://github.com/pydata/xarray/issues/6904#issuecomment-1210341456,https://api.github.com/repos/pydata/xarray/issues/6904,1210341456,IC_kwDOAMm_X85IJFhQ,12760310,2022-08-10T08:32:13Z,2022-08-10T08:32:13Z,NONE,"> ```python
> , lock=Lock()
> ```

That causes an error

```python
Error 11: Resource temporarily unavailable
```

Here is the complete traceback:
```python
concurrent.futures.process._RemoteTraceback: 
""""""
Traceback (most recent call last):
  File ""/var/models/miniconda3/lib/python3.8/concurrent/futures/process.py"", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File ""/var/models/miniconda3/lib/python3.8/concurrent/futures/process.py"", line 198, in _process_chunk
    return [fn(*args) for args in chunk]
  File ""/var/models/miniconda3/lib/python3.8/concurrent/futures/process.py"", line 198, in <listcomp>
    return [fn(*args) for args in chunk]
  File ""test_sel_bug.py"", line 58, in compute_clima
    return station, ds_point.t_2m_med.mean().item(), ds_point.t_2m_min.mean().item(), ds_point.lon.min().item(), ds_point.lat.min().item()
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/core/common.py"", line 58, in wrapped_func
    return self.reduce(func, dim, axis, skipna=skipna, **kwargs)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/core/dataarray.py"", line 2696, in reduce
    var = self.variable.reduce(func, dim, axis, keep_attrs, keepdims, **kwargs)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/core/variable.py"", line 1806, in reduce
    data = func(self.data, **kwargs)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/core/variable.py"", line 339, in data
    return self.values
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/core/variable.py"", line 512, in values
    return _as_array_or_item(self._data)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/core/variable.py"", line 252, in _as_array_or_item
    data = np.asarray(data)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/numpy/core/_asarray.py"", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/core/indexing.py"", line 552, in __array__
    self._ensure_cached()
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/core/indexing.py"", line 549, in _ensure_cached
    self.array = NumpyIndexingAdapter(np.asarray(self.array))
  File ""/var/models/miniconda3/lib/python3.8/site-packages/numpy/core/_asarray.py"", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/core/indexing.py"", line 522, in __array__
    return np.asarray(self.array, dtype=dtype)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/numpy/core/_asarray.py"", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/core/indexing.py"", line 423, in __array__
    return np.asarray(array[self.key], dtype=None)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/numpy/core/_asarray.py"", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/coding/variables.py"", line 70, in __array__
    return self.func(self.array)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/coding/variables.py"", line 137, in _apply_mask
    data = np.asarray(data, dtype=dtype)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/numpy/core/_asarray.py"", line 102, in asarray
    return array(a, dtype, copy=False, order=order)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/core/indexing.py"", line 423, in __array__
    return np.asarray(array[self.key], dtype=None)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/backends/netCDF4_.py"", line 93, in __getitem__
    return indexing.explicit_indexing_adapter(
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/core/indexing.py"", line 712, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File ""/var/models/miniconda3/lib/python3.8/site-packages/xarray/backends/netCDF4_.py"", line 106, in _getitem
    array = getitem(original_array, key)
  File ""src/netCDF4/_netCDF4.pyx"", line 4420, in netCDF4._netCDF4.Variable.__getitem__
  File ""src/netCDF4/_netCDF4.pyx"", line 5363, in netCDF4._netCDF4.Variable._get
  File ""src/netCDF4/_netCDF4.pyx"", line 1950, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: Resource temporarily unavailable
""""""
```

I think we may be heading in the right direction.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1333650265
https://github.com/pydata/xarray/issues/6904#issuecomment-1210285626,https://api.github.com/repos/pydata/xarray/issues/6904,1210285626,IC_kwDOAMm_X85II346,12760310,2022-08-10T07:41:20Z,2022-08-10T07:41:20Z,NONE,"> > Will that work in the same way if I still use `process_map`, which uses `concurrent.futures` under the hood?
> 
> Yes it should, as long as you're using multi-processing under the covers.
> 
> If you do multi-threading, then you would want to use `threading.Lock()`. But I believe we already apply a thread lock by default.

mmm ok I'll try and let you know.

BTW, is there any advantage or difference in terms of CPU and memory consumption between opening the file only once and letting every process open it? I'm asking because I thought opening it in every process was just plain stupid, but it seems to perform exactly the same, so maybe I'm just creating a problem where there is none.
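
In case it helps anyone later, here's a minimal sketch of the open-in-every-process variant (hypothetical names, standard library only):

```python
from concurrent.futures import ProcessPoolExecutor

import xarray as xr

ds = None

def init_worker(path):
    # runs once per worker: each process gets its own file handle, and
    # lazy loading keeps the open itself cheap
    global ds
    ds = xr.open_dataset(path)

def compute(i):
    point = ds.isel(lat=0, lon=0)
    return i, point.t_2m_max.mean().item()

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=6, initializer=init_worker,
                             initargs=('input.nc',)) as ex:
        results = list(ex.map(compute, range(5)))
    print(results)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1333650265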
https://github.com/pydata/xarray/issues/6904#issuecomment-1210238864,https://api.github.com/repos/pydata/xarray/issues/6904,1210238864,IC_kwDOAMm_X85IIseQ,12760310,2022-08-10T06:51:18Z,2022-08-10T06:51:18Z,NONE,"> Can you try explicitly passing in a multiprocessing lock into the `open_dataset()` constructor? Something like:
> 
> ```python
> from multiprocessing import Lock
> ds = xarray.open_dataset(file, lock=Lock())
> ```
> 
> (We automatically select appropriate locks if using Dask, but I'm not sure how we would do that more generally...)

ok that's a good shot.
Will that work in the same way if I still use `process_map`, which uses `concurrent.futures` under the hood?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1333650265
https://github.com/pydata/xarray/issues/6904#issuecomment-1210220238,https://api.github.com/repos/pydata/xarray/issues/6904,1210220238,IC_kwDOAMm_X85IIn7O,12760310,2022-08-10T06:30:06Z,2022-08-10T06:30:06Z,NONE,"> Re nearest, does it replicate with exact lookups?

I haven't tried yet because it doesn't really match my use case.
One idea I had was to build the list of points before starting the loop, creating an iterator with the slices from the xarray, and then passing this to the loop.
But I would end up using more data than necessary because I don't process all cases. 

Another thing I've noticed is that if the list of inputs is smaller than the chunksize, everything's good, probably because it reverts to the serial case, as only 1 worker is processing.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1333650265
https://github.com/pydata/xarray/issues/6904#issuecomment-1210174583,https://api.github.com/repos/pydata/xarray/issues/6904,1210174583,IC_kwDOAMm_X85IIcx3,12760310,2022-08-10T05:23:13Z,2022-08-10T05:24:24Z,NONE,"> That sounds quite unfriendly!
> 
> A couple of questions to reduce the size of the example, without providing any answers yet unfortunately:
> 
> * Is `process_map` from `tqdm`? Do you get the same behavior from the standard `multiprocessing`?

Yep, and yep (believe me, I've tried everything in desperation 😄)
> * What if we remove `method=nearest`?

Which method should I use then? I need the closest point 
> * Is the file a single netCDF file?

Yep

I can try to make a minimal example; however, in order to reproduce the issue, I think it's necessary to open a large dataset.

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1333650265
https://github.com/pydata/xarray/issues/6879#issuecomment-1206361835,https://api.github.com/repos/pydata/xarray/issues/6879,1206361835,IC_kwDOAMm_X85H557r,12760310,2022-08-05T11:53:15Z,2022-08-05T11:53:30Z,NONE,"> Thanks for the issue.
> 
> I would claim that this is the correct broadcasting behavior.
> 
> You could obtain your required result using
> 
> ```python
> ds_mask = xr.Dataset({""t_2m_min_anom"": mask, ""t_2m_min_anom_stations"": True})
> data.where(ds_mask)
> ```

Hey, thanks for the workaround.
However, I'm still not convinced that this is the ""correct"" behaviour.
If `mask` has `lat` and `lon` as explicit coordinates, it should only be applied to variables that have those coordinates.

What is the use case for enlarging the 1-D array into a 3-D array with coordinates that it didn't have before?
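
To make the point concrete, here's a minimal sketch with made-up shapes mirroring my datasets:

```python
import numpy as np
import xarray as xr

data = xr.Dataset({
    't_2m_min_anom': (('lat', 'lon'), np.zeros((2, 3))),
    't_2m_min_anom_stations': (('stations',), np.zeros(4)),
})
mask = xr.DataArray(np.ones((2, 3), dtype=bool), dims=('lat', 'lon'))

# where() broadcasts the mask against every variable, so the 1-D stations
# variable comes back with lat/lon dimensions it never had
print(data.where(mask)['t_2m_min_anom_stations'].dims)
```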
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1329754426