Comment https://github.com/pydata/xarray/issues/7065#issuecomment-1255073449 (MEMBER, 2022-09-22T14:04:22Z):

Actually there's another conversion when you reuse an xarray dimension coordinate in array-like computations:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={"x": np.array([1.2, 1.3, 1.4], dtype=np.float16)})

# coordinate data is a wrapper around a pandas.Index object
# (it keeps track of the original array dtype)
ds.variables["x"]._data
# PandasIndexingAdapter(array=Float64Index([1.2001953125, 1.2998046875, 1.400390625], dtype='float64', name='x'), dtype=dtype('float16'))

# this coerces the pandas.Index back to a numpy array
np.asarray(ds.x)
# array([1.2, 1.3, 1.4], dtype=float16)

# which is equivalent to
ds.variables["x"]._data.__array__()
# array([1.2, 1.3, 1.4], dtype=float16)
```

The round-trip conversion preserves the original dtype, so different execution times may be expected. I can't say much about why the results differ (how large is the difference?), but I wouldn't be surprised if it's caused by rounding errors accumulating through the computation of a complex formula like the haversine distance.

Comment https://github.com/pydata/xarray/issues/7065#issuecomment-1255014363 (MEMBER, 2022-09-22T13:19:23Z):

> As my latitude and longitude arrays in both datasets have a resolution of 0.1 degrees, wouldn't it make sense to use np.float16 for both arrays?

I don't think so (at least not currently). The numpy arrays are by default converted to `pandas.Index` objects for each dimension coordinate, and for floats there's only `pandas.Float64Index`. It looks like it will be deprecated in favor of `pandas.NumericIndex`, which supports more dtypes, but still [I don't see support for 16-bit floats](https://github.com/pandas-dev/pandas/blob/main/pandas/core/indexes/numeric.py#L95-L108).

Regarding your nearest lat/lon point data selection problem, this is something that could probably be better solved using more specific (custom) indexes like the ones available in [xoak](https://xoak.readthedocs.io/en/latest/). Xoak only supports point-wise selection at the moment, though.

Comment https://github.com/pydata/xarray/issues/7065#issuecomment-1254983291 (MEMBER, 2022-09-22T12:54:43Z):

> The problem is that I tried to merge with join='override' but it was still taking a long time. Probably I wasn't using the right order.

Not 100% sure, but maybe `xr.merge` loads all the data from your datasets and performs some equality checks. Perhaps you could see how much time it takes after loading all the data, or try different `xr.merge(compat=...)` values?

> Before closing, just a curiosity: in this corner case shouldn't xarray automatically cast the lat/lon coordinate arrays to the same dtype, or is that a dangerous assumption?

We already do this for label indexers that are passed to `.sel()`.
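For example (a minimal sketch, not from the original thread; the variable names and values are made up), selecting with a plain Python float (float64) on a float32 coordinate works because the label is cast to the coordinate's dtype before the index lookup:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"t2m": ("lat", np.array([280.0, 281.0, 282.0]))},
    coords={"lat": np.array([10.1, 10.2, 10.3], dtype=np.float32)},
)

# 10.2 is a float64 label; xarray casts it to the float32 dtype of the "lat"
# coordinate before the lookup, so the selection matches exactly instead of
# failing because of float32/float64 rounding differences.
ds.sel(lat=10.2)
```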
However, for alignment I think it would require re-building an index for every cast coordinate, which may be expensive and is probably not ideal if done automatically.

Comment https://github.com/pydata/xarray/issues/7065#issuecomment-1254862548 (MEMBER, 2022-09-22T10:58:10Z):

Hi @guidocioni.

I see that the longitude and latitude coordinates both have a different `dtype` in the two input datasets, which likely explains why you have many NaNs and larger sizes (almost 2x) for the `lat` and `lon` dimensions in the resulting dataset.

Here's a small reproducible example:

```python
import numpy as np
import xarray as xr

lat = np.random.uniform(0, 40, size=100)
lon = np.random.uniform(0, 180, size=100)

ds1 = xr.Dataset(
    coords={"lon": lon.astype(np.float32), "lat": lat.astype(np.float32)}
)
ds2 = xr.Dataset(
    coords={"lon": lon, "lat": lat}
)

ds1.indexes["lat"].equals(ds2.indexes["lat"])
# False

xr.merge([ds1, ds2], join="exact")
# ValueError: cannot align objects with join='exact' where index/labels/sizes
# are not equal along these coordinates (dimensions): 'lon' ('lon',)
```

If the coordinate labels differ only by their encoding, you could use `xr.merge([ds1, ds2], join="override")`, which will take the coordinates from the first object.
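A quick sketch of that workaround, continuing the `ds1`/`ds2` objects from the snippet above (the expected output is an assumption based on the description of `join="override"` taking coordinates from the first object):

```python
# join="override" skips label-based alignment and reuses the (lon, lat)
# coordinates from the first object, so the merge succeeds and the dimensions
# are not padded with NaNs.
merged = xr.merge([ds1, ds2], join="override")

merged.sizes["lon"], merged.sizes["lat"]
# (100, 100)  -- no doubled dimensions

merged["lon"].dtype
# dtype('float32')  -- coordinate labels taken from ds1, the first object
```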