html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4285#issuecomment-1282946332,https://api.github.com/repos/pydata/xarray/issues/4285,1282946332,IC_kwDOAMm_X85MeDUc,4133310,2022-10-18T20:11:30Z,2022-10-18T20:12:51Z,NONE,"Hi All,

Thank you for the detailed discussion, and thank you @TomNicholas for pointing it out to me. I read the thread last week and have been digesting it. Many of the details go over my head, and I will keep re-reading them to develop a better understanding of the problem.

Two weeks ago I started working part-time on [CloudDrift](https://github.com/cloud-drift/clouddrift). This is an NSF EarthCube-funded project led by @selipot. @philippemiron was the lead developer in the first year of the project; he laid the foundation for the data structure that we need and wrote the example notebooks. The project's purpose is to make working with Lagrangian data (primarily ocean data, but generalizable to other kinds) easier for the scientists who consume it, while also optimizing how such data are stored. This is use case 1 in Tom's list of use cases [here](https://github.com/pydata/xarray/issues/4285#issuecomment-1211197176).

CloudDrift currently provides an implementation of a `RaggedArray` class. Once instantiated with user-provided data (a collection of variable-length arrays, supplied either manually or via dataset-specific adapters), this class lets you obtain either an `awkward.Array` or an `xarray.Dataset`, and from there write a Parquet file (via awkward) or a NetCDF file (via xarray). On either end (awkward or xarray) you get the indexing conveniences of those libraries, and once indexed you get the NumPy set of functionality. So `RaggedArray` serves as an intermediate structure that gets you to `awkward.Array` or `xarray.Dataset` representations of the data, but it does not itself wrap either.

Other goals of the project include providing example and tutorial notebooks, writing adapters for canonical ocean Lagrangian datasets, writing methods for oceanographic diagnostics, and more general developer/scientist advocacy work.

I am very much interested in making our `RaggedArray` class more generally useful in other fields and use cases. I am also interested in designing and implementing it toward a closer integration with xarray, since there seems to be an appetite for that. `clouddrift.RaggedArray` becoming part of xarray (via core, contrib, or otherwise) would be a success story for us. However, I will need help from all of you here, given your deep understanding of the internals of awkward and xarray, to make it work. I'll be paid half of my day-job salary to work on this for the next two years, so at least you know that somebody will be committing time to it; but again, I will need guidance.

What do you think should be the next step? Should we plan a video call to explore options?","{""total_count"": 2, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 2, ""eyes"": 0}",,667864088
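[Editorial note] To make the `RaggedArray` round trip described in the comment above concrete, here is a minimal, hedged sketch; it is not CloudDrift's actual `RaggedArray` API, and the dimension and variable names (`obs`, `traj`, `rowsize`) are illustrative assumptions. It flattens a collection of variable-length arrays into one buffer that can be viewed either as an `xarray.Dataset` or as an `awkward.Array`.

```python
# Minimal illustrative sketch -- not CloudDrift's RaggedArray implementation.
import awkward as ak
import numpy as np
import xarray as xr

# A ragged collection: three "trajectories" of different lengths.
rows = [np.array([1.0, 2.0, 3.0]), np.array([4.0]), np.array([5.0, 6.0])]

# Contiguous ragged layout: one flat buffer plus a per-row length.
values = np.concatenate(rows)
rowsize = np.array([len(r) for r in rows])

# xarray view: the flat buffer on an "obs" dimension, row lengths on "traj".
ds = xr.Dataset({"var": ("obs", values), "rowsize": ("traj", rowsize)})

# awkward view: the same buffer re-nested into variable-length lists.
arr = ak.unflatten(values, rowsize)
print(arr.tolist())  # [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
```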
https://github.com/pydata/xarray/issues/1809#issuecomment-486390856,https://api.github.com/repos/pydata/xarray/issues/1809,486390856,MDEyOklzc3VlQ29tbWVudDQ4NjM5MDg1Ng==,4133310,2019-04-24T19:22:52Z,2019-04-24T19:24:27Z,NONE,"I can't seem to recreate this with a minimal example; `xarray` roundtrips a NetCDF file with a `coordinates` attribute correctly:

```python
from netCDF4 import Dataset
import xarray as xr

with Dataset('test.nc', format='NETCDF4', mode='w') as nc:
    nc.createDimension('dim1', size=0)
    var = nc.createVariable('var1', 'f8', dimensions=('dim1',))
    var[:] = [1., 2., 3.]
    var.setncattr('coordinates', 'dim1')

xr.open_dataset('test.nc').to_netcdf('test2.nc')
```

There is something peculiar about how WRF handles the `coordinates` attribute, but I can't see anything off about it yet.

Interestingly, I can work around the WRF `coordinates` issue by setting `decode_coords=False` in `xarray.open_dataset()`. For example, this works:

```python
xr.open_dataset('wrfout_d01_2019-04-16_15_00_00', decode_coords=False).to_netcdf('test.nc')
```

while this doesn't:

```python
xr.open_dataset('wrfout_d01_2019-04-16_15_00_00').to_netcdf('test.nc')
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286072335
https://github.com/pydata/xarray/issues/1809#issuecomment-485210441,https://api.github.com/repos/pydata/xarray/issues/1809,485210441,MDEyOklzc3VlQ29tbWVudDQ4NTIxMDQ0MQ==,4133310,2019-04-21T01:09:04Z,2019-04-24T19:24:02Z,NONE,"I ran into this issue trying to roundtrip a WRF output file. It looks like xarray raises an error for any NetCDF file that has variables with a `coordinates` attribute:

```python
# These coordinates are saved according to CF conventions
for var_name, coord_names in variable_coordinates.items():
    attrs = variables[var_name].attrs
    if 'coordinates' in attrs:
        raise ValueError('cannot serialize coordinates because variable '
                         ""%s already has an attribute 'coordinates'"" % var_name)
    attrs['coordinates'] = ' '.join(map(str, coord_names))
```

~~I don't understand either this choice or the proposed solution in this issue (deleting all `coordinates` attributes).~~ Variables with a `coordinates` attribute are [CF conforming](http://cfconventions.org/Conformance/conformance.html), so xarray should be able to play along with this.

~~The solution that makes more sense to me is to raise a warning and overwrite or ignore the `coordinates` attribute if it is already present. A later step of the fix could even be a keyword argument that lets the user choose whether to overwrite or ignore ""conflicting"" attributes.~~

Or perhaps I'm missing something obvious here... Let me know either way. I'd be happy to make a PR to patch this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286072335
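[Editorial note] As a footnote to the last two comments: one user-side way to act on the "ignore the conflicting attribute" idea is sketched below. This is a hedged illustration only, not the library-side fix proposed in the comment above and not code from the thread; it combines the `decode_coords=False` workaround with stripping the pre-existing `coordinates` attributes before writing. The WRF filename is the one used in the comments, and note that dropping the attribute discards that piece of CF metadata from the output.

```python
# Hedged user-side sketch only -- not the proposed change to xarray itself.
import xarray as xr

# Open without coordinate decoding (the workaround from the 2019-04-24 comment),
# then drop any pre-existing CF 'coordinates' attributes so that to_netcdf()
# is free to write its own.
ds = xr.open_dataset('wrfout_d01_2019-04-16_15_00_00', decode_coords=False)
for var in ds.variables.values():
    var.attrs.pop('coordinates', None)
ds.to_netcdf('test.nc')
```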