issue_comments


36 rows where user = 22566757, sorted by updated_at descending


issue (14)

  • Read grid mapping and bounds as coords 19
  • Only auxiliary coordinates are listed in nc variable attribute 3
  • Automatic dtype encoding in to_netcdf 2
  • Unstacking the diagonals of a sequence of matrices raises ValueError: IndexVariable objects must be 1-dimensional 2
  • xarray to and from iris 1
  • Allow passing _FillValue=False in encoding for vlen str variables. 1
  • Decode CF bounds to coords 1
  • setuptools-scm (3) 1
  • utility function to save complex values as a netCDF file 1
  • decode_cf doesn't work for ancillary_variables in attributes 1
  • setting variables named in CF attributes as coordinate variables 1
  • Can't remove coordinates attribute from DataArrays 1
  • Can't unstack concatenated DataArrays 1
  • Fix `utils.get_axis` with kwargs 1

user (1)

  • DWesl · 36

author_association (1)

  • CONTRIBUTOR 36
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1259913775 https://github.com/pydata/xarray/pull/7080#issuecomment-1259913775 https://api.github.com/repos/pydata/xarray/issues/7080 IC_kwDOAMm_X85LGMIv DWesl 22566757 2022-09-27T18:47:00Z 2022-09-27T18:47:00Z CONTRIBUTOR

I think the current default for two-dimensional plots is to try to re-use an existing axis if neither row nor col is set (implied by the documentation for the ax argument in the description of the DataArray.plot descriptor). Would this PR change that behavior, and if so, should that be documented?
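
A minimal sketch of the behavior in question (toy data; the axes-reuse claim is my reading of the docs, not something the PR confirms):

```python
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(4, 5), dims=("y", "x"))

fig, ax = plt.subplots()
# With neither row nor col (nor ax) given, the 2-D plot appears to land on
# the current axes -- here the freshly created ax -- rather than a new figure;
# da.plot(ax=ax) would make that target explicit.
da.plot()
```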

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix `utils.get_axis` with kwargs 1385143758
1259894192 https://github.com/pydata/xarray/issues/7076#issuecomment-1259894192 https://api.github.com/repos/pydata/xarray/issues/7076 IC_kwDOAMm_X85LGHWw DWesl 22566757 2022-09-27T18:28:26Z 2022-09-27T18:28:26Z CONTRIBUTOR

Fix confirmed, thank you.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Can't unstack concatenated DataArrays 1384465119
1115251379 https://github.com/pydata/xarray/issues/6439#issuecomment-1115251379 https://api.github.com/repos/pydata/xarray/issues/6439 IC_kwDOAMm_X85CeWKz DWesl 22566757 2022-05-02T19:00:47Z 2022-05-02T19:05:19Z CONTRIBUTOR

Oh, right, you suggested that a bit ago.

When I check out upstream/main in my local XArray repository root and run the example, it completes without error. When I fix the example to use the correct dimension, the implicit print on the last line shows nearly the same result as unstacked_diag from a few lines earlier.

Still not sure what fixed this, but since it's working, I don't care so much. I will wait for this to show up in a release. Thank you!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unstacking the diagonals of a sequence of matrices raises ValueError: IndexVariable objects must be 1-dimensional 1192449540
1115127711 https://github.com/pydata/xarray/issues/6439#issuecomment-1115127711 https://api.github.com/repos/pydata/xarray/issues/6439 IC_kwDOAMm_X85Cd3-f DWesl 22566757 2022-05-02T17:04:13Z 2022-05-02T17:45:56Z CONTRIBUTOR

Just a tip: You don't need any stacking for that. Just use an indexer with a new dim:

I am aware that I can extract the diagonal of the arrays by using the same index for each argument of isel. That is, in fact, how I extracted the diagonals in each case above (look for diag_index to find the examples).

The bit that interests me is unstacking the relevant dimension, because the data in the original case comes to me with, effectively, a stacked dimension, and I would like to turn it back into an unstacked dimension because that is what I am used to using pcolormesh to plot.

That is to say, skipping the unstacking rather defeats the purpose of what I am trying to do, unless you have suggestions for how to create a two-dimensional plot (one using something like contourf or pcolormesh) of a one-dimensional Dataset, or a series of two-dimensional plots from a two-dimensional Dataset.
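
For concreteness, a sketch of both halves on toy data (the in-place MultiIndex assignment below is accepted by xarray versions current at the time, though newer releases may deprecate it):

```python
import numpy as np
import pandas as pd
import xarray as xr

arr = xr.DataArray(np.arange(3 * 4 * 4).reshape(3, 4, 4), dims=("t", "i", "j"))

# Extract the diagonal by reusing one indexer for both dims (the diag_index trick)
diag_index = xr.DataArray(np.arange(4), dims="diag")
diag = arr.isel(i=diag_index, j=diag_index)  # dims: ("t", "diag")

# The goal described above: treat "diag" as a stacked (y, x) dimension and unstack it
diag.coords["diag"] = pd.MultiIndex.from_product(
    [range(2), range(2)], names=("y", "x")
)
unstacked = diag.unstack("diag")  # dims: ("t", "y", "x")
```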

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unstacking the diagonals of a sequence of matrices raises ValueError: IndexVariable objects must be 1-dimensional 1192449540
1112552981 https://github.com/pydata/xarray/issues/2780#issuecomment-1112552981 https://api.github.com/repos/pydata/xarray/issues/2780 IC_kwDOAMm_X85CUDYV DWesl 22566757 2022-04-28T18:57:26Z 2022-04-28T19:01:34Z CONTRIBUTOR

I found a way to get the sample dataset to save to a smaller netCDF:

```python
import os

import numpy as np
import numpy.testing as np_tst
import pandas as pd
import xarray as xr

# Original example
# Create pandas DataFrame
df = pd.DataFrame(
    np.random.randint(low=0, high=10, size=(100000, 5)),
    columns=["a", "b", "c", "d", "e"],
)

# Make 'e' a column of strings
df["e"] = df["e"].astype(str)

# Make 'f' a column of floats
DIGITS = 1
df["f"] = np.around(10**DIGITS * np.random.random(size=df.shape[0]), DIGITS)

# Save to csv
df.to_csv("df.csv")

# Convert to an xarray Dataset
ds = xr.Dataset.from_dataframe(df)

# Save NetCDF file
ds.to_netcdf("ds.nc")


# Additions
def dtype_for_int_array(arry: "array of integers") -> np.dtype:
    """Find the smallest integer dtype that will encode arry.

    Parameters
    ----------
    arry : array of integers
        The array to compress

    Returns
    -------
    smallest: dtype
        The smallest dtype that will represent arry
    """
    largest = max(abs(arry.min()), abs(arry.max()))
    typecode = "i{bytes:d}".format(
        bytes=2
        ** np.nonzero(
            [
                np.iinfo("i{bytes:d}".format(bytes=2**i)).max >= largest
                for i in range(4)
            ]
        )[0][0]
    )
    return np.dtype(typecode)


def dtype_for_str_array(
    arry: "xr.DataArray of strings", for_disk: bool = True
) -> np.dtype:
    """Find a good string dtype for encoding arry.

    Parameters
    ----------
    arry : xr.DataArray of strings
        The array to compress
    for_disk : bool
        True if meant for encoding argument of to_netcdf()
        False if meant for in-memory datasets

    Returns
    -------
    smallest: dtype
        The smallest dtype that will represent arry
    """
    lengths = arry.str.len()
    largest = lengths.max()

    if not for_disk:
        # Variant for in-memory datasets
        # Makes dask happier about strings
        typecode = "S{bytes:d}".format(bytes=largest)
    else:
        # Variant for on-disk datasets
        # 0.2 and 0.6 are both guesses:
        # if there's "a lot" of strings "much shorter than" the longest,
        # use vlen str where available,
        # otherwise use a string concatenation dimension
        if lengths.quantile(0.2) < 0.6 * largest:
            typecode = "O"
        else:
            typecode = "S1"
    return np.dtype(typecode)


# Set up encoding for saving to netCDF
encoding = {}
for name, var in ds.items():
    encoding[name] = {}

    var_kind = var.dtype.kind
    # Perhaps we should assume "u" means people know what they're
    # doing
    if var_kind in ("u", "i"):
        dtype = dtype_for_int_array(var)
        if var_kind == "u":
            dtype = np.dtype(dtype.str.replace("i", "u"))
    elif var_kind == "f":
        finfo = np.finfo(var.dtype)

        abs_var = np.abs(var)
        dynamic_range = abs_var.max() / abs_var[abs_var > 0].min()
        if dynamic_range > 10**finfo.precision:
            # Dynamic range too high for quantization
            dtype = var.dtype
        else:
            # Set scale_factor and add_offset for quantization
            # Also figure out what dtype compresses best
            var_min = var.min()
            var_range = var.max() - var_min
            mid_range = var_min + var_range / 2

            # Rescale to -1 to 1
            values_to_compress = (var - mid_range) / (0.5 * var_range)
            # for digits in range(finfo.precision):
            for digits in (2, 4, 9, 18):
                if np.allclose(
                    values_to_compress,
                    np.around(values_to_compress, digits),
                    rtol=finfo.precision,
                ):
                    # Convert digits to integer dtype:
                    # digits <= 2 to i1
                    # digits <= 4 to i2
                    # digits <= 9 to i4
                    # digits <= 18 to i8
                    if digits <= 2:
                        dtype = np.dtype("i1")
                    elif digits <= 4:
                        dtype = np.dtype("i2")
                    elif digits <= 9:
                        dtype = np.dtype("i4")
                    else:
                        dtype = np.dtype("i8")

                    if dtype.itemsize >= var.dtype.itemsize:
                        # Quantization does not save space
                        dtype = var.dtype
                    else:
                        # Quantization saves space
                        storage_iinfo = np.iinfo(dtype)
                        encoding[name]["add_offset"] = mid_range.values
                        encoding[name]["scale_factor"] = (
                            2 * var_range / storage_iinfo.max
                        ).values
                        encoding[name]["_FillValue"] = storage_iinfo.min
                    break
            else:
                # Quantization would lose information
                dtype = var.dtype
    elif var_kind == "O":
        dtype = dtype_for_str_array(var)
    else:
        dtype = var.dtype
    encoding[name]["dtype"] = dtype

ds.to_netcdf("ds_encoded.nc", encoding=encoding)

# Display results
stat_csv = os.stat("df.csv")
stat_nc = os.stat("ds.nc")
stat_enc = os.stat("ds_encoded.nc")

sizes = pd.Series(
    index=["CSV", "default netCDF", "encoded netCDF"],
    data=[stats.st_size for stats in [stat_csv, stat_nc, stat_enc]],
    name="File sizes",
)

print("File sizes (kB):", np.right_shift(sizes, 10), sep="\n", end="\n\n")

print("Sizes relative to CSV:", sizes / sizes.iloc[0], sep="\n", end="\n\n")

# Check that I didn't break the floats
from_disk = xr.open_dataset("ds_encoded.nc")
np_tst.assert_allclose(ds["f"], from_disk["f"], rtol=10**-DIGITS, atol=10**-DIGITS)
```

```bash
$ python xarray_auto_small_output.py && ls -sSh *.csv *.nc
File sizes (kB):
CSV                1942
default netCDF    10161
encoded netCDF     1375
Name: File sizes, dtype: int64

Sizes relative to CSV:
CSV               1.000000
default netCDF    5.230366
encoded netCDF    0.708063
Name: File sizes, dtype: float64

 10M ds.nc  1.9M df.csv  1.4M ds_encoded.nc
```

I added a column of floats with one digit before and after the decimal point to the example dataset, because why not.

Does this satisfy your use-case?

Should I turn the giant loop into a function to go into xarray somewhere? If so, I should probably tie the float handling in with the new least_significant_digit feature in netCDF4-python, so the data gets read back the same way it was before being written out.
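
A minimal sketch of that tie-in, assuming xarray passes least_significant_digit through to netCDF4-python unchanged (untested; reuses ds and DIGITS from the example above):

```python
# Hypothetical: let netCDF4-python quantize on write, so the amount of
# precision kept is recorded in a way readers understand.
ds.to_netcdf(
    "ds_lsd.nc",
    engine="netcdf4",
    encoding={"f": {"zlib": True, "least_significant_digit": DIGITS}},
)
```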

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Automatic dtype encoding in to_netcdf 412180435
1069092987 https://github.com/pydata/xarray/issues/6310#issuecomment-1069092987 https://api.github.com/repos/pydata/xarray/issues/6310 IC_kwDOAMm_X84_uRB7 DWesl 22566757 2022-03-16T12:50:50Z 2022-03-16T12:50:50Z CONTRIBUTOR

That could work. Are you set up to check that? You would need either a full repository checkout or an XArray installation you can edit.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Only auxiliary coordinates are listed in nc variable attribute 1154014066
1069084130 https://github.com/pydata/xarray/issues/6310#issuecomment-1069084130 https://api.github.com/repos/pydata/xarray/issues/6310 IC_kwDOAMm_X84_uO3i DWesl 22566757 2022-03-16T12:40:20Z 2022-03-16T12:40:20Z CONTRIBUTOR

Given this: https://github.com/pydata/xarray/blob/613a8fda4f07181fbc41d6ff2296fec3726fd351/xarray/conventions.py#L782-L783 I think that should be working. This: https://github.com/pydata/xarray/blob/613a8fda4f07181fbc41d6ff2296fec3726fd351/xarray/conventions.py#L770-L779 explicitly says it should, and is probably the part where things go wrong, but it should be going wrong the same way for encoding and attrs.

I think https://github.com/pydata/xarray/blob/613a8fda4f07181fbc41d6ff2296fec3726fd351/xarray/conventions.py#L758-L768 may need to be split into two conditionals, one for attrs and one for encoding. I'm not sure how to get the continue behavior while allowing the code to work for both attrs and encoding without code duplication.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Only auxiliary coordinates are listed in nc variable attribute 1154014066
1069064616 https://github.com/pydata/xarray/issues/6310#issuecomment-1069064616 https://api.github.com/repos/pydata/xarray/issues/6310 IC_kwDOAMm_X84_uKGo DWesl 22566757 2022-03-16T12:17:37Z 2022-03-16T12:17:37Z CONTRIBUTOR

I tried to find what the CF conventions say about including dimension coordinates (I'm using the name from scitools-iris rather than "coordinate variable" as used in the CF conventions to keep myself from getting confused) in the coordinates attribute. From what I can tell, the whole document is consistent with usually excluding dimension coordinates from the coordinates attribute. Most of the Discrete Sampling Geometry examples in appendix H seem to include the dimension coordinates in the coordinates attributes, though at least one example leaves the dimension coordinates implied rather than explicit.

From what I remember, XArray is based on the netCDF data model, rather than the CF data model, so initializing variable_coordinates[var_name] = set(variable.dims) will do the wrong thing if the dataset doesn't set one or more of its dimension coordinates (example H.2 has variables with dimensions ("station", "time"), but no variable named station. Section 4.5 makes this practice explicit). You could work around this by leaving the initialization as it stands but dropping the if coordinate_name not in variable.dims condition on including coordinate_name as part of the coordinates attribute.
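
A toy illustration of the H.2 situation (hypothetical variable names): the station dimension exists, but no variable named station does, so seeding the coordinate list from variable.dims would name a nonexistent variable.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"humidity": (("station", "time"), np.zeros((3, 4)))},
    coords={"lat": ("station", [10.0, 20.0, 30.0])},
)
assert "station" in ds.dims
assert "station" not in ds.variables  # dims need not have coordinate variables
```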

  1. Stick to the current logic which might be non-conformal with the CF conventions in case of "Discrete Sampling Geometries". However, users can manually fix this by setting the coordinates in encoding.

Based on this, I think doing solution one from the previous post on writing a dataset will always be consistent with CF, but assuming that netCDF files XArray reads into datasets will always follow this pattern would be a problem. I suspect there are tests for reading netCDF files with dimension coordinates included in coordinates attributes already, but haven't checked.

  3. Implement a logic to recognize cases where a dataset is a "Discrete Sampling Geometry" and only then list the non-auxiliary coordinates in the variable attribute. This is a bit tricky, and I don't have the time to implement this, I'm afraid.

If you want to try solution three, almost all Discrete Sampling Geometry files must have a global attribute called featureType. Since that attribute is recommended for all Discrete Sampling Geometry files, you could declare that the presence of that attribute defines a Discrete Sampling Geometry file for XArray. However, I don't see any place that says including dimension coordinates in the coordinates attribute is required, even for Discrete Sampling Geometry files, and a few places that explicitly say dimension coordinates can be omitted from the coordinates attribute, even for Discrete Sampling Geometry files.

The references from CF on whether dimension coordinates can be included in the coordinates attribute:

The fifth paragraph of CF section five says:

If the longitude, latitude, vertical or time coordinate is multi-valued, varies in only one dimension, and varies independently of other spatiotemporal coordinates, it is not permitted to store it as an auxiliary coordinate variable.

I think this is saying that if you can represent a coordinate using just one dimension, you shouldn't use two (that is, avoid using np.tile(np.arange(10), (3, 1)) as a longitude coordinate). The other interpretation is that dimension coordinates must not be included in the coordinates attribute, which seems unlikely given that three lines later it says:

Note that it is permissible, but optional, to list coordinate variables as well as auxiliary coordinate variables in the coordinates attribute.

The first paragraph of the section on Discrete sampling geometries:

Every element of every feature must be unambiguously associated with its space and time coordinates and with the feature that contains it. The coordinates attribute must be attached to every data variable to indicate the spatiotemporal coordinate variables that are needed to geo-locate the data.

I think dimension coordinates are explicit enough to count as "unambiguously associated", even without inclusion in the coordinates attribute, since they share a name with one of the dimensions of the Discrete Sampling Geometry data variables. This seems to be made explicit in the fourth paragraph:

Auxiliary coordinate variables containing the nominal and the precise positions should be listed in the relevant coordinates attributes of data variables. In orthogonal representations the nominal positions could be coordinate variables, which do not need to be listed in the coordinates attribute, rather than auxiliary coordinate variables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Only auxiliary coordinates are listed in nc variable attribute 1154014066
866314326 https://github.com/pydata/xarray/issues/5510#issuecomment-866314326 https://api.github.com/repos/pydata/xarray/issues/5510 MDEyOklzc3VlQ29tbWVudDg2NjMxNDMyNg== DWesl 22566757 2021-06-22T20:33:44Z 2021-06-22T20:37:53Z CONTRIBUTOR

~encoding is where that information is stored between reading a dataset in from disk and saving it back out again.~ _encode_coordinates can take a default value from either encoding or attrs, but a falsy value will be overwritten. Setting .attrs["coordinates"] = " " should work.

```python
import numpy as np, xarray as xr

data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
ds = xr.Dataset({"foo": data, "bar": ("x", [1, 2]), "fake": 10})
ds = ds.assign_coords({"reftime": np.array("2004-11-01T00:00:00", dtype=np.datetime64)})
ds = ds.assign({"test": 1})
ds.test.encoding["coordinates"] = " "
ds.to_netcdf("file.nc")
```

```bash
$ ncdump -h file.nc
netcdf file {
dimensions:
	x = 2 ;
	y = 3 ;
variables:
	int64 x(x) ;
	double foo(x, y) ;
		foo:_FillValue = NaN ;
		foo:coordinates = "reftime" ;
	int64 bar(x) ;
		bar:coordinates = "reftime" ;
	int64 fake ;
		fake:coordinates = "reftime" ;
	int64 reftime ;
		reftime:units = "days since 2004-11-01 00:00:00" ;
		reftime:calendar = "proleptic_gregorian" ;
	int64 test ;
		test:coordinates = " " ;
}
```

As mentioned above, the XArray data model associates coordinates with dimensions rather than with variables, so any time you read the dataset back in again, the test variable will gain reftime as a coordinate, because the dimensions of reftime (()) are a subset of the dimensions of test (also ()).

Not producing a coordinates attribute for variables mentioned in another variable's bounds attribute (or a few other attributes, for that matter) would be entirely doable within the function linked above, and should be straightforward if you want to make a PR for that.

Making realization and the bounds show up in ds.coords rather than ds.data_vars may also skip setting the coordinates attribute, though I'm less sure of that. It would, however, add realization to the coordinates attributes of every other data_var unless you overrode that, which may not be what you want.
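
A minimal sketch of that move (set_coords is standard xarray API; the variable name comes from the issue being discussed):

```python
# Promote the data variable to a coordinate; on write it then becomes a
# candidate for other variables' coordinates attributes, as noted above.
ds = ds.set_coords("realization")
```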

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Can't remove coordinates attribute from DataArrays  927336712
778717611 https://github.com/pydata/xarray/pull/2844#issuecomment-778717611 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDc3ODcxNzYxMQ== DWesl 22566757 2021-02-14T03:35:55Z 2021-02-14T03:35:55Z CONTRIBUTOR

~Does anyone know why the xr.open_dataset(....) call is echoed in the warning message. Is this intentional? Cc @dcherian @DWesl~

It seems you've already figured this out, but for anyone else with this question: the repeat of the call on that file is part of the warning that the file does not have all the variables its attributes refer to. You can fix this by recreating the file with the listed variables (areacella) added, or by deleting the attribute (cell_measures) from the variables. You can also ignore the warning using the machinery in the warnings module.
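
A sketch of that last option (hypothetical filename and message pattern; the exact warning text and category may differ):

```python
import warnings

import xarray as xr

with warnings.catch_warnings():
    warnings.filterwarnings("ignore", message=".*areacella.*")
    ds = xr.open_dataset("some_cmip6_file.nc")  # hypothetical file
```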

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
778629061 https://github.com/pydata/xarray/pull/2844#issuecomment-778629061 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDc3ODYyOTA2MQ== DWesl 22566757 2021-02-13T14:46:25Z 2021-02-13T14:46:25Z CONTRIBUTOR

I think this looks good.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
761842344 https://github.com/pydata/xarray/pull/2844#issuecomment-761842344 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDc2MTg0MjM0NA== DWesl 22566757 2021-01-17T16:48:39Z 2021-01-17T16:48:39Z CONTRIBUTOR

Looks good to me. I was wondering where those docstrings were.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670778071 https://github.com/pydata/xarray/issues/4121#issuecomment-670778071 https://api.github.com/repos/pydata/xarray/issues/4121 MDEyOklzc3VlQ29tbWVudDY3MDc3ODA3MQ== DWesl 22566757 2020-08-07T23:07:14Z 2020-08-17T13:09:27Z CONTRIBUTOR

#2844 used to move these variables to ds.coords rather than ds.data_vars, and allowed saving ancillary_variables via the encoding attribute. It was decided to drop that, since ancillary_variables are linked to variables rather than to dimensions, unlike most of the other CF attributes.

The specific behavior mentioned in the original post (describing ancillary_variables in the output) might work better in cf-xarray's ds.cf.describe method.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  decode_cf doesn't work for ancillary_variables in attributes 630573329
670996109 https://github.com/pydata/xarray/pull/2844#issuecomment-670996109 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDk5NjEwOQ== DWesl 22566757 2020-08-09T02:17:07Z 2020-08-09T16:36:12Z CONTRIBUTOR

That's two people with that view so I made the change.

Again, I feel that the quality flags are essentially meaningless on their own and useful primarily in the context of their associated variables, like the items currently put in the XArray coords attribute (which, admittedly, is at the moment only those variables identified by CF as dimension or auxiliary coordinates), and that they should remain associated with the relevant variable even if it is extracted into a DataArray. Since all of the other people who have opinions on the matter seem to disagree with me, I changed the code to preserve the present behavior with regard to ancillary_variables. I can always monkey-patch it back in if it really bothers me, add a Dataset.__getitem__ wrapper to xarray-contrib/cf-xarray so that the ancillary_variables stay associated when I pull variables out, or move back to SciTools/iris.

On a related note, I should probably check whether this breaks conversion to an iris.Cube.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670816691 https://github.com/pydata/xarray/pull/2844#issuecomment-670816691 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDgxNjY5MQ== DWesl 22566757 2020-08-08T03:25:17Z 2020-08-08T03:53:39Z CONTRIBUTOR

You are correct: ancillary_variables is neither grid_mapping nor bounds.

My personal view is that the quality information should stay with the variable it describes unless explicitly dropped; I think your view is that quality information can always be extracted from the original dataset, and that no variable should carry quality information for a different variable. At this point it would be simple to remove ancillary_variables from the attributes processed by this PR. There was a suggestion earlier of adding a decode_aux_vars argument to control the new behavior as a means of avoiding back-compatibility breaks like this one. I will leave that as a question for the maintainers; there is also some related discussion at #4215.

I should point out that a similar situation arises for grid_mapping: ds.coords["time"] will include the grid_mapping variable in its coordinates. In contrast, ds.coords["x"] will not include the bounds for the x variable, since the bounds variable has more dimensions than ds.coords["x"] does.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670765806 https://github.com/pydata/xarray/pull/2844#issuecomment-670765806 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDc2NTgwNg== DWesl 22566757 2020-08-07T22:29:20Z 2020-08-07T22:29:20Z CONTRIBUTOR

The MinimumVersionsPolicy error appears to be a series of internal conda errors, and is probably unrelated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670730008 https://github.com/pydata/xarray/pull/2844#issuecomment-670730008 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDczMDAwOA== DWesl 22566757 2020-08-07T22:02:47Z 2020-08-07T22:02:47Z CONTRIBUTOR

pydata/xarray-data#19

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
667744209 https://github.com/pydata/xarray/pull/2844#issuecomment-667744209 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY2Nzc0NDIwOQ== DWesl 22566757 2020-08-03T00:14:24Z 2020-08-03T19:42:02Z CONTRIBUTOR

The rasm dataset has coordinates xc and yc, which reference bounds xv and yv respectively; I do not see those bounds in the variable list with decode_coords=False. It would appear that pydata/xarray-data#4 did not include the bounds in the updated dataset when adding coordinates to rasm.nc, so this warning is correct. I do not know that file, so I'm probably not the best person to add bounds. Should I wait for an update to pydata/xarray-data, or should I ask sphinx to ignore the warning?

Another option is to just delete the bounds attributes of xc and yc in rasm.nc.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
667737160 https://github.com/pydata/xarray/pull/2844#issuecomment-667737160 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY2NzczNzE2MA== DWesl 22566757 2020-08-02T23:13:26Z 2020-08-02T23:13:26Z CONTRIBUTOR

The example the doc build doesn't like:

```python
ds = xr.tutorial.load_dataset("rasm")
ds.to_zarr("rasm.zarr", mode="w")
import zarr

zgroup = zarr.open("rasm.zarr")
print(zgroup.tree())
dict(zgroup["Tair"].attrs)
```

I'll need to look into the rasm dataset to figure out why there is a warning now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
658329779 https://github.com/pydata/xarray/issues/4215#issuecomment-658329779 https://api.github.com/repos/pydata/xarray/issues/4215 MDEyOklzc3VlQ29tbWVudDY1ODMyOTc3OQ== DWesl 22566757 2020-07-14T18:07:05Z 2020-07-14T18:07:05Z CONTRIBUTOR

formula_terms is another attribute with variable names, although it requires a bit more parsing.

Question: Should we allow decode_coords to control whether variables mentioned in these attributes are set as coordinate variables?

I don't think this is necessary. It's easy to explicitly set or reset coordinates afterwards if desired.

Is that "putting the variables in these attributes in coords is out of scope for XArray" or "putting the variables in these attributes in coords is out of scope for decode_coords" or something else?

I would say no however to ancillary_variables, since those are not really about coordinates and instead about linked data variables (like uncertainties).

I tend to think of uncertainties and status flags as important for the interpretation of the associated variables that should stay with the data variables unless a decision is explicitly made to drop them. On the other hand, since XArray seems to associate coordinates with dimensions rather than with variables, I can see why this might be less than desirable. This argument would also apply to grid_mapping.

My one concern with #2844 is clarifying the role of encoding vs. attrs.

I think we should probably ensure that xarray always propagates encoding exactly like how it propagates attrs.

Should this be part of #2844 or should preserving encoding be a separate PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  setting variables named in CF attributes as coordinate variables 654889988
497948836 https://github.com/pydata/xarray/pull/2844#issuecomment-497948836 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ5Nzk0ODgzNg== DWesl 22566757 2019-06-01T14:20:08Z 2020-07-14T15:55:17Z CONTRIBUTOR

On 5/31/2019 11:50 AM, dcherian wrote:

It isn't just MetPy though. I'm sure there's existing code relying on adding grid_mapping and bounds to attrs in order to write CF-compliant files. So there's a (potentially big) backward compatibility issue. This becomes worse if in the future we keep interpreting more CF attributes and moving them to encoding :/.

At present, the proper, CF-compliant way to do this is to have both the grid_mapping and bounds variables in data_vars, and maintain the attributes yourself, including making sure the variables get copied into the result after relevant ds[var_name] and ds.sel(axis=bounds) operations.

If you decide to move these variables to coords, the bounds variables will still get dropped on any subsetting operation, including those where the relevant axis was retained; the grid_mapping variables will be included in the result of all subsetting operations (including pulling out, for example, a time coordinate); and both will be included in some coordinates attribute when written to disk, breaking CF compliance.

This PR only really addresses getting these variables into coords initially and keeping them out of the global coordinates attribute when writing to disk.

Since I'm doing this primarily to get grid_mapping and bounds variables out of ds.data_vars.

I'm +1 on this but I wonder whether saving them in attrs and using that information when encoding coordinates would be the more pragmatic choice.

You have a point about grid_mapping, but applying the MetPy approach of saving the information in another, more directly useful format (cartopy.Projection instances) immediately after loading the file would be a way around that.

For bounds, I think pd.PeriodIndex would be the most natural representation for time, and pd.IntervalIndex for most other 1-D cases, but that still leaves bounds for two-or-more-dimensional coordinates.

That's a design choice I'll leave to the maintainers.

We could define encoding as containing a specified set of CF attributes that control on-disk representation such as units, scale_factor, contiguous etc. and leaving everything else in attrs. A full list of attributes that belong in encoding could be in the docs so that downstream packages can fully depend on this behaviour.

Currently I see coordinates is interpreted and moved to encoding. In the above proposal, this would be left in attrs but its value would still be interpreted if decode_coords=True.

What do you think?

At present, set(ds[var_name].attrs["coordinates"].split()) and set(ds[var_name].coords) - set(ds[var_name].indexes[dim_name]) would be identical, since the coordinates attribute is essentially computed from the second expression on write.

Do you have a use case in mind where you need specifically the list of CF auxiliary coordinates, or is that just an example of something that would change under the new proposal? I assume units would be moved to encoding only for datetime64[ns] and timedelta64[ns] variables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
644405067 https://github.com/pydata/xarray/pull/2844#issuecomment-644405067 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY0NDQwNTA2Nw== DWesl 22566757 2020-06-15T21:40:49Z 2020-06-15T21:40:49Z CONTRIBUTOR

This PR currently puts grid_mapping and bounds in encoding once it is done with them. Is that where XArray wants to put them, or should they be somewhere else?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
633296515 https://github.com/pydata/xarray/issues/2780#issuecomment-633296515 https://api.github.com/repos/pydata/xarray/issues/2780 MDEyOklzc3VlQ29tbWVudDYzMzI5NjUxNQ== DWesl 22566757 2020-05-24T20:45:43Z 2020-05-24T20:45:43Z CONTRIBUTOR

For the example given, this would mean finding largest = max(abs(ds.min()), abs(ds.max())) and then finding the first integer dtype wide enough to hold that; [np.iinfo("i{bytes:d}".format(bytes=2 ** i)).max >= largest for i in range(4)] would help there, and the function below wraps this up. I would tend to use this at array creation time rather than at save time, so you get these benefits in memory as well as on disk.

For the character/string variables, the smallest representation varies a bit more: a fixed-width encoding (dtype=S6) will probably be smaller if all the strings are about the same size, while variable-width strings are probably smaller if there are many short strings and only a few long ones. If you happen to know that a given field is a five-character identifier or a one-character status code, you can again set these types to be used in memory (which I think makes dask happier when it comes time to save), while free-form survey responses will likely be better as a variable-length string. It may be possible to use the distribution of string lengths (perhaps using numpy.char.str_len) to see whether most of the strings are at least 90% as long as the longest, but it's probably simpler to test.
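
A sketch of that length-distribution check (the 0.1 quantile and 90% thresholds are guesses, as noted above):

```python
import numpy as np

strings = np.array(["alpha", "beta", "gamma", "delta"])
lengths = np.char.str_len(strings)
# If most strings are nearly as long as the longest, fixed width wins;
# otherwise variable-length strings likely store smaller.
if np.quantile(lengths, 0.1) >= 0.9 * lengths.max():
    dtype = np.dtype("S{:d}".format(lengths.max()))
else:
    dtype = np.dtype("O")  # variable-length strings
```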

Doing this correctly for floating-point types would be difficult, but I think that's outside the scope of this issue.

Hopefully this gives you something to work with.

```python
import numpy as np


def dtype_for_int_array(arry: "array of integers") -> np.dtype:
    """Find the smallest integer dtype that will encode arry.

    Parameters
    ----------
    arry : array of integers
        The array to compress

    Returns
    -------
    smallest: dtype
        The smallest dtype that will represent arry
    """
    largest = max(abs(arry.min()), abs(arry.max()))
    typecode = "i{bytes:d}".format(
        bytes=2 ** np.nonzero([
            np.iinfo("i{bytes:d}".format(bytes=2 ** i)).max >= largest
            for i in range(4)
        ])[0][0]
    )
    return np.dtype(typecode)
```

Looking at df.memory_usage() will explain why I do this early. If I extend your example with this new function, I see the following:

```python
>>> df_small = df.copy()
>>> for col in df_small:
...     df_small[col] = df_small[col].astype(
...         dtype_for_int_array(df_small[col])
...         if df_small[col].dtype.kind == "i"
...         else "S1"
...     )
...
>>> df_small.memory_usage()
Index        80
a        100000
b        100000
c        100000
d        100000
e        800000
dtype: int64
>>> df.memory_usage()
Index        80
a        800000
b        800000
c        800000
d        800000
e        800000
dtype: int64
```

It looks like pandas always uses object dtype for string arrays, so the numbers in that column likely reflect the size of an array of pointers. XArray lets you use a dtype of "S1" or "U1", but I haven't found the equivalent of the memory_usage method.
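
On the xarray side, nbytes is probably the closest stand-in for memory_usage; a sketch reusing df_small from above:

```python
import xarray as xr

ds_small = xr.Dataset.from_dataframe(df_small)
# Per-variable in-memory sizes, roughly analogous to df.memory_usage()
print({name: var.nbytes for name, var in ds_small.variables.items()})
```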

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Automatic dtype encoding in to_netcdf 412180435
633253434 https://github.com/pydata/xarray/pull/2844#issuecomment-633253434 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDYzMzI1MzQzNA== DWesl 22566757 2020-05-24T16:09:04Z 2020-05-24T16:09:04Z CONTRIBUTOR

Should I change this to put grid_mapping and bounds back in attrs, or should I leave them in encoding?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
633251217 https://github.com/pydata/xarray/issues/4068#issuecomment-633251217 https://api.github.com/repos/pydata/xarray/issues/4068 MDEyOklzc3VlQ29tbWVudDYzMzI1MTIxNw== DWesl 22566757 2020-05-24T15:53:10Z 2020-05-24T15:53:10Z CONTRIBUTOR

For others reading this issue, the h5netcdf workaround was discussed in #3297, with further discussion on supporting complex numbers in netCDF in cf-convention/cf-conventions#204.

The short version: engine="h5netcdf", invalid_netcdf=True will save these files, but the netCDF-C library doesn't understand the result. Reading with engine="h5netcdf" may be able to round-trip these files, but I haven't checked that.

There is a longer discussion of why netCDF-C doesn't understand these files at Unidata/netcdf-c#267. That specific issue is for booleans, but complex numbers are likely the same.
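
A sketch of that untested round trip (invalid_netcdf is an h5netcdf-specific flag, and the resulting file is not readable through netCDF-C):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"z": ("x", np.array([1 + 2j, 3 - 4j]))})
ds.to_netcdf("complex.nc", engine="h5netcdf", invalid_netcdf=True)
roundtripped = xr.open_dataset("complex.nc", engine="h5netcdf")
```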

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  utility function to save complex values as a netCDF file 619347681
597375929 https://github.com/pydata/xarray/pull/2844#issuecomment-597375929 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDU5NzM3NTkyOQ== DWesl 22566757 2020-03-10T23:54:41Z 2020-03-10T23:54:41Z CONTRIBUTOR

I think the choice is between attrs and encoding, not both.

If it helps tip your decision one way or the other: attrs tends to stay associated with Datasets through more operations than encoding does, so parse_cf() would have to be called fairly soon after opening if the information ends up in encoding, while putting it in attrs gives users a bit more time for that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
587466776 https://github.com/pydata/xarray/issues/3689#issuecomment-587466776 https://api.github.com/repos/pydata/xarray/issues/3689 MDEyOklzc3VlQ29tbWVudDU4NzQ2Njc3Ng== DWesl 22566757 2020-02-18T13:44:27Z 2020-02-18T13:44:27Z CONTRIBUTOR

bounds and grid_mapping?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Decode CF bounds to coords 548607657
587466093 https://github.com/pydata/xarray/pull/2844#issuecomment-587466093 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDU4NzQ2NjA5Mw== DWesl 22566757 2020-02-18T13:43:12Z 2020-02-18T13:43:12Z CONTRIBUTOR

The test failures seem to all be due to recent changes in cftime/CFTimeIndex, which I haven't touched.

Is sticking the grid_mapping and bounds attributes in encoding good, or should I put them back in attrs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
586273656 https://github.com/pydata/xarray/pull/2844#issuecomment-586273656 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDU4NjI3MzY1Ng== DWesl 22566757 2020-02-14T12:47:06Z 2020-02-14T12:47:06Z CONTRIBUTOR

I just noticed that pandas.PeriodIndex would be an alternative to pandas.IntervalIndex for time data, assuming which side of the interval is closed is largely irrelevant for such data.

Is there an interest in using these for 1D coordinates with bounds? I think ds.groupby_bins() already returns an IntervalIndex.
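
A quick check of that last point on toy data (the x_bins name follows xarray's *_bins convention):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(10.0), dims="x", coords={"x": np.arange(10.0)})
binned = da.groupby_bins("x", bins=[0, 5, 10]).mean()
print(binned["x_bins"].to_index())  # interval-valued entries from pd.cut
```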

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
586261327 https://github.com/pydata/xarray/pull/3724#issuecomment-586261327 https://api.github.com/repos/pydata/xarray/issues/3724 MDEyOklzc3VlQ29tbWVudDU4NjI2MTMyNw== DWesl 22566757 2020-02-14T12:07:21Z 2020-02-14T12:07:21Z CONTRIBUTOR

Not yet, at least (see https://github.com/pydata/xarray/network/dependents). GitHub points my projects using XArray at https://github.com/thadncs/https-github.com-pydata-xarray rather than at this repository, and there seem to be a decent number of repositories there (https://github.com/thadncs/https-github.com-pydata-xarray/network/dependents). I have no idea why GitHub shifted them, nor what to do about it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  setuptools-scm (3) 555752381
497566742 https://github.com/pydata/xarray/pull/2844#issuecomment-497566742 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ5NzU2Njc0Mg== DWesl 22566757 2019-05-31T04:00:17Z 2019-05-31T04:00:17Z CONTRIBUTOR

Switched to use in rather than is not None.

Re: grid_mapping in .encoding, not .attrs: MetPy assumes grid_mapping will be in .attrs. Since the xarray documentation mentions this capability, should I be making concurrent changes to MetPy so that this continues to work?

If so, would it be sufficient to change their .attrs references to .encoding and to mention in both sets of documentation that the user should call ds.metpy.parse_cf() immediately after loading to ensure the information is available for MetPy to use? I don't entirely understand the accessor API.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
497558317 https://github.com/pydata/xarray/pull/2844#issuecomment-497558317 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ5NzU1ODMxNw== DWesl 22566757 2019-05-31T03:04:06Z 2019-05-31T03:13:53Z CONTRIBUTOR

This is briefly mentioned above, in https://github.com/pydata/xarray/pull/2844#discussion_r270595609. The rationale was that everywhere else xarray uses CF attributes for something, the original values of those attributes are recorded in var.encoding, not var.attrs, and consistency across a code base is a good thing. Since I'm doing this primarily to get grid_mapping and bounds variables out of ds.data_vars, I don't have strong opinions on the subject.

If you feel strongly to the contrary, there's an idea at the top of this thread for getting bounds information encoded in terms xarray already uses in some cases (Dataset.groupby_bins()), and the diffs for this PR should help you figure out what needs changing to support this.

For grid_mapping there's http://xarray.pydata.org/en/latest/weather-climate.html#cf-compliant-coordinate-variables which is enough for my uses.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
478290053 https://github.com/pydata/xarray/pull/2844#issuecomment-478290053 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ3ODI5MDA1Mw== DWesl 22566757 2019-03-30T21:17:17Z 2019-03-30T21:17:17Z CONTRIBUTOR

I can shift this to use encoding only, but I'm having trouble figuring out where that code would go.

Would the preferred path be to create VariableCoder classes for each and add them to encode_cf_variable, then add tests to xarray.tests.test_coding?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
478248763 https://github.com/pydata/xarray/pull/2843#issuecomment-478248763 https://api.github.com/repos/pydata/xarray/issues/2843 MDEyOklzc3VlQ29tbWVudDQ3ODI0ODc2Mw== DWesl 22566757 2019-03-30T14:04:12Z 2019-03-30T14:04:12Z CONTRIBUTOR

I just checked and can't find that section of the documentation now, so that seems to be consistent.

I suppose that's a vote for "be sure to check current behavior before submitting old packages". I'll change my code to this new method then.

Thanks

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow passing _FillValue=False in encoding for vlen str variables. 424262546
476586154 https://github.com/pydata/xarray/pull/2844#issuecomment-476586154 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ3NjU4NjE1NA== DWesl 22566757 2019-03-26T11:31:05Z 2019-03-26T11:31:05Z CONTRIBUTOR

Related to #1475 and #2288, but this is just keeping the metadata consistent where already present, not extending the data model to include bounds, cells, or projections. I should add a test to ensure saving still works if the bounds are lost when pulling out variables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
309484883 https://github.com/pydata/xarray/pull/814#issuecomment-309484883 https://api.github.com/repos/pydata/xarray/issues/814 MDEyOklzc3VlQ29tbWVudDMwOTQ4NDg4Mw== DWesl 22566757 2017-06-19T15:58:27Z 2017-06-19T15:58:27Z CONTRIBUTOR

If you're still looking for the old tests, it looks like they disappeared in the last merge commit, f48de5.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray to and from iris 145140657


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);