issue_comments


36 rows where user = 22566757, sorted by updated_at descending


issue (14)

  • Read grid mapping and bounds as coords 19
  • Only auxiliary coordinates are listed in nc variable attribute 3
  • Automatic dtype encoding in to_netcdf 2
  • Unstacking the diagonals of a sequence of matrices raises ValueError: IndexVariable objects must be 1-dimensional 2
  • xarray to and from iris 1
  • Allow passing _FillValue=False in encoding for vlen str variables. 1
  • Decode CF bounds to coords 1
  • setuptools-scm (3) 1
  • utility function to save complex values as a netCDF file 1
  • decode_cf doesn't work for ancillary_variables in attributes 1
  • setting variables named in CF attributes as coordinate variables 1
  • Can't remove coordinates attribute from DataArrays 1
  • Can't unstack concatenated DataArrays 1
  • Fix `utils.get_axis` with kwargs 1

user (1)

  • DWesl · 36

author_association (1)

  • CONTRIBUTOR 36
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1259913775 https://github.com/pydata/xarray/pull/7080#issuecomment-1259913775 https://api.github.com/repos/pydata/xarray/issues/7080 IC_kwDOAMm_X85LGMIv DWesl 22566757 2022-09-27T18:47:00Z 2022-09-27T18:47:00Z CONTRIBUTOR

I think the current default for two-dimensional plots is to try to re-use an existing axis if neither row nor col is set (implied by the documentation for the ax argument in the description of the DataArray.plot descriptor). Would this PR change that behavior, and if so, should that be documented?
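
A minimal sketch of the behavior in question (toy data; the axes-reuse claim is my reading of the docs, not something the PR confirms):

```python
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(4, 5), dims=("y", "x"))

fig, ax = plt.subplots()
# With neither row nor col (nor ax) given, the 2-D plot appears to land on
# the current axes -- here the freshly created ax -- rather than a new figure;
# da.plot(ax=ax) would make that target explicit.
da.plot()
```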

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix `utils.get_axis` with kwargs 1385143758
1259894192 https://github.com/pydata/xarray/issues/7076#issuecomment-1259894192 https://api.github.com/repos/pydata/xarray/issues/7076 IC_kwDOAMm_X85LGHWw DWesl 22566757 2022-09-27T18:28:26Z 2022-09-27T18:28:26Z CONTRIBUTOR

Fix confirmed, thank you.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Can't unstack concatenated DataArrays 1384465119
1115251379 https://github.com/pydata/xarray/issues/6439#issuecomment-1115251379 https://api.github.com/repos/pydata/xarray/issues/6439 IC_kwDOAMm_X85CeWKz DWesl 22566757 2022-05-02T19:00:47Z 2022-05-02T19:05:19Z CONTRIBUTOR

Oh, right, you suggested that a bit ago.

When I check out upstream/main in my local XArray repository root and run the example, it completes without error. When I fix the example to use the correct dimension, the implicit print on the last line shows nearly the same result as unstacked_diag from a few lines earlier.

Still not sure what fixed this, but since it's working, I don't care so much. I will wait for this to show up in a release. Thank you!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unstacking the diagonals of a sequence of matrices raises ValueError: IndexVariable objects must be 1-dimensional 1192449540
1115127711 https://github.com/pydata/xarray/issues/6439#issuecomment-1115127711 https://api.github.com/repos/pydata/xarray/issues/6439 IC_kwDOAMm_X85Cd3-f DWesl 22566757 2022-05-02T17:04:13Z 2022-05-02T17:45:56Z CONTRIBUTOR

Just a tip: You don't need any stacking for that. Just use an indexer with a new dim:

I am aware that I can extract the diagonal of the arrays by using the same index for each argument of isel. That is, in fact, how I extracted the diagonals in each case above (look for diag_index to find the examples).

The bit that interests me is unstacking the relevant dimension, because the data in the original case comes to me with, effectively, a stacked dimension, and I would like to turn it back into an unstacked dimension because that is what I am used to using pcolormesh to plot.

That is to say, skipping the unstacking rather defeats the purpose of what I am trying to do, unless you have suggestions for how to create a two-dimensional plot (one using something like contourf or pcolormesh) of a one-dimensional Dataset, or a series of two-dimensional plots from a two-dimensional Dataset.
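
For concreteness, a sketch of both halves on toy data (the in-place MultiIndex assignment below is accepted by xarray versions current at the time, though newer releases may deprecate it):

```python
import numpy as np
import pandas as pd
import xarray as xr

arr = xr.DataArray(np.arange(3 * 4 * 4).reshape(3, 4, 4), dims=("t", "i", "j"))

# Extract the diagonal by reusing one indexer for both dims (the diag_index trick)
diag_index = xr.DataArray(np.arange(4), dims="diag")
diag = arr.isel(i=diag_index, j=diag_index)  # dims: ("t", "diag")

# The goal described above: treat "diag" as a stacked (y, x) dimension and unstack it
diag.coords["diag"] = pd.MultiIndex.from_product(
    [range(2), range(2)], names=("y", "x")
)
unstacked = diag.unstack("diag")  # dims: ("t", "y", "x")
```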

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unstacking the diagonals of a sequence of matrices raises ValueError: IndexVariable objects must be 1-dimensional 1192449540
1112552981 https://github.com/pydata/xarray/issues/2780#issuecomment-1112552981 https://api.github.com/repos/pydata/xarray/issues/2780 IC_kwDOAMm_X85CUDYV DWesl 22566757 2022-04-28T18:57:26Z 2022-04-28T19:01:34Z CONTRIBUTOR

I found a way to get the sample dataset to save to a smaller netCDF:

```python
import os

import numpy as np
import numpy.testing as np_tst
import pandas as pd
import xarray as xr

# Original example
# Create pandas DataFrame
df = pd.DataFrame(
    np.random.randint(low=0, high=10, size=(100000, 5)),
    columns=["a", "b", "c", "d", "e"],
)

# Make 'e' a column of strings
df["e"] = df["e"].astype(str)

# Make 'f' a column of floats
DIGITS = 1
df["f"] = np.around(10**DIGITS * np.random.random(size=df.shape[0]), DIGITS)

# Save to csv
df.to_csv("df.csv")

# Convert to an xarray Dataset
ds = xr.Dataset.from_dataframe(df)

# Save NetCDF file
ds.to_netcdf("ds.nc")


# Additions
def dtype_for_int_array(arry: "array of integers") -> np.dtype:
    """Find the smallest integer dtype that will encode arry.

    Parameters
    ----------
    arry : array of integers
        The array to compress

    Returns
    -------
    smallest: dtype
        The smallest dtype that will represent arry
    """
    largest = max(abs(arry.min()), abs(arry.max()))
    typecode = "i{bytes:d}".format(
        bytes=2
        ** np.nonzero(
            [
                np.iinfo("i{bytes:d}".format(bytes=2**i)).max >= largest
                for i in range(4)
            ]
        )[0][0]
    )
    return np.dtype(typecode)


def dtype_for_str_array(
    arry: "xr.DataArray of strings", for_disk: bool = True
) -> np.dtype:
    """Find a good string dtype for encoding arry.

    Parameters
    ----------
    arry : xr.DataArray of strings
        The array to compress
    for_disk : bool
        True if meant for encoding argument of to_netcdf()
        False if meant for in-memory datasets

    Returns
    -------
    smallest: dtype
        The smallest dtype that will represent arry
    """
    lengths = arry.str.len()
    largest = lengths.max()

    if not for_disk:
        # Variant for in-memory datasets
        # Makes dask happier about strings
        typecode = "S{bytes:d}".format(bytes=largest)
    else:
        # Variant for on-disk datasets
        # 0.2 and 0.6 are both guesses:
        # if there's "a lot" of strings "much shorter than" the longest,
        # use vlen str where available,
        # otherwise use a string concatenation dimension
        if lengths.quantile(0.2) < 0.6 * largest:
            typecode = "O"
        else:
            typecode = "S1"
    return np.dtype(typecode)


# Set up encoding for saving to netCDF
encoding = {}
for name, var in ds.items():
    encoding[name] = {}

    var_kind = var.dtype.kind
    # Perhaps we should assume "u" means people know what they're
    # doing
    if var_kind in ("u", "i"):
        dtype = dtype_for_int_array(var)
        if var_kind == "u":
            dtype = np.dtype(dtype.str.replace("i", "u"))
    elif var_kind == "f":
        finfo = np.finfo(var.dtype)

        abs_var = np.abs(var)
        dynamic_range = abs_var.max() / abs_var[abs_var > 0].min()
        if dynamic_range > 10**finfo.precision:
            # Dynamic range too high for quantization
            dtype = var.dtype
        else:
            # Set scale_factor and add_offset for quantization
            # Also figure out what dtype compresses best
            var_min = var.min()
            var_range = var.max() - var_min
            mid_range = var_min + var_range / 2

            # Rescale to -1 to 1
            values_to_compress = (var - mid_range) / (0.5 * var_range)
            # for digits in range(finfo.precision):
            for digits in (2, 4, 9, 18):
                if np.allclose(
                    values_to_compress,
                    np.around(values_to_compress, digits),
                    rtol=finfo.precision,
                ):
                    # Convert digits to integer dtype:
                    # digits <= 2 to i1
                    # digits <= 4 to i2
                    # digits <= 9 to i4
                    # digits <= 18 to i8
                    if digits <= 2:
                        dtype = np.dtype("i1")
                    elif digits <= 4:
                        dtype = np.dtype("i2")
                    elif digits <= 9:
                        dtype = np.dtype("i4")
                    else:
                        dtype = np.dtype("i8")

                    if dtype.itemsize >= var.dtype.itemsize:
                        # Quantization does not save space
                        dtype = var.dtype
                    else:
                        # Quantization saves space
                        storage_iinfo = np.iinfo(dtype)
                        encoding[name]["add_offset"] = mid_range.values
                        encoding[name]["scale_factor"] = (
                            2 * var_range / storage_iinfo.max
                        ).values
                        encoding[name]["_FillValue"] = storage_iinfo.min
                    break
            else:
                # Quantization would lose information
                dtype = var.dtype
    elif var_kind == "O":
        dtype = dtype_for_str_array(var)
    else:
        dtype = var.dtype
    encoding[name]["dtype"] = dtype

ds.to_netcdf("ds_encoded.nc", encoding=encoding)

# Display results
stat_csv = os.stat("df.csv")
stat_nc = os.stat("ds.nc")
stat_enc = os.stat("ds_encoded.nc")

sizes = pd.Series(
    index=["CSV", "default netCDF", "encoded netCDF"],
    data=[stats.st_size for stats in [stat_csv, stat_nc, stat_enc]],
    name="File sizes",
)

print("File sizes (kB):", np.right_shift(sizes, 10), sep="\n", end="\n\n")

print("Sizes relative to CSV:", sizes / sizes.iloc[0], sep="\n", end="\n\n")

# Check that I didn't break the floats
from_disk = xr.open_dataset("ds_encoded.nc")
np_tst.assert_allclose(ds["f"], from_disk["f"], rtol=10**-DIGITS, atol=10**-DIGITS)
```

```bash
$ python xarray_auto_small_output.py && ls -sSh *.csv *.nc
File sizes (kB):
CSV                1942
default netCDF    10161
encoded netCDF     1375
Name: File sizes, dtype: int64

Sizes relative to CSV:
CSV               1.000000
default netCDF    5.230366
encoded netCDF    0.708063
Name: File sizes, dtype: float64

 10M ds.nc  1.9M df.csv  1.4M ds_encoded.nc
```

I added a column of floats with one digit before and after the decimal point to the example dataset, because why not.

Does this satisfy your use-case?

Should I turn the giant loop into a function to go into xarray somewhere? If so, I should probably tie the float handling in with the new least_significant_digit feature in netCDF4-python, so the data gets read back the same way it was before being written out.
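
A minimal sketch of that tie-in, assuming xarray passes least_significant_digit through to netCDF4-python unchanged (untested; reuses ds and DIGITS from the example above):

```python
# Hypothetical: let netCDF4-python quantize on write, so the amount of
# precision kept is recorded in a way readers understand.
ds.to_netcdf(
    "ds_lsd.nc",
    engine="netcdf4",
    encoding={"f": {"zlib": True, "least_significant_digit": DIGITS}},
)
```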

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Automatic dtype encoding in to_netcdf 412180435
1069092987 https://github.com/pydata/xarray/issues/6310#issuecomment-1069092987 https://api.github.com/repos/pydata/xarray/issues/6310 IC_kwDOAMm_X84_uRB7 DWesl 22566757 2022-03-16T12:50:50Z 2022-03-16T12:50:50Z CONTRIBUTOR

That could work. Are you set up to check that? You would need either a full repository checkout or an XArray installation you can edit.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Only auxiliary coordinates are listed in nc variable attribute 1154014066
1069084130 https://github.com/pydata/xarray/issues/6310#issuecomment-1069084130 https://api.github.com/repos/pydata/xarray/issues/6310 IC_kwDOAMm_X84_uO3i DWesl 22566757 2022-03-16T12:40:20Z 2022-03-16T12:40:20Z CONTRIBUTOR

Given this: https://github.com/pydata/xarray/blob/613a8fda4f07181fbc41d6ff2296fec3726fd351/xarray/conventions.py#L782-L783 I think that should be working. This: https://github.com/pydata/xarray/blob/613a8fda4f07181fbc41d6ff2296fec3726fd351/xarray/conventions.py#L770-L779 explicitly says it should, and is probably the part where things go wrong, but it should be going wrong the same way for encoding and attrs.

I think https://github.com/pydata/xarray/blob/613a8fda4f07181fbc41d6ff2296fec3726fd351/xarray/conventions.py#L758-L768 may need to be split into two conditionals, one for attrs and one for encoding. I'm not sure how to get the continue behavior while allowing the code to work for both attrs and encoding without code duplication.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Only auxiliary coordinates are listed in nc variable attribute 1154014066
1069064616 https://github.com/pydata/xarray/issues/6310#issuecomment-1069064616 https://api.github.com/repos/pydata/xarray/issues/6310 IC_kwDOAMm_X84_uKGo DWesl 22566757 2022-03-16T12:17:37Z 2022-03-16T12:17:37Z CONTRIBUTOR

I tried to find what the CF conventions say about including dimension coordinates (I'm using the name from scitools-iris rather than "coordinate variable" as used in the CF conventions to keep myself from getting confused) in the coordinates attribute. From what I can tell, the whole document is consistent with usually excluding dimension coordinates from the coordinates attribute. Most of the Discrete Sampling Geometry examples in appendix H seem to include the dimension coordinates in the coordinates attributes, though at least one example leaves the dimension coordinates implied rather than explicit.

From what I remember, XArray is based on the netCDF data model, rather than the CF data model, so initializing variable_coordinates[var_name] = set(variable.dims) will do the wrong thing if the dataset doesn't set one or more of its dimension coordinates (example H.2 has variables with dimensions ("station", "time"), but no variable named station. Section 4.5 makes this practice explicit). You could work around this by leaving the initialization as it stands but dropping the if coordinate_name not in variable.dims condition on including coordinate_name as part of the coordinates attribute.
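
A toy illustration of the H.2 situation (hypothetical variable names): the station dimension exists, but no variable named station does, so seeding the coordinate list from variable.dims would name a nonexistent variable.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"humidity": (("station", "time"), np.zeros((3, 4)))},
    coords={"lat": ("station", [10.0, 20.0, 30.0])},
)
assert "station" in ds.dims
assert "station" not in ds.variables  # dims need not have coordinate variables
```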

  1. Stick to the current logic which might be non-conformal with the CF conventions in case of "Discrete Sampling Geometries". However, users can manually fix this by setting the coordinates in encoding.

Based on this, I think doing solution one from the previous post on writing a dataset will always be consistent with CF, but assuming that netCDF files XArray reads into datasets will always follow this pattern would be a problem. I suspect there are tests for reading netCDF files with dimension coordinates included in coordinates attributes already, but haven't checked.

  3. Implement a logic to recognize cases where a dataset is a "Discrete Sampling Geometry" and only then list the non-auxiliary coordinates in the variable attribute. This is a bit tricky, and I don't have the time to implement this, I'm afraid.

If you want to try solution three, almost all Discrete Sampling Geometry files must have a global attribute called featureType. Since that attribute is recommended for all Discrete Sampling Geometry files, you could declare that the presence of that attribute defines a Discrete Sampling Geometry file for XArray. However, I don't see any place that says including dimension coordinates in the coordinates attribute is required, even for Discrete Sampling Geometry files, and a few places that explicitly say dimension coordinates can be omitted from the coordinates attribute, even for Discrete Sampling Geometry files.

The references from CF on whether dimension coordinates can be included in the coordinates attribute:

The fifth paragraph of CF section five says:

If the longitude, latitude, vertical or time coordinate is multi-valued, varies in only one dimension, and varies independently of other spatiotemporal coordinates, it is not permitted to store it as an auxiliary coordinate variable.

I think this is saying that if you can represent a coordinate using just one dimension, you shouldn't use two (that is, avoid using np.tile(np.arange(10), (3, 1)) as a longitude coordinate). The other interpretation is that dimension coordinates must not be included in the coordinates attribute, which seems unlikely given that three lines later it says:

Note that it is permissible, but optional, to list coordinate variables as well as auxiliary coordinate variables in the coordinates attribute.

The first paragraph of the section on Discrete sampling geometries:

Every element of every feature must be unambiguously associated with its space and time coordinates and with the feature that contains it. The coordinates attribute must be attached to every data variable to indicate the spatiotemporal coordinate variables that are needed to geo-locate the data.

I think dimension coordinates are explicit enough to count as "unambiguously associated", even without inclusion in the coordinates attribute, since they share a name with one of the dimensions of the Discrete Sampling Geometry data variables. This seems to be made explicit in the fourth paragraph:

Auxiliary coordinate variables containing the nominal and the precise positions should be listed in the relevant coordinates attributes of data variables. In orthogonal representations the nominal positions could be coordinate variables, which do not need to be listed in the coordinates attribute, rather than auxiliary coordinate variables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Only auxiliary coordinates are listed in nc variable attribute 1154014066
866314326 https://github.com/pydata/xarray/issues/5510#issuecomment-866314326 https://api.github.com/repos/pydata/xarray/issues/5510 MDEyOklzc3VlQ29tbWVudDg2NjMxNDMyNg== DWesl 22566757 2021-06-22T20:33:44Z 2021-06-22T20:37:53Z CONTRIBUTOR

~encoding is where that information is stored between reading a dataset in from disk and saving it back out again.~ _encode_coordinates can take a default value from either encoding or attrs, but a falsy value will be overwritten. Setting .attrs["coordinates"] = " " should work.

```python
import numpy as np, xarray as xr

data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
ds = xr.Dataset({"foo": data, "bar": ("x", [1, 2]), "fake": 10})
ds = ds.assign_coords({"reftime": np.array("2004-11-01T00:00:00", dtype=np.datetime64)})
ds = ds.assign({"test": 1})
ds.test.encoding["coordinates"] = " "
ds.to_netcdf("file.nc")
```

```bash
$ ncdump -h file.nc
netcdf file {
dimensions:
	x = 2 ;
	y = 3 ;
variables:
	int64 x(x) ;
	double foo(x, y) ;
		foo:_FillValue = NaN ;
		foo:coordinates = "reftime" ;
	int64 bar(x) ;
		bar:coordinates = "reftime" ;
	int64 fake ;
		fake:coordinates = "reftime" ;
	int64 reftime ;
		reftime:units = "days since 2004-11-01 00:00:00" ;
		reftime:calendar = "proleptic_gregorian" ;
	int64 test ;
		test:coordinates = " " ;
}
```

As mentioned above, the XArray data model associates coordinates with dimensions rather than with variables, so any time you read the dataset back in again, the test variable will gain reftime as a coordinate, because the dimensions of reftime (()) are a subset of the dimensions of test (also ()).

Not producing a coordinates attribute for variables mentioned in another variable's bounds attribute (or a few other attributes, for that matter) would be entirely doable within the function linked above, and should be straightforward if you want to make a PR for that.

Making realization and the bounds show up in ds.coords rather than ds.data_vars may also skip setting the coordinates attribute, though I'm less sure of that. It would, however, add realization to the coordinates attributes of every other data_var unless you overrode that, which may not be what you want.
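
A minimal sketch of that move (set_coords is standard xarray API; the variable name comes from the issue being discussed):

```python
# Promote the data variable to a coordinate; on write it then becomes a
# candidate for other variables' coordinates attributes, as noted above.
ds = ds.set_coords("realization")
```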

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Can't remove coordinates attribute from DataArrays  927336712
778717611 https://github.com/pydata/xarray/pull/2844#issuecomment-778717611 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDc3ODcxNzYxMQ== DWesl 22566757 2021-02-14T03:35:55Z 2021-02-14T03:35:55Z CONTRIBUTOR

~Does anyone know why the xr.open_dataset(....) call is echoed in the warning message. Is this intentional? Cc @dcherian @DWesl~

It seems you've already figured this out, but for anyone else with this question: the repeat of the call on that file is part of the warning that the file does not have all the variables its attributes refer to. You can fix this by recreating the file with the listed variables (areacella) added, or by deleting the attribute (cell_measures) from the variables. You can also ignore the warning using the machinery in the warnings module.
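
A sketch of that last option (hypothetical filename and message pattern; the exact warning text and category may differ):

```python
import warnings

import xarray as xr

with warnings.catch_warnings():
    warnings.filterwarnings("ignore", message=".*areacella.*")
    ds = xr.open_dataset("some_cmip6_file.nc")  # hypothetical file
```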

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
778629061 https://github.com/pydata/xarray/pull/2844#issuecomment-778629061 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDc3ODYyOTA2MQ== DWesl 22566757 2021-02-13T14:46:25Z 2021-02-13T14:46:25Z CONTRIBUTOR

I think this looks good.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
761842344 https://github.com/pydata/xarray/pull/2844#issuecomment-761842344 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDc2MTg0MjM0NA== DWesl 22566757 2021-01-17T16:48:39Z 2021-01-17T16:48:39Z CONTRIBUTOR

Looks good to me. I was wondering where those docstrings were.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670778071 https://github.com/pydata/xarray/issues/4121#issuecomment-670778071 https://api.github.com/repos/pydata/xarray/issues/4121 MDEyOklzc3VlQ29tbWVudDY3MDc3ODA3MQ== DWesl 22566757 2020-08-07T23:07:14Z 2020-08-17T13:09:27Z CONTRIBUTOR

#2844 used to move these variables to ds.coords rather than ds.data_vars, and allowed saving ancillary_variables via the encoding attribute. It was decided to drop that, since ancillary_variables are linked to variables rather than to dimensions, unlike most of the other CF attributes.

The specific behavior mentioned in the original post (describing ancillary_variables in the output) might work better in cf-xarray's ds.cf.describe method.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  decode_cf doesn't work for ancillary_variables in attributes 630573329
670996109 https://github.com/pydata/xarray/pull/2844#issuecomment-670996109 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDk5NjEwOQ== DWesl 22566757 2020-08-09T02:17:07Z 2020-08-09T16:36:12Z CONTRIBUTOR

That's two people with that view so I made the change.

Again, I feel that the quality flags are essentially meaningless on their own and useful primarily in the context of their associated variables, like the items currently put in the XArray coords attribute (which, admittedly, is at the moment only those variables identified by CF as dimension or auxiliary coordinates), and that they should remain associated with the relevant variable even if it is extracted into a DataArray. Since all of the other people who have opinions on the matter seem to disagree with me, I changed the code to preserve the present behavior with regard to ancillary_variables. I can always monkey-patch it back in if it really bothers me, add a Dataset.__getitem__ wrapper to xarray-contrib/cf-xarray so that the ancillary_variables stay associated when I pull variables out, or move back to SciTools/iris.

On a related note, I should probably check whether this breaks conversion to an iris.Cube.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670816691 https://github.com/pydata/xarray/pull/2844#issuecomment-670816691 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDgxNjY5MQ== DWesl 22566757 2020-08-08T03:25:17Z 2020-08-08T03:53:39Z CONTRIBUTOR

You are correct: ancillary_variables is neither grid_mapping nor bounds.

My personal view is that the quality information should stay with the variable it describes unless explicitly dropped; I think your view is that quality information can always be extracted from the original dataset, and that no variable should carry quality information for a different variable. At this point it would be simple to remove ancillary_variables from the attributes processed by this PR. There was a suggestion earlier of adding a decode_aux_vars argument to control the new behavior as a means of avoiding back-compatibility breaks like this one. I will leave that as a question for the maintainers; there is also some related discussion at #4215.

I should point out that a similar situation arises for grid_mapping: ds.coords["time"] will include the grid_mapping variable in its coordinates. In contrast, ds.coords["x"] will not include the bounds for the x variable, since the bounds variable has more dimensions than ds.coords["x"] does.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670765806 https://github.com/pydata/xarray/pull/2844#issuecomment-670765806 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDc2NTgwNg== DWesl 22566757 2020-08-07T22:29:20Z 2020-08-07T22:29:20Z CONTRIBUTOR

The MinimumVersionsPolicy error appears to be a series of internal conda errors, and is probably unrelated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670730008 https://github.com/pydata/xarray/pull/2844#issuecomment-670730008 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDczMDAwOA== DWesl 22566757 2020-08-07T22:02:47Z 2020-08-07T22:02:47Z CONTRIBUTOR

pydata/xarray-data#19

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
667744209 https://github.com/pydata/xarray/pull/2844#issuecomment-667744209 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY2Nzc0NDIwOQ== DWesl 22566757 2020-08-03T00:14:24Z 2020-08-03T19:42:02Z CONTRIBUTOR

The rasm dataset has coordinates xc and yc, which reference bounds xv and yv respectively; I do not see those bounds in the variable list with decode_coords=False. It would appear that pydata/xarray-data#4 did not include the bounds in the updated dataset when adding coordinates to rasm.nc, so this warning is correct. I do not know that file, so I'm probably not the best person to add bounds. Should I wait for an update to pydata/xarray-data, or should I ask sphinx to ignore the warning?

Another option is to just delete the bounds attributes of xc and yc in rasm.nc.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
667737160 https://github.com/pydata/xarray/pull/2844#issuecomment-667737160 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY2NzczNzE2MA== DWesl 22566757 2020-08-02T23:13:26Z 2020-08-02T23:13:26Z CONTRIBUTOR

The example the doc build doesn't like:

```python
ds = xr.tutorial.load_dataset("rasm")
ds.to_zarr("rasm.zarr", mode="w")
import zarr

zgroup = zarr.open("rasm.zarr")
print(zgroup.tree())
dict(zgroup["Tair"].attrs)
```

I'll need to look into the rasm dataset to figure out why there is a warning now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
658329779 https://github.com/pydata/xarray/issues/4215#issuecomment-658329779 https://api.github.com/repos/pydata/xarray/issues/4215 MDEyOklzc3VlQ29tbWVudDY1ODMyOTc3OQ== DWesl 22566757 2020-07-14T18:07:05Z 2020-07-14T18:07:05Z CONTRIBUTOR

formula_terms is another attribute with variable names, although it requires a bit more parsing.

Question: Should we allow decode_coords to control whether variables mentioned in these attributes are set as coordinate variables?

I don't think this is necessary. It's easy to explicitly set or reset coordinates afterwards if desired.

Is that "putting the variables in these attributes in coords is out of scope for XArray" or "putting the variables in these attributes in coords is out of scope for decode_coords" or something else?

I would say no however to ancillary_variables, since those are not really about coordinates and instead about linked data variables (like uncertainties).

I tend to think of uncertainties and status flags as important for the interpretation of the associated variables that should stay with the data variables unless a decision is explicitly made to drop them. On the other hand, since XArray seems to associate coordinates with dimensions rather than with variables, I can see why this might be less than desirable. This argument would also apply to grid_mapping.

My one concern with #2844 is clarifying the role of encoding vs. attrs.

I think we should probably ensure that xarray always propagates encoding exactly like how it propagates attrs.

Should this be part of #2844 or should preserving encoding be a separate PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  setting variables named in CF attributes as coordinate variables 654889988
497948836 https://github.com/pydata/xarray/pull/2844#issuecomment-497948836 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ5Nzk0ODgzNg== DWesl 22566757 2019-06-01T14:20:08Z 2020-07-14T15:55:17Z CONTRIBUTOR

On 5/31/2019 11:50 AM, dcherian wrote:

It isn't just MetPy though. I'm sure there's existing code relying on adding grid_mapping and bounds to attrs in order to write CF-compliant files. So there's a (potentially big) backward compatibility issue. This becomes worse if in the future we keep interpreting more CF attributes and moving them to encoding :/.

At present, the proper, CF-compliant way to do this is to have both the grid_mapping and bounds variables in data_vars, and maintain the attributes yourself, including making sure the variables get copied into the result after relevant ds[var_name] and ds.sel(axis=bounds) operations.

If you decide to move these variables to coords, the bounds variables will still get dropped on any subsetting operation, including those where the relevant axis was retained; the grid_mapping variables will be included in the result of all subsetting operations (including pulling out, for example, a time coordinate); and both will be included in some coordinates attribute when written to disk, breaking CF compliance.

This PR only really addresses getting these variables into coords initially and keeping them out of the global coordinates attribute when writing to disk.

Since I'm doing this primarily to get grid_mapping and bounds variables out of ds.data_vars.

I'm +1 on this but I wonder whether saving them in attrs and using that information when encoding coordinates would be the more pragmatic choice.

You have a point about grid_mapping, but applying the MetPy approach of saving the information in another, more directly useful format (cartopy.Projection instances) immediately after loading the file would be a way around that.

For bounds, I think pd.PeriodIndex would be the most natural representation for time, and pd.IntervalIndex for most other 1-D cases, but that still leaves bounds for two-or-more-dimensional coordinates.

That's a design choice I'll leave to the maintainers.

We could define encoding as containing a specified set of CF attributes that control on-disk representation such as units, scale_factor, contiguous etc. and leaving everything else in attrs. A full list of attributes that belong in encoding could be in the docs so that downstream packages can fully depend on this behaviour.

Currently I see coordinates is interpreted and moved to encoding. In the above proposal, this would be left in attrs but its value would still be interpreted if decode_coords=True.

What do you think?

At present, set(ds[var_name].attrs["coordinates"].split()) and set(ds[var_name].coords) - set(ds[var_name].indexes[dim_name]) would be identical, since the coordinates attribute is essentially computed from the second expression on write.

Do you have a use case in mind where you need specifically the list of CF auxiliary coordinates, or is that just an example of something that would change under the new proposal? I assume units would be moved to encoding only for datetime64[ns] and timedelta64[ns] variables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
644405067 https://github.com/pydata/xarray/pull/2844#issuecomment-644405067 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY0NDQwNTA2Nw== DWesl 22566757 2020-06-15T21:40:49Z 2020-06-15T21:40:49Z CONTRIBUTOR

This PR currently puts grid_mapping and bounds in encoding once it is done with them. Is that where XArray wants to put them, or should they be somewhere else?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
633296515 https://github.com/pydata/xarray/issues/2780#issuecomment-633296515 https://api.github.com/repos/pydata/xarray/issues/2780 MDEyOklzc3VlQ29tbWVudDYzMzI5NjUxNQ== DWesl 22566757 2020-05-24T20:45:43Z 2020-05-24T20:45:43Z CONTRIBUTOR

For the example given, this would mean finding largest = max(abs(ds.min()), abs(ds.max())) and then finding the first integer dtype wide enough to hold that; [np.iinfo("i{bytes:d}".format(bytes=2 ** i)).max >= largest for i in range(4)] would help there, and the function below wraps this up. I would tend to use this at array creation time rather than at save time, so you get these benefits in memory as well as on disk.

For the character/string variables, the smallest representation varies a bit more: a fixed-width encoding (dtype=S6) will probably be smaller if all the strings are about the same size, while variable-width strings are probably smaller if there are many short strings and only a few long ones. If you happen to know that a given field is a five-character identifier or a one-character status code, you can again set these types to be used in memory (which I think makes dask happier when it comes time to save), while free-form survey responses will likely be better as a variable-length string. It may be possible to use the distribution of string lengths (perhaps using numpy.char.str_len) to see whether most of the strings are at least 90% as long as the longest, but it's probably simpler to test.
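
A sketch of that length-distribution check (the 0.1 quantile and 90% thresholds are guesses, as noted above):

```python
import numpy as np

strings = np.array(["alpha", "beta", "gamma", "delta"])
lengths = np.char.str_len(strings)
# If most strings are nearly as long as the longest, fixed width wins;
# otherwise variable-length strings likely store smaller.
if np.quantile(lengths, 0.1) >= 0.9 * lengths.max():
    dtype = np.dtype("S{:d}".format(lengths.max()))
else:
    dtype = np.dtype("O")  # variable-length strings
```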

Doing this correctly for floating-point types would be difficult, but I think that's outside the scope of this issue.

Hopefully this gives you something to work with.

```python
import numpy as np


def dtype_for_int_array(arry: "array of integers") -> np.dtype:
    """Find the smallest integer dtype that will encode arry.

    Parameters
    ----------
    arry : array of integers
        The array to compress

    Returns
    -------
    smallest: dtype
        The smallest dtype that will represent arry
    """
    largest = max(abs(arry.min()), abs(arry.max()))
    typecode = "i{bytes:d}".format(
        bytes=2 ** np.nonzero([
            np.iinfo("i{bytes:d}".format(bytes=2 ** i)).max >= largest
            for i in range(4)
        ])[0][0]
    )
    return np.dtype(typecode)
```

Looking at df.memory_usage() will explain why I do this early. If I extend your example with this new function, I see the following:

```python
>>> df_small = df.copy()
>>> for col in df_small:
...     df_small[col] = df_small[col].astype(
...         dtype_for_int_array(df_small[col])
...         if df_small[col].dtype.kind == "i"
...         else "S1"
...     )
...
>>> df_small.memory_usage()
Index        80
a        100000
b        100000
c        100000
d        100000
e        800000
dtype: int64
>>> df.memory_usage()
Index        80
a        800000
b        800000
c        800000
d        800000
e        800000
dtype: int64
```

It looks like pandas always uses object dtype for string arrays, so the numbers in that column likely reflect the size of an array of pointers. XArray lets you use a dtype of "S1" or "U1", but I haven't found the equivalent of the memory_usage method.
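
On the xarray side, nbytes is probably the closest stand-in for memory_usage; a sketch reusing df_small from above:

```python
import xarray as xr

ds_small = xr.Dataset.from_dataframe(df_small)
# Per-variable in-memory sizes, roughly analogous to df.memory_usage()
print({name: var.nbytes for name, var in ds_small.variables.items()})
```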

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Automatic dtype encoding in to_netcdf 412180435
633253434 https://github.com/pydata/xarray/pull/2844#issuecomment-633253434 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDYzMzI1MzQzNA== DWesl 22566757 2020-05-24T16:09:04Z 2020-05-24T16:09:04Z CONTRIBUTOR

Should I change this to put grid_mapping and bounds back in attrs, or should I leave them in encoding?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
633251217 https://github.com/pydata/xarray/issues/4068#issuecomment-633251217 https://api.github.com/repos/pydata/xarray/issues/4068 MDEyOklzc3VlQ29tbWVudDYzMzI1MTIxNw== DWesl 22566757 2020-05-24T15:53:10Z 2020-05-24T15:53:10Z CONTRIBUTOR

For others reading this issue, the h5netcdf workaround was discussed in #3297, with further discussion on supporting complex numbers in netCDF in cf-convention/cf-conventions#204.

The short version: engine="h5netcdf", invalid_netcdf=True will save these files, but the netCDF-C library doesn't understand the result. Reading with engine="h5netcdf" may be able to round-trip these files, but I haven't checked that.

There is a longer discussion of why netCDF-C doesn't understand these files at Unidata/netcdf-c#267. That specific issue is for booleans, but complex numbers are likely the same.
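
A sketch of that untested round trip (invalid_netcdf is an h5netcdf-specific flag, and the resulting file is not readable through netCDF-C):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"z": ("x", np.array([1 + 2j, 3 - 4j]))})
ds.to_netcdf("complex.nc", engine="h5netcdf", invalid_netcdf=True)
roundtripped = xr.open_dataset("complex.nc", engine="h5netcdf")
```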

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  utility function to save complex values as a netCDF file 619347681
597375929 https://github.com/pydata/xarray/pull/2844#issuecomment-597375929 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDU5NzM3NTkyOQ== DWesl 22566757 2020-03-10T23:54:41Z 2020-03-10T23:54:41Z CONTRIBUTOR

I think the choice is between attrs and encoding, not both.

If it helps tip your decision one way or the other: attrs tends to stay associated with Datasets through more operations than encoding does, so parse_cf() would have to be called fairly soon after opening if the information ends up in encoding, while putting it in attrs gives users a bit more time for that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
587466776 https://github.com/pydata/xarray/issues/3689#issuecomment-587466776 https://api.github.com/repos/pydata/xarray/issues/3689 MDEyOklzc3VlQ29tbWVudDU4NzQ2Njc3Ng== DWesl 22566757 2020-02-18T13:44:27Z 2020-02-18T13:44:27Z CONTRIBUTOR

bounds and grid_mapping?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Decode CF bounds to coords 548607657
587466093 https://github.com/pydata/xarray/pull/2844#issuecomment-587466093 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDU4NzQ2NjA5Mw== DWesl 22566757 2020-02-18T13:43:12Z 2020-02-18T13:43:12Z CONTRIBUTOR

The test failures seem to all be due to recent changes in cftime/CFTimeIndex, which I haven't touched.

Is sticking the grid_mapping and bounds attributes in encoding good, or should I put them back in attrs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
586273656 https://github.com/pydata/xarray/pull/2844#issuecomment-586273656 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDU4NjI3MzY1Ng== DWesl 22566757 2020-02-14T12:47:06Z 2020-02-14T12:47:06Z CONTRIBUTOR

I just noticed that pandas.PeriodIndex would be an alternative to pandas.IntervalIndex for time data, assuming which side of the interval is closed is largely irrelevant for such data.

Is there an interest in using these for 1D coordinates with bounds? I think ds.groupby_bins() already returns an IntervalIndex.
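
A quick check of that last point on toy data (the x_bins name follows xarray's *_bins convention):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(10.0), dims="x", coords={"x": np.arange(10.0)})
binned = da.groupby_bins("x", bins=[0, 5, 10]).mean()
print(binned["x_bins"].to_index())  # interval-valued entries from pd.cut
```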

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
586261327 https://github.com/pydata/xarray/pull/3724#issuecomment-586261327 https://api.github.com/repos/pydata/xarray/issues/3724 MDEyOklzc3VlQ29tbWVudDU4NjI2MTMyNw== DWesl 22566757 2020-02-14T12:07:21Z 2020-02-14T12:07:21Z CONTRIBUTOR

Not yet, at least (see https://github.com/pydata/xarray/network/dependents). GitHub points my projects using XArray at https://github.com/thadncs/https-github.com-pydata-xarray rather than at this repository, and there seem to be a decent number of repositories there (https://github.com/thadncs/https-github.com-pydata-xarray/network/dependents). I have no idea why GitHub shifted them, nor what to do about it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  setuptools-scm (3) 555752381
497566742 https://github.com/pydata/xarray/pull/2844#issuecomment-497566742 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ5NzU2Njc0Mg== DWesl 22566757 2019-05-31T04:00:17Z 2019-05-31T04:00:17Z CONTRIBUTOR

Switched to use in rather than is not None.

Re: grid_mapping in .encoding, not .attrs: MetPy assumes grid_mapping will be in .attrs. Since the xarray documentation mentions this capability, should I be making concurrent changes to MetPy so that this continues to work?

If so, would it be sufficient to change their .attrs references to .encoding and to mention in both sets of documentation that the user should call ds.metpy.parse_cf() immediately after loading to ensure the information is available for MetPy to use? I don't entirely understand the accessor API.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
497558317 https://github.com/pydata/xarray/pull/2844#issuecomment-497558317 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ5NzU1ODMxNw== DWesl 22566757 2019-05-31T03:04:06Z 2019-05-31T03:13:53Z CONTRIBUTOR

This is briefly mentioned above, in https://github.com/pydata/xarray/pull/2844#discussion_r270595609. The rationale was that everywhere else xarray uses CF attributes for something, the original values of those attributes are recorded in var.encoding, not var.attrs, and consistency across a code base is a good thing. Since I'm doing this primarily to get grid_mapping and bounds variables out of ds.data_vars, I don't have strong opinions on the subject.

If you feel strongly to the contrary, there's an idea at the top of this thread for getting bounds information encoded in terms xarray already uses in some cases (Dataset.groupby_bins()), and the diffs for this PR should help you figure out what needs changing to support this.

For grid_mapping there's http://xarray.pydata.org/en/latest/weather-climate.html#cf-compliant-coordinate-variables which is enough for my uses.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
478290053 https://github.com/pydata/xarray/pull/2844#issuecomment-478290053 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ3ODI5MDA1Mw== DWesl 22566757 2019-03-30T21:17:17Z 2019-03-30T21:17:17Z CONTRIBUTOR

I can shift this to use encoding only, but I'm having trouble figuring out where that code would go.

Would the preferred path be to create VariableCoder classes for each and add them to encode_cf_variable, then add tests to xarray.tests.test_coding?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
478248763 https://github.com/pydata/xarray/pull/2843#issuecomment-478248763 https://api.github.com/repos/pydata/xarray/issues/2843 MDEyOklzc3VlQ29tbWVudDQ3ODI0ODc2Mw== DWesl 22566757 2019-03-30T14:04:12Z 2019-03-30T14:04:12Z CONTRIBUTOR

I just checked and can't find that section of the documentation now, so that seems to be consistent.

I suppose that's a vote for "be sure to check current behavior before submitting old packages". I'll change my code to this new method then.

Thanks

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow passing _FillValue=False in encoding for vlen str variables. 424262546
476586154 https://github.com/pydata/xarray/pull/2844#issuecomment-476586154 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDQ3NjU4NjE1NA== DWesl 22566757 2019-03-26T11:31:05Z 2019-03-26T11:31:05Z CONTRIBUTOR

Related to #1475 and #2288, but this is just keeping the metadata consistent where already present, not extending the data model to include bounds, cells, or projections. I should add a test to ensure saving still works if the bounds are lost when pulling out variables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
309484883 https://github.com/pydata/xarray/pull/814#issuecomment-309484883 https://api.github.com/repos/pydata/xarray/issues/814 MDEyOklzc3VlQ29tbWVudDMwOTQ4NDg4Mw== DWesl 22566757 2017-06-19T15:58:27Z 2017-06-19T15:58:27Z CONTRIBUTOR

If you're still looking for the old tests, it looks like they disappeared in the last merge commit, f48de5.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray to and from iris 145140657


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);