
issue_comments


14 rows where author_association = "CONTRIBUTOR" and user = 43613877 sorted by updated_at descending




issue 10

  • Add date attribute to datetime accessor 2
  • KeyError when selecting "nearest" data with given tolerance 2
  • to_zarr: region not recognised as dataset dimensions 2
  • pdyap version dependent client.open_url call 2
  • Resample with limit/tolerance 1
  • ENH: resample methods with tolerance 1
  • Date missing in datetime accessor 1
  • `mean` returns empty DataArray for `groupby_bins` containing `datetime64` 1
  • import_metadata==5.0.0 causes error when loading netcdf file 1
  • Zarr: drop "source" and "original_shape" from encoding 1

user 1

  • observingClouds · 14

author_association 1

  • CONTRIBUTOR · 14
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1416861976 https://github.com/pydata/xarray/pull/7500#issuecomment-1416861976 https://api.github.com/repos/pydata/xarray/issues/7500 IC_kwDOAMm_X85Uc5kY observingClouds 43613877 2023-02-04T22:16:49Z 2023-02-04T22:16:49Z CONTRIBUTOR

I'm just mimicking the netCDF4 driver here. Maybe one could use less common attribute names than `source`? Maybe add a prefix like `xarray_` to those attributes? I'm open to suggestions.
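A rough sketch of what the prefix idea could look like; this is purely illustrative and not xarray's actual encoding handling:

```python
# Encoding keys that are most likely to collide with user-supplied attributes.
INTERNAL_ENCODING_KEYS = ("source", "original_shape")

def prefix_internal_encoding(encoding, prefix="xarray_"):
    """Return a copy of ``encoding`` with internal keys namespaced by ``prefix``."""
    return {(prefix + k if k in INTERNAL_ENCODING_KEYS else k): v
            for k, v in encoding.items()}

# {"source": "file.nc", "dtype": "float32"} -> {"xarray_source": "file.nc", "dtype": "float32"}
```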

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr: drop "source" and "original_shape" from encoding 1571143098
1265699141 https://github.com/pydata/xarray/issues/7115#issuecomment-1265699141 https://api.github.com/repos/pydata/xarray/issues/7115 IC_kwDOAMm_X85LcQlF observingClouds 43613877 2022-10-03T16:13:20Z 2022-10-03T16:13:20Z CONTRIBUTOR

I was about to open the same issue but can confirm that I only get an issue with python<3.8 and importlib_metadata==5.0.0. @Illviljan, importlib-metadata should not be necessary for python>=3.8, as it became part of the standard library.
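For context, the usual conditional-import pattern, since `importlib.metadata` is part of the standard library from Python 3.8 onward (a generic sketch, not xarray's actual code):

```python
import sys

if sys.version_info >= (3, 8):
    from importlib.metadata import version  # standard library
else:
    from importlib_metadata import version  # third-party backport for older Pythons

print(version("xarray"))
```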

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  import_metadata==5.0.0 causes error when loading netcdf file 1394854820
1263824497 https://github.com/pydata/xarray/issues/6995#issuecomment-1263824497 https://api.github.com/repos/pydata/xarray/issues/6995 IC_kwDOAMm_X85LVG5x observingClouds 43613877 2022-09-30T17:18:22Z 2022-09-30T17:18:22Z CONTRIBUTOR

This issue might be a duplicate of #5897 and it continues to exist in version 2022.09.0.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `mean` returns empty DataArray for `groupby_bins` containing `datetime64` 1362683132
1146453489 https://github.com/pydata/xarray/pull/6656#issuecomment-1146453489 https://api.github.com/repos/pydata/xarray/issues/6656 IC_kwDOAMm_X85EVX3x observingClouds 43613877 2022-06-03T23:36:54Z 2022-06-03T23:37:34Z CONTRIBUTOR

> Our tests on min-all-deps are running with pydap 3.2.2 but were passing. Was the test xfailed? If so can we remove it?

Actually, the general unit test should fail, but the specific tests are skipped. Only when the flag `--run-network-tests` is provided to pytest do the tests run, and they would have failed in the past.
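For context, a minimal conftest.py sketch of how such an opt-in flag is commonly wired up in pytest; the marker name and messages here are assumptions, not xarray's exact setup:

```python
# conftest.py
import pytest

def pytest_addoption(parser):
    parser.addoption("--run-network-tests", action="store_true", default=False,
                     help="run tests that require network access")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-network-tests"):
        return  # flag given: run the network tests too
    skip_network = pytest.mark.skip(reason="need --run-network-tests option to run")
    for item in items:
        if "network" in item.keywords:
            item.add_marker(skip_network)
```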

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pdyap version dependent client.open_url call 1255359858
1143316526 https://github.com/pydata/xarray/pull/6656#issuecomment-1143316526 https://api.github.com/repos/pydata/xarray/issues/6656 IC_kwDOAMm_X85EJaAu observingClouds 43613877 2022-06-01T08:50:43Z 2022-06-01T08:50:43Z CONTRIBUTOR

Shall we raise a warning in case `verify` and/or `user_charset` are given and the installed pydap version is older than 3.0.0? Or is it fine to just ignore those arguments in this case without warning the user?
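A minimal sketch of the warning option, with a hypothetical helper name and illustrative version numbers (not the actual backend code):

```python
import warnings

def _warn_unsupported_pydap_kwargs(pydap_version, min_version, verify=None, user_charset=None):
    # Warn instead of silently dropping keywords that the installed pydap
    # release does not understand.
    if pydap_version < min_version and (verify is not None or user_charset is not None):
        warnings.warn(
            "'verify' and 'user_charset' are ignored for this pydap version",
            UserWarning,
        )

# e.g. _warn_unsupported_pydap_kwargs((3, 2, 2), (3, 3, 0), verify=False)
```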

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pdyap version dependent client.open_url call 1255359858
1033814820 https://github.com/pydata/xarray/issues/6069#issuecomment-1033814820 https://api.github.com/repos/pydata/xarray/issues/6069 IC_kwDOAMm_X849nsMk observingClouds 43613877 2022-02-09T14:23:54Z 2022-02-09T14:36:48Z CONTRIBUTOR

You are right, the coordinates should not be dropped.

I think the function `_validate_region` has a bug. Currently it checks, for all `ds.variables`, whether at least one of their dimensions agrees with the ones given in the `region` argument. However, `ds.variables` also returns the coordinates, whereas we actually only want to check whether the `ds.data_vars` have a dimension intersecting with the given region.

Changing the function to

```python
def _validate_region(ds, region):
    if not isinstance(region, dict):
        raise TypeError(f"``region`` must be a dict, got {type(region)}")

    for k, v in region.items():
        if k not in ds.dims:
            raise ValueError(
                f"all keys in ``region`` are not in Dataset dimensions, got "
                f"{list(region)} and {list(ds.dims)}"
            )
        if not isinstance(v, slice):
            raise TypeError(
                "all values in ``region`` must be slice objects, got "
                f"region={region}"
            )
        if v.step not in {1, None}:
            raise ValueError(
                "step on all slices in ``region`` must be 1 or None, got "
                f"region={region}"
            )

    non_matching_vars = [
        k for k, v in ds.data_vars.items() if not set(region).intersection(v.dims)
    ]
    if non_matching_vars:
        raise ValueError(
            f"when setting `region` explicitly in to_zarr(), all "
            f"variables in the dataset to write must have at least "
            f"one dimension in common with the region's dimensions "
            f"{list(region.keys())}, but that is not "
            f"the case for some variables here. To drop these variables "
            f"from this dataset before exporting to zarr, write: "
            f".drop({non_matching_vars!r})"
        )
```

seems to work.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 0
}
  to_zarr: region not recognised as dataset dimensions 1077079208
1031773761 https://github.com/pydata/xarray/issues/6069#issuecomment-1031773761 https://api.github.com/repos/pydata/xarray/issues/6069 IC_kwDOAMm_X849f55B observingClouds 43613877 2022-02-07T18:19:08Z 2022-02-07T18:19:08Z CONTRIBUTOR

Hi @Boorhin, I just ran into the same issue. The `region` argument has to be of type slice; in your case `slice(t)` instead of just `t` works:

```python
import xarray as xr
from datetime import datetime, timedelta
import numpy as np

dt = datetime.now()
times = np.arange(dt, dt + timedelta(days=6), timedelta(hours=1))
nodesx, nodesy, layers = np.arange(10, 50), np.arange(10, 50) + 15, np.arange(10)
ds = xr.Dataset()
ds.coords['time'] = ('time', times)
ds.coords['node_x'] = ('node', nodesx)
ds.coords['node_y'] = ('node', nodesy)
ds.coords['layer'] = ('layer', layers)
outfile = 'my_zarr'
varnames = ['potato', 'banana', 'apple']
for var in varnames:
    ds[var] = (('time', 'layer', 'node'), np.zeros((len(times), len(layers), len(nodesx))))
ds.to_zarr(outfile, mode='a')
for t in range(len(times)):
    for var in varnames:
        ds[var].isel(time=slice(t)).values += np.random.random((len(layers), len(nodesx)))
    ds.isel(time=slice(t)).to_zarr(outfile, region={"time": slice(t)})
```

This however leads to another issue:

```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-52-bb3d2c1adc12> in <module>
     18     for var in varnames:
     19         ds[var].isel(time=slice(t)).values += np.random.random((len(layers),len(nodesx)))
---> 20     ds.isel(time=slice(t)).to_zarr(outfile, region={"time": slice(t)})

~/.local/lib/python3.8/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks)
   2029             encoding = {}
   2030
-> 2031         return to_zarr(
   2032             self,
   2033             store=store,

~/.local/lib/python3.8/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks)
   1359
   1360     if region is not None:
-> 1361         _validate_region(dataset, region)
   1362     if append_dim is not None and append_dim in region:
   1363         raise ValueError(

~/.local/lib/python3.8/site-packages/xarray/backends/api.py in _validate_region(ds, region)
   1272     ]
   1273     if non_matching_vars:
-> 1274         raise ValueError(
   1275             f"when setting `region` explicitly in to_zarr(), all "
   1276             f"variables in the dataset to write must have at least "

ValueError: when setting `region` explicitly in to_zarr(), all variables in the dataset to write must have at least one dimension in common with the region's dimensions ['time'], but that is not the case for some variables here. To drop these variables from this dataset before exporting to zarr, write: .drop(['node_x', 'node_y', 'layer'])
```

Here, however, the solution is provided by the error message. Following its instructions, the snippet below finally works (as far as I can tell):

```python
import xarray as xr
from datetime import datetime, timedelta
import numpy as np

dt = datetime.now()
times = np.arange(dt, dt + timedelta(days=6), timedelta(hours=1))
nodesx, nodesy, layers = np.arange(10, 50), np.arange(10, 50) + 15, np.arange(10)
ds = xr.Dataset()
ds.coords['time'] = ('time', times)
# ds.coords['node_x'] = ('node', nodesx)
# ds.coords['node_y'] = ('node', nodesy)
# ds.coords['layer'] = ('layer', layers)
outfile = 'my_zarr'
varnames = ['potato', 'banana', 'apple']
for var in varnames:
    ds[var] = (('time', 'layer', 'node'), np.zeros((len(times), len(layers), len(nodesx))))
ds.to_zarr(outfile, mode='a')
for t in range(len(times)):
    for var in varnames:
        ds[var].isel(time=slice(t)).values += np.random.random((len(layers), len(nodesx)))
    ds.isel(time=slice(t)).to_zarr(outfile, region={"time": slice(t)})
```

Maybe one would like to generalise `region` in api.py to allow for single indices, or throw a hint in case a type other than a slice is provided. A purely hypothetical sketch of that generalisation follows below.
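This is not existing xarray code; the helper name and semantics (a bare integer mapped to the equivalent one-element slice) are assumptions:

```python
def _normalize_region(region):
    # Hypothetical helper: map {"time": 5} to {"time": slice(5, 6)} so users can
    # pass single indices; slices are passed through unchanged.
    normalized = {}
    for dim, value in region.items():
        if isinstance(value, int):
            value = slice(value, value + 1)
        normalized[dim] = value
    return normalized

# e.g. ds.isel(time=[t]).to_zarr(outfile, region=_normalize_region({"time": t}))
```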

Cheers

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_zarr: region not recognised as dataset dimensions 1077079208
800145214 https://github.com/pydata/xarray/pull/4994#issuecomment-800145214 https://api.github.com/repos/pydata/xarray/issues/4994 MDEyOklzc3VlQ29tbWVudDgwMDE0NTIxNA== observingClouds 43613877 2021-03-16T10:34:19Z 2021-03-16T10:34:19Z CONTRIBUTOR

Thanks for this great tool and the great support!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add date attribute to datetime accessor 822256201
799452244 https://github.com/pydata/xarray/pull/4994#issuecomment-799452244 https://api.github.com/repos/pydata/xarray/issues/4994 MDEyOklzc3VlQ29tbWVudDc5OTQ1MjI0NA== observingClouds 43613877 2021-03-15T14:12:30Z 2021-03-15T14:12:30Z CONTRIBUTOR

Great! @spencerkclark I added the information to api-hidden.rst and also to api.rst.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add date attribute to datetime accessor 822256201
799047819 https://github.com/pydata/xarray/issues/4995#issuecomment-799047819 https://api.github.com/repos/pydata/xarray/issues/4995 MDEyOklzc3VlQ29tbWVudDc5OTA0NzgxOQ== observingClouds 43613877 2021-03-15T02:28:51Z 2021-03-15T02:28:51Z CONTRIBUTOR

Thanks @dcherian, this is doing the job. I'll close this issue as there seems to be no need to implement this into the sel method.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  KeyError when selecting "nearest" data with given tolerance  822320976
791019238 https://github.com/pydata/xarray/issues/4995#issuecomment-791019238 https://api.github.com/repos/pydata/xarray/issues/4995 MDEyOklzc3VlQ29tbWVudDc5MTAxOTIzOA== observingClouds 43613877 2021-03-04T23:10:11Z 2021-03-04T23:10:11Z CONTRIBUTOR

Introducing a fill_value seems like a good idea, such that the size of the output does not change compared to the intended selection. Choosing the original/requested coordinate as a label for the missing datapoint seems to be a valid choice, because this position has been checked for valid data nearby without success. I would suggest that the fill_value should then be determined automatically from the `_FillValue`, then from the datatype, and only as a last resort require the user to set `fill_value` explicitly.

However, the shortcoming that I see in using a `fill_value` is that the indexing has to modify the data (insert e.g. -999) and also 'invent' a new coordinate point (here 40). This gets reasonably complex when applied to a dataset with DataArrays of different types, e.g.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset()
ds['data1'] = xr.DataArray(np.array([1, 2, 3, 4, 5], dtype=int), dims=["lat"], coords={'lat': [10, 20, 30, 50, 60]})
ds['data2'] = xr.DataArray(np.array([1, 2, 3, 4, 5], dtype=float), dims=["lat"], coords={'lat': [10, 20, 30, 50, 60]})
```

One `fill_value` might not fit all data arrays, be it because of the datatype or the actual data. E.g. `-999` might be a good `fill_value` for one DataArray but a valid datapoint in another one.
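To make the per-variable ambiguity concrete, here is a rough sketch using `reindex`, which already supports `method="nearest"`, a `tolerance`, and (in recent xarray versions) a dict-like `fill_value`; the chosen fill values are purely illustrative:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset()
ds['data1'] = xr.DataArray(np.array([1, 2, 3, 4, 5], dtype=int), dims=["lat"],
                           coords={'lat': [10, 20, 30, 50, 60]})
ds['data2'] = xr.DataArray(np.array([1, 2, 3, 4, 5], dtype=float), dims=["lat"],
                           coords={'lat': [10, 20, 30, 50, 60]})

# lat=40 has no neighbour within the tolerance, so it gets filled per variable;
# a single scalar fill value could collide with real data in one of the arrays.
filled = ds.reindex(lat=[10, 40, 60], method="nearest", tolerance=5,
                    fill_value={"data1": -999, "data2": np.nan})
print(filled)
```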

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  KeyError when selecting "nearest" data with given tolerance  822320976
790718449 https://github.com/pydata/xarray/issues/4983#issuecomment-790718449 https://api.github.com/repos/pydata/xarray/issues/4983 MDEyOklzc3VlQ29tbWVudDc5MDcxODQ0OQ== observingClouds 43613877 2021-03-04T15:52:38Z 2021-03-04T15:53:22Z CONTRIBUTOR

I didn't think of using `da.time.dt.floor("D")`. This is indeed great to know, but as there seem to be more folks who expect `da.time.dt.date` to work, I'd still like to see this implemented.

The `time` attribute that is already implemented has the same issue in that it does not exist for cftime:

```python
import numpy as np
import pandas as pd
import xarray as xr

attrs = {"units": "hours since 3000-01-01"}
ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})
xr.decode_cf(ds).time.dt.time
```

```
AttributeError: 'CFTimeIndex' object has no attribute 'time'
```

I implemented the `date` attribute in PR #4994. The usage of `date` with a `CFTimeIndex` now raises an explicit AttributeError and mentions the usage of `floor("D")`.
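For illustration, a small sketch of the `floor("D")` workaround on a numpy-datetime array; the data are made up, and `.dt.date` is only available in xarray versions that include PR #4994:

```python
import pandas as pd
import xarray as xr

da = xr.DataArray(pd.date_range("2000-01-01 06:00", periods=3, freq="12H"), dims="time")
print(da.dt.floor("D"))  # truncates to midnight; also works for cftime-backed times
print(da.dt.date)        # python datetime.date objects (numpy datetimes only)
```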

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Date missing in datetime accessor 819897789
457963623 https://github.com/pydata/xarray/pull/2716#issuecomment-457963623 https://api.github.com/repos/pydata/xarray/issues/2716 MDEyOklzc3VlQ29tbWVudDQ1Nzk2MzYyMw== observingClouds 43613877 2019-01-27T23:16:58Z 2019-01-27T23:24:46Z CONTRIBUTOR

Sure @jhamman, I'll add some tests. However, I thought the test should go into test_dataarray.py rather than test_missing.py, because this is an improvement to resample/_upsample?

Something like

```python
def test_upsample_tolerance(self):
    # Test tolerance keyword for upsample methods bfill, pad, nearest
    times = pd.date_range('2000-01-01', freq='1D', periods=2)
    times_upsampled = pd.date_range('2000-01-01', freq='6H', periods=5)
    array = DataArray(np.arange(2), [('time', times)])

    # Forward fill
    actual = array.resample(time='6H').ffill(tolerance='12H')
    expected = DataArray([0., 0., 0., np.nan, 1.],
                         [('time', times_upsampled)])
    assert_identical(expected, actual)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: resample methods with tolerance 403462155
456239824 https://github.com/pydata/xarray/issues/2695#issuecomment-456239824 https://api.github.com/repos/pydata/xarray/issues/2695 MDEyOklzc3VlQ29tbWVudDQ1NjIzOTgyNA== observingClouds 43613877 2019-01-22T01:25:55Z 2019-01-22T01:25:55Z CONTRIBUTOR

Thanks for the clarification! I think the tolerance argument might even be superior to limit, or, to say the least, the resample methods would benefit from either of these arguments.

My above-mentioned changes to the code, despite mixing up limit and tolerance, actually seem to implement the tolerance argument.

```python
import xarray as xr
import pandas as pd
import datetime as dt

dates = [dt.datetime(2018, 1, 1), dt.datetime(2018, 1, 2)]
data = [10, 20]
df = pd.DataFrame(data, index=dates)
xdf = xr.Dataset.from_dataframe(df)
xdf.resample({'index': '1H'}).nearest(tolerance=dt.timedelta(hours=2))
```

would lead to

```
<xarray.Dataset>
Dimensions:  (index: 25)
Coordinates:
  * index    (index) datetime64[ns] 2018-01-01 ... 2018-01-02
Data variables:
    0        (index) float64 10.0 10.0 10.0 nan nan ... nan nan 20.0 20.0 20.0
```

Would that be helpful to include and write a pull request?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Resample with limit/tolerance 401392318

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
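For reference, a short Python snippet that reproduces the filter and sort described at the top of this page against a local copy of the database; the `github.db` filename is an assumption:

```python
import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical local copy of this Datasette database
rows = conn.execute(
    """
    SELECT id, html_url, created_at, updated_at, body
    FROM issue_comments
    WHERE author_association = 'CONTRIBUTOR' AND user = 43613877
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # 14 for this filter, per the page header
```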