

issue_comments


10 rows where author_association = "NONE" and user = 3309802 sorted by updated_at descending



id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1317352980 https://github.com/pydata/xarray/issues/6799#issuecomment-1317352980 https://api.github.com/repos/pydata/xarray/issues/6799 IC_kwDOAMm_X85OhTYU gjoseph92 3309802 2022-11-16T17:00:04Z 2022-11-16T17:00:04Z NONE

The current code also has the unfortunate side-effect of merging all chunks too

Don't really know what I'm talking about here, but it looks to me like the current dask-interpolation routine uses blockwise. That is, it's trying to simply map a function over each chunk in the array. To get the chunks into a structure where this is correct to do, you have to first merge all the chunks along the interpolation axis.

I would have expected interpolation to use map_overlap. You'd add some padding to each chunk, map the interpolation over each chunk (without combining them), then trim off the extra. By using overlap, you don't need to combine all the chunks into one big array first, so the operation can actually be parallel.

FYI, fixing this would probably be a big deal to geospatial people—then you could do array reprojection without GDAL! Unfortunately not something I have time to work on right now, but perhaps someone else would be interested?
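The pad/map/trim idea can be sketched in plain NumPy (a toy illustration of the overlap approach, not dask's actual `map_overlap` machinery; the helper name, chunk size, and halo width are all invented for the example):

```python
import numpy as np

def interp_chunkwise(x, xp, fp, chunk_size, halo):
    """Interpolate target points chunk by chunk: each chunk only needs the
    source points that bracket it, plus a small halo of extra points --
    no merging of all chunks along the interpolation axis."""
    out = []
    for start in range(0, len(x), chunk_size):
        pts = x[start:start + chunk_size]
        # source indices covering this chunk, padded by `halo` points each side
        lo = max(0, np.searchsorted(xp, pts.min()) - halo)
        hi = min(len(xp), np.searchsorted(xp, pts.max()) + halo)
        out.append(np.interp(pts, xp[lo:hi], fp[lo:hi]))
    return np.concatenate(out)

xp = np.linspace(0, 10, 101)   # source grid
fp = np.sin(xp)                # source values
x = np.linspace(0.5, 9.5, 37)  # target points
chunked = interp_chunkwise(x, xp, fp, chunk_size=8, halo=2)
full = np.interp(x, xp, fp)
assert np.allclose(chunked, full)  # per-chunk results match the global interp
```

Because linear interpolation only depends on the bracketing source points, the per-chunk results are identical to the global result, which is what makes the overlap version embarrassingly parallel.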

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `interp` performance with chunked dimensions 1307112340
1165001097 https://github.com/pydata/xarray/issues/6709#issuecomment-1165001097 https://api.github.com/repos/pydata/xarray/issues/6709 IC_kwDOAMm_X85FcIGJ gjoseph92 3309802 2022-06-23T23:15:19Z 2022-06-23T23:15:19Z NONE

I took a little bit more of a look at this and I don't think root task overproduction is the (only) problem here.

I also feel like intuitively, this operation shouldn't require holding so many root tasks around at once. But the graph dask is making, or how it's ordering it, doesn't seem to work that way. We can see the ordering is pretty bad:

When we actually run it (on https://github.com/dask/distributed/pull/6614 with overproduction fixed), you can see that dask requires keeping tons of the input chunks in memory, because they're going to be needed by a future task that isn't able to run yet (because not all of its inputs have been computed):

I feel like it's possible that the order in which dask is executing the input tasks is bad? But I think it's more likely that I just haven't thought about the problem enough, and there's an obvious reason why the graph is structured like this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Means of zarr arrays cause a memory overload in dask workers 1277437106
1164690164 https://github.com/pydata/xarray/issues/6709#issuecomment-1164690164 https://api.github.com/repos/pydata/xarray/issues/6709 IC_kwDOAMm_X85Fa8L0 gjoseph92 3309802 2022-06-23T17:37:59Z 2022-06-23T17:37:59Z NONE

FYI @robin-cls I would be a bit surprised if there is anything you can do on your end to fix things here with off-the-shelf dask. What @dcherian mentioned in https://github.com/dask/distributed/issues/6360#issuecomment-1129484190 is probably the only thing that might work. Otherwise you'll need to run one of my experimental branches.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Means of zarr arrays cause a memory overload in dask workers 1277437106
1164660225 https://github.com/pydata/xarray/issues/6709#issuecomment-1164660225 https://api.github.com/repos/pydata/xarray/issues/6709 IC_kwDOAMm_X85Fa04B gjoseph92 3309802 2022-06-23T17:05:12Z 2022-06-23T17:05:12Z NONE

Thanks @dcherian, yeah this is definitely root task overproduction. I think your case is somewhat similar to @TomNicholas's https://github.com/dask/distributed/issues/6571 (that one might even be a little simpler actually).

There's some prototyping going on to address this, but I'd say "soon" is probably on a couple-month timescale right now, FYI.

https://github.com/dask/distributed/pull/6598 or https://github.com/dask/distributed/pull/6614 will probably make this work. I'm hopefully going to benchmark these against some real workloads in the next couple days, so I'll probably add yours in. Thanks for the MVCE!

Is my understanding of distributed mean wrong? Why are the random-sample tasks not flushed?

See https://github.com/dask/distributed/issues/6360#issuecomment-1129434333 and the linked issues for why this happens.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Means of zarr arrays cause a memory overload in dask workers 1277437106
1085150420 https://github.com/pydata/xarray/pull/5879#issuecomment-1085150420 https://api.github.com/repos/pydata/xarray/issues/5879 IC_kwDOAMm_X85ArhTU gjoseph92 3309802 2022-03-31T21:41:32Z 2022-03-31T21:41:32Z NONE

Yeah, I guess I expected OpenFile to, well, act like an open file. So maybe this is more of an fsspec interface issue?

I'll open a separate issue for improving the UX of this in xarray though. I think this would be rather confusing for new users.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Check for path-like objects rather than Path type, use os.fspath 1031275532
1085125053 https://github.com/pydata/xarray/issues/2314#issuecomment-1085125053 https://api.github.com/repos/pydata/xarray/issues/2314 IC_kwDOAMm_X85ArbG9 gjoseph92 3309802 2022-03-31T21:15:59Z 2022-03-31T21:15:59Z NONE

Just noticed this issue; people needing to do this sort of thing might want to look at stackstac (especially playing with the chunks= parameter) or odc-stac for loading the data. The graph will be cleaner than what you'd get from xr.concat([xr.open_rasterio(...) for ...]).

still appears to "over-eagerly" load more than just what is being worked on

FYI, this is basically expected behavior for distributed, see:
  • https://github.com/dask/distributed/issues/5223
  • https://github.com/dask/distributed/issues/5555

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Chunked processing across multiple raster (geoTIF) files 344621749
1085077801 https://github.com/pydata/xarray/pull/5879#issuecomment-1085077801 https://api.github.com/repos/pydata/xarray/issues/5879 IC_kwDOAMm_X85ArPkp gjoseph92 3309802 2022-03-31T20:34:51Z 2022-03-31T20:34:51Z NONE

"s3://noaa-nwm-retrospective-2-1-zarr-pds/lakeout.zarr" is a directory, right? You cannot open that as a file

Yeah correct. I oversimplified this from the problem I actually cared about, since of course zarr is not a single file that can be fsspec.open'd in the first place, and the zarr engine is doing some magic there when passed the plain string.

Here's a more illustrative example:

```python
In [1]: import xarray as xr

In [2]: import fsspec

In [3]: import os

In [4]: url = "s3://noaa-nwm-retrospective-2-1-pds/model_output/1979/197902010100.CHRTOUT_DOMAIN1.comp"  # a netCDF file in s3

In [5]: f = fsspec.open(url)

In [6]: f
Out[6]: <OpenFile 'noaa-nwm-retrospective-2-1-pds/model_output/1979/197902010100.CHRTOUT_DOMAIN1.comp'>

In [7]: isinstance(f, os.PathLike)
Out[7]: True

In [8]: s3f = f.open()

In [9]: s3f
Out[9]: <File-like object S3FileSystem, noaa-nwm-retrospective-2-1-pds/model_output/1979/197902010100.CHRTOUT_DOMAIN1.comp>

In [10]: isinstance(s3f, os.PathLike)
Out[10]: False

In [11]: ds = xr.open_dataset(s3f, engine='h5netcdf')

In [12]: ds
Out[12]:
<xarray.Dataset>
Dimensions:         (time: 1, reference_time: 1, feature_id: 2776738)
Coordinates:
  * time            (time) datetime64[ns] 1979-02-01T01:00:00
  * reference_time  (reference_time) datetime64[ns] 1979-02-01
  * feature_id      (feature_id) int32 101 179 181 ... 1180001803 1180001804
    latitude        (feature_id) float32 ...
    longitude       (feature_id) float32 ...
Data variables:
    crs             |S1 ...
    order           (feature_id) int32 ...
    elevation       (feature_id) float32 ...
    streamflow      (feature_id) float64 ...
    q_lateral       (feature_id) float64 ...
    velocity        (feature_id) float64 ...
    qSfcLatRunoff   (feature_id) float64 ...
    qBucket         (feature_id) float64 ...
    qBtmVertRunoff  (feature_id) float64 ...
Attributes: (12/18)
    TITLE:                      OUTPUT FROM WRF-Hydro v5.2.0-beta2
    featureType:                timeSeries
    proj4:                      +proj=lcc +units=m +a=6370000.0 +b=6370000.0 ...
    model_initialization_time:  1979-02-01_00:00:00
    station_dimension:          feature_id
    model_output_valid_time:    1979-02-01_01:00:00
    ...                         ...
    model_configuration:        retrospective
    dev_OVRTSWCRT:              1
    dev_NOAH_TIMESTEP:          3600
    dev_channel_only:           0
    dev_channelBucket_only:     0
    dev:                        dev_ prefix indicates development/internal me...

In [13]: ds = xr.open_dataset(f, engine='h5netcdf')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-de834ca911b4> in <module>
----> 1 ds = xr.open_dataset(f, engine='h5netcdf')

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    493
    494     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 495     backend_ds = backend.open_dataset(
    496         filename_or_obj,
    497         drop_variables=drop_variables,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, format, group, lock, invalid_netcdf, phony_dims, decode_vlen_strings)
    384         ):
    385
--> 386         filename_or_obj = _normalize_path(filename_or_obj)
    387         store = H5NetCDFStore.open(
    388             filename_or_obj,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/common.py in _normalize_path(path)
     21 def _normalize_path(path):
     22     if isinstance(path, os.PathLike):
---> 23         path = os.fspath(path)
     24
     25     if isinstance(path, str) and not is_remote_uri(path):

~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in __fspath__(self)
     96     def __fspath__(self):
     97         # may raise if cannot be resolved to local file
---> 98         return self.open().__fspath__()
     99
    100     def __enter__(self):

AttributeError: 'S3File' object has no attribute '__fspath__'
```

Because the plain fsspec.OpenFile object has an __fspath__ attribute (but calling it raises an error), it causes xarray.backends.common._normalize_path to fail.

Because the s3fs.S3File object does not have an `__fspath__` attribute, `_normalize_path` doesn't try to call `os.fspath` on it, so the file-like object can be passed all the way down into h5netcdf, which handles it fine.

Note though that if I downgrade xarray to 0.19.0 (the last version before this PR was merged), I still can't use the plain `fsspec.OpenFile` object successfully. It's not xarray's fault anymore—it gets passed all the way into h5netcdf—but h5netcdf also tries to call `fspath` on the `OpenFile`, which fails in the same way.

```python
In [1]: import xarray as xr

In [2]: import fsspec

In [3]: xr.__version__
Out[3]: '0.19.0'

In [4]: url = "s3://noaa-nwm-retrospective-2-1-pds/model_output/1979/197902010100.CHRTOUT_DOMAIN1.comp"  # a netCDF file in s3

In [5]: f = fsspec.open(url)

In [6]: xr.open_dataset(f.open(), engine="h5netcdf")
Out[6]:
<xarray.Dataset>
Dimensions:         (time: 1, reference_time: 1, feature_id: 2776738)
Coordinates:
  * time            (time) datetime64[ns] 1979-02-01T01:00:00
  * reference_time  (reference_time) datetime64[ns] 1979-02-01
  * feature_id      (feature_id) int32 101 179 181 ... 1180001803 1180001804
    latitude        (feature_id) float32 ...
    longitude       (feature_id) float32 ...
Data variables:
    crs             |S1 ...
    order           (feature_id) int32 ...
    elevation       (feature_id) float32 ...
    streamflow      (feature_id) float64 ...
    q_lateral       (feature_id) float64 ...
    velocity        (feature_id) float64 ...
    qSfcLatRunoff   (feature_id) float64 ...
    qBucket         (feature_id) float64 ...
    qBtmVertRunoff  (feature_id) float64 ...
Attributes: (12/18)
    TITLE:                      OUTPUT FROM WRF-Hydro v5.2.0-beta2
    featureType:                timeSeries
    proj4:                      +proj=lcc +units=m +a=6370000.0 +b=6370000.0 ...
    model_initialization_time:  1979-02-01_00:00:00
    station_dimension:          feature_id
    model_output_valid_time:    1979-02-01_01:00:00
    ...                         ...
    model_configuration:        retrospective
    dev_OVRTSWCRT:              1
    dev_NOAH_TIMESTEP:          3600
    dev_channel_only:           0
    dev_channelBucket_only:     0
    dev:                        dev_ prefix indicates development/internal me...

In [7]: xr.open_dataset(f, engine="h5netcdf")
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    198         try:
--> 199             file = self._cache[self._key]
    200         except KeyError:

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/lru_cache.py in __getitem__(self, key)
     52         with self._lock:
---> 53             value = self._cache[key]
     54             self._cache.move_to_end(key)

KeyError: [<class 'h5netcdf.core.File'>, (<OpenFile 'noaa-nwm-retrospective-2-1-pds/model_output/1979/197902010100.CHRTOUT_DOMAIN1.comp'>,), 'r', (('decode_vlen_strings', True), ('invalid_netcdf', None))]

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-7-e6098b8ab402> in <module>
----> 1 xr.open_dataset(f, engine="h5netcdf")

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    495
    496     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 497     backend_ds = backend.open_dataset(
    498         filename_or_obj,
    499         drop_variables=drop_variables,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, format, group, lock, invalid_netcdf, phony_dims, decode_vlen_strings)
    372
    373     filename_or_obj = _normalize_path(filename_or_obj)
--> 374     store = H5NetCDFStore.open(
    375         filename_or_obj,
    376         format=format,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in open(cls, filename, mode, format, group, lock, autoclose, invalid_netcdf, phony_dims, decode_vlen_strings)
    176
    177     manager = CachingFileManager(h5netcdf.File, filename, mode=mode, kwargs=kwargs)
--> 178     return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
    179
    180     def _acquire(self, needs_lock=True):

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in __init__(self, manager, group, mode, lock, autoclose)
    121         # todo: utilizing find_root_and_group seems a bit clunky
    122         # making filename available on h5netcdf.Group seems better
--> 123         self._filename = find_root_and_group(self.ds)[0].filename
    124         self.is_remote = is_remote_uri(self._filename)
    125         self.lock = ensure_lock(lock)

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in ds(self)
    187     @property
    188     def ds(self):
--> 189         return self._acquire()
    190
    191     def open_store_variable(self, name, var):

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/h5netcdf_.py in _acquire(self, needs_lock)
    179
    180     def _acquire(self, needs_lock=True):
--> 181         with self._manager.acquire_context(needs_lock) as root:
    182             ds = _nc4_require_group(
    183                 root, self._group, self._mode, create_group=_h5netcdf_create_group

~/.pyenv/versions/3.9.1/lib/python3.9/contextlib.py in __enter__(self)
    115         del self.args, self.kwds, self.func
    116         try:
--> 117             return next(self.gen)
    118         except StopIteration:
    119             raise RuntimeError("generator didn't yield") from None

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/file_manager.py in acquire_context(self, needs_lock)
    185     def acquire_context(self, needs_lock=True):
    186         """Context manager for acquiring a file."""
--> 187         file, cached = self._acquire_with_cache_info(needs_lock)
    188         try:
    189             yield file

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    203             kwargs = kwargs.copy()
    204             kwargs["mode"] = self._mode
--> 205             file = self._opener(*self._args, **kwargs)
    206             if self._mode == "w":
    207                 # ensure file doesn't get overriden when opened again

~/dev/dask-playground/env/lib/python3.9/site-packages/h5netcdf/core.py in __init__(self, path, mode, invalid_netcdf, phony_dims, **kwargs)
    978         self._preexisting_file = mode in {"r", "r+", "a"}
    979         self._h5py = h5py
--> 980         self._h5file = self._h5py.File(
    981             path, mode, track_order=track_order, **kwargs
    982         )

~/dev/dask-playground/env/lib/python3.9/site-packages/h5py/_hl/files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, fs_strategy, fs_persist, fs_threshold, fs_page_size, page_buf_size, min_meta_keep, min_raw_keep, locking, **kwds)
    484             name = repr(name).encode('ASCII', 'replace')
    485         else:
--> 486             name = filename_encode(name)
    487
    488         if track_order is None:

~/dev/dask-playground/env/lib/python3.9/site-packages/h5py/_hl/compat.py in filename_encode(filename)
     17     filenames in h5py for more information.
     18     """
---> 19     filename = fspath(filename)
     20     if sys.platform == "win32":
     21         if isinstance(filename, str):

~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in __fspath__(self)
     96     def __fspath__(self):
     97         # may raise if cannot be resolved to local file
---> 98         return self.open().__fspath__()
     99
    100     def __enter__(self):

AttributeError: 'S3File' object has no attribute '__fspath__'
```

The problem is that `OpenFile` doesn't have a `read` or `seek` method, so h5py doesn't think it's a proper file-like object and tries to `fspath` it here: https://github.com/h5py/h5py/blob/master/h5py/_hl/files.py#L509

So I may just be misunderstanding what an fsspec.OpenFile object is supposed to be (it's not actually a file-like object until you .open() it?). But I expect users would be similarly confused by this distinction.
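The duck-typing h5py applies can be mimicked in a few stdlib lines (a hypothetical sketch, not h5py's actual code: objects with `read`/`seek` are treated as file-like and used directly, everything else gets `os.fspath()`'d):

```python
import io
import os

def open_target(obj):
    """Sketch of the dispatch described above: genuine file-like objects
    (read + seek) are used as-is; anything else is coerced with os.fspath(),
    which is exactly where an fsspec.OpenFile-style object blows up."""
    if hasattr(obj, "read") and hasattr(obj, "seek"):
        return obj            # file-like: pass straight through
    return os.fspath(obj)     # path-like (or so we hope)

buf = io.BytesIO(b"data")     # a real file-like object
assert open_target(buf) is buf

class OpenFileLike:
    """Mimics fsspec.OpenFile: no read/seek until .open() is called,
    and resolving it to a local path raises."""
    def __fspath__(self):
        raise AttributeError("'S3File' object has no attribute '__fspath__'")

try:
    open_target(OpenFileLike())
    failed = False
except AttributeError:
    failed = True
assert failed  # the confusing failure mode described above
```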

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Check for path-like objects rather than Path type, use os.fspath 1031275532
1085030197 https://github.com/pydata/xarray/pull/5879#issuecomment-1085030197 https://api.github.com/repos/pydata/xarray/issues/5879 IC_kwDOAMm_X85ArD81 gjoseph92 3309802 2022-03-31T19:46:40Z 2022-03-31T19:50:07Z NONE

@martindurant exactly, os.PathLike just uses duck-typing, which fsspec matches.

This generally means you can't pass s3fs/gcsfs files into xr.open_dataset (from what I've tried so far). (I don't know if you actually should be able to do this, but regardless, the error would be very confusing to a new user.)

```python
In [32]: xr.open_dataset("s3://noaa-nwm-retrospective-2-1-zarr-pds/lakeout.zarr", engine="zarr")
Out[32]:
<xarray.Dataset>
Dimensions:         (feature_id: 5783, time: 367439)
Coordinates:
  * feature_id      (feature_id) int32 491 531 747 ... 947070204 1021092845
    latitude        (feature_id) float32 ...
    longitude       (feature_id) float32 ...
  * time            (time) datetime64[ns] 1979-02-01T01:00:00 ... 2020-12-31T...
Data variables:
    crs             |S1 ...
    inflow          (time, feature_id) float64 ...
    outflow         (time, feature_id) float64 ...
    water_sfc_elev  (time, feature_id) float32 ...
Attributes:
    Conventions:                  CF-1.6
    TITLE:                        OUTPUT FROM WRF-Hydro v5.2.0-beta2
    code_version:                 v5.2.0-beta2
    featureType:                  timeSeries
    model_configuration:          retrospective
    model_output_type:            reservoir
    proj4:                        +proj=lcc +units=m +a=6370000.0 +b=6370000....
    reservoir_assimilated_value:  Assimilation not performed
    reservoir_type:               1 = level pool everywhere
    station_dimension:            lake_id

In [33]: xr.open_dataset(fsspec.open("s3://noaa-nwm-retrospective-2-1-zarr-pds/lakeout.zarr"), engine="zarr")
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-33-76e10d75e2c2> in <module>
----> 1 xr.open_dataset(fsspec.open("s3://noaa-nwm-retrospective-2-1-zarr-pds/lakeout.zarr"), engine="zarr")

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    493
    494     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 495     backend_ds = backend.open_dataset(
    496         filename_or_obj,
    497         drop_variables=drop_variables,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/zarr.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel)
    797         ):
    798
--> 799         filename_or_obj = _normalize_path(filename_or_obj)
    800         store = ZarrStore.open_group(
    801             filename_or_obj,

~/dev/dask-playground/env/lib/python3.9/site-packages/xarray/backends/common.py in _normalize_path(path)
     21 def _normalize_path(path):
     22     if isinstance(path, os.PathLike):
---> 23         path = os.fspath(path)
     24
     25     if isinstance(path, str) and not is_remote_uri(path):

~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in __fspath__(self)
     96     def __fspath__(self):
     97         # may raise if cannot be resolved to local file
---> 98         return self.open().__fspath__()
     99
    100     def __enter__(self):

~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in open(self)
    138         been deleted; but a with-context is better style.
    139         """
--> 140         out = self.__enter__()
    141         closer = out.close
    142         fobjects = self.fobjects.copy()[:-1]

~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/core.py in __enter__(self)
    101         mode = self.mode.replace("t", "").replace("b", "") + "b"
    102
--> 103         f = self.fs.open(self.path, mode=mode)
    104
    105         self.fobjects = [f]

~/dev/dask-playground/env/lib/python3.9/site-packages/fsspec/spec.py in open(self, path, mode, block_size, cache_options, compression, **kwargs)
   1007         else:
   1008             ac = kwargs.pop("autocommit", not self._intrans)
-> 1009             f = self._open(
   1010                 path,
   1011                 mode=mode,

~/dev/dask-playground/env/lib/python3.9/site-packages/s3fs/core.py in _open(self, path, mode, block_size, acl, version_id, fill_cache, cache_type, autocommit, requester_pays, **kwargs)
    532             cache_type = self.default_cache_type
    533
--> 534         return S3File(
    535             self,
    536             path,

~/dev/dask-playground/env/lib/python3.9/site-packages/s3fs/core.py in __init__(self, s3, path, mode, block_size, acl, version_id, fill_cache, s3_additional_kwargs, autocommit, cache_type, requester_pays)
   1824
   1825         if "r" in mode:
-> 1826             self.req_kw["IfMatch"] = self.details["ETag"]
   1827
   1828     def _call_s3(self, method, *kwarglist, **kwargs):

KeyError: 'ETag'
```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Check for path-like objects rather than Path type, use os.fspath 1031275532
1085012805 https://github.com/pydata/xarray/pull/5879#issuecomment-1085012805 https://api.github.com/repos/pydata/xarray/issues/5879 IC_kwDOAMm_X85Aq_tF gjoseph92 3309802 2022-03-31T19:25:28Z 2022-03-31T19:25:28Z NONE

Note that isinstance(fsspec.OpenFile(...), os.PathLike) is True, due to the magic of ABCs. Are we sure that we want to be calling os.fspath on fsspec files? In many cases (like an S3File, GCSFile, etc.) this will fail with a confusing error like 'S3File' object has no attribute '__fspath__'.
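The ABC magic is easy to see with two plain classes (stdlib only; the class names are made up for illustration): `os.PathLike` uses a `__subclasshook__` that only checks for the *presence* of `__fspath__`, so the isinstance check passes long before `__fspath__` is ever called.

```python
import os

class HasFspath:
    """Defines __fspath__, but it raises at call time -- like an
    fsspec OpenFile backed by a remote file."""
    def __fspath__(self):
        raise RuntimeError("only resolvable for local files")

class NoFspath:
    pass

# No inheritance or registration needed: the ABC hook checks for the
# attribute's existence only.
assert isinstance(HasFspath(), os.PathLike)
assert not isinstance(NoFspath(), os.PathLike)

# ...so passing the isinstance check is no guarantee os.fspath() succeeds:
try:
    os.fspath(HasFspath())
    raised = False
except RuntimeError:
    raised = True
assert raised
```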

cc @martindurant

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Check for path-like objects rather than Path type, use os.fspath 1031275532
856285749 https://github.com/pydata/xarray/pull/5449#issuecomment-856285749 https://api.github.com/repos/pydata/xarray/issues/5449 MDEyOklzc3VlQ29tbWVudDg1NjI4NTc0OQ== gjoseph92 3309802 2021-06-07T21:45:30Z 2021-06-07T21:45:30Z NONE

@mathause sorry for breaking things here. Note that passing output_dtypes didn't work as it was supposed to before, and also didn't cause a cast. We went back and forth on whether output_types should cause explicit casting, and whether it was sensible to provide both it and meta. Ultimately we decided they should be mutually exclusive, and should not cause casting, but without much knowledge of how downstream libraries were using these arguments. So maybe we should revisit that choice in dask?

Also I think maybe this test should be changed rather than skipped. Saying output_dtypes=[int] and then assert float == actual.dtype just seems weird to me. Perhaps removing one of output_dtypes or meta from the test would be the best solution.
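The mutually-exclusive, no-casting contract can be sketched in NumPy (a toy `resolve_meta` helper invented for illustration, not dask's implementation of `apply_gufunc`):

```python
import numpy as np

def resolve_meta(func, args, meta=None, output_dtypes=None):
    """Toy sketch of the contract discussed above: meta and output_dtypes
    are mutually exclusive, and neither causes the result to be cast."""
    if meta is not None and output_dtypes is not None:
        raise ValueError("meta and output_dtypes are mutually exclusive")
    if meta is None:
        if output_dtypes is not None:
            # declared dtype is recorded as a zero-sized "meta" array
            meta = np.empty((0,), dtype=output_dtypes[0])
        else:
            # otherwise infer by running func on zero-sized inputs
            meta = func(*(np.empty((0,), dtype=a.dtype) for a in args))
    return meta

a = np.arange(4, dtype=np.float64)
meta = resolve_meta(np.sqrt, (a,), output_dtypes=[int])
assert meta.dtype == np.dtype(int)      # declared dtype is recorded...
assert np.sqrt(a).dtype == np.float64   # ...but the actual result is not cast
```

This is why `output_dtypes=[int]` followed by `assert float == actual.dtype` can both "work": the declaration and the computed dtype are allowed to disagree.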

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fix dask meta and output_dtypes error 913830070


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);