
issue_comments

6 rows where user = 2067093 sorted by updated_at descending

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at, author_association, body, reactions, performed_via_github_app, issue
808605422 https://github.com/pydata/xarray/issues/5070#issuecomment-808605422 https://api.github.com/repos/pydata/xarray/issues/5070 MDEyOklzc3VlQ29tbWVudDgwODYwNTQyMg== NowanIlfideme 2067093 2021-03-27T00:39:26Z 2021-03-27T00:43:35Z NONE

Just ran into this. Unsure whether checking hasattr is better than just trying to read the object and catching an error - someone could implement a non-compliant read method, which would create other errors.
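
For illustration, here is a minimal sketch of the two checks being weighed; the `looks_like_file` helper is hypothetical, not xarray's actual code:

```python
import io

def looks_like_file(obj) -> bool:
    """Hypothetical helper contrasting the two detection strategies."""
    # Strict check: rejects duck file-likes (e.g. fsspec's file objects)
    # that don't subclass io.IOBase.
    if isinstance(obj, io.IOBase):
        return True
    # Looser duck-typing check: accepts anything exposing a read() method,
    # at the cost of letting non-compliant read() implementations through.
    return callable(getattr(obj, "read", None))
```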

As a workaround, you could read it into BytesIO and pass the BytesIO instance:

```python
import fsspec
import xarray as xr
from io import BytesIO

of = fsspec.open("example.nc")
with of as f:
    xr.load_dataset(BytesIO(f.read()))
```

Also, here's the link to the code referenced above.

Ideally xarray would work with fsspec or pyfilesystem2 out of the box (to parse access URLs, for example). I've had to fall back to using BytesIO buffers too many times. 😛
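
For example, a small wrapper along these lines could hide that fallback behind a URL-based API; the helper name and the example S3 URL below are made up:

```python
from io import BytesIO

import fsspec
import xarray as xr

def load_dataset_from_url(url: str, **storage_options) -> xr.Dataset:
    """Hypothetical helper: load a remote netCDF file via fsspec by
    buffering the whole file into memory first."""
    with fsspec.open(url, mode="rb", **storage_options) as f:
        return xr.load_dataset(BytesIO(f.read()))

# e.g. ds = load_dataset_from_url("s3://some-bucket/example.nc", anon=True)
```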

Edit: You don't even need BytesIO; it even works with raw bytes:

```python
import fsspec
import xarray as xr

of = fsspec.open("example.nc")
with of as f:
    xr.load_dataset(f.read())
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  requires io.IOBase subclass rather than duck file-like 839823306
730263703 https://github.com/pydata/xarray/issues/2059#issuecomment-730263703 https://api.github.com/repos/pydata/xarray/issues/2059 MDEyOklzc3VlQ29tbWVudDczMDI2MzcwMw== NowanIlfideme 2067093 2020-11-19T10:02:35Z 2020-11-19T10:02:35Z NONE

This may be relevant here, maybe not, but it appears the HDF5 backend is also at odds with all the above serialization.

Our internal project's dependencies changed, and that moved the h5py version from 2.10 to 3.1; apparently there was a breaking change that meant unicode strings were either encoded or decoded as bytes. Thankfully we had a test for that, but figuring out what was wrong was difficult.

Essentially, netCDF4 files that were round-tripped to a BytesIO (via an HDF5 backend) had unicode strings converted to bytes. I'm not sure whether it was the encoding or decoding part, likely decoding, judging by the docs:

https://docs.h5py.org/en/stable/strings.html
https://docs.h5py.org/en/stable/whatsnew/3.0.html#breaking-changes-deprecations

This might require even more special-casing to achieve consistent behavior for xarray users who don't really want to go into backend details (like me 😋).
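
As a rough illustration of the kind of special-casing this implies, here is a sketch of a post-load fixup; the helper is hypothetical and only handles fixed-width byte strings:

```python
import numpy as np
import xarray as xr

def decode_byte_strings(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical fixup: decode fixed-width byte strings (e.g. |S32)
    back to unicode after a round trip through an h5py >= 3 backend."""
    out = ds.copy()
    for name, var in ds.variables.items():
        if var.dtype.kind == "S":  # object-dtype bytes would need similar handling
            out[name] = (var.dims, np.char.decode(var.values, "utf-8"), var.attrs)
    return out
```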

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray serialize bytes/unicode strings across Python/netCDF versions? 314444743
657798184 https://github.com/pydata/xarray/issues/2995#issuecomment-657798184 https://api.github.com/repos/pydata/xarray/issues/2995 MDEyOklzc3VlQ29tbWVudDY1Nzc5ODE4NA== NowanIlfideme 2067093 2020-07-13T21:17:06Z 2020-07-13T21:17:06Z NONE

I ran into this issue; here's a simple workaround that seems to work:

```python
import netCDF4
import xarray as xr
from xarray.backends import NetCDF4DataStore
from xarray.backends.api import dump_to_store


def dataset_to_bytes(ds: xr.Dataset, name: str = "my-dataset") -> bytes:
    """Converts dataset to bytes."""
    # Write to an in-memory (diskless) netCDF4 dataset, then grab the buffer.
    nc4_ds = netCDF4.Dataset(name, mode="w", diskless=True, memory=ds.nbytes)
    nc4_store = NetCDF4DataStore(nc4_ds)
    dump_to_store(ds, nc4_store)
    res_mem = nc4_ds.close()
    res_bytes = res_mem.tobytes()
    return res_bytes
```

I tested this using the following:

```python
from io import BytesIO

import xarray as xr

fname = "REDACTED.nc"
ds = xr.load_dataset(fname)
ds_bytes = dataset_to_bytes(ds)
ds2 = xr.load_dataset(BytesIO(ds_bytes))

assert ds2.equals(ds) and all(
    ds2.attrs[k] == ds.attrs[k] for k in set(ds2.attrs).union(ds.attrs)
)
```

The assertion holds true; however, the file size on disk is different. It's possible they were saved using different netCDF4 versions, but I haven't had time to test that.

I tried using just ds.to_netcdf() but got the following error:

`ValueError: NetCDF 3 does not support type |S32`

That's because it falls back to the 'scipy' engine. It would be nice to have a non-hacky way to write netCDF4 files to byte streams. :smiley:
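
For reference, one less fragile route under the same constraint is to let the netcdf4 engine write to a temporary file and read the bytes back; a sketch, with made-up names:

```python
import os
import tempfile

import xarray as xr

def dataset_to_bytes_via_tmpfile(ds: xr.Dataset) -> bytes:
    """Sketch: write with the netcdf4 engine to a temporary file (avoiding
    the scipy/NETCDF3 fallback) and return the raw bytes."""
    fd, path = tempfile.mkstemp(suffix=".nc")
    os.close(fd)
    try:
        ds.to_netcdf(path, engine="netcdf4")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```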

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remote writing NETCDF4 files to Amazon S3 449706080
557579503 https://github.com/pydata/xarray/issues/1603#issuecomment-557579503 https://api.github.com/repos/pydata/xarray/issues/1603 MDEyOklzc3VlQ29tbWVudDU1NzU3OTUwMw== NowanIlfideme 2067093 2019-11-22T15:34:57Z 2019-11-22T15:34:57Z NONE

> Thanks @NowanIlfideme for your feedback.
>
> Could you perhaps share a gist of code related to your use case?

The first example in this comment is similar to my use case: https://github.com/pydata/xarray/issues/3213#issuecomment-520741706 . There are several "core" dimensions, but some parts of the coordinates may be hierarchical or cross-defined (e.g. country > province > city > building, but also country > province > voting district > building). We might have a full or nearly-full panel in the MultiIndex representation, yet the full cross product would be huge (even if we keep strictly hierarchical dimensions out).

Meanwhile, using a true COO sparse representation (as I understand it) would likely end up with slower operations overall, since nearly all machine learning models (think: linear regression) require dense array input anyway.

I'll make an example of this when I find some free time, along with a contrasting one in Pandas. :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
557563566 https://github.com/pydata/xarray/issues/1603#issuecomment-557563566 https://api.github.com/repos/pydata/xarray/issues/1603 MDEyOklzc3VlQ29tbWVudDU1NzU2MzU2Ng== NowanIlfideme 2067093 2019-11-22T14:59:29Z 2019-11-22T14:59:29Z NONE

I've noticed that basically all my current troubles with xarray lead to this issue (lack of MultiIndex support). I use xarray for machine learning/data science/econometrics. My current problem requires semi-hierarchical indexing on one of the dimensions, and slicing/aggregation along some levels of that dimension.

My first attempt was to just assume each dimension was orthogonal, which resulted in out-of-memory errors. I ended up using a MultiIndex for the hierarchy dimension to have a "dense" representation of a sparse subspace. Unfortunately, currently .sel() and such will cut out MultiIndex dimensions, and I've had to do boolean masking to keep all the dimensions I need.
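
A tiny made-up example of that behavior and the masking workaround (as of the xarray versions discussed here):

```python
import pandas as pd
import xarray as xr

# Made-up data: a "location" dimension indexed by a (country, city) MultiIndex.
idx = pd.MultiIndex.from_product([["US", "CA"], ["a", "b"]], names=["country", "city"])
da = xr.DataArray(range(4), dims="location", coords={"location": idx})

da.sel(country="US")                     # drops the "country" level from the result
da.where(da.country == "US", drop=True)  # boolean mask keeps the full MultiIndex
```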

Multidimensional groupby, especially within the MultiIndex, is a headache as it currently stands. I had to resort to making auxiliary dimensions with one-hot encoded levels (dummy variables) and doing multiply-aggregate operations by hand.
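
For instance, a minimal sketch of that multiply-aggregate trick, with made-up names and data:

```python
import numpy as np
import pandas as pd
import xarray as xr

idx = pd.MultiIndex.from_product([["US", "CA"], ["a", "b"]], names=["country", "city"])
da = xr.DataArray(np.arange(4.0), dims="location", coords={"location": idx})

# One-hot indicator over (location, country_group); multiply and sum over
# "location" to emulate groupby("country").sum() by hand.
groups = ["US", "CA"]
onehot = xr.DataArray(
    (da.country.values[:, None] == np.array(groups)).astype(float),
    dims=("location", "country_group"),
    coords={"country_group": groups},
)
grouped_sum = (da * onehot).sum("location")
```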

xarray is really beautiful and should be used more by data scientists, but it's really difficult to recommend it to colleagues when not all the familiar pandas-style operations are supported.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Explicit indexes in xarray's data-model (Future of MultiIndex) 262642978
557476617 https://github.com/pydata/xarray/issues/3458#issuecomment-557476617 https://api.github.com/repos/pydata/xarray/issues/3458 MDEyOklzc3VlQ29tbWVudDU1NzQ3NjYxNw== NowanIlfideme 2067093 2019-11-22T10:21:08Z 2019-11-22T10:21:08Z NONE

Note that this doesn't work on MultiIndex levels, since vectorized operations on them are not currently supported. Meanwhile, using sel(multiindex_level_name="a") drops the level from the multiindex entirely. The running theme is that this is dependent on #1603, it seems. :)
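
Roughly, with made-up data (behavior as of the xarray versions discussed in #1603):

```python
import pandas as pd
import xarray as xr

plain = xr.DataArray(range(3), dims="x", coords={"x": ["a", "b", "c"]})
plain.sel(x="a")     # scalar selection: the "x" dimension is dropped
plain.sel(x=["a"])   # list selection keeps a length-1 "x" dimension

idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["letter", "number"])
da = xr.DataArray(range(4), dims="y", coords={"y": idx})
da.sel(letter="a")      # works, but the "letter" level is dropped entirely
# da.sel(letter=["a"])  # vectorized selection on a level is not supported here
```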

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Keep index dimension when selecting only a single coord 514077742

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);