issue_comments


11 rows where author_association = "MEMBER", issue = 296561316, and user = 1217238, sorted by updated_at descending

shoyer (MEMBER) · 2018-06-01T04:21:54Z · https://github.com/pydata/xarray/pull/1905#issuecomment-393753739

Thanks for sticking with this, @barronh!

shoyer (MEMBER) · 2018-05-05T16:22:32Z (edited 2018-05-05T16:36:14Z) · https://github.com/pydata/xarray/pull/1905#issuecomment-386817035

@barronh clarifying question for you: does PNC support some sort of "lazy loading" of data, where it is only loaded into NumPy arrays when accessed? Or does it eagerly load data into NumPy arrays? (sorry if you already answered this somewhere above!)
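(For context, a minimal sketch of the distinction being asked about. The `reader` object and class names below are hypothetical stand-ins for PNC's file handle, not PNC's actual API:)

```
import numpy as np

class EagerVariable:
    """Reads everything into memory as soon as the file is opened."""
    def __init__(self, reader, name):
        self.data = np.asarray(reader.read(name))  # I/O happens here

class LazyVariable:
    """Defers I/O until the data is actually indexed."""
    def __init__(self, reader, name):
        self._reader = reader
        self._name = name

    def __getitem__(self, key):
        # I/O happens here, on access, so unused variables are never loaded
        return np.asarray(self._reader.read(self._name))[key]
```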

shoyer (MEMBER) · 2018-04-28T01:24:22Z · https://github.com/pydata/xarray/pull/1905#issuecomment-385129301

> I tried disabling mask and scale, but many other tests fail. At its root this is because I am implicitly supporting netCDF4 and other formats.

I also tried this and was surprised to see many other tests fail.

> I see two ways to solve this. Right now, it is only important to add non-netcdf support to xarray via PseudoNetCDF. I am currently allowing dynamic identification of the file format, which implicitly supports netCDF. I could disable implicit format support and require the format keyword. In that case, the PseudoNetCDF tests should no longer be a CFEncodedDataTest. Instead, I can simply test some round-tripping with the other formats (uamiv and possibly adding one or two other formats).

This sounds like a good solution to me. I'll leave it up to your judgment which other tests (if any) are worth adding.

I trust that none of the other formats PNC supports use _FillValue, add_offset or scale_factor attributes?

If it is possible to detect the inferred file format from PNC, then another option (other than requiring the explicit format argument) would be to load the data and raise an error if the detected file format is netCDF.
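(A rough sketch of that last option. `pncopen` is PNC's real entry point, but the magic-byte check below is a hand-rolled stand-in for whatever format detection PNC actually exposes:)

```
from PseudoNetCDF import pncopen

def _looks_like_netcdf(filename):
    # classic netCDF files begin with b'CDF'; netCDF4/HDF5 files with b'\x89HDF'
    with open(filename, 'rb') as f:
        magic = f.read(4)
    return magic[:3] == b'CDF' or magic == b'\x89HDF'

def open_pnc_file(filename, format=None):
    if format is None and _looks_like_netcdf(filename):
        raise ValueError(
            "this looks like a netCDF file; pass an explicit PNC format, "
            "or open it with engine='netcdf4' instead")
    if format is None:
        return pncopen(filename)
    return pncopen(filename, format=format)
```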

shoyer (MEMBER) · 2018-04-20T01:53:12Z · https://github.com/pydata/xarray/pull/1905#issuecomment-382940176

@barronh please see my comment above: https://github.com/pydata/xarray/pull/1905#issuecomment-381467470

I would be OK not applying masking/scaling at all. But applying masking/scaling twice by default seems problematic and hard to debug. If there's any chance someone will use pseudonetcdf to access a dataset with these attributes, we need to fix this behavior.

shoyer (MEMBER) · 2018-04-16T03:05:42Z · https://github.com/pydata/xarray/pull/1905#issuecomment-381467470

OK, I pushed a couple of small changes to your branch. Generally this is looking pretty good. I have a couple of other minor comments that I will post inline.

Your issues with the skipped tests (which I switched to xfail) are part of a larger issue in xarray, which is that we don't have good ways to handle backend-specific decoding (https://github.com/pydata/xarray/issues/2061).

I agree that we can probably punt on most of these for now (especially the string encoding), but I'm somewhat concerned about how decoding could end up reading incorrect data values, e.g., if a source uses scale/offset encoding. Consider this failed test:

```
def test_roundtrip_mask_and_scale(self):
    decoded = create_masked_and_scaled_data()
    encoded = create_encoded_masked_and_scaled_data()
    with self.roundtrip(decoded) as actual:
        assert_allclose(decoded, actual, decode_bytes=False)

E       AssertionError: [ nan  nan  10.   10.1   10.2 ]
E                       [ nan  nan  11.   11.01  11.02]
```

These sorts of bugs can be pretty insidious, so if there's any chance that someone would use PNC to read a netCDF file with this sort of encoding, we should try to fix this before merging this in.

One simple approach would be to raise an error for now if `mask_and_scale=True` in `open_dataset()`, that is, to force the user to explicitly disable masking and scaling with `xr.open_dataset(filename, engine='pseudonetcdf', mask_and_scale=False)`.

Alternatively, I suppose we could switch the default value to `mask_and_scale=None` and pick `True` or `False` based on the choice of backend.
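(A minimal sketch of that second option, using a hypothetical helper name; the real default resolution would live inside `open_dataset`:)

```
def resolve_mask_and_scale(engine, mask_and_scale=None):
    # PseudoNetCDF applies masking/scaling itself, so letting xarray decode
    # a second time would corrupt the values; other engines keep the old
    # default of True
    if mask_and_scale is None:
        mask_and_scale = engine != 'pseudonetcdf'
    return mask_and_scale

assert resolve_mask_and_scale('pseudonetcdf') is False
assert resolve_mask_and_scale('netcdf4') is True
```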

shoyer (MEMBER) · 2018-03-29T04:38:47Z · https://github.com/pydata/xarray/pull/1905#issuecomment-377119095

If you have pseudonetcdf up on PyPI, we can install it for our tests with pip instead of conda. That should work for now until the conda recipe is merged. You could simply add it to the end of the "pip" section in this file: https://github.com/pydata/xarray/blob/master/ci/requirements-py36.yml

shoyer (MEMBER) · 2018-03-28T18:39:44Z · https://github.com/pydata/xarray/pull/1905#issuecomment-376992451

The tests are fixed on master; please merge in master to fix this.

shoyer (MEMBER) · 2018-03-27T21:17:51Z · https://github.com/pydata/xarray/pull/1905#issuecomment-376678706

I opened a new issue for the scipy 1.0.1 failures: https://github.com/pydata/xarray/issues/2019

(I'll try to take a look at that shortly.)

shoyer (MEMBER) · 2018-02-19T04:35:09Z · https://github.com/pydata/xarray/pull/1905#issuecomment-366586817

> My variable objects present a pure numpy array, so they follow numpy indexing precisely with one exception. If the files are actually netCDF4, they have the same limitations as the netCDF4.Variable object.

OK, we will need to surface this information in some way for xarray -- maybe as an attribute of some sort on the pseudonetcdf side? The good news is that we already have handling for indexing like NumPy arrays and netCDF4 variables, but we need to know which behavior to expect to make all indexing operations work efficiently.

It would also be safe for now to just say that only basic indexing is supported, but that will result in sub-optimal indexing behavior (slower and more memory-intensive than necessary).

Note that we are currently in the process of refactoring how we handle indexers in backends, so you'll probably need to update things after https://github.com/pydata/xarray/pull/1899 is merged.
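(One way that information could be surfaced, sketched with hypothetical names on the PseudoNetCDF side; this is illustrative, not xarray's or PNC's actual API:)

```
class PncArrayWrapper:
    """Hypothetical wrapper that advertises which indexing its file supports."""

    def __init__(self, variable, backed_by_netcdf4):
        self.variable = variable
        # netCDF4 variables only support restricted, outer-style indexing,
        # while in-memory numpy arrays support full fancy indexing
        self.indexing_support = 'outer' if backed_by_netcdf4 else 'numpy'

    def __getitem__(self, key):
        return self.variable[key]
```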

shoyer (MEMBER) · 2018-02-14T00:40:02Z · https://github.com/pydata/xarray/pull/1905#issuecomment-365455583

> I can create binary data from within python and then read it, but all those tests are in my software package. Duplicating that seems like a bad idea.

Right.

The goal isn't to duplicate your tests, but to provide a meaningful integration test for the xarray backend. Can we read data from PNC into xarray and do everything an xarray user would want to do with it? Testing your API with a netCDF3 file would probably be enough, assuming you have good test coverage internally.

We already have a somewhat complete test suite for netCDF data that you could probably hook into, but for reference, the sorts of issues that tend to come up include:

- Indexing support. Do you only support basic indexing like `x[0, :5]`, or is indexing with integer arrays also supported?
- Serialization/thread-safety. Can we simultaneously read a file with another process or thread using dask?
- API consistency for scalar arrays. Do these require some sort of special API compared to non-scalar arrays?
- Data type support. Are strings and datetimes converted properly into the format xarray expects?
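(For concreteness, an integration-style round trip of the sort described above. It writes a small netCDF3 file with xarray itself and assumes the `pseudonetcdf` engine from this PR is installed; masking/scaling is disabled because PNC already applies it:)

```
import numpy as np
import xarray as xr

def test_pnc_roundtrip(tmp_path):
    path = str(tmp_path / "example.nc")
    expected = xr.Dataset({"x": ("t", np.arange(5.0))})
    expected.to_netcdf(path, format="NETCDF3_CLASSIC")
    with xr.open_dataset(path, engine="pseudonetcdf",
                         mask_and_scale=False) as actual:
        assert actual["x"][:3].shape == (3,)  # basic indexing, numpy-style
        xr.testing.assert_allclose(expected, actual)
```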

shoyer (MEMBER) · 2018-02-12T23:57:35Z · https://github.com/pydata/xarray/pull/1905#issuecomment-365104433

First of all -- this is very cool, thanks for putting this together!

In the long term, I think we would prefer to move more specialized backends out of xarray proper. But at the current time, it's difficult to do this easily, and the backend interface itself is not entirely stable. So it probably makes sense to add this directly into xarray for now.

To merge this into xarray, I'm afraid that an automated test suite of some sort is non-negotiable. Are there example datasets you can create on the fly from Python?


Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);