html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/1905#issuecomment-393753739,https://api.github.com/repos/pydata/xarray/issues/1905,393753739,MDEyOklzc3VlQ29tbWVudDM5Mzc1MzczOQ==,1217238,2018-06-01T04:21:54Z,2018-06-01T04:21:54Z,MEMBER,thanks for sticking with this @barronh !,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,296561316
https://github.com/pydata/xarray/pull/1905#issuecomment-386817035,https://api.github.com/repos/pydata/xarray/issues/1905,386817035,MDEyOklzc3VlQ29tbWVudDM4NjgxNzAzNQ==,1217238,2018-05-05T16:22:32Z,2018-05-05T16:36:14Z,MEMBER,"@barronh clarifying question for you: does PNC support some sort of ""lazy loading"" of data, where it is only loaded into NumPy arrays when accessed? Or does it eagerly load data into NumPy arrays? (sorry if you already answered this somewhere above!)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,296561316
https://github.com/pydata/xarray/pull/1905#issuecomment-385129301,https://api.github.com/repos/pydata/xarray/issues/1905,385129301,MDEyOklzc3VlQ29tbWVudDM4NTEyOTMwMQ==,1217238,2018-04-28T01:24:22Z,2018-04-28T01:24:22Z,MEMBER,"> I tried disabling mask and scale, but many other tests fail. At its root this is because I am implicitly supporting netCDF4 and other formats.
I also tried this and was surprised to see many other tests fail.
> I see two ways to solve this. Right now, it is only important to add non-netcdf support to xarray via PseudoNetCDF. I am currently allowing dynamic identification of the file format, which implicitly supports netCDF. I could disable implicit format support, and require the format keyword. In that case, PseudoNetCDF tests no longer should be CFEncodedDataTest. Instead, I can simply test some round tripping with the other formats (uamiv and possibly adding one or two other formats).
This sounds like a good solution to me. I'll leave it up to your judgment which other tests (if any) are worth adding.
I trust that none of the other formats PNC supports use `_FillValue`, `add_offset` or `scale_factor` attributes?
If it is possible to detect the inferred file format from PNC, then another option (other than requiring the explicit `format` argument) would be to load the data and raise an error if the detected file format is netCDF.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,296561316
https://github.com/pydata/xarray/pull/1905#issuecomment-382940176,https://api.github.com/repos/pydata/xarray/issues/1905,382940176,MDEyOklzc3VlQ29tbWVudDM4Mjk0MDE3Ng==,1217238,2018-04-20T01:53:12Z,2018-04-20T01:53:12Z,MEMBER,"@barronh please see my comment above: https://github.com/pydata/xarray/pull/1905#issuecomment-381467470
I would be OK not applying masking/scaling at all. But applying masking/scaling *twice* by default seems problematic and hard to debug. If there's any chance someone will use pseudonetcdf to access a dataset with these attributes, we need to fix this behavior.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,296561316
https://github.com/pydata/xarray/pull/1905#issuecomment-381467470,https://api.github.com/repos/pydata/xarray/issues/1905,381467470,MDEyOklzc3VlQ29tbWVudDM4MTQ2NzQ3MA==,1217238,2018-04-16T03:05:42Z,2018-04-16T03:05:42Z,MEMBER,"OK, I pushed a couple of small changes to your branch. Generally this is looking pretty good. I have a couple of other minor comments that I will post inline.
Your issues with the skipped tests (which I switched to xfail) are part of a large issue in xarray, which is that we don't have good ways to handle backend specific decoding (https://github.com/pydata/xarray/issues/2061).
I agree that we can probably punt on most of them for now (especially the string encoding), but I'm somewhat concerned about how decoding could end up reading incorrect data values, e.g., if a source uses scale/offset encoding. Consider this failed test:
```
    def test_roundtrip_mask_and_scale(self):
        decoded = create_masked_and_scaled_data()
        encoded = create_encoded_masked_and_scaled_data()
        with self.roundtrip(decoded) as actual:
>           assert_allclose(decoded, actual, decode_bytes=False)
E           AssertionError: [  nan   nan 10.   10.1  10.2 ]
E                          [  nan   nan 11.   11.01 11.02]
```
These sorts of bugs can be pretty insidious, so if there's any chance that someone would use PNC to read a netCDF file with this sort of encoding, we should try to fix this before merging this in.
One simple approach would be to raise an error for now if `mask_and_scale=True` in `open_dataset()`, that is, to force the user to explicitly disable masking and scaling with `xr.open_dataset(filename, engine='pseudonetcdf', mask_and_scale=False)`.
Alternatively, I suppose we could switch the default value to `mask_and_scale=None`, and pick `True` or `False` based on the choice of backend.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,296561316
https://github.com/pydata/xarray/pull/1905#issuecomment-377119095,https://api.github.com/repos/pydata/xarray/issues/1905,377119095,MDEyOklzc3VlQ29tbWVudDM3NzExOTA5NQ==,1217238,2018-03-29T04:38:47Z,2018-03-29T04:38:47Z,MEMBER,"If you have a pseudonetcdf release up on PyPI, we can install it for our tests with pip instead of conda. That should work for now until the conda recipe is merged. You could simply add it to the end of the ""pip"" section in this file: https://github.com/pydata/xarray/blob/master/ci/requirements-py36.yml","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,296561316
https://github.com/pydata/xarray/pull/1905#issuecomment-376992451,https://api.github.com/repos/pydata/xarray/issues/1905,376992451,MDEyOklzc3VlQ29tbWVudDM3Njk5MjQ1MQ==,1217238,2018-03-28T18:39:44Z,2018-03-28T18:39:44Z,MEMBER,"The tests are fixed on master, please merge in master to fix this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,296561316
https://github.com/pydata/xarray/pull/1905#issuecomment-376678706,https://api.github.com/repos/pydata/xarray/issues/1905,376678706,MDEyOklzc3VlQ29tbWVudDM3NjY3ODcwNg==,1217238,2018-03-27T21:17:51Z,2018-03-27T21:17:51Z,MEMBER,"I opened a new issue for the scipy 1.0.1 failures: https://github.com/pydata/xarray/issues/2019
(I'll try to take a look at them shortly)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,296561316
https://github.com/pydata/xarray/pull/1905#issuecomment-366586817,https://api.github.com/repos/pydata/xarray/issues/1905,366586817,MDEyOklzc3VlQ29tbWVudDM2NjU4NjgxNw==,1217238,2018-02-19T04:35:09Z,2018-02-19T04:35:09Z,MEMBER,"> My variable objects present a pure numpy array, so they follow numpy indexing precisely with one exception. If the files are actually netCDF4, they have the same limitations of the netCDF4.Variable object.
OK, we will need to surface this information in some way for xarray -- maybe as an attribute of some sort on the pseudonetcdf side? The good news is that we already have handling for indexing like NumPy arrays and netCDF4 variables, but we need to know which behavior to expect to make all of the indexing operations work efficiently.
It's also fine to say for now that only basic indexing is supported, but that will result in sub-optimal indexing behavior (slower and more memory-intensive than necessary).
Note that we are currently in the process of refactoring how we handle indexers in backends, so you'll probably need to update things after https://github.com/pydata/xarray/pull/1899 is merged.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,296561316
https://github.com/pydata/xarray/pull/1905#issuecomment-365455583,https://api.github.com/repos/pydata/xarray/issues/1905,365455583,MDEyOklzc3VlQ29tbWVudDM2NTQ1NTU4Mw==,1217238,2018-02-14T00:40:02Z,2018-02-14T00:40:02Z,MEMBER,"> I can create binary data from within python and then read it, but all those tests are in my software package. Duplicating that seems like a bad idea.
Right.
The goal isn't to duplicate your tests, but to provide a meaningful integration test for the xarray backend. Can we read data from PNC into xarray and do everything an xarray user would want to do with it? Testing your API with a netCDF3 file would probably be enough, assuming you have good test coverage internally.
We already have a somewhat complete test suite for netCDF data that you could probably hook into, but for reference, the sorts of issues that tend to come up include:
- Indexing support. Do you only support basic-indexing like `x[0, :5]` or is indexing with integer arrays also supported?
- Serialization/thread-safety. Can we simultaneously read a file with another process or thread using dask?
- API consistency for scalar arrays. Do these require some sort of special API compared to non-scalar arrays?
- Data types support. Are strings and datetimes converted properly into the format xarray expects?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,296561316
https://github.com/pydata/xarray/pull/1905#issuecomment-365104433,https://api.github.com/repos/pydata/xarray/issues/1905,365104433,MDEyOklzc3VlQ29tbWVudDM2NTEwNDQzMw==,1217238,2018-02-12T23:57:35Z,2018-02-12T23:57:35Z,MEMBER,"First of all -- this is very cool, thanks for putting this together!
In the long term, I think we would prefer to move more specialized backends out of xarray proper. But at the current time, it's difficult to do this easily, and the backend interface itself is not entirely stable. So it probably makes sense to add this directly into xarray for now.
To merge this into xarray, I'm afraid that an automated test suite of some sort is non-negotiable. Are there example datasets you can create on the fly from Python?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,296561316