issue_comments

51 rows where author_association = "CONTRIBUTOR" and user = 868027 sorted by updated_at descending

issue 29

  • type annotations make docs confusing 8
  • RTD build failing 5
  • BUG: modify behavior of Dataset.filter_by_attrs to match netCDF4.Data… 3
  • dataset attrs list of strings to_netcdf() error 3
  • Behavior of filter_by_attrs() does not match netCDF4.Dataset.get_variables_by_attributes 2
  • Read grid mapping and bounds as coords 2
  • Remote writing NETCDF4 files to Amazon S3 2
  • Building the docs creates temporary files 2
  • Remove dangerous default argument 2
  • Read of netCDF file fails with units attribute that is not of type string 2
  • [Bug]: reading NaT/NaN on M1 ARM chip 2
  • When decode_times fails, warn rather than failing 1
  • Optional extras to manage dependencies 1
  • enable sphinx.ext.napoleon 1
  • bump rasterio to 1.0.24 in doc building environment 1
  • reduce the size of example dataset in dask docs 1
  • decode_cf called on mfdataset throws error: 'Array' object has no attribute 'tolist' 1
  • Transform variables into coordinates and associate them with another variable 1
  • Hundreds of Sphinx errors 1
  • Save 'S1' array without the char_dim_name dimension 1
  • RTD failing yet again 1
  • Apply blackdoc to the documentation 1
  • small contrast of html view in VScode darkmode 1
  • Millisecond precision is lost on datetime64 during IO roundtrip 1
  • 📚 New theme & rearrangement of the docs 1
  • Single matplotlib import 1
  • DataArray saved from v0.19.0 is faulty when reading with v0.21.0+ 1
  • Don't cast NaN to integer 1
  • Memory leak - xr.open_dataset() not releasing memory. 1

user 1

  • DocOtak · 51 ✖

author_association 1

  • CONTRIBUTOR · 51 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1472454634 https://github.com/pydata/xarray/issues/7608#issuecomment-1472454634 https://api.github.com/repos/pydata/xarray/issues/7608 IC_kwDOAMm_X85Xw9_q DocOtak 868027 2023-03-16T17:57:28Z 2023-03-16T17:57:28Z CONTRIBUTOR

I think 1d arrays of other data types are allowed... it's just that the 1d dim value is hidden from you in attributes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  dataset attrs list of strings to_netcdf() error 1619835929
1472379128 https://github.com/pydata/xarray/issues/7608#issuecomment-1472379128 https://api.github.com/repos/pydata/xarray/issues/7608 IC_kwDOAMm_X85Xwrj4 DocOtak 868027 2023-03-16T17:12:43Z 2023-03-16T17:12:43Z CONTRIBUTOR

@dcherian would the following be a good place to put this check/raise? https://github.com/pydata/xarray/blob/b36819b1ed4f74ba8e254f2baa790303ef350e4a/xarray/backends/netcdf3.py#L75-L84

scipy has a short list of allowed attr dtypes; would we want our check to be in the form of an allow list? I guess the question is whether scipy implements everything that is allowed in netCDF3.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  dataset attrs list of strings to_netcdf() error 1619835929
1464795339 https://github.com/pydata/xarray/issues/7608#issuecomment-1464795339 https://api.github.com/repos/pydata/xarray/issues/7608 IC_kwDOAMm_X85XTwDL DocOtak 868027 2023-03-11T02:19:32Z 2023-03-11T02:19:32Z CONTRIBUTOR

Are you able to install the netcdf4 package in your environment? If my memory serves, the scipy netCDF implementation only supports netCDF3, and arrays of strings in attributes are a netCDF4 feature.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  dataset attrs list of strings to_netcdf() error 1619835929
1367488859 https://github.com/pydata/xarray/issues/7404#issuecomment-1367488859 https://api.github.com/repos/pydata/xarray/issues/7404 IC_kwDOAMm_X85Rgjlb DocOtak 868027 2022-12-29T17:45:33Z 2022-12-29T17:45:33Z CONTRIBUTOR

I've personally seen a lot of what looks like memory reuse in numpy and related libraries. I don't think any of this happens explicitly, but I have never investigated. I would expect that, if memory were not being released, opening and closing the dataset in a loop would increase memory usage. It didn't on the recent library versions I have.

```python
Start: 89.71875 MiB
Before opening file: 90.203125 MiB
After opening file: 96.6875 MiB
Filename: test.py

Line #    Mem usage    Increment  Occurrences   Line Contents

     6     90.2 MiB     90.2 MiB           1   @profile
     7                                         def main():
     8     90.2 MiB      0.0 MiB           1       path = 'ECMWF_ERA-40_subset.nc'
     9     90.2 MiB      0.0 MiB           1       print(f"Before opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")
    10     96.7 MiB     -0.1 MiB        1001       for i in range(1000):
    11     96.7 MiB      6.4 MiB        1000           with xr.open_dataset(path) as ds:
    12     96.7 MiB     -0.1 MiB        1000             ...
    13     96.7 MiB      0.0 MiB           1       print(f"After opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")

End: 96.6875 MiB
```

Show Versions

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.13 (default, Jul 23 2022, 17:00:57) [Clang 13.1.6 (clang-1316.0.21.2.5)]
python-bits: 64
OS: Darwin
OS-release: 22.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.0
xarray: 2022.11.0
pandas: 1.4.3
numpy: 1.23.5
scipy: None
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.3
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 56.0.0
pip: 22.0.4
conda: None
pytest: 6.2.5
IPython: 8.4.0
sphinx: 5.1.1
```
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Memory leak - xr.open_dataset() not releasing memory. 1512460818
1262828844 https://github.com/pydata/xarray/pull/7098#issuecomment-1262828844 https://api.github.com/repos/pydata/xarray/issues/7098 IC_kwDOAMm_X85LRT0s DocOtak 868027 2022-09-29T21:22:32Z 2022-09-29T21:22:32Z CONTRIBUTOR

@TomNicholas Something different will need to happen with that cast eventually. See #6191 for something that is failing on some systems that users have but is currently unable to be captured in the tests. Numpy has already added runtime warnings about doing this, and is "thinking about" making nan to int casts raise https://github.com/numpy/numpy/issues/14412. Xarray's own @shoyer has hit issues like this before as well https://github.com/numpy/numpy/issues/6109.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Don't cast NaN to integer 1389176083
1246041612 https://github.com/pydata/xarray/issues/7032#issuecomment-1246041612 https://api.github.com/repos/pydata/xarray/issues/7032 IC_kwDOAMm_X85KRRYM DocOtak 868027 2022-09-13T23:13:21Z 2022-09-13T23:13:21Z CONTRIBUTOR

Pickle requires that the internal details of xarray's data structures be the same between versions. The documentation about xarray's pickle support says pickle is not guaranteed to work between versions of xarray.

Why didn't netcdf work for your use case?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray saved from v0.19.0 is faulty when reading with v0.21.0+  1372053736
1209664972 https://github.com/pydata/xarray/issues/6191#issuecomment-1209664972 https://api.github.com/repos/pydata/xarray/issues/6191 IC_kwDOAMm_X85IGgXM DocOtak 868027 2022-08-09T17:30:07Z 2022-08-09T17:30:07Z CONTRIBUTOR

Some additional info for figuring out the best way to address this.

For the decode using pandas approach, two things I tried worked: using a pandas.array with a nullable integer data type, or simulating what happens on x86_64 systems by checking for nans in the incoming array and setting those positions to numpy.iinfo(np.int64).min.

the pandas nullable integer array:

```python
# note that is a capital i Int64 to use the nullable type.
flat_num_dates_ns_int = pd.array(flat_num_dates * _NS_PER_TIME_DELTA[delta], dtype="Int64")
```

simulate x86:

```python
flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
    np.int64
)

flat_num_dates_ns_int[np.isnan(flat_num_dates)] = np.iinfo(np.int64).min
```

The pandas solution is explicitly experimental in their docs, and the emulate version just feels "hacky" to me. These don't break any existing tests on my local machine.
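The "simulate x86" idea can be tried in isolation with plain numpy. The following is a minimal sketch, not the code in xarray: `nan_safe_to_int64` is a hypothetical helper name, and unlike the snippet in the comment it replaces NaN with zero before the cast, so that no NaN ever reaches `astype(np.int64)`. It then writes the int64 minimum, which numpy interprets as NaT, into those positions.

```python
import numpy as np


def nan_safe_to_int64(values):
    """Cast floats to int64, mapping NaN to the int64 minimum (hypothetical helper)."""
    values = np.asarray(values, dtype=np.float64)
    nan_mask = np.isnan(values)
    # Swap NaN for 0.0 first so the cast is well defined on every platform.
    out = np.where(nan_mask, 0.0, values).astype(np.int64)
    # The int64 minimum is the integer that numpy reads back as NaT.
    out[nan_mask] = np.iinfo(np.int64).min
    return out


# One day expressed in nanoseconds, followed by a missing value.
ns = nan_safe_to_int64([86_400 * 1e9, np.nan])
deltas = ns.view("timedelta64[ns]")
missing = np.isnat(deltas)
```

Here `missing` marks only the second element, regardless of how the platform would have cast a NaN.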

cftime itself has no support for nan type missing values and will fail:

(on x86_64)

```python
>>> import numpy as np
>>> from xarray.coding.times import decode_cf_datetime
>>> decode_cf_datetime(np.array([0, np.nan]), "days since 1950-01-01", use_cftime=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/abarna/.pyenv/versions/3.8.5/lib/python3.8/site-packages/xarray/coding/times.py", line 248, in decode_cf_datetime
    dates = _decode_datetime_with_cftime(flat_num_dates, units, calendar)
  File "/home/abarna/.pyenv/versions/3.8.5/lib/python3.8/site-packages/xarray/coding/times.py", line 164, in _decode_datetime_with_cftime
    cftime.num2date(num_dates, units, calendar, only_use_cftime_datetimes=True)
  File "src/cftime/_cftime.pyx", line 484, in cftime._cftime.num2date
TypeError: unsupported operand type(s) for +: 'cftime._cftime.DatetimeGregorian' and 'NoneType'
```

cftime is happy with masked arrays:

```python
>>> import cftime
>>> a1 = np.ma.masked_invalid(np.array([0, np.nan]))
>>> cftime.num2date(a1, "days since 1950-01-01")
masked_array(data=[cftime.DatetimeGregorian(1950, 1, 1, 0, 0, 0, 0), --],
             mask=[False, True],
             fill_value='?',
             dtype=object)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [Bug]: reading NaT/NaN on M1 ARM chip 1114351614
1209567966 https://github.com/pydata/xarray/issues/6191#issuecomment-1209567966 https://api.github.com/repos/pydata/xarray/issues/6191 IC_kwDOAMm_X85IGIre DocOtak 868027 2022-08-09T15:52:31Z 2022-08-09T15:52:31Z CONTRIBUTOR

I got caught by this one yesterday on an M1 machine. I did some digging and found what I think to be the underlying issue. The short explanation is that the time conversion functions do an astype(np.int64) or equivalent cast on arrays that contain nans. This is undefined behavior and very soon, doing this will start to emit RuntimeWarnings.

I knew from my own data files that it wasn't the first element of the array being substituted but whatever was in the units as the epoch. I started to poke at the xarray internals (and the CFtime internals) to try to get a minimal example working, eventually found the following:

On an M1:

```python
>>> from xarray.coding.times import _decode_datetime_with_pandas
>>> import numpy as np
>>> _decode_datetime_with_pandas(np.array([20000, float('nan')]), "days since 1950-01-01", "proleptic_gregorian")
array(['2004-10-04T00:00:00.000000000', '1950-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
>>> np.array(np.nan).astype(np.int64)
array(0)
```

On an x86_64:

```python
>>> from xarray.coding.times import _decode_datetime_with_pandas
>>> import numpy as np
>>> _decode_datetime_with_pandas(np.array([20000, float('nan')]), "days since 1950-01-01", "proleptic_gregorian")
array(['2004-10-04T00:00:00.000000000', 'NaT'],
      dtype='datetime64[ns]')
>>> np.array(np.nan).astype(np.int64)
array(-9223372036854775808)
```

This issue is not Apple/M1/clang specific: I tested on an AWS Graviton (arm) instance and got the same results with ubuntu/gcc:

```python
Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from xarray.coding.times import _decode_datetime_with_pandas
>>> import numpy as np
>>> _decode_datetime_with_pandas(np.array([20000, float('nan')]), "days since 1950-01-01", "proleptic_gregorian")
array(['2004-10-04T00:00:00.000000000', '1950-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
>>> np.array(np.nan).astype(np.int64)
array(0)
```

Here is where the cast is happening on the internal xarray implementation, CFtime has similar casts in its implementation. https://github.com/pydata/xarray/blob/8417f495e6b81a60833f86a978e5a8080a619aa0/xarray/coding/times.py#L237-L239

{
    "total_count": 4,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 2,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [Bug]: reading NaT/NaN on M1 ARM chip 1114351614
921889702 https://github.com/pydata/xarray/pull/5794#issuecomment-921889702 https://api.github.com/repos/pydata/xarray/issues/5794 IC_kwDOAMm_X8428uum DocOtak 868027 2021-09-17T15:32:04Z 2021-09-17T15:32:04Z CONTRIBUTOR

Python's import machinery has a lot of caching going on. In most cases, an additional import of a module that has been imported previously is about as expensive as a dict lookup.
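The cache in question is `sys.modules`, a plain dict keyed by module name. A small sketch of how a repeated import resolves to the cached object is below; `json` is only a stand-in module chosen for illustration, and the timing is a rough measurement that will vary by machine.

```python
import importlib
import sys
import timeit

# The first import does the real work: find the module, execute it, cache it.
import json

# The cache is an ordinary dict keyed by the module's name.
cached = sys.modules["json"]

# A repeated import hands back the cached module object instead of re-executing it.
again = importlib.import_module("json")
same_object = again is json and cached is json

# Rough cost of 100,000 repeated import statements, in seconds.
repeat_cost = timeit.timeit("import json", number=100_000)
```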

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Single matplotlib import 996352280
772729659 https://github.com/pydata/xarray/pull/4835#issuecomment-772729659 https://api.github.com/repos/pydata/xarray/issues/4835 MDEyOklzc3VlQ29tbWVudDc3MjcyOTY1OQ== DocOtak 868027 2021-02-03T18:38:34Z 2021-02-03T18:38:34Z CONTRIBUTOR

Also, the build step has a -W flag, which turns warnings into errors and causes a nonzero exit status. This is probably because the Read the Docs config file for xarray has fail_on_warning set to true.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  📚 New theme & rearrangement of the docs 790677360
736768644 https://github.com/pydata/xarray/issues/4634#issuecomment-736768644 https://api.github.com/repos/pydata/xarray/issues/4634 MDEyOklzc3VlQ29tbWVudDczNjc2ODY0NA== DocOtak 868027 2020-12-01T19:29:36Z 2020-12-01T19:29:36Z CONTRIBUTOR

@dcherian Often I find it a little easier to understand the Conformance Document, bullet point two says:

The type of the units attribute is a string that must be recognizable by the udunits package. Exceptions are the units level, layer, and sigma_level.

This shouldn't prevent xarray from doing something useful with non conforming files if it can.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read of netCDF file fails with units attribute that is not of type string 754413100
736609280 https://github.com/pydata/xarray/issues/4634#issuecomment-736609280 https://api.github.com/repos/pydata/xarray/issues/4634 MDEyOklzc3VlQ29tbWVudDczNjYwOTI4MA== DocOtak 868027 2020-12-01T15:02:22Z 2020-12-01T15:02:22Z CONTRIBUTOR

Keep in mind that the NetCDF user guide "strongly recommends" that units be a character string. The CF Conventions requires the units to be a character string. I think in xarray you can set decode_cf=False or decode_times=False in the various open_* methods to turn off this interpretation if you need things to work right now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read of netCDF file fails with units attribute that is not of type string 754413100
670991386 https://github.com/pydata/xarray/pull/2844#issuecomment-670991386 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDk5MTM4Ng== DocOtak 868027 2020-08-09T01:11:09Z 2020-08-09T01:11:09Z CONTRIBUTOR

Yes, my view is that things in ancillary_variables should stay in the attrs of their variable (DataArray) and not be moved to coords. Currently this PR will remove the ancillary_variables from the attrs of the variables in a file which have it. This appears to break the CF-defined connection between associated variables (like uncertainty and QC). While the information isn't lost, I would need to look in encoding to get it. It looks like the first reply in #4215 also didn't like putting ancillary_variables in the coords.

What would be really awesome is some sort of variable proxy, so that I could replace the string names with actual references/pointers to the correct DataArray in the Dataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
670808763 https://github.com/pydata/xarray/pull/2844#issuecomment-670808763 https://api.github.com/repos/pydata/xarray/issues/2844 MDEyOklzc3VlQ29tbWVudDY3MDgwODc2Mw== DocOtak 868027 2020-08-08T02:12:08Z 2020-08-08T02:12:08Z CONTRIBUTOR

I decided to try out this PR on some of the data files we are working with at my data office. In our datasets we have per-variable quality flag information and per-variable uncertainty information. The CF way of tying all these together is via the ancillary_variables attribute. This PR pulls all these out into the Dataset coordinates. Since in the xarray data model (right now) the coordinates apply to an entire dataset, this feels inappropriate and maybe even breaking. The ancillary_variables attribute is not used in CF grid mapping or bounds as far as I can tell.

Here is an example using this PR (note that all the varN type names will be replaced with better variable names before we publish these):

```python
In [1]: import xarray as xr

In [2]: ds = xr.open_dataset("examples/converted/06AQ19840719.nc")

In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:    (N_LEVELS: 24, N_PROF: 38)
Coordinates:
    var1_qc    (N_PROF, N_LEVELS) float32 ...
    var4_qc    (N_PROF, N_LEVELS) float32 ...
    var5_qc    (N_PROF, N_LEVELS) float32 ...
    var6_qc    (N_PROF, N_LEVELS) float32 ...
    var7_qc    (N_PROF, N_LEVELS) float32 ...
    var8_qc    (N_PROF, N_LEVELS) float32 ...
    var9_qc    (N_PROF, N_LEVELS) float32 ...
    var10_qc   (N_PROF, N_LEVELS) float32 ...
    var11_qc   (N_PROF, N_LEVELS) float32 ...
    var12_qc   (N_PROF, N_LEVELS) float32 ...
    var13_qc   (N_PROF, N_LEVELS) float32 ...
    var14_qc   (N_PROF, N_LEVELS) float32 ...
    var15_qc   (N_PROF, N_LEVELS) float32 ...
    pressure   (N_PROF, N_LEVELS) float64 ...
    latitude   (N_PROF) float64 ...
    longitude  (N_PROF) float64 ...
    time       (N_PROF) datetime64[ns] ...
    expocode   (N_PROF) object ...
    station    (N_PROF) object ...
    cast       (N_PROF) int8 ...
    sample     (N_PROF, N_LEVELS) object ...
Dimensions without coordinates: N_LEVELS, N_PROF
Data variables:
    var0       (N_PROF) object ...
    var1       (N_PROF, N_LEVELS) object ...
    var2       (N_PROF) float32 ...
    var3       (N_PROF, N_LEVELS) float32 ...
    var4       (N_PROF, N_LEVELS) float32 ...
    var5       (N_PROF, N_LEVELS) float32 ...
    var6       (N_PROF, N_LEVELS) float32 ...
    var7       (N_PROF, N_LEVELS) float32 ...
    var8       (N_PROF, N_LEVELS) float32 ...
    var9       (N_PROF, N_LEVELS) float32 ...
    var10      (N_PROF, N_LEVELS) float32 ...
    var11      (N_PROF, N_LEVELS) float32 ...
    var12      (N_PROF, N_LEVELS) float32 ...
    var13      (N_PROF, N_LEVELS) float32 ...
    var14      (N_PROF, N_LEVELS) float32 ...
    var15      (N_PROF, N_LEVELS) float32 ...
Attributes:
    Conventions:  CF-1.8 CCHDO-0.1.dev157+g52933e0.d20200707
```

This looks especially confusing when you ask for one specific variable:

```python
In [15]: ds.var6
Out[15]:
<xarray.DataArray 'var6' (N_PROF: 38, N_LEVELS: 24)>
array([[33.3965, 33.5742, 34.8769, ..., 34.9858, 34.9852,     nan],
       [33.1129, 34.0742, 34.6595, ...,     nan,     nan,     nan],
       [32.5328, 33.2687, 34.2262, ...,     nan,     nan,     nan],
       ...,
       [35.0686, 35.09  , 35.1415, ...,     nan,     nan,     nan],
       [35.0303, 35.0295, 35.0715, ...,     nan,     nan,     nan],
       [35.0682, 35.0756, 35.0622, ...,     nan,     nan,     nan]],
      dtype=float32)
Coordinates:
    var1_qc    (N_PROF, N_LEVELS) float32 ...
    var4_qc    (N_PROF, N_LEVELS) float32 ...
    var5_qc    (N_PROF, N_LEVELS) float32 ...
    var6_qc    (N_PROF, N_LEVELS) float32 ...
    var7_qc    (N_PROF, N_LEVELS) float32 ...
    var8_qc    (N_PROF, N_LEVELS) float32 ...
    var9_qc    (N_PROF, N_LEVELS) float32 ...
    var10_qc   (N_PROF, N_LEVELS) float32 ...
    var11_qc   (N_PROF, N_LEVELS) float32 ...
    var12_qc   (N_PROF, N_LEVELS) float32 ...
    var13_qc   (N_PROF, N_LEVELS) float32 ...
    var14_qc   (N_PROF, N_LEVELS) float32 ...
    var15_qc   (N_PROF, N_LEVELS) float32 ...
    pressure   (N_PROF, N_LEVELS) float64 ...
    latitude   (N_PROF) float64 ...
    longitude  (N_PROF) float64 ...
    time       (N_PROF) datetime64[ns] ...
    expocode   (N_PROF) object ...
    station    (N_PROF) object ...
    cast       (N_PROF) int8 ...
    sample     (N_PROF, N_LEVELS) object ...
Dimensions without coordinates: N_PROF, N_LEVELS
Attributes:
    whp_name:         CTDSAL
    whp_unit:         PSS-78
    standard_name:    sea_water_practical_salinity
    units:            1
    reference_scale:  PSS-78
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read grid mapping and bounds as coords 424265093
625481974 https://github.com/pydata/xarray/issues/4045#issuecomment-625481974 https://api.github.com/repos/pydata/xarray/issues/4045 MDEyOklzc3VlQ29tbWVudDYyNTQ4MTk3NA== DocOtak 868027 2020-05-07T20:32:22Z 2020-05-07T20:32:22Z CONTRIBUTOR

This has something to do with the time values at some point being a float:

```python
>>> import numpy as np
>>> np.datetime64("2017-02-22T16:24:10.586000000").astype("float64").astype(np.dtype('<M8[ns]'))
numpy.datetime64('2017-02-22T16:24:10.585999872')
```

It looks like this is happening somewhere in the cftime library.
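The size of the error is consistent with the 53-bit significand of float64. A nanosecond timestamp from 2017 is larger than 2**60, and at that magnitude adjacent float64 values are hundreds of nanoseconds apart. The sketch below only measures that gap with numpy; it does not show where in any library the float conversion takes place.

```python
import numpy as np

t = np.datetime64("2017-02-22T16:24:10.586000000")

# Nanoseconds since 1970-01-01 as an exact 64-bit integer.
ns = t.astype(np.int64)

# Nearest value that float64 can represent.
as_float = np.float64(ns)

# Distance to the next representable float64 at this magnitude, in nanoseconds.
gap = np.spacing(as_float)

# Error introduced by the trip through float64, in nanoseconds.
error_ns = int(ns) - int(as_float)
```

Any timestamp that is not a multiple of `gap` nanoseconds is rounded to a neighbour, so the error can be as large as half the gap.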

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Millisecond precision is lost on datetime64 during IO roundtrip 614275938
624408302 https://github.com/pydata/xarray/issues/4024#issuecomment-624408302 https://api.github.com/repos/pydata/xarray/issues/4024 MDEyOklzc3VlQ29tbWVudDYyNDQwODMwMg== DocOtak 868027 2020-05-06T02:19:09Z 2020-05-06T02:19:09Z CONTRIBUTOR

VS Code will tell you if it is in "dark" "light" or "high contrast" modes https://code.visualstudio.com/api/extension-guides/webview#theming-webview-content

Looks like there is an upstream issue which might prevent getting the actual theme colors in some situations: https://github.com/microsoft/vscode-python/issues/9597

For my own stuff in VS Code, I usually disable the HTML repr in those notebooks.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  small contrast of html view in VScode darkmode 611643130
620746876 https://github.com/pydata/xarray/pull/4012#issuecomment-620746876 https://api.github.com/repos/pydata/xarray/issues/4012 MDEyOklzc3VlQ29tbWVudDYyMDc0Njg3Ng== DocOtak 868027 2020-04-28T17:26:56Z 2020-04-28T17:26:56Z CONTRIBUTOR

I really like what it did to some of the long xr.Dataset() and xr.DataArray() calls. Very readable!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Apply blackdoc to the documentation 607814501
619320263 https://github.com/pydata/xarray/issues/4002#issuecomment-619320263 https://api.github.com/repos/pydata/xarray/issues/4002 MDEyOklzc3VlQ29tbWVudDYxOTMyMDI2Mw== DocOtak 868027 2020-04-25T04:48:12Z 2020-04-25T04:48:12Z CONTRIBUTOR

My concern was due to python not evaluating asserts if "optimization" is requested.
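The effect of the `-O` flag can be observed by running the same one-line program in two child interpreters. This is a small sketch using only the standard library; the snippet being run is made up for illustration.

```python
import subprocess
import sys

# Two statements: a failing assert, then a print that should be unreachable.
snippet = "assert False, 'check failed'; print('reached')"

# Normal run: the assert fires and the interpreter exits with an error.
normal = subprocess.run(
    [sys.executable, "-c", snippet], capture_output=True, text=True
)

# With -O the assert statement is compiled away, so execution continues.
optimized = subprocess.run(
    [sys.executable, "-O", "-c", snippet], capture_output=True, text=True
)
```

Validation that must always run is therefore usually written as an explicit `if` check followed by `raise`, which `-O` does not remove.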

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove dangerous default argument 606549469
619281828 https://github.com/pydata/xarray/issues/4002#issuecomment-619281828 https://api.github.com/repos/pydata/xarray/issues/4002 MDEyOklzc3VlQ29tbWVudDYxOTI4MTgyOA== DocOtak 868027 2020-04-24T23:40:54Z 2020-04-24T23:40:54Z CONTRIBUTOR

Slightly related, I've noticed a bunch of assert statements outside the testing paths e.g. the __init__ for xr.DataArray has 3 of them. Would that be something to fix up as well?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove dangerous default argument 606549469
594783943 https://github.com/pydata/xarray/issues/3370#issuecomment-594783943 https://api.github.com/repos/pydata/xarray/issues/3370 MDEyOklzc3VlQ29tbWVudDU5NDc4Mzk0Mw== DocOtak 868027 2020-03-04T19:38:01Z 2020-03-04T19:38:01Z CONTRIBUTOR

Every time I see activity on this... I feel like it's all my fault. Feel free to undo whatever is needed.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hundreds of Sphinx errors 502130982
590920738 https://github.com/pydata/xarray/issues/3796#issuecomment-590920738 https://api.github.com/repos/pydata/xarray/issues/3796 MDEyOklzc3VlQ29tbWVudDU5MDkyMDczOA== DocOtak 868027 2020-02-25T15:22:29Z 2020-02-25T15:22:29Z CONTRIBUTOR

The docs seem to build ok on Azure Pipelines, is it possible to get the built docs from that and publish somewhere? I do know this is possible with Travis, but haven't actually done it myself since my docs don't (yet?) have a memory problem.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  RTD failing yet again 570190199
555715128 https://github.com/pydata/xarray/issues/3178#issuecomment-555715128 https://api.github.com/repos/pydata/xarray/issues/3178 MDEyOklzc3VlQ29tbWVudDU1NTcxNTEyOA== DocOtak 868027 2019-11-19T21:07:50Z 2019-11-19T21:07:50Z CONTRIBUTOR

Any recollection as to if these ever worked as expected? Looks like between landing this change and doing the 0.14 release, the sphinx version bumped from 2.1.2 to 2.2.0 which included some changes to autodoc... This PR might be of interest https://github.com/sphinx-doc/sphinx/pull/6592 but it is not immediately obvious to me how/if this could have broken things.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  type annotations make docs confusing 476222321
542594876 https://github.com/pydata/xarray/issues/3407#issuecomment-542594876 https://api.github.com/repos/pydata/xarray/issues/3407 MDEyOklzc3VlQ29tbWVudDU0MjU5NDg3Ng== DocOtak 868027 2019-10-16T08:44:50Z 2019-10-16T08:44:50Z CONTRIBUTOR

Hi @zxdawn

Does this modified version of your code do what you want?:

```python
import numpy as np
import xarray as xr

tstr='2019-07-25_00:00:00'
# The string is 19 characters long, so the dtype width must be 19.
Times = xr.DataArray(np.array([tstr], dtype = np.dtype(('S', 19))), dims = ['Time'])
ds = xr.Dataset({'Times':Times})
ds.to_netcdf(
    'test.nc',
    format='NETCDF4',
    encoding={
        'Times': {
            'zlib':True,
            'complevel':5,
            'char_dim_name':'DateStrLen'
        }
    },
    unlimited_dims={'Time':True}
)
```

Output of ncdump:

```
netcdf test {
dimensions:
	Time = UNLIMITED ; // (1 currently)
	DateStrLen = 19 ;
variables:
	char Times(Time, DateStrLen) ;
data:

 Times = "2019-07-25_00:00:00" ;
}
```

Some explanation of what is going on: Strings in numpy aren't the most friendly thing to work with, and the data types can be a little confusing. In your code, the "S1" data type is saying "this array has null terminated strings of length 1". That 1 in the "S1" is the string length. This resulted in you having an array of one character strings that was 19 elements long: array([[b'2', b'0', b'1', b'9', b'-', b'0', b'7', b'-', b'2', b'5', b'_', b'0', b'0', b':', b'0', b'0', b':', b'0', b'0']], dtype='|S1') vs what I think you want: array([b'2019-07-25_00:00:00'], dtype='|S19')

Since you know that your string length is going to be 19, you should tell numpy about this instead of xarray by either specifying the data type as "S19" or using the data type constructor (which I prefer): np.dtype(("S", 19))
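The difference between the two dtypes can be checked with numpy alone. This is a minimal sketch; the variable names are chosen for illustration and nothing here touches xarray or netCDF.

```python
import numpy as np

tstr = "2019-07-25_00:00:00"

# One fixed-width byte string holding the whole timestamp (dtype S19).
whole = np.array([tstr], dtype=np.dtype(("S", len(tstr))))

# The same bytes viewed as individual one-character strings, shape (1, 19).
chars = whole.view("S1").reshape(whole.shape + (-1,))

# Asking for "S1" directly keeps only the first byte of each element.
truncated = np.array([tstr], dtype="S1")
```

`whole` is the layout the comment recommends, `chars` is the character-array layout that ends up with an extra dimension, and `truncated` shows why passing "S1" for a longer string loses data.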

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Save 'S1' array without the char_dim_name dimension 507658070
525332330 https://github.com/pydata/xarray/issues/3246#issuecomment-525332330 https://api.github.com/repos/pydata/xarray/issues/3246 MDEyOklzc3VlQ29tbWVudDUyNTMzMjMzMA== DocOtak 868027 2019-08-27T14:39:53Z 2019-08-27T14:39:53Z CONTRIBUTOR

Hi @gr4fitt3

Do you know if the data are already gridded somehow? If yes, some simple reshaping might be all you need. However, I suspect they are actually swaths traced out by the satellite, in which case perhaps the pyresample library might help? I've never used it myself.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Transform variables into coordinates and associate them with another variable 484243348
525325332 https://github.com/pydata/xarray/issues/3227#issuecomment-525325332 https://api.github.com/repos/pydata/xarray/issues/3227 MDEyOklzc3VlQ29tbWVudDUyNTMyNTMzMg== DocOtak 868027 2019-08-27T14:24:35Z 2019-08-27T14:24:35Z CONTRIBUTOR

Hi @gwgundersen some clarification on those "extra snippets", github is not aware of the ipython directive so it prints those out like code snippets. In the actual built docs, these don't appear (the :suppress: in that block does this).

I personally feel that the code that makes these temporary files should be responsible for cleaning it up, especially since it already tries, and they aren't build artifacts needed in other steps. I'd probably reach for the tempfile.TemporaryDirectory in the standard library and bracket the dask docs in a create, cd in, cd out, cleanup type flow. There is already a suppressed setup ipython block at the top of the dask docs too.
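The create, cd in, cd out, cleanup flow can be sketched with the standard library as follows. This is only an illustration of the pattern, not the docs configuration itself; the file written inside the block is a placeholder standing in for whatever the example code produces.

```python
import os
import tempfile
from pathlib import Path

original_dir = os.getcwd()

with tempfile.TemporaryDirectory() as tmp:  # create
    os.chdir(tmp)  # cd in
    try:
        # Placeholder for example code that writes files during a docs build.
        Path("manipulated-example-data.nc").write_bytes(b"placeholder")
        created = Path("manipulated-example-data.nc").exists()
    finally:
        os.chdir(original_dir)  # cd out before the directory is removed

# Leaving the with-block deletes the directory and everything in it.
cleaned_up = not Path(tmp).exists()
```

Changing back to the original directory inside `finally` matters because some platforms refuse to delete a directory that is still the current working directory.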

@max-sixty Any opinions on which option we should go for?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Building the docs creates temporary files 482023929
522589729 https://github.com/pydata/xarray/issues/3227#issuecomment-522589729 https://api.github.com/repos/pydata/xarray/issues/3227 MDEyOklzc3VlQ29tbWVudDUyMjU4OTcyOQ== DocOtak 868027 2019-08-19T14:03:24Z 2019-08-19T14:03:24Z CONTRIBUTOR

The files and directories that were not cleaned up by the make clean command are all artifacts of the code examples which run in the docs themselves. For example, the manipulated-example-data.nc is created in this section.

At least one of these files is cleaned up at the end, see the ipython block.

I'd probably look into something like a temporary directory rather than trying to track down all the "example artifacts" created during a run. I'm not sure what sort of configuration the IPython blocks have, but there are also some tempdir utilities in IPython.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Building the docs creates temporary files 482023929
521349841 https://github.com/pydata/xarray/issues/3215#issuecomment-521349841 https://api.github.com/repos/pydata/xarray/issues/3215 MDEyOklzc3VlQ29tbWVudDUyMTM0OTg0MQ== DocOtak 868027 2019-08-14T17:51:06Z 2019-08-14T17:51:06Z CONTRIBUTOR

I think this is being thrown by dask, here is an even more minimal example:

```python
>>> import dask as da
>>> da.array.from_array([]).tolist()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Array' object has no attribute 'tolist'
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  decode_cf called on mfdataset throws error: 'Array' object has no attribute 'tolist' 480512400
518716822 https://github.com/pydata/xarray/pull/3187#issuecomment-518716822 https://api.github.com/repos/pydata/xarray/issues/3187 MDEyOklzc3VlQ29tbWVudDUxODcxNjgyMg== DocOtak 868027 2019-08-06T15:20:06Z 2019-08-06T15:20:06Z CONTRIBUTOR

Sure, that seems to work as well, want a second PR or just update this one (with some forcing)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  reduce the size of example dataset in dask docs 477427854
518677703 https://github.com/pydata/xarray/issues/3182#issuecomment-518677703 https://api.github.com/repos/pydata/xarray/issues/3182 MDEyOklzc3VlQ29tbWVudDUxODY3NzcwMw== DocOtak 868027 2019-08-06T13:49:47Z 2019-08-06T13:49:47Z CONTRIBUTOR

Seems the docs are still failing to build, except this time it is being killed due to too much resource consumption.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  RTD build failing 476494705
518427365 https://github.com/pydata/xarray/pull/3186#issuecomment-518427365 https://api.github.com/repos/pydata/xarray/issues/3186 MDEyOklzc3VlQ29tbWVudDUxODQyNzM2NQ== DocOtak 868027 2019-08-05T22:37:32Z 2019-08-05T22:37:32Z CONTRIBUTOR

Examining the output of conda list in the CI build shows most of the packages are coming from conda-forge now, including rasterio. This should mean the docs will build successfully on RTD.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  bump rasterio to 1.0.24 in doc building environment 477084478
518426134 https://github.com/pydata/xarray/issues/3182#issuecomment-518426134 https://api.github.com/repos/pydata/xarray/issues/3182 MDEyOklzc3VlQ29tbWVudDUxODQyNjEzNA== DocOtak 868027 2019-08-05T22:32:12Z 2019-08-05T22:32:12Z CONTRIBUTOR

@max-sixty I made a PR which bumps rasterio.

Something else to consider is enabling channel_priority strict in the conda environment. When I had enabled that in local testing, the conda solver was unable to create the requested environment. It seemed the requested rasterio version was no longer on conda-forge (though maybe under the cf201901 label?). Though if I recall, there was also a mismatch between the requested pandas and python versions when strict was enabled.

It seems the trade-off is where you want the failure to occur: either when conda creates the environment, or later when some package stops working.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  RTD build failing 476494705
518381266 https://github.com/pydata/xarray/issues/3182#issuecomment-518381266 https://api.github.com/repos/pydata/xarray/issues/3182 MDEyOklzc3VlQ29tbWVudDUxODM4MTI2Ng== DocOtak 868027 2019-08-05T20:12:37Z 2019-08-05T20:12:37Z CONTRIBUTOR

So it looks more like a conda channel mixing problem to me now. Perhaps just bumping rasterio to 1.0.24?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  RTD build failing 476494705
518368420 https://github.com/pydata/xarray/issues/3182#issuecomment-518368420 https://api.github.com/repos/pydata/xarray/issues/3182 MDEyOklzc3VlQ29tbWVudDUxODM2ODQyMA== DocOtak 868027 2019-08-05T19:32:10Z 2019-08-05T19:32:10Z CONTRIBUTOR

Has anyone with access tried just wiping the env? https://docs.readthedocs.io/en/stable/guides/wipe-environment.html

Specifically, when I was testing locally, rasterio was not importing, even though it had run successfully at some point in the past. I was able to fix this by removing the auto_gallery dir from the docs dir.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  RTD build failing 476494705
518062884 https://github.com/pydata/xarray/issues/3182#issuecomment-518062884 https://api.github.com/repos/pydata/xarray/issues/3182 MDEyOklzc3VlQ29tbWVudDUxODA2Mjg4NA== DocOtak 868027 2019-08-05T02:25:02Z 2019-08-05T02:25:02Z CONTRIBUTOR

More information here: https://sphinx-gallery.github.io/configuration.html#don-t-fail-the-build-on-exit

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  RTD build failing 476494705
517840342 https://github.com/pydata/xarray/pull/3180#issuecomment-517840342 https://api.github.com/repos/pydata/xarray/issues/3180 MDEyOklzc3VlQ29tbWVudDUxNzg0MDM0Mg== DocOtak 868027 2019-08-02T20:50:49Z 2019-08-02T20:50:49Z CONTRIBUTOR

I was diffing the directory outputs when testing locally and nothing really breaking stood out to me... I think this is safe enough to merge and then assess the results. IMO the results were quite pretty and definitely addressed #3056 in most places.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  enable sphinx.ext.napoleon 476323960
517817952 https://github.com/pydata/xarray/issues/3178#issuecomment-517817952 https://api.github.com/repos/pydata/xarray/issues/3178 MDEyOklzc3VlQ29tbWVudDUxNzgxNzk1Mg== DocOtak 868027 2019-08-02T19:27:45Z 2019-08-02T19:27:45Z CONTRIBUTOR

See #3180 for the napoleon enabling PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  type annotations make docs confusing 476222321
517815307 https://github.com/pydata/xarray/issues/3178#issuecomment-517815307 https://api.github.com/repos/pydata/xarray/issues/3178 MDEyOklzc3VlQ29tbWVudDUxNzgxNTMwNw== DocOtak 868027 2019-08-02T19:17:59Z 2019-08-02T19:17:59Z CONTRIBUTOR

So I made a PR for just removing the type annotations; it turns out this is built in to autodoc. Enabling napoleon seems to be less "clean". While it doesn't actually conflict with numpydoc, it does appear to "compete" with it. It only really seemed to affect the autowrapped ufunc documentation. I'm going to do a separate "enable napoleon" PR.
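
For reference, the built-in autodoc switch is a one-line setting in `conf.py`. This is only a sketch: the extension list below is an assumption for illustration, not a copy of xarray's actual configuration.

```python
# docs/conf.py (sketch; the extension list is assumed, not xarray's real one)
extensions = [
    "sphinx.ext.autodoc",
    "numpydoc",
]

# Built-in autodoc option: "none" leaves type hints out of rendered signatures.
autodoc_typehints = "none"
```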

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  type annotations make docs confusing 476222321
517786566 https://github.com/pydata/xarray/issues/3178#issuecomment-517786566 https://api.github.com/repos/pydata/xarray/issues/3178 MDEyOklzc3VlQ29tbWVudDUxNzc4NjU2Ng== DocOtak 868027 2019-08-02T17:38:49Z 2019-08-02T17:38:49Z CONTRIBUTOR

Suspicions confirmed. I removed the type parts in the docstrings. The attached is the result which I think is way less readable:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  type annotations make docs confusing 476222321
517781349 https://github.com/pydata/xarray/issues/3178#issuecomment-517781349 https://api.github.com/repos/pydata/xarray/issues/3178 MDEyOklzc3VlQ29tbWVudDUxNzc4MTM0OQ== DocOtak 868027 2019-08-02T17:22:10Z 2019-08-02T17:22:10Z CONTRIBUTOR

So I think why it isn't putting the types anywhere in the docs is because they already exist (at least for this Dataset __init__ that we are looking at).

The relevant part of the code in the extension appears to be this https://github.com/agronholm/sphinx-autodoc-typehints/blob/master/sphinx_autodoc_typehints.py#L333:L338

It's looking for :param name: and I think things with types are already :param type name: with napoleon enabled, so it doesn't find anything to replace. Without napoleon enabled, the :param name: fields are not present since it is "raw" numpy doc style.
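
To illustrate the mismatch I suspect, here is a simplified stand-in for that lookup. It is not the extension's actual code, and the `find_param_field` helper and the sample field lines are invented for the example.

```python
import re


def find_param_field(lines, name):
    """Return the index of the ':param <name>:' field, or None if absent."""
    pattern = re.compile(r":param {}:".format(re.escape(name)))
    for index, line in enumerate(lines):
        if pattern.search(line):
            return index
    return None


# Field without an inline type: the lookup finds it.
plain = [":param data_vars: variables to store"]
# Field with an inline type: the literal ':param data_vars:' never appears.
typed = [":param dict-like data_vars: variables to store"]

plain_hit = find_param_field(plain, "data_vars")  # 0
typed_hit = find_param_field(typed, "data_vars")  # None
```

If the fields really do carry the type inline, a search for the untyped form has nothing to replace, which would match what I saw.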

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  type annotations make docs confusing 476222321
517772870 https://github.com/pydata/xarray/issues/3178#issuecomment-517772870 https://api.github.com/repos/pydata/xarray/issues/3178 MDEyOklzc3VlQ29tbWVudDUxNzc3Mjg3MA== DocOtak 868027 2019-08-02T16:55:22Z 2019-08-02T16:55:22Z CONTRIBUTOR

So the plugin seems to "just work" in that it removes these data type annotations, but it doesn't seem to put them anywhere. I can probably put the docs I built somewhere if you all want to look at them. Here is a screenshot of the "Dataset" class: the first one is just the extension, the second screenshot also has the napoleon extension enabled. The main difference is how the "parameters" appear.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  type annotations make docs confusing 476222321
517742116 https://github.com/pydata/xarray/issues/3178#issuecomment-517742116 https://api.github.com/repos/pydata/xarray/issues/3178 MDEyOklzc3VlQ29tbWVudDUxNzc0MjExNg== DocOtak 868027 2019-08-02T15:23:46Z 2019-08-02T15:23:46Z CONTRIBUTOR

@dcherian sure, I'll try it right now with the xarray docs

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  type annotations make docs confusing 476222321
517734518 https://github.com/pydata/xarray/issues/3178#issuecomment-517734518 https://api.github.com/repos/pydata/xarray/issues/3178 MDEyOklzc3VlQ29tbWVudDUxNzczNDUxOA== DocOtak 868027 2019-08-02T15:03:07Z 2019-08-02T15:03:07Z CONTRIBUTOR

Perhaps the sphinx-autodoc-typehints extension?

The docs suggest it will remove the types from the method signatures and put them in the :param: parts. I haven't used or tested it myself.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  type annotations make docs confusing 476222321
497066189 https://github.com/pydata/xarray/issues/2995#issuecomment-497066189 https://api.github.com/repos/pydata/xarray/issues/2995 MDEyOklzc3VlQ29tbWVudDQ5NzA2NjE4OQ== DocOtak 868027 2019-05-29T18:56:17Z 2019-05-29T18:56:17Z CONTRIBUTOR

Thanks @rabernat, I had forgotten about the other netcdf storage engines... do you know if h5netcdf is stable enough that I should use it in "production" outside of xarray for my netcdf4 reading/writing needs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remote writing NETCDF4 files to Amazon S3 449706080
497026828 https://github.com/pydata/xarray/issues/2995#issuecomment-497026828 https://api.github.com/repos/pydata/xarray/issues/2995 MDEyOklzc3VlQ29tbWVudDQ5NzAyNjgyOA== DocOtak 868027 2019-05-29T17:11:10Z 2019-05-29T17:12:51Z CONTRIBUTOR

Hi @Non-Descript-Individual

I've found that the netcdf4-python library really wants to have direct access to a disk/filesystem to work, it also really wants to do its own file access management. I've always attributed this to the python library being a wrapper for the netcdf C library.

My guess would be that the easiest way to do what you want is to separate the writing of the netcdf file step in xarray from the putting the file into S3. Something like this:

```python
x.to_netcdf('temp_file.nc')
s3.upload_file('temp_file.nc', 'bucketname', 'real_name_for_temp_file.nc')
```

The netcdf4-python library does seem to provide an interface for the "diskless" flags. In this case, from the examples it looks to give you a bunch of bytes in a memoryview object on calling close(). I'm not sure this is accessible from xarray though.

Alternatively, @rabernat is an advocate of using zarr when putting netcdf compatible data into cloud storage, the zarr docs provide an example using s3fs

Quick edit: Here is the to_zarr docs in xarray

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remote writing NETCDF4 files to Amazon S3 449706080
486024154 https://github.com/pydata/xarray/issues/2888#issuecomment-486024154 https://api.github.com/repos/pydata/xarray/issues/2888 MDEyOklzc3VlQ29tbWVudDQ4NjAyNDE1NA== DocOtak 868027 2019-04-24T00:41:49Z 2019-04-24T00:41:49Z CONTRIBUTOR

Some of my workflows involve the manual creation and destruction of virtualenvs. On occasion, I've found myself wanting a pip install xarray[complete] much in the same way dask will do. The difference between dask and xarray, however, is that the "complete" submodules are part of dask and not optional external third party dependencies.

Alternatively, it might be nice to be able to query xarray for what its current serialization capabilities are.
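
A rough idea of what such a query could look like, using only the standard library. This is not an existing xarray API: the function name `available_backends` and the default candidate list are assumptions for the sketch.

```python
import importlib.util


def available_backends(candidates=("netCDF4", "h5netcdf", "scipy", "zarr")):
    """Map each candidate module name to whether it can be imported here."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in candidates
    }


# Which of the optional IO libraries does this environment have?
report = available_backends()
```

Because `find_spec` only locates a module without importing it, the check stays cheap even when the optional libraries are heavy.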

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Optional extras to manage dependencies 432058005
475914120 https://github.com/pydata/xarray/issues/2848#issuecomment-475914120 https://api.github.com/repos/pydata/xarray/issues/2848 MDEyOklzc3VlQ29tbWVudDQ3NTkxNDEyMA== DocOtak 868027 2019-03-23T23:37:41Z 2019-03-23T23:37:41Z CONTRIBUTOR

When I was looking into this real quick after it was posted to the xarray mailing list, one of the things I attempted to do was use xr.decode_cf() on a DataArray object, which seems unsupported. I also found myself wanting some of the configuration options that the pandas read_csv() method has for decoding dates, particularly the ability to say which labels I want to decode and even method hooks for implementing my own parsing function if necessary. While that API in its entirety would be way too complicated for xarray I think, it might be nice to emulate it a little bit.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  When decode_times fails, warn rather than failing 424545013
416423848 https://github.com/pydata/xarray/pull/2322#issuecomment-416423848 https://api.github.com/repos/pydata/xarray/issues/2322 MDEyOklzc3VlQ29tbWVudDQxNjQyMzg0OA== DocOtak 868027 2018-08-28T01:48:00Z 2018-08-28T01:48:00Z CONTRIBUTOR

Hey @shoyer no worries, we all get busy with other things. Seems I messed up the docs slightly, a fix is in #2386

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  BUG: modify behavior of Dataset.filter_by_attrs to match netCDF4.Data… 345322908
410381054 https://github.com/pydata/xarray/pull/2322#issuecomment-410381054 https://api.github.com/repos/pydata/xarray/issues/2322 MDEyOklzc3VlQ29tbWVudDQxMDM4MTA1NA== DocOtak 868027 2018-08-03T21:28:14Z 2018-08-03T21:28:14Z CONTRIBUTOR

@shoyer Hopefully I've done this all correctly, please have a look once all the tests pass.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  BUG: modify behavior of Dataset.filter_by_attrs to match netCDF4.Data… 345322908
408580878 https://github.com/pydata/xarray/pull/2322#issuecomment-408580878 https://api.github.com/repos/pydata/xarray/issues/2322 MDEyOklzc3VlQ29tbWVudDQwODU4MDg3OA== DocOtak 868027 2018-07-28T04:06:03Z 2018-07-28T04:06:03Z CONTRIBUTOR

Wow this is sloppy.... I’ll get the test added and the code cleaned up.

Any thoughts on the modified doc string?

Should I add this change as “breaking” or “bug fix” in the what’s new doc?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  BUG: modify behavior of Dataset.filter_by_attrs to match netCDF4.Data… 345322908
408263759 https://github.com/pydata/xarray/issues/2315#issuecomment-408263759 https://api.github.com/repos/pydata/xarray/issues/2315 MDEyOklzc3VlQ29tbWVudDQwODI2Mzc1OQ== DocOtak 868027 2018-07-26T23:17:26Z 2018-07-26T23:17:26Z CONTRIBUTOR

I can work on a PR tomorrow. Does the benefit of having the same behavior as the netCDF4 library warrant a potentially breaking change for existing code which relies on the current behavior of filter_by_attrs()? This might need adding a new method with the same behavior as netCDF4 and keeping the existing one as is (with appropriate documentation updates).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Behavior of filter_by_attrs() does not match netCDF4.Dataset.get_variables_by_attributes 344631360
408260961 https://github.com/pydata/xarray/issues/2315#issuecomment-408260961 https://api.github.com/repos/pydata/xarray/issues/2315 MDEyOklzc3VlQ29tbWVudDQwODI2MDk2MQ== DocOtak 868027 2018-07-26T23:01:44Z 2018-07-26T23:01:44Z CONTRIBUTOR

I'm fairly certain that the netCDF4.Dataset.get_variables_by_attributes behaves as a logical AND.

Here is the current implementation body from https://github.com/Unidata/netcdf4-python/blob/master/netCDF4/_netCDF4.pyx#L2868

```python
vs = []

has_value_flag = False
# this is a hack to make inheritance work in MFDataset
# (which stores variables in _vars)
_vars = self.variables
if _vars is None:
    _vars = self._vars
for vname in _vars:
    var = _vars[vname]
    for k, v in kwargs.items():
        if callable(v):
            has_value_flag = v(getattr(var, k, None))
            if has_value_flag is False:
                break
        elif hasattr(var, k) and getattr(var, k) == v:
            has_value_flag = True
        else:
            has_value_flag = False
            break

    if has_value_flag is True:
        vs.append(_vars[vname])

return vs
```

The difference appears to be in the presence of that has_value_flag and the breaks in the innermost loop. I must admit I had a little trouble following the above code, but it seems that when any of the key=value tests fails, it will stop checking the current variable (DataArray in the xarray context) and move on to the next variable.
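
My reading of that logic, restated as a standalone sketch so the AND behaviour is easier to see. This is not the netCDF4 or xarray implementation: `matches_all` and `filter_variables` are names I made up, `SimpleNamespace` objects stand in for variables, and callables are judged by truthiness rather than by identity with True/False.

```python
from types import SimpleNamespace


def matches_all(var, **conditions):
    """True only if every key=value (or key=callable) test passes."""
    if not conditions:
        # Mirrors the quoted code, which selects nothing without kwargs.
        return False
    for key, expected in conditions.items():
        actual = getattr(var, key, None)
        if callable(expected):
            if not expected(actual):
                return False  # first failing test rejects the variable
        elif not hasattr(var, key) or actual != expected:
            return False
    return True


def filter_variables(variables, **conditions):
    """Names of the variables that satisfy all conditions (logical AND)."""
    return [name for name, var in variables.items()
            if matches_all(var, **conditions)]


variables = {
    "temp": SimpleNamespace(units="K", standard_name="air_temperature"),
    "salt": SimpleNamespace(units="1", standard_name="sea_water_salinity"),
    "sst": SimpleNamespace(units="K", standard_name="sea_water_temperature"),
}

# Both tests must pass, so only "temp" is selected.
selected = filter_variables(variables, units="K",
                            standard_name="air_temperature")
```

With a logical OR, "sst" would also have been selected because its units match, which is the difference in behaviour under discussion.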

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Behavior of filter_by_attrs() does not match netCDF4.Dataset.get_variables_by_attributes 344631360

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);