issue_comments

15 rows where user = 40218891, sorted by updated_at descending

issue (5)

  • open_mfdataset crashes with segfault (7)
  • h5netcdf fails to decode attribute coordinates. (3)
  • xr.open_dataset(f1).to_netcdf(file2) is not idempotent (2)
  • to_zarr() fails on time coordinate in append mode (2)
  • GH2550 revisited (1)

user (1)

  • yt87 (15)

author_association (1)

  • NONE (15)
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sorted descending), author_association, body, reactions, performed_via_github_app, issue
822968365 https://github.com/pydata/xarray/issues/5106#issuecomment-822968365 https://api.github.com/repos/pydata/xarray/issues/5106 MDEyOklzc3VlQ29tbWVudDgyMjk2ODM2NQ== yt87 40218891 2021-04-20T04:41:07Z 2021-04-20T04:41:07Z NONE

I am closing this issue. It is impossible to guess the proper time unit when dealing with missing data. Setting the attribute explicitly is a better solution.

A minor quibble: the statement `ds1.reftime.encoding['units'] = 'hours since Big Bang'` raises `AttributeError: 'NoneType' object has no attribute 'groups'`. It should raise `ValueError: invalid time units: 'hours since Big Bang'`, the same as in the case of `'hours after 1970-01-01'`.
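A minimal sketch of the explicit-units workaround described in this comment, assuming a dataset with a datetime coordinate named `reftime` (coordinate name, unit string, and output path are illustrative):

```python
import pandas as pd
import xarray as xr

# Illustrative dataset with a datetime coordinate named "reftime".
ds = xr.Dataset(coords={"reftime": pd.date_range("2021-04-01", periods=4, freq="6h")})

# Rather than letting xarray infer the time unit (which can go wrong when the
# coordinate is short or contains missing data), set it explicitly before the
# first write; later appends then reuse the same unit.
ds.reftime.encoding["units"] = "hours since 1970-01-01"
ds.to_netcdf("/tmp/reftime_explicit_units.nc")
```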

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_zarr() fails on time coordinate in append mode 849751721
822108756 https://github.com/pydata/xarray/issues/5106#issuecomment-822108756 https://api.github.com/repos/pydata/xarray/issues/5106 MDEyOklzc3VlQ29tbWVudDgyMjEwODc1Ng== yt87 40218891 2021-04-19T01:27:02Z 2021-04-19T01:28:38Z NONE

When the time dimension of the dataset being appended to has length 1, the inferred unit is "days". This happens on line 318 of coding/times.py: the `timedeltas` variable is an empty array and `np.all` evaluates to `True`, since `np.all(np.array([]) % 86400000000000 == 0)` returns `True` (which surprised me, by the way). When I forced `_infer_time_units_from_diff` to return "hours", the time coordinate in my example was evaluated correctly, so I think this particular code is the cause of the error.

Since the fallback return value is "seconds", I would argue that the case of empty `timedeltas` should return "seconds" as well. Are there alternatives, or should I go ahead and create a pull request?
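A short illustration of the behaviour described above; `infer_units` is a simplified stand-in for xarray's `_infer_time_units_from_diff`, not the actual implementation:

```python
import numpy as np

# np.all over an empty array is vacuously True, so every "divides evenly"
# check passes and the coarsest unit ("days") gets picked.
print(np.all(np.array([]) % 86_400_000_000_000 == 0))  # True

def infer_units(timedeltas_ns: np.ndarray) -> str:
    """Simplified stand-in for _infer_time_units_from_diff."""
    if timedeltas_ns.size == 0:
        return "seconds"  # proposed fallback for the empty case
    for unit, ns in [("days", 86_400_000_000_000),
                     ("hours", 3_600_000_000_000),
                     ("minutes", 60_000_000_000)]:
        if np.all(timedeltas_ns % ns == 0):
            return unit
    return "seconds"

print(infer_units(np.array([])))                   # "seconds"
print(infer_units(np.array([3_600_000_000_000])))  # "hours"
```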

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_zarr() fails on time coordinate in append mode 849751721
767170277 https://github.com/pydata/xarray/issues/4830#issuecomment-767170277 https://api.github.com/repos/pydata/xarray/issues/4830 MDEyOklzc3VlQ29tbWVudDc2NzE3MDI3Nw== yt87 40218891 2021-01-25T23:06:00Z 2021-01-25T23:06:00Z NONE

One could always set `source` to `str(filename_or_object)`. In this case:

```
import s3fs

s3 = s3fs.S3FileSystem(anon=True)
s3path = 's3://wrf-se-ak-ar5/gfdl/hist/daily/1980/WRFDS_1980-01-02.nc'
fileset = s3.open(s3path)
fileset
fileset.path
```

prints

```
<File-like object S3FileSystem, wrf-se-ak-ar5/gfdl/hist/daily/1980/WRFDS_1980-01-02.nc>
'wrf-se-ak-ar5/gfdl/hist/daily/1980/WRFDS_1980-01-02.nc'
```

It is easy to parse the above `fileset` representation, but there is no guarantee that some other external file representation will be amenable to parsing.

If the fix is only for s3fs, getting the `path` attribute is more elegant; however, this would require xarray to be aware of the module.
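A minimal sketch of the two options discussed above, assuming an open s3fs file object; `describe_source` is an illustrative helper, not xarray code:

```python
import s3fs

s3 = s3fs.S3FileSystem(anon=True)
fileset = s3.open('s3://wrf-se-ak-ar5/gfdl/hist/daily/1980/WRFDS_1980-01-02.nc')

def describe_source(filename_or_obj) -> str:
    """Prefer a 'path' attribute when the object has one (as s3fs files do),
    otherwise fall back to the generic str() representation."""
    path = getattr(filename_or_obj, 'path', None)
    if isinstance(path, str):
        return path
    return str(filename_or_obj)

print(describe_source(fileset))          # the bucket path
print(describe_source('/tmp/local.nc'))  # plain strings pass through unchanged
```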

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  GH2550 revisited 789653499
762438483 https://github.com/pydata/xarray/issues/4822#issuecomment-762438483 https://api.github.com/repos/pydata/xarray/issues/4822 MDEyOklzc3VlQ29tbWVudDc2MjQzODQ4Mw== yt87 40218891 2021-01-18T19:39:34Z 2021-01-18T19:39:34Z NONE

You might be right. Adding `-k nc4` works when `string` is removed from the attribute specification. If it is present, the error is the same as before: `AttributeError: 'numpy.ndarray' object has no attribute 'split'`.

However, after changing my AWS script to

```
import s3fs
import xarray as xr

s3 = s3fs.S3FileSystem(anon=True)
s3path = 's3://wrf-se-ak-ar5/gfdl/hist/daily/1988/WRFDS_1988-04-23.nc'

ds = xr.open_dataset(s3.open(s3path), engine='scipy')
print(ds)
```

the error is `TypeError: Error: None is not a valid NetCDF 3 file`.
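One way to see why the scipy backend rejects the file: it only reads classic NetCDF-3, while these bucket files are netCDF-4/HDF5. A quick check of the file signature (not from the original comment, just an illustration):

```python
import s3fs

s3 = s3fs.S3FileSystem(anon=True)
s3path = 's3://wrf-se-ak-ar5/gfdl/hist/daily/1988/WRFDS_1988-04-23.nc'

with s3.open(s3path) as f:
    magic = f.read(8)

# Classic NetCDF-3 files start with b'CDF\x01' (or b'CDF\x02'); netCDF-4/HDF5
# files start with b'\x89HDF\r\n\x1a\n', which the scipy engine cannot read.
print(magic)
```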

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  h5netcdf fails to decode attribute coordinates. 787947436
762423707 https://github.com/pydata/xarray/issues/4822#issuecomment-762423707 https://api.github.com/repos/pydata/xarray/issues/4822 MDEyOklzc3VlQ29tbWVudDc2MjQyMzcwNw== yt87 40218891 2021-01-18T19:03:19Z 2021-01-18T19:03:19Z NONE

This is how I did it:

```
$ ncdump /tmp/x.nc
netcdf x {
dimensions:
        x = 1 ;
        y = 1 ;
variables:
        int foo(y, x) ;
                foo:coordinates = "x y" ;
data:

 foo = 0 ;
}
$ rm x.nc
$ ncgen -o x.nc < x.cdl
$ python -c "import xarray as xr; ds = xr.open_dataset('/tmp/x.nc', engine='h5netcdf'); print(ds)"
```

Engine netcdf4 works fine, with `string` or without.

My original code retrieving data from AWS:

```
import s3fs
import xarray as xr

s3 = s3fs.S3FileSystem(anon=True)
s3path = 's3://wrf-se-ak-ar5/gfdl/hist/daily/1988/WRFDS_1988-04-23.nc'

ds = xr.open_dataset(s3.open(s3path))
print(ds)
```

Adding `decode_cf=False` is a workaround. All attributes are arrays:

```
Attributes:
    contact:    ['rtladerjr@alaska.edu']
    info:       ['Alaska CASC']
    data:       ['Downscaled GFDL-CM3']
    format:     ['version 2']
    date:       ['Mon Jul 1 15:17:16 AKDT 2019']
```
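A minimal sketch of the `decode_cf=False` workaround mentioned above, with an illustrative manual cleanup of the array-valued attributes before decoding (the cleanup loop is not part of the original comment):

```python
import s3fs
import xarray as xr

s3 = s3fs.S3FileSystem(anon=True)
s3path = 's3://wrf-se-ak-ar5/gfdl/hist/daily/1988/WRFDS_1988-04-23.nc'

# Open without CF decoding so the array-valued "coordinates" attribute
# does not break the decoding step.
ds = xr.open_dataset(s3.open(s3path), decode_cf=False)

# Illustrative cleanup: collapse array-valued coordinates attributes to
# plain strings, then apply CF decoding explicitly.
for var in ds.variables.values():
    coords_attr = var.attrs.get('coordinates')
    if coords_attr is not None and not isinstance(coords_attr, str):
        var.attrs['coordinates'] = ' '.join(map(str, coords_attr))

ds = xr.decode_cf(ds)
print(ds)
```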

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  h5netcdf fails to decode attribute coordinates. 787947436
762376418 https://github.com/pydata/xarray/issues/4822#issuecomment-762376418 https://api.github.com/repos/pydata/xarray/issues/4822 MDEyOklzc3VlQ29tbWVudDc2MjM3NjQxOA== yt87 40218891 2021-01-18T17:12:53Z 2021-01-18T17:19:53Z NONE

Dropping `string` changes the error to `Unable to open file (file signature not found)`. This issue popped up while reading data from https://registry.opendata.aws/wrf-se-alaska-snap/.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  h5netcdf fails to decode attribute coordinates. 787947436
481033093 https://github.com/pydata/xarray/issues/2871#issuecomment-481033093 https://api.github.com/repos/pydata/xarray/issues/2871 MDEyOklzc3VlQ29tbWVudDQ4MTAzMzA5Mw== yt87 40218891 2019-04-08T22:35:10Z 2019-04-08T22:35:10Z NONE

After rethinking the issue, I would drop it: one can simply pass `ds.fromkeys(ds.data_vars.keys(), {})` as the encoding attribute. Going back to the original problem: the fix above is not enough, the `SerializationWarning` is still present. An alternative, provided that the `missing_value` attribute is still considered deprecated (http://cfconventions.org/Data/cf-conventions/cf-conventions-1.1/build/cf-conventions.html#missing-data), would be to replace it with `_FillValue` on decoding:

```
$ diff variables.py variables.py.orig
179,180d178
<         if '_FillValue' not in encoding:
<             encoding['_FillValue'] = encoding.pop('missing_value')
```
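A minimal sketch of the replacement proposed above, written as a standalone helper rather than as the actual patch to xarray's variables.py (names are illustrative):

```python
import xarray as xr

def promote_missing_value(ds: xr.Dataset) -> xr.Dataset:
    """Illustrative helper: expose a deprecated missing_value as _FillValue
    so only one fill value survives an open/write round trip."""
    for var in ds.variables.values():
        missing = var.attrs.pop('missing_value', None)
        if missing is None:
            missing = var.encoding.pop('missing_value', None)
        if missing is not None and '_FillValue' not in var.encoding:
            var.encoding['_FillValue'] = missing
    return ds

# Usage:
#   ds = promote_missing_value(xr.open_dataset('file1.nc'))
#   ds.to_netcdf('file2.nc')
```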

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.open_dataset(f1).to_netcdf(file2) is not idempotent 429914958
480475645 https://github.com/pydata/xarray/issues/2871#issuecomment-480475645 https://api.github.com/repos/pydata/xarray/issues/2871 MDEyOklzc3VlQ29tbWVudDQ4MDQ3NTY0NQ== yt87 40218891 2019-04-06T05:24:52Z 2019-04-06T05:24:52Z NONE

Indeed it works. Thanks. My quick fix:

```
$ diff variables.py variables.py.orig
152,155d151
<     elif encoding.get('missing_value') is not None:
<         fill_value = pop_to(encoding, attrs, 'missing_value', name=name)
<         if not pd.isnull(fill_value):
<             data = duck_array_ops.fillna(data, fill_value)
```

I also figured out how to write back floating point values: `encoding=None` means use the existing values, so specifying `encoding={'tmpk': {}}` in `to_netcdf()` did the trick. Should there be an option for this? What you see on the screen is not what you get in the file.
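A minimal sketch of the round-trip trick described above; the variable name `tmpk` and file names are taken from the comment's context and are illustrative:

```python
import xarray as xr

ds = xr.open_dataset('file1.nc')

# encoding=None (the default) reuses the encoding read from disk, so packed
# variables are written back packed. Passing an empty dict for a variable
# discards its stored encoding and writes the decoded floating point values.
ds.to_netcdf('file2.nc', encoding={'tmpk': {}})
```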

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.open_dataset(f1).to_netcdf(file2) is not idempotent 429914958
455351725 https://github.com/pydata/xarray/issues/2554#issuecomment-455351725 https://api.github.com/repos/pydata/xarray/issues/2554 MDEyOklzc3VlQ29tbWVudDQ1NTM1MTcyNQ== yt87 40218891 2019-01-17T22:13:52Z 2019-01-17T22:13:52Z NONE

After upgrading to anaconda python 3.7 the code works without crashes. I think this issue can be closed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset crashes with segfault 379472634
439281383 https://github.com/pydata/xarray/issues/2554#issuecomment-439281383 https://api.github.com/repos/pydata/xarray/issues/2554 MDEyOklzc3VlQ29tbWVudDQzOTI4MTM4Mw== yt87 40218891 2018-11-16T04:50:43Z 2018-11-16T04:50:43Z NONE

The error `RuntimeError: NetCDF: Bad chunk sizes.` is unrelated to the original problem with the segfault crashes. It is caused by a bug in the netCDF C library, which is fixed in the latest version, 4.6.1. As of yesterday, the newest netcdf4-python manylinux wheel still contains an older version. The solution is to build netcdf4-python from source.

The segfault crashes occur with other datasets as well. Example test set I used:

```
import numpy as np
import pandas as pd
import xarray as xr

for year in range(2000, 2005):
    file = '/tmp/dx{:d}.nc'.format(year)
    #times = pd.date_range('{:d}-01-01'.format(year), '{:d}-12-31'.format(year), name='time')
    times = pd.RangeIndex(year, year + 300, name='time')
    v = np.array([np.random.random((32, 32)) for i in range(times.size)])
    dx = xr.Dataset({'v': (('time', 'y', 'x'), v)}, {'time': times})
    dx.to_netcdf(file, format='NETCDF4',
                 encoding={'time': {'chunksizes': (1024,)}},
                 unlimited_dims='time')
```

A simple fix is to change the scheduler as I did in my original post.
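A minimal sketch of that scheduler change; the synchronous scheduler is one option, and the exact `open_mfdataset` call is illustrative:

```python
import dask
import xarray as xr

# Run dask tasks synchronously in the main thread instead of using the
# default threaded scheduler, which is where the crashes were seen.
dask.config.set(scheduler='synchronous')

ds = xr.open_mfdataset('/tmp/dx*.nc', combine='nested', concat_dim='time')
print(ds)
```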

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset crashes with segfault 379472634
437647881 https://github.com/pydata/xarray/issues/2554#issuecomment-437647881 https://api.github.com/repos/pydata/xarray/issues/2554 MDEyOklzc3VlQ29tbWVudDQzNzY0Nzg4MQ== yt87 40218891 2018-11-11T06:50:22Z 2018-11-11T06:50:22Z NONE

I meant at random points during execution. The script crashed every time.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset crashes with segfault 379472634
437647777 https://github.com/pydata/xarray/issues/2554#issuecomment-437647777 https://api.github.com/repos/pydata/xarray/issues/2554 MDEyOklzc3VlQ29tbWVudDQzNzY0Nzc3Nw== yt87 40218891 2018-11-11T06:47:47Z 2018-11-11T06:47:47Z NONE

soundings.zip

I did some further tests; the crash occurs somewhat randomly.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset crashes with segfault 379472634
437646885 https://github.com/pydata/xarray/issues/2554#issuecomment-437646885 https://api.github.com/repos/pydata/xarray/issues/2554 MDEyOklzc3VlQ29tbWVudDQzNzY0Njg4NQ== yt87 40218891 2018-11-11T06:22:27Z 2018-11-11T06:22:27Z NONE

About 600k for 2 files. I could spend some time trying to size that down, but if there is a way to upload the whole set, it would be easier for me.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset crashes with segfault 379472634
437633544 https://github.com/pydata/xarray/issues/2554#issuecomment-437633544 https://api.github.com/repos/pydata/xarray/issues/2554 MDEyOklzc3VlQ29tbWVudDQzNzYzMzU0NA== yt87 40218891 2018-11-11T00:38:03Z 2018-11-11T00:38:03Z NONE

Another puzzle; I don't know if it is related to the crashes.

Trying to localize the issue, I added a line after the `else` on line 453 in `netCDF4_.py`: `print('=======', name, encoding.get('chunksizes'))`

```
ds0 = xr.open_dataset('/tmp/nam/bufr.701940/bufr.701940.2010123112.nc')
ds0.to_netcdf('/tmp/d0.nc')
```

This prints:

```
======= hlcy (1, 85)
======= cdbp (1, 85)
======= hovi (1, 85)
======= itim (1024,)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-aeb92962e874> in <module>()
      1 ds0 = xr.open_dataset('/tmp/nam/bufr.701940/bufr.701940.2010123112.nc')
----> 2 ds0.to_netcdf('/tmp/d0.nc')

/usr/local/Python-3.6.5/lib/python3.6/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute)
   1220                          engine=engine, encoding=encoding,
   1221                          unlimited_dims=unlimited_dims,
-> 1222                          compute=compute)
   1223
   1224     def to_zarr(self, store=None, mode='w-', synchronizer=None, group=None,

/usr/local/Python-3.6.5/lib/python3.6/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile)
    718         # to be parallelized with dask
    719         dump_to_store(dataset, store, writer, encoding=encoding,
--> 720                       unlimited_dims=unlimited_dims)
    721         if autoclose:
    722             store.close()

/usr/local/Python-3.6.5/lib/python3.6/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
    761
    762     store.store(variables, attrs, check_encoding, writer,
--> 763                 unlimited_dims=unlimited_dims)
    764
    765

/usr/local/Python-3.6.5/lib/python3.6/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    264         self.set_dimensions(variables, unlimited_dims=unlimited_dims)
    265         self.set_variables(variables, check_encoding_set, writer,
--> 266                            unlimited_dims=unlimited_dims)
    267
    268     def set_attributes(self, attributes):

/usr/local/Python-3.6.5/lib/python3.6/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    302             check = vn in check_encoding_set
    303             target, source = self.prepare_variable(
--> 304                 name, v, check, unlimited_dims=unlimited_dims)
    305
    306             writer.add(source, target)

/usr/local/Python-3.6.5/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    466             least_significant_digit=encoding.get(
    467                 'least_significant_digit'),
--> 468             fill_value=fill_value)
    469         _disable_auto_decode_variable(nc4_var)
    470

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.createVariable()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__init__()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: Bad chunk sizes.
```

The dataset is:

```
<xarray.Dataset>
Dimensions:  (dim_1: 1, dim_prof: 60, dim_slyr: 4, ftim: 85, itim: 1)
Coordinates:
  * ftim     (ftim) timedelta64[ns] 00:00:00 01:00:00 ... 3 days 12:00:00
  * itim     (itim) datetime64[ns] 2010-12-31T12:00:00
Dimensions without coordinates: dim_1, dim_prof, dim_slyr
Data variables:
    stnm     (dim_1) float64 ...
    rpid     (dim_1) object ...
    clat     (dim_1) float32 ...
    clon     (dim_1) float32 ...
    gelv     (dim_1) float32 ...
    clss     (itim, ftim) float32 ...
    pres     (itim, ftim, dim_prof) float32 ...
    tmdb     (itim, ftim, dim_prof) float32 ...
    uwnd     (itim, ftim, dim_prof) float32 ...
    vwnd     (itim, ftim, dim_prof) float32 ...
    spfh     (itim, ftim, dim_prof) float32 ...
    omeg     (itim, ftim, dim_prof) float32 ...
    cwtr     (itim, ftim, dim_prof) float32 ...
    dtcp     (itim, ftim, dim_prof) float32 ...
    dtgp     (itim, ftim, dim_prof) float32 ...
    dtsw     (itim, ftim, dim_prof) float32 ...
    dtlw     (itim, ftim, dim_prof) float32 ...
    cfrl     (itim, ftim, dim_prof) float32 ...
    tkel     (itim, ftim, dim_prof) float32 ...
    imxr     (itim, ftim, dim_prof) float32 ...
    pmsl     (itim, ftim) float32 ...
    prss     (itim, ftim) float32 ...
    tmsk     (itim, ftim) float32 ...
    tmin     (itim, ftim) float32 ...
    tmax     (itim, ftim) float32 ...
    wtns     (itim, ftim) float32 ...
    tp01     (itim, ftim) float32 ...
    c01m     (itim, ftim) float32 ...
    srlm     (itim, ftim) float32 ...
    u10m     (itim, ftim) float32 ...
    v10m     (itim, ftim) float32 ...
    th10     (itim, ftim) float32 ...
    q10m     (itim, ftim) float32 ...
    t2ms     (itim, ftim) float32 ...
    q2ms     (itim, ftim) float32 ...
    sfex     (itim, ftim) float32 ...
    vegf     (itim, ftim) float32 ...
    cnpw     (itim, ftim) float32 ...
    fxlh     (itim, ftim) float32 ...
    fxlp     (itim, ftim) float32 ...
    fxsh     (itim, ftim) float32 ...
    fxss     (itim, ftim) float32 ...
    fxsn     (itim, ftim) float32 ...
    swrd     (itim, ftim) float32 ...
    swru     (itim, ftim) float32 ...
    lwrd     (itim, ftim) float32 ...
    lwru     (itim, ftim) float32 ...
    lwrt     (itim, ftim) float32 ...
    swrt     (itim, ftim) float32 ...
    snfl     (itim, ftim) float32 ...
    smoi     (itim, ftim) float32 ...
    swem     (itim, ftim) float32 ...
    n01m     (itim, ftim) float32 ...
    r01m     (itim, ftim) float32 ...
    bfgr     (itim, ftim) float32 ...
    sltb     (itim, ftim) float32 ...
    smc1     (itim, ftim, dim_slyr) float32 ...
    stc1     (itim, ftim, dim_slyr) float32 ...
    lsql     (itim, ftim) float32 ...
    lcld     (itim, ftim) float32 ...
    mcld     (itim, ftim) float32 ...
    hcld     (itim, ftim) float32 ...
    snra     (itim, ftim) float32 ...
    wxts     (itim, ftim) float32 ...
    wxtp     (itim, ftim) float32 ...
    wxtz     (itim, ftim) float32 ...
    wxtr     (itim, ftim) float32 ...
    ustm     (itim, ftim) float32 ...
    vstm     (itim, ftim) float32 ...
    hlcy     (itim, ftim) float32 ...
    cdbp     (itim, ftim) float32 ...
    hovi     (itim, ftim) float32 ...
Attributes:
    model:   Unknown
```
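Not part of the original comment, but a minimal sketch of one possible workaround suggested by the debug output above: drop the chunk-size encoding inherited from the source file before rewriting, so the netCDF library picks its own chunking (whether this is needed depends on the netCDF C library fix mentioned elsewhere in this thread):

```python
import xarray as xr

ds0 = xr.open_dataset('/tmp/nam/bufr.701940/bufr.701940.2010123112.nc')

# Remove the chunk-size hints copied from the source file; the netCDF
# library then chooses its own chunking when writing the copy.
for var in ds0.variables.values():
    var.encoding.pop('chunksizes', None)
    var.encoding.pop('contiguous', None)

ds0.to_netcdf('/tmp/d0.nc')
```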

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset crashes with segfault 379472634
437631073 https://github.com/pydata/xarray/issues/2554#issuecomment-437631073 https://api.github.com/repos/pydata/xarray/issues/2554 MDEyOklzc3VlQ29tbWVudDQzNzYzMTA3Mw== yt87 40218891 2018-11-10T23:49:22Z 2018-11-10T23:49:22Z NONE

No, it works fine.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset crashes with segfault 379472634


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);