issue_comments


14 rows where user = 1797906 sorted by updated_at descending


issue 4

  • Drop coordinates on loading large dataset. 9
  • Modifying data set resulting in much larger file size 3
  • exit code 137 when using xarray.open_mfdataset 1
  • Cannot re-index or align objects with conflicting indexes 1

user 1

  • jamesstidard · 14

author_association 1

  • NONE 14
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sort column, descending), author_association, body, reactions, performed_via_github_app, issue
1240646680 https://github.com/pydata/xarray/issues/7005#issuecomment-1240646680 https://api.github.com/repos/pydata/xarray/issues/7005 IC_kwDOAMm_X85J8sQY jamesstidard 1797906 2022-09-08T12:24:41Z 2022-09-08T12:24:41Z NONE

Hi @benbovy,

Thanks for the detailed response.

Yeah, the fact that it only raises for the second multi-index mapping does seem like a bug in that case, so I'll leave the ticket open to track that.

I didn't stumble on the set_levels function while skimming through the docs, thanks. I've updated my function to make use of it. I'm hoping things are safe with this and that I'm replacing values in the correct order.

For anyone else looking to do the same, or for anyone to tell me what I'm doing is not safe, or that there's a simpler way, here's the updated function:

```python
import numpy as np
import pandas as pd
import xarray as xr


def map_coords(ds, *, name, mapping):
    """
    Takes an xarray dataset's coordinate values and updates them with the
    provided mapping. In-place.

    Can handle both regular indices and multi-level indices.

    ds: an xr.Dataset
    name: name of the coordinate to update
    mapping: dictionary mapping old values to new values.
    """
    # all attrs seem to get dropped on coords even if only
    # one is altered. Hold on to them and reapply after
    coord_attrs = {c: ds[c].attrs for c in ds.coords}

    if ds.indexes.is_multi(name):
        target = name
        parent = ds.indexes[target].name

        if target == parent:
            valid_targets = ds.indexes[parent].names
            raise ValueError(
                f"Can only map levels of a MultiIndex, not the MultiIndex "
                f"itself. Target one of {valid_targets}",
            )

        multi_index = ds.indexes[parent]
        level_values = dict(zip(multi_index.names, multi_index.levels))
        new_values = [mapping[v] for v in level_values[name]]
        ds.coords[parent] = multi_index.set_levels(new_values, level=target)
    else:
        old_values = ds.coords[name].values
        new_values = [mapping[v] for v in old_values]
        ds[name] = new_values

    # reapply attrs
    for coord, attrs in coord_attrs.items():
        ds[coord].attrs = attrs


midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("x_one", "x_two"))
midy = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("y_one", "y_two"))
mda = xr.DataArray(np.random.rand(6, 6, 3), [("x", midx), ("y", midy), ("z", range(3))])

map_coords(mda, name="z", mapping={0: "zero", 1: "one", 2: "two"})
map_coords(mda, name="x_one", mapping={"a": "aa", "b": "bb", "c": "cc"})
map_coords(mda, name="y_one", mapping={"a": "aa", "b": "bb", "c": "cc"})

print(mda)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cannot re-index or align objects with conflicting indexes 1364911775
365925282 https://github.com/pydata/xarray/issues/1854#issuecomment-365925282 https://api.github.com/repos/pydata/xarray/issues/1854 MDEyOklzc3VlQ29tbWVudDM2NTkyNTI4Mg== jamesstidard 1797906 2018-02-15T13:21:33Z 2018-02-15T13:24:46Z NONE

@rabernat I still seem to get a SIGKILL 9 (exit code 137) when trying to run with that pre-processor as well.

Maybe my expectations of how lazily it loads files are too high. The machine I'm running on has 8 GB of RAM and the files in total are just under 1 TB.
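
For reference, here's a minimal sketch of the kind of pre-processing workaround discussed in this thread, written against the current xarray API; the glob, coordinate names, and chunk size are placeholders rather than the exact code used here:

```python
import xarray as xr

def drop_scalar_coords(ds):
    # Drop the scalar longitude/latitude coordinates from each file so they
    # don't need to be compared/aligned when the files are combined.
    return ds.drop_vars(["longitude", "latitude"], errors="ignore")

ds = xr.open_mfdataset(
    "path/to/files/*.nc",        # placeholder glob
    preprocess=drop_scalar_coords,
    combine="by_coords",
    chunks={"time": 1000},       # illustrative chunk size
)
```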

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Drop coordinates on loading large dataset. 291332965
365896646 https://github.com/pydata/xarray/issues/1854#issuecomment-365896646 https://api.github.com/repos/pydata/xarray/issues/1854 MDEyOklzc3VlQ29tbWVudDM2NTg5NjY0Ng== jamesstidard 1797906 2018-02-15T11:12:48Z 2018-02-15T11:12:48Z NONE

@jhamman Here's the ncdump of one of the resource files:

```bash
netcdf \34.128_1900_01_05_05 {
dimensions:
        longitude = 720 ;
        latitude = 361 ;
        time = UNLIMITED ; // (124 currently)
variables:
        float longitude(longitude) ;
                longitude:units = "degrees_east" ;
                longitude:long_name = "longitude" ;
        float latitude(latitude) ;
                latitude:units = "degrees_north" ;
                latitude:long_name = "latitude" ;
        int time(time) ;
                time:units = "hours since 1900-01-01 00:00:0.0" ;
                time:long_name = "time" ;
                time:calendar = "gregorian" ;
        short sst(time, latitude, longitude) ;
                sst:scale_factor = 0.000552094668668839 ;
                sst:add_offset = 285.983000319853 ;
                sst:_FillValue = -32767s ;
                sst:missing_value = -32767s ;
                sst:units = "K" ;
                sst:long_name = "Sea surface temperature" ;

// global attributes:
                :Conventions = "CF-1.6" ;
                :history = "2017-08-04 06:17:58 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data05/scratch/_mars-atls09-95e2cf679cd58ee9b4db4dd119a05a8d-gF5gxN.grib -o /data/data04/scratch/_grib2netcdf-atls01-a562cefde8a29a7288fa0b8b7f9413f7-VvH7PP.nc -utime" ;
                :_Format = "64-bit offset" ;
}
```

Unfortunately removing the chunks didn't seem to help. I'm running with the pre-process workaround this morning to see if that completes. Sorry for the late response on this - been pretty busy.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Drop coordinates on loading large dataset. 291332965
364492783 https://github.com/pydata/xarray/issues/1854#issuecomment-364492783 https://api.github.com/repos/pydata/xarray/issues/1854 MDEyOklzc3VlQ29tbWVudDM2NDQ5Mjc4Mw== jamesstidard 1797906 2018-02-09T16:58:42Z 2018-02-09T16:58:42Z NONE

I'll give both of those a shot.

For hosting, the files are currently on a local drive and they sum to about 1 TB. I can probably host a couple of examples, though.

Thanks again for the support.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Drop coordinates on loading large dataset. 291332965
364488847 https://github.com/pydata/xarray/issues/1854#issuecomment-364488847 https://api.github.com/repos/pydata/xarray/issues/1854 MDEyOklzc3VlQ29tbWVudDM2NDQ4ODg0Nw== jamesstidard 1797906 2018-02-09T16:45:51Z 2018-02-09T16:45:51Z NONE

That run was killed with the following output:

```bash
~/.pyenv/versions/3.4.6/lib/python3.4/site-packages/xarray/core/dtypes.py:23: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  if np.issubdtype(dtype, float):

Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
```

I wasn't watching the machine at the time, but I assume that's it falling over due to memory pressure.

Hi @jhamman, I'm using 0.10.0 of xarray with dask 0.16.1 and distributed 1.18.0. I realise that last one is out of date; I will update and retry.

I'm just using whatever the default scheduler is, as that's pretty much all the code I've got written above.

I'm unsure how to do a performance check, as the dataset can't even be fully loaded currently. I've tried different chunk sizes in the past hoping to stumble on a magic size, but have been unsuccessful with that.
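
For what it's worth, here's a sketch of how explicit chunk sizes can be passed and inspected; the glob and the chunk values are illustrative only, not a recommendation from this thread:

```python
import xarray as xr

# Chunk only along time and keep each spatial slice whole; very small
# chunks create an enormous dask task graph, which can itself exhaust memory.
ds = xr.open_mfdataset(
    "path/to/files/*.nc",                              # placeholder glob
    chunks={"time": 248, "latitude": 361, "longitude": 720},
)
print(ds.chunks)        # per-dimension chunk sizes
print(ds.nbytes / 1e9)  # total (lazy) size in GB
```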

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Drop coordinates on loading large dataset. 291332965
364463855 https://github.com/pydata/xarray/issues/1854#issuecomment-364463855 https://api.github.com/repos/pydata/xarray/issues/1854 MDEyOklzc3VlQ29tbWVudDM2NDQ2Mzg1NQ== jamesstidard 1797906 2018-02-09T15:22:38Z 2018-02-09T15:22:38Z NONE

Sure, I'm running that now. I'll reply once/if it finishes. Though, watching memory usage in my system monitor, it does not appear to be growing. I seem to remember the open function continually allocating itself more RAM until it was killed.

I'll take a read through that issue while I wait.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Drop coordinates on loading large dataset. 291332965
364459162 https://github.com/pydata/xarray/issues/1854#issuecomment-364459162 https://api.github.com/repos/pydata/xarray/issues/1854 MDEyOklzc3VlQ29tbWVudDM2NDQ1OTE2Mg== jamesstidard 1797906 2018-02-09T15:06:37Z 2018-02-09T15:09:02Z NONE

That's true; maybe I misread last time, or it's month-dependent.

Hopefully this is what you're after - let me know if not. I used 3 *.nc files to make this, with the snippet you posted above.

```bash
<xarray.Dataset>
Dimensions:    (time: 728)
Coordinates:
    longitude  float32 10.0
    latitude   float32 10.0
  * time       (time) datetime64[ns] 1992-01-01 1992-01-01T03:00:00 ...
Data variables:
    mwp        (time) float64 dask.array<shape=(728,), chunksize=(127,)>
Attributes:
    Conventions:  CF-1.6
    history:      2017-08-10 04:58:48 GMT by grib_to_netcdf-2.4.0: grib_to_ne...
```

If you're after the entire dataset, I should be able to get that but may take some time.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Drop coordinates on loading large dataset. 291332965
364451782 https://github.com/pydata/xarray/issues/1854#issuecomment-364451782 https://api.github.com/repos/pydata/xarray/issues/1854 MDEyOklzc3VlQ29tbWVudDM2NDQ1MTc4Mg== jamesstidard 1797906 2018-02-09T14:40:20Z 2018-02-09T14:40:20Z NONE

Sure, this is the repr of a single file:

```bash
<xarray.Dataset>
Dimensions:    (time: 248)
Coordinates:
    longitude  float32 10.0
    latitude   float32 10.0
  * time       (time) datetime64[ns] 2004-12-01 2004-12-01T03:00:00 ...
Data variables:
    mwd        (time) float64 dask.array<shape=(248,), chunksize=(248,)>
Attributes:
    Conventions:  CF-1.6
    history:      2017-08-09 16:22:56 GMT by grib_to_netcdf-2.4.0: grib_to_ne...
```

Thanks

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Drop coordinates on loading large dataset. 291332965
364399084 https://github.com/pydata/xarray/issues/1854#issuecomment-364399084 https://api.github.com/repos/pydata/xarray/issues/1854 MDEyOklzc3VlQ29tbWVudDM2NDM5OTA4NA== jamesstidard 1797906 2018-02-09T10:41:28Z 2018-02-09T10:41:28Z NONE

Sorry to bump this. I'm still looking for a solution to this problem, in case anyone has had a similar experience. Thanks.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Drop coordinates on loading large dataset. 291332965
361576685 https://github.com/pydata/xarray/issues/1854#issuecomment-361576685 https://api.github.com/repos/pydata/xarray/issues/1854 MDEyOklzc3VlQ29tbWVudDM2MTU3NjY4NQ== jamesstidard 1797906 2018-01-30T12:19:12Z 2018-01-30T12:19:12Z NONE

Hi @rabernat, thanks for the response. Sorry it's taken me a few days to get back to you.

Here's the info dump of one of the files:

```
xarray.Dataset {
dimensions:
        latitude = 361 ;
        longitude = 720 ;
        time = 248 ;

variables:
        float32 longitude(longitude) ;
                longitude:units = degrees_east ;
                longitude:long_name = longitude ;
        float32 latitude(latitude) ;
                latitude:units = degrees_north ;
                latitude:long_name = latitude ;
        datetime64[ns] time(time) ;
                time:long_name = time ;
        float64 mwd(time, latitude, longitude) ;
                mwd:units = Degree true ;
                mwd:long_name = Mean wave direction ;

// global attributes:
        :Conventions = CF-1.6 ;
        :history = 2017-08-09 18:15:34 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data05/scratch/_mars-atls02-70e05f9f8ba4e9d19932f1c45a7be8d8-Pwy6jZ.grib -o /data/data01/scratch/_grib2netcdf-atls02-95e2cf679cd58ee9b4db4dd119a05a8d-v4TKah.nc -utime ;
}
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Drop coordinates on loading large dataset. 291332965
330162706 https://github.com/pydata/xarray/issues/1572#issuecomment-330162706 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMzMDE2MjcwNg== jamesstidard 1797906 2017-09-18T08:57:39Z 2017-09-18T08:59:24Z NONE

@shoyer great, thanks. I added the line below and it has reduced the size of the file down to that of the duplicate. Thanks for pointing me in the right direction. I'm assuming I do not need to fill NaNs with _FillValue afterwards (though maybe I might).

```python
masked_ds.swh.encoding = {
    k: v
    for k, v in ds.swh.encoding.items()
    if k in {'_FillValue', 'add_offset', 'dtype', 'scale_factor'}
}
```
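
For context, here's a minimal sketch of how that line fits into the full round trip, assuming the file and variable names from the ncdump output below; the masking condition is a placeholder, not the original one:

```python
import xarray as xr

ds = xr.open_dataset("swh_2010_01_05_05.nc")
masked_ds = ds.where(ds.swh < 10)  # placeholder mask

# Copy back only the packing-related encoding keys so swh is written as
# scaled shorts again instead of float64 with NaN fill values.
masked_ds.swh.encoding = {
    k: v
    for k, v in ds.swh.encoding.items()
    if k in {"_FillValue", "add_offset", "dtype", "scale_factor"}
}
masked_ds.to_netcdf("swh_2010_01_05_05-masked.nc")
```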

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Modifying data set resulting in much larger file size 257400162
329233581 https://github.com/pydata/xarray/issues/1572#issuecomment-329233581 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMyOTIzMzU4MQ== jamesstidard 1797906 2017-09-13T17:06:12Z 2017-09-13T17:06:12Z NONE

@fmaussion @jhamman Ah great - that makes sense. I'll see if I can set them to the original file's short fill representation instead of nan.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Modifying data set resulting in much larger file size 257400162
329230620 https://github.com/pydata/xarray/issues/1572#issuecomment-329230620 https://api.github.com/repos/pydata/xarray/issues/1572 MDEyOklzc3VlQ29tbWVudDMyOTIzMDYyMA== jamesstidard 1797906 2017-09-13T16:55:45Z 2017-09-13T16:59:57Z NONE

Sure, here you go:

Original (128.9MB):

```bash
$ ncdump -h -s swh_2010_01_05_05.nc
netcdf swh_2010_01_05_05 {
dimensions:
        longitude = 720 ;
        latitude = 361 ;
        time = UNLIMITED ; // (248 currently)
variables:
        float longitude(longitude) ;
                longitude:units = "degrees_east" ;
                longitude:long_name = "longitude" ;
        float latitude(latitude) ;
                latitude:units = "degrees_north" ;
                latitude:long_name = "latitude" ;
        int time(time) ;
                time:units = "hours since 1900-01-01 00:00:0.0" ;
                time:long_name = "time" ;
                time:calendar = "gregorian" ;
        short swh(time, latitude, longitude) ;
                swh:scale_factor = 0.000203558072860934 ;
                swh:add_offset = 6.70098898894319 ;
                swh:_FillValue = -32767s ;
                swh:missing_value = -32767s ;
                swh:units = "m" ;
                swh:long_name = "Significant height of combined wind waves and swell" ;

// global attributes:
                :Conventions = "CF-1.6" ;
                :history = "2017-08-09 16:41:57 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data04/scratch/_mars-atls01-a562cefde8a29a7288fa0b8b7f9413f7-5gV0xP.grib -o /data/data05/scratch/_grib2netcdf-atls09-70e05f9f8ba4e9d19932f1c45a7be8d8-jU8lEi.nc -utime" ;
                :_Format = "64-bit offset" ;
}
```

Duplicate (129.0MB):

```bash
$ ncdump -h -s swh_2010_01_05_05-duplicate.nc
netcdf swh_2010_01_05_05-duplicate {
dimensions:
        longitude = 720 ;
        latitude = 361 ;
        time = UNLIMITED ; // (248 currently)
variables:
        float longitude(longitude) ;
                longitude:_FillValue = NaNf ;
                longitude:units = "degrees_east" ;
                longitude:long_name = "longitude" ;
                longitude:_Storage = "contiguous" ;
        float latitude(latitude) ;
                latitude:_FillValue = NaNf ;
                latitude:units = "degrees_north" ;
                latitude:long_name = "latitude" ;
                latitude:_Storage = "contiguous" ;
        int time(time) ;
                time:long_name = "time" ;
                time:units = "hours since 1900-01-01" ;
                time:calendar = "gregorian" ;
                time:_Storage = "chunked" ;
                time:_ChunkSizes = 1024 ;
                time:_Endianness = "little" ;
        short swh(time, latitude, longitude) ;
                swh:_FillValue = -32767s ;
                swh:units = "m" ;
                swh:long_name = "Significant height of combined wind waves and swell" ;
                swh:add_offset = 6.70098898894319 ;
                swh:scale_factor = 0.000203558072860934 ;
                swh:_Storage = "chunked" ;
                swh:_ChunkSizes = 1, 361, 720 ;
                swh:_Endianness = "little" ;

// global attributes:
                :_NCProperties = "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18" ;
                :Conventions = "CF-1.6" ;
                :history = "2017-08-09 16:41:57 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data04/scratch/_mars-atls01-a562cefde8a29a7288fa0b8b7f9413f7-5gV0xP.grib -o /data/data05/scratch/_grib2netcdf-atls09-70e05f9f8ba4e9d19932f1c45a7be8d8-jU8lEi.nc -utime" ;
                :_Format = "netCDF-4" ;
}
```

Masked (515.7MB):

```bash
$ ncdump -h -s swh_2010_01_05_05-masked.nc
netcdf swh_2010_01_05_05-masked {
dimensions:
        longitude = 720 ;
        latitude = 361 ;
        time = 248 ;
variables:
        float longitude(longitude) ;
                longitude:_FillValue = NaNf ;
                longitude:units = "degrees_east" ;
                longitude:long_name = "longitude" ;
                longitude:_Storage = "contiguous" ;
        float latitude(latitude) ;
                latitude:_FillValue = NaNf ;
                latitude:units = "degrees_north" ;
                latitude:long_name = "latitude" ;
                latitude:_Storage = "contiguous" ;
        int time(time) ;
                time:long_name = "time" ;
                time:units = "hours since 1900-01-01" ;
                time:calendar = "gregorian" ;
                time:_Storage = "contiguous" ;
                time:_Endianness = "little" ;
        double swh(time, latitude, longitude) ;
                swh:_FillValue = NaN ;
                swh:units = "m" ;
                swh:long_name = "Significant height of combined wind waves and swell" ;
                swh:_Storage = "contiguous" ;

// global attributes:
                :_NCProperties = "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18" ;
                :Conventions = "CF-1.6" ;
                :history = "2017-08-09 16:41:57 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data04/scratch/_mars-atls01-a562cefde8a29a7288fa0b8b7f9413f7-5gV0xP.grib -o /data/data05/scratch/_grib2netcdf-atls09-70e05f9f8ba4e9d19932f1c45a7be8d8-jU8lEi.nc -utime" ;
                :_Format = "netCDF-4" ;
}
```

I assume it's about that fill/missing value changing? Thanks for the help.
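
As a sanity check on those numbers (my own arithmetic, not from the thread): the original packs swh as 2-byte shorts, while the masked copy stores 8-byte doubles, and that dtype change alone accounts for the roughly 4x size increase:

```python
# 720 x 361 grid, 248 time steps (from the dumps above)
nvalues = 720 * 361 * 248

print(nvalues * 2 / 1e6)  # short swh:  ~128.9 MB, matches the original file
print(nvalues * 8 / 1e6)  # double swh: ~515.7 MB, matches the masked file
```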

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Modifying data set resulting in much larger file size 257400162
329181674 https://github.com/pydata/xarray/issues/1561#issuecomment-329181674 https://api.github.com/repos/pydata/xarray/issues/1561 MDEyOklzc3VlQ29tbWVudDMyOTE4MTY3NA== jamesstidard 1797906 2017-09-13T14:16:06Z 2017-09-13T14:16:06Z NONE

Increasing the chunk sizes seemed to resolve this issue. I was loading readings over time across the world as 360x180xT and was trying to rechunk them to 1x1xT.
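
For illustration, a minimal sketch of the difference; the glob, coordinate names, and chunk values are placeholders, not the original code:

```python
import xarray as xr

ds = xr.open_mfdataset("path/to/files/*.nc")  # placeholder glob

# 1x1xT chunks over a 360x180 grid mean 64,800 tiny chunks per variable,
# which overwhelms the scheduler and blows up memory.
too_fine = ds.chunk({"longitude": 1, "latitude": 1})

# Larger chunks: keep the spatial grid whole and split only along time.
coarser = ds.chunk({"time": 100, "longitude": 360, "latitude": 180})
```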

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  exit code 137 when using xarray.open_mfdataset 255997962


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);