issue_comments

4 rows where issue = 304624171 (pydata/xarray#1985: "Load a small subset of data from a big dataset takes forever") and user = 22245117 (malmans2), sorted by updated_at descending

373694632 · malmans2 · CONTRIBUTOR · 2018-03-16T12:09:50Z
https://github.com/pydata/xarray/issues/1985#issuecomment-373694632

Alright, I found the problem. I'm loading several variables from different files. All of the variables have 1464 snapshots, but one of the 3D variables has an extra snapshot at a different time (I found a bug in the bash script I use to re-organize the raw data). When I load my dataset using open_mfdataset, the time dimension therefore gets an extra snapshot (length 1465). xarray doesn't like that, and functions such as to_netcdf take forever without raising an error. Thanks @fujiisoup for the help!
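One way to track down this kind of mismatch: when open_mfdataset aligns files along time, a snapshot contributed by only one file is filled with NaN in every other variable, so scanning a variable for all-NaN time steps exposes the stray time. A minimal sketch (hypothetical usage; it assumes the merged dataset is bound to `ds` and `Temp` is one of its 3D variables, as in the comments below):

```python
import numpy as np

# True where Temp has at least one valid value at that time step; a stray
# snapshot coming from a single file is all-NaN in the other variables.
has_data = ds['Temp'].notnull().any(dim=['Z', 'Y', 'X'])

print(ds['time'].values[~has_data.values])  # the offending snapshot(s)

# Dropping the stray step restores the expected 1464-step time axis.
ds_clean = ds.isel(time=np.flatnonzero(has_data.values))
```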

372570107 · malmans2 · CONTRIBUTOR · 2018-03-13T07:21:10Z
https://github.com/pydata/xarray/issues/1985#issuecomment-372570107

I forgot to mention that I'm getting this warning:

```
/home/idies/anaconda3/lib/python3.5/site-packages/dask/core.py:306: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  elif type_arg is type(key) and arg == key:
```

However, I don't think it is relevant, since I get the same warning when I'm able to run .to_netcdf() on the 3D variable.
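For context, this FutureWarning is raised by NumPy rather than by dask itself: comparing an array against an object of an incompatible type (here, dask comparing a task argument with a key) returned a scalar on NumPy releases of that era, together with a warning that the comparison would eventually become elementwise. A tiny reproduction (behavior varies with the NumPy version; newer releases simply perform the elementwise comparison):

```python
import numpy as np

a = np.array([1, 2, 3])

# On older NumPy this prints False and emits:
#   FutureWarning: elementwise comparison failed; returning scalar instead, ...
print(a == 'foo')
```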

372566304 · malmans2 · CONTRIBUTOR · 2018-03-13T07:01:51Z
https://github.com/pydata/xarray/issues/1985#issuecomment-372566304

The problem occurs when I run the very last line, which is to_netcdf(). Right before that, the dataset looks like this:

```python
<xarray.Dataset>
Dimensions:  (X: 10, Y: 25, Z: 1, time: 2)
Coordinates:
  * time     (time) datetime64[ns] 2007-11-15 2007-11-16
  * Z        (Z) float64 1.0
  * X        (X) float64 -29.94 -29.89 -29.85 -29.81 -29.76 -29.72 -29.67 ...
  * Y        (Y) float64 65.01 65.03 65.05 65.07 65.09 65.11 65.13 65.15 ...
Data variables:
    drF      (time, Z) float64 2.0 2.0
    dxF      (time, Y, X) float64 2.066e+03 2.066e+03 2.066e+03 2.066e+03 ...
    dyF      (time, Y, X) float64 2.123e+03 2.123e+03 2.123e+03 2.123e+03 ...
    rA       (time, Y, X) float64 4.386e+06 4.386e+06 4.386e+06 4.386e+06 ...
    fCori    (time, Y, X) float64 0.0001322 0.0001322 0.0001322 0.0001322 ...
    R_low    (time, Y, X) float64 -2.001e+03 -1.989e+03 -1.973e+03 ...
    Ro_surf  (time, Y, X) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    Depth    (time, Y, X) float64 2.001e+03 1.989e+03 1.973e+03 1.963e+03 ...
    HFacC    (time, Z, Y, X) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 ...
    Temp     (time, Z, Y, X) float64 dask.array<shape=(2, 1, 25, 10), chunksize=(1, 1, 25, 10)>
```

This is a dask array, right?
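One way to answer that is to inspect the object backing each variable: lazily loaded variables wrap a dask array, while eagerly loaded ones hold a plain NumPy array. A short check (a sketch, assuming the dataset above is bound to `ds`):

```python
import dask.array

# Temp came from open_mfdataset, so its .data should be a dask array;
# the grid variables were merged in from open_dataset and are plain NumPy.
print(type(ds['Temp'].data))                          # dask.array.core.Array
print(isinstance(ds['Temp'].data, dask.array.Array))  # True for lazy variables
print(ds['Temp'].chunks)                              # non-None chunks confirm laziness
```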

372558850 · malmans2 · CONTRIBUTOR · 2018-03-13T06:19:47Z (edited 2018-03-13T06:23:00Z)
https://github.com/pydata/xarray/issues/1985#issuecomment-372558850

I have the same issue if I don't copy the dataset.

Here are the coordinates of my dataset:

```python
<xarray.Dataset>
Dimensions:  (X: 960, Xp1: 961, Y: 880, Yp1: 881, Z: 216, Zl: 216, Zp1: 217, Zu: 216, time: 1465)
Coordinates:
  * Z        (Z) float64 1.0 3.5 7.0 11.5 17.0 23.5 31.0 39.5 49.0 59.5 ...
  * Zp1      (Zp1) float64 0.0 2.0 5.0 9.0 14.0 20.0 27.0 35.0 44.0 54.0 ...
  * Zu       (Zu) float64 2.0 5.0 9.0 14.0 20.0 27.0 35.0 44.0 54.0 65.0 ...
  * Zl       (Zl) float64 0.0 2.0 5.0 9.0 14.0 20.0 27.0 35.0 44.0 54.0 ...
  * X        (X) float64 -46.92 -46.83 -46.74 -46.65 -46.57 -46.48 -46.4 ...
  * Y        (Y) float64 56.81 56.85 56.89 56.93 56.96 57.0 57.04 57.08 ...
  * Xp1      (Xp1) float64 -46.96 -46.87 -46.78 -46.7 -46.61 -46.53 ...
  * Yp1      (Yp1) float64 56.79 56.83 56.87 56.91 56.95 56.98 57.02 ...
  * time     (time) datetime64[ns] 2007-09-01 2007-09-01T06:00:00 ...
```

I don't think the horizontal coordinates are the problem, because the same function works fine on 3D variables. I'm also attaching the function that I use to open the dataset, in case it's helpful:

```python
import numpy as np
import xarray as xr
import xgcm


def load_dataset():
    """Load the whole dataset."""

    # Import grid and fields separately, then merge
    gridpath = '/home/idies/workspace/OceanCirculation/exp_ASR/grid_glued.nc'
    fldspath = '/home/idies/workspace/OceanCirculation/exp_ASR/result_*/output_glued/*.*_glued.nc'
    gridset = xr.open_dataset(gridpath,
                              drop_variables=['XU', 'YU', 'XV', 'YV', 'RC', 'RF', 'RU', 'RL'])
    fldsset = xr.open_mfdataset(fldspath,
                                concat_dim='T',
                                drop_variables=['diag_levels', 'iter'])
    ds = xr.merge([gridset, fldsset])

    # Adjust dimensions creating conflicts
    ds = ds.rename({'Z': 'Ztmp'})
    ds = ds.rename({'T': 'time', 'Ztmp': 'Z', 'Zmd000216': 'Z'})
    ds = ds.squeeze('Zd000001')
    for dim in ['Z', 'Zp1', 'Zu', 'Zl']:
        ds[dim].values = np.fabs(ds[dim].values)
        ds[dim].attrs.update({'positive': 'down'})

    # Create horizontal vectors (remove zeros due to exch2)
    ds['X'].values = ds.XC.where((ds.XC != 0) & (ds.YC != 0)).mean(dim='Y', skipna=True)
    ds['Xp1'].values = ds.XG.where((ds.XG != 0) & (ds.YG != 0)).mean(dim='Yp1', skipna=True)
    ds['Y'].values = ds.YC.where((ds.XC != 0) & (ds.YC != 0)).mean(dim='X', skipna=True)
    ds['Yp1'].values = ds.YG.where((ds.XG != 0) & (ds.YG != 0)).mean(dim='Xp1', skipna=True)
    ds = ds.drop(['XC', 'YC', 'XG', 'YG'])

    # Create xgcm grid
    ds['Z'].attrs.update({'axis': 'Z'})
    ds['X'].attrs.update({'axis': 'X'})
    ds['Y'].attrs.update({'axis': 'Y'})
    for dim in ['Zp1', 'Zu', 'Zl', 'Xp1', 'Yp1']:
        if min(ds[dim].values) < min(ds[dim[0]].values):
            ds[dim].attrs.update({'axis': dim[0], 'c_grid_axis_shift': -0.5})
        elif min(ds[dim].values) > min(ds[dim[0]].values):
            ds[dim].attrs.update({'axis': dim[0], 'c_grid_axis_shift': +0.5})
    grid = xgcm.Grid(ds, periodic=False)

    return ds, grid
```

I think somewhere I trigger the loading of the whole dataset. Otherwise, I don't understand why it works when I open just one month instead of the whole year.
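Two things worth noting when hunting for the eager load. Assigning to `.values` (as in the horizontal-vector block above) evaluates the right-hand side immediately, although here the grid variables come from open_dataset and are already in memory. For the write itself, one way to see whether to_netcdf is computing far more than the selected subset is to wrap it in dask's ProgressBar; a sketch with illustrative names (`subset.nc` and the time slice are made up):

```python
from dask.diagnostics import ProgressBar

# Illustrative subset: two time steps of a single variable.
subset = ds['Temp'].isel(time=slice(0, 2))

# The progress bar shows how many dask tasks the write executes; a bar that
# crawls through a huge graph means the subset still triggers computation
# over (or repeated reads of) the full dataset.
with ProgressBar():
    subset.to_netcdf('subset.nc')
```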

