issue_comments

7 rows where issue = 266320445 sorted by updated_at descending

user 2

  • olgabot 4
  • shoyer 3

author_association 2

  • NONE 4
  • MEMBER 3

issue 1

  • Unicode strings unexpectedly transformed to byte strings upon `open_dataset` · 7
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
341220444 https://github.com/pydata/xarray/issues/1638#issuecomment-341220444 https://api.github.com/repos/pydata/xarray/issues/1638 MDEyOklzc3VlQ29tbWVudDM0MTIyMDQ0NA== olgabot 806256 2017-11-01T19:51:55Z 2017-11-01T19:51:55Z NONE

Posted the lost coordinate issue here: https://github.com/pydata/xarray/issues/1680

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unicode strings unexpectedly transformed to byte strings upon `open_dataset` 266320445
341198220 https://github.com/pydata/xarray/issues/1638#issuecomment-341198220 https://api.github.com/repos/pydata/xarray/issues/1638 MDEyOklzc3VlQ29tbWVudDM0MTE5ODIyMA== olgabot 806256 2017-11-01T18:33:24Z 2017-11-01T18:33:24Z NONE

Using v0.9.6 with engine='h5netcdf'

CPU times: user 1min, sys: 47.7 s, total: 1min 48s Wall time: 2min 19s

Using #1648:

CPU times: user 1min 5s, sys: 54.9 s, total: 2min Wall time: 2min 1s

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unicode strings unexpectedly transformed to byte strings upon `open_dataset` 266320445
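
The two wall times reported in the comment above (v0.9.6 with engine='h5netcdf' versus #1648) can be compared with a little arithmetic. The sketch below is illustrative only: `to_seconds` is a hypothetical helper, not part of IPython or xarray, and it only understands the `min` and `s` units that appear in those timings.

```python
import re


def to_seconds(text):
    """Convert a duration such as '2min 19s' or '47.7 s' to seconds.

    Hypothetical helper: handles only the 'min' and 's' units.
    """
    total = 0.0
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)\s*(min|s)", text):
        total += float(value) * (60 if unit == "min" else 1)
    return total


# Wall times copied from the comment above.
wall_v096 = to_seconds("2min 19s")   # v0.9.6, engine='h5netcdf'
wall_1648 = to_seconds("2min 1s")    # branch from #1648

saved = wall_v096 - wall_1648        # 18.0 seconds
relative = saved / wall_v096         # fraction of the original wall time

print(f"saved {saved:.0f} s ({relative:.1%} of the v0.9.6 wall time)")
```

On these two single runs the difference is about 18 seconds of wall time; one run per configuration is not enough to say how much of that is noise.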
339228857 https://github.com/pydata/xarray/issues/1638#issuecomment-339228857 https://api.github.com/repos/pydata/xarray/issues/1638 MDEyOklzc3VlQ29tbWVudDMzOTIyODg1Nw== shoyer 1217238 2017-10-25T06:32:35Z 2017-10-25T06:32:35Z MEMBER

Hmm. I'm not sure why h5netcdf was so much slower. I suspect the default engine you used might have been scipy, which we've noticed can be significantly faster in some cases. If you have time, I would be curious how well my branch in #1648 works using engine='scipy'.

Please file a separate issue if you can put together example code that reproduces the lost coordinate issue. I would like to dig into this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unicode strings unexpectedly transformed to byte strings upon `open_dataset` 266320445
339162189 https://github.com/pydata/xarray/issues/1638#issuecomment-339162189 https://api.github.com/repos/pydata/xarray/issues/1638 MDEyOklzc3VlQ29tbWVudDMzOTE2MjE4OQ== olgabot 806256 2017-10-24T23:02:34Z 2017-10-24T23:03:18Z NONE

Thank you for looking into this! I used the default engine to save, which looks like it was netcdf4. I did pip install h5netcdf and saved again. It took longer, ~2min instead of seconds. Loading was still 110ms and all the features are objects again! Though the coordinates --> variables thing is still happening.

<xarray.Dataset>
Dimensions:  (cell: 53760, gene: 23438)
Coordinates:
  * cell  (cell) object 'A17-B000126-3_39_F-1-1' ...
  * gene  (gene) object '0610005C13Rik' ...
Data variables:
    Columns sorted                (cell) float64 nan nan nan nan nan nan nan ...
    Comments                      (cell) object 'nan' 'nan' 'nan' 'nan' ...
    Double check                  (cell) float64 nan nan nan nan nan nan nan ...
    EXP_ID                        (cell) object '170925_A00111_0066_AH3TKNDMXX' ...
    Experiment ID                 (cell) object 'exp22' 'exp22' 'exp22' ...
    FACS.instument                (cell) object 'Sony SIM1' 'Sony SIM1' ...
    FACS.selection                (cell) object 'Multiple' 'Multiple' ...
    Location                      (cell) object 'MACA20_3' 'MACA20_3' ...
    Lysis Plate Batch             (cell) object '20' '20' '20' '20' '20' ...
    Number of input reads         (cell) int64 1229254 730274 1075370 ...
    Plate                         (cell) object '1' '1' '1' '1' '1' '1' '1' ...
    TAXON                         (cell) object 'mus' 'mus' 'mus' 'mus' ...
    Uniquely mapped reads number  (cell) int64 1017682 634557 941828 1392029 ...
    WELL_MAPPING                  (cell) object 'B000126' 'B000126' ...
    counts                        (cell, gene) int64 0 0 0 0 442 0 0 0 0 0 0 ...
    dNTP.batch                    (cell) object '457912' '457912' '457912' ...
    date.prepared                 (cell) object '07-06-17' '07-06-17' ...
    date.sorted                   (cell) object '170707' '170707' '170707' ...
    log10                         (cell, gene) float64 0.0 0.0 0.0 0.0 2.646 ...
    log2                          (cell, gene) float64 0.0 0.0 0.0 0.0 8.791 ...
    mouse.age                     (cell) object '3' '3' '3' '3' '3' '3' '3' ...
    mouse.id                      (cell) object '3_39_F' '3_39_F' '3_39_F' ...
    mouse.number                  (cell) object '39' '39' '39' '39' '39' ...
    mouse.sex                     (cell) object 'F' 'F' 'F' 'F' 'F' 'F' 'F' ...
    nozzle.size                   (cell) object '100' '100' '100' '100' ...
    oligodT.order.no              (cell) object '6/23/17 12757296' ...
    plate.type                    (cell) object 'Biorad HSP3901' ...
    preparation.site              (cell) object 'Biohub' 'Biohub' 'Biohub' ...
    subtissue                     (cell) object 'nan' 'nan' 'nan' 'nan' ...
    tissue                        (cell) object 'Skin' 'Skin' 'Skin' 'Skin' ...

Not sure if it matters, but one detail is that I created ~250 individual datasets (each sized at ~300 samples x 20,000 features) and then used xr.concat(datasets, dim='cell') to concatenate them because I couldn't read them all into memory at once.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unicode strings unexpectedly transformed to byte strings upon `open_dataset` 266320445
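
The comment above describes building many per-batch datasets and joining them with xr.concat(datasets, dim='cell'). A minimal sketch of that pattern follows, assuming xarray and NumPy are installed. The helper `make_batch`, the tiny shapes, and the cell and gene labels are all made up for illustration; the real data had roughly 250 batches of about 300 samples by 20,000 features.

```python
import numpy as np
import xarray as xr


def make_batch(cell_ids, gene_ids, seed):
    """Build a small stand-in dataset with a (cell, gene) count matrix.

    Hypothetical helper: shapes and values are illustrative only.
    """
    rng = np.random.default_rng(seed)
    counts = rng.integers(0, 10, size=(len(cell_ids), len(gene_ids)))
    return xr.Dataset(
        {"counts": (("cell", "gene"), counts)},
        coords={"cell": cell_ids, "gene": gene_ids},
    )


genes = ["geneA", "geneB", "geneC"]
batches = [
    make_batch(["cell0", "cell1"], genes, seed=0),
    make_batch(["cell2", "cell3"], genes, seed=1),
]

# Stack the batches along the shared 'cell' dimension, as in the comment.
combined = xr.concat(batches, dim="cell")

print(combined.sizes)
print(list(combined.coords))
```

In this small case `cell` and `gene` stay coordinates after concatenation. The sketch does not attempt to reproduce the coordinates-to-variables behaviour reported in the thread.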
338367929 https://github.com/pydata/xarray/issues/1638#issuecomment-338367929 https://api.github.com/repos/pydata/xarray/issues/1638 MDEyOklzc3VlQ29tbWVudDMzODM2NzkyOQ== shoyer 1217238 2017-10-21T06:27:57Z 2017-10-21T06:27:57Z MEMBER

Reading over Unidata/netcdf-c#402, it seems like we should probably copy the handling of _Encoding from netcdf4-python (Unidata/netcdf4-python#665) to our scipy interface. That would solve our problem of faithfully round-tripping data.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unicode strings unexpectedly transformed to byte strings upon `open_dataset` 266320445
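
The idea in the comment above, using an `_Encoding` attribute to decide whether stored bytes should come back as Unicode strings, can be sketched in plain Python. This is not the netcdf4-python or xarray implementation; `encode_char_data` and `decode_char_data` are hypothetical names, and a plain dict stands in for a variable's attributes.

```python
def encode_char_data(values, encoding="utf-8"):
    """Encode str values to bytes and record the encoding in the attributes.

    Hypothetical helper: a dict stands in for netCDF variable attributes.
    """
    encoded = [v.encode(encoding) for v in values]
    return encoded, {"_Encoding": encoding}


def decode_char_data(values, attrs):
    """Decode bytes back to str only when `_Encoding` is declared."""
    encoding = attrs.get("_Encoding")
    if encoding is None:
        # No declared encoding: leave the byte strings untouched.
        return list(values)
    return [v.decode(encoding) if isinstance(v, bytes) else v for v in values]


names = ["Skin", "café"]
raw, attrs = encode_char_data(names)

round_tripped = decode_char_data(raw, attrs)   # str values again
left_as_bytes = decode_char_data(raw, {})      # attribute missing: bytes stay bytes

print(round_tripped)
print(left_as_bytes)
```

With the attribute present the values round-trip as `str`; without it the reader has no way to know the bytes were text, which is the failure mode the issue title describes.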
337426817 https://github.com/pydata/xarray/issues/1638#issuecomment-337426817 https://api.github.com/repos/pydata/xarray/issues/1638 MDEyOklzc3VlQ29tbWVudDMzNzQyNjgxNw== shoyer 1217238 2017-10-18T00:54:04Z 2017-10-18T00:54:04Z MEMBER

Which backend are you using to save the data? Try explicitly setting engine to either netcdf4, h5netcdf or scipy. I think h5netcdf may be your best bet, but it's probably worth trying all of them. Sadly, handling Unicode strings in Python 3 / NumPy is still quite painful.

Also, how did all the Coordinates somehow get moved into Data variables?

This looks like a bug of some sort -- not sure how that happened!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unicode strings unexpectedly transformed to byte strings upon `open_dataset` 266320445
337418537 https://github.com/pydata/xarray/issues/1638#issuecomment-337418537 https://api.github.com/repos/pydata/xarray/issues/1638 MDEyOklzc3VlQ29tbWVudDMzNzQxODUzNw== olgabot 806256 2017-10-18T00:17:25Z 2017-10-18T00:17:25Z NONE

Also, how did all the Coordinates somehow get moved into Data variables?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unicode strings unexpectedly transformed to byte strings upon `open_dataset` 266320445

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
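
The view on this page ("7 rows where issue = 266320445 sorted by updated_at descending") can be reproduced against the schema above with Python's built-in sqlite3 module. The sketch below is an assumption-laden toy: it uses an in-memory database, fills in only the id, user, updated_at and issue columns with values copied from the rows on this page, and leaves the referenced users and issues tables out (SQLite does not enforce foreign keys unless asked to).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [html_url] TEXT,
       [issue_url] TEXT,
       [id] INTEGER PRIMARY KEY,
       [node_id] TEXT,
       [user] INTEGER REFERENCES [users]([id]),
       [created_at] TEXT,
       [updated_at] TEXT,
       [author_association] TEXT,
       [body] TEXT,
       [reactions] TEXT,
       [performed_via_github_app] TEXT,
       [issue] INTEGER REFERENCES [issues]([id])
    );
    CREATE INDEX [idx_issue_comments_issue]
        ON [issue_comments] ([issue]);
    CREATE INDEX [idx_issue_comments_user]
        ON [issue_comments] ([user]);
    """
)

# (id, user, updated_at) copied from the rows on this page, inserted out of order.
comments = [
    (337418537, 806256, "2017-10-18T00:17:25Z"),
    (339228857, 1217238, "2017-10-25T06:32:35Z"),
    (341220444, 806256, "2017-11-01T19:51:55Z"),
    (337426817, 1217238, "2017-10-18T00:54:04Z"),
    (339162189, 806256, "2017-10-24T23:03:18Z"),
    (341198220, 806256, "2017-11-01T18:33:24Z"),
    (338367929, 1217238, "2017-10-21T06:27:57Z"),
]
conn.executemany(
    "INSERT INTO issue_comments (id, user, updated_at, issue) "
    "VALUES (?, ?, ?, 266320445)",
    comments,
)

# ISO 8601 timestamps in the same timezone sort correctly as plain text.
ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM issue_comments WHERE issue = ? ORDER BY updated_at DESC",
        (266320445,),
    )
]

# Per-user counts, matching the 'user' facet (4 and 3).
per_user = conn.execute(
    "SELECT user, COUNT(*) FROM issue_comments "
    "WHERE issue = ? GROUP BY user ORDER BY COUNT(*) DESC",
    (266320445,),
).fetchall()

print(ids)
print(per_user)
```

The filter on `issue` is the column covered by `idx_issue_comments_issue`, which is why the schema declares that index.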
Powered by Datasette · Queries took 16.414ms · About: xarray-datasette