
issue_comments


9 rows where issue = 36467304 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
58424470 https://github.com/pydata/xarray/pull/175#issuecomment-58424470 https://api.github.com/repos/pydata/xarray/issues/175 MDEyOklzc3VlQ29tbWVudDU4NDI0NDcw shoyer 1217238 2014-10-08T20:44:15Z 2014-10-08T20:44:15Z MEMBER

continued in #245.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Modular encoding 36467304
57926291 https://github.com/pydata/xarray/pull/175#issuecomment-57926291 https://api.github.com/repos/pydata/xarray/issues/175 MDEyOklzc3VlQ29tbWVudDU3OTI2Mjkx shoyer 1217238 2014-10-05T04:24:08Z 2014-10-05T04:24:08Z MEMBER

@akleeman I just read over my rebased versions of this patch again, and unfortunately, although there are some useful features here (missing value support and not writing trivial indexes), overall I don't think this is the right approach.

The idea of decoding data stores into "CF decoded" data stores is clever, but (1) it adds a large amount of indirection/complexity and (2) it's not even flexible enough (e.g., it won't suffice to decode coordinates, since those only exist on datasets). Functions for CF decoding/encoding that transform a datastore to a dataset directly (and vice versa), more similar to the existing design, seem like a better option overall.

As we discussed, adding an argument like an array_hook to open_dataset and to_netcdf (patterned off of object_hook from the json module) should suffice for at least our immediate custom encoding/decoding needs.
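For reference, this is how object_hook from the json module works, the pattern the proposed array_hook is modeled on: json.loads calls the hook on every decoded JSON object, innermost first. (array_hook itself is only a proposal here, not an existing xray argument.)

```python
import json

def tag_decoded(obj):
    # Example hook: tag each decoded object so we can see it was intercepted.
    obj["_decoded"] = True
    return obj

# object_hook is invoked for every JSON object in the document,
# so both the outer dict and the nested dict pass through it.
data = json.loads('{"a": 1, "b": {"c": 2}}', object_hook=tag_decoded)
```

An array_hook on open_dataset would play the analogous role for each decoded array.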

To make things more extensible, let's add a handful of utility functions/classes to the public API (e.g., NDArrayMixin), and break down existing functions like encode_cf_variable/decode_cf_variable into more modular/extensible components.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Modular encoding 36467304
57374273 https://github.com/pydata/xarray/pull/175#issuecomment-57374273 https://api.github.com/repos/pydata/xarray/issues/175 MDEyOklzc3VlQ29tbWVudDU3Mzc0Mjcz shoyer 1217238 2014-09-30T20:07:36Z 2014-09-30T20:07:36Z MEMBER

OK, I made a brief attempt at trying to get my proposal to work, but I don't think I'm smart enough to figure out a completely general approach.

@akleeman what would be the minimal new API that would suffice for your needs? I would rather not commit ourselves to supporting a fully extensible approach to encoding/decoding in the public API at this point (mostly because I suspect we won't get it right).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Modular encoding 36467304
56917795 https://github.com/pydata/xarray/pull/175#issuecomment-56917795 https://api.github.com/repos/pydata/xarray/issues/175 MDEyOklzc3VlQ29tbWVudDU2OTE3Nzk1 shoyer 1217238 2014-09-26T04:20:07Z 2014-09-26T04:20:07Z MEMBER

Here are some use cases for encoding:

1. Save or load a dataset to a file using conventions.
2. Encode or decode a dataset to facilitate loading to or from another library (e.g., Iris or CDAT).
3. Load a dataset that doesn't quite satisfy conventions from disk, fix it up, and then decode it.
4. Directly use the dataset constructor to input encoded data, and then decode it.

This patch does 1 pretty well, but not the others. I think the cleanest way to handle everything would be to separate Conventions from DataStores. That way we could also let you write something like ds.decode(conventions=CFConventions) (or even just ds.decode('CF') or ds.decode() for short) to decode a dataset into another dataset.

So the user would only need to write something that looks like this, instead of a subclass of AbstractEncodedDataStore:

class Conventions(object):
    def encode(self, arrays, attrs):
        return arrays, attrs

    def decode(self, arrays, attrs):
        return arrays, attrs

The bonus here is that Conventions doesn't need to relate to data stores at all, and there's no danger of undesirable coupling. We could even have xray.create_conventions(encoder, decoder) as a shortcut to writing the class.
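A minimal sketch of what that create_conventions shortcut could look like; neither Conventions nor create_conventions exists in xray's API, and the identity functions here are toy placeholders.

```python
class Conventions(object):
    """Pairs an encoder and decoder; knows nothing about data stores."""

    def __init__(self, encoder, decoder):
        self._encoder, self._decoder = encoder, decoder

    def encode(self, arrays, attrs):
        return self._encoder(arrays, attrs)

    def decode(self, arrays, attrs):
        return self._decoder(arrays, attrs)


def create_conventions(encoder, decoder):
    # Shortcut: build a Conventions object from two plain functions.
    return Conventions(encoder, decoder)


# Identity conventions that pass arrays and attrs through untouched.
identity = create_conventions(lambda a, t: (a, t), lambda a, t: (a, t))
arrays, attrs = identity.decode({"x": [1, 2]}, {"units": "m"})
```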

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Modular encoding 36467304
56476268 https://github.com/pydata/xarray/pull/175#issuecomment-56476268 https://api.github.com/repos/pydata/xarray/issues/175 MDEyOklzc3VlQ29tbWVudDU2NDc2MjY4 shoyer 1217238 2014-09-23T04:55:08Z 2014-09-23T04:55:08Z MEMBER

I would like to clean up this PR and get it merged.

One useful piece of functionality that would be good to handle at the same time is an interface that lets us encode or decode a dataset in memory, not just when loading or saving to disk.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Modular encoding 36467304
47284159 https://github.com/pydata/xarray/pull/175#issuecomment-47284159 https://api.github.com/repos/pydata/xarray/issues/175 MDEyOklzc3VlQ29tbWVudDQ3Mjg0MTU5 shoyer 1217238 2014-06-26T21:44:14Z 2014-06-26T21:44:14Z MEMBER

@leon-barrett any thoughts on the design here?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Modular encoding 36467304
47198948 https://github.com/pydata/xarray/pull/175#issuecomment-47198948 https://api.github.com/repos/pydata/xarray/issues/175 MDEyOklzc3VlQ29tbWVudDQ3MTk4OTQ4 shoyer 1217238 2014-06-26T08:14:26Z 2014-06-26T08:14:26Z MEMBER

Another possibility to think about which I think would be a bit cleaner if we could pull it off -- decode/encode could take and return two arguments (variables, attributes) instead of actual datastores.
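The signature suggested above might look like this; the _FillValue handling is a toy placeholder, not xray's real decoding logic.

```python
def decode(variables, attributes):
    # Operate on a plain (variables, attributes) pair rather than a
    # datastore object; e.g., drop an encoding-only attribute after use.
    attributes = {k: v for k, v in attributes.items() if k != "_FillValue"}
    return variables, attributes

variables, attributes = decode(
    {"t": [0, 1, 2]}, {"units": "days", "_FillValue": -999}
)
```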

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Modular encoding 36467304
47198553 https://github.com/pydata/xarray/pull/175#issuecomment-47198553 https://api.github.com/repos/pydata/xarray/issues/175 MDEyOklzc3VlQ29tbWVudDQ3MTk4NTUz shoyer 1217238 2014-06-26T08:09:06Z 2014-06-26T08:09:06Z MEMBER

Some additional thoughts on the advanced interface (we should discuss more when we're both awake):

The advanced interface should support specifying decoding or encoding only (e.g., if we only need to read some obscure format, not write it).

Instead of all these DataStore subclasses, what about having more generic Coder (singleton?) objects which have encode and/or decode methods, but that don't store any data? It is somewhat confusing to keep track of state with all these custom datastores.

A neat trick about coders rather than data stores is that they are simple enough that they could be easily composed, similar to a scikit-learn pipeline. For example, the default CFCoder could be written as something like:

CFCoder = Compose(TimeCoder, ScaleCoder, MaskCoder, DTypeCoder)
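A minimal sketch of that Compose idea, assuming coders are stateless objects with encode/decode methods: encoding applies the coders left to right and decoding unwinds them in reverse, like a scikit-learn pipeline. All class names here are hypothetical.

```python
class Compose(object):
    """Chain coders: encode left-to-right, decode in reverse order."""

    def __init__(self, *coders):
        self.coders = coders

    def encode(self, value):
        for coder in self.coders:
            value = coder.encode(value)
        return value

    def decode(self, value):
        for coder in reversed(self.coders):
            value = coder.decode(value)
        return value


class ScaleCoder(object):
    # Toy stand-in for a real coder: divides on encode, multiplies on decode.
    def __init__(self, factor):
        self.factor = factor

    def encode(self, value):
        return value / self.factor

    def decode(self, value):
        return value * self.factor


coder = Compose(ScaleCoder(2.0), ScaleCoder(5.0))
encoded = coder.encode(100.0)   # 100 / 2 / 5
decoded = coder.decode(encoded)  # round-trips back to 100
```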

I suppose the painful part of using coders is the need to close files on disk to clean up. Still, I would rather have a single EncodedDataStore class which is initialized with coder and underlying_store arguments (either classes or objects) than have the primary interface be writing custom AbstractEncodedDataStore subclasses. That feels like unnecessary complexity.

Instead of using the decorator, the interface could look something like:

CFNetCDFStore = EncodedDataStore(CFCoder, NetCDF4DataStore)

or

my_store = EncodedDataStore(CFCoder(**my_options), NetCDF4DataStore('test.nc'))
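A sketch of that single EncodedDataStore class, wired to an in-memory toy store and coder rather than xray's real DataStore API (all names here are stand-ins):

```python
class InMemoryStore(object):
    """Toy underlying store holding a single value in memory."""

    def __init__(self):
        self.contents = None

    def save(self, value):
        self.contents = value

    def load(self):
        return self.contents


class UpperCoder(object):
    # Toy coder: upper-cases strings on write, lower-cases on read.
    def encode(self, value):
        return value.upper()

    def decode(self, value):
        return value.lower()


class EncodedDataStore(object):
    """Pairs a coder with an underlying store; no subclassing needed."""

    def __init__(self, coder, store):
        self.coder, self.store = coder, store

    def save(self, value):
        self.store.save(self.coder.encode(value))

    def load(self):
        return self.coder.decode(self.store.load())


my_store = EncodedDataStore(UpperCoder(), InMemoryStore())
my_store.save("temperature")
```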

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Modular encoding 36467304
47180876 https://github.com/pydata/xarray/pull/175#issuecomment-47180876 https://api.github.com/repos/pydata/xarray/issues/175 MDEyOklzc3VlQ29tbWVudDQ3MTgwODc2 shoyer 1217238 2014-06-26T02:10:31Z 2014-06-26T02:10:31Z MEMBER

Generally this looks very nice! I'm still mulling over the API -- it seems close but not quite right yet. In particular, using a decorator + inheritance to implement custom encoding seems like too much.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Modular encoding 36467304


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 13.899ms · About: xarray-datasette