issue_comments

4 rows where issue = 88868867 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
112984626 https://github.com/pydata/xarray/issues/435#issuecomment-112984626 https://api.github.com/repos/pydata/xarray/issues/435 MDEyOklzc3VlQ29tbWVudDExMjk4NDYyNg== richardotis 6405510 2015-06-18T00:16:19Z 2015-06-18T00:16:19Z NONE

xray definitely seems to be the correct tool, as you suggested.

For the record, this is my first pass at coming up with the Dataset:

<xray.Dataset>
Dimensions:       (P: 20, T: 20, components: 4, id: 600, internal_dof: 9)
Coordinates:
  * components    (components) <U2 'AL' 'NI' 'CR' 'FE'
  * internal_dof  (internal_dof) <U4 'AL_0' 'NI_0' 'CR_0' 'FE_0' 'AL_1' ...
    composition   (id, components) float64 0.153 0.2138 0.2917 0.3415 0.316 ...
    Phase         <U6 'FCC_A1'
  * P             (P) float64 1e+05 1.833e+05 3.36e+05 6.158e+05 1.129e+06 ...
  * T             (T) float64 300.0 347.4 394.7 442.1 489.5 536.8 584.2 ...
  * id            (id) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
Data variables:
    constitution  (id, internal_dof) float64 0.1296 0.2055 0.301 0.3639 ...
    energies      (T, P, id) float64 -5.533e+03 -605.8 -2.507e+03 -8.546e+03 ...

Full notebook: https://github.com/richardotis/pycalphad/blob/178f150b492099c32e197b417c11729f12d6dfe8/research/xrayTest.ipynb

I decided I'm better off giving each phase its own Dataset, and when I need to do multi-phase operations I'll drop all the internal dimensions before I merge them. The result of to_dataframe(), with how it's getting rows and columns mixed up, makes me think I don't yet specify the optimal combination of coordinates and variables in the Dataset.

Some initial queries of the data seem to function well and at a fraction of the memory cost of the pandas-based approach, so I'm feeling optimistic here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Working with labeled N-dimensional data with combinatoric independent variables 88868867
112811117 https://github.com/pydata/xarray/issues/435#issuecomment-112811117 https://api.github.com/repos/pydata/xarray/issues/435 MDEyOklzc3VlQ29tbWVudDExMjgxMTExNw== richardotis 6405510 2015-06-17T13:58:59Z 2015-06-17T13:58:59Z NONE

Thank you for your thoughts. While composing my response I realized I'm actually concerned about two distinct data representation problems.

1. Energies computed for any number of phases at discrete temperatures, pressures and compositions in a system ("energy surface data"). This is intermediate data in the computation.
2. Result of constrained equilibrium computations using the data in (1).

The shape of the data in (2) would be something like (condition axis 1, condition axis 2, ..., condition axis n). Conditions can be independent or dependent variables (the solver can work backwards), and not all combinations of conditions result in an answer.

For example, I want to map the phase relations of a 4-component system in T-P-x space. I choose 50 temperatures and 50 pressures, plus 100 points per independent composition axis (here I fix the total system size so that 1 composition variable is dependent). So then the shape of my equilibrium data would be (50, 50, 100, 100, 100). But what is the value of each element, the equilibrium result?
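To put a number on that grid, here is a quick back-of-the-envelope calculation. It assumes dense storage with one float64 scalar per grid point, which is an assumption made only for illustration and not something stated above:

```python
import math

# Condition grid from the example: 50 temperatures, 50 pressures,
# and 100 points on each of the 3 independent composition axes.
shape = (50, 50, 100, 100, 100)

n_points = math.prod(shape)

# Assumption: one dense float64 value (8 bytes) per grid point.
bytes_per_float64 = 8
gigabytes = n_points * bytes_per_float64 / 1e9

print(f"{n_points:,} grid points, about {gigabytes:.0f} GB per float64 variable")
```

Every additional scalar quantity stored per grid point would add the same amount again under this assumption.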

The equilibrium result is also multi-dimensional. I need to store the computed chemical potentials for each component (1-dimensional), the fraction of each stable phase (1-dimensional), and references to the corresponding physical states in the energy surface data (1-dimensional).

Going back to (1), phases can also have "internal" composition variables that map to the overall composition in a non-invertible way, i.e., two physical states can have the same overall composition but different internal compositions. The way I've been handling this is by adding more columns to my DataFrames, but it's not a sustainable approach for reasons we've both mentioned.

The data in (1) makes the most sense to me as a "ragged ndarray", where the internal degrees of freedom of each phase are free to be different but still map to global composition coordinates. For (2), I imagine a "result object" bundled up inside all the conditions dimensions, but I need to be able to slice and search the derived/computed quantities just as easily as the independent variables.

Does xray make sense for either or both of these cases?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Working with labeled N-dimensional data with combinatoric independent variables 88868867
112621215 https://github.com/pydata/xarray/issues/435#issuecomment-112621215 https://api.github.com/repos/pydata/xarray/issues/435 MDEyOklzc3VlQ29tbWVudDExMjYyMTIxNQ== shoyer 1217238 2015-06-17T01:45:18Z 2015-06-17T01:45:18Z MEMBER

To elaborate: even though both pandas and xray use numpy under the hood, I suspect you may see a performance benefit if you switch from pandas to xray, for three reasons:

1. as you noted, you will no longer need repeats for all those independent variables
2. flattening to put things in a 1D column can require a copy (if the data is not already C-contiguous)
3. pandas also often makes copies when you add new dataframe columns, because it tries to consolidate adjacent columns into the same type
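A small numpy sketch of the first two points. The array sizes and variable names are made up for illustration and are not taken from the notebook:

```python
import numpy as np

# Point 1: a flat, column-style layout repeats every independent variable.
T = np.linspace(300.0, 1000.0, 20)
P = np.logspace(5, 7, 20)
TT, PP = np.meshgrid(T, P, indexing="ij")
print(T.size, "temperatures on the grid axis vs", TT.size, "in a flat column")

# Point 2: flattening is free only when the data is already C-contiguous.
grid = np.arange(12.0).reshape(3, 4)   # C-contiguous
flat_view = grid.ravel()               # no copy needed

transposed = grid.T                    # same data, no longer C-contiguous
flat_copy = transposed.ravel()         # numpy has to copy here

print(np.shares_memory(grid, flat_view))
print(np.shares_memory(transposed, flat_copy))
```

Point 3 depends on pandas internals and is not illustrated here.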

To answer your other question about retrieving results for specific conditions: once you put things in an xray dataset, that should be as simple as ds.sel(P=100000, T=300).
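A minimal sketch of that kind of selection on a toy dataset. It uses the package under its later name, xarray, and the dimension names and values below are placeholders rather than the real data:

```python
import numpy as np
import xarray as xr  # xray was later renamed to xarray

# Toy coordinates; the real dataset has many more points per axis.
T = np.array([300.0, 400.0])
P = np.array([1e5, 2e5])
ids = np.arange(3)
energies = np.arange(12, dtype=float).reshape(len(T), len(P), len(ids))

ds = xr.Dataset(
    {"energies": (("T", "P", "id"), energies)},
    coords={"T": T, "P": P, "id": ids},
)

# Exact label-based lookup: the labels must exist in the coordinates.
point = ds.sel(P=100000, T=300)
print(point["energies"].values)

# For conditions that are not exactly on the grid, sel can pick the
# nearest label instead.
nearest = ds.sel(T=310, method="nearest")
print(float(nearest["T"]))
```

Exact matching on floating-point coordinates only works when the requested value is exactly representable on the grid, so method="nearest" is often the safer choice for computed temperatures or pressures.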

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Working with labeled N-dimensional data with combinatoric independent variables 88868867
112617486 https://github.com/pydata/xarray/issues/435#issuecomment-112617486 https://api.github.com/repos/pydata/xarray/issues/435 MDEyOklzc3VlQ29tbWVudDExMjYxNzQ4Ng== shoyer 1217238 2015-06-17T01:10:45Z 2015-06-17T01:10:45Z MEMBER

I suspect that an xray.Dataset would indeed be a suitable data structure for your data.

If each of the columns in the data dataframe from your notebook were a numpy array, what would their shapes be?

As for iterative updates, arrays in xray objects can be efficiently modified in place just like numpy arrays.
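As a rough illustration of in-place updates, again written against the later xarray package name and a made-up toy dataset:

```python
import numpy as np
import xarray as xr  # xray was later renamed to xarray

ds = xr.Dataset(
    {"energies": (("T", "id"), np.zeros((2, 3)))},
    coords={"T": [300.0, 400.0], "id": [0, 1, 2]},
)

# Label-based assignment writes into the existing array.
ds["energies"].loc[{"T": 400.0}] = -1.0

# The underlying numpy array can also be updated directly.
ds["energies"].values[0, :] += 5.0

print(ds["energies"].values)
```

Both assignments modify the array held by the dataset rather than building a new dataset, assuming the variable is backed by an in-memory numpy array.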

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Working with labeled N-dimensional data with combinatoric independent variables 88868867

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);