home / github


issue_comments


25 rows where author_association = "NONE" and user = 12912489 sorted by updated_at descending


issue 10

  • Fix lag in Jupyter caused by CSS in `_repr_html_` 8
  • Awkward array backend? 7
  • How should xarray use/support sparse arrays? 2
  • [Proposal] Expose Variable without Pandas dependency 2
  • Allow DataArray to hold cell boundaries as coordinate variables 1
  • WIP: html repr 1
  • NEP 18, physical units, uncertainties, and the scipp library? 1
  • Duck array compatibility meeting 1
  • Xarray ignores the underlying unit of "datetime64" types. 1
  • Should Xarray stop doing automatic index-based alignment? 1

user 1

  • SimonHeybrock · 25

author_association 1

  • NONE · 25
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1370563894 https://github.com/pydata/xarray/issues/1475#issuecomment-1370563894 https://api.github.com/repos/pydata/xarray/issues/1475 IC_kwDOAMm_X85RsSU2 SimonHeybrock 12912489 2023-01-04T07:20:36Z 2023-01-04T07:20:36Z NONE

Recently I experimented with an (incomplete) duck-array prototype, wrapping an array of length N+1 in a duck array of length N (such that you can use it as a coordinate for a DataArray of length/shape N). It mostly worked (even though there may be some issues when you want to use it as an xarray index).

See https://github.com/scipp/scippx/blob/main/src/scippx/bin_edge_array.py (there is a bunch of unrelated stuff in the repo, you can mostly ignore that).
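For illustration, here is a minimal sketch of the idea (hypothetical class and method names, not the actual scippx implementation): a duck array storing N+1 edges that presents itself with length N.

```python
import numpy as np

class BinEdgeArray:
    """Wrap N+1 bin edges, presenting the shape of a length-N coordinate."""

    def __init__(self, edges):
        self.edges = np.asarray(edges)

    @property
    def dtype(self):
        return self.edges.dtype

    @property
    def ndim(self):
        return 1

    @property
    def shape(self):
        # One fewer than the number of edges, matching the data axis.
        return (len(self.edges) - 1,)

    def __len__(self):
        return self.shape[0]

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, _ = key.indices(len(self))
            # Slicing by bin index keeps the enclosing edge on both sides.
            return BinEdgeArray(self.edges[start:stop + 1])
        return self.edges[key:key + 2]

    def __array__(self, dtype=None):
        # Coercion to a plain array yields bin midpoints.
        mid = 0.5 * (self.edges[:-1] + self.edges[1:])
        return mid if dtype is None else mid.astype(dtype)
```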

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow DataArray to hold cell boundaries as coordinate variables 242181620
1288374461 https://github.com/pydata/xarray/issues/4285#issuecomment-1288374461 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85Mywi9 SimonHeybrock 12912489 2022-10-24T03:44:44Z 2022-11-03T17:04:15Z NONE

Also note the Ragged Array Summit on Scientific Python.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1283416324 https://github.com/pydata/xarray/issues/4285#issuecomment-1283416324 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85Mf2EE SimonHeybrock 12912489 2022-10-19T04:39:06Z 2022-10-19T04:39:06Z NONE

A possibly relevant distinction that had not occurred to me previously comes from the example by @milancurcic: if I understand this correctly, this type of data is essentially an array of variable-length time-series (a list of lists?), i.e., there is an order within each inner list. This is conceptually different from the data I typically deal with, where each inner list is a list of records without a specific ordering.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1251836989 https://github.com/pydata/xarray/issues/7045#issuecomment-1251836989 https://api.github.com/repos/pydata/xarray/issues/7045 IC_kwDOAMm_X85KnYQ9 SimonHeybrock 12912489 2022-09-20T04:48:07Z 2022-09-20T06:13:32Z NONE

This suggestion looks roughly like what we are discussing in https://github.com/pydata/xarray/discussions/7041#discussioncomment-3662179, i.e., using a custom index that avoids this? So maybe the question here is whether such an ArrayIndex should be the default?

Aside from that, with my outside perspective (having used Xarray extremely little, looking at the docs and code occasionally, but developing a similar library that does not have indexes):

Indexes (including alignment behavior) feel like a massive complication of Xarray, both conceptually (which includes documentation and teaching effort) and in code. If all you require is the ArrayIndex behavior (i.e., exact coord comparison in operations), then the entire concept of indexes is just ballast: a distraction in the documentation and a source of confusion. Example: why can't we use loc/sel with a non-dimension (non-index) coord? Without indexes we would just search the coord, with no need to limit this to index-coords, and that is often fast enough.
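To make the alignment question concrete, a small example of xarray's default index-based alignment (standard xarray behavior; the ArrayIndex semantics discussed above would instead require exactly matching coords):

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(4.0), dims='x', coords={'x': [0, 1, 2, 3]})
b = xr.DataArray(np.arange(4.0), dims='x', coords={'x': [1, 2, 3, 4]})

# Arithmetic silently aligns on the intersection of the 'x' coords;
# exact-comparison semantics would raise an error here instead.
print((a + b).coords['x'].values)  # [1 2 3]
```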

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should Xarray stop doing automatic index-based alignment? 1376109308
1243222416 https://github.com/pydata/xarray/issues/3981#issuecomment-1243222416 https://api.github.com/repos/pydata/xarray/issues/3981 IC_kwDOAMm_X85KGhGQ SimonHeybrock 12912489 2022-09-12T04:59:42Z 2022-09-12T04:59:42Z NONE

I note that xarray.Variable also provides attrs. Would it make sense to separate this aspect from the labelled dims? That is, instead of extracting this as a single library, turn it into two, such that users can pick one or both depending on their needs.
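For reference, a Variable today carries both aspects on one object (standard xarray API); the suggestion is that labelled dims and attrs could come from two separate, composable libraries:

```python
import xarray as xr

# One object currently provides both labelled dims and attrs.
v = xr.Variable(dims=('x',), data=[1.0, 2.0, 3.0], attrs={'units': 'm'})
print(v.dims, v.attrs)  # ('x',) {'units': 'm'}
```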

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [Proposal] Expose Variable without Pandas dependency 602256880
1243218951 https://github.com/pydata/xarray/issues/3981#issuecomment-1243218951 https://api.github.com/repos/pydata/xarray/issues/3981 IC_kwDOAMm_X85KGgQH SimonHeybrock 12912489 2022-09-12T04:51:23Z 2022-09-12T04:55:13Z NONE

This is something I am getting more and more interested in. We (scipp) currently have a C++ implementation (with Python bindings) of a simpler version of xarray.Variable. I am starting to consider moving more of this to the Python side, so I would like to hear about the status of this.

While I am still far from having reached a conclusion (or convincing anyone here to support this), investing in technology that is adopted and carried by the community is considered important here. In other words, we may in principle be able to help out and invest some time into this.

One important precondition would be full compatibility with other custom array containers: for our applications we do not just need to add labelled axes, but also units, masks, bin edges, and ragged data support. I am currently toying with the idea of a "stack" of Python array libraries (I guess you would call them duck arrays?) that add these features one by one, selectively, but can each also be used independently --- unlike Scipp, where you get all or nothing and lose the ability to use NumPy (or other) array libraries under the hood. Each of those libraries could be small and simple, focusing on just one specific aspect, but everything should be composable. For example, we can imagine a Variable with a pint array, giving units as well as labelled dimensions.
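A rough sketch of that composition, assuming xarray.Variable wraps pint quantities the way DataArray does (an assumption for illustration, not verified here):

```python
import numpy as np
import pint
import xarray as xr

ureg = pint.UnitRegistry()

# pint supplies the units layer, xarray.Variable the labelled dims,
# and NumPy remains the storage underneath: three small pieces stacked.
v = xr.Variable(dims=('x',), data=ureg.Quantity(np.arange(3.0), 'm'))
print(v.dims, v.data.units)  # ('x',) meter
```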

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [Proposal] Expose Variable without Pandas dependency 602256880
552756428 https://github.com/pydata/xarray/issues/3509#issuecomment-552756428 https://api.github.com/repos/pydata/xarray/issues/3509 MDEyOklzc3VlQ29tbWVudDU1Mjc1NjQyOA== SimonHeybrock 12912489 2019-11-12T06:37:20Z 2022-09-09T13:08:45Z NONE

@jthielen Thanks for your reply! I am not familiar with pint and uncertainties, so I cannot go into much detail there; this is just generally speaking:

Units

I do not see any advantage in using scipp here. The current unit system in scipp is based on boost::units, which is very powerful (supporting custom units, heterogeneous systems, ...), but unfortunately it is a compile-time library (EDIT 2022: this no longer applies, since we have long since switched to a runtime units library). I would imagine we would need to wrap another library to become more flexible (we could even consider wrapping something like pint's unit implementation).

Uncertainties

There are two routes to take here:

1. Store a single array of value/variance pairs

  • Propagation of uncertainties is "fast by default".
  • Probably harder to vectorize (SIMD) since data layout implies interleaved values. In practice this is unlikely to be relevant, since many workloads are just limited by memory bandwidth and cache sizes, so vectorization is not crucial in my experience.

2. Store two arrays (values array and uncertainties array)

  • This is what scipp does.
  • Special care must be taken when implementing propagation of uncertainties: a naive implementation that operates on whole arrays will lead to massive performance loss (I have seen 10x or more) for things like multiplication (there is no penalty for addition and subtraction).
  • In practice this is not hard to avoid: we simply need to compute the result's values and variances in a single loop rather than in two steps. This avoids allocation of temporaries and loading/storing from memory multiple times (see the propagation sketch after this list).
  • Scipp does this, and does not sacrifice any performance.
  • Save 2x in performance when operating only with values, even if variances are present.
  • Can add/remove variances independently, e.g., if no longer needed, avoiding copies.
  • Can use existing numpy code to operate directly with values and variances (could probably be done in case 1 too, with a stride, losing some efficiency).
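A minimal NumPy sketch of the propagation formula for route 2 (illustrative only; a fused implementation would compute both outputs per element in one loop instead of separate array passes):

```python
import numpy as np

def multiply_with_variances(a_val, a_var, b_val, b_var):
    # Gaussian error propagation for c = a * b (uncorrelated inputs):
    #   var(c) = var(a) * b**2 + var(b) * a**2
    c_val = a_val * b_val
    c_var = a_var * b_val**2 + b_var * a_val**2
    return c_val, c_var

vals, variances = multiply_with_variances(
    np.array([1.0, 2.0]), np.array([0.1, 0.1]),
    np.array([3.0, 4.0]), np.array([0.2, 0.2]),
)
```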

Other aspects

Scipp supports a generic transform-type operation that can apply an arbitrary lambda to variables (units + values array + variances array).

  • This is done at compile-time and is therefore static. It does, however, allow for very quick addition of new compound operations that propagate units and uncertainties.
  • For example, we could generate an operation sqrt(a*a + b*b) that:
      • is automatically written using a single loop => fast
      • gives the correct output units
      • propagates uncertainties
      • does all the broadcasting and transposing
  • Not using expression templates, in case anyone asks.
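As a hedged sketch of what such a compound operation computes, here is the equivalent written with scipp's regular Python API (which, unlike the fused C++ transform, allocates temporaries):

```python
import scipp as sc

a = sc.array(dims=['x'], values=[3.0, 6.0], unit='m')
b = sc.array(dims=['x'], values=[4.0, 8.0], unit='m')

# Units propagate through each step: m * m -> m**2, sqrt(m**2) -> m.
c = sc.sqrt(a * a + b * b)
print(c.values, c.unit)  # [ 5. 10.] m
```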

Other

  • scipp.Variable includes the dimension labels, and operations can do broadcasting and transposition, yielding good performance. I am not sure if this is an advantage or a drawback in this case? I would need to look more into the inner workings of xarray and the __array_function__ protocol.

  • Scipp is written in C++ with performance in mind. That being said, it is not terribly difficult to achieve good performance in these cases since many workloads are bound by memory bandwidth (and probably dozens of other libraries have done so).

Questions

  • What is pint's approach to uncertainties?
  • Have you looked at the performance? Is performance relevant for you in these cases?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  NEP 18, physical units, uncertainties, and the scipp library? 520815068
1222318201 https://github.com/pydata/xarray/issues/6591#issuecomment-1222318201 https://api.github.com/repos/pydata/xarray/issues/6591 IC_kwDOAMm_X85I2xh5 SimonHeybrock 12912489 2022-08-22T12:53:22Z 2022-08-22T12:53:22Z NONE

Note duplicate (or related): #5750

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Xarray ignores the underlying unit of "datetime64" types. 1232587833
1216208075 https://github.com/pydata/xarray/issues/4285#issuecomment-1216208075 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85IfdzL SimonHeybrock 12912489 2022-08-16T06:38:32Z 2022-08-16T06:42:28Z NONE

@jpivarski

Support for event data, a particular form of sparse data.

I might have been misinterpreting the word "sparse data" in conversations about this. I had thought that "sparse data" is logically rectilinear but represented in memory with the zeros removed, so the internal machinery has to deal with irregular structures, but the outward API it presents is regular (dimensionality is completely described by a shape: tuple[int]).

You are right that "sparse" is misleading. Since it is indeed most commonly used for sparse matrix/array representations, we now usually avoid this term (and refer to binned data or ragged data instead). Obviously our title page needs an update 😬.

logically rectilinear

This does actually apply to Scipp's binned data. A scipp.Variable may have shape=(N,M) and be "ragged". But the "ragged" dimension is in addition to the two regular dimensions. That is, in this case we have (conceptually) a 2-D array of lists.
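Conceptually (a pure NumPy illustration of the shape semantics, not scipp's actual binned storage, which keeps all events in one contiguous buffer):

```python
import numpy as np

# A (2, 3) array whose elements are variable-length event lists: the
# outer dims stay regular, only the implicit inner dim is ragged.
binned = np.empty((2, 3), dtype=object)
binned[0, 0] = [1.2, 3.4]
binned[0, 1] = []
binned[0, 2] = [0.7, 0.9, 1.1]
binned[1, 0] = [2.5]
binned[1, 1] = [0.3, 0.8]
binned[1, 2] = []
print(binned.shape)  # (2, 3) -- fully described by a shape, despite raggedness
```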

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1216107702 https://github.com/pydata/xarray/issues/4285#issuecomment-1216107702 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85IfFS2 SimonHeybrock 12912489 2022-08-16T03:43:29Z 2022-08-16T05:11:50Z NONE
  1. Generalise xarray to allow for variable-length dimensions

This seems hard. Xarray's whole model is built assuming that dims has type Mapping[Hashable, int]. It also breaks our normal concept of alignment, which we need to put coordinate variables in DataArrays alongside data variables.

Anecdotal evidence that this is indeed not a good solution:

scipp's "ragged data" implementation was originally implemented with such a variable-length dimension support. This led to a whole series of problems, including significantly complicating scipp.DataArray, both in terms of code and conceptually. After this experience we switched to the current model, which exposes only the regular, aligned dimensions.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1216144957 https://github.com/pydata/xarray/issues/4285#issuecomment-1216144957 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85IfOY9 SimonHeybrock 12912489 2022-08-16T04:54:25Z 2022-08-16T04:54:25Z NONE

Is anyone here going to EuroScipy (two weeks from now) and interested in having a chat/discussion about ragged data?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1216125098 https://github.com/pydata/xarray/issues/4285#issuecomment-1216125098 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85IfJiq SimonHeybrock 12912489 2022-08-16T04:17:52Z 2022-08-16T04:17:52Z NONE

@danielballan mentioned that the photon community (synchrotrons/X-ray scattering) is starting to talk more and more about ragged data related to "event mode" data collection as well.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1216123818 https://github.com/pydata/xarray/issues/4285#issuecomment-1216123818 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85IfJOq SimonHeybrock 12912489 2022-08-16T04:15:24Z 2022-08-16T04:15:24Z NONE

5. Neutron scattering data

Scipp is an xarray-like labelled data structure for neutron scattering experiment data. On their FAQ, under the question titled "Why is xarray not enough", one of the things they quote is

Support for event data, a particular form of sparse data. More concretely, this is essentially a 1-D (or N-D) array of random-length lists, with very small list entries. This type of data arises in time-resolved detection of neutrons in pixelated detectors.

Would a RaggedArray class that's wrappable in xarray help with this? (cc @SimonHeybrock)

Partially, but the bigger challenge may be the related algorithms, e.g., for getting data into this layout, and for switching to other ragged layouts.

For context, one of the main reasons for our data layout is the ability to make cuts/slices quickly. We frequently deal with 2-D, 3-D, and 4-D data. For example, a 3-D case may be the momentum transfer $\vec Q$ in a scattering process, with a "record" for every detected neutron. The desired final resolution may exceed 1000 per dimension (of the 3 components of $\vec Q$). On top of this there may be additional dimensions relating to environment parameters of the sample under study, such as temperature, pressure, or strain. This would lead to bin counts that cannot be handled easily (in single-node memory).

A naive solution could be to simply work with something like pandas.DataFrame, with columns for the components of $\vec Q$ as well as the sample environment parameters. Those could then be used for grouping/histogramming into the desired 2-D cuts or slices. However, as many such slices are frequently required, this can quickly become inefficient (though there are certainly cases where it would work well, providing a simpler solution than scipp).

Scipp's ragged data can be considered a "partial sorting" that builds a sort of "index". Based on this we can then, e.g., quickly compute high-resolution cuts. Say we are in 3-D (Qx, Qy, Qz). We would not choose bin sizes that match the final resolution required by the science; instead we could use 50x50x50 bins. Then we can very quickly produce a high-res 2-D plot (say 1000x1000 in Qx and Qz), since our binned data format reduces the data/memory we have to load and consider by a factor of up to 50 (in this example).
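To illustrate the idea along a single dimension (a NumPy sketch with made-up numbers; scipp's binned data generalizes this to N-D and avoids the full sort):

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.uniform(-1.0, 1.0, size=(1_000_000, 3))  # one (Qx, Qy, Qz) row per event

# Grouping events into 50 coarse Qy bins builds the "index": a thin Qy
# slab afterwards touches ~1/50 of the events instead of all of them.
edges = np.linspace(-1.0, 1.0, 51)
q = q[np.argsort(q[:, 1])]
offsets = np.searchsorted(q[:, 1], edges)

# High-resolution (Qx, Qz) cut for the slab edges[25] <= Qy < edges[26]:
slab = q[offsets[25]:offsets[26]]
hist, qx_edges, qz_edges = np.histogram2d(slab[:, 0], slab[:, 2], bins=1000)
```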

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
634558423 https://github.com/pydata/xarray/issues/3213#issuecomment-634558423 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDYzNDU1ODQyMw== SimonHeybrock 12912489 2020-05-27T10:00:25Z 2021-10-15T04:38:25Z NONE

@pnsaevik If the approach we adopt in scipp could be ported to xarray, you would be able to do something like the following (assuming that the ragged array representation you have in mind is "list of lists"):

```python
data = my_load_netcdf(...)  # list of lists
# assume 'x' is the dimension of the nested lists
bin_edges = sc.Variable(dims=['x'], values=[0.1, 0.3, 0.5, 0.7, 0.9])
realigned = sc.realign(data, {'x': bin_edges})
filtered = realigned['x', 1:3].copy()
my_store_netcdf(filtered.unaligned, ...)
```

Basically, we have slicing for the "realigned" wrapper. It performs a filter operation when copied.

Edit 2021: Above example is very outdated, we have cleaned up the mechanism, see https://scipp.github.io/user-guide/binned-data/binned-data.html.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
632536798 https://github.com/pydata/xarray/issues/3213#issuecomment-632536798 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDYzMjUzNjc5OA== SimonHeybrock 12912489 2020-05-22T07:20:35Z 2021-10-15T04:36:17Z NONE

I am not familiar with the details of the various applications people in this discussion have, but here is an approach we are taking, trying to solve variations of the problem "data scattered in multi-dimensional space" or irregular time-series data. See https://scipp.github.io/user-guide/binned-data/binned-data.html for an illustrated description.

The basic idea is to keep data in a linear representation and wrap it in a "realigned" wrapper. One reason for this development was to provide a pathway to use dask with our type of data (independent time series at a large number of points in space, with chunking along the "time-series", which is not a dimension since every time series has a different length). With the linked approach we could use dask to distribute the linear underlying representation, keeping the lightweight realigned wrapper on all workers. We are still in early experimentation with this (the dask part is not actually in development yet).

It probably has performance issues if more than "millions" of points are realigned --- our case is millions of time series with thousands/millions of time points in each, but the two do not mix (not both are realigned, and if they are, it is independently), so we do not run into the performance issue in most cases.

In principle I could imagine this non-destructive realignment approach could be mapped to xarray, so it may be of interest to people here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
890459710 https://github.com/pydata/xarray/issues/5648#issuecomment-890459710 https://api.github.com/repos/pydata/xarray/issues/5648 IC_kwDOAMm_X841E1Y- SimonHeybrock 12912489 2021-08-01T06:12:19Z 2021-08-01T06:12:19Z NONE
* scipp (@SimonHeybrock, xref [NEP 18, physical units, uncertainties, and the scipp library? #3509](https://github.com/pydata/xarray/issues/3509))

Thanks! I am definitely interested.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Duck array compatibility meeting 956103236
872054936 https://github.com/pydata/xarray/pull/5201#issuecomment-872054936 https://api.github.com/repos/pydata/xarray/issues/5201 MDEyOklzc3VlQ29tbWVudDg3MjA1NDkzNg== SimonHeybrock 12912489 2021-07-01T08:49:04Z 2021-07-01T08:49:04Z NONE

Before: [screenshot]

After: [screenshot]

On the top band, I have used the screenshot timeline to zoom onto the time window where the cell is being executed (marked with [*]), before the new output is displayed. You should be able to see that the time-scale is vastly different.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix lag in Jupyter caused by CSS in `_repr_html_` 863506023
872037862 https://github.com/pydata/xarray/pull/5201#issuecomment-872037862 https://api.github.com/repos/pydata/xarray/issues/5201 MDEyOklzc3VlQ29tbWVudDg3MjAzNzg2Mg== SimonHeybrock 12912489 2021-07-01T08:25:12Z 2021-07-01T08:25:12Z NONE

Maybe can we measure the first-loading time? I observe the first-loading time is very long...

I think this is also a problem, but I believe this is independent and not improved by the CSS changes in this branch. Maybe a Jupyter issue and not related to libraries in use?

The only way I was able to see it was to use the Web Dev tools that come as part of Firefox or Chrome.

Can you tell me more about this? I'll try to reproduce and measure the performance.

So I had used Chrome, open "Developer Tools" > "Performance" tab:

  • start recording a profile
  • run a cell that displays HTML output
  • stop the profile

I think I had observed a difference in the "Render" part of the profile, but I cannot check now (I may be able to later today when I am back at my main computer).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix lag in Jupyter caused by CSS in `_repr_html_` 863506023
872013761 https://github.com/pydata/xarray/pull/5201#issuecomment-872013761 https://api.github.com/repos/pydata/xarray/issues/5201 MDEyOklzc3VlQ29tbWVudDg3MjAxMzc2MQ== SimonHeybrock 12912489 2021-07-01T07:54:15Z 2021-07-01T07:54:15Z NONE

Indeed, such timings do not include the CSS timings. The only way I was able to see it was to use the Web Dev tools that come as part of Firefox or Chrome. You should be able to see the timings included there when recording a profile.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix lag in Jupyter caused by CSS in `_repr_html_` 863506023
871996744 https://github.com/pydata/xarray/pull/5201#issuecomment-871996744 https://api.github.com/repos/pydata/xarray/issues/5201 MDEyOklzc3VlQ29tbWVudDg3MTk5Njc0NA== SimonHeybrock 12912489 2021-07-01T07:26:50Z 2021-07-01T07:26:50Z NONE

@fujiisoup Maybe I missed it in the video, but did you try whether there are differences when running an individual cell, not just when loading the page the first time? My point is:

  • When a page is first loaded, obviously the CSS for everything (all cells) has to be processed. That cannot be changed.
  • When updating a single cell, prior to this branch, it triggered CSS changes for all cells.
  • With this branch, only the current cell should be affected.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix lag in Jupyter caused by CSS in `_repr_html_` 863506023
828259218 https://github.com/pydata/xarray/pull/5201#issuecomment-828259218 https://api.github.com/repos/pydata/xarray/issues/5201 MDEyOklzc3VlQ29tbWVudDgyODI1OTIxOA== SimonHeybrock 12912489 2021-04-28T08:27:14Z 2021-04-28T08:27:14Z NONE

Or not quite: the DOM seems to end at editor-instance, which is the whole Jupyter part of the window. I cannot seem to access anything below that using the Developer Tools.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix lag in Jupyter caused by CSS in `_repr_html_` 863506023
828236723 https://github.com/pydata/xarray/pull/5201#issuecomment-828236723 https://api.github.com/repos/pydata/xarray/issues/5201 MDEyOklzc3VlQ29tbWVudDgyODIzNjcyMw== SimonHeybrock 12912489 2021-04-28T07:55:23Z 2021-04-28T07:55:23Z NONE

Cheers, that was what I was looking for!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix lag in Jupyter caused by CSS in `_repr_html_` 863506023
828216637 https://github.com/pydata/xarray/pull/5201#issuecomment-828216637 https://api.github.com/repos/pydata/xarray/issues/5201 MDEyOklzc3VlQ29tbWVudDgyODIxNjYzNw== SimonHeybrock 12912489 2021-04-28T07:26:41Z 2021-04-28T07:28:06Z NONE

Ok, I tried, but got stuck: I can reproduce the issue in VSCode. However, I cannot find a way to inspect the CSS in VSCode's Jupyter console. The theme itself is a JSON file and I cannot figure out how it is translated into CSS.

We somehow need to detect the theme within xr-wrap and change colors accordingly. That would require checking whether a parent/grandparent/... is something defined by VSCode, or whether custom properties exist. Does someone know how to access the actual HTML/CSS in VSCode? In a normal notebook I can use, e.g., the Firefox web-dev tools to do this directly in the browser, but I cannot find anything equivalent in VSCode.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix lag in Jupyter caused by CSS in `_repr_html_` 863506023
826492174 https://github.com/pydata/xarray/pull/5201#issuecomment-826492174 https://api.github.com/repos/pydata/xarray/issues/5201 MDEyOklzc3VlQ29tbWVudDgyNjQ5MjE3NA== SimonHeybrock 12912489 2021-04-26T04:29:00Z 2021-04-26T04:29:00Z NONE

I don't have VS code so I can't try, but looking at the CSS I feel that this would actually break the colors there, since I moved the general settings from root into xr-wrap, below the level where the vscode-dark settings are defined. I don't know how to fix this though.

So I would recommend not to merge this unless someone is able to try it out.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix lag in Jupyter caused by CSS in `_repr_html_` 863506023
526541433 https://github.com/pydata/xarray/pull/1820#issuecomment-526541433 https://api.github.com/repos/pydata/xarray/issues/1820 MDEyOklzc3VlQ29tbWVudDUyNjU0MTQzMw== SimonHeybrock 12912489 2019-08-30T09:55:16Z 2019-08-30T09:55:16Z NONE

I was just following the new draft dask repr, and it seems the tools are in place to autogenerate an HTML repr of a full xarray dataset which includes an image, e.g., something like: [example image]

It seems to me @benbovy that 90% of your ToDo list is nice-to-have or special-case stuff which can be left for later? The main thing that has to be done before merging is tests? If that bare-bones version gets merged (even as a hidden feature) then others can start having a go at adding images like dask?

We have done something similar using inline svg (see, e.g., https://scipp.readthedocs.io/en/latest/user-guide/data-structures.html#Dataset). It is basically a hack for testing right now, but is sufficient for auto-generated illustration in the documentation.

I am pretty impressed by the HTML representation previewed in https://github.com/pydata/xarray/issues/1627. Since our data structures are very similar, I would be happy to contribute to this output rendering somehow, since we could then also benefit from it (with a few tweaks, probably). So let me know if I can help out somehow (unfortunately I do not know much HTML and CSS, just C++ and a bit of Python).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: html repr 287844110


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
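As a usage note, the filtered view on this page can be reproduced against a local copy of the database with Python's sqlite3 (the file name github.db is an assumption):

```python
import sqlite3

con = sqlite3.connect("github.db")
rows = con.execute(
    """
    SELECT id, updated_at, substr(body, 1, 60)
    FROM issue_comments
    WHERE author_association = 'NONE' AND user = 12912489
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # 25
```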