pull_requests: 200571225
id | node_id | number | state | locked | title | user | body | created_at | updated_at | closed_at | merged_at | merge_commit_sha | assignee | milestone | draft | head | base | author_association | auto_merge | repo | url | merged_by |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
200571225 | MDExOlB1bGxSZXF1ZXN0MjAwNTcxMjI1 | 2277 | closed | 0 | ENH: Scatter plots of one variable vs another | 6164157 | - [x] Closes #470 - [x] Tests added (for all bug fixes or enhancements) - [x] Tests passed (for all non-documentation changes) - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API - [x] Add support for `size`? - [x] Revert hue=datetime support bits Say you have two variables in a `Dataset` and you want to make a scatter plot of one vs the other, possibly using different hues and/or faceting. This is useful if you want to inspect the data to see whether two variables have some underlying relationship that you might have missed. It's something that I found myself manually writing the code for quite a few times, so I thought it would be better to have it as a feature. I'm not sure if this is actually useful for other people, but I have the feeling that it probably is. First, set up a dataset with two variables: ```python import xarray as xr import numpy as np import matplotlib from matplotlib import pyplot as plt A = xr.DataArray(np.zeros([3, 11, 4, 4]), dims=[ 'x', 'y', 'z', 'w'], coords=[np.arange(3), np.linspace(0,1,11), np.arange(4), 0.1*np.random.randn(4)]) B = 0.1*A.x**2+A.y**2.5+0.1*A.z*A.w A = -0.1*A.x+A.y/(5+A.z)+A.w ds = xr.Dataset({'A':A, 'B':B}) ds['w'] = ['one', 'two', 'three', 'five'] ``` Now, we can plot all values of `A` vs all values of `B`: ```python plt.plot(A.values.flat,B.values.flat,'.') ``` What a mess. Wouldn't it be nice if you could color each point according to the value of some coordinate, say `w`? ```python ds.scatter(x='A',y='B', hue='w') ``` Huh! There seems to be some underlying structure there. Can we also facet over a different coordinate? ```python ds.scatter(x='A',y='B',col='x', hue='w') ``` or two coordinates? 
```python ds.scatter(x='A',y='B',col='x', row='z', hue='w') ``` The logic is that dimensions that are not used for faceting or hue are just stacked using `Dataset.stack` and plotted. Only variables that have exactly the same dimensions are allowed. Regarding implementation -- I am certainly not sure about the API and I probably haven't thought about edge cases with missing data or NaNs, so any input would be welcome. Also, there might be a simpler implementation by first using `to_array` and then using existing line plot functions, but I couldn't find it. | 2018-07-11T02:31:01Z | 2019-08-08T18:05:00Z | 2019-08-08T15:57:17Z | 2019-08-08T15:57:17Z | f172c6738ae4bc9802e08d355ea05ea6c47527ab | 0 | d56f7d13c9b82afbbe63734448e3594bfd06c940 | 8a9c4710b2ee389a41e08a665108aca05ef02544 | CONTRIBUTOR | 13221727 | https://github.com/pydata/xarray/pull/2277 |
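The stacking logic described in the PR body can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code merged in the PR: `stacked_xy` is a hypothetical helper, the stacked dimension name `points` is arbitrary, and the `w` coordinate uses deterministic values instead of `np.random.randn` so the result is reproducible.

```python
import numpy as np
import xarray as xr

# Same layout as the dataset in the PR description, with deterministic "w" values.
A = xr.DataArray(
    np.zeros([3, 11, 4, 4]),
    dims=["x", "y", "z", "w"],
    coords=[np.arange(3), np.linspace(0, 1, 11), np.arange(4), np.linspace(-0.1, 0.1, 4)],
)
B = 0.1 * A.x**2 + A.y**2.5 + 0.1 * A.z * A.w
A = -0.1 * A.x + A.y / (5 + A.z) + A.w
ds = xr.Dataset({"A": A, "B": B}).assign_coords(w=["one", "two", "three", "five"])


def stacked_xy(ds, x, y, hue):
    """Hypothetical helper: map each hue label to flat (x_values, y_values) arrays."""
    # Only variables with exactly the same dimensions can be paired point by point.
    if set(ds[x].dims) != set(ds[y].dims):
        raise ValueError("x and y must have exactly the same dimensions")
    # Every dimension that is not the hue is collapsed into one stacked dimension.
    other_dims = [d for d in ds[x].dims if d != hue]
    stacked = ds[[x, y]].stack(points=other_dims)
    result = {}
    for i, label in enumerate(stacked[hue].values):
        subset = stacked.isel({hue: i})
        result[str(label)] = (subset[x].values, subset[y].values)
    return result


points = stacked_xy(ds, "A", "B", hue="w")
```

Each entry of `points` could then be passed to something like `plt.scatter(xs, ys, label=label)` to get one color per value of `w`; faceting would repeat the same stacking per `row`/`col` value.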