issue_comments: 555745623
html_url: https://github.com/pydata/xarray/pull/2652#issuecomment-555745623
issue_url: https://api.github.com/repos/pydata/xarray/issues/2652
id: 555745623
node_id: MDEyOklzc3VlQ29tbWVudDU1NTc0NTYyMw==
user: 45787861
created_at: 2019-11-19T22:27:10Z
updated_at: 2019-11-19T23:00:31Z
author_association: NONE
issue: 396102183
reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }

Alright, I only got two merge conflicts in dataarray.py. The first is a minor conflict concerning imports:

1. `accessors` has been split and renamed (to `accessor_dt` and `accessor_str`) in master.
2. `broadcast` has been dropped from the `.alignment` imports in master?

A possible merged import block is sketched after the conflict below.
```python
<<<<<<< HEAD
from . import (
    computation,
    dtypes,
    groupby,
    indexing,
    ops,
    pdcompat,
    resample,
    rolling,
    utils,
)
from .accessor_dt import DatetimeAccessor
from .accessor_str import StringAccessor
from .alignment import (
    _broadcast_helper,
    _get_broadcast_dims_map_common_coords,
    align,
    reindex_like_indexers,
)
=======
from .accessors import DatetimeAccessor
from .alignment import align, reindex_like_indexers, broadcast
>>>>>>> added da.corr() and da.cov() to dataarray.py. Test added in test_dataarray.py, and tested using pytest.
```
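If it helps, my inclination for resolving this would be to keep master's import block and re-add `broadcast` for the new methods (a sketch, assuming `broadcast` is still available from `.alignment` on master):

```python
# Hypothetical resolution (not yet confirmed by maintainers): keep HEAD's
# imports and re-add `broadcast`, which the new cov()/corr() code still uses.
from . import (
    computation,
    dtypes,
    groupby,
    indexing,
    ops,
    pdcompat,
    resample,
    rolling,
    utils,
)
from .accessor_dt import DatetimeAccessor
from .accessor_str import StringAccessor
from .alignment import (
    _broadcast_helper,
    _get_broadcast_dims_map_common_coords,
    align,
    broadcast,
    reindex_like_indexers,
)
```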
Secondly, there are some bigger merge conflicts concerning several of DataArray's methods, but the two sides do not appear to conflict with each other:

```python
<<<<<<< HEAD
    def integrate(
        self, dim: Union[Hashable, Sequence[Hashable]], datetime_unit: str = None
    ) -> "DataArray":
        """ integrate the array with the trapezoidal rule.

        .. note::
            This feature is limited to simple cartesian geometry, i.e. dim
            must be one dimensional.

        Parameters
        ----------
        dim: hashable, or a sequence of hashable
            Coordinate(s) used for the integration.
        datetime_unit: str, optional
            Can be used to specify the unit if datetime coordinate is used.
            One of {'Y', 'M', 'W', 'D', 'h', 'm', 's', 'ms', 'us', 'ns', 'ps',
            'fs', 'as'}

        Returns
        -------
        integrated: DataArray

        See also
        --------
        numpy.trapz: corresponding numpy function

        Examples
        --------
        >>> da = xr.DataArray(np.arange(12).reshape(4, 3), dims=['x', 'y'],
        ...                   coords={'x': [0, 0.1, 1.1, 1.2]})
        >>> da
        <xarray.DataArray (x: 4, y: 3)>
        array([[ 0,  1,  2],
               [ 3,  4,  5],
               [ 6,  7,  8],
               [ 9, 10, 11]])
        Coordinates:
          * x        (x) float64 0.0 0.1 1.1 1.2
        Dimensions without coordinates: y
        >>>
        >>> da.integrate('x')
        <xarray.DataArray (y: 3)>
        array([5.4, 6.6, 7.8])
        Dimensions without coordinates: y
        """
        ds = self._to_temp_dataset().integrate(dim, datetime_unit)
        return self._from_temp_dataset(ds)
    def unify_chunks(self) -> "DataArray":
        """ Unify chunk size along all chunked dimensions of this DataArray.

        Returns
        -------
        DataArray with consistent chunk sizes for all dask-array variables

        See Also
        --------
        dask.array.core.unify_chunks
        """
        ds = self._to_temp_dataset().unify_chunks()
        return self._from_temp_dataset(ds)
    def map_blocks(
        self,
        func: "Callable[..., T_DSorDA]",
        args: Sequence[Any] = (),
        kwargs: Mapping[str, Any] = None,
    ) -> "T_DSorDA":
        """
        Apply a function to each chunk of this DataArray. This method is experimental
        and its signature may change.

        Parameters
        ----------
        func: callable
            User-provided function that accepts a DataArray as its first parameter. The
            function will receive a subset of this DataArray, corresponding to one chunk
            along each chunked dimension. ``func`` will be executed as
            ``func(obj_subset, *args, **kwargs)``.

            The function will be first run on mocked-up data, that looks like this array
            but has sizes 0, to determine properties of the returned object such as
            dtype, variable names, new dimensions and new indexes (if any).

            This function must return either a single DataArray or a single Dataset.

            This function cannot change size of existing dimensions, or add new chunked
            dimensions.
        args: Sequence
            Passed verbatim to func after unpacking, after the sliced DataArray. xarray
            objects, if any, will not be split by chunks. Passing dask collections is
            not allowed.
        kwargs: Mapping
            Passed verbatim to func after unpacking. xarray objects, if any, will not be
            split by chunks. Passing dask collections is not allowed.

        Returns
        -------
        A single DataArray or Dataset with dask backend, reassembled from the outputs of
        the function.

        Notes
        -----
        This method is designed for when one needs to manipulate a whole xarray object
        within each chunk. In the more common case where one can work on numpy arrays,
        it is recommended to use apply_ufunc.

        If none of the variables in this DataArray is backed by dask, calling this
        method is equivalent to calling ``func(self, *args, **kwargs)``.

        See Also
        --------
        dask.array.map_blocks, xarray.apply_ufunc, xarray.map_blocks,
        xarray.Dataset.map_blocks
        """
        from .parallel import map_blocks

        return map_blocks(func, self, args, kwargs)

    # this needs to be at the end, or mypy will confuse with `str`
    # https://mypy.readthedocs.io/en/latest/common_issues.html#dealing-with-conflicting-names
    str = property(StringAccessor)
=======
    def cov(self, other, dim=None):
        """Compute covariance between two DataArray objects along a shared dimension.

        Parameters
        ----------
        other: DataArray
            The other array with which the covariance will be computed
        dim: hashable, optional
            The dimension along which the covariance will be computed

        Returns
        -------
        covariance: DataArray
        """
        # 1. Broadcast the two arrays
        self, other = broadcast(self, other)
        # 2. Ignore the nans
        valid_values = self.notnull() & other.notnull()
        self = self.where(valid_values, drop=True)
        other = other.where(valid_values, drop=True)
        valid_count = valid_values.sum(dim)
        # 3. Demean along the given dim
        demeaned_self = self - self.mean(dim=dim)
        demeaned_other = other - other.mean(dim=dim)
        # 4. Compute covariance along the given dim
        cov = (demeaned_self * demeaned_other).sum(dim=dim) / valid_count
        return cov
    def corr(self, other, dim=None):
        """Compute correlation between two DataArray objects along a shared dimension.

        Parameters
        ----------
        other: DataArray
            The other array with which the correlation will be computed
        dim: hashable, optional
            The dimension along which the correlation will be computed

        Returns
        -------
        correlation: DataArray
        """
        # 1. Broadcast the two arrays
        self, other = broadcast(self, other)
        # 2. Ignore the nans
        valid_values = self.notnull() & other.notnull()
        self = self.where(valid_values, drop=True)
        other = other.where(valid_values, drop=True)
        # 3. Compute correlation based on standard deviations and cov()
        self_std = self.std(dim=dim)
        other_std = other.std(dim=dim)
        return self.cov(other, dim=dim) / (self_std * other_std)
>>>>>>> added da.corr() and da.cov() to dataarray.py. Test added in test_dataarray.py, and tested using pytest.
```
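For context, here is a minimal usage sketch of the proposed methods (hypothetical until this branch is merged; `DataArray.cov` and `DataArray.corr` do not exist on master yet), including a numpy cross-check of the `ddof=0` normalization used above:

```python
import numpy as np
import xarray as xr

# Illustrative data only; names and shapes are arbitrary.
rng = np.random.default_rng(0)
da_a = xr.DataArray(rng.standard_normal((4, 3)), dims=["time", "space"])
da_b = da_a + 0.5 * xr.DataArray(rng.standard_normal((4, 3)), dims=["time", "space"])

# Covariance and Pearson correlation reduce over the shared "time" dim,
# leaving a DataArray over "space".
cov_ab = da_a.cov(da_b, dim="time")
corr_ab = da_a.corr(da_b, dim="time")

# The implementation divides by the number of valid samples (N, not N - 1),
# so it should agree with np.cov(..., ddof=0) when there are no NaNs.
expected = np.cov(da_a[:, 0], da_b[:, 0], ddof=0)[0, 1]
np.testing.assert_allclose(cov_ab[0], expected)
```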
Can you please comment on my suggested changes (accepting either the changes from master, or both sides where there is no conflict)?