issues


8 rows where user = 30219501 sorted by updated_at descending

#2231 · Time bounds returned after an operation with resample-method
opened 2018-06-13T14:22:49Z by rpnaut (30219501) · open · 8 comments · last updated 2022-04-17T23:43:48Z · repo: xarray

Problem description

For data mining with xarray there is a recurring issue with the resample method. If I resample e.g. a time series of hourly values to monthly values, the netCDF standards tell us to put the following information into the result file:

  1. the bounds for each timestep over which the aggregation was taken (for each month the beginning and the end of the month)
  2. the method which was used for aggregation, encoded in the variable attribute 'cell_methods' (e.g. 'time: mean').

The current implementation should be improved, as the following data example shows.

Data example

I have a dataset with hourly values over a period of 5 months:

```python
<xarray.Dataset>
Dimensions:       (bnds: 2, time: 3672)
Coordinates:
    rlon          float32 22.06
    rlat          float32 5.06
  * time          (time) datetime64[ns] 2006-05-01 2006-05-01T01:00:00 ...
Dimensions without coordinates: bnds
Data variables:
    rotated_pole  int32 1
    time_bnds     (time, bnds) float64 1.304e+07 1.305e+07 1.305e+07 ...
    TOT_PREC      (time) float64 nan nan nan nan nan nan nan nan nan nan nan ...
Attributes:
```

Doing a resample process using the mean operator gives:

```python
In [36]: frs
Out[36]:
<xarray.Dataset>
Dimensions:       (bnds: 2, time: 5)
Coordinates:
  * time          (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
Dimensions without coordinates: bnds
Data variables:
    rotated_pole  (time) float64 1.0 1.0 1.0 1.0 1.0
    time_bnds     (time, bnds) float64 1.438e+07 1.438e+07 1.702e+07 ...
    TOT_PREC      (time) float64 12.0 nan nan nan nan
```

Here time_bnds is still in the file, but its content is very strange:

```python
In [37]: frs["time_bnds"]
Out[37]:
<xarray.DataArray 'time_bnds' (time: 5, bnds: 2)>
array([[  1.438020e+07,   1.438380e+07],
       [  1.701540e+07,   1.701900e+07],
       [  1.965060e+07,   1.965420e+07],
       [  2.232900e+07,   2.233260e+07],
       [ -6.330338e+10,  -6.330338e+10]])
Coordinates:
  * time     (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
Dimensions without coordinates: bnds
```

So xarray still knows that time_bnds is related to the coordinate time. However, the values are not correct. The first time_bnds entry should be [2006-05-01 00:00, 2006-05-31 23:00]. That is definitely not the case: the numbers here are still relative to the original file's reference (seconds since 2005-12-01), but they do not match my expectation. 1.438020e+07 corresponds to Tuesday, 16 May 2006, 10:30:00 and 1.438380e+07 to Tuesday, 16 May 2006, 11:30:00. Moreover, xarray does not change the unit of time_bnds to match the unit of the variable 'time' when the data is written to netCDF. Output of the program ncdump reveals that time was changed to "days since" while time_bnds still seems to be coded in "seconds since":

```
ncdump -v time_bnds try.nc
netcdf try {
dimensions:
        time = 5 ;
        bnds = 2 ;
variables:
        double rotated_pole(time) ;
                rotated_pole:_FillValue = NaN ;
        double time_bnds(time, bnds) ;
                time_bnds:_FillValue = NaN ;
        double TOT_PREC(time) ;
                TOT_PREC:_FillValue = NaN ;
        int64 time(time) ;
                time:units = "days since 2006-05-31 00:00:00" ;
                time:calendar = "proleptic_gregorian" ;
data:

 time_bnds =
  14380200, 14383800,
  17015400, 17019000,
  19650600, 19654200,
  22329000, 22332600,
  -63303379200, -63303379200 ;
}
```

Is there a recommendation on what to do?
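Not part of the original report, but one workaround sketch (synthetic data standing in for the real file): since resample() aggregates time_bnds like any other data variable, the bounds can simply be rebuilt from the label index after resampling.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in: 3672 hourly steps starting 2006-05-01 (May..Sep).
time = pd.date_range("2006-05-01", periods=3672, freq="h")
ds = xr.Dataset(
    {"TOT_PREC": ("time", np.random.rand(3672))},
    coords={"time": time},
)

# Monthly means, labelled at month start ("MS").
monthly = ds.resample(time="MS").mean()

# Rebuild CF-style bounds: each pair spans the labelled month from its
# first to its last instant.
periods = monthly["time"].to_index().to_period("M")
bounds = np.stack([periods.start_time, periods.end_time], axis=1)
monthly["time_bnds"] = (("time", "bnds"), bounds)
monthly["time"].attrs["bounds"] = "time_bnds"
```

Newer xarray releases also take care to encode a variable referenced via the 'bounds' attribute with the same units as its parent coordinate when writing netCDF.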

#2861 · WHERE function, problems with memory operations?
opened 2019-04-01T11:09:11Z by rpnaut (30219501) · closed 2022-04-09T15:41:51Z · 8 comments · repo: xarray

I am struggling with the where functionality in xarray. I have two datasets:

```python
ref = array([[14.82, 14.94,   nan, ..., 16.21, 16.24,   nan],
       [14.52, 14.97,   nan, ..., 16.32, 16.34,   nan],
       [15.72, 16.09,   nan, ..., 17.38, 17.44,   nan],
       ...,
       [ 6.55,  6.34,   nan, ...,  6.67,  6.6 ,   nan],
       [ 8.76,  9.12,   nan, ...,  9.07,  9.52,   nan],
       [ 8.15,  8.97,   nan, ...,  9.65,  9.52,   nan]], dtype=float32)
Coordinates:
  * height_WSS  (height_WSS) float32 40.3 50.3 60.3 70.3 80.3 90.3 101.2 105.0
    lat         float32 54.01472
    lon         float32 6.5875
  * time        (time) datetime64[ns] 2006-10-31T00:10:00 ... 2006-11-03T23:10:00
Attributes:
    standard_name:  wind_speed
    long_name:      wind speed
    units:          m s-1
    cell_methods:   time: mean
    comment:        direction of the boom holding the measurement devices: 41...
    sensor:         cup anemometer
    sensor_type:    Vector Instruments Windspeed Ltd. A100LK/PC3/WR
    accuracy:       0.1 m s-1
```

and

```python
proof = <xarray.DataArray 'WSS' (time: 96, height_WSS: 8)>
array([[13.395692, 13.653825, 13.911958, ..., 14.511758, 14.703774, 14.770716],
       [14.740592, 15.010887, 15.281183, ..., 15.866542, 16.045753, 16.10823 ],
       [15.241853, 15.523318, 15.804785, ..., 16.417458, 16.605673, 16.67129 ],
       ...,
       [ 8.254081,  8.309716,  8.365352, ...,  8.46401 ,  8.489728,  8.498694],
       [ 9.83241 ,  9.895019,  9.957627, ..., 10.055538, 10.077768, 10.085519],
       [ 8.772054,  8.849378,  8.926702, ...,  9.065577,  9.102219,  9.114992]], dtype=float32)
Coordinates:
  * time        (time) datetime64[ns] 2006-10-31T00:10:00 ... 2006-11-03T23:10:00
    lon         float32 6.5875
    lat         float32 54.01472
  * height_WSS  (height_WSS) float32 40.3 50.3 60.3 70.3 80.3 90.3 101.2 105.0
Attributes:
    standard_name:  wind_speed
    long_name:      wind speed
    units:          m s-1
```

Applying something like this:

```python
DSproof = proof["WSS"].where(ref["WSS"].notnull()).to_dataset(name="WSS")
```

gives me a data array of time length zero:

```python
<xarray.Dataset>
Dimensions:     (height_WSS: 8, time: 0)
Coordinates:
  * time        (time) datetime64[ns]
    lon         float32 6.5875
    lat         float32 54.01472
  * height_WSS  (height_WSS) float32 40.3 50.3 60.3 70.3 80.3 90.3 101.2 105.0
Data variables:
    WSS         (time, height_WSS) float32
```

Problem description

The problem seems to be that 'ref' and 'proof' are somehow not entirely consistent regarding their coordinates. But if I subtract the coordinates from each other, I do not get a difference. Since I constantly fight to get datasets consistent with each other for mathematical calculations in xarray, I have figured out the following workarounds:

  1. One can drop the coordinates lon and lat from both datasets. Then everything works fine with 'where'.
  2. I am using WHERE in a large script with some operations done before WHERE is called. One operation makes the data types and the coordinate names of 'ref' and 'proof' consistent (that is why the two printouts above look so similar). If I save the files and reload them immediately before applying WHERE, my problem is fixed.
  3. Selecting all height levels explicitly, i.e. proof["WSS"].isel(height=slice(0,9)).where(ref["WSS"].isel(height=slice(0,9)).notnull()).to_dataset(name="WSS"), also fixes my problem.

Maybe I am dealing with a problem of incomplete operations in memory? The printouts of the datasets may look consistent, yet an additional operation seems to be required to make the datasets consistent in memory?
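A minimal reproduction sketch (not from the original report; the one-nanosecond offset is an assumed stand-in for whatever invisible index mismatch is present): where() aligns its arguments on their indexes with an inner join, so index values that differ at all intersect to an empty axis.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two otherwise identical arrays whose time indexes differ by 1 ns --
# invisible in the printed repr, fatal for inner-join alignment.
t1 = pd.date_range("2006-10-31 00:10", periods=96, freq="h")
t2 = t1 + pd.Timedelta(1, "ns")
proof = xr.DataArray(np.ones(96), dims="time", coords={"time": t1})
ref = xr.DataArray(np.ones(96), dims="time", coords={"time": t2})

# Inner join of disjoint indexes -> zero-length time dimension.
assert proof.where(ref.notnull()).sizes["time"] == 0

# Overriding one index with the other (after checking the order!) fixes it.
ref_fixed = ref.assign_coords(time=proof["time"])
assert proof.where(ref_fixed.notnull()).sizes["time"] == 96
```

This would also explain why a save/reload cycle helps: encoding the time axis to netCDF can round both datasets onto a common resolution.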

Thanks in advance for your help

#2863 · Memory Error for simple operations on NETCDF4 internally zipped files
opened 2019-04-02T11:48:01Z by rpnaut (30219501) · closed 2022-04-09T02:15:45Z · 3 comments · repo: xarray

Assume you want to do simple computations with a data array loaded from an internally compressed NETCDF4 file. First you need to load the dataset:

```python
In [2]: eobs = xarray.open_dataset("eObs_ens_mean_0.1deg_reg_v18.0e.T_2M.1950-2018.nc")
In [3]: eobs
Out[3]:
<xarray.Dataset>
Dimensions:  (lat: 465, lon: 705, time: 25049)
Coordinates:
  * time     (time) datetime64[ns] 1950-01-01 1950-01-02 1950-01-03 ...
  * lon      (lon) float64 -24.95 -24.85 -24.75 -24.65 -24.55 -24.45 -24.35 ...
  * lat      (lat) float64 25.05 25.15 25.25 25.35 25.45 25.55 25.65 25.75 ...
Data variables:
    T_2M     (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
Attributes:
    _NCProperties:  version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17
    E-OBS_version:  18.0e
    Conventions:    CF-1.4
    References:     http://surfobs.climate.copernicus.eu/dataaccess/access_eo...
```

Afterwards I have tried to do this:

```python
In [4]: datarray = eobs["T_2M"] + 273.15

MemoryError                               Traceback (most recent call last)
<ipython-input-4-eaff3bff5e27> in <module>()
----> 1 datarray=eobs["T_2M"]+273.15

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/dataarray.py in func(self, other)
   1539
   1540         variable = (f(self.variable, other_variable)
-> 1541                     if not reflexive
   1542                     else f(other_variable, self.variable))
   1543         coords = self.coords._merge_raw(other_coords)

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in func(self, other)
   1139         if isinstance(other, (xr.DataArray, xr.Dataset)):
   1140             return NotImplemented
-> 1141         self_data, other_data, dims = _broadcast_compat_data(self, other)
   1142         new_data = (f(self_data, other_data)
   1143                     if not reflexive

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in _broadcast_compat_data(self, other)
   1379     else:
   1380         # rely on numpy broadcasting rules
-> 1381         self_data = self.data
   1382         other_data = other
   1383         dims = self.dims

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in data(self)
    265             return self._data
    266         else:
--> 267             return self.values
    268
    269     @data.setter

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in values(self)
    306     def values(self):
    307         """The variable's data as a numpy.ndarray"""
--> 308         return _as_array_or_item(self._data)
    309
    310     @values.setter

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/variable.py in _as_array_or_item(data)
    182     TODO: remove this (replace with np.asarray) once these issues are fixed
    183     """
--> 184     data = np.asarray(data)
    185     if data.ndim == 0:
    186         if data.dtype.kind == 'M':

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/numpy-1.11.2-py3.5-linux-x86_64.egg/numpy/core/numeric.py in asarray(a, dtype, order)
    480
    481     """
--> 482     return array(a, dtype, copy=False, order=order)
    483
    484 def asanyarray(a, dtype=None, order=None):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in __array__(self, dtype)
    417
    418     def __array__(self, dtype=None):
--> 419         self._ensure_cached()
    420         return np.asarray(self.array, dtype=dtype)
    421

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in _ensure_cached(self)
    414     def _ensure_cached(self):
    415         if not isinstance(self.array, np.ndarray):
--> 416             self.array = np.asarray(self.array)
    417
    418     def __array__(self, dtype=None):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/numpy-1.11.2-py3.5-linux-x86_64.egg/numpy/core/numeric.py in asarray(a, dtype, order)
    480
    481     """
--> 482     return array(a, dtype, copy=False, order=order)
    483
    484 def asanyarray(a, dtype=None, order=None):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in __array__(self, dtype)
    398
    399     def __array__(self, dtype=None):
--> 400         return np.asarray(self.array, dtype=dtype)
    401
    402     def __getitem__(self, key):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/numpy-1.11.2-py3.5-linux-x86_64.egg/numpy/core/numeric.py in asarray(a, dtype, order)
    480
    481     """
--> 482     return array(a, dtype, copy=False, order=order)
    483
    484 def asanyarray(a, dtype=None, order=None):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/core/indexing.py in __array__(self, dtype)
    373     def __array__(self, dtype=None):
    374         array = orthogonally_indexable(self.array)
--> 375         return np.asarray(array[self.key], dtype=None)
    376
    377     def __getitem__(self, key):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/conventions.py in __getitem__(self, key)
    361     def __getitem__(self, key):
    362         return mask_and_scale(self.array[key], self.fill_value,
--> 363                               self.scale_factor, self.add_offset, self._dtype)
    364
    365     def __repr__(self):

/sw/rhel6-x64/python/python-3.5.2-gcc49/lib/python3.5/site-packages/xarray-0.9.5-py3.5.egg/xarray/conventions.py in mask_and_scale(array, fill_value, scale_factor, add_offset, dtype)
     57     """
     58     # by default, cast to float to ensure NaN is meaningful
---> 59     values = np.array(array, dtype=dtype, copy=True)
     60     if fill_value is not None and not np.all(pd.isnull(fill_value)):
     61         if getattr(fill_value, 'size', 1) > 1:

MemoryError:
```

I have uploaded the datafile to the following link:

https://swiftbrowser.dkrz.de/public/dkrz_c0725fe8741c474b97f291aac57f268f/GregorMoeller/

Am I using the wrong netCDF engine?
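For scale: 25049 × 465 × 705 float64 values are roughly 65 GB once mask_and_scale casts the data to float64, so the MemoryError is expected on most machines regardless of engine. One way out, sketched below on a small synthetic stand-in for the file, is to process the array in slices so only one block is decoded in memory at a time:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the e-OBS file (the real array is too large for RAM).
ds = xr.Dataset(
    {"T_2M": (("time", "lat", "lon"),
              np.random.rand(10, 4, 5).astype("float32"))}
)

# Slice-wise arithmetic: each block is decoded, shifted, and released.
pieces = []
for start in range(0, ds.sizes["time"], 4):
    block = ds["T_2M"].isel(time=slice(start, start + 4))
    pieces.append(block + 273.15)
kelvin = xr.concat(pieces, dim="time")
```

With dask installed, `xarray.open_dataset(..., chunks={"time": 1000})` gives the same out-of-core behaviour without the explicit loop.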

#2356 · New Resample-Syntax leading to cancellation of dimensions
opened 2018-08-09T10:56:29Z by rpnaut (30219501) · closed 2019-10-15T15:01:33Z · 8 comments · repo: xarray

Example

Starting with the dataset located here: https://swiftbrowser.dkrz.de/public/dkrz_c0725fe8741c474b97f291aac57f268f/GregorMoeller/, I want to calculate monthly sums of precipitation for each gridpoint in the daily data:

```python
In [39]: data = xarray.open_dataset("eObs_gridded_0.22deg_rot_v14.0.TOT_PREC.1950-2016.nc_CutParamTimeUnitCor_FinalEvalGrid")
In [40]: data
Out[13]:
<xarray.Dataset>
Dimensions:       (rlat: 136, rlon: 144, time: 153)
Coordinates:
  * rlon          (rlon) float32 -22.6 -22.38 -22.16 -21.94 -21.72 -21.5 ...
  * rlat          (rlat) float32 -12.54 -12.32 -12.1 -11.88 -11.66 -11.44 ...
  * time          (time) datetime64[ns] 2006-05-01T12:00:00 ...
Data variables:
    rotated_pole  int32 ...
    TOT_PREC      (time, rlat, rlon) float32 ...
Attributes:
    CDI:                       Climate Data Interface version 1.8.0 (http://m...
    Conventions:               CF-1.6
    history:                   Thu Jun 14 12:34:59 2018: cdo -O -s -P 4 remap...
    CDO:                       Climate Data Operators version 1.8.0 (http://m...
    cdo_openmp_thread_number:  4

In [41]: datamonth = data["TOT_PREC"].resample(time="M").sum()
In [42]: datamonth
Out[42]:
<xarray.DataArray 'TOT_PREC' (time: 5)>
array([ 551833.25   ,  465640.09375,  328445.90625,  836892.1875 ,
        503601.5    ], dtype=float32)
Coordinates:
    time  (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
```

Problem description

The problem is that the dimensions 'rlon' and 'rlat' and the corresponding coordinates have not survived the resample process. Only the time is present in the result.

Expected Output

I expect the spatial dimensions to still be present in the output of monthly sums. The surprise is that this works with the old syntax:

```python
In [41]: datamonth = data["TOT_PREC"].resample(dim="time", freq="M", how="sum")
/usr/bin/ipython3:1: FutureWarning: .resample() has been modified to defer calculations. Instead of passing 'dim' and how="sum", instead consider using .resample(time="M").sum('time')
  #!/usr/bin/env python3

In [42]: datamonth
Out[42]:
<xarray.DataArray 'TOT_PREC' (time: 5, rlat: 136, rlon: 144)>
array([[[  0.      ,   0.      , ...,   0.      ,   0.      ],
        [  0.      ,   0.      , ...,   0.      ,   0.      ],
        ...,
        [  0.      ,   0.      , ...,  44.900028,  41.400024],
        [  0.      ,   0.      , ...,  49.10001 ,  46.5     ]]], dtype=float32)
Coordinates:
  * time    (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
  * rlon    (rlon) float32 -22.6 -22.38 -22.16 -21.94 -21.72 -21.5 -21.28 ...
  * rlat    (rlat) float32 -12.54 -12.32 -12.1 -11.88 -11.66 -11.44 -11.22 ...
```

What is wrong here?

May I also ask why the new syntax does not consider use cases with highly dynamic scripting? I do not like hard-coding a dimension name in my programs, i.e. time=${freq} instead of dim=${dim}; freq=${freq}.
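On the scripting concern: the new syntax does not actually force a hard-coded dimension name, because the keyword argument can be built as a dict and unpacked (a sketch with synthetic data; note that calling .sum(dim) with the dimension named explicitly also preserves the spatial dimensions):

```python
import numpy as np
import pandas as pd
import xarray as xr

# dim and freq held in variables, as in a generic evaluation script.
dim, freq = "time", "MS"

da = xr.DataArray(
    np.random.rand(153, 3, 4),
    dims=("time", "rlat", "rlon"),
    coords={"time": pd.date_range("2006-05-01", periods=153, freq="D")},
)

# **{dim: freq} unpacks to resample(time="MS"); sum(dim) reduces time only,
# so rlat and rlon survive.
monthly = da.resample(**{dim: freq}).sum(dim)
```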

#2230 · Inconsistency between Sum of NA's and Mean of NA's: resampling gives 0 or 'NA'
opened 2018-06-13T12:54:47Z by rpnaut (30219501) · closed 2018-08-16T06:59:33Z · 7 comments · repo: xarray

Problem description

For data mining with xarray there is a recurring issue with the resample method. If I resample e.g. a daily time series over one month and the data are 'NA' on every day, I get zero as a result. That is annoying for a precipitation time series: it makes a real difference whether the monthly precipitation is zero for one month (zero precipitation on each day) or the monthly precipitation was simply not measured due to problems with the device (NA on each day).

Data example

I have a dataset 'fcut' with hourly values for 5 months:

```python
<xarray.Dataset>
Dimensions:       (bnds: 2, time: 3672)
Coordinates:
    rlon          float32 22.06
    rlat          float32 5.06
  * time          (time) datetime64[ns] 2006-05-01 2006-05-01T01:00:00 ...
Dimensions without coordinates: bnds
Data variables:
    rotated_pole  int32 1
    time_bnds     (time, bnds) float64 1.304e+07 1.305e+07 1.305e+07 ...
    TOT_PREC      (time) float64 nan nan nan nan nan nan nan nan nan nan nan ...
Attributes:
```

Doing a resample process gives only zero values for each month:

```python
In [10]: fcut.resample(dim='time', freq='M', how='sum')
Out[10]:
<xarray.Dataset>
Dimensions:       (bnds: 2, time: 5)
Coordinates:
  * time          (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
Dimensions without coordinates: bnds
Data variables:
    rotated_pole  (time) int64 1 1 1 1 1
    time_bnds     (time, bnds) float64 1.07e+10 1.07e+10 1.225e+10 1.225e+10 ...
    TOT_PREC      (time) float64 0.0 0.0 0.0 0.0 0.0
```

But I expect NA for each month, as is the case with the 'mean' operator.

I know that there is an ongoing discussion about that topic (see for example https://github.com/pandas-dev/pandas/issues/9422).

For earth science it would be nice to have an option telling xarray what to do when summing over values that are all NA. Do you see a chance for a quick fix of this issue in the code base?
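For reference, later xarray releases added exactly such an option: a min_count argument to sum(), which yields NaN instead of 0 when fewer than min_count valid values fall in a window (a sketch with synthetic data; "MS" is used here simply as a version-stable month frequency):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily precipitation: May is all-NA (broken device), June has data.
time = pd.date_range("2006-05-01", periods=61, freq="D")
prec = xr.DataArray(np.full(61, np.nan), dims="time", coords={"time": time})
prec[31:] = 1.0  # June: 1 mm per day

# min_count=1 -> a window with no valid value at all returns NaN, not 0.
monthly = prec.resample(time="MS").sum(min_count=1)
```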

#1604 · Where functionality in xarray including else case (dask compability)
opened 2017-10-04T07:51:39Z by rpnaut (30219501) · closed 2017-12-14T17:49:56Z · 8 comments · repo: xarray

I need a lot of flexibility to compute different types of skill scores with xarray. Keeping in mind the attached code - a method for computing a modified mean squared error skill score ("AVSS") - I am fighting with the following problems:

  1. I want to keep the code user-friendly with regard to extending my program to other skill scores. Thus, the middle part of the attached method, the if-then-else statement, shall be factored out.
  2. There are three input datasets in the case of skill scores: self.DSref = observations, self.DSrefmod = reference model, self.DSproof = model to evaluate. I have to combine all three with simple arithmetic (minus), but xarray does not allow simple arithmetic when there are small differences in the coordinates between the three datasets (also when the data type of the coordinates differs between float64 and float). Thus, my horrifying workaround is to loop over all variables I want to evaluate and, for each variable: a) create a new dataset "DSnew" based on the dataset variable "self.DSproof[varnsproof]", b) rename the variable in "DSnew" to the variable name I want for the evaluation result (e.g. bias of temperature or skill score of temperature), c) create some helper variables "DSnew['MSE_p1']" by copying, d) modify the data of the variables to compute those mathematical operations of the related skill score that are invariant to temporal aggregation, e) apply grouping and resampling to compute climate statistics such as monthly means or daily cycles, and f) perform the final mathematical operation of the related skill score, which has to be done after temporal aggregation. Is there a better way to handle these operations, i.e. to avoid the strange process of creating new datasets and copying variables, and to avoid the outer loop over the variables? What would be your short code for my problem?
  3. The where functionality is sometimes needed to compute skill scores. I have used the where function of numpy, but as I read in your xarray documentation, an explicit call of numpy functions is not compatible with dask arrays. Is there an analogue in the xarray package?

```python
def squarefunc(x):
    return xarray.ufuncs.square(x)

def AVSS_def(x):
    AVSS_p1 = x["MSE_p1"] / x["MSE_p2"] * (-1.0) + 1.0
    AVSS_p2 = x["MSE_p2"] / x["MSE_p1"] - 1.0
    x[varnsres].data = np.where((x["MSE_p2"] - x["MSE_p1"]) > 0, AVSS_p1, AVSS_p2)
    return x

endresult = xarray.Dataset()
for varnsrefmod, varnsproof, varnsref, varnsres in zip(self.varns_refmod, self.varns_proof,
                                                       self.varns_ref, varns_result):
    DSnew = xarray.merge([xarray.Dataset(), self.DSproof[varnsproof]])
    DSnew.rename({varnsproof: varnsres}, inplace=True)
    DSnew["MSE_p1"] = DSnew[varnsres].copy()
    DSnew["MSE_p2"] = DSnew[varnsres].copy()
    DSnew["MSE_p1"].data = squarefunc(self.DSproof[varnsproof].data - self.DSref[varnsref].data)
    DSnew["MSE_p2"].data = squarefunc(self.DSrefmod[varnsrefmod].data - self.DSref[varnsref].data)
    coordtime = GeneralUtils.FromDimList2Pyxarray(dim_time[varnsref])
    if aggregtime == 'fullperiod':
        DSnew = DSnew.mean(coordtime)
        self.RepairTime.update({'Needed': False})
    elif aggregtime == '-':
        self.RepairTime.update({'Needed': False})
    elif "overyears" in aggregtime:
        grpby_method = GeneralUtils.ConvertAggregationKey2XRgroupby(aggregtime)
        DSnew = DSnew.groupby(coordtime + '.' + grpby_method).mean(coordtime)
        self.RepairTime.update({'Needed': True})
        self.RepairTime.update({'start': self.DSref[coordtime].data[0]})
        self.RepairTime.update({'end': self.DSref[coordtime].data[-1]})
    else:
        resamplefreq = GeneralUtils.ConvertAggregationKey2Resample(aggregtime)
        DSnew = DSnew.resample(resamplefreq, dim=coordtime, how='mean')
        self.RepairTime.update({'Needed': False})
    AVSS_def(DSnew)
    self.Update_Attributes(Datasetobj=DSnew, variable=varnsres, stdname=varnsres,
                           units=self.DSref[varnsref].attrs['units'],
                           longname="temporal AVSS of " + self.DSref[varnsref].attrs['long_name'])
    endresult = xarray.merge([endresult, DSnew])
```
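On question 3: xarray provides a dask-compatible analogue of numpy.where as the top-level function xarray.where(cond, x, y), which covers the if-then-else branch of the AVSS directly (a sketch with synthetic MSE values, not the original datasets):

```python
import numpy as np
import xarray as xr

# Synthetic squared-error components standing in for MSE_p1 / MSE_p2.
mse_p1 = xr.DataArray(np.array([1.0, 4.0, 2.0]), dims="x")
mse_p2 = xr.DataArray(np.array([2.0, 1.0, 2.0]), dims="x")

# xr.where keeps labels, works lazily on dask arrays, and takes an
# explicit else-branch -- unlike calling np.where on the raw .data.
avss = xr.where(
    (mse_p2 - mse_p1) > 0,
    1.0 - mse_p1 / mse_p2,   # candidate model beats the reference
    mse_p2 / mse_p1 - 1.0,   # reference beats the candidate model
)
```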

#1506 · Support for basic math (multiplication, difference) on two xarray-Datasets
opened 2017-08-09T23:16:09Z by rpnaut (30219501) · closed 2017-08-10T16:14:41Z · 3 comments · repo: xarray

Let's assume one has loaded two datasets, 'datmod' and 'datref', containing daily data over one year. The data look like:

```python
Dimensions:       (bnds: 2, rlat: 228, rlon: 234, time: 365)
Coordinates:
  * rlon          (rlon) float64 -28.24 -28.02 -27.8 -27.58 -27.36 -27.14 ...
  * rlat          (rlat) float64 -23.52 -23.3 -23.08 -22.86 -22.64 -22.42 ...
  * time          (time) datetime64[ns] 2013-01-01T11:30:00 ...
Dimensions without coordinates: bnds
Data variables:
    rotated_pole  |S1 ''
    time_bnds     (time, bnds) float64 1.073e+09 1.073e+09 1.073e+09 ...
    ASWGLOB_S     (time, rlat, rlon) float64 nan nan nan nan nan nan nan nan ...
```

Now I want to compute a more complex metric such as the temporal correlation and combine it with the functionality of groupby or resample, i.e. determine the temporal correlation for each month separately. So, starting with

```python
def anomaly(x):
    return x - x.mean('time')

a = datref.groupby('time.month').apply(anomaly)
b = datmod.groupby('time.month').apply(anomaly)
```

gives me the anomalies for each time step with respect to monthly means. However, for the numerator of the correlation (the denominator is not discussed here) the elementwise multiplication corr = a*b is needed, and later on this product is grouped monthly and averaged over time. The problem is that the product 'a*b' gives a dataset with missing variables:

```python
<xarray.Dataset>
Dimensions:  (rlat: 228, rlon: 234, time: 0)
Coordinates:
  * time     (time) datetime64[ns]
  * rlon     (rlon) float64 -28.24 -28.02 -27.8 -27.58 -27.36 -27.14 -26.92 ...
  * rlat     (rlat) float64 -23.52 -23.3 -23.08 -22.86 -22.64 -22.42 -22.2 ...
Data variables:
    month    (time) int64
```
I can overcome the problem by doing something like corr = a[varname].data - b[varname].data. But then I have a numpy.array, which does not support the groupby and aggregation functionality, i.e. I must clone the dataset 'datmod' and replace all its data with the data of 'corr'. Only then can I use the Dataset aggregation functionality again.

Is there a way to overcome the problem of elementwise multiplication (as well as subtraction) or should such a feature be added in the future?
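A sketch of what is likely going on (synthetic data; the half-hour offset between the two time axes is an assumption): elementwise Dataset arithmetic aligns both operands on their indexes with an inner join, so time stamps that do not match exactly multiply to an empty dataset, while aligning the indexes first preserves it.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two daily series whose stamps differ by 30 minutes.
t_ref = pd.date_range("2013-01-01 11:30", periods=4, freq="D")
t_mod = pd.date_range("2013-01-01 12:00", periods=4, freq="D")
a = xr.Dataset({"ASWGLOB_S": ("time", np.ones(4))}, coords={"time": t_ref})
b = xr.Dataset({"ASWGLOB_S": ("time", np.ones(4))}, coords={"time": t_mod})

# Inner join of disjoint stamps -> empty product, variables vanish.
assert (a * b).sizes["time"] == 0

# Overriding one index with the other (after checking the order!) fixes it.
b_aligned = b.assign_coords(time=a["time"])
corr_num = a * b_aligned
assert corr_num.sizes["time"] == 4
```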

#1480 · Time Dimension, Big problem with methods 'groupby' and 'to_netcdf'
opened 2017-07-16T22:10:52Z by rpnaut (30219501) · closed 2017-07-17T19:01:01Z · 4 comments · repo: xarray

My problem is that I would like to use the convenient functionality of the xarray library in Python, but I run into problems with the time dimension when aggregating data and when writing netCDF. I am using pandas version 0.17.1 and xarray 0.9.6.

I have opened a dataset, which contains daily data over the year 2013: datset=xr.open_dataset(filein).

The contents of the file are:

```python
<xarray.Dataset>
Dimensions:       (bnds: 2, rlat: 228, rlon: 234, time: 365)
Coordinates:
  * rlon          (rlon) float64 -28.24 -28.02 -27.8 -27.58 -27.36 -27.14 ...
  * rlat          (rlat) float64 -23.52 -23.3 -23.08 -22.86 -22.64 -22.42 ...
  * time          (time) datetime64[ns] 2013-01-01T11:30:00 ...
Dimensions without coordinates: bnds
Data variables:
    rotated_pole  |S1 ''
    time_bnds     (time, bnds) float64 1.073e+09 1.073e+09 1.073e+09 ...
    ASWGLOB_S     (time, rlat, rlon) float64 nan nan nan nan nan nan nan nan ...
Attributes:
    CDI:          Climate Data Interface version 1.7.0 (http://m...
    Conventions:  CF-1.4
```

When I now use the groupby method to compute the monthly means, the time dimension is destroyed:

```python
datset.groupby('time.month').mean('time')
<xarray.Dataset>
Dimensions:    (bnds: 2, month: 12, rlat: 228, rlon: 234)
Coordinates:
  * rlon       (rlon) float64 -28.24 -28.02 -27.8 -27.58 -27.36 -27.14 ...
  * rlat       (rlat) float64 -23.52 -23.3 -23.08 -22.86 -22.64 -22.42 -22.2 ...
  * month      (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (month, bnds) float64 1.074e+09 1.074e+09 1.077e+09 1.077e+09 ...
    ASWGLOB_S  (month, rlat, rlon) float64 nan nan nan nan nan nan nan nan ...
```

Now, instead of a time dimension, I have a month dimension with values from 1 to 12. Is this a side effect of the 'mean' function? As long as I do not use the mean function, the time variable is retained.

The examples given in the documentation seem to behave differently: there the timestamps are retained and the first date of each month is used.

It seems to be impossible to reinvent my old time dimension.

  • Method A: I have tried to create my own time variable with endresult.assign_coords(time=pd.date_range(start='2013-01', end='2014-01', freq='M')). That gives me a new coordinate with the correct dates. Afterwards, I have to swap the dimensions from month to time, which was only possible by changing the dimension of the coordinate 'time' to the dimension of the coordinate 'month'. However, the netcdf file then contained wrong dates as output, i.e. values from 1 to 12: the first time step was at 31 January 2013, the next one day later, the next another day later, and so on. If I add the attributes 'calendar' and 'units' to the time coordinate, the output seems to be correct, but the type int64 is not readable by programs like ncview.
  • Method B: Create the time variable using pandas and then convert the datetime64 dates to the usual Python datetime objects. These datetime objects are then converted to numbers with the netCDF4 date2num method. I assign these numbers to the time coordinate and add the encoded attributes for units and calendar. However, the encoded units are not written to the netcdf data, so I have to add them with an external program like ncatted.

How can methods A and B be improved so that my nc-file carries a correct time stamp?
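A sketch of a repaired Method A (synthetic data; the mid-month stamps and the unit string are arbitrary choices of this sketch): assign real timestamps and set the units via the encoding rather than as plain attributes, so to_netcdf() writes a numeric time variable that tools like ncview can read.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series over 2013 standing in for the original file.
time = pd.date_range("2013-01-01 11:30", periods=365, freq="D")
ds = xr.Dataset({"ASWGLOB_S": ("time", np.random.rand(365))},
                coords={"time": time})

# Monthly climatology: the time axis becomes an integer 'month' axis.
monthly = ds.groupby("time.month").mean("time")

# Swap the integer months for real timestamps (mid-month chosen here).
monthly = monthly.rename({"month": "time"})
monthly = monthly.assign_coords(
    time=pd.to_datetime([f"2013-{m:02d}-15" for m in range(1, 13)]))

# Encoding (not attrs) controls how xarray serializes the axis to netCDF.
monthly["time"].encoding.update(
    {"units": "days since 2013-01-01", "dtype": "float64"})
# monthly.to_netcdf("monthly.nc")  # time is then written as float days
```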

