issue_comments


21 rows where user = 30219501 sorted by updated_at descending


issue 9

  • Refactor nanops 6
  • Inconsistency between Sum of NA's and Mean of NA's: resampling gives 0 or 'NA' 3
  • WHERE function, problems with memory operations? 3
  • Where functionality in xarray including else case (dask compability) 2
  • Time bounds returned after an operation with resample-method 2
  • Memory Error for simple operations on NETCDF4 internally zipped files 2
  • Time Dimension, Big problem with methods 'groupby' and 'to_netcdf' 1
  • Support for basic math (multiplication, difference) on two xarray-Datasets 1
  • New Resample-Syntax leading to cancellation of dimensions 1

user 1

  • rpnaut · 21

author_association 1

  • NONE 21
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
658429636 https://github.com/pydata/xarray/issues/2231#issuecomment-658429636 https://api.github.com/repos/pydata/xarray/issues/2231 MDEyOklzc3VlQ29tbWVudDY1ODQyOTYzNg== rpnaut 30219501 2020-07-14T21:45:39Z 2020-07-14T21:45:39Z NONE

Maybe I will look into creating a wrapper to handle the time_bounds issue for files following the CF conventions. Note that not only resample operations should modify the time_bounds; the reselection process should also take care of them. As an example, assume that file A contains instantaneous data (twice a day, at 00 UTC and 12 UTC) and file B contains aggregated data (daily averages with time stamps defined at the end of the aggregation interval). The reselection of A against B should pick up only the 12 UTC times from file A (or, even better, no time steps at all, because the aggregation interval in file B is not compatible with instantaneous values).
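The reselection idea can be sketched with toy data (hypothetical file contents, not the actual datasets); note that a plain `.sel`/intersection only matches time stamps and knows nothing about CF `time_bounds`:

```python
import numpy as np
import pandas as pd
import xarray as xr

# hypothetical file A: instantaneous values, twice a day (00 and 12 UTC)
times_a = pd.date_range("2000-01-01 00:00", periods=6, freq="12h")
file_a = xr.DataArray(np.arange(6.0), dims="time", coords={"time": times_a})

# hypothetical file B: daily averages, stamped at the end of each interval
times_b = pd.date_range("2000-01-02 00:00", periods=3, freq="D")

# naive reselection keeps only the coinciding stamps; it cannot know that
# B's values are interval averages, because that information lives in the
# (ignored) time_bounds variable
common = file_a.time.to_index().intersection(times_b)
picked = file_a.sel(time=common)
print(picked.values)
```

This is why a bounds-aware wrapper would be needed: stamp matching alone cannot decide whether two files describe compatible intervals.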

  Time bounds returned after an operation with resample-method 332018176
479007709 https://github.com/pydata/xarray/issues/2863#issuecomment-479007709 https://api.github.com/repos/pydata/xarray/issues/2863 MDEyOklzc3VlQ29tbWVudDQ3OTAwNzcwOQ== rpnaut 30219501 2019-04-02T13:56:05Z 2019-04-02T15:09:34Z NONE

It could really be a memory problem. A smaller dataset with internally zipped NETCDF4 data could be read. I have 50 GB of memory available. Is there a memory leak when reading these types of files?

  Memory Error for simple operations on NETCDF4 internally zipped files 428180638
478961666 https://github.com/pydata/xarray/issues/2863#issuecomment-478961666 https://api.github.com/repos/pydata/xarray/issues/2863 MDEyOklzc3VlQ29tbWVudDQ3ODk2MTY2Ng== rpnaut 30219501 2019-04-02T11:51:25Z 2019-04-02T11:52:30Z NONE

I cannot even access the data with eobs["T_2M"].data. Maybe the file is corrupt, but 'ncdump' works on this file.

  Memory Error for simple operations on NETCDF4 internally zipped files 428180638
478562188 https://github.com/pydata/xarray/issues/2861#issuecomment-478562188 https://api.github.com/repos/pydata/xarray/issues/2861 MDEyOklzc3VlQ29tbWVudDQ3ODU2MjE4OA== rpnaut 30219501 2019-04-01T12:39:17Z 2019-04-01T12:39:17Z NONE

I have uploaded the two files 'DSfile_ref' and 'DSfile_proof' to the following address:

wget -r -H -N --cut-dirs=3 --include-directories="/v1/" "https://swiftbrowser.dkrz.de/public/dkrz_c0725fe8741c474b97f291aac57f268f/GregorMoeller/?show_all"

  WHERE function, problems with memory operations? 427644858
478560102 https://github.com/pydata/xarray/issues/2861#issuecomment-478560102 https://api.github.com/repos/pydata/xarray/issues/2861 MDEyOklzc3VlQ29tbWVudDQ3ODU2MDEwMg== rpnaut 30219501 2019-04-01T12:32:31Z 2019-04-01T12:32:31Z NONE

The coordinate-aware philosophy of xarray is nice for preventing meaningless operations. I have learned that the data types of the coordinates also have to be identical, i.e. do not try to compare a dataset with float32 coordinates against one with float64 coordinates. So I have already been educated by xarray.
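The dtype pitfall can be reproduced with a tiny sketch (toy arrays, not the actual datasets): float32 and float64 coordinates that look the same do not align, so arithmetic silently returns an empty result:

```python
import numpy as np
import xarray as xr

lon32 = np.array([10.1, 10.2], dtype=np.float32)
lon64 = np.array([10.1, 10.2])  # float64; not bit-identical to the float32 values

a = xr.DataArray([1.0, 2.0], dims="lon", coords={"lon": lon32})
b = xr.DataArray([3.0, 4.0], dims="lon", coords={"lon": lon64})

# xarray aligns on coordinate labels: 10.1 stored as float32 and upcast to
# float64 is 10.100000381469727, which never equals the float64 literal 10.1,
# so the label intersection is empty
diff = a - b
print(diff.sizes["lon"])   # 0
```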

Providing a code example means extracting all the steps from the "BigScript" and the related files, and it would clutter this thread. However, you asked, so I will try.

```

# open and squeeze (for consistency between datasets)
self.DSref = xarray.open_dataset(DSfile_ref)
self.DSproof = xarray.open_dataset(DSfile_proof)
self.DSref = self.DSref.squeeze()
self.DSproof = self.DSproof.squeeze()

# harmonize grids (the coordinates belonging together are copied from DSref to DSproof)
self.DSproof = self.MetricCalcProg.HarmonizeHoriGrid(dsetref=self.DSref, \
    dsetmod=self.DSproof, posdimnames=self.cfggeneral.PossibleDimNames, \
    varnsref=self.varns_ref, varnsmod=self.varns_proof)
self.DSproof, self.DSref = self.MetricCalcProg.HarmonizeVertGrid(dsetref=self.DSref, \
    dsetmod=self.DSproof, posdimnames=self.cfggeneral.PossibleDimNames, \
    varnsref=self.varns_ref, varnsmod=self.varns_proof)
self.DSproof, self.DSref = self.MetricCalcProg.HarmonizeTempGrid(dsetref=self.DSref, \
    dsetmod=self.DSproof, posdimnames=self.cfggeneral.PossibleDimNames, \
    varnsref=self.varns_ref, varnsmod=self.varns_proof, \
    unifreqme=self.cfgdatamining["target_EvalFrequency"]["method"])

# to compute a linear correlation, datasets A and B must have equal sample sizes
self.DSproof = self.DSproof[varnsproof].where( \
    self.DSref[varnsref].notnull().data).to_dataset(name=varnsproof)
self.DSref = self.DSref[varnsref].where( \
    self.DSproof[varnsproof].notnull().data).to_dataset(name=varnsref)
```

The methods for harmonizing the grids are defined as follows. Do not get me wrong, but I have to deal with different datasets using different data types and variable names. I have to make the height coordinate of dataset A consistent with the height coordinate of dataset B (including its name). I would really like to have some tolerance options when computing DataA - DataB.

```
def HarmonizeHoriGrid(self, dsetref=None, dsetmod=None, posdimnames=None,
                      varnsref=None, varnsmod=None):
    """Copy all the horizontal coordinates from a reference-DS to the model
    dataset (needed due to inconsistencies in dtype, i.e. small deviations).
    Return the model dataset with a harmonized horizontal grid; prone to
    errors, because the check of coordinates has to be done for each variable
    (e.g. the model contains WSS(lon1,lat1) and the obs has WSS(lon,lat);
    however, that should already be harmonized by the cdo interpolation).
    """
    self.logger.debug("         Harmonization of horizontal grids prior to evaluation.")
    CoordInfref = self.FindCoordinatesOfVariables(datafile=None, dataset=dsetref, varnamelist=varnsref)
    CoordInfmod = self.FindCoordinatesOfVariables(datafile=None, dataset=dsetmod, varnamelist=varnsmod)
    dim_xyref = GenUti.SplitMetaDim(CoordInfref, mode='spatial', PossibleDimDict=posdimnames)
    dim_xymod = GenUti.SplitMetaDim(CoordInfmod, mode='spatial', PossibleDimDict=posdimnames)
    #
    for varmod, varref in zip(varnsmod, varnsref):
        if varref in dim_xyref.keys() and varmod in dim_xymod.keys():
            for dimes in dim_xyref[varref]:
                if dimes in dim_xymod[varmod]:
                    self.logger.debug("           Found for the variable " + varref +
                        " the spat. dimension " + dimes +
                        " in the reference dataset and an equivalent in the dataset to evaluate: " +
                        varmod + "," + dimes + ". -> Make the datatype consistent now")
                    dsetmod[dimes].data = dsetref[dimes].data
                else:
                    self.logger.debug("           Found for the variable " + varref +
                        " the spat. dimension " + dimes +
                        " in the reference dataset but no equivalent in the dataset to evaluate.")
    if ("rotated_pole" in dsetmod.data_vars) and ("rotated_pole" in dsetref.data_vars):
        dsetmod["rotated_pole"] = dsetref["rotated_pole"]  # harmonize the type of rotated_pole
    # return the model dataset with harmonized dimensions
    return dsetmod


def HarmonizeTempGrid(self, dsetref=None, dsetmod=None, posdimnames=None,
                      varnsref=None, varnsmod=None, unifreqme=None):
    """Copy the values of the time coordinate from a reference-DS to the model
    dataset (needed due to inconsistencies in dtype, ...).
    Return the model dataset with a harmonized temporal grid; the input
    datasets are already opened netcdf files as xarray datasets.
    """
    self.logger.debug(" Harmonization of temporal grids prior to evaluation.")
    DimInfref = self.FindDimensionsOfVariables(datafile=None, dataset=dsetref, varnamelist=varnsref)
    DimInfmod = self.FindDimensionsOfVariables(datafile=None, dataset=dsetmod, varnamelist=varnsmod)
    dim_tref = GenUti.SplitMetaDim(DimInfref, mode='temporal', PossibleDimDict=posdimnames)
    dim_tmod = GenUti.SplitMetaDim(DimInfmod, mode='temporal', PossibleDimDict=posdimnames)
    #
    for varmod, varref in zip(varnsmod, varnsref):
        if varref in dim_tref.keys() and varmod in dim_tmod.keys():
            for dimes in dim_tref[varref]:
                if dimes in dim_tmod[varmod]:
                    helpstr = (" Found for the variable " + varref + " the temp. dimension " +
                        dimes + " in the reference dataset and an equivalent in the dataset " +
                        "to evaluate: " + varmod + "," + dimes +
                        ". -> Make the datatype consistent depending on unifyfreqmethod " + unifreqme)
                    if unifreqme == "reselect":
                        self.logger.debug(helpstr)
                        timediff = np.max(dsetmod[dimes].data - dsetref[dimes].data)
                        timediff = timediff / np.timedelta64(1, 's')
                        if np.abs(int(timediff)) > 1:
                            self.logger.warning(" The two datasets do not share the same " +
                                "time axis. Maximum difference is " + str(timediff) + " seconds")
                        dsetmod[dimes].data = dsetref[dimes].data
                    elif unifreqme == "resample" and (np.size(dsetref[dimes]) != np.size(dsetmod[dimes])):
                        self.logger.debug(helpstr)
                        inters = pandas.to_datetime(dsetref[dimes].data)
                        inters = inters.intersection(pandas.to_datetime(dsetmod[dimes].data))
                        dsetmod = dsetmod.sel(time=inters, method='nearest')
                        dsetref = dsetref.sel(time=inters, method='nearest')
                        timediff = np.max(dsetmod[dimes].data - dsetref[dimes].data)
                        timediff = timediff / np.timedelta64(1, 's')
                        if np.abs(int(timediff)) > 1:
                            self.logger.warning(" The two harmonized datasets still do not " +
                                "share a time axis. Max diff is " + str(timediff) + " seconds")
                        dsetmod[dimes].data = dsetref[dimes].data
                    else:
                        self.logger.debug(" No harmonization needed here.")
    # return the model dataset with harmonized dimensions
    return dsetmod, dsetref


def HarmonizeVertGrid(self, dsetref=None, dsetmod=None, posdimnames=None,
                      varnsref=None, varnsmod=None):
    """Adapt the height coordinate from a reference-DS to the model dataset
    (needed due to different dimension names).
    Return the model and the reference dataset with a harmonized vertical
    grid; the input datasets are already opened netcdf files as xarray datasets.
    """
    self.logger.debug(" Harmonization of vertical grids prior to evaluation.")
    DimInfref = self.FindDimensionsOfVariables(datafile=None, dataset=dsetref, varnamelist=varnsref)
    DimInfmod = self.FindDimensionsOfVariables(datafile=None, dataset=dsetmod, varnamelist=varnsmod)
    dim_zref = GenUti.SplitMetaDim(DimInfref, mode='vertical', PossibleDimDict=posdimnames)
    dim_zmod = GenUti.SplitMetaDim(DimInfmod, mode='vertical', PossibleDimDict=posdimnames)
    for varref, varmod in zip(varnsref, varnsmod):
        if dim_zref[varref] and dim_zmod[varmod]:
            self.logger.debug(" Here we have to modify the vert. coord. of " + varref + " " + varmod)
            if len(dim_zref[varref]) == 1 and len(dim_zmod[varmod]) == 1:
                dsetmod = dsetmod.rename({dim_zmod[varmod][0]: "height_" + varref})
                dsetref = dsetref.rename({dim_zref[varref][0]: "height_" + varref})
            else:
                self.logger.error(" Many vertical dimensions found for the variable " + varref + " or " + varmod)
                self.logger.error(dsetmod[varref])
                self.logger.error(dsetmod[varmod])
                exit()
        else:
            self.logger.debug(" No vertical dimensions found for the variable " + varref + " or " + varmod)
    return dsetmod, dsetref
```

  WHERE function, problems with memory operations? 427644858
478545314 https://github.com/pydata/xarray/issues/2861#issuecomment-478545314 https://api.github.com/repos/pydata/xarray/issues/2861 MDEyOklzc3VlQ29tbWVudDQ3ODU0NTMxNA== rpnaut 30219501 2019-04-01T11:42:11Z 2019-04-01T11:44:39Z NONE

Dear fmaussion, the '.data' does the trick. Until now I never thought about the fact that the 'notnull' method acts on more than just the data itself. That is maybe the reason why the 'where' method behaves strangely for me. However, the coordinates are already mathematically identical before `DSproof = proof["WSS"].where(ref["WSS"].notnull()).to_dataset(name="WSS")`. I am still a little bit confused.
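The '.data' trick can be illustrated with toy arrays (hypothetical names, not the actual WSS data): passing a bare numpy mask to .where bypasses coordinate alignment, so only positions matter:

```python
import numpy as np
import xarray as xr

x32 = np.array([0.1, 0.2, 0.3], dtype=np.float32)
ref = xr.DataArray([1.0, np.nan, 3.0], dims="x", coords={"x": x32})
proof = xr.DataArray([10.0, 20.0, 30.0], dims="x",
                     coords={"x": np.array([0.1, 0.2, 0.3])})  # float64 coords

# ref.notnull() is a DataArray carrying ref's float32 coordinates, so
# proof.where(ref.notnull()) would first align the two coordinate sets;
# .data strips the coordinates and masks purely by position
masked = proof.where(ref.notnull().data)
print(masked.values)   # [10. nan 30.]
```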

  WHERE function, problems with memory operations? 427644858
412056121 https://github.com/pydata/xarray/issues/2356#issuecomment-412056121 https://api.github.com/repos/pydata/xarray/issues/2356 MDEyOklzc3VlQ29tbWVudDQxMjA1NjEyMQ== rpnaut 30219501 2018-08-10T11:30:45Z 2018-08-10T11:30:45Z NONE

Thank you @dcherian. Do you think that giving the time dimension as an argument twice is useful?

Or maybe I understand everything wrong: is the argument time='M' only meant as frequency='M'? And is the name of the time dimension now given by the argument "dim"? Or let me ask the question differently: what would the syntax of your command be if the time dimension had the name 'TIMES'?
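For the question about a differently named time dimension, a small sketch (toy data, current xarray API): the keyword in resample names the dimension and its value is the frequency, and a variable dimension name can be passed via dict unpacking:

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(np.arange(10.0), dims="TIMES",
                  coords={"TIMES": pd.date_range("2000-01-01", periods=10, freq="D")})

# the keyword names the time dimension, its value is the frequency
weekly = da.resample(TIMES="7D").sum()

# with a variable dimension name, use dict unpacking
dim = "TIMES"
weekly_var = da.resample(**{dim: "7D"}).sum()
print(weekly.values)   # [21. 24.]
```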

  New Resample-Syntax leading to cancellation of dimensions 349077990
412054270 https://github.com/pydata/xarray/pull/2236#issuecomment-412054270 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDQxMjA1NDI3MA== rpnaut 30219501 2018-08-10T11:20:40Z 2018-08-10T11:24:37Z NONE

Ok. The strange thing with the spatial dimensions is that the new syntax forces the user to state explicitly on which dimension the mathematical operator for resampling (like sum) should be applied. The syntax is now data.resample(time="M").sum(dim="time",min_count=1).

That is weird: the dimension is given twice. However, by doing so xarray does not sum up, e.g. for the first month, all values it finds in a specific DataArray, but only the values along the dimension time.

AND NOW, the good news is that I got the following picture with your 'min_count=0' and 'min_count=1':
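The two-step syntax can be exercised on a small synthetic array (toy data; "MS" month-start bins are used here to avoid the deprecated "M" alias): with dim="time" the sum is taken along time only, and the spatial dimensions survive:

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(np.ones((60, 2, 3)), dims=("time", "y", "x"),
                  coords={"time": pd.date_range("2000-01-01", periods=60, freq="D")})

# resample(time=...) defines the bins; sum(dim="time") reduces along time only
monthly = da.resample(time="MS").sum(dim="time")
print(dict(monthly.sizes))   # {'time': 2, 'y': 2, 'x': 3}
```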

  Refactor nanops 333248242
411713980 https://github.com/pydata/xarray/pull/2236#issuecomment-411713980 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDQxMTcxMzk4MA== rpnaut 30219501 2018-08-09T10:34:19Z 2018-08-09T16:15:55Z NONE

To wrap it up: your implementation works for timeseries data. There is something strange with time-space data, which should be fixed. Once that is fixed, it is worth testing in my evaluation environment. Do you have a feeling why the new syntax gives such strange behaviour? Shall we put the bug onto the issue list?

And maybe it would be interesting to have the min_count argument also available for the old syntax in the future, not only for the new one. The reason: the dimension name is not flexible anymore; it cannot be a variable like dim=${dim}.

  Refactor nanops 333248242
411705656 https://github.com/pydata/xarray/pull/2236#issuecomment-411705656 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDQxMTcwNTY1Ng== rpnaut 30219501 2018-08-09T10:01:32Z 2018-08-09T10:30:15Z NONE

Thanks, @fujiisoup .

I have good news and I have bad news.

A) Your min_count argument still seems to work only with the new resample syntax, i.e. data.resample($dim=$freq).sum(). I guess this is due to the planned removal of the old syntax. Using the old syntax data.resample(dim=$dim,freq=$freq,how=$oper), your code seems to ignore the min_count argument.

B) Your min_count argument is not allowed for type 'dataset' but only for type 'dataarray'. Starting with the dataset located here: https://swiftbrowser.dkrz.de/public/dkrz_c0725fe8741c474b97f291aac57f268f/GregorMoeller/ I got the following message:

data.resample(time="M").sum(min_count=1)
TypeError: sum() got an unexpected keyword argument 'min_count'

Thus, I have tested your implementation only on DataArrays. I take the netcdf array 'TOT_PREC' and try to compute the monthly sum:

In [39]: data = xarray.open_dataset("eObs_gridded_0.22deg_rot_v14.0.TOT_PREC.1950-2016.nc_CutParamTimeUnitCor_FinalEvalGrid")
In [40]: datamonth = data["TOT_PREC"].resample(time="M").sum()
In [41]: datamonth
Out[41]:
<xarray.DataArray 'TOT_PREC' (time: 5)>
array([ 551833.25, 465640.09375, 328445.90625, 836892.1875, 503601.5 ], dtype=float32)
Coordinates:
    time (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...

So, the xarray package is still throwing away the dimensions in the x and y directions. It has nothing to do with any min_count argument. THIS MUST BE A BUG OF XARRAY. The afore-mentioned dimensions only survive using the old syntax:

```
In [41]: datamonth = data["TOT_PREC"].resample(dim="time",freq="M",how="sum")
/usr/bin/ipython3:1: FutureWarning: .resample() has been modified to defer calculations. Instead of passing 'dim' and how="sum", instead consider using .resample(time="M").sum('time')

In [42]: datamonth
Out[42]:
<xarray.DataArray 'TOT_PREC' (time: 5, rlat: 136, rlon: 144)>
array([[[ 0.      ,  0.      , ...,  0.      ,  0.      ],
        [ 0.      ,  0.      , ...,  0.      ,  0.      ],
        ...,
        [ 0.      ,  0.      , ..., 44.900028, 41.400024],
        [ 0.      ,  0.      , ..., 49.10001 , 46.5     ]]], dtype=float32)
Coordinates:
  * time (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
  * rlon (rlon) float32 -22.6 -22.38 -22.16 -21.94 -21.72 -21.5 -21.28 ...
  * rlat (rlat) float32 -12.54 -12.32 -12.1 -11.88 -11.66 -11.44 -11.22 ...
```

Nevertheless, I have started to use your min_count argument at one point only (the x and y dimensions do not matter there). In that case, your implementation works fine:

```
In [46]: pointdata = data.isel(rlon=10,rlat=10)

In [47]: pointdata["TOT_PREC"]
Out[47]:
<xarray.DataArray 'TOT_PREC' (time: 153)>
array([ nan,  nan,  nan, ...,  nan,  nan,  nan], dtype=float32)
Coordinates:
    rlon float32 -20.4
    rlat float32 -10.34
  * time (time) datetime64[ns] 2006-05-01T12:00:00 2006-05-02T12:00:00 ...
Attributes:
    standard_name: precipitation_amount
    long_name: Precipitation
    units: kg m-2
    grid_mapping: rotated_pole
    cell_methods: time: sum

In [48]: pointdata["TOT_PREC"].resample(time="M").sum(min_count=1)
Out[48]:
<xarray.DataArray 'TOT_PREC' (time: 5)>
array([ nan,  nan,  nan,  nan,  nan])
Coordinates:
  * time (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
    rlon float32 -20.4
    rlat float32 -10.34

In [49]: pointdata["TOT_PREC"].resample(time="M").sum(min_count=0)
Out[49]:
<xarray.DataArray 'TOT_PREC' (time: 5)>
array([ 0.,  0.,  0.,  0.,  0.], dtype=float32)
Coordinates:
  * time (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
    rlon float32 -20.4
    rlat float32 -10.34
```
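The min_count behaviour can be reproduced self-contained with a tiny all-NaN series (toy data, current xarray):

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(np.full(10, np.nan), dims="time",
                  coords={"time": pd.date_range("2006-05-01", periods=10, freq="D")})

s0 = da.resample(time="10D").sum(min_count=0)  # all-NaN bin -> 0.0
s1 = da.resample(time="10D").sum(min_count=1)  # all-NaN bin -> NaN
print(float(s0[0]), float(s1[0]))
```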

  Refactor nanops 333248242
399076615 https://github.com/pydata/xarray/pull/2236#issuecomment-399076615 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDM5OTA3NjYxNQ== rpnaut 30219501 2018-06-21T11:50:19Z 2018-06-21T12:04:58Z NONE

Okay. Using the old resample nomenclature, I tried to compare the results of your modified code (min_count=2) with the tagged version 0.10.7 (no min_count argument). But I am not sure whether this even works: do you examine the keywords given to the old resample method?

However, in the comparison I did not see the NaNs I expected over water.

  Refactor nanops 333248242
399071095 https://github.com/pydata/xarray/pull/2236#issuecomment-399071095 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDM5OTA3MTA5NQ== rpnaut 30219501 2018-06-21T11:26:45Z 2018-06-21T12:03:23Z NONE

!!!Correction!!! The resample example above also loses the lon-lat dimensions with the unmodified model code. The resulting numbers are the same.

data_aggreg = data["TOT_PREC"].resample(time="M").sum()

data_aggreg
<xarray.DataArray 'TOT_PREC' (time: 5)>
array([ 551833.25, 465640.09375, 328445.90625, 836892.1875, 503601.5 ], dtype=float32)
Coordinates:
  * time (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...

Now I am a little bit puzzled. But ...

!!! IT SEEMS TO BE A BUG !!!

If I do the resample process using the old nomenclature, data_aggreg = data["TOT_PREC"].resample(dim="time",how="sum",freq="M"), it works. Do we have a bug in xarray?

  Refactor nanops 333248242
399061170 https://github.com/pydata/xarray/pull/2236#issuecomment-399061170 https://api.github.com/repos/pydata/xarray/issues/2236 MDEyOklzc3VlQ29tbWVudDM5OTA2MTE3MA== rpnaut 30219501 2018-06-21T10:55:56Z 2018-06-21T11:03:16Z NONE

Hello from me, hopefully contributing some needful things.

At first, I would like to mention that I checked out your code.

I ran the following code example using a datafile uploaded under the following link: https://swiftbrowser.dkrz.de/public/dkrz_c0725fe8741c474b97f291aac57f268f/GregorMoeller/

```
import xarray
import matplotlib.pyplot as plt

data = xarray.open_dataset("eObs_gridded_0.22deg_rot_v14.0.TOT_PREC.1950-2016.nc_CutParamTimeUnitCor_FinalEvalGrid")

data
<xarray.Dataset>
Dimensions: (rlat: 136, rlon: 144, time: 153)
Coordinates:
  * rlon (rlon) float32 -22.6 -22.38 -22.16 -21.94 -21.72 -21.5 ...
  * rlat (rlat) float32 -12.54 -12.32 -12.1 -11.88 -11.66 -11.44 ...
  * time (time) datetime64[ns] 2006-05-01T12:00:00 ...
Data variables:
    rotated_pole int32 ...
    TOT_PREC (time, rlat, rlon) float32 ...
Attributes:
    CDI: Climate Data Interface version 1.8.0 (http://m...
    Conventions: CF-1.6
    history: Thu Jun 14 12:34:59 2018: cdo -O -s -P 4 remap...
    CDO: Climate Data Operators version 1.8.0 (http://m...
    cdo_openmp_thread_number: 4

data_aggreg = data["TOT_PREC"].resample(time="M").sum(min_count=0)
data_aggreg2 = data["TOT_PREC"].resample(time="M").sum(min_count=1)
```

I have noticed that the min_count option in its current state technically only works for DataArrays and not for Datasets. However, more interesting is the fact that the dimensions are destroyed:

```
data_aggreg
<xarray.DataArray 'TOT_PREC' (time: 5)>
array([ 551833.25, 465640.09375, 328445.90625, 836892.1875, 503601.5 ], dtype=float32)
Coordinates:
  * time (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
```

No longitude and no latitude survive your operation. If I use the sum operator on the full dataset (where maybe the code was not modified?), I get:

```
data_aggreg = data.resample(time="M").sum()

data_aggreg
<xarray.Dataset>
Dimensions: (rlat: 136, rlon: 144, time: 5)
Coordinates:
  * time (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
  * rlon (rlon) float32 -22.6 -22.38 -22.16 -21.94 -21.72 -21.5 ...
  * rlat (rlat) float32 -12.54 -12.32 -12.1 -11.88 -11.66 -11.44 ...
Data variables:
    rotated_pole (time) int64 1 1 1 1 1
    TOT_PREC (time, rlat, rlon) float32 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
```

  Refactor nanops 333248242
399059204 https://github.com/pydata/xarray/issues/2230#issuecomment-399059204 https://api.github.com/repos/pydata/xarray/issues/2230 MDEyOklzc3VlQ29tbWVudDM5OTA1OTIwNA== rpnaut 30219501 2018-06-21T10:48:21Z 2018-06-21T10:48:21Z NONE

Thank you for considering that issue in your pull request #2236. I will switch to commenting on your work in the related thread, but I will leave this issue open until a solution is found for the min_count option.

  Inconsistency between Sum of NA's and Mean of NA's: resampling gives 0 or 'NA' 331981984
397313140 https://github.com/pydata/xarray/issues/2230#issuecomment-397313140 https://api.github.com/repos/pydata/xarray/issues/2230 MDEyOklzc3VlQ29tbWVudDM5NzMxMzE0MA== rpnaut 30219501 2018-06-14T14:20:10Z 2018-06-14T14:34:18Z NONE

I really have problems reading the code in duck_array_ops.py. The program starts by defining 12 operators. One of them is:

sum = _create_nan_agg_method('sum', numeric_only=True)

I really do not understand where the train is going; that is due to my limited programming skills for object-oriented code. I have no idea what '_create_nan_agg_method' is doing. I tried to change the code in the method

def _nansum_object(value, axis=None, **kwargs):
    """ In house nansum for object array """
    return _dask_or_eager_func('sum')(value, axis=axis, **kwargs)
    # return np.array(np.nan)

but it seems that this method is not touched during the 'resample().sum()' process.

I need some help to really modify the operators. Is there any hint for me? For the pandas code it seems to be much easier.

  Inconsistency between Sum of NA's and Mean of NA's: resampling gives 0 or 'NA' 331981984
396958080 https://github.com/pydata/xarray/issues/1604#issuecomment-396958080 https://api.github.com/repos/pydata/xarray/issues/1604 MDEyOklzc3VlQ29tbWVudDM5Njk1ODA4MA== rpnaut 30219501 2018-06-13T14:29:24Z 2018-06-13T14:29:24Z NONE

The where operator only allows an 'if-then' construct, but not an 'if-then-else' construct. I cannot explicitly state which values to write into the data at those places where the condition is not fulfilled; it is automatically 'NA'. This costs a lot of computation time and addresses a lot of memory.
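For reference, the three-argument form of the top-level xarray.where (available in current xarray versions) provides exactly the if-then-else construct asked for here, sketched with toy data:

```python
import xarray as xr

da = xr.DataArray([1.0, -2.0, 3.0], dims="x")

# xarray.where(cond, then, else): values where cond is False come from the
# third argument instead of automatically becoming NA
clipped = xr.where(da > 0, da, 0.0)
print(clipped.values)   # [1. 0. 3.]
```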

  Where functionality in xarray including else case (dask compability) 262696381
396957166 https://github.com/pydata/xarray/issues/2231#issuecomment-396957166 https://api.github.com/repos/pydata/xarray/issues/2231 MDEyOklzc3VlQ29tbWVudDM5Njk1NzE2Ng== rpnaut 30219501 2018-06-13T14:26:53Z 2018-06-13T14:26:53Z NONE

I want to add that sometimes the variable time_bnds is already gone after resampling.

  Time bounds returned after an operation with resample-method 332018176
396934730 https://github.com/pydata/xarray/issues/2230#issuecomment-396934730 https://api.github.com/repos/pydata/xarray/issues/2230 MDEyOklzc3VlQ29tbWVudDM5NjkzNDczMA== rpnaut 30219501 2018-06-13T13:21:40Z 2018-06-13T13:47:56Z NONE

I can overcome this by using:

In [14]: fcut.resample(dim='time',freq='M',how='mean',skipna=False)
Out[14]:
<xarray.Dataset>
Dimensions: (bnds: 2, time: 5)
Coordinates:
  * time (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
Dimensions without coordinates: bnds
Data variables:
    rotated_pole (time) float64 1.0 1.0 1.0 1.0 1.0
    time_bnds (time, bnds) float64 1.438e+07 1.438e+07 1.702e+07 ...
    TOT_PREC (time) float64 nan nan nan nan nan

BUT THE PROBLEM IS:

A) that this behaviour contradicts the computation of a mean. I can always compute a mean with the default option 'skipna=True', regardless of whether I have a few NAs in the timeseries (the output is a number ignoring the NAs) or only NAs in the timeseries (the output is NA). This is what I would expect.

B) that setting 'skipna=False' does not allow computations if even one value of the timeseries is NA.

I would like to have the behaviour of the mean operator also for the sum operator.

The developers of the climate data operators (CDO) also decided to give users two options, skipna=True and skipna=False. But skipna=True should result in the same behaviour for both operators (mean and sum).
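The inconsistency can be shown in a few lines (toy data): with the default skipna=True, an all-NA window gives NA for mean but 0 for sum:

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray([np.nan, np.nan], dims="time",
                  coords={"time": pd.date_range("2006-05-01", periods=2, freq="D")})

m = da.resample(time="2D").mean()  # all-NA window -> NaN
s = da.resample(time="2D").sum()   # all-NA window -> 0.0
print(float(m[0]), float(s[0]))
```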

  Inconsistency between Sum of NA's and Mean of NA's: resampling gives 0 or 'NA' 331981984
335810243 https://github.com/pydata/xarray/issues/1604#issuecomment-335810243 https://api.github.com/repos/pydata/xarray/issues/1604 MDEyOklzc3VlQ29tbWVudDMzNTgxMDI0Mw== rpnaut 30219501 2017-10-11T13:31:35Z 2017-10-11T13:31:35Z NONE

Thank you very much, jhamman, for your comment on #1496. I would really like that feature.

Hopefully I will also find a way to work around, in my script, the problem with simple arithmetic operators on Datasets or DataArrays. I do not like always accessing only the data stream (numpy array) instead of the Dataset or DataArray.

  Where functionality in xarray including else case (dask compability) 262696381
321474488 https://github.com/pydata/xarray/issues/1506#issuecomment-321474488 https://api.github.com/repos/pydata/xarray/issues/1506 MDEyOklzc3VlQ29tbWVudDMyMTQ3NDQ4OA== rpnaut 30219501 2017-08-10T07:31:35Z 2017-08-10T07:31:35Z NONE

A first check of your theory (shoyer) reveals that if the time stamps are exactly the same, i.e. the date and time are equal for each time step in the two netcdf files, then the operation works successfully.

I think it is very consistent to allow elementwise operations only on datasets with not only the same shape of dimensions and coordinates but also the same values for dimensions, spatial coordinates and times. I only have to find a way to harmonize the time values of both datasets prior to operations with xarray (a hard task considering model and evaluation data).
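One way to harmonize the time values before elementwise math is an inner-join alignment, sketched here with toy timeseries (hypothetical names, not the actual model/evaluation data):

```python
import numpy as np
import pandas as pd
import xarray as xr

t_model = pd.date_range("2000-01-01", periods=4, freq="12h")
t_eval = pd.date_range("2000-01-01", periods=2, freq="D")

model = xr.DataArray(np.arange(4.0), dims="time", coords={"time": t_model})
evals = xr.DataArray(np.arange(2.0), dims="time", coords={"time": t_eval})

# keep only the time stamps present in both datasets, then operate elementwise
model_c, eval_c = xr.align(model, evals, join="inner")
diff = model_c - eval_c
print(diff.values)   # [0. 1.]
```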

  Support for basic math (multiplication, difference) on two xarray-Datasets 249188875
316729712 https://github.com/pydata/xarray/issues/1480#issuecomment-316729712 https://api.github.com/repos/pydata/xarray/issues/1480 MDEyOklzc3VlQ29tbWVudDMxNjcyOTcxMg== rpnaut 30219501 2017-07-20T14:57:11Z 2017-07-20T14:57:11Z NONE

You are so right. I did not realize that there is the resample method, which hopefully can also be combined with the 'apply' functionality. The documentation I mentioned was from "nicolasfauchereau.github.io/climatecode/posts/xray" (look at In[24] and In[25]). As I understand it, he gets monthly data out of the groupby method, and in his example the "time" dimension survives. It seems that the functionality of groupby-month has changed over the years, because the groupby method in Nicolas's example did not aggregate the same calendar month to one time stamp.

  Time Dimension, Big problem with methods 'groupby' and 'to_netcdf' 243270042


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette