
issues


14 rows where user = 167164 sorted by updated_at descending


Facets: type (issue 12, pull 2) · state (closed 10, open 4) · repo (xarray 14)
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
187608079 MDU6SXNzdWUxODc2MDgwNzk= 1086 Is there a more efficient way to convert a subset of variables to a dataframe? naught101 167164 closed 0     21 2016-11-07T01:43:20Z 2023-12-15T20:47:53Z 2023-12-15T20:47:53Z NONE      

I have the following chunk of code that gets used a lot in my scripts:

```python
> /data/documents/uni/phd/projects/pals_utils/pals_utils/data.py(291)pals_xr_to_df()
    289         # TODO: This is not suitable for gridded datasets:
    290         index_vars = {v: dataset.coords[v].values[0] for v in index_vars}
1-> 291         df = dataset.sel(**index_vars)[data_vars].to_dataframe()[data_vars]
    292
    293         if qc:
```

It basically extracts a few data_vars from a dataset and converts them to a dataframe, limiting the axes to a single grid cell (this particular data only has one location anyway). The first [data_vars] call massively improves the efficiency (by dropping most variables before converting to a dataframe); the second one gets rid of the x, y, and z columns in the dataframe (side issue: it would be nice to have a drop_dims= option in .to_dataframe that dropped all dimensions of length 1).
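For comparison, here's a minimal sketch of one possible alternative, assuming an xarray/xray version with `Dataset.squeeze` and the `dataset`/`data_vars` objects from the debugger session below: squeezing out the length-1 dimensions before converting avoids the second `[data_vars]` selection.

```python
import xarray as xr  # assumed import; `dataset` and `data_vars` are as in the snippet below

# Select only the wanted variables, then drop all length-1 dimensions;
# drop=True also removes the now-scalar x, y, z coordinates.
df = dataset[data_vars].squeeze(drop=True).to_dataframe()
```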

Here's an example of it in use:

```python
ipdb> index_vars
{'y': 1.0, 'x': 1.0, 'z': 1.0}

ipdb> data_vars
['Qle']

ipdb> dataset
<xarray.Dataset>
Dimensions:           (time: 70128, x: 1, y: 1, z: 1)
Coordinates:
  * x                 (x) float64 1.0
  * y                 (y) float64 1.0
  * time              (time) datetime64[ns] 2002-01-01T00:30:00 ...
  * z                 (z) float64 1.0
Data variables:
    latitude          (y, x) float64 -35.66
    longitude         (y, x) float64 148.2
    elevation         (y, x) float64 1.2e+03
    reference_height  (y, x) float64 70.0
    NEE               (time, y, x) float64 1.597 1.651 1.691 1.735 1.778 ...
    Qh                (time, y, x) float64 -26.11 -25.99 -25.89 -25.78 ...
    Qle               (time, y, x) float64 5.892 5.898 5.864 5.826 5.788 ...
Attributes:
    Production_time: 2012-09-27 12:44:42
    Production_source: PALS automated netcdf conversion
    PALS_fluxtower_template_version: 1.0.2
    PALS_dataset_name: TumbaFluxnet
    PALS_dataset_version: 1.4
    Contact: palshelp@gmail.com

ipdb> dataset.sel(**index_vars)[data_vars].to_dataframe()[data_vars].head()
                          Qle
time
2002-01-01 00:30:00  5.891888
2002-01-01 01:00:00  5.898049
2002-01-01 01:30:00  5.863696
2002-01-01 02:00:00  5.825712
2002-01-01 02:30:00  5.787727
```

This particular line of code eventually calls pandas.tslib.array_to_timedelta64, which takes up a significant chunk of my script's run time. This line doesn't look like the best way to do things, and I'm wondering if there's a more efficient way to get the same resulting data. Any help would be greatly appreciated.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1086/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1444752393 I_kwDOAMm_X85WHSwJ 7278 remap_label_indexers removed without deprecation update? naught101 167164 closed 0     5 2022-11-11T00:38:30Z 2022-11-21T02:18:56Z 2022-11-16T01:54:24Z NONE      

What is your issue?

Not sure if this is a docs problem or a usage question. Our code was working on v0.19.0, and now isn't, because:

```python
E   ImportError: cannot import name 'remap_label_indexers' from 'xarray.core.indexing' (/home/nedcr/miniconda3/lib/python3.9/site-packages/xarray/core/indexing.py)
```

Seems like this function was removed, but I can't find anything in the changelog on how to replace it, and the commit in which it was removed is huge and impenetrable.

The line we use it in is:

```python
nearest_point = remap_label_indexers(self.data, dict(x=x, y=y), method='nearest')[0]
```

I realise that this was probably a function intended for internal use only, but it was what I found at the time (years ago).

Is there a better way to do this? What replaces this function?
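A minimal sketch of what seems to be the public-API equivalent, assuming the goal is just nearest-neighbour label lookup (`self.data` is the Dataset from the snippet above); note it returns the selected data rather than remapped indexers, so downstream code may need adjusting:

```python
# Nearest-neighbour selection via the public .sel() API.
nearest_point = self.data.sel(x=x, y=y, method='nearest')
```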

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7278/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
945226829 MDExOlB1bGxSZXF1ZXN0NjkwNTg1ODI4 5607 Add option to pass callable assertion failure message generator naught101 167164 open 0     10 2021-07-15T10:17:42Z 2022-10-12T20:03:32Z   FIRST_TIME_CONTRIBUTOR   0 pydata/xarray/pulls/5607

It is sometimes nice to be able to write custom assertion error messages on failure. This PR allows that for the array comparison assertions, by allowing a fail_func(a, b) callable to be passed in to each assertion function.

Not tested yet, but I'm happy to add tests if this is something that would be appreciated.
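A hypothetical usage sketch of the proposed keyword (fail_func is the new argument this PR would add, not an existing xarray parameter; the data and message format are made up for illustration):

```python
import numpy as np
import xarray as xr
from xarray.testing import assert_allclose  # existing assertion function

actual = xr.DataArray(np.array([1.0, 2.0, 3.0]), dims="time")
expected = xr.DataArray(np.array([1.0, 2.0, 3.1]), dims="time")

def fail_func(a, b):
    # Build a custom failure message from the two objects being compared.
    return f"Flux mismatch: max abs difference = {float(abs(a - b).max()):.3g}"

# Proposed call style: the callable is only invoked when the assertion fails.
assert_allclose(actual, expected, fail_func=fail_func)
```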

  • [ ] Tests added
  • [ ] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5607/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
60303760 MDU6SXNzdWU2MDMwMzc2MA== 364 pd.Grouper support? naught101 167164 open 0     24 2015-03-09T06:25:14Z 2022-04-09T01:48:48Z   NONE      

In pandas, you can pass a pandas.TimeGrouper object to a .groupby() call, and it allows you to group by month, year, day, or other times, without manually creating a new index with those values first. It would be great if you could do this with xray, but at the moment, I get:

```python
/usr/local/lib/python3.4/dist-packages/xray/core/groupby.py in __init__(self, obj, group, squeeze)
     66             if the dimension is squeezed out.
     67             """
---> 68         if group.ndim != 1:
     69             # TODO: remove this limitation?
     70             raise ValueError('`group` must be 1 dimensional')

AttributeError: 'TimeGrouper' object has no attribute 'ndim'
```

Not sure how this will work though, because pandas.TimeGrouper doesn't appear to work with multi-index dataframes yet anyway, so maybe there needs to be a feature request over there too, or maybe it's better to implement something from scratch...
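For what it's worth, a minimal sketch of what current xarray versions already support without pd.Grouper, assuming a dataset with a datetime64 time coordinate (the data here are made up):

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"tair": ("time", np.random.rand(365))},
    coords={"time": pd.date_range("2015-01-01", periods=365, freq="D")},
)

# Group on a datetime component via the "virtual" time accessors...
monthly_means = ds.groupby("time.month").mean()

# ...or resample onto a regular frequency, which is closer to what TimeGrouper does.
monthly_resampled = ds.resample(time="MS").mean()
```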

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/364/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
429572364 MDU6SXNzdWU0Mjk1NzIzNjQ= 2868 netCDF4: support for structured arrays as attribute values; serialize as "compound types" naught101 167164 open 0     3 2019-04-05T03:54:17Z 2022-04-07T15:23:25Z   NONE      

Code Sample, a copy-pastable example if possible

A "Minimal, Complete and Verifiable Example" will make it much easier for maintainers to help you: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

```python
ds.attrs = dict(a=dict(b=2))
ds.to_netcdf(outfile)

...

~/miniconda3/envs/ana/lib/python3.6/site-packages/xarray/backends/api.py in check_attr(name, value)
    158                              'a string, an ndarray or a list/tuple of '
    159                              'numbers/strings for serialization to netCDF '
--> 160                              'files'.format(value))
    161
    162     # Check attrs on the dataset itself

TypeError: Invalid value for attr: {'b': 2} must be a number, a string, an ndarray or a list/tuple of numbers/strings for serialization to netCDF files
```

Problem description

I'm not entirely sure if this should be possible, but it seems like it should be from this email: https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg10502.html

Nested attributes would be nice as a way to namespace metadata.

Expected Output

Netcdf with nested global attributes.
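In the meantime, a minimal workaround sketch, assuming the goal is only to namespace metadata rather than to write true netCDF compound types (`flatten_attrs` and the `.` separator are made up for illustration; `ds` and `outfile` are from the snippet above):

```python
def flatten_attrs(nested, sep="."):
    """Flatten {'a': {'b': 2}} into {'a.b': 2} so it can be serialized to netCDF."""
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            for subkey, subvalue in flatten_attrs(value, sep=sep).items():
                flat[f"{key}{sep}{subkey}"] = subvalue
        else:
            flat[key] = value
    return flat

ds.attrs = flatten_attrs(dict(a=dict(b=2)))  # -> {'a.b': 2}
ds.to_netcdf(outfile)
```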

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-16-lowlatency
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.0
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: 1.4.3.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: None
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: 4.3.1
IPython: 7.3.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2868/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
924559401 MDU6SXNzdWU5MjQ1NTk0MDE= 5489 Misleading error when opening file that does not exist naught101 167164 closed 0     2 2021-06-18T05:37:39Z 2021-06-18T10:43:00Z 2021-06-18T10:43:00Z NONE      

What happened:

```python
In [1]: import xarray as xr

In [2]: xr.__version__
Out[2]: '0.18.2'

In [3]: xr.open_dataset('/not-a-real-file')

ValueError                                Traceback (most recent call last)
<ipython-input-3-4cc5243e5a90> in <module>
----> 1 xr.open_dataset('/not-a-real-file')

~/miniconda3/envs/ana38/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    478
    479     if engine is None:
--> 480         engine = plugins.guess_engine(filename_or_obj)
    481
    482     backend = plugins.get_backend(engine)

~/miniconda3/envs/ana38/lib/python3.8/site-packages/xarray/backends/plugins.py in guess_engine(store_spec)
    109     installed = [k for k in engines if k != "store"]
    110     if installed:
--> 111         raise ValueError(
    112             "did not find a match in any of xarray's currently installed IO "
    113             f"backends {installed}. Consider explicitly selecting one of the "

ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4', 'scipy']. Consider explicitly selecting one of the installed backends via the engine parameter to xarray.open_dataset(), or installing additional IO dependencies:
http://xarray.pydata.org/en/stable/getting-started-guide/installing.html
http://xarray.pydata.org/en/stable/user-guide/io.html
```

What you expected to happen:

Should produce a "FileNotFound" error first.

Engine hunting on a non-existent file is pointless, and the error message is pretty wordy, so my skim-reading originally misinterpreted it to think that for some reason my netcdf4 library wasn't installed, which lead me on to a 4-hour environment rebuild, with a sudden realisation that I'm an idiot at the end of it...

Possible solution:

`assert(os.path.isfile(path))` before anything else.
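A minimal sketch of the kind of check being suggested, assuming the plain path case (remote URLs and file-like objects would need to be excluded; this is not the fix that actually went into xarray):

```python
import os
import xarray as xr

def open_dataset_checked(path, **kwargs):
    # Fail fast with a clear error before any engine guessing happens.
    if isinstance(path, (str, os.PathLike)) and not os.path.isfile(path):
        raise FileNotFoundError(f"No such file: {path!r}")
    return xr.open_dataset(path, **kwargs)
```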

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5489/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
426340570 MDExOlB1bGxSZXF1ZXN0MjY1MjE2NTA4 2855 Add note about integer compression precision naught101 167164 closed 0     1 2019-03-28T07:34:02Z 2019-07-19T17:53:42Z 2019-07-19T17:53:42Z FIRST_TIME_CONTRIBUTOR   0 pydata/xarray/pulls/2855

16-bit integer compression is pretty lossy, by my calculations. It's probably good enough for a lot of cases, but worth warning about.
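As a rough sketch of the arithmetic behind that warning (the variable and numbers are illustrative, not taken from the PR):

```python
def int16_resolution(vmin, vmax):
    """Quantization step when packing the range [vmin, vmax] linearly into int16."""
    n_levels = 2**16 - 1  # distinct representable values
    return (vmax - vmin) / n_levels  # max rounding error is half this step

# Example: surface pressure spanning roughly 95000-105000 Pa.
step = int16_resolution(95_000.0, 105_000.0)
print(f"step = {step:.3f} Pa, max rounding error = {step / 2:.3f} Pa")
# -> step = 0.153 Pa, max rounding error = 0.076 Pa
```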

  • [x] documentation changes only.
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2855/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
446933504 MDU6SXNzdWU0NDY5MzM1MDQ= 2979 Reading single grid cells from a multi-file netcdf dataset? naught101 167164 open 0     1 2019-05-22T05:01:50Z 2019-05-23T16:15:54Z   NONE      

I have a multifile dataset made up of month-long 8-hourly netcdf datasets over nearly 30 years. The files are available from ftp://ftp.ifremer.fr/ifremer/ww3/HINDCAST/GLOBAL/, and I'm specifically looking at e.g. 1990_CFSR/hs/ww3.199001_hs.nc for each year and month. Each file is about 45Mb, for about 15Gb total.

I want to calculate some lognormal distribution parameters of the Hs variable at each grid point (actually, only a smallish subset of points, using a mask). However, if I load the data with open_mfdataset and try to read a single lat/lon grid cell, my computer tanks, and python gets killed due to running out of memory (I have 16Gb, but even if I only try to open 1 year of data - ~500Mb, python ends up using 27% of my memory).

Is there a way in xarray/dask to force dask to only read single sub-arrays at a time? I have tried using lat/lon chunking, e.g.

```python
mfdata_glob = '/home/nedcr/cr/data/wave/*1990*.nc'
global_ds = xr.open_mfdataset(
    mfdata_glob,
    chunks={'latitude': 1, 'longitude': 1})
```

but that doesn't seem to improve things.

Is there any way around this problem? I guess I could try using preprocess= to sub-select grid cells, and loop over that, but that seems like it would require opening and reading each file 317*720 times, which sounds like a recipe for a long wait.
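For reference, a minimal sketch of the `preprocess=` approach mentioned above, assuming the points of interest can be expressed as index lists on latitude/longitude (the dimension names and indices are illustrative, and whether this actually avoids the memory blow-up would need testing):

```python
import xarray as xr

def select_points(ds):
    # Keep only the grid cells of interest before the per-file datasets are combined.
    return ds.isel(latitude=[120], longitude=[300])

global_ds = xr.open_mfdataset(
    '/home/nedcr/cr/data/wave/*1990*.nc',
    preprocess=select_points)
```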

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2979/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
231235256 MDU6SXNzdWUyMzEyMzUyNTY= 1425 FutureWarning with recent pandas naught101 167164 closed 0     1 2017-05-25T04:06:48Z 2017-05-25T17:01:41Z 2017-05-25T17:01:41Z NONE      

```python
/home/naught101/miniconda3/envs/science/lib/python3.6/site-packages/xarray/core/formatting.py:16: FutureWarning: The pandas.tslib module is deprecated and will be removed in a future version.
  from pandas.tslib import OutOfBoundsDatetime
```

with pandas 0.20.1 from conda.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1425/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
118525173 MDU6SXNzdWUxMTg1MjUxNzM= 665 ValueError: Buffer has wrong number of dimensions (expected 1, got 2) naught101 167164 closed 0     14 2015-11-24T03:33:33Z 2016-05-04T17:12:02Z 2016-05-04T17:12:02Z NONE      

Grab a copy of the file http://nh.id.au/data/ocean_vort.nc.gz, and gunzip it. It's a file with some ocean vorticity fields from the MOM4 model. The `ncdump -h ocean_vort.nc` results don't look too odd to me.

If I run:

```python
import xray

ds = xray.open_dataset('ocean_vort.nc')
ds
```

I get the following error:

```python
ValueError                                Traceback (most recent call last)
/data/downloads/software/ipython/IPython/core/formatters.py in __call__(self, obj)
    695                 type_pprinters=self.type_printers,
    696                 deferred_pprinters=self.deferred_printers)
--> 697             printer.pretty(obj)
    698             printer.flush()
    699             return stream.getvalue()

/data/downloads/software/ipython/IPython/lib/pretty.py in pretty(self, obj)
    382                 if callable(meth):
    383                     return meth(obj, self, cycle)
--> 384             return _default_pprint(obj, self, cycle)
    385         finally:
    386             self.end_group()

/data/downloads/software/ipython/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
    502     if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
    503         # A user-provided repr. Find newlines and replace them with p.break_()
--> 504         _repr_pprint(obj, p, cycle)
    505         return
    506     p.begin_group(1, '<')

/data/downloads/software/ipython/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    700     """A pprint that just redirects to the normal repr function."""
    701     # Find newlines and replace them with p.break_()
--> 702     output = repr(obj)
    703     for idx, output_line in enumerate(output.splitlines()):
    704         if idx:

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/xray/core/dataset.py in __repr__(self)
    885
    886     def __repr__(self):
--> 887         return formatting.dataset_repr(self)
    888
    889     @property

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/xray/core/formatting.py in dataset_repr(ds)
    271
    272     summary.append(coords_repr(ds.coords, col_width=col_width))
--> 273     summary.append(vars_repr(ds.data_vars, col_width=col_width))
    274     if ds.attrs:
    275         summary.append(attrs_repr(ds.attrs))

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/xray/core/formatting.py in _mapping_repr(mapping, title, summarizer, col_width)
    208     summary = ['%s:' % title]
    209     if mapping:
--> 210         summary += [summarizer(k, v, col_width) for k, v in mapping.items()]
    211     else:
    212         summary += [EMPTY_REPR]

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/xray/core/formatting.py in <listcomp>(.0)
    208     summary = ['%s:' % title]
    209     if mapping:
--> 210         summary += [summarizer(k, v, col_width) for k, v in mapping.items()]
    211     else:
    212         summary += [EMPTY_REPR]

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/xray/core/formatting.py in summarize_var(name, var, col_width)
    172 def summarize_var(name, var, col_width):
    173     show_values = _not_remote(var)
--> 174     return _summarize_var_or_coord(name, var, col_width, show_values)
    175
    176

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/xray/core/formatting.py in _summarize_var_or_coord(name, var, col_width, show_values, marker, max_width)
    154     front_str = first_col + dims_str + ('%s ' % var.dtype)
    155     if show_values:
--> 156         values_str = format_array_flat(var, max_width - len(front_str))
    157     else:
    158         values_str = '...'

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/xray/core/formatting.py in format_array_flat(items_ndarray, max_width)
    130     # print at least one item
    131     max_possibly_relevant = max(int(np.ceil(max_width / 2.0)), 1)
--> 132     relevant_items = first_n_items(items_ndarray, max_possibly_relevant)
    133     pprint_items = format_items(relevant_items)
    134

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/xray/core/formatting.py in first_n_items(x, n_desired)
     54         indexer = _get_indexer_at_least_n_items(x.shape, n_desired)
     55         x = x[indexer]
---> 56     return np.asarray(x).flat[:n_desired]
     57
     58

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
    472
    473     """
--> 474     return array(a, dtype, copy=False, order=order)
    475
    476 def asanyarray(a, dtype=None, order=None):

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/xray/core/common.py in __array__(self, dtype)
     73
     74     def __array__(self, dtype=None):
---> 75         return np.asarray(self.values, dtype=dtype)
     76
     77     def __repr__(self):

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/xray/core/dataarray.py in values(self)
    332     def values(self):
    333         """The array's data as a numpy.ndarray"""
--> 334         return self.variable.values
    335
    336     @values.setter

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/xray/core/variable.py in values(self)
    269     def values(self):
    270         """The variable's data as a numpy.ndarray"""
--> 271         return _as_array_or_item(self._data_cached())
    272
    273     @values.setter

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/xray/core/variable.py in _data_cached(self)
    235     def _data_cached(self):
    236         if not isinstance(self._data, np.ndarray):
--> 237             self._data = np.asarray(self._data)
    238         return self._data
    239

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
    472
    473     """
--> 474     return array(a, dtype, copy=False, order=order)
    475
    476 def asanyarray(a, dtype=None, order=None):

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/xray/core/indexing.py in __array__(self, dtype)
    292     def __array__(self, dtype=None):
    293         array = orthogonally_indexable(self.array)
--> 294         return np.asarray(array[self.key], dtype=None)
    295
    296     def __getitem__(self, key):

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/xray/conventions.py in __getitem__(self, key)
    416
    417     def __getitem__(self, key):
--> 418         return decode_cf_timedelta(self.array[key], units=self.units)
    419
    420

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/xray/conventions.py in decode_cf_timedelta(num_timedeltas, units)
    166     num_timedeltas = _asarray_or_scalar(num_timedeltas)
    167     units = _netcdf_to_numpy_timeunit(units)
--> 168     result = pd.to_timedelta(num_timedeltas, unit=units, box=False)
    169     # NaT is returned unboxed with wrong units; this should be fixed in pandas
    170     if result.dtype != 'timedelta64[ns]':

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/pandas/util/decorators.py in wrapper(*args, **kwargs)
     87             else:
     88                 kwargs[new_arg_name] = new_arg_value
---> 89             return func(*args, **kwargs)
     90         return wrapper
     91     return _deprecate_kwarg

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/pandas/tseries/timedeltas.py in to_timedelta(arg, unit, box, errors, coerce)
     64         return _convert_listlike(arg, box=box, unit=unit, name=arg.name)
     65     elif is_list_like(arg):
---> 66         return _convert_listlike(arg, box=box, unit=unit)
     67
     68     # ...so it must be a scalar value. Return scalar.

/home/naught101/miniconda3/envs/science/lib/python3.4/site-packages/pandas/tseries/timedeltas.py in _convert_listlike(arg, box, unit, name)
     47         value = arg.astype('timedelta64[{0}]'.format(unit)).astype('timedelta64[ns]', copy=False)
     48     else:
---> 49         value = tslib.array_to_timedelta64(_ensure_object(arg), unit=unit, errors=errors)
     50         value = value.astype('timedelta64[ns]', copy=False)
     51

pandas/tslib.pyx in pandas.tslib.array_to_timedelta64 (pandas/tslib.c:47353)()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
```

Any idea what might be causing that problem?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/665/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
111525165 MDU6SXNzdWUxMTE1MjUxNjU= 625 Best way to copy data layout? naught101 167164 closed 0     3 2015-10-15T01:09:47Z 2015-10-20T17:00:25Z 2015-10-20T17:00:25Z NONE      

I have a dataset that represents some observed variables at a particular site.

For example:

```python
<xray.Dataset>
Dimensions:           (time: 70128, x: 1, y: 1, z: 1)
Coordinates:
  * x                 (x) float64 1.0
  * y                 (y) float64 1.0
  * time              (time) datetime64[ns] 2003-01-01T00:30:00 ...
  * z                 (z) float64 1.0
Data variables:
    latitude          (y, x) float64 41.9
    longitude         (y, x) float64 13.61
    SWdown            (time, y, x) float64 0.0 0.01205 0.0 0.0 0.0 0.0 0.0 ...
    Tair              (time, z, y, x) float64 276.8 276.4 276.4 276.7 276.7 ...
    Rainf             (time, y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    elevation         (y, x) float64 884.2
    reference_height  (y, x) float64 4.0
Attributes:
    Production_time: 2012-09-27 11:34:52
    Production_source: ...
```

I want to use this data as input to a model, and then output a data structure that is the same, except that it contains different output variables.

I tried doing

```python
new_ds = old_ds[['latitude', 'longitude', 'elevation', 'reference height']]
```

and then adding the new variables, but this drops the time coordinate, so you have to re-create it manually. Is there a better way? It would be quite nice if it was possible to do something like

```python
new_ds = old_ds[vars, drop_coords=False]
```
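For reference, a minimal sketch of one way to keep the full coordinate layout, assuming a reasonably recent xarray (`drop_vars` and `merge` are current API names; the `SWup` variable and `model_output` array are illustrative):

```python
# Keep every coordinate (including time) but none of the old data variables...
template = old_ds.drop_vars(list(old_ds.data_vars))

# ...re-attach the static site variables, then add the model outputs.
new_ds = template.merge(
    old_ds[['latitude', 'longitude', 'elevation', 'reference_height']])
new_ds['SWup'] = (('time', 'y', 'x'), model_output)
```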

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/625/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
107139131 MDU6SXNzdWUxMDcxMzkxMzE= 582 dim_names, coord_names, var_names, attr_names convenience functions naught101 167164 closed 0     3 2015-09-18T05:57:54Z 2015-09-23T01:25:15Z 2015-09-23T01:25:15Z NONE      

It'd be nice to have some convenience functions for easy output of dim/coord/variable/attr names, eg:

```python
ds.dim_names() == list(ds.dims.keys())
ds.coord_names() == list(ds.coords.keys())
ds.var_names() == list(ds.vars.keys())
ds.attr_names() == list(ds.attrs.keys())
```

just for reading/writing sanity.
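For reference, a minimal sketch of what already works, assuming a Dataset `ds` (these mappings are dict-like, so the name lists are one `list()` call away; `data_vars` is the current name of the variables mapping):

```python
dim_names = list(ds.dims)
coord_names = list(ds.coords)
var_names = list(ds.data_vars)
attr_names = list(ds.attrs)
```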

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/582/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
71772116 MDU6SXNzdWU3MTc3MjExNg== 404 Segfault on import naught101 167164 closed 0     2 2015-04-29T04:08:18Z 2015-04-29T06:09:44Z 2015-04-29T04:11:09Z NONE      

Using xray 0.4.0 from binstar:

```python
Python 3.4.3 |Continuum Analytics, Inc.| (default, Mar  6 2015, 12:03:53)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xray
[1]    21893 segmentation fault (core dumped)  python
```

This is in a conda env, which was transferred from a previous incarnation of this computer (crash+burn -> reinstall), so maybe there's something wrong with the installation. I've tried re-installing xray from binstar. I don't really know what else to try. Any ideas?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/404/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
62242132 MDU6SXNzdWU2MjI0MjEzMg== 374 Set coordinate resolution in ds.to_netcdf naught101 167164 closed 0     4 2015-03-17T00:10:22Z 2015-03-17T03:01:57Z 2015-03-17T02:09:15Z NONE      

I am trying to mutate some netcdf data, so that I can feed it to a Fortran model, but the ds.to_netcdf function results in a file with different time units (minutes instead of seconds).

Original file:

```
$ ncdump -h ~/path/to/met_forcings.nc
netcdf met_forcings {
dimensions:
    x = 1 ;
    y = 1 ;
    time = UNLIMITED ; // (70128 currently)
    z = 1 ;
variables:
    ...
    double time(time) ;
        time:units = "seconds since 2002-01-01 00:30:00" ;
```

New file:

```
$ ncdump -h ../projects/synthetic_forcings/data/tumba_site_mean_2_year.nc
netcdf tumba_site_mean_2_year {
dimensions:
    time = 35088 ;
    y = 1 ;
    x = 1 ;
    z = 1 ;
variables:
    ...
    float time(time) ;
        time:calendar = "proleptic_gregorian" ;
        time:units = "minutes since 2000-01-01 00:30:00" ;
```

The model is expecting second units, so all of the time-based calculations are amusingly out of whack. In my case, the Fortran model is probably to blame, because its time-reading function should be more robust. However, it would be generally useful to be able to specify the time units when saving to netcdf. I can't see a way of doing that currently.
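For what it's worth, later xarray versions let the time units be controlled through the `encoding` argument to `to_netcdf`; a minimal sketch, assuming the dataset from above (the output filename is illustrative):

```python
ds.to_netcdf(
    'tumba_site_mean_2_year.nc',
    encoding={'time': {'units': 'seconds since 2002-01-01 00:30:00',
                       'dtype': 'float64'}})
```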

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/374/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);