
issue_comments


42 rows where author_association = "NONE" and user = 167164 sorted by updated_at descending


issue 14

  • pd.Grouper support? 9
  • Is there a more efficient way to convert a subset of variables to a dataframe? 7
  • ValueError: Buffer has wrong number of dimensions (expected 1, got 2) 6
  • Multidimensional groupby 4
  • Add a "drop" option to squeeze 2
  • Set coordinate resolution in ds.to_netcdf 2
  • Best way to copy data layout? 2
  • Add `Dataset.drop_dims` 2
  • Add option to pass callable assertion failure message generator 2
  • remap_label_indexers removed without deprecation update? 2
  • Segfault on import 1
  • dim_names, coord_names, var_names, attr_names convenience functions 1
  • "weird" plot 1
  • open_rasterio does not read coordinates from netCDF file properly with netCDF4>=1.4.2 1

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1321361968 https://github.com/pydata/xarray/issues/7278#issuecomment-1321361968 https://api.github.com/repos/pydata/xarray/issues/7278 IC_kwDOAMm_X85OwmIw naught101 167164 2022-11-21T02:18:56Z 2022-11-21T02:18:56Z NONE

If it takes another couple of years to break and needs another re-write, that's probably OK.

But if there is a better/more standard way to do what I'm trying to do, please let me know.

The underlying issue is that I have a netcdf with a RotatedPole grid (e.g. any of the CORDEX data), and I need to find the nearest grid point to a given point. To do this, I map my point onto the RotatedPole projection (converting it from lat/lon to x/y, or rlat/rlon), and then use the index mapping's nearest method to find the nearest grid point.

It didn't seem like there was an obvious way to do this with the actual API functions, so that's what I landed on.
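For context, a sketch of the same lookup via the public API, using `.sel(..., method="nearest")` on the rotated coordinates; the pole parameters, the `rlat`/`rlon` coordinate names, and the `lon`/`lat` point are illustrative assumptions, not taken from this issue:

```python
import cartopy.crs as ccrs

# Illustrative rotated-pole parameters; real values come from the file's grid_mapping.
rotated = ccrs.RotatedPole(pole_longitude=-162.0, pole_latitude=39.25)

# Map the lat/lon point into the rotated frame, then select the nearest grid cell.
rlon, rlat = rotated.transform_point(lon, lat, ccrs.PlateCarree())
nearest = ds.sel(rlon=rlon, rlat=rlat, method="nearest")
```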

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  remap_label_indexers removed without deprecation update? 1444752393
1316170523 https://github.com/pydata/xarray/issues/7278#issuecomment-1316170523 https://api.github.com/repos/pydata/xarray/issues/7278 IC_kwDOAMm_X85Ocysb naught101 167164 2022-11-16T01:54:24Z 2022-11-16T01:54:24Z NONE

Thank you, this is sufficient for me for now. I was able to replace

```python
nearest_point = remap_label_indexers(self.data, dict(x=x, y=y), method='nearest')[0]
```

with

```python
nearest_point = map_index_queries(self.data, dict(x=x, y=y), method='nearest').dim_indexers
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  remap_label_indexers removed without deprecation update? 1444752393
881859823 https://github.com/pydata/xarray/pull/5607#issuecomment-881859823 https://api.github.com/repos/pydata/xarray/issues/5607 IC_kwDOAMm_X840kBzv naught101 167164 2021-07-17T08:51:12Z 2021-07-17T08:51:12Z NONE

@shoyer That would either not work, or be needlessly expensive, I think. The message generation might be expensive (e.g. if I want a sum or mean of the differences). With a callback it only happens if it is needed; with a pre-computed message it would be computed every time. Correct me if I'm wrong.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add option to pass callable assertion failure message generator 945226829
881170326 https://github.com/pydata/xarray/pull/5607#issuecomment-881170326 https://api.github.com/repos/pydata/xarray/issues/5607 IC_kwDOAMm_X840hZeW naught101 167164 2021-07-16T04:37:24Z 2021-07-16T11:35:40Z NONE

@TomNicholas My particular use case is that I have datasets that are large enough that I can't see the full diff, so I might miss major changes. I'm wanting to pass in something like `lambda a, b: f"Largest difference in data is {abs(a-b).max().item()}"`, so I can quickly see if the changes are meaningful. Obviously a more complex function might also be useful, like a summary/describe table output of the differences.

I know I could set the tolerances higher, but the changes are not numerical errors, and I want to see them before updating the test data that they are comparing against.

Entirely possible that there are better ways to do this, of course :)
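In the meantime, a workaround can get the same effect without any xarray changes: wrap the assertion so an expensive summary is only computed on failure. A minimal sketch (not the implementation proposed in this PR; `summarize` is the callable discussed above):

```python
import xarray as xr

def assert_allclose_with_summary(actual, expected, summarize, **kwargs):
    """Only compute the (possibly expensive) summary when the assertion fails."""
    try:
        xr.testing.assert_allclose(actual, expected, **kwargs)
    except AssertionError as err:
        raise AssertionError(f"{err}\n{summarize(actual, expected)}") from err

# Usage, mirroring the lambda above:
# assert_allclose_with_summary(
#     a, b, lambda a, b: f"Largest difference in data is {abs(a - b).max().item()}")
```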

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add option to pass callable assertion failure message generator 945226829
570993776 https://github.com/pydata/xarray/issues/3185#issuecomment-570993776 https://api.github.com/repos/pydata/xarray/issues/3185 MDEyOklzc3VlQ29tbWVudDU3MDk5Mzc3Ng== naught101 167164 2020-01-06T03:54:03Z 2020-01-06T03:54:03Z NONE

I'm seeing a problem with geotiffs that I think might be related:

```sh
$ gdalinfo /home/nedcr/cr/software/ana/lib/geo/tests/data/Urandangi_MGA55.tif
Driver: GTiff/GeoTIFF
Files: /home/nedcr/cr/software/ana/lib/geo/tests/data/Urandangi_MGA55.tif
       /home/nedcr/cr/software/ana/lib/geo/tests/data/Urandangi_MGA55.tif.aux.xml
Size is 10, 10
Coordinate System is:
PROJCS["GDA94 / MGA zone 55",
    GEOGCS["GDA94",
        DATUM["Geocentric_Datum_of_Australia_1994",
            SPHEROID["GRS 1980",6378137,298.257222101,
                AUTHORITY["EPSG","7019"]],
            TOWGS84[0,0,0,0,0,0,0],
            AUTHORITY["EPSG","6283"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4283"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",147],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",10000000],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AXIS["Easting",EAST],
    AXIS["Northing",NORTH],
    AUTHORITY["EPSG","28355"]]
Origin = (-406507.954209543997422,7588834.152862589806318)
Pixel Size = (263.500000000000000,-265.500000000000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  ( -406507.954, 7588834.153) (138d16' 6.99"E, 21d34'24.95"S)
Lower Left  ( -406507.954, 7586179.153) (138d16' 1.84"E, 21d35'50.30"S)
Upper Right ( -403872.954, 7588834.153) (138d17'37.56"E, 21d34'29.73"S)
Lower Right ( -403872.954, 7586179.153) (138d17'32.42"E, 21d35'55.09"S)
Center      ( -405190.454, 7587506.653) (138d16'49.70"E, 21d35'10.02"S)
Band 1 Block=10x10 Type=Float32, ColorInterp=Gray
  Min=0.069 Max=8.066
  Minimum=0.069, Maximum=8.066, Mean=2.556, StdDev=1.749
  NoData Value=-3.40282306073709653e+38
  Metadata:
    STATISTICS_MAXIMUM=8.06591796875
    STATISTICS_MEAN=2.5563781395387
    STATISTICS_MINIMUM=0.068740844726562
    STATISTICS_STDDEV=1.7493082797107
    STATISTICS_VALID_PERCENT=89
```

```python
import xarray as xr
from osgeo.gdal import Open

ras = Open('/home/nedcr/cr/software/ana/lib/geo/tests/data/Urandangi_MGA55.tif')
ras.GetGeoTransform()
# (-406507.954209544, 263.5, 0.0, 7588834.15286259, 0.0, -265.5)

ds = xr.open_rasterio('/home/nedcr/cr/software/ana/lib/geo/tests/data/Urandangi_MGA55.tif')
ds.transform
# (263.5, 0.0, -406507.954209544, 0.0, -265.5, 7588834.15286259)
```

The transform in the xarray dataset is transposed, and not really useful anymore.
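For what it's worth, the two tuples seem to differ only in element order rather than content: rasterio (and hence `ds.transform`) uses the Affine ordering `(a, b, c, d, e, f)`, while GDAL's `GetGeoTransform()` returns `(c, a, b, f, d, e)`. A small sketch converting between the two with the `affine` package (assumed available, since it is a rasterio dependency):

```python
from affine import Affine

# Affine ordering: (px_width, row_rotation, x_origin, col_rotation, px_height, y_origin)
xr_transform = Affine(263.5, 0.0, -406507.954209544,
                      0.0, -265.5, 7588834.15286259)

print(xr_transform.to_gdal())
# (-406507.954209544, 263.5, 0.0, 7588834.15286259, 0.0, -265.5)  <- matches ras.GetGeoTransform()
```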

Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.0.0-37-lowlatency
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.14.1
pandas: 0.25.1
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.5.1.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.22
cfgrib: None
iris: None
bottleneck: None
dask: 1.2.0
distributed: 1.27.1
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: 4.3.1
IPython: 7.3.0
sphinx: 2.1.2
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_rasterio does not read coordinates from netCDF file properly with netCDF4>=1.4.2 477081946
482468492 https://github.com/pydata/xarray/pull/2767#issuecomment-482468492 https://api.github.com/repos/pydata/xarray/issues/2767 MDEyOklzc3VlQ29tbWVudDQ4MjQ2ODQ5Mg== naught101 167164 2019-04-12T07:26:07Z 2019-04-12T07:26:07Z NONE

I guess `ds = ds.drop([c for c in ds.coords if c not in ds.dims])` works. Might be nice to have a convenience function for that though.
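A possible existing spelling of that convenience (a sketch, assuming `reset_coords(drop=True)` drops all non-index coordinates, as its docstring describes):

```python
# Drop every coordinate that is not a dimension/index coordinate.
ds = ds.reset_coords(drop=True)
```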

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add `Dataset.drop_dims` 409618228
482468103 https://github.com/pydata/xarray/pull/2767#issuecomment-482468103 https://api.github.com/repos/pydata/xarray/issues/2767 MDEyOklzc3VlQ29tbWVudDQ4MjQ2ODEwMw== naught101 167164 2019-04-12T07:24:43Z 2019-04-12T07:24:43Z NONE

Is there currently a way to drop unused coordinates, like this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add `Dataset.drop_dims` 409618228
267510339 https://github.com/pydata/xarray/issues/242#issuecomment-267510339 https://api.github.com/repos/pydata/xarray/issues/242 MDEyOklzc3VlQ29tbWVudDI2NzUxMDMzOQ== naught101 167164 2016-12-16T03:43:58Z 2016-12-16T03:43:58Z NONE

Awesome, thanks shoyer! :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add a "drop" option to squeeze 44594982
259044958 https://github.com/pydata/xarray/issues/1086#issuecomment-259044958 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1OTA0NDk1OA== naught101 167164 2016-11-08T04:47:56Z 2016-11-08T04:47:56Z NONE

Ok, no worries. I'll try it if it gets desperate :)

Thanks for your help, shoyer!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
259041491 https://github.com/pydata/xarray/issues/1086#issuecomment-259041491 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1OTA0MTQ5MQ== naught101 167164 2016-11-08T04:16:26Z 2016-11-08T04:16:26Z NONE

So it would be more efficient to concat all of the datasets (subset for the relevant variables), and then just use a single .to_dataframe() call on the entire dataset? If so, that would require quite a bit of refactoring on my part, but it could be worth it.
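Roughly what that refactor might look like, as a sketch; `paths` and `variables` are placeholder names, and it assumes the per-site datasets can be concatenated along a new dimension despite their differing variables:

```python
import xarray as xr

# Subset each site's dataset to the variables of interest, then concatenate once.
datasets = [xr.open_dataset(p)[variables] for p in paths]
combined = xr.concat(datasets, dim="site")

# A single conversion instead of one .to_dataframe() call per file.
df = combined.to_dataframe()
```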

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
259033970 https://github.com/pydata/xarray/issues/1086#issuecomment-259033970 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1OTAzMzk3MA== naught101 167164 2016-11-08T03:14:50Z 2016-11-08T03:14:50Z NONE

Yeah, I'm loading each file separately with xr.open_dataset(), since it's not really a multi-file dataset (it's a lot of single-site datasets, some of which have different variables, and overlapping time dimensions). I don't think I can avoid loading them separately...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
259026069 https://github.com/pydata/xarray/issues/1086#issuecomment-259026069 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1OTAyNjA2OQ== naught101 167164 2016-11-08T02:19:01Z 2016-11-08T02:19:01Z NONE

Not easily - most scripts require many of these datasets (up to 200; the linked one is one of the smallest, and some are up to 10MB) in a specific directory structure, and rely on a couple of private python modules. I was just asking because I thought I might have been missing something obvious, but now I guess that isn't the case. Probably not worth spending too much time on this - if it starts becoming a real problem for me, I will try to generate something self-contained that shows the problem. Until then, maybe it's best to assume that xarray/pandas are doing the best they can given the requirements, and close this for now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
258774196 https://github.com/pydata/xarray/issues/1086#issuecomment-258774196 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1ODc3NDE5Ng== naught101 167164 2016-11-07T08:30:25Z 2016-11-07T08:30:25Z NONE

I loaded it from a netcdf file. There's an example you can play with at https://dl.dropboxusercontent.com/u/50684199/MitraEFluxnet.1.4_flux.nc

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
258755061 https://github.com/pydata/xarray/issues/1086#issuecomment-258755061 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1ODc1NTA2MQ== naught101 167164 2016-11-07T06:12:27Z 2016-11-07T06:12:27Z NONE

Slightly slower (using %timeit in ipython)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
258753366 https://github.com/pydata/xarray/issues/1086#issuecomment-258753366 https://api.github.com/repos/pydata/xarray/issues/1086 MDEyOklzc3VlQ29tbWVudDI1ODc1MzM2Ng== naught101 167164 2016-11-07T05:56:26Z 2016-11-07T05:56:26Z NONE

Squeeze is pretty much identical in efficiency. Seems very slightly better (2-5%) on smaller datasets. (I still need to add the final [data_vars] to get rid of the extraneous index_var columns, but that doesn't affect performance much).

I'm not calling pandas.tslib.array_to_timedelta64 directly; to_dataframe is. The caller list is (sorry, I'm not sure of a better way to show this):

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Is there a more efficient way to convert a subset of variables to a dataframe? 187608079
218675077 https://github.com/pydata/xarray/pull/818#issuecomment-218675077 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIxODY3NTA3Nw== naught101 167164 2016-05-12T06:54:53Z 2016-05-12T06:54:53Z NONE

`forcing_data.isel(lat=lat, lon=lon).values()` returns a ValuesView, which scikit-learn doesn't like. However, `forcing_data.isel(lat=lat, lon=lon).to_array().T` seems to work.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
218667702 https://github.com/pydata/xarray/pull/818#issuecomment-218667702 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIxODY2NzcwMg== naught101 167164 2016-05-12T06:02:55Z 2016-05-12T06:02:55Z NONE

@shoyer: Where does times come from in that code?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
218654978 https://github.com/pydata/xarray/pull/818#issuecomment-218654978 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIxODY1NDk3OA== naught101 167164 2016-05-12T04:02:43Z 2016-05-12T04:03:01Z NONE

Example forcing data:

```
<xarray.Dataset>
Dimensions:  (lat: 360, lon: 720, time: 2928)
Coordinates:
  * lon      (lon) float64 0.25 0.75 1.25 1.75 2.25 2.75 3.25 3.75 4.25 4.75 ...
  * lat      (lat) float64 -89.75 -89.25 -88.75 -88.25 -87.75 -87.25 -86.75 ...
  * time     (time) datetime64[ns] 2012-01-01 2012-01-01T03:00:00 ...
Data variables:
    SWdown   (time, lat, lon) float64 446.5 444.9 445.3 447.8 452.4 456.3 ...
```

Where there might be an arbitrary number of data variables, and the scikit-learn input would be time (rows) by data variables (columns). I'm currently doing this:

```python
import numpy as np
import xarray as xr


def predict_gridded(model, forcing_data, flux_vars):
    """predict model results for gridded data

    :model: TODO
    :data: TODO
    :returns: TODO

    """
    # set prediction metadata
    prediction = forcing_data[list(forcing_data.coords)]

    # Arrays like (var, lon, lat, time)
    result = np.full([len(flux_vars),
                      forcing_data.dims['lon'],
                      forcing_data.dims['lat'],
                      forcing_data.dims['time']],
                     np.nan)
    print("predicting for lon: ")
    for lon in range(len(forcing_data['lon'])):
        print(lon, end=', ')
        for lat in range(len(forcing_data['lat'])):
            result[:, lon, lat, :] = model.predict(
                forcing_data.isel(lat=lat, lon=lon)
                            .to_dataframe()
                            .drop(['lat', 'lon'], axis=1)
            ).T
    print("")
    for i, fv in enumerate(flux_vars):
        prediction.update(
            {fv: xr.DataArray(result[i, :, :, :],
                              dims=['lon', 'lat', 'time'],
                              coords=forcing_data.coords)}
        )

    return prediction
```

and I think it's working (still debugging, and it's pretty slow running)
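If the model can accept all grid points at once, an alternative to the nested lon/lat loops is to stack the spatial dimensions and call predict once; a sketch under that assumption, reusing `model` and `forcing_data` from above (single target variable):

```python
# Collapse lat/lon into one "point" dimension and move features onto a trailing axis.
stacked = forcing_data.stack(point=("lat", "lon"))        # dims: (time, point)
features = stacked.to_array("variable")                   # dims: (variable, time, point)
X = features.transpose("time", "point", "variable").values
ntime, npoint, nvar = X.shape

# One predict call for every grid point and time step at once.
y = model.predict(X.reshape(ntime * npoint, nvar))

# Put the result back on the grid.
stacked = stacked.assign(prediction=(("time", "point"), y.reshape(ntime, npoint)))
prediction = stacked["prediction"].unstack("point")        # dims: (time, lat, lon)
```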

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
218372591 https://github.com/pydata/xarray/pull/818#issuecomment-218372591 https://api.github.com/repos/pydata/xarray/issues/818 MDEyOklzc3VlQ29tbWVudDIxODM3MjU5MQ== naught101 167164 2016-05-11T06:24:11Z 2016-05-11T06:24:11Z NONE

I want to be able to run a scikit-learn model over a bunch of variables in a 3D (lat/lon/time) dataset, and return values for each coordinate point. Is something like this multi-dimensional groupby required (I'm thinking groupby(lat, lon) => 2D matrices that can be fed straight into scikit-learn), or is there already some other mechanism that could achieve something like this? Or is the best way at the moment just to create a null dataset, and loop over lat/lon and fill in the blanks as you go?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multidimensional groupby 146182176
216445714 https://github.com/pydata/xarray/issues/242#issuecomment-216445714 https://api.github.com/repos/pydata/xarray/issues/242 MDEyOklzc3VlQ29tbWVudDIxNjQ0NTcxNA== naught101 167164 2016-05-03T06:04:47Z 2016-05-03T06:04:47Z NONE

Not sure if it's worth a separate feature request, but it'd be great to have a drop option in .isel() as well. For example:

```python
ds
<xarray.Dataset>
Dimensions:  (lat: 360, lon: 720, time: 2928)
Coordinates:
  * lon      (lon) float64 0.25 0.75 1.25 1.75 2.25 2.75 3.25 3.75 4.25 4.75 ...
  * lat      (lat) float64 -89.75 -89.25 -88.75 -88.25 -87.75 -87.25 -86.75 ...
  * time     (time) datetime64[ns] 2012-01-01 2012-01-01T03:00:00 ...
Data variables:
    dswrf    (time, lat, lon) float64 446.5 444.9 445.3 447.8 452.4 456.3 ...

ds.isel(lat=0, lon=0, drop=True)
<xarray.Dataset>
Dimensions:  (time: 2928)
Coordinates:
  * time     (time) datetime64[ns] 2012-01-01 2012-01-01T03:00:00 ...
Data variables:
    dswrf    (time) float64 446.5 444.9 445.3 447.8 452.4 456.3 ...
```

This would be useful for making `.to_dataframe()` more efficient, too, I suspect, since I currently have to do `forcing.isel(lat=0, lon=0).to_dataframe()[['dswrf']]` to remove the (constant) lat/lon data from the dataframe.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add a "drop" option to squeeze 44594982
159423524 https://github.com/pydata/xarray/issues/665#issuecomment-159423524 https://api.github.com/repos/pydata/xarray/issues/665 MDEyOklzc3VlQ29tbWVudDE1OTQyMzUyNA== naught101 167164 2015-11-24T22:15:36Z 2015-11-24T22:15:36Z NONE

Sorry, this is just something I stumbled across in a workshop, it's not a problem that's affecting my work, so I don't have much time for it. But I'll make the MOM developers aware of it - they might have an interest in helping out.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ValueError: Buffer has wrong number of dimensions (expected 1, got 2) 118525173
159421893 https://github.com/pydata/xarray/issues/665#issuecomment-159421893 https://api.github.com/repos/pydata/xarray/issues/665 MDEyOklzc3VlQ29tbWVudDE1OTQyMTg5Mw== naught101 167164 2015-11-24T22:10:19Z 2015-11-24T22:10:19Z NONE

@shoyer: the file is publicly available, have you tried checking it on your set-up? The file is about 35MB.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ValueError: Buffer has wrong number of dimensions (expected 1, got 2) 118525173
159418844 https://github.com/pydata/xarray/issues/665#issuecomment-159418844 https://api.github.com/repos/pydata/xarray/issues/665 MDEyOklzc3VlQ29tbWVudDE1OTQxODg0NA== naught101 167164 2015-11-24T22:03:25Z 2015-11-24T22:03:25Z NONE

Same problem with scipy - I'm not sure what I need to install to get pydap and h5netcdf working...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ValueError: Buffer has wrong number of dimensions (expected 1, got 2) 118525173
159153806 https://github.com/pydata/xarray/issues/665#issuecomment-159153806 https://api.github.com/repos/pydata/xarray/issues/665 MDEyOklzc3VlQ29tbWVudDE1OTE1MzgwNg== naught101 167164 2015-11-24T05:31:43Z 2015-11-24T05:31:43Z NONE

I updated to 1.2.1 from PyPI; it still fails.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ValueError: Buffer has wrong number of dimensions (expected 1, got 2) 118525173
159146583 https://github.com/pydata/xarray/issues/665#issuecomment-159146583 https://api.github.com/repos/pydata/xarray/issues/665 MDEyOklzc3VlQ29tbWVudDE1OTE0NjU4Mw== naught101 167164 2015-11-24T04:28:52Z 2015-11-24T04:28:52Z NONE

```
$ conda list | grep -i cdf
libnetcdf                 4.3.3.1                       1
netcdf4                   1.1.9               np110py34_0
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ValueError: Buffer has wrong number of dimensions (expected 1, got 2) 118525173
159141364 https://github.com/pydata/xarray/issues/665#issuecomment-159141364 https://api.github.com/repos/pydata/xarray/issues/665 MDEyOklzc3VlQ29tbWVudDE1OTE0MTM2NA== naught101 167164 2015-11-24T03:34:30Z 2015-11-24T03:34:30Z NONE

`xray.version.version == '0.6.1'`, from Anaconda, on Python 3.4.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ValueError: Buffer has wrong number of dimensions (expected 1, got 2) 118525173
159140295 https://github.com/pydata/xarray/issues/663#issuecomment-159140295 https://api.github.com/repos/pydata/xarray/issues/663 MDEyOklzc3VlQ29tbWVudDE1OTE0MDI5NQ== naught101 167164 2015-11-24T03:26:54Z 2015-11-24T03:26:54Z NONE

and?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  "weird" plot 117881465
148271725 https://github.com/pydata/xarray/issues/625#issuecomment-148271725 https://api.github.com/repos/pydata/xarray/issues/625 MDEyOklzc3VlQ29tbWVudDE0ODI3MTcyNQ== naught101 167164 2015-10-15T03:28:59Z 2015-10-15T03:28:59Z NONE

The lat/x, long/y thing is a bit odd, yeah; I think the file is set up a bit wrong, but I'm trying to stick to the format I'm given.

Reference height, at least, isn't really a co-ordinate. It's the value of the tower measurements. The other data is really (implicitly) at elevation+screen height, I think.

But yeah, that's basically what I came up with, just copy the relevant coords for each variable from the old dataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Best way to copy data layout? 111525165
148252414 https://github.com/pydata/xarray/issues/625#issuecomment-148252414 https://api.github.com/repos/pydata/xarray/issues/625 MDEyOklzc3VlQ29tbWVudDE0ODI1MjQxNA== naught101 167164 2015-10-15T01:47:56Z 2015-10-15T01:47:56Z NONE

Hrm, I guess I can just do

```python
new_ds = old_ds[GEO_VARS]
for c in old_ds.coords:
    if c not in new_ds.coords:
        new_ds.coords[c] = old_ds.coords[c]
```

Still, it might be nice to have this built in.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Best way to copy data layout? 111525165
142467890 https://github.com/pydata/xarray/issues/582#issuecomment-142467890 https://api.github.com/repos/pydata/xarray/issues/582 MDEyOklzc3VlQ29tbWVudDE0MjQ2Nzg5MA== naught101 167164 2015-09-23T01:24:34Z 2015-09-23T01:25:07Z NONE

Fair enough, I wasn't aware `list(ds.dims)` worked.

`.keys()` returns a KeysView though, not a list:

```python
In [4]: ds.dims.keys()
Out[4]: KeysView(Frozen(SortedKeysDict({'time': 70128, 'z': 1, 'x': 1, 'y': 1})))

In [5]: ds.dims.keys()[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-874ed46e2639> in <module>()
----> 1 ds.dims.keys()[0]

TypeError: 'KeysView' object does not support indexing
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  dim_names, coord_names, var_names, attr_names convenience functions 107139131
97298869 https://github.com/pydata/xarray/issues/404#issuecomment-97298869 https://api.github.com/repos/pydata/xarray/issues/404 MDEyOklzc3VlQ29tbWVudDk3Mjk4ODY5 naught101 167164 2015-04-29T04:11:09Z 2015-04-29T04:11:09Z NONE

Other envs aren't affected. I guess this is a conda problem, not an xray problem. Easy enough to re-create an env, I guess.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Segfault on import 71772116
82035634 https://github.com/pydata/xarray/issues/374#issuecomment-82035634 https://api.github.com/repos/pydata/xarray/issues/374 MDEyOklzc3VlQ29tbWVudDgyMDM1NjM0 naught101 167164 2015-03-17T02:09:15Z 2015-03-17T02:09:30Z NONE

blegh... just noticed the different dates. Never mind :P

Thanks for the help again

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Set coordinate resolution in ds.to_netcdf 62242132
82034427 https://github.com/pydata/xarray/issues/374#issuecomment-82034427 https://api.github.com/repos/pydata/xarray/issues/374 MDEyOklzc3VlQ29tbWVudDgyMDM0NDI3 naught101 167164 2015-03-17T02:04:17Z 2015-03-17T02:04:17Z NONE

Ah, cool, that looks good, however, I think there might be a bug somewhere. Here's what I was getting originally:

```python
In [62]: new_data['time'].encoding
Out[62]: {}

In [60]: new_data.to_netcdf('data/tumba_site_mean_2_year.nc')
```

resulted in:

```
$ ncdump ../projects/synthetic_forcings/data/tumba_site_mean_2_year.nc | grep time --context=2
netcdf tumba_site_mean_2_year {
dimensions:
        time = 35088 ;
        y = 1 ;
        x = 1 ;
variables:
        ...
        float time(time) ;
                time:calendar = "proleptic_gregorian" ;
                time:units = "minutes since 2000-01-01 00:30:00" ;
        ...

 time = 0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360, 390, 420,
    450, 480, 510, 540, 570, 600, 630, 660, 690, 720, 750, 780, 810, 840,
```

And if I copy the encoding from the original file, as it was loaded:

```python
In [65]: new_data['time'].encoding = data['time'].encoding

In [66]: new_data['time'].encoding
Out[66]: {'dtype': dtype('>f8'), 'units': 'seconds since 2002-01-01 00:30:00'}

In [67]: new_data.to_netcdf('data/tumba_site_mean_2_year.nc')
```

results in

```
$ ncdump ../projects/synthetic_forcings/data/tumba_site_mean_2_year.nc | grep time --context=1
dimensions:
        time = 35088 ;
        y = 1 ;
--
variables:
        ...
        double time(time) ;
                time:calendar = "proleptic_gregorian" ;
                time:units = "seconds since 2002-01-01T00:30:00" ;
        ...

 time = -63158400, -63156600, -63154800, -63153000, -63151200, -63149400,
    -63147600, -63145800, -63144000, -63142200, -63140400, -63138600,
```

Now the units are right, but the values are way off. I can't see anything obvious missing from the encoding, compared to the xray docs, but I'm not sure how it works.

Also, since seconds are the base SI unit for time, I think it would be sensible to use seconds by default, if no encoding is given.
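The large negative offsets look like the expected result of encoding times that start in 2000 against a "seconds since 2002-..." reference (the "different dates" acknowledged in the follow-up comment). A sketch of setting the encoding deliberately before writing, with the same variable and path names as the session above:

```python
# Choose the reference date and units explicitly so the file is unambiguous.
new_data['time'].encoding.update({
    'units': 'seconds since 2000-01-01 00:30:00',  # reference matching the data's start
    'dtype': 'float64',
})
new_data.to_netcdf('data/tumba_site_mean_2_year.nc')
```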

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Set coordinate resolution in ds.to_netcdf 62242132
78239807 https://github.com/pydata/xarray/issues/364#issuecomment-78239807 https://api.github.com/repos/pydata/xarray/issues/364 MDEyOklzc3VlQ29tbWVudDc4MjM5ODA3 naught101 167164 2015-03-11T10:38:05Z 2015-03-11T10:38:05Z NONE

Ah, yep, making the dimension using `data.coords['timeofday'] = ('time', [np.timedelta64(60 * int(h) + int(m), 'm') for h, m in zip(data['time.hour'], data['time.minute'])])` works. Thanks for all the help :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pd.Grouper support? 60303760
78211171 https://github.com/pydata/xarray/issues/364#issuecomment-78211171 https://api.github.com/repos/pydata/xarray/issues/364 MDEyOklzc3VlQ29tbWVudDc4MjExMTcx naught101 167164 2015-03-11T06:17:10Z 2015-03-11T06:17:10Z NONE

Ok, weird. That example works for me, but even if I take a really short slice of my data set, the same thing won't work:

```python
In [61]: d = data.sel(time=slice('2002-01-01','2002-01-03'))
    ...: d
Out[61]:
<xray.Dataset>
Dimensions:           (time: 143, timeofday: 70128, x: 1, y: 1, z: 1)
Coordinates:
  * x                 (x) >f8 1.0
  * y                 (y) >f8 1.0
  * z                 (z) >f8 1.0
  * time              (time) datetime64[ns] 2002-01-01T00:30:00 ...
  * timeofday         (timeofday) timedelta64[ns] 1800000000000 nanoseconds ...
Data variables:
    SWdown            (time, y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 14.58 ...
    Rainf_qc          (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 ...
    SWdown_qc         (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 ...
    Tair              (time, z, y, x) float64 282.9 282.9 282.7 282.6 282.4 281.7 281.0 ...
    Tair_qc           (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 ...
    LWdown            (time, y, x) float64 296.7 297.3 297.3 297.3 297.2 295.9 294.5 ...
    PSurf_qc          (time, y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    latitude          (y, x) float64 -35.66
    Wind              (time, z, y, x) float64 2.2 2.188 1.9 2.2 2.5 2.5 2.5 2.25 2.0 2.35 ...
    LWdown_qc         (time, y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    Rainf             (time, y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    Qair_qc           (time, y, x) float64 1.0 0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 ...
    longitude         (y, x) float64 148.2
    PSurf             (time, y, x) float64 8.783e+04 8.783e+04 8.782e+04 8.781e+04 ...
    reference_height  (y, x) float64 70.0
    elevation         (y, x) float64 1.2e+03
    Qair              (time, z, y, x) float64 0.00448 0.004608 0.004692 0.004781 ...
    Wind_qc           (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 ...
Attributes:
    Production_time: 2012-09-27 12:44:42
    Production_source: PALS automated netcdf conversion
    Contact: palshelp@gmail.com
    PALS_fluxtower_template_version: 1.0.2
    PALS_dataset_name: TumbaFluxnet
    PALS_dataset_version: 1.4

In [62]: d.groupby('timeofday').mean('time')
```

That last command will not complete - it will run for minutes. Not really sure how to debug that behaviour.

Perhaps it's to do with the long/lat/height variables that really should be coordinates (I'm just using the data as it came, but I can clean that up, if necessary).
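One likely culprit: in the repr above, timeofday is its own 70128-long dimension, unrelated to time, rather than a coordinate along time. A sketch of building it along time instead, mirroring the np.timedelta64 approach that eventually worked (per the 2015-03-11 comment; names as in the session above):

```python
import numpy as np

# Half-hourly time-of-day as a coordinate on the existing time dimension.
data.coords['timeofday'] = ('time', [
    np.timedelta64(60 * int(h) + int(m), 'm')
    for h, m in zip(data['time.hour'].values, data['time.minute'].values)
])

# Now there are only 48 half-hourly groups, not 70128.
daily_cycle = data.groupby('timeofday').mean('time')
```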

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pd.Grouper support? 60303760
78191526 https://github.com/pydata/xarray/issues/364#issuecomment-78191526 https://api.github.com/repos/pydata/xarray/issues/364 MDEyOklzc3VlQ29tbWVudDc4MTkxNTI2 naught101 167164 2015-03-11T03:00:03Z 2015-03-11T03:00:03Z NONE

Same problem with numpy.timedelta64 too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pd.Grouper support? 60303760
78036587 https://github.com/pydata/xarray/issues/364#issuecomment-78036587 https://api.github.com/repos/pydata/xarray/issues/364 MDEyOklzc3VlQ29tbWVudDc4MDM2NTg3 naught101 167164 2015-03-10T11:30:10Z 2015-03-10T11:30:10Z NONE

Dunno if this is related to the ds['time.time'] problem, but I tried creating the daily_cycle using a pandas.Timedelta as the index (timeofday), and it also appeared to just hang indefinitely when doing the data.groupby('timeofday').mean('time') call.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pd.Grouper support? 60303760
78008962 https://github.com/pydata/xarray/issues/364#issuecomment-78008962 https://api.github.com/repos/pydata/xarray/issues/364 MDEyOklzc3VlQ29tbWVudDc4MDA4OTYy naught101 167164 2015-03-10T07:51:45Z 2015-03-10T07:51:45Z NONE

Nice.

Ok, I have hit a stumbling block, and this is much more of a support request, so feel free to direct me elsewhere, but since we're on the topic, I want to do something like:

```python
start = 2002
n_years = 4
new_data = []
for year in range(start, start + n_years):
    days = 365 if year % 4 else 366
    for d in range(days):
        day_data = mean + annual_cycle.isel(dayofyear=d) + daily_cycle
        day_data.coords['time'] = (datetime.datetime(year, 1, 1) +
                                   datetime.timedelta(days=d,
                                                      hours=day_data.timeofday.hour,
                                                      minutes=day_data.timeofday.minute))
        new_data.append(day_data)
xray.concat(new_data)
```

where mean, annual_cycle, and daily_cycle are overall mean, annual cycle at daily resolution, and daily cycle at 30 minute resolution (the latter two bias corrected by subtracting the mean). I'm trying to make a synthetic dataset 4 years long that only includes the mean, seasonal, and daily cycles, but no other variability.

The assignment of day_data['time'] fails because day_data.timeofday.hour (and .minute) don't work. These are datetime.time objects; is there an efficient way of converting them to datetime.timedelta, without first manually taking them out of the DataArray?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pd.Grouper support? 60303760
77978458 https://github.com/pydata/xarray/issues/364#issuecomment-77978458 https://api.github.com/repos/pydata/xarray/issues/364 MDEyOklzc3VlQ29tbWVudDc3OTc4NDU4 naught101 167164 2015-03-10T01:16:25Z 2015-03-10T01:16:25Z NONE

Ah, cool, thanks for that link, I missed that in the docs.

One thing that would be nice (in both pandas and xray) is a time.timeofday. I can't figure out how to do it with time.hour and time.minute - I need half-hourly resolution averaging. time.time does something in xray, but it seems to never complete, and it doesn't work at all in pandas.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pd.Grouper support? 60303760
77824657 https://github.com/pydata/xarray/issues/364#issuecomment-77824657 https://api.github.com/repos/pydata/xarray/issues/364 MDEyOklzc3VlQ29tbWVudDc3ODI0NjU3 naught101 167164 2015-03-09T09:46:15Z 2015-03-09T09:46:15Z NONE

Heh, I meant the pandas docs; they don't specify the rule argument format either.

time.month and time.hour do exactly what I need. They aren't mentioned in the docs at http://xray.readthedocs.org/en/stable/groupby.html, and I'm not sure how I'd guess that they exist, so perhaps they should be added to that page? It doesn't appear to be something that exists in pandas.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pd.Grouper support? 60303760
77810787 https://github.com/pydata/xarray/issues/364#issuecomment-77810787 https://api.github.com/repos/pydata/xarray/issues/364 MDEyOklzc3VlQ29tbWVudDc3ODEwNzg3 naught101 167164 2015-03-09T07:34:49Z 2015-03-09T07:34:49Z NONE

Unfortunately I'm not familiar enough with pd.resample and pd.TimeGrouper to know the difference in what they can do. resample looks like it would cover my use-cases, although the docs are pretty limited, and don't actually specify the format of the rule argument...

One thing that I would like to be able to do that is not covered by resample, and might be covered by TimeGrouper is to group over month only (not month and year), in order to create a plot of mean seasonal cycle (at monthly resolution), or similarly, a daily cycle at hourly resolution. I haven't figured out if I can do that with TimeGrouper yet though.
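For the record, grouping on datetime components covers exactly this case (a sketch; these are the time.month / time.hour "virtual variables" mentioned elsewhere in this thread):

```python
# Mean seasonal cycle at monthly resolution, and mean daily cycle at hourly resolution.
seasonal_cycle = ds.groupby('time.month').mean('time')
daily_cycle = ds.groupby('time.hour').mean('time')
```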

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pd.Grouper support? 60303760
77807590 https://github.com/pydata/xarray/issues/364#issuecomment-77807590 https://api.github.com/repos/pydata/xarray/issues/364 MDEyOklzc3VlQ29tbWVudDc3ODA3NTkw naught101 167164 2015-03-09T06:49:55Z 2015-03-09T06:49:55Z NONE

Looks good to me. I don't know enough to be able to comment on the API question.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pd.Grouper support? 60303760


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);