html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2912#issuecomment-542369777,https://api.github.com/repos/pydata/xarray/issues/2912,542369777,MDEyOklzc3VlQ29tbWVudDU0MjM2OTc3Nw==,668201,2019-10-15T19:32:50Z,2019-10-15T19:32:50Z,NONE,"Thanks for the explanations @jhamman and @shoyer :)
Actually, it turns out that I was not using particularly small chunks; rather, the filesystem backing /tmp was faulty. After trying again on a reliable filesystem, the results are much more reasonable.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,435535284
https://github.com/pydata/xarray/issues/2912#issuecomment-533801682,https://api.github.com/repos/pydata/xarray/issues/2912,533801682,MDEyOklzc3VlQ29tbWVudDUzMzgwMTY4Mg==,668201,2019-09-21T14:21:17Z,2019-09-21T14:21:17Z,NONE,"> There are ways to side step some of these challenges (`save_mfdataset` and the distributed dask scheduler)
@jhamman Could you elaborate on these ways?
I am having severe slow-downs when writing Datasets block by block (backed by dask). I have also noticed that the slowdowns do not occur when writing to a ramdisk. Here are the timings of `to_netcdf` with the default engine and encoding (the nc file is 4.3 GB):
- When writing to ramdisk (`/dev/shm/`): 2min 1s
- When writing to `/tmp/`: 27min 28s
- When writing to `/tmp/` after `.load()`, as suggested here: 34s (`.load` takes 1min 43s)
The workaround suggested here works, but the datasets may not always fit in memory, and it defeats the essential purpose of dask.
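For reference, here is a minimal sketch of the two approaches mentioned above, i.e. writing under the distributed scheduler and splitting the output across several files with `save_mfdataset`. The input file, the chunking, the presence of a `time` coordinate and the output paths are assumptions made for the sake of the example:
```python
import xarray as xr
from dask.distributed import Client

# Approach 1: run the write under the distributed scheduler.
client = Client()  # local cluster; assumed setup
ds = xr.open_dataset('input.nc', chunks={'time': 100})  # hypothetical file and chunking
ds.to_netcdf('/tmp/output.nc')

# Approach 2: split the dataset along time and write each piece to its own
# file, so the pieces can be written in parallel with save_mfdataset.
years, datasets = zip(*ds.groupby('time.year'))
paths = ['/tmp/output_%s.nc' % y for y in years]
xr.save_mfdataset(datasets, paths)
```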
Note: I am using dask 2.3.0 and xarray 0.12.3","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,435535284
https://github.com/pydata/xarray/issues/1378#issuecomment-295657656,https://api.github.com/repos/pydata/xarray/issues/1378,295657656,MDEyOklzc3VlQ29tbWVudDI5NTY1NzY1Ng==,668201,2017-04-20T09:50:19Z,2017-04-20T09:53:33Z,NONE,"I cannot see a use case in which repeated dims actually make sense.
In my case, this situation originates from h5 files which indeed contain repeated dimensions (`variables(dimensions): uint16 B0(phony_dim_0,phony_dim_0), ..., uint8 VAA(phony_dim_1,phony_dim_1)`), so xarray is not to blame here.
These are ""dummy"" dimensions, not associated with physical values. What we do to circumvent this problem is ""re-dimension"" all variables.
Maybe a safe approach would be for `open_dataset` to raise a warning by default when encountering such variables, possibly with an option to perform automatic or custom dimension renaming to avoid repeated dims.
I also agree with @shoyer that failing loudly when operating on such DataArrays instead of providing confusing results would be an improvement.","{""total_count"": 5, ""+1"": 1, ""-1"": 4, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,222676855
https://github.com/pydata/xarray/issues/1378#issuecomment-295593740,https://api.github.com/repos/pydata/xarray/issues/1378,295593740,MDEyOklzc3VlQ29tbWVudDI5NTU5Mzc0MA==,668201,2017-04-20T06:11:02Z,2017-04-20T06:11:02Z,NONE,"Right, positional indexing also behaves unexpectedly in this case, though I understand it's tricky and should probably be discouraged:
```python
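# Assumed context: A is a 2-D DataArray whose two dimensions share the same
# name (e.g. ('dim0', 'dim0'), as obtained from a file with repeated dims)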
A[0, :]  # returns A
A[:, 0]  # returns A.isel(dim0=0)
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,222676855