issue_comments
13 rows where author_association = "CONTRIBUTOR" and user = 1492047 sorted by updated_at descending
1532601237 | Thomas-Z 1492047 | CONTRIBUTOR | created_at: 2023-05-03T07:58:22Z | updated_at: 2023-05-03T07:58:22Z
html_url: https://github.com/pydata/xarray/issues/7516#issuecomment-1532601237
issue_url: https://api.github.com/repos/pydata/xarray/issues/7516 | node_id: IC_kwDOAMm_X85bWaOV
issue: Dataset.where performances regression. (1575938277)

Hello, I'm not sure the performance problems were fully addressed (we're now forced to fully compute/load the selection expression), but the changes made in the last versions make this issue irrelevant and I think we can close it. Thank you!

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
1451754167 | Thomas-Z 1492047 | CONTRIBUTOR | created_at: 2023-03-02T11:59:47Z | updated_at: 2023-03-02T11:59:47Z
html_url: https://github.com/pydata/xarray/issues/7516#issuecomment-1451754167
issue_url: https://api.github.com/repos/pydata/xarray/issues/7516 | node_id: IC_kwDOAMm_X85WiAK3
issue: Dataset.where performances regression. (1575938277)

The TypeError:

```
cond argument is <xarray.Variable (num_lines: 5761870, num_pixels: 71)> ... but must be a <class 'xarray.core.dataset.Dataset'> or <class 'xarray.core.dataarray.DataArray'>
```

Doing it like this seems to be working correctly (and is fast enough):

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
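The snippet referred to above did not survive the export, but the error message pins down the shape of the fix: Dataset.where rejects a bare xarray.Variable as cond, so the condition has to be built as a DataArray (or Dataset) first. A minimal sketch, with hypothetical variable names and sizes:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the (num_lines, num_pixels) data in the thread.
ds = xr.Dataset(
    {"ssh": (("num_lines", "num_pixels"), np.random.rand(1_000, 71))}
)

# ds["ssh"].variable > 0.5 would yield an xarray.Variable, which
# Dataset.where rejects with the TypeError quoted above.
cond = ds["ssh"] > 0.5  # a DataArray, which Dataset.where accepts
subset = ds.where(cond, drop=True)
```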
1449714522 | Thomas-Z 1492047 | CONTRIBUTOR | created_at: 2023-03-01T09:43:27Z | updated_at: 2023-03-01T09:43:27Z
html_url: https://github.com/pydata/xarray/issues/7516#issuecomment-1449714522
issue_url: https://api.github.com/repos/pydata/xarray/issues/7516 | node_id: IC_kwDOAMm_X85WaONa
issue: Dataset.where performances regression. (1575938277)

I know xarray has to keep more information regarding coordinates and dimensions, but doing this (just dask arrays):

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
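The dask-only snippet is also missing from the export. For comparison, a sketch of what such a pure-dask selection looks like (hypothetical shapes and threshold): boolean masking on dask arrays stays lazy, which is the baseline this comment measures Dataset.where against.

```python
import dask.array as da

# Lazy array comparable in shape to the dataset in the thread.
x = da.random.random((5_761_870, 71), chunks=(100_000, 71))

mask = x > 0.5       # lazy boolean mask
selected = x[mask]   # still lazy; flattens to 1-D with unknown chunk sizes
result = selected.compute()  # memory is only spent here
```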
1447798846 | Thomas-Z 1492047 | CONTRIBUTOR | created_at: 2023-02-28T08:54:16Z | updated_at: 2023-02-28T11:24:11Z
html_url: https://github.com/pydata/xarray/issues/7516#issuecomment-1447798846
issue_url: https://api.github.com/repos/pydata/xarray/issues/7516 | node_id: IC_kwDOAMm_X85WS6g-
issue: Dataset.where performances regression. (1575938277)

Just tried it, and it does not seem identical at all to what was happening earlier. This is the kind of dataset I'm working with:

With this selection:

The old xarray takes a little less than 1 minute and less than 6 GB of memory. The new xarray with compute did not finish and had to be stopped before it consumed my 16 GB of memory.

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
863271758 | Thomas-Z 1492047 | CONTRIBUTOR | created_at: 2021-06-17T14:08:54Z | updated_at: 2021-06-17T14:11:47Z
html_url: https://github.com/pydata/xarray/issues/1329#issuecomment-863271758
issue_url: https://api.github.com/repos/pydata/xarray/issues/1329 | node_id: MDEyOklzc3VlQ29tbWVudDg2MzI3MTc1OA==
issue: Cannot open NetCDF file if dimension with time coordinate has length 0 (`ValueError` when decoding CF datetime) (217216935)

Hello,

Using the same code sample:

```python
import numpy
import xarray

ds = xarray.Dataset(
    {"a": ("x", [])},
    coords={"x": numpy.zeros(shape=0, dtype="M8[ns]")},
)
ds.to_netcdf("/tmp/test.nc")
xarray.open_dataset("/tmp/test.nc")
```

It works on xarray 0.17 but does not work anymore with xarray 0.18 & 0.18.2. This addition seems to be responsible (coming from this commit).

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
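The failure here happens while decoding the empty CF datetime coordinate. As a workaround (not mentioned in the thread), the affected versions should still open the file if time decoding is skipped entirely:

```python
import xarray as xr

# decode_times=False bypasses CF datetime decoding; "x" comes back as raw
# numbers instead of datetime64, but the file opens.
ds = xr.open_dataset("/tmp/test.nc", decode_times=False)
```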
674361521 | Thomas-Z 1492047 | CONTRIBUTOR | created_at: 2020-08-15T07:20:53Z | updated_at: 2020-08-15T07:20:53Z
html_url: https://github.com/pydata/xarray/pull/4333#issuecomment-674361521
issue_url: https://api.github.com/repos/pydata/xarray/issues/4333 | node_id: MDEyOklzc3VlQ29tbWVudDY3NDM2MTUyMQ==
issue: Support explicitly setting a dimension order with to_dataframe() (676696822)

My pleasure. I've been a user for a few years now; I'll gladly give something back whenever I can.

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
672084936 | Thomas-Z 1492047 | CONTRIBUTOR | created_at: 2020-08-11T16:49:19Z | updated_at: 2020-08-11T16:49:19Z
html_url: https://github.com/pydata/xarray/pull/4333#issuecomment-672084936
issue_url: https://api.github.com/repos/pydata/xarray/issues/4333 | node_id: MDEyOklzc3VlQ29tbWVudDY3MjA4NDkzNg==
issue: Support explicitly setting a dimension order with to_dataframe() (676696822)

Do we want DataArray.to_dataframe to be consistent with Dataset.to_dataframe regarding the default dimension ordering (i.e. alphabetical), or do we want to keep the current behavior (DataArray.dims order)?

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
672030936 | Thomas-Z 1492047 | CONTRIBUTOR | created_at: 2020-08-11T15:51:02Z | updated_at: 2020-08-11T15:54:40Z
html_url: https://github.com/pydata/xarray/pull/4333#issuecomment-672030936
issue_url: https://api.github.com/repos/pydata/xarray/issues/4333 | node_id: MDEyOklzc3VlQ29tbWVudDY3MjAzMDkzNg==
issue: Support explicitly setting a dimension order with to_dataframe() (676696822)

Hello, I actually followed @shoyer's suggestion to use the to_dask_dataframe parameter name. And I just realized I only did half the work: I'll add this parameter to DataArray.to_dataframe if you validate this name.

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
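The parameter discussed in this PR was merged as dim_order, matching the existing to_dask_dataframe keyword. A minimal usage sketch against the current xarray API:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"v": (("b", "a"), np.arange(6).reshape(2, 3))},
    coords={"b": [0, 1], "a": [10, 20, 30]},
)

# Default: index levels are sorted alphabetically -> ['a', 'b'].
df_default = ds.to_dataframe()

# dim_order sets the MultiIndex level order explicitly.
df_ordered = ds.to_dataframe(dim_order=["b", "a"])
```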
410792506 | Thomas-Z 1492047 | CONTRIBUTOR | created_at: 2018-08-06T17:47:23Z | updated_at: 2019-01-09T15:18:36Z
html_url: https://github.com/pydata/xarray/issues/2304#issuecomment-410792506
issue_url: https://api.github.com/repos/pydata/xarray/issues/2304 | node_id: MDEyOklzc3VlQ29tbWVudDQxMDc5MjUwNg==
issue: float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray (343659822)

To explain the full context and why it became some kind of a problem for us: we're experimenting with the parquet format (via pyarrow), and we first did something like: netcdf file -> netcdf4 -> pandas -> pyarrow -> pandas (when read later on). We're now looking at xarray and the huge ease of access it offers to netcdf-like data, and we tried something similar: netcdf file -> xarray -> pandas -> pyarrow -> pandas (when read later on). Our problem appears when we read and compare the data stored with these 2 approaches. The difference between the 2 was sometimes larger than expected/acceptable (10e-6 for float32 if I'm not mistaken). We're not constraining any type, letting the system and modules decide how to encode what, and in the end we have significantly different values. There might be something wrong in our process, but it originates here with this float32/float64 choice, so we thought it might be a problem. Thanks for taking the time to look into this.

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
432546977 | Thomas-Z 1492047 | CONTRIBUTOR | created_at: 2018-10-24T07:38:31Z | updated_at: 2018-10-24T07:38:31Z
html_url: https://github.com/pydata/xarray/issues/2501#issuecomment-432546977
issue_url: https://api.github.com/repos/pydata/xarray/issues/2501 | node_id: MDEyOklzc3VlQ29tbWVudDQzMjU0Njk3Nw==
issue: open_mfdataset usage and limitations. (372848074)

Thank you for looking into this. I just want to point out that I'm not that much concerned with the "slow performance" but much more with the memory consumption and the limitation it implies.

```python
from glob import glob

import xarray as xr

all_files = glob('...TP110.nc')

display(xr.open_dataset(all_files[0]))
display(xr.open_dataset(all_files[1]))
```

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
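For reference, the usual way to keep memory bounded over many files is to let open_mfdataset build one lazy, dask-backed dataset rather than opening files individually. A sketch against the current API, not the 2018 version discussed in the issue; the "time" concatenation dimension is an assumption here, not something stated in the thread:

```python
import xarray as xr

# open_mfdataset is lazy: each file becomes a dask-backed piece of the
# combined dataset, so memory is only spent when a selection is loaded.
ds = xr.open_mfdataset(sorted(all_files), combine="nested", concat_dim="time")
subset = ds.isel(time=0).load()  # memory is spent only on this slice
```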
411385081 | Thomas-Z 1492047 | CONTRIBUTOR | created_at: 2018-08-08T12:18:02Z | updated_at: 2018-08-22T07:14:58Z
html_url: https://github.com/pydata/xarray/issues/2304#issuecomment-411385081
issue_url: https://api.github.com/repos/pydata/xarray/issues/2304 | node_id: MDEyOklzc3VlQ29tbWVudDQxMTM4NTA4MQ==
issue: float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray (343659822)

So, a more complete example showing this problem.

NetCDF file used in the example: test.nc.zip

```python
from netCDF4 import Dataset
import xarray as xr
import numpy as np
import pandas as pd

d = Dataset("test.nc")
v = d.variables['var']
print(v)
# <class 'netCDF4._netCDF4.Variable'>
# int16 var(idx)
#     _FillValue: 32767
#     scale_factor: 0.01
# unlimited dimensions:
# current shape = (2,)
# filling on

df_nc = pd.DataFrame(data={'var': v[:]})
print(df_nc)
#      var
# 0  21.94
# 1  27.04

ds = xr.open_dataset("test.nc")
df_xr = ds['var'].to_dataframe()

# Comparing both dataframes with float32 precision (1e-6)
mask = np.isclose(df_nc['var'], df_xr['var'], rtol=0, atol=1e-6)
print(mask)
# [False  True]

print(df_xr)
#            var
# idx
# 0    21.939999
# 1    27.039999

# Changing the type and rounding the xarray dataframe
df_xr2 = df_xr.astype(np.float64).round(
    int(np.ceil(-np.log10(ds['var'].encoding['scale_factor'])))
)
mask = np.isclose(df_nc['var'], df_xr2['var'], rtol=0, atol=1e-6)
print(mask)
# [ True  True]

print(df_xr2)
#      var
# idx
# 0  21.94
# 1  27.04
```

As you can see, the problem appears early in the process (it is not related to the way data are stored in parquet later on), and yes, rounding values does solve it.

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
410963647 | Thomas-Z 1492047 | CONTRIBUTOR | created_at: 2018-08-07T07:37:06Z | updated_at: 2018-08-07T07:37:06Z
html_url: https://github.com/pydata/xarray/issues/2346#issuecomment-410963647
issue_url: https://api.github.com/repos/pydata/xarray/issues/2346 | node_id: MDEyOklzc3VlQ29tbWVudDQxMDk2MzY0Nw==
issue: Dataset/DataArray to_dataframe() dimensions order mismatch. (347895055)

I was kind of expecting to get the order shown when looking at the … Two things are still bothering me though:

- For the first point, I don't think anything should be done; it's a special case, and even if it could be easily tested, it might be ugly.
- For the second point, I would not change anything about the way the order is defined now; it's consistent and easily predictable. Instead, I would add an additional optional parameter to …

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
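For context, a small illustration of the mismatch this issue describes (a sketch against the current API; the behavior is as stated in the issue title, not code taken from the thread): Dataset.to_dataframe orders index levels alphabetically, while DataArray.to_dataframe follows the array's own dims order.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("b", "a"),
    coords={"b": [0, 1], "a": [10, 20, 30]},
    name="v",
)

print(da.to_dataframe().index.names)               # dims order: ['b', 'a']
print(da.to_dataset().to_dataframe().index.names)  # alphabetical: ['a', 'b']
```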
410675562 | Thomas-Z 1492047 | CONTRIBUTOR | created_at: 2018-08-06T11:19:30Z | updated_at: 2018-08-06T11:19:30Z
html_url: https://github.com/pydata/xarray/issues/2304#issuecomment-410675562
issue_url: https://api.github.com/repos/pydata/xarray/issues/2304 | node_id: MDEyOklzc3VlQ29tbWVudDQxMDY3NTU2Mg==
issue: float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray (343659822)

You're right when you say …

You'll have a float64 in the end, but you won't get your precision back, and it might be a problem in some cases. I understand the benefits of using float32 on the memory side, but it is kind of a problem for us each time we have variables using scale factors. I'm surprised this issue (if it is considered one) does not bother more people.

reactions: { "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
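One way to keep full precision (an alternative to the rounding shown earlier in the thread, not a fix taken from it) is to disable automatic mask-and-scale decoding and apply the CF attributes manually in float64:

```python
import numpy as np
import xarray as xr

# Read the raw packed int16 values; the CF attributes stay in .attrs.
raw = xr.open_dataset("test.nc", mask_and_scale=False)
var = raw["var"]

scale = var.attrs.get("scale_factor", 1.0)
offset = var.attrs.get("add_offset", 0.0)
fill = var.attrs.get("_FillValue")

# Scale in float64 so no precision is lost to an intermediate float32.
decoded = var.astype(np.float64) * scale + offset
if fill is not None:
    decoded = decoded.where(var != fill)  # mask fill values after scaling
```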
```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```