issue_comments
4 rows where issue = 412180435 ("Automatic dtype encoding in to_netcdf"), sorted by updated_at descending
**DWesl** (CONTRIBUTOR) · 2022-04-28T18:57:26Z (updated 2022-04-28T19:01:34Z)
https://github.com/pydata/xarray/issues/2780#issuecomment-1112552981

I found a way to get the sample dataset to save to a smaller netCDF:

```python
import os

import numpy as np
import numpy.testing as np_tst
import pandas as pd
import xarray as xr

# Original example
# Create pandas DataFrame
df = pd.DataFrame(
    np.random.randint(low=0, high=10, size=(100000, 5)),
    columns=["a", "b", "c", "d", "e"],
)

# Make 'e' a column of strings
df["e"] = df["e"].astype(str)

# Make 'f' a column of floats
DIGITS = 1
df["f"] = np.around(10 ** DIGITS * np.random.random(size=df.shape[0]), DIGITS)

# Save to csv
df.to_csv("df.csv")

# Convert to an xarray Dataset
ds = xr.Dataset.from_dataframe(df)

# Save NetCDF file
ds.to_netcdf("ds.nc")

# Additions
def dtype_for_int_array(arry: "array of integers") -> np.dtype:
    """Find the smallest integer dtype that will encode arry."""
    ...  # body truncated in this export


def dtype_for_str_array(
    arry: "xr.DataArray of strings", for_disk: bool = True
) -> np.dtype:
    """Find a good string dtype for encoding arry."""
    ...  # body truncated in this export


# Set up encoding for saving to netCDF
encoding = {}
for name, var in ds.items():
    encoding[name] = {}
    ...  # loop body truncated in this export

ds.to_netcdf("ds_encoded.nc", encoding=encoding)

# Display results
stat_csv = os.stat("df.csv")
stat_nc = os.stat("ds.nc")
stat_enc = os.stat("ds_encoded.nc")
sizes = pd.Series(
    index=["CSV", "default netCDF", "encoded netCDF"],
    data=[stats.st_size for stats in [stat_csv, stat_nc, stat_enc]],
    name="File sizes",
)
print("File sizes (kB):", np.right_shift(sizes, 10), sep="\n", end="\n\n")
print("Sizes relative to CSV:", sizes / sizes.iloc[0], sep="\n", end="\n\n")

# Check that I didn't break the floats
from_disk = xr.open_dataset("ds_encoded.nc")
np_tst.assert_allclose(
    ds["f"], from_disk["f"], rtol=10 ** -DIGITS, atol=10 ** -DIGITS
)
```

Output:

```
Sizes relative to CSV:
CSV               1.000000
default netCDF    5.230366
encoded netCDF    0.708063
Name: File sizes, dtype: float64

 10M ds.nc
1.9M df.csv
1.4M ds_encoded.nc
```

I added a column of floats with one digit before and after the decimal point to the example dataset, because why not.

Does this satisfy your use-case? Should I turn the giant loop into a function to go into xarray somewhere? If so, I should probably tie the float handling in with the new …
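The bodies of the two helpers above are cut off in this export. As a rough sketch only (not the comment author's actual implementation), a `dtype_for_int_array` could pick the narrowest signed integer type whose range covers the data:

```python
import numpy as np


def dtype_for_int_array(arry):
    """Find the smallest integer dtype that will encode arry.

    Sketch under assumptions: try signed integer widths from
    narrowest to widest and return the first whose range covers
    both the minimum and maximum of the data.
    """
    low, high = int(np.min(arry)), int(np.max(arry))
    for candidate in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(candidate)
        if info.min <= low and high <= info.max:
            return np.dtype(candidate)
    return np.dtype(np.int64)


# Values 0..9 from the example fit comfortably in int8
print(dtype_for_int_array(np.array([0, 5, 9])))    # int8
print(dtype_for_int_array(np.array([0, 40000])))   # int32
```

`np.min_scalar_type` does something similar for a single scalar; the loop above generalizes that idea to a whole array.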
**stale[bot]** (NONE) · 2022-04-28T13:37:57Z
https://github.com/pydata/xarray/issues/2780#issuecomment-1112215165

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the …
**DWesl** (CONTRIBUTOR) · 2020-05-24T20:45:43Z
https://github.com/pydata/xarray/issues/2780#issuecomment-633296515

For the example given, this would mean finding …

For the character/string variables, the smallest representation varies a bit more: a fixed-width encoding (…

Doing this correctly for floating-point types would be difficult, but I think that's outside the scope of this issue.

Hopefully this gives you something to work with.

```python
import numpy as np


def dtype_for_int_array(arry: "array of integers") -> np.dtype:
    """Find the smallest integer dtype that will encode arry."""
    ...  # body truncated in this export
```

Looking at …

It looks like pandas always uses object dtype for string arrays, so the numbers in that column likely reflect the size of an array of pointers. XArray lets you use a dtype of "S1" or "U1", but I haven't found the equivalent of the …
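To illustrate the fixed-width point, here is a hypothetical sketch of what a `dtype_for_str_array` matching the truncated signature above might compute. The width calculation and the bytes-vs-characters trade-off are my assumptions, not the comment author's actual code:

```python
import numpy as np
import pandas as pd


def dtype_for_str_array(arry, for_disk: bool = True) -> np.dtype:
    """Find a fixed-width string dtype wide enough for every element.

    Sketch under assumptions: netCDF stores text as bytes, so
    for_disk=True picks "S<n>" (one byte per character); "U<n>"
    costs four bytes per character in NumPy's in-memory layout.
    """
    width = max((len(s) for s in arry), default=1)
    kind = "S" if for_disk else "U"
    return np.dtype(f"{kind}{width}")


col = pd.Series(["7", "42", "9"])
print(col.dtype)                    # object: pandas stores pointers, not chars
print(dtype_for_str_array(col))     # fixed-width, two bytes per element
```

The `object` dtype is why the string column looks so large in memory: each element is a pointer to a separate Python string object, not the characters themselves.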
**nedclimaterisk** (CONTRIBUTOR) · 2019-02-20T00:00:41Z
https://github.com/pydata/xarray/issues/2780#issuecomment-465362210
```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```
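As a usage sketch of the schema above, the `idx_issue_comments_issue` index serves exactly the "rows where issue = …, sorted by updated_at" query that produced this page. The inserted row values below are made up for illustration:

```python
import sqlite3

# Build an in-memory copy of the schema (simplified: foreign-key
# clauses to the users/issues tables omitted).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY, [node_id] TEXT,
   [user] INTEGER, [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [performed_via_github_app] TEXT, [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
""")
conn.execute(
    "INSERT INTO issue_comments (id, issue, updated_at) VALUES (?, ?, ?)",
    (1112552981, 412180435, "2022-04-28T19:01:34Z"),
)

# The index narrows the scan to one issue; ORDER BY handles the rest.
rows = conn.execute(
    "SELECT id FROM issue_comments WHERE issue = ? ORDER BY updated_at DESC",
    (412180435,),
).fetchall()
print(rows)  # [(1112552981,)]
```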