issues
1 row where user = 9569132 sorted by updated_at descending
id: 1286995366
node_id: I_kwDOAMm_X85Mtf2m
number: 6733
title: CFMaskCoder creates unnecessary copy for `uint16` variables
user: davidorme 9569132
state: closed
locked: 0
comments: 11
created_at: 2022-06-28T08:38:34Z
updated_at: 2023-09-13T12:43:26Z
closed_at: 2023-09-13T12:43:25Z
author_association: NONE
state_reason: completed
repo: xarray 13221727
type: issue

body:

**What is your issue?**

Hi, I have a bunch of global gridded data as a 20 year sequence of daily Matlab files that I am consolidating into annual netcdfs using https://docs.xarray.dev/en/stable/user-guide/io.html#scaling-and-type-conversions. The problem I then get is that the memory usage spikes unpredictably. I've been using:

```python
def report_mem(process, prefix=''):
```

The actual data ingestion and creation of the DataArray seems to be absolutely fine. With the DataArray created, the overall process memory is 35.69GB, as expected.

The next bit of the script is then:

```python
if pack:
    encoding = {
        canonical_name: {
            "zlib": True,
            "complevel": 6,
            "dtype": "uint16",
            "scale_factor": scale_factor,
            "_FillValue": 65535,
        }
    }
else:
    encoding = {canonical_name: {"zlib": True, "complevel": 6}}

xds.to_netcdf(out_file, encoding=encoding)
```

The list below shows the job-reported memory usage for one test run of the script over 19 years; the first number in the list is the peak RAM usage in GB. I'm running these on an HPC cluster and anything over 96GB gets killed, so only a handful of these jobs are actually completing; the memory requirements of the killed jobs might run even higher. Another odd thing is that which files complete is unpredictable: the memory usage is not stable for a particular year.

The files that do run end up with exactly the expected structure.

Any suggestions? It sounds like this should work!

**Python version and package versions**

```
(python3.10) $ python --version
Python 3.10.4
(python3.10) $ pip list
Package            Version
Bottleneck         1.3.4
certifi            2022.6.15
cftime             1.5.1.1
mkl-fft            1.3.1
mkl-random         1.2.2
mkl-service        2.4.0
netCDF4            1.5.7
numexpr            2.8.1
numpy              1.22.3
packaging          21.3
pandas             1.4.2
pip                21.2.4
psutil             5.8.0
pyparsing          3.0.4
python-dateutil    2.8.2
pytz               2022.1
scipy              1.7.3
setuptools         61.2.0
six                1.16.0
typing_extensions  4.1.1
wheel              0.37.1
xarray             0.20.1
(python3.10) [dorme@login-c SNU_Ryu_FPAR_LAI]$ python
Python 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
```

INSTALLED VERSIONS

```
commit: None
python: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-348.20.1.el8_5.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_GB', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.6.1
xarray: 0.20.1
pandas: 1.4.2
numpy: 1.22.3
scipy: 1.7.3
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.4
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 61.2.0
pip: 21.2.4
conda: None
pytest: None
IPython: None
sphinx: None
```

reactions:

```json
{
  "url": "https://api.github.com/repos/pydata/xarray/issues/6733/reactions",
  "total_count": 0,
  "+1": 0,
  "-1": 0,
  "laugh": 0,
  "hooray": 0,
  "confused": 0,
  "heart": 0,
  "rocket": 0,
  "eyes": 0
}
```
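The issue title points at an extra copy made when packing float data to `uint16`. As a plain-NumPy sketch of the CF scale/offset packing convention (unpacked ≈ packed × scale_factor), not xarray's actual CFMaskCoder implementation, the round trip looks like this; note that the packing step necessarily allocates a full-size temporary array alongside the original, which is one plausible way memory usage can spike during `to_netcdf`:

```python
import numpy as np

FILL_VALUE = np.uint16(65535)  # matches the _FillValue in the encoding above

def cf_pack(data, scale_factor):
    """Pack float data to uint16 using CF-style scaling.

    The `scaled` array is a full-size temporary alongside `data`, so
    peak memory during packing is roughly double the array size.
    """
    scaled = data / scale_factor  # full-size temporary copy
    return np.where(np.isnan(data), FILL_VALUE, np.round(scaled)).astype(np.uint16)

def cf_unpack(packed, scale_factor):
    """Reverse the packing; fill values become NaN."""
    out = packed.astype(np.float64) * scale_factor
    out[packed == FILL_VALUE] = np.nan
    return out

data = np.array([0.0, 0.5, 1.0, np.nan], dtype=np.float32)
packed = cf_pack(data, scale_factor=0.001)
restored = cf_unpack(packed, scale_factor=0.001)
```

With `scale_factor=0.001`, values in [0, 65.534] survive the round trip to within the quantisation step, and NaNs map to the fill value and back.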
```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
```
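A quick way to exercise this schema is with Python's stdlib `sqlite3`. The sketch below builds a trimmed copy of the table (the `REFERENCES` clauses are dropped since SQLite does not enforce foreign keys by default), inserts the row shown on this page (body and reactions elided), and reruns the page's query, 1 row where user = 9569132 sorted by updated_at descending:

```python
import sqlite3

# In-memory database with a trimmed version of the issues schema above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY, [node_id] TEXT, [number] INTEGER, [title] TEXT,
   [user] INTEGER, [state] TEXT, [locked] INTEGER, [assignee] INTEGER,
   [milestone] INTEGER, [comments] INTEGER, [created_at] TEXT,
   [updated_at] TEXT, [closed_at] TEXT, [author_association] TEXT,
   [active_lock_reason] TEXT, [draft] INTEGER, [pull_request] TEXT,
   [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT,
   [state_reason] TEXT, [repo] INTEGER, [type] TEXT
);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
""")

# The row from this page, with body and reactions elided for brevity.
conn.execute(
    "INSERT INTO issues (id, number, title, user, state, updated_at, type) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    (1286995366, 6733,
     "CFMaskCoder creates unnecessary copy for `uint16` variables",
     9569132, "closed", "2023-09-13T12:43:26Z", "issue"),
)

# The query behind this page: issues for one user, newest update first.
rows = conn.execute(
    "SELECT number, title, state FROM issues "
    "WHERE user = ? ORDER BY updated_at DESC",
    (9569132,),
).fetchall()
```

The `idx_issues_user` index is what makes the `WHERE user = ?` lookup cheap on a large issues table.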