issues

5 rows where state = "open" and user = 7799184 sorted by updated_at descending

#6688 2D extrapolation not working · rafa-guedes (7799184) · open · 3 comments · created 2022-06-12T16:11:04Z · updated 2022-06-14T06:19:20Z · CONTRIBUTOR · xarray (13221727) · issue · id 1268630439

What happened?

Extrapolation does not seem to be working on 2D data arrays. The area outside the input grid is NaN in the interpolated data when using kwargs={"fill_value": None} as arguments to the interp function (the extrapolation does work when using scipy.interpolate.interpn and passing fill_value=None along with bounds_error=False).

(Figure omitted: it showed the example data arrays from the code snippet below, with NaN outside the input grid.)

What did you expect to happen?

Area outside the input grid filled with extrapolated data.

Minimal Complete Verifiable Example

```python
import xarray as xr

da = xr.DataArray(
    data=[[1, 2, 3], [3, 4, 5]],
    coords=dict(y=[0, 1], x=[10, 20, 30]),
    dims=("y", "x"),
)

dai = da.interp(x=[25, 30, 35], y=[0, 1], kwargs={"fill_value": None})
```
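For comparison, the scipy behaviour described above can be checked directly. This is a minimal sketch (grid and values copied from the example above) showing that `scipy.interpolate.interpn` does extrapolate linearly when called with `bounds_error=False` and `fill_value=None`:

```python
import numpy as np
from scipy.interpolate import interpn

# the same grid and values as the DataArray above
points = ([0, 1], [10, 20, 30])                      # y, x coordinates
values = np.array([[1, 2, 3], [3, 4, 5]], dtype=float)

# interpolate along y=0 at x=25, 30 and 35; x=35 lies outside the grid
xi = np.array([[0.0, 25.0], [0.0, 30.0], [0.0, 35.0]])

out = interpn(points, values, xi, bounds_error=False, fill_value=None)
print(out)  # the x=35 point is linearly extrapolated, not NaN
```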

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.13.0-1031-gcp
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.20.2
pandas: 1.3.5
numpy: 1.19.5
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.02.0
distributed: None
matplotlib: 3.5.2
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: 0.18
sparse: None
setuptools: 59.8.0
pip: 22.1.1
conda: 4.12.0
pytest: 7.1.2
IPython: 7.33.0
sphinx: None
```
Reactions: none
#3942 Time dtype encoding defaulting to `int64` when writing netcdf or zarr · rafa-guedes (7799184) · open · 8 comments · created 2020-04-06T23:36:37Z · updated 2021-11-11T12:32:06Z · CONTRIBUTOR · xarray (13221727) · issue · id 595492608

Time dtype encoding defaults to "int64" for datasets with only zero-hour times when writing to netcdf or zarr.

This results in these datasets having a precision constrained by how the time units are defined (daily precision in the example below, since the units are defined as 'days since ...'). If we create a zarr dataset with this default encoding and subsequently append some non-zero times onto it, we lose the hour/minute/second information from the appended parts.

MCVE Code Sample

```python
In [1]: ds = xr.DataArray(
   ...:     data=[0.5],
   ...:     coords={"time": [datetime.datetime(2012, 1, 1)]},
   ...:     dims=("time",),
   ...:     name="x",
   ...: ).to_dataset()

In [2]: ds
Out[2]:
<xarray.Dataset>
Dimensions:  (time: 1)
Coordinates:
  * time     (time) datetime64[ns] 2012-01-01
Data variables:
    x        (time) float64 0.5

In [3]: ds.to_zarr("/tmp/x.zarr")

In [4]: ds1 = xr.open_zarr("/tmp/x.zarr")

In [5]: ds1.time.encoding
Out[5]:
{'chunks': (1,),
 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
 'filters': None,
 'units': 'days since 2012-01-01 00:00:00',
 'calendar': 'proleptic_gregorian',
 'dtype': dtype('int64')}

In [6]: dsnew = xr.DataArray(
   ...:     data=[1.5],
   ...:     coords={"time": [datetime.datetime(2012, 1, 1, 3, 0, 0)]},
   ...:     dims=("time",),
   ...:     name="x",
   ...: ).to_dataset()

In [7]: dsnew.to_zarr("/tmp/x.zarr", append_dim="time")

In [8]: ds1 = xr.open_zarr("/tmp/x.zarr")

In [9]: ds1.time.values
Out[9]:
array(['2012-01-01T00:00:00.000000000', '2012-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
```

Expected Output

```python
In [9]: ds1.time.values
Out[9]:
array(['2012-01-01T00:00:00.000000000', '2012-01-01T03:00:00.000000000'],
      dtype='datetime64[ns]')
```

Problem Description

Perhaps it would be useful to default the time dtype to "float64". Another option could be to use a finer time resolution by default than the one xarray infers from the dataset times (for instance, if the units would be inferred as "days since ...", use "seconds since ..." instead).
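The mechanism can be illustrated without xarray: encoding times as integers in units of days necessarily truncates sub-daily offsets. A numpy-only sketch of the round trip (illustrative only, not xarray's actual encoder):

```python
import numpy as np

times = np.array(["2012-01-01T00", "2012-01-01T03"], dtype="datetime64[ns]")

# int64 encoding with units of "days since 2012-01-01":
epoch = np.datetime64("2012-01-01")
days = ((times - epoch) // np.timedelta64(1, "D")).astype("int64")

# decoding yields whole days only: the 3-hour offset is gone
decoded = epoch + days * np.timedelta64(1, "D")

# finer units (e.g. seconds) would preserve the offset
seconds = ((times - epoch) // np.timedelta64(1, "s")).astype("int64")
decoded_s = epoch + seconds * np.timedelta64(1, "s")
```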


Versions

Output of `xr.show_versions()`:

```
In [10]: xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.5 (default, Nov 20 2019, 09:21:52) [GCC 9.2.1 20191008]
python-bits: 64
OS: Linux
OS-release: 5.3.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: en_NZ.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3
xarray: 0.15.0
pandas: 1.0.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.1.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.3
cfgrib: None
iris: None
bottleneck: None
dask: 2.14.0
distributed: 2.12.0
matplotlib: 3.2.0
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 45.3.0
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.13.0
sphinx: None
```
Reactions: none
#3486 Should performance be equivalent when opening with chunks or re-chunking a dataset? · rafa-guedes (7799184) · open · 2 comments · created 2019-11-05T14:14:58Z · updated 2021-08-31T15:28:04Z · CONTRIBUTOR · xarray (13221727) · issue · id 517799069

I was wondering whether the chunking behaviour should be equivalent in two different use cases:

  1. opening a dataset with the chunks option;
  2. re-chunking an existing dataset with the Dataset.chunk method.

I'm interested in performance when slicing across different dimensions. In my case the performance is quite different; please see the example below:

Open dataset with one single chunk along station dimension (fast for slicing one time)

```
In [1]: import xarray as xr

In [2]: dset = xr.open_dataset(
   ...:     "/source/wavespectra/tests/sample_files/spec20170101T00_spec.nc",
   ...:     chunks={"station": None},
   ...: )

In [3]: dset
Out[3]:
<xarray.Dataset>
Dimensions:    (direction: 24, frequency: 25, station: 14048, time: 249)
Coordinates:
  * time       (time) datetime64[ns] 2017-01-01 ... 2017-02-01
  * station    (station) float64 1.0 2.0 3.0 ... 1.405e+04 1.405e+04
  * frequency  (frequency) float32 0.04118 0.045298003 ... 0.40561208
  * direction  (direction) float32 90.0 75.0 60.0 45.0 ... 135.0 120.0 105.0
Data variables:
    longitude  (time, station) float32 dask.array<chunksize=(249, 14048), meta=np.ndarray>
    latitude   (time, station) float32 dask.array<chunksize=(249, 14048), meta=np.ndarray>
    efth       (time, station, frequency, direction) float32 dask.array<chunksize=(249, 14048, 25, 24), meta=np.ndarray>

In [4]: %time lats = dset.latitude.isel(time=0).values
CPU times: user 171 ms, sys: 49.2 ms, total: 220 ms
Wall time: 219 ms
```

Open dataset with many size=1 chunks along station dimension (fast for slicing one station, slow for slicing one time)

```
In [5]: dset = xr.open_dataset(
   ...:     "/source/wavespectra/tests/sample_files/spec20170101T00_spec.nc",
   ...:     chunks={"station": 1},
   ...: )

In [6]: dset
Out[6]:
<xarray.Dataset>
Dimensions:    (direction: 24, frequency: 25, station: 14048, time: 249)
Coordinates:
  * time       (time) datetime64[ns] 2017-01-01 ... 2017-02-01
  * station    (station) float64 1.0 2.0 3.0 ... 1.405e+04 1.405e+04
  * frequency  (frequency) float32 0.04118 0.045298003 ... 0.40561208
  * direction  (direction) float32 90.0 75.0 60.0 45.0 ... 135.0 120.0 105.0
Data variables:
    longitude  (time, station) float32 dask.array<chunksize=(249, 1), meta=np.ndarray>
    latitude   (time, station) float32 dask.array<chunksize=(249, 1), meta=np.ndarray>
    efth       (time, station, frequency, direction) float32 dask.array<chunksize=(249, 1, 25, 24), meta=np.ndarray>

In [7]: %time lats = dset.latitude.isel(time=0).values
CPU times: user 13.1 s, sys: 1.94 s, total: 15 s
Wall time: 11.1 s
```

Try rechunk station into one single chunk (still slow to slice one time)

```
In [8]: dset = dset.chunk({"station": None})

In [8]: dset
Out[8]:
<xarray.Dataset>
Dimensions:    (direction: 24, frequency: 25, station: 14048, time: 249)
Coordinates:
  * time       (time) datetime64[ns] 2017-01-01 ... 2017-02-01
  * station    (station) float64 1.0 2.0 3.0 ... 1.405e+04 1.405e+04
  * frequency  (frequency) float32 0.04118 0.045298003 ... 0.40561208
  * direction  (direction) float32 90.0 75.0 60.0 45.0 ... 135.0 120.0 105.0
Data variables:
    longitude  (time, station) float32 dask.array<chunksize=(249, 14048), meta=np.ndarray>
    latitude   (time, station) float32 dask.array<chunksize=(249, 14048), meta=np.ndarray>
    efth       (time, station, frequency, direction) float32 dask.array<chunksize=(249, 14048, 25, 24), meta=np.ndarray>

In [9]: %time lats = dset.latitude.isel(time=0).values
CPU times: user 9.06 s, sys: 1.13 s, total: 10.2 s
Wall time: 7.7 s
```
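One plausible explanation (an assumption, not verified against xarray's implementation) is that Dataset.chunk only layers a dask rechunk on top of the existing task graph, so the many tiny reads implied by the original chunking still happen. A dask-only sketch using the array shape from the example above:

```python
import dask.array as da

# analogue of opening with chunks={"station": 1}: 14048 single-station chunks
x = da.ones((249, 14048), chunks=(249, 1))
assert x.numblocks == (1, 14048)

# analogue of dset.chunk({"station": None}): merge back into one chunk
y = x.rechunk((249, 14048))
assert y.numblocks == (1, 1)

# the rechunked graph still contains all of the original per-station tasks,
# so computing y must still execute every tiny original chunk first
assert len(dict(y.__dask_graph__())) > x.numblocks[1]
```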

Reactions: none
#1379 xr.concat consuming too much resources · rafa-guedes (7799184) · open · 4 comments · created 2017-04-20T23:33:52Z · updated 2021-07-08T17:42:18Z · CONTRIBUTOR · xarray (13221727) · issue · id 223231729

Hi, I am reading several (~1000) small ascii files into Dataset objects and trying to concatenate them over one specific dimension, but I eventually blow my memory up. The file glob is not huge (~700M; my computer has ~16G), and I can do it fine if I only read the Datasets in, appending them to a list without concatenating them (my memory increases by only about 5% by the time I have read them all).

However, when trying to concatenate each file into one single Dataset upon reading over a loop, the processing speed drops drastically before I have read about 10% of the files, and memory usage keeps growing until it eventually blows up before I have read and concatenated 30% of them (a screenshot taken before it blew up showed memory usage under 20% at the start of the processing).

I was wondering if this is expected, or if there is something that could be improved to make this work more efficiently. I'm changing my approach now: extracting numpy arrays from the individual Datasets, concatenating those numpy arrays, and defining the final Dataset only at the end.

Thanks.
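The pattern described above is consistent with quadratic copying: concatenating inside the read loop re-copies everything read so far on every iteration, while collecting into a list and concatenating once copies each element only once. A numpy-only sketch of the difference (not xarray's internals; xr.concat may also add per-call alignment overheads):

```python
import numpy as np

arrays = [np.full(10, i, dtype=float) for i in range(1000)]

# concatenating inside the loop: iteration k copies ~10*k elements, O(n^2) total
out = arrays[0]
for a in arrays[1:]:
    out = np.concatenate([out, a])

# collecting first and concatenating once: each element copied once, O(n)
out2 = np.concatenate(arrays)

assert out.shape == out2.shape == (10000,)
```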

Reactions: 1 (+1: 1)
#733 coordinate variable not written in netcdf file in some cases · rafa-guedes (7799184) · open · 5 comments · created 2016-01-29T00:55:54Z · updated 2020-12-25T16:49:54Z · CONTRIBUTOR · xarray (13221727) · issue · id 129630652

I came across a situation where my coordinate variable was not dumped as a variable in the output netcdf file by dataset.to_netcdf. In my case I managed to fix it by simply adding variable attributes to this coordinate variable (which didn't have any).

This happened while creating a sliced dataset with the dataset.isel_points method, which automatically defines a new coordinate called points in the sliced dataset. If I dump that dataset as is, the coordinate isn't written as a variable in the netcdf; adding attributes to points, however, changes that. Here is an example:

```
In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('netcdf_file_with_longitude_and_latitude.nc')

In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:    (latitude: 576, longitude: 1152, time: 745)
Coordinates:
  * latitude   (latitude) float64 -89.76 -89.45 -89.14 -88.83 -88.52 -88.2 ...
  * longitude  (longitude) float64 0.0 0.3125 0.625 0.9375 1.25 1.562 1.875 ...
  * time       (time) datetime64[ns] 1979-01-01 1979-01-01T01:00:00 ...
Data variables:
    ugrd10m    (time, latitude, longitude) float64 0.2094 0.25 0.2799 0.3183 ...
    vgrd10m    (time, latitude, longitude) float64 -5.929 -5.918 -5.918 ...

In [4]: ds2 = ds.isel_points(longitude=[0], latitude=[0]).reset_coords()

In [5]: ds2
Out[5]:
<xarray.Dataset>
Dimensions:    (points: 1, time: 745)
Coordinates:
  * time       (time) datetime64[ns] 1979-01-01 1979-01-01T01:00:00 ...
  * points     (points) int64 0
Data variables:
    latitude   (points) float64 -89.76
    vgrd10m    (points, time) float64 -5.929 -6.078 -6.04 -5.958 -5.858 ...
    ugrd10m    (points, time) float64 0.2094 0.109 0.008546 -0.09828 -0.2585 ...
    longitude  (points) float64 0.0

In [6]: ds2['points'].attrs
Out[6]: OrderedDict()

In [7]: ds2.to_netcdf('/home/rafael/ncout1.nc')

In [8]: ds2['points'].attrs.update({'standard_name': 'site'})

In [9]: ds2['points'].attrs
Out[9]: OrderedDict([('standard_name', 'site')])

In [10]: ds2.to_netcdf('/home/rafael/ncout2.nc')
```

Here is the ncdump output for these two files:

```
$ ncdump -h /home/rafael/ncout1.nc
netcdf ncout1 {
dimensions:
        time = 745 ;
        points = 1 ;
variables:
        double time(time) ;
                time:_FillValue = 9.999e+20 ;
                string time:long_name = "verification time generated by wgrib2 function verftime()" ;
                time:reference_time = 283996800. ;
                time:reference_time_type = 0 ;
                string time:reference_date = "1979.01.01 00:00:00 UTC" ;
                string time:reference_time_description = "kind of product unclear, reference date is variable, min found reference date is given" ;
                string time:time_step_setting = "auto" ;
                time:time_step = 3600. ;
                string time:units = "seconds since 1970-01-01" ;
                time:calendar = "proleptic_gregorian" ;
        double latitude(points) ;
                string latitude:units = "degrees_north" ;
                string latitude:long_name = "latitude" ;
        double vgrd10m(points, time) ;
                string vgrd10m:short_name = "vgrd10m" ;
                string vgrd10m:long_name = "V-Component of Wind" ;
                string vgrd10m:level = "10 m above ground" ;
                string vgrd10m:units = "m/s" ;
        double ugrd10m(points, time) ;
                string ugrd10m:short_name = "ugrd10m" ;
                string ugrd10m:long_name = "U-Component of Wind" ;
                string ugrd10m:level = "10 m above ground" ;
                string ugrd10m:units = "m/s" ;
        double longitude(points) ;
                string longitude:units = "degrees_east" ;
                string longitude:long_name = "longitude" ;
}
```

```
$ ncdump -h /home/rafael/ncout2.nc
netcdf ncout2 {
dimensions:
        time = 745 ;
        points = 1 ;
variables:
        double time(time) ;
                time:_FillValue = 9.999e+20 ;
                string time:long_name = "verification time generated by wgrib2 function verftime()" ;
                time:reference_time = 283996800. ;
                time:reference_time_type = 0 ;
                string time:reference_date = "1979.01.01 00:00:00 UTC" ;
                string time:reference_time_description = "kind of product unclear, reference date is variable, min found reference date is given" ;
                string time:time_step_setting = "auto" ;
                time:time_step = 3600. ;
                string time:units = "seconds since 1970-01-01" ;
                time:calendar = "proleptic_gregorian" ;
        double latitude(points) ;
                string latitude:units = "degrees_north" ;
                string latitude:long_name = "latitude" ;
        double vgrd10m(points, time) ;
                string vgrd10m:short_name = "vgrd10m" ;
                string vgrd10m:long_name = "V-Component of Wind" ;
                string vgrd10m:level = "10 m above ground" ;
                string vgrd10m:units = "m/s" ;
        double ugrd10m(points, time) ;
                string ugrd10m:short_name = "ugrd10m" ;
                string ugrd10m:long_name = "U-Component of Wind" ;
                string ugrd10m:level = "10 m above ground" ;
                string ugrd10m:units = "m/s" ;
        double longitude(points) ;
                string longitude:units = "degrees_east" ;
                string longitude:long_name = "longitude" ;
        int64 points(points) ;
                points:standard_name = "site" ;
}
```

Reactions: none

Table schema:
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);