issues

1 row where repo = 13221727, state = "open" and user = 4753005 sorted by updated_at descending

id: 1694671281
node_id: I_kwDOAMm_X85lAqGx
number: 7812
title: Appending to existing zarr store writes mostly NaN from dask arrays, but not numpy arrays
user: grahamfindlay (4753005)
state: open
locked: 0
comments: 1
created_at: 2023-05-03T19:30:13Z
updated_at: 2023-11-15T18:56:09Z
author_association: NONE
repo: xarray (13221727)
type: issue
reactions: 0

What is your issue?

I am using xarray to consolidate ~24 pre-existing, moderately large netCDF files into a single zarr store. Each file contains a DataArray with dimensions (channel, time), and no values are NaN. Each file's timeseries picks up right where the previous one left off, making this a perfect use case for out-of-memory file concatenation.

for i, f in enumerate(tqdm(files)):
    da = xr.open_dataarray(f)  # Open the netCDF file
    da = da.chunk({'channel': da.channel.size, 'time': 'auto'})  # Chunk along the time dimension
    if i == 0:
        da.to_zarr(zarr_file, mode="w")
    else:
        da.to_zarr(zarr_file, append_dim='time')
    da.close()

This always writes the first file correctly, and every other file appends without warning or error, but when I read the resulting zarr store, roughly 25% of all timepoints (probably whole time chunks) derived from files i > 0 are NaN.

Admittedly, the above code seems dangerous: there is no guarantee that da.chunk({'time': 'auto'}) will return chunks of the same size for every file, even though the files are nearly identical in size, and I don't know what the expected behavior is when the dask chunk sizes don't match the chunk sizes of the pre-existing zarr store. I checked the docs but didn't find an answer.
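
One way to check that concern up front is sketched below. This is not part of the original report; it assumes files and zarr_file are defined as above and that the store holds a single, named DataArray. It compares the chunk shape recorded in the existing zarr store with the dask time chunks that 'auto' would choose for the next file.

import xarray as xr

# Chunk shape that was actually written to the zarr store; the zarr backend
# records the on-disk chunk shape in the variable's encoding.
existing = xr.open_dataarray(zarr_file, engine="zarr")
store_chunks = existing.encoding.get("chunks")
existing.close()

# Dask chunk sizes that 'auto' picks for the next file to be appended.
da = xr.open_dataarray(files[1])
da = da.chunk({'channel': da.channel.size, 'time': 'auto'})
print("on-disk zarr chunk shape:", store_chunks)
print("dask time chunk sizes for the next file:", set(da.chunksizes['time']))
da.close()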

Even if the chunk sizes always do match, I am not sure what will happen when appending to an existing store. If the last chunk in the store before appending is not a full chunk, will it be "filled in" when new data are appended? Presumably it will, but this seems like it could cause problems with parallel writes, since the source chunks of the incoming dask array almost certainly won't line up with the chunk boundaries of the zarr store unless you have been careful to make it so.
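
To make the alignment concern concrete, here is a toy arithmetic sketch (the numbers are hypothetical, not taken from my data): if the number of samples already in the store is not a multiple of the zarr chunk size, an appended dask chunk of the same nominal length straddles a zarr chunk boundary, so two dask tasks can end up writing into the same zarr chunk.

# Hypothetical numbers for illustration only.
store_len = 1_048_573   # samples already in the store along the time dimension
zarr_chunk = 100_000    # on-disk zarr chunk size along time

partial = store_len % zarr_chunk  # samples sitting in the last, partial zarr chunk
print("samples in the last, partial zarr chunk:", partial)                 # 48573
print("room left in that chunk before a boundary:", zarr_chunk - partial)  # 51427
# An appended dask chunk of length zarr_chunk therefore spans a zarr chunk
# boundary: its first (zarr_chunk - partial) samples complete the partial
# chunk and the rest spill into the next one.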

In any case, the following change seems to solve the issue, and the zarr store no longer contains NaN values.

for i, f in enumerate(tqdm(files)):
    da = xr.open_dataarray(f)  # Open the netCDF file
    if i == 0:
        da = da.chunk({'channel': da.channel.size, 'time': 'auto'})  # Chunk along the time dimension
        da.to_zarr(zarr_file, mode="w")
    else:
        da.to_zarr(zarr_file, append_dim='time')
    da.close()

I didn't file this as a bug, because I was doing something that was a bad idea, but it does seem like to_zarr should have stopped me from doing it in the first place.
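
As a quick sanity check on the result (a sketch, not from the original report; it assumes zarr_file points at the finished store and that the store holds a single DataArray), the store can be re-opened and scanned for NaN values:

import xarray as xr

result = xr.open_dataarray(zarr_file, engine="zarr")
has_nan = bool(result.isnull().any().compute())
print("store contains NaN:", has_nan)
result.close()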

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);