issues
2 rows where user = 35689176 sorted by updated_at descending
id: 2004250796
node_id: I_kwDOAMm_X853dnCs
number: 8473
title: Regular (linspace) Coordinates/Index
user: JulienBrn 35689176
state: open
locked: 0
comments: 9
created_at: 2023-11-21T13:08:08Z
updated_at: 2024-04-18T22:11:39Z
author_association: NONE

body:

**Is your feature request related to a problem?**

Most of my dimension coordinates fall into three categories:
- Categorical coordinates
- Pandas multiindex
- Regular coordinates, that is of the form …

I feel the way the latter is currently handled in xarray is suboptimal (unless I'm misusing this great library), as it has the following drawbacks:

- Visually: it is not obvious that the coordinate is a linear space; when printing the dataset/array we see some of the values.
- Computation usage: applying scipy functions that require a regular sampling (for example scipy spectrogram) is very annoying, as one has to extract the fs and check that the coordinate is indeed regularly sampled. I currently use …

Note: it is not obvious to me from the documentation whether this is more of a "coordinate" enhancement or an "index" enhancement (index being, to my knowledge, discussed only in this part of the documentation).

**Describe the solution you'd like**

A new type of index/coordinate where only the "start" and "fs" are stored. The `_repr_inline` may look like "RegularIndex(start, end, step=1/fs)".

Perhaps another, more generic, possibility would be a type of coordinate system that is expressed as a transform from …

**Describe alternatives you've considered**

I have tried writing an Index subclass, but I struggle on the … There may be many other problems, as I have just quickly tried. Should this be a viable approach, I may be open to writing a version myself and posting it for review. However, I am relatively new to xarray and would appreciate knowing first whether I am on the right track.

**Additional context**

No response
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8473/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
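To make the "computation usage" drawback above concrete, here is a minimal sketch, not taken from the issue: the bookkeeping one currently needs before handing a regularly sampled coordinate to scipy.signal.spectrogram. The helper name regular_fs, the tolerance, and the toy data are assumptions; a RegularIndex storing only start and fs would make this check unnecessary.

```python
import numpy as np
import xarray as xr
from scipy.signal import spectrogram

def regular_fs(coord: xr.DataArray, rtol: float = 1e-6) -> float:
    """Return the sampling frequency of a coordinate, failing if it is not regularly spaced."""
    steps = np.diff(coord.values)
    if not np.allclose(steps, steps[0], rtol=rtol):
        raise ValueError(f"coordinate {coord.name!r} is not regularly sampled")
    return float(1.0 / steps[0])

# Toy data (assumption): a signal sampled at 250 Hz along a regular "t" coordinate.
t = np.arange(0.0, 10.0, 1 / 250.0)
da = xr.DataArray(np.random.randn(t.size), coords={"t": t}, dims="t")

# Extract fs (with the regularity check) before calling scipy.
freqs, seg_times, Sxx = spectrogram(da.values, fs=regular_fs(da["t"]))
```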
id: 2109989289
node_id: I_kwDOAMm_X859w-Gp
number: 8687
title: Add a filter option to stack
user: JulienBrn 35689176
state: open
locked: 0
comments: 1
created_at: 2024-01-31T12:28:15Z
updated_at: 2024-01-31T18:15:43Z
author_association: NONE

body:

**Is your feature request related to a problem?**

I currently have a dataset where one of my dimensions (let's call it x) is of size 10^5. Later in my analysis, I want to consider pairs of values of that dimension, but not all of them. Note that considering all of them would lead to 10^10 entries (not viable memory usage), when in practice I only want to consider around 10^6 of them. Therefore, the final dataset should have a dimension x_pair which is the stacking of dimensions x_1 and x_2. However, it seems I have no straightforward way of using stack for that purpose: whatever I do, it will create a 10^8 array that I can then filter using where(drop=True). Should this problem be unclear, I could provide a minimal example, but hopefully the explanation of the issue is enough (my current code is provided as additional context).

**Describe the solution you'd like**

Have a filter parameter to stack. The filter function should take a dataset and return the set of elements that should appear in the final multiindex.

**Describe alternatives you've considered**

Currently, I have solved my problem by dividing the dataset into many smaller datasets, stacking and filtering each of these datasets separately, and then merging the filtered datasets together. Note: the stacking time of all the smaller datasets, without any parallelization, still feels very long (almost 2h). I do not know whether this is sensible.

**Additional context**

Currently, my code looks like the following. I have three initial dimensions in my dataset: Contact, sig_preprocessing, f. Both Contact and sig_preprocessing should be transformed into pairs.
```python
import concurrent.futures

import numpy as np
import tqdm
import xarray as xr

# `signals` is the user's input dataset (not shown in the issue), with
# dimensions Contact, sig_preprocessing and f.

# Duplicate every coordinate/variable (except "f") into "_1" and "_2" copies
# so that pairs can be formed.
signal_pairs = xr.merge([
    signals.rename({x: f"{x}_1" for x in signals.coords if not x == "f"}
                   | {x: f"{x}_1" for x in signals.data_vars}),
    signals.rename({x: f"{x}_2" for x in signals.coords if not x == "f"}
                   | {x: f"{x}_2" for x in signals.data_vars}),
])

def stack_dataset(dataset):
    dataset = dataset.copy()
    # Overlap between the two recordings of a pair.
    dataset["common_duration"] = xr.where(
        dataset["start_time_1"] > dataset["start_time_2"],
        xr.where(dataset["end_time_1"] > dataset["end_time_2"],
                 dataset["end_time_2"] - dataset["start_time_1"],
                 dataset["end_time_1"] - dataset["start_time_1"]),
        xr.where(dataset["end_time_1"] > dataset["end_time_2"],
                 dataset["end_time_2"] - dataset["start_time_2"],
                 dataset["end_time_1"] - dataset["start_time_2"]),
    )
    # Pairs that should survive the filtering.
    dataset["relevant_pair"] = (
        (dataset["Session_1"] == dataset["Session_2"])
        & (dataset["Contact_1"] != dataset["Contact_2"])
        & (dataset["Structure_1"] == dataset["Structure_2"])
        & (dataset["sig_type_1"] == "bua")
        & (dataset["sig_type_2"] == "spike_times")
        & (~dataset["resampled_continuous_path_1"].isnull())
        & (~dataset["resampled_continuous_path_2"].isnull())
        & (dataset["common_duration"] > 10)
    )
    # Stack first (full product), then drop the irrelevant entries.
    dataset = dataset.stack(
        sig_preprocessing_pair=("sig_preprocessing_1", "sig_preprocessing_2"),
        Contact_pair=("Contact_1", "Contact_2"),
    )
    dataset = dataset.where(dataset["relevant_pair"].any("sig_preprocessing_pair"), drop=True)
    dataset = dataset.where(dataset["relevant_pair"].any("Contact_pair"), drop=True)
    return dataset

# Split into blocks of 100 x 100 contacts so each stack stays small.
stack_size = 100
signal_pairs_split = [
    signal_pairs.isel(dict(Contact_1=slice(stack_size * i, stack_size * (i + 1)),
                           Contact_2=slice(stack_size * j, stack_size * (j + 1))))
    for i in range(int(np.ceil(signal_pairs.sizes["Contact_1"] / stack_size)))
    for j in range(int(np.ceil(signal_pairs.sizes["Contact_2"] / stack_size)))
]

# Stack and filter the blocks in parallel, then merge the results.
with concurrent.futures.ProcessPoolExecutor(max_workers=30) as executor:
    futures = [executor.submit(stack_dataset, dataset) for dataset in signal_pairs_split]
    signal_pairs_split_stacked = [
        future.result()
        for future in tqdm.tqdm(concurrent.futures.as_completed(futures),
                                total=len(futures), desc="Stacking")
    ]
signal_pairs = xr.merge(signal_pairs_split_stacked)
```
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8687/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue |
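The chunk-and-filter workaround above exists only to avoid materializing the full cross product. For illustration only (none of this is code from the issue; the toy dataset, variable names, and `wanted` pairs are invented), here is a sketch of building just the wanted pairs with vectorized (pointwise) selection, which is the kind of result a `filter` argument to stack could produce directly:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy stand-in for the issue's 10^5-sized dimension "x".
ds = xr.Dataset({"v": ("x", np.arange(5.0))}, coords={"x": list("abcde")})

# The pairs we actually want to keep (in the issue, ~10^6 out of 10^10).
wanted = pd.MultiIndex.from_tuples(
    [("a", "b"), ("a", "c"), ("d", "e")], names=["x_1", "x_2"]
)

# Vectorized (pointwise) selection: both indexers share the new "x_pair"
# dimension, so only len(wanted) elements are materialized instead of the
# full len(x) ** 2 cross product that stack + where(drop=True) builds.
i1 = xr.DataArray(np.asarray(wanted.get_level_values("x_1")), dims="x_pair")
i2 = xr.DataArray(np.asarray(wanted.get_level_values("x_2")), dims="x_pair")
pairs = xr.Dataset(
    {
        "v_1": ds["v"].sel(x=i1).drop_vars("x"),
        "v_2": ds["v"].sel(x=i2).drop_vars("x"),
    },
    coords={"x_1": i1, "x_2": i2},
)
print(pairs)
```

With the feature proposed in the issue, this might instead be spelled as something like `dataset.stack(x_pair=("x_1", "x_2"), filter=...)`, without ever allocating the full product.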
```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
```
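For completeness, the view above ("2 rows where user = 35689176 sorted by updated_at descending") can be reproduced against a SQLite copy of this table; the database file name below is an assumption.

```python
import sqlite3

conn = sqlite3.connect("github.db")  # assumed file name for the exported database
rows = conn.execute(
    """
    SELECT id, number, title, state, comments, created_at, updated_at
    FROM issues
    WHERE [user] = ?
    ORDER BY updated_at DESC
    """,
    (35689176,),
).fetchall()
for row in rows:
    print(row)
```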