issues

2 rows where user = 35689176 sorted by updated_at descending


Facets: type = issue (2) · state = open (2) · repo = xarray (2)
Issue #8473: Regular (linspace) Coordinates/Index
id: 2004250796 · node_id: I_kwDOAMm_X853dnCs · user: JulienBrn (35689176) · state: open · locked: 0 · comments: 9 · created_at: 2023-11-21T13:08:08Z · updated_at: 2024-04-18T22:11:39Z · author_association: NONE · repo: xarray (13221727) · type: issue

Is your feature request related to a problem?

Most of my dimension coordinates fall into three categories:

- Categorical coordinates
- Pandas multiindex
- Regular coordinates, that is, of the form start + np.arange(n)/fs for some start and fs

I feel the way the latter is currently handled in xarray is suboptimal (unless I'm misusing this great library), as it has the following drawbacks:

- Visually: it is not obvious that the coordinate is a linear space; when printing the dataset/array we only see some of the values.
- Computation usage: applying scipy functions that require regular sampling (for example scipy.signal.spectrogram) is very annoying, as one has to extract fs and check that the coordinate is indeed regularly sampled. I currently use step = np.diff(a)[0]; assert (np.abs(np.diff(a) - step) < epsilon).all(); fs = 1/step (see the sketch after this list).
- Rounding errors: sometimes one gets rounding errors in the coordinate values.
- Memory/disk performance: when storing a dataset with few arrays, storing the coordinate values takes up non-negligible space (I have an example where one of my raw data arrays is a one-dimensional time array of 3 GB, and I like adding a coordinate system as soon as possible, thus doubling its size).
- Speed: I would expect joins/alignment/rolling/... to be very fast on such coordinates.
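For reference, the regularity check I currently use, wrapped as a small helper (the function name and epsilon value are mine, purely illustrative):

```python
import numpy as np

def regular_fs(a, epsilon=1e-9):
    """Extract fs from a coordinate, asserting it is regularly sampled."""
    step = np.diff(a)[0]
    assert (np.abs(np.diff(a) - step) < epsilon).all(), "not regularly sampled"
    return 1.0 / step

fs = regular_fs(np.arange(100) / 500.0)  # a regular coordinate -> fs == 500.0
```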

Note: it is not obvious to me from the documentation whether this is more of a "coordinate" enhancement or an "index" enhancement (index being, to my knowledge, discussed only in this part of the documentation).

Describe the solution you'd like

A new type of index/coordinate where only the start and fs are stored. The _repr_inline_ might look like RegularIndex(start, end, step=1/fs).

Perhaps another, more generic possibility would be a type of coordinate system expressed as a transform from np.arange(s, e) by a bijective function f (with the inverse of f also provided). RegularIndex(start, end, fs) would then be an instance with f = lambda x: x/fs, inv(f) = lambda y: y*fs, s = round(start*fs), e = round(end*fs) + 1. The advantage of this approach is that joins/alignment/selection/... could be handled generically on the np.arange(s, e) side, and it would also work on non-linear spaces (for example log spaces).
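Concretely, with illustrative numbers (nothing below is an existing API; it only demonstrates the intended equivalence between the transform view and the materialized coordinate):

```python
import numpy as np

start, fs, n = 2.5, 1000.0, 10           # hypothetical: 10 samples at 1 kHz
end = start + (n - 1) / fs

f = lambda x: x / fs                     # integer index -> coordinate value
f_inv = lambda y: y * fs                 # coordinate value -> integer index
s, e = round(start * fs), round(end * fs) + 1

# The coordinate is never stored: it is f applied to np.arange(s, e).
assert np.allclose(f(np.arange(s, e)), start + np.arange(n) / fs)
```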

Describe alternatives you've considered

I have tried writing an Index subclass, but I struggle with the create_variables method. If I do not return a coordinate for the current dimension, then a.set_xindex(["t"], RegularIndex) keeps the previous coordinate, and if I do, then I need to provide a Variable backed by the full np.array that I am trying not to create (for memory efficiency). I have tried dropping the coordinate after setting my custom index, but that seems to remove the index as well...
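For reference, a minimal sketch of the kind of subclass I mean, assuming xarray's public xr.Index API (from_variables / create_variables); the class and its fields are illustrative, and the create_variables body shows exactly where the memory problem appears:

```python
import numpy as np
import xarray as xr

class RegularIndex(xr.Index):
    def __init__(self, start, fs, size, dim):
        # Only these scalars are stored -- no coordinate array.
        self.start, self.fs, self.size, self.dim = start, fs, size, dim

    @classmethod
    def from_variables(cls, variables, *, options):
        # Expect a single 1-D coordinate; infer start and fs from its values.
        (name, var), = variables.items()
        values = var.values
        step = values[1] - values[0]
        return cls(start=values[0], fs=1.0 / step, size=var.size, dim=var.dims[0])

    def create_variables(self, variables=None):
        # This is where I get stuck: returning a variable here means
        # materializing the very array I am trying not to create.
        data = self.start + np.arange(self.size) / self.fs
        return {self.dim: xr.Variable((self.dim,), data)}
```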

There may be many other problems, as I have only tried quickly. Should this be a viable approach, I may be open to writing a version myself and posting it for review. However, I am relatively new to xarray and would appreciate first knowing whether I am on the right track.

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8473/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Issue #8687: Add a filter option to stack
id: 2109989289 · node_id: I_kwDOAMm_X859w-Gp · user: JulienBrn (35689176) · state: open · locked: 0 · comments: 1 · created_at: 2024-01-31T12:28:15Z · updated_at: 2024-01-31T18:15:43Z · author_association: NONE · repo: xarray (13221727) · type: issue

Is your feature request related to a problem?

I currently have a dataset where one of my dimensions (let's call it x) is of size 10^5. Later in my analysis, I want to consider pairs of values of that dimension, but not all of them: considering all of them would lead to 10^10 entries (not viable memory usage), when in practice I only want around 10^6 of them.

Therefore, the final dataset should have a dimension x_pair which is the stacking of dimensions x_1 and x_2.

However, it seems I have no straightforward way of using stack for that purpose: whatever I do, it first creates a 10^8-element array that I then filter using where(drop=True).
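A toy version of what I mean (names hypothetical, sizes shrunk so it runs):

```python
import numpy as np
import xarray as xr

n = 4  # stands in for the real size of 10^5
ds = xr.Dataset(
    {"v": (("x_1", "x_2"), np.arange(n * n).reshape(n, n))},
    coords={"x_1": np.arange(n), "x_2": np.arange(n)},
)

# The straightforward route materializes the full n*n cross product
# before filtering -- exactly the memory blow-up described above.
pairs = ds.stack(x_pair=("x_1", "x_2"))
mask = pairs["x_1"] < pairs["x_2"]       # example: keep each unordered pair once
pairs = pairs.where(mask, drop=True)
```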

Should this problem be unclear, I could provide a minimal example, but hopefully the explanation of the issue is enough (and my current code is provided as additional context).

Describe the solution you'd like

Add a filter parameter to stack. The filter function should take a dataset and return the set of elements that should appear in the final multi-index.
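A hypothetical signature (filter is not an existing parameter of stack; the name and semantics are only a suggestion), reusing the toy ds from the sketch above:

```python
# Hypothetical API -- `filter` does not exist in xarray today. It would
# receive the (virtually) stacked dataset and return a boolean mask selecting
# which multi-index entries to build, so the full cross product is never
# materialized.
pairs = ds.stack(
    x_pair=("x_1", "x_2"),
    filter=lambda d: d["x_1"] < d["x_2"],
)
```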

Describe alternatives you've considered

Currently, I have solved my problem by dividing the dataset into many smaller datasets, stacking and filtering each of these datasets separately and then merging the filtered datasets together.

Note: the total stacking time for all the smaller datasets, without any parallelization, still feels very long (almost 2 h). I do not know whether this is sensible.

Additional context

Currently, my code looks like the following. I have three initial dimensions in my dataset: Contact, sig_preprocessing, and f. Both Contact and sig_preprocessing should be transformed into pairs.

```python
import concurrent.futures

import numpy as np
import tqdm
import xarray as xr

# Duplicate the dataset with "_1"/"_2" suffixes so pairs can be formed.
# Dataset.rename takes a single mapping, so coords and data_vars are merged.
signal_pairs = xr.merge([
    signals.rename({**{x: f"{x}_1" for x in signals.coords if x != "f"},
                    **{x: f"{x}_1" for x in signals.data_vars}}),
    signals.rename({**{x: f"{x}_2" for x in signals.coords if x != "f"},
                    **{x: f"{x}_2" for x in signals.data_vars}}),
])

def stack_dataset(dataset):
    dataset = dataset.copy()
    # Overlap between the two recordings: min(end) - max(start).
    dataset["common_duration"] = xr.where(
        dataset["start_time_1"] > dataset["start_time_2"],
        xr.where(
            dataset["end_time_1"] > dataset["end_time_2"],
            dataset["end_time_2"] - dataset["start_time_1"],
            dataset["end_time_1"] - dataset["start_time_1"],
        ),
        xr.where(
            dataset["end_time_1"] > dataset["end_time_2"],
            dataset["end_time_2"] - dataset["start_time_2"],
            dataset["end_time_1"] - dataset["start_time_2"],
        ),
    )
    dataset["relevant_pair"] = (
        (dataset["Session_1"] == dataset["Session_2"])
        & (dataset["Contact_1"] != dataset["Contact_2"])
        & (dataset["Structure_1"] == dataset["Structure_2"])
        & (dataset["sig_type_1"] == "bua")
        & (dataset["sig_type_2"] == "spike_times")
        & (~dataset["resampled_continuous_path_1"].isnull())
        & (~dataset["resampled_continuous_path_2"].isnull())
        & (dataset["common_duration"] > 10)
    )
    dataset = dataset.stack(
        sig_preprocessing_pair=("sig_preprocessing_1", "sig_preprocessing_2"),
        Contact_pair=("Contact_1", "Contact_2"),
    )
    dataset = dataset.where(dataset["relevant_pair"].any("sig_preprocessing_pair"), drop=True)
    dataset = dataset.where(dataset["relevant_pair"].any("Contact_pair"), drop=True)
    return dataset

# Split into 100x100 contact blocks, stack and filter each block in parallel,
# then merge the filtered results back together.
stack_size = 100
signal_pairs_split = [
    signal_pairs.isel(dict(
        Contact_1=slice(stack_size * i, stack_size * (i + 1)),
        Contact_2=slice(stack_size * j, stack_size * (j + 1)),
    ))
    for i in range(int(np.ceil(signal_pairs.sizes["Contact_1"] / stack_size)))
    for j in range(int(np.ceil(signal_pairs.sizes["Contact_2"] / stack_size)))
]

with concurrent.futures.ProcessPoolExecutor(max_workers=30) as executor:
    futures = [executor.submit(stack_dataset, dataset) for dataset in signal_pairs_split]
    signal_pairs_split_stacked = [
        future.result()
        for future in tqdm.tqdm(
            concurrent.futures.as_completed(futures),
            total=len(futures),
            desc="Stacking",
        )
    ]
signal_pairs = xr.merge(signal_pairs_split_stacked)
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8687/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

Table schema:
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);