html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/7221#issuecomment-1294262457,https://api.github.com/repos/pydata/xarray/issues/7221,1294262457,IC_kwDOAMm_X85NJOC5,1217238,2022-10-28T00:27:22Z,2022-10-28T00:27:22Z,MEMBER,"I no longer remember why I added these checks, but I certainly did not expect to see this sort of performance penalty!","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423312198
https://github.com/pydata/xarray/pull/7221#issuecomment-1293860075,https://api.github.com/repos/pydata/xarray/issues/7221,1293860075,IC_kwDOAMm_X85NHrzr,4160723,2022-10-27T17:40:52Z,2022-10-27T17:40:52Z,MEMBER,"Thanks @hmaarrfk!

> I haven't fully understood why we had that code though?

Me neither. I don't remember ever seeing this assertion error raised while refactoring things. Any idea @shoyer?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423312198
https://github.com/pydata/xarray/pull/7221#issuecomment-1293815240,https://api.github.com/repos/pydata/xarray/issues/7221,1293815240,IC_kwDOAMm_X85NHg3I,14371165,2022-10-27T16:58:45Z,2022-10-27T16:58:45Z,MEMBER,"```
       before           after         ratio
     [c000690c]       [24753f1f]
-     3.17±0.02ms      1.94±0.01ms     0.61  merge.DatasetAddVariable.time_variable_insertion(100)
-        81.5±2ms        17.0±0.2ms     0.21  merge.DatasetAddVariable.time_variable_insertion(1000)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
```

Nice improvements.
:) I haven't fully understood why we had that code though?","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423312198
https://github.com/pydata/xarray/pull/7221#issuecomment-1291948502,https://api.github.com/repos/pydata/xarray/issues/7221,1291948502,IC_kwDOAMm_X85NAZHW,90008,2022-10-26T12:19:49Z,2022-10-26T12:23:46Z,CONTRIBUTOR,"I know it is not comparable, but I was really curious what ""dictionary insertion"" costs, in order to understand whether my comparisons were fair:
code

```python
from tqdm import tqdm
import xarray as xr
from time import perf_counter
import numpy as np

N = 1000

# Everybody is lazy loading now, so let's force modules to get instantiated
dummy_dataset = xr.Dataset()
dummy_dataset['a'] = 1
dummy_dataset['b'] = 1
del dummy_dataset

time_elapsed = np.zeros(N)

# dataset = xr.Dataset()
dataset = {}
for i in tqdm(range(N)):
    # for i in range(N):
    time_start = perf_counter()
    dataset[f""var{i}""] = i
    time_end = perf_counter()
    time_elapsed[i] = time_end - time_start

# %%
from matplotlib import pyplot as plt

plt.plot(np.arange(N), time_elapsed * 1E6, label='Time to add one variable')
plt.xlabel(""Number of existing variables"")
plt.ylabel(""Time to add a variable (us)"")
plt.ylim([0, 10])
plt.title(""Dictionary insertion"")
plt.grid(True)
```
![image](https://user-images.githubusercontent.com/90008/198024147-0965787a-32be-409b-959c-1b87adbc633a.png)

I think xarray gives me 3 orders of magnitude of ""thinking"" benefit, so I'll take it!

```
python --version
Python 3.9.13
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423312198
https://github.com/pydata/xarray/pull/7221#issuecomment-1291894024,https://api.github.com/repos/pydata/xarray/issues/7221,1291894024,IC_kwDOAMm_X85NAL0I,90008,2022-10-26T11:32:32Z,2022-10-26T11:32:32Z,CONTRIBUTOR,"Ok. I'll want to rethink them. I know it looks like quadratic time, but I really would like to test n=1000 and I have an idea.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423312198
https://github.com/pydata/xarray/pull/7221#issuecomment-1291523800,https://api.github.com/repos/pydata/xarray/issues/7221,1291523800,IC_kwDOAMm_X85M-xbY,14371165,2022-10-26T05:27:11Z,2022-10-26T05:27:11Z,MEMBER,Now the asv finishes at least! Could you make a separate PR for the asv? I don't think it runs it when comparing to the main branch.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423312198
https://github.com/pydata/xarray/pull/7221#issuecomment-1291501993,https://api.github.com/repos/pydata/xarray/issues/7221,1291501993,IC_kwDOAMm_X85M-sGp,14371165,2022-10-26T04:56:39Z,2022-10-26T04:57:37Z,MEMBER,"I like large datasets as well. I seem to remember getting caught in similar places when creating my datasets. I think I solved it by using Variable instead; does doing something like this improve the performance for you?
```python
import xarray as xr

dataset = xr.Dataset()
dataset['a'] = xr.Variable(dims=""time"", data=[1])
dataset['b'] = xr.Variable(dims=""time"", data=[2])
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423312198
https://github.com/pydata/xarray/pull/7221#issuecomment-1291493769,https://api.github.com/repos/pydata/xarray/issues/7221,1291493769,IC_kwDOAMm_X85M-qGJ,14371165,2022-10-26T04:44:43Z,2022-10-26T04:44:43Z,MEMBER,"```
Error: [ 75.90%] ··· dataset_creation.Creation.time_dataset_creation  failed
[ 75.90%] ···· asv: benchmark timed out (timeout 60.0s)
```

Maybe 1000 loops is too much. Start with 100 maybe? We still want these benchmarks to be decently fast in the CI.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423312198
https://github.com/pydata/xarray/pull/7221#issuecomment-1291450556,https://api.github.com/repos/pydata/xarray/issues/7221,1291450556,IC_kwDOAMm_X85M-fi8,90008,2022-10-26T03:32:53Z,2022-10-26T03:32:53Z,CONTRIBUTOR,"I'm somewhat confused; I can run the benchmark locally:

```
[ 1.80%] ··· dataset_creation.Creation.time_dataset_creation  4.37±0s
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423312198
https://github.com/pydata/xarray/pull/7221#issuecomment-1291447746,https://api.github.com/repos/pydata/xarray/issues/7221,1291447746,IC_kwDOAMm_X85M-e3C,90008,2022-10-26T03:27:36Z,2022-10-26T03:27:36Z,CONTRIBUTOR,":/ not fun, the benchmark is failing.
Not sure why.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423312198
https://github.com/pydata/xarray/pull/7221#issuecomment-1291399714,https://api.github.com/repos/pydata/xarray/issues/7221,1291399714,IC_kwDOAMm_X85M-TIi,90008,2022-10-26T02:14:40Z,2022-10-26T02:14:40Z,CONTRIBUTOR,"> Would be interesting to see whether this was covered by our existing asv benchmarks.

I wasn't able to find something that really benchmarked ""large"" datasets.

> Would be a good benchmark to add if we don't have one already.

Added one.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423312198
https://github.com/pydata/xarray/pull/7221#issuecomment-1291389702,https://api.github.com/repos/pydata/xarray/issues/7221,1291389702,IC_kwDOAMm_X85M-QsG,90008,2022-10-26T01:59:57Z,2022-10-26T01:59:57Z,CONTRIBUTOR,"> out of interest, how did you find this?

Spyder profiler","{""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423312198
https://github.com/pydata/xarray/pull/7221#issuecomment-1291388733,https://api.github.com/repos/pydata/xarray/issues/7221,1291388733,IC_kwDOAMm_X85M-Qc9,5635139,2022-10-26T01:58:00Z,2022-10-26T01:58:00Z,MEMBER,"Gosh, that's quite dramatic! Impressive find @hmaarrfk. (Out of interest, how did you find this?)

I can see how that's quadratic when looping like that. I wonder whether using `.assign(var1=1, var2=2, ...)` has the same behavior?

Would be interesting to see whether this was covered by our existing asv benchmarks. Would be a good benchmark to add if we don't have one already.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1423312198