issue_comments

34 rows where user = 743508 sorted by updated_at descending

issue 15

  • #1161 WIP to vectorize isel_points 9
  • open_mfdataset too many files 6
  • Generated Dask graph is huge - performance issue? 4
  • Remove caching logic from xarray.Variable 2
  • Huge memory use when using FacetGrid 2
  • Support for Scipy Sparse Arrays 2
  • TypeError: invalid type promotion when reading multi-file dataset 1
  • Dataset variable reference fails after renaming 1
  • open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 1
  • Many methods are broken (e.g., concat/stack/sortby) when using repeated dimensions 1
  • CF conventions for time doesn't support years 1
  • Keeping attributes when using DataArray.astype 1
  • DataArray.rolling() does not preserve chunksizes in some cases 1
  • to_dataframe fails if dataarray has dimension 1 1
  • Error when rechunking from Zarr store 1

user 1

  • mangecoeur · 34

author_association 1

  • CONTRIBUTOR 34
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sorted descending), author_association, body, reactions, performed_via_github_app, issue
1311919228 https://github.com/pydata/xarray/issues/7280#issuecomment-1311919228 https://api.github.com/repos/pydata/xarray/issues/7280 IC_kwDOAMm_X85OMkx8 mangecoeur 743508 2022-11-11T16:27:57Z 2022-11-11T16:27:57Z CONTRIBUTOR

@keewis using your solution things seem to more or less work, except that every operation of course 'loses' the `__array_namespace__` attr, so anything like slicing only half works; on top of that, a lot of indexing operations are not implemented on scipy sparse arrays.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for Scipy Sparse Arrays 1445486904
1311902588 https://github.com/pydata/xarray/issues/7280#issuecomment-1311902588 https://api.github.com/repos/pydata/xarray/issues/7280 IC_kwDOAMm_X85OMgt8 mangecoeur 743508 2022-11-11T16:14:12Z 2022-11-11T16:14:12Z CONTRIBUTOR

OK, I had assumed that scipy would have directly implemented the array interface; I will see if there is already an issue open there. Then we can gradually work out what else does and doesn't work.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for Scipy Sparse Arrays 1445486904
795114188 https://github.com/pydata/xarray/issues/4380#issuecomment-795114188 https://api.github.com/repos/pydata/xarray/issues/4380 MDEyOklzc3VlQ29tbWVudDc5NTExNDE4OA== mangecoeur 743508 2021-03-10T09:00:48Z 2021-03-10T09:00:48Z CONTRIBUTOR

Running into the same issue when I:

  1. Load input from a Zarr data source
  2. Queue some processing (delayed dask ufuncs)
  3. Re-chunk using chunk() to get the dask task size I want
  4. Use to_zarr to trigger the calculation (dask distributed backend) and save to a new file on disk

I get the chunk-size mismatch error, which I work around by manually overwriting the encoding['chunks'] value; that seems unintuitive to me. Since I'm going from one Zarr store to another, I assumed that calling chunk() would set the chunk size for both the dask arrays and the Zarr output, given that calling to_zarr on a dask array only works if the dask chunks and the Zarr encoding chunks match.

I didn't realize the overwrite_encoded_chunks option existed, but it's also a bit confusing that to get the right chunk size on the output I need to set the overwrite option on the input.
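
A minimal sketch of the workflow and workaround described above (the store paths, the `time` dimension, and the chunk sizes are hypothetical); dropping the stale `chunks` entry from each variable's encoding is one way to avoid the mismatch when writing back to Zarr:

```python
import xarray as xr

ds = xr.open_zarr("input.zarr")      # chunking comes from the store's encoding
# ... queue delayed processing here ...
ds = ds.chunk({"time": 1000})        # re-chunk the dask arrays to the task size you want

# Drop the Zarr chunk encoding carried over from the input store so it cannot
# conflict with the new dask chunks when writing.
for name in ds.variables:
    ds[name].encoding.pop("chunks", None)

ds.to_zarr("output.zarr", mode="w")  # triggers the computation and writes to disk
```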

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when rechunking from Zarr store 686608969
602795869 https://github.com/pydata/xarray/issues/1378#issuecomment-602795869 https://api.github.com/repos/pydata/xarray/issues/1378 MDEyOklzc3VlQ29tbWVudDYwMjc5NTg2OQ== mangecoeur 743508 2020-03-23T19:02:26Z 2020-03-23T19:02:26Z CONTRIBUTOR

Just wondering what the status of this is. I've been running into bugs trying to model symmetric distance matrices using the same dimension twice. Interestingly, it does work very well for selecting, e.g. if I use .sel(nodes=node_list) on a square matrix I correctly get a square matrix subset 👍 But unfortunately a lot of other things seem to break, e.g. concatenating fails with ValueError: axes don't match array :( What would need to happen to make this work?
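
A minimal sketch of the repeated-dimension situation described above (the names are illustrative and the exact behaviour varies across xarray versions): selection returns a square subset, while concatenation raises the quoted error.

```python
import numpy as np
import xarray as xr

nodes = ["a", "b", "c"]
# a symmetric distance matrix that uses the same dimension twice
dist = xr.DataArray(np.zeros((3, 3)), coords={"nodes": nodes}, dims=("nodes", "nodes"))

dist.sel(nodes=["a", "b"])          # works: a 2x2 square subset
xr.concat([dist, dist], dim="run")  # breaks: ValueError: axes don't match array
```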

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Many methods are broken (e.g., concat/stack/sortby) when using repeated dimensions 222676855
584701023 https://github.com/pydata/xarray/issues/2049#issuecomment-584701023 https://api.github.com/repos/pydata/xarray/issues/2049 MDEyOklzc3VlQ29tbWVudDU4NDcwMTAyMw== mangecoeur 743508 2020-02-11T15:47:28Z 2020-02-11T15:48:08Z CONTRIBUTOR

Just ran into this issue: it is present in 0.15 and also does not respect the option keep_attrs=True.
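
A minimal reproduction of the behaviour described, as of the affected releases (later versions changed astype to keep attributes); the data here is illustrative.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(3.0), dims="x", attrs={"units": "K"})

with xr.set_options(keep_attrs=True):
    # {} in the affected versions, despite keep_attrs=True
    print(da.astype(np.float32).attrs)
```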

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Keeping attributes when using DataArray.astype 313010564
583488834 https://github.com/pydata/xarray/issues/3761#issuecomment-583488834 https://api.github.com/repos/pydata/xarray/issues/3761 MDEyOklzc3VlQ29tbWVudDU4MzQ4ODgzNA== mangecoeur 743508 2020-02-07T16:37:05Z 2020-02-07T16:37:05Z CONTRIBUTOR

I think it makes sense to support the conversion. Perhaps a better example is with a dataset:

```python
x = np.arange(10)
y = np.arange(10)

data = np.zeros((len(x), len(y)))

ds = xr.Dataset({k: xr.DataArray(data, coords=[x, y], dims=['x', 'y']) for k in ['a', 'b', 'c']})
ds.sel(x=1, y=1)
```

```
<xarray.Dataset>
Dimensions:  ()
Coordinates:
    x        int64 1
    y        int64 1
Data variables:
    a        float64 0.0
    b        float64 0.0
    c        float64 0.0
```

The output is a dataset of scalars, which converts fairly intuitively to a single-row dataframe. But the following throws the same error.

```python
ds.sel(x=1, y=1).to_dataframe()
```

Or think of it another way: isn't it very unintuitive that converting a single-item dataset to a dataframe works only if the item was selected using a length-1 list? To me that seems like a very arbitrary restriction. Following that logic, it also makes sense to have consistent behaviour between Datasets and DataArrays (even if you end up producing a single-element table).
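
A sketch of the contrast described above, rebuilding the dataset from the earlier snippet: selecting with length-1 lists keeps the dimensions, so the conversion succeeds, while the scalar selection hit the error in the versions discussed here.

```python
import numpy as np
import xarray as xr

x = np.arange(10)
y = np.arange(10)
data = np.zeros((len(x), len(y)))
ds = xr.Dataset({k: xr.DataArray(data, coords=[x, y], dims=['x', 'y']) for k in ['a', 'b', 'c']})

print(ds.sel(x=[1], y=[1]).to_dataframe())  # length-1 lists: works, one-row dataframe
# ds.sel(x=1, y=1).to_dataframe()           # scalar selection: raised the error discussed above
```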

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_dataframe fails if dataarray has dimension 1 561539035
460174589 https://github.com/pydata/xarray/issues/2531#issuecomment-460174589 https://api.github.com/repos/pydata/xarray/issues/2531 MDEyOklzc3VlQ29tbWVudDQ2MDE3NDU4OQ== mangecoeur 743508 2019-02-04T09:06:14Z 2019-02-04T09:06:43Z CONTRIBUTOR

Perhaps related - I was running into MemoryErrors with a large array and also noticed that chunk sizes were not respected (basically xarray tried to process the array in one go). It turned out that I'd forgotten to install both bottleneck and numexpr; after installing both (just installing bottleneck was not enough), everything worked as expected.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.rolling() does not preserve chunksizes in some cases 376154741
311621960 https://github.com/pydata/xarray/issues/1467#issuecomment-311621960 https://api.github.com/repos/pydata/xarray/issues/1467 MDEyOklzc3VlQ29tbWVudDMxMTYyMTk2MA== mangecoeur 743508 2017-06-28T10:33:33Z 2017-06-28T10:33:33Z CONTRIBUTOR

I think I do mean 'years' in the CF convention sense; in this case the time dimension is:

```
double time(time=145);
  :standard_name = "time";
  :units = "years since 1860-1-1 12:00:00";
  :calendar = "proleptic_gregorian";
```

This is correctly interpreted by the NASA Panoply NetCDF file viewer. From glancing at the xarray code, it seems to depend on the pandas Timedelta object, which in turn doesn't support years as deltas (although date ranges can be generated at yearly intervals, so it should be possible to implement).
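
A minimal sketch of the decoding problem described (the values are hypothetical; whether this raises or merely leaves the values undecoded depends on the xarray/cftime version):

```python
import xarray as xr

ds = xr.Dataset(
    coords={"time": ("time", [0, 1, 2],
                     {"units": "years since 1860-1-1 12:00:00",
                      "calendar": "proleptic_gregorian"})}
)
decoded = xr.decode_cf(ds)  # "years" is not a supported unit for CF time decoding
```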

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF conventions for time doesn't support years 238990919
303857073 https://github.com/pydata/xarray/issues/1424#issuecomment-303857073 https://api.github.com/repos/pydata/xarray/issues/1424 MDEyOklzc3VlQ29tbWVudDMwMzg1NzA3Mw== mangecoeur 743508 2017-05-24T21:28:44Z 2017-05-24T21:28:44Z CONTRIBUTOR

Dataset isn't chunked, and yes I am using cartopy to draw coastlines following the example in the docs:

```python
p = heatwaves_pop.plot(x='longitude', y='latitude', col='time', col_wrap=3,
                       cmap='RdBu_r', vmin=-v_both, vmax=v_both, size=2,
                       subplot_kws=dict(projection=crs.PlateCarree()))
for ax in p.axes.flat:
    ax.coastlines()
```

where heatwaves_pop is calculated from a bunch of other xarray datasets. What surprised me is that they should all already have been loaded into memory, so I did not expect a further increase in memory use.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Huge memory use when using FacetGrid 231061878
303748239 https://github.com/pydata/xarray/issues/1424#issuecomment-303748239 https://api.github.com/repos/pydata/xarray/issues/1424 MDEyOklzc3VlQ29tbWVudDMwMzc0ODIzOQ== mangecoeur 743508 2017-05-24T14:51:06Z 2017-05-24T14:51:06Z CONTRIBUTOR

16 maps, although like you say, I'm not sure if this is coming from xarray or matplotlib

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Huge memory use when using FacetGrid 231061878
285052725 https://github.com/pydata/xarray/issues/1301#issuecomment-285052725 https://api.github.com/repos/pydata/xarray/issues/1301 MDEyOklzc3VlQ29tbWVudDI4NTA1MjcyNQ== mangecoeur 743508 2017-03-08T14:20:30Z 2017-03-08T14:20:30Z CONTRIBUTOR

My 2 cents: I've found that with big files any %prun tends to show method 'acquire' of '_thread.lock' among the highest times, but that's not necessarily indicative of where the perf issue comes from, because it's effectively just waiting for IO, which is always slow. One thing that helps is setting the dask scheduler to the non-parallel synchronous option, which gives cleaner profiles.
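
A hedged sketch of the tip above, using current dask configuration names (the comment predates them): running the computation on the single-threaded synchronous scheduler keeps thread-lock waits out of the profile. The dataset here is a stand-in for whatever you are actually profiling.

```python
import dask
import dask.array
import xarray as xr

# a small lazy dataset standing in for the real workload
ds = xr.Dataset({"t2m": (("time", "x"), dask.array.zeros((1000, 100), chunks=(100, 100)))})

with dask.config.set(scheduler="synchronous"):
    result = ds["t2m"].mean("time").compute()  # profile this call, e.g. with %prun
```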

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278
274602298 https://github.com/pydata/xarray/pull/1162#issuecomment-274602298 https://api.github.com/repos/pydata/xarray/issues/1162 MDEyOklzc3VlQ29tbWVudDI3NDYwMjI5OA== mangecoeur 743508 2017-01-23T20:09:24Z 2017-01-23T20:09:24Z CONTRIBUTOR

Crikey. Fixed the merge, hopefully it works (I hate merge conflicts).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  #1161 WIP to vectorize isel_points 195125296
274567523 https://github.com/pydata/xarray/pull/1162#issuecomment-274567523 https://api.github.com/repos/pydata/xarray/issues/1162 MDEyOklzc3VlQ29tbWVudDI3NDU2NzUyMw== mangecoeur 743508 2017-01-23T18:04:09Z 2017-01-23T18:04:09Z CONTRIBUTOR

OK added a performance improvements section to the docs

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  #1161 WIP to vectorize isel_points 195125296
274564256 https://github.com/pydata/xarray/pull/1162#issuecomment-274564256 https://api.github.com/repos/pydata/xarray/issues/1162 MDEyOklzc3VlQ29tbWVudDI3NDU2NDI1Ng== mangecoeur 743508 2017-01-23T17:52:33Z 2017-01-23T17:52:33Z CONTRIBUTOR

Note: waiting for 0.9.0 to be released before updating what's new; I don't want to end up with conflicts in the docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  #1161 WIP to vectorize isel_points 195125296
272844516 https://github.com/pydata/xarray/pull/1162#issuecomment-272844516 https://api.github.com/repos/pydata/xarray/issues/1162 MDEyOklzc3VlQ29tbWVudDI3Mjg0NDUxNg== mangecoeur 743508 2017-01-16T11:59:01Z 2017-01-16T11:59:01Z CONTRIBUTOR

Ok will wait for 0.9.0 to be released

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  #1161 WIP to vectorize isel_points 195125296
272715240 https://github.com/pydata/xarray/pull/1162#issuecomment-272715240 https://api.github.com/repos/pydata/xarray/issues/1162 MDEyOklzc3VlQ29tbWVudDI3MjcxNTI0MA== mangecoeur 743508 2017-01-15T18:53:26Z 2017-01-15T18:53:26Z CONTRIBUTOR

Completed changes based on recommendations and cleaned up old code and comments.

As for benchmarks, I don't have anything rigorous, but I do have the following example: weather data from the CFSR dataset, 7 variables at hourly resolution, collected in one netCDF3 file per variable per month. In this particular case the difference is striking!

```python
%%time
data = dataset.isel_points(time=np.arange(0, 1000),
                           lat=np.ones(1000, dtype=int),
                           lon=np.ones(1000, dtype=int))
data.load()
```

Results:

```
xarray 0.8.2
CPU times: user 1min 21s, sys: 41.5 s, total: 2min 2s
Wall time: 47.8 s

master
CPU times: user 385 ms, sys: 238 ms, total: 623 ms
Wall time: 288 ms
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  #1161 WIP to vectorize isel_points 195125296
269093854 https://github.com/pydata/xarray/pull/1162#issuecomment-269093854 https://api.github.com/repos/pydata/xarray/issues/1162 MDEyOklzc3VlQ29tbWVudDI2OTA5Mzg1NA== mangecoeur 743508 2016-12-24T17:49:10Z 2016-12-24T17:49:10Z CONTRIBUTOR

@shoyer Tidied up based on your recommendations; now everything is done in a single loop (I still need to distinguish between variables and coordinates for the output, but it's a lot neater).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  #1161 WIP to vectorize isel_points 195125296
269026887 https://github.com/pydata/xarray/pull/1162#issuecomment-269026887 https://api.github.com/repos/pydata/xarray/issues/1162 MDEyOklzc3VlQ29tbWVudDI2OTAyNjg4Nw== mangecoeur 743508 2016-12-23T18:13:52Z 2016-12-23T18:25:03Z CONTRIBUTOR

OK I adjusted for the new behaviour and all tests pass locally, hopefully travis agrees...

Edit: Looks like it's green

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  #1161 WIP to vectorize isel_points 195125296
268927305 https://github.com/pydata/xarray/pull/1162#issuecomment-268927305 https://api.github.com/repos/pydata/xarray/issues/1162 MDEyOklzc3VlQ29tbWVudDI2ODkyNzMwNQ== mangecoeur 743508 2016-12-23T01:42:03Z 2016-12-23T01:42:03Z CONTRIBUTOR

@shoyer I'm down to 1 test failing locally in sel_points but not sure what the desired behaviour is. I get:

```
<xarray.Dataset>
Dimensions:  (points: 3)
Coordinates:
  * points   (points) int64 0 1 2
Data variables:
    foo      (points) int64 0 4 8
```

instead of

```
AssertionError: <xarray.Dataset>
Dimensions:  (points: 3)
Coordinates:
  o points   (points) -
Data variables:
    foo      (points) int64 0 4 8
```

But here I'm not sure whether my code is wrong or the test is. It seems that the test requires sel_points NOT to generate new coordinate values for points; however, I'm pretty sure isel_points does require this (it passes in any case). I don't really see a way in my code to generate subsets without having a matching coordinate array (I don't know how to use the Dataset constructors without one, for instance).

I've updated the test according to how I think it should be working, but please correct me if I misunderstood.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  #1161 WIP to vectorize isel_points 195125296
266995169 https://github.com/pydata/xarray/pull/1162#issuecomment-266995169 https://api.github.com/repos/pydata/xarray/issues/1162 MDEyOklzc3VlQ29tbWVudDI2Njk5NTE2OQ== mangecoeur 743508 2016-12-14T10:10:11Z 2016-12-14T10:10:36Z CONTRIBUTOR

So it seems to work fine in the Dask case, but I don't have a deep understanding of how DataArrays are constructed from arrays and dims, so it fails in the non-dask case. I'm also not sure how you feel about making a special case for the dask backend here (since up till now it was all backend-agnostic).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  #1161 WIP to vectorize isel_points 195125296
266598007 https://github.com/pydata/xarray/issues/1161#issuecomment-266598007 https://api.github.com/repos/pydata/xarray/issues/1161 MDEyOklzc3VlQ29tbWVudDI2NjU5ODAwNw== mangecoeur 743508 2016-12-13T00:29:16Z 2016-12-13T00:29:16Z CONTRIBUTOR

Seems to run a lot faster for me too...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generated Dask graph is huge - performance issue? 195050684
266596464 https://github.com/pydata/xarray/issues/1161#issuecomment-266596464 https://api.github.com/repos/pydata/xarray/issues/1161 MDEyOklzc3VlQ29tbWVudDI2NjU5NjQ2NA== mangecoeur 743508 2016-12-13T00:20:12Z 2016-12-13T00:20:12Z CONTRIBUTOR

Done with PR #1162

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generated Dask graph is huge - performance issue? 195050684
266587849 https://github.com/pydata/xarray/issues/1161#issuecomment-266587849 https://api.github.com/repos/pydata/xarray/issues/1161 MDEyOklzc3VlQ29tbWVudDI2NjU4Nzg0OQ== mangecoeur 743508 2016-12-12T23:32:19Z 2016-12-12T23:33:03Z CONTRIBUTOR

Thanks, I've been looking around and I think I'm getting close; however, I'm not sure of the best way to turn the array slice I get from vindex into a DataArray variable. I'm thinking I might put together a draft PR for comments. This is what I have so far:

```python

def isel_points(self, dim='points', **indexers):
    """Returns a new dataset with each array indexed pointwise along the specified dimension(s).

This method selects pointwise values from each array and is akin to
the NumPy indexing behavior of `arr[[0, 1], [0, 1]]`, except this
method does not require knowing the order of each array's dimensions.

Parameters
----------
dim : str or DataArray or pandas.Index or other list-like object, optional
    Name of the dimension to concatenate along. If dim is provided as a
    string, it must be a new dimension name, in which case it is added
    along axis=0. If dim is provided as a DataArray or Index or
    list-like object, its name, which must not be present in the
    dataset, is used as the dimension to concatenate along and the
    values are added as a coordinate.
**indexers : {dim: indexer, ...}
    Keyword arguments with names matching dimensions and values given
    by array-like objects. All indexers must be the same length and
    1 dimensional.

Returns
-------
obj : Dataset
    A new Dataset with the same contents as this dataset, except each
    array and dimension is indexed by the appropriate indexers. With
    pointwise indexing, the new Dataset will always be a copy of the
    original.

See Also
--------
Dataset.sel
Dataset.isel
Dataset.sel_points
DataArray.isel_points
"""
from .dataarray import DataArray

indexer_dims = set(indexers)

def relevant_keys(mapping):
    return [k for k, v in mapping.items()
            if any(d in indexer_dims for d in v.dims)]

data_vars = relevant_keys(self.data_vars)
coords = relevant_keys(self.coords)

# all the indexers should be iterables
keys = indexers.keys()
indexers = [(k, np.asarray(v)) for k, v in iteritems(indexers)]
# Check that indexers are valid dims, integers, and 1D
for k, v in indexers:
    if k not in self.dims:
        raise ValueError("dimension %s does not exist" % k)
    if v.dtype.kind != 'i':
        raise TypeError('Indexers must be integers')
    if v.ndim != 1:
        raise ValueError('Indexers must be 1 dimensional')

# all the indexers should have the same length
lengths = set(len(v) for k, v in indexers)
if len(lengths) > 1:
    raise ValueError('All indexers must be the same length')

# Existing dimensions are not valid choices for the dim argument
if isinstance(dim, basestring):
    if dim in self.dims:
        # dim is an invalid string
        raise ValueError('Existing dimension names are not valid '
                         'choices for the dim argument in sel_points')
elif hasattr(dim, 'dims'):
    # dim is a DataArray or Coordinate
    if dim.name in self.dims:
        # dim already exists
        raise ValueError('Existing dimensions are not valid choices '
                         'for the dim argument in sel_points')

if not utils.is_scalar(dim) and not isinstance(dim, DataArray):
    dim = as_variable(dim, name='points')

variables = OrderedDict()
indexers_dict = dict(indexers)
non_indexed = list(set(self.dims) - indexer_dims)

# TODO need to figure out how to make sure we get the indexed vs non indexed dimensions in the right order
for name, var in self.variables.items():
    slc = []

    for k in var.dims:
        if k in indexers_dict:
            slc.append(indexers_dict[k])
        else:
            slc.append(slice(None, None))
    if hasattr(var.data, 'vindex'):
        variables[name] = DataArray(var.data.vindex[tuple(slc)], name=name)
    else:
        variables[name] = var[tuple(slc)]

points_len = lengths.pop()

new_variables = OrderedDict()
for name, var in variables.items():
    if name not in self.dims:
        coords = [variables[k] for k in non_indexed]
        new_variables[name] = DataArray(var, coords=[np.arange(points_len)] + coords, dims=[dim] + non_indexed)

return xr.merge([v for k,v in new_variables.items() if k not in selection.dims])
# TODO: This would be sped up with vectorized indexing. This will
# require dask to support pointwise indexing as well.

return concat([self.isel(**d) for d in
               [dict(zip(keys, inds)) for inds in
                zip(*[v for k, v in indexers])]],
              dim=dim, coords=coords, data_vars=data_vars)

```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generated Dask graph is huge - performance issue? 195050684
266519121 https://github.com/pydata/xarray/issues/1161#issuecomment-266519121 https://api.github.com/repos/pydata/xarray/issues/1161 MDEyOklzc3VlQ29tbWVudDI2NjUxOTEyMQ== mangecoeur 743508 2016-12-12T18:59:15Z 2016-12-12T18:59:15Z CONTRIBUTOR

OK, I will have a look. Where is this implemented? (I always seem to have trouble pinpointing the dask-specific bits in the codebase :S)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generated Dask graph is huge - performance issue? 195050684
265966887 https://github.com/pydata/xarray/pull/1128#issuecomment-265966887 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2NTk2Njg4Nw== mangecoeur 743508 2016-12-09T09:08:48Z 2016-12-09T09:08:48Z CONTRIBUTOR

@shoyer thanks, with a little testing it seems lock=False is fine (so you don't necessarily need dask dev for lock=dask.utils.SerializableLock()). Using a spawning pool is necessary; it just doesn't work without one. It also looks like the dask distributed IPython backend works fine (it works similarly to a spawn pool in that the worker engines aren't forked but live in their own little world) - this is really nice because IPython in turn has good support for HPC systems (SGE batch scheduling + MPI for process handling).
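
A hedged sketch of the setup described, translated to current dask configuration names (the comment's era used dask.set_options(get=dask.multiprocessing.get); the file pattern here is hypothetical and the lock keyword has moved around between xarray versions):

```python
import dask
import xarray as xr

# process-based scheduler with a spawn (not fork) worker context - the
# "spawning pool" mentioned above
with dask.config.set({"multiprocessing.context": "spawn"}, scheduler="processes"):
    ds = xr.open_mfdataset("cfsr/*.nc", lock=False)  # lock=False as reported above
    subset = ds.isel(time=slice(0, 24)).compute()
```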

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove caching logic from xarray.Variable 189817033
265875012 https://github.com/pydata/xarray/pull/1128#issuecomment-265875012 https://api.github.com/repos/pydata/xarray/issues/1128 MDEyOklzc3VlQ29tbWVudDI2NTg3NTAxMg== mangecoeur 743508 2016-12-08T22:28:25Z 2016-12-08T22:28:25Z CONTRIBUTOR

I'm trying out the latest code to subset a set of netCDF4 files with dask.multiprocessing, using set_options(get=dask.multiprocessing.get), but I'm still getting TypeError: can't pickle _thread.lock objects - is this expected, or is there something specific I need to do to make it work?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove caching logic from xarray.Variable 189817033
230289863 https://github.com/pydata/xarray/issues/894#issuecomment-230289863 https://api.github.com/repos/pydata/xarray/issues/894 MDEyOklzc3VlQ29tbWVudDIzMDI4OTg2Mw== mangecoeur 743508 2016-07-04T13:23:53Z 2016-07-04T13:23:53Z CONTRIBUTOR

I think this is also a bug if you load a multi-file dataset: when you rename it you get a new dataset, but when you trigger a read it goes back to the original files, which haven't been renamed on disk.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset variable reference fails after renaming 163414759
223918870 https://github.com/pydata/xarray/issues/463#issuecomment-223918870 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDIyMzkxODg3MA== mangecoeur 743508 2016-06-06T10:09:48Z 2016-06-06T10:09:48Z CONTRIBUTOR

So, using a cleaner minimal example, it does appear that the files are closed after the dataset is closed. However, they are all open during dataset loading - this is what blows past the OSX default max-open-files limit.

I think this could be a real issue when using Xarray to handle too-big-for-RAM datasets - you could easily be trying to access thousands of files (especially with weather data), so Xarray should limit the number it holds open at any one time during data load. Not being familiar with the internals, I'm not sure whether this is an issue in Xarray itself or in the Dask backend.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
223905394 https://github.com/pydata/xarray/issues/463#issuecomment-223905394 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDIyMzkwNTM5NA== mangecoeur 743508 2016-06-06T09:06:33Z 2016-06-06T09:06:33Z CONTRIBUTOR

@shoyer thanks - here's how I'm using open_mfdataset (not using any options). I'm going to try the h5netcdf backend to see if I get the same results. I'm still not 100% confident that I'm tracking open files correctly with lsof, so I'm going to try to make a minimal example to investigate.

``` python

def weather_dataset(root_path: Path, *, start_date: datetime = None, end_date: datetime = None):
    flat_files_paths = get_dset_file_paths(root_path, start_date=start_date, end_date=end_date)
    # Convert Paths to list of strings for xarray
    dataset = xr.open_mfdataset([str(f) for f in flat_files_paths])
    return dataset

def cfsr_weather_loader(db, site_lookup_fn=None, dset_start=None, dset_end=None, site_conf=None):
    # Pull values out of the
    dt_conf = site_conf if site_conf else WEATHER_CFSR
    dset_start = dset_start if dset_start else dt_conf['start_dt']
    dset_end = dset_end if dset_end else dt_conf['end_dt']

    if site_lookup_fn is None:
        site_lookup_fn = site_lookup_postcode_district

    def weather_loader(site_id, start_date, end_date, resample=None):
        # using the tuple because always getting mixed up with lon/lat
        geo_lookup = site_lookup_fn(site_id, db)

        # With statement should ensure dset is closed after loading.
        with weather_dataset(WEATHER_CFSR['path'],
                             start_date=dset_start,
                             end_date=dset_end) as weather:
            data = weighted_regional_timeseries(weather, start_date, end_date,
                                                lon=geo_lookup.lon,
                                                lat=geo_lookup.lat,
                                                weights=geo_lookup.weights)

        # RENAME from CFSR standard
        data = data.rename(columns=WEATHER_RENAME)

        if resample is not None:
            data = data.resample(resample).mean()
        data.irradiance /= 1000.0  # convert irradiance to kW
        return data

    return weather_loader

```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
223837612 https://github.com/pydata/xarray/issues/463#issuecomment-223837612 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDIyMzgzNzYxMg== mangecoeur 743508 2016-06-05T21:05:40Z 2016-06-05T21:05:40Z CONTRIBUTOR

So on investigation, even though my dataset creation is wrapped in a with block, using lsof to check the file handles held by my iPython kernel suggests that all the input files are still open. Are you certain that the backend correctly closes files in a multifile dataset? Is there a way to explicitly force this to happen?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
223810723 https://github.com/pydata/xarray/issues/463#issuecomment-223810723 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDIyMzgxMDcyMw== mangecoeur 743508 2016-06-05T12:34:11Z 2016-06-05T12:34:11Z CONTRIBUTOR

I still hit this issue after wrapping my open_mfdataset in a with statement. I suspect it's an OSX problem: macOS has a very low default max-open-files limit for applications started from the shell (like 256). It's not yet clear to me whether my datasets are being correctly closed; investigating...
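
A hedged workaround sketch for the low default limit mentioned above: raising the soft open-file limit for the current process from Python (the target of 4096 is arbitrary, and the hard limit still caps what can be set).

```python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
target = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```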

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
223687053 https://github.com/pydata/xarray/issues/463#issuecomment-223687053 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDIyMzY4NzA1Mw== mangecoeur 743508 2016-06-03T20:31:56Z 2016-06-03T20:31:56Z CONTRIBUTOR

It seems to happen even with a freshly restarted notebook, but I'll try a with statement to see if it helps.

On 3 Jun 2016 19:53, "Stephan Hoyer" notifications@github.com wrote:

I suspect you hit this in IPython after rerunning cells, because file handles are only automatically closed when programs exit. You might find it a good idea to explicitly close files by calling .close() (or using a "with" statement) on Datasets opened with open_mfdataset.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
223651454 https://github.com/pydata/xarray/issues/463#issuecomment-223651454 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDIyMzY1MTQ1NA== mangecoeur 743508 2016-06-03T18:08:24Z 2016-06-03T18:08:24Z CONTRIBUTOR

I'm also running into this error - but strangely it only happens when using IPython interactive backend. I have some tests which work fine, but doing the same in IPython fails.

I'm opening a few hundred files (about 10Mb each, one per month across a few variables). I'm using the default NetCDF backend.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
222995827 https://github.com/pydata/xarray/issues/864#issuecomment-222995827 https://api.github.com/repos/pydata/xarray/issues/864 MDEyOklzc3VlQ29tbWVudDIyMjk5NTgyNw== mangecoeur 743508 2016-06-01T13:42:21Z 2016-06-01T13:42:59Z CONTRIBUTOR

On further investigation, it appears the problem is that the dataset contains a mix of string and float data - the strings are redundant representations of the timestamp, so they don't appear in the index query. When I try to convert to an array, numpy chokes on the mixed types. Explicitly selecting the desired data variable solves this:

```python
selection = cfsr_new.TMP_L103.sel(lon=lon_sel, lat=lat_sel, time=time_sel)
```

I think a clearer error message may be needed: when you do sel without indexing on certain dimensions, those are included in the resulting selection. It's possible for those to be of mixed incompatible types. Clearly to do to_array you need a numpy-friendly uniform type. The error should make this clearer.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  TypeError: invalid type promotion when reading multi-file dataset 157886730

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```