Comment 806711702 on pydata/xarray#2857
https://github.com/pydata/xarray/issues/2857#issuecomment-806711702
Author: user 2418513 (association: NONE) · Posted: 2021-03-25T13:08:46Z

@kmuehlbauer Just installed h5netcdf=0.10.0. Here are the timings when there are 200 groups in the file: store.close() takes 92.6% of the time again:

```
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1078         1          1.0      1.0      0.0      try:
  1079                                                   # TODO: allow this work (setting up the file for writing array data)
  1080                                                   # to be parallelized with dask
  1081         2     221642.0 110821.0      4.2          dump_to_store(
  1082         1          2.0      2.0      0.0              dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
  1083                                                   )
  1084         1          3.0      3.0      0.0          if autoclose:
  1085                                                       store.close()
  1086
  1087         1          1.0      1.0      0.0          if multifile:
  1088                                                       return writer, store
  1089
  1090         1          6.0      6.0      0.0          writes = writer.sync(compute=compute)
  1091
  1092         1          1.0      1.0      0.0          if path_or_file is None:
  1093                                                       store.sync()
  1094                                                       return target.getvalue()
  1095                                               finally:
  1096         1          2.0      2.0      0.0          if not multifile and compute:
  1097         1    4857912.0 4857912.0     92.6              store.close()
```
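(For anyone trying to reproduce this kind of measurement: the table above is line_profiler output. As a minimal, self-contained analogue using only the stdlib `cProfile` module, with invented stand-in functions rather than xarray's real ones, you can get the same "one call dominates" picture:)

```python
import cProfile
import io
import pstats


def expensive_close():
    # Stand-in for store.close(); burns CPU so it dominates the profile.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def write_dataset():
    # Stand-in for the body of xarray's to_netcdf().
    data = [i for i in range(1_000)]  # "dump_to_store" analogue
    expensive_close()                 # the call that dominates above
    return data


profiler = cProfile.Profile()
profiler.enable()
write_dataset()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
report = stream.getvalue()
```

`report` then lists `expensive_close` at the top of the cumulative-time ranking, analogous to `store.close()` eating 92.6% above.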

And here is the profile of _lookup_dimensions(). (Note that it only accounts for about half of the total time; there is a lot of other time spent in File.flush(), which I don't understand.)

```
Timer unit: 1e-06 s

Total time: 2.44857 s
File: .../python3.8/site-packages/h5netcdf/core.py
Function: _lookup_dimensions at line 92

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    92                                               def _lookup_dimensions(self):
    93       400      65513.0    163.8      2.7          attrs = self._h5ds.attrs
    94       400       6175.0     15.4      0.3          if "_Netcdf4Coordinates" in attrs:
    95                                                       order_dim = _reverse_dict(self._parent._dim_order)
    96                                                       return tuple(
    97                                                           order_dim[coord_id] for coord_id in attrs["_Netcdf4Coordinates"]
    98                                                       )
    99
   100       400      44938.0    112.3      1.8          child_name = self.name.split("/")[-1]
   101       400       5006.0     12.5      0.2          if child_name in self._parent.dimensions:
   102                                                       return (child_name,)
   103
   104       400        350.0      0.9      0.0          dims = []
   105       400        781.0      2.0      0.0          phony_dims = defaultdict(int)
   106      1400     166093.0    118.6      6.8          for axis, dim in enumerate(self._h5ds.dims):
   107                                                       # get current dimension
   108      1000     119507.0    119.5      4.9              dimsize = self.shape[axis]
   109      1000       2459.0      2.5      0.1              phony_dims[dimsize] += 1
   110      1000      34345.0     34.3      1.4              if len(dim):
   111      1000    2001071.0   2001.1     81.7                  name = _name_from_dimension(dim)
   112                                                       else:
   113                                                           # if unlabeled dimensions are found
   114                                                           if self._root._phony_dims_mode is None:
   115                                                               raise ValueError(
   116                                                                   "variable %r has no dimension scale "
   117                                                                   "associated with axis %s. \n"
   118                                                                   "Use phony_dims=%r for sorted naming or "
   119                                                                   "phony_dims=%r for per access naming."
   120                                                                   % (self.name, axis, "sort", "access")
   121                                                               )
   122                                                           else:
   123                                                               # get dimension name
   124                                                               name = self._parent._phony_dims[(dimsize, phony_dims[dimsize] - 1)]
   125      1000       1820.0      1.8      0.1              dims.append(name)
   126       400        512.0      1.3      0.0          return tuple(dims)
```
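The hot spot in the profile is `_name_from_dimension(dim)`: 1000 hits at roughly 2 ms each, i.e. the same dimension names are re-read from HDF5 dimension-scale metadata over and over. A toy sketch of why caching such repeated lookups could help; the class, timings, and cache here are invented for illustration and are not h5netcdf's actual code:

```python
import functools
import time


class DimLookup:
    """Toy model of repeated dimension-name lookups against slow storage."""

    def __init__(self):
        self.raw_calls = 0  # counts the slow, uncached reads

    def _name_from_dimension(self, dim_id: int) -> str:
        # Stand-in for reading an HDF5 dimension scale; deliberately slow.
        self.raw_calls += 1
        time.sleep(0.001)  # ~1 ms, same order as the 2 ms/hit in the profile
        return f"dim_{dim_id}"

    @functools.lru_cache(maxsize=None)  # hypothetical cache, keyed by (self, dim_id)
    def cached_name(self, dim_id: int) -> str:
        return self._name_from_dimension(dim_id)


lookup = DimLookup()
# 1000 lookups spread over only 5 distinct dimensions: with the cache,
# only 5 slow reads happen instead of 1000.
names = [lookup.cached_name(i % 5) for i in range(1000)]
```

Whether such a cache is safe in h5netcdf itself (dimensions can be created or renamed mid-session) is exactly the kind of question an upstream fix would have to answer.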
