issue_comments

13 rows where author_association = "CONTRIBUTOR" and user = 1492047 sorted by updated_at descending

issue 6

  • Dataset.where performances regression. 4
  • float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray 3
  • Support explicitly setting a dimension order with to_dataframe() 3
  • Cannot open NetCDF file if dimension with time coordinate has length 0 (`ValueError` when decoding CF datetime) 1
  • Dataset/DataArray to_dataframe() dimensions order mismatch. 1
  • open_mfdataset usage and limitations. 1

user 1

  • Thomas-Z · 13 ✖

author_association 1

  • CONTRIBUTOR · 13 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1532601237 https://github.com/pydata/xarray/issues/7516#issuecomment-1532601237 https://api.github.com/repos/pydata/xarray/issues/7516 IC_kwDOAMm_X85bWaOV Thomas-Z 1492047 2023-05-03T07:58:22Z 2023-05-03T07:58:22Z CONTRIBUTOR

Hello,

I'm not sure the performance problems were fully addressed (we're now forced to fully compute/load the selection expression), but changes made in recent versions make this issue irrelevant and I think we can close it.

Thank you!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset.where performances regression. 1575938277
1451754167 https://github.com/pydata/xarray/issues/7516#issuecomment-1451754167 https://api.github.com/repos/pydata/xarray/issues/7516 IC_kwDOAMm_X85WiAK3 Thomas-Z 1492047 2023-03-02T11:59:47Z 2023-03-02T11:59:47Z CONTRIBUTOR

The .variable computation is fast, but it cannot be used directly as you suggest:

```
dsx.where(sel.variable, drop=True)

TypeError: cond argument is <xarray.Variable (num_lines: 5761870, num_pixels: 71)> ... but must be a <class 'xarray.core.dataset.Dataset'> or <class 'xarray.core.dataarray.DataArray'>
```

Doing it like this seems to work correctly (and is fast enough):

    dsx["x"] = sel.variable.compute()
    dsx.where(dsx["x"], drop=True)

The _nadir variables have the same chunks and are much faster to read than the other ones (they are a lot smaller).
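
A minimal sketch of the workaround above on a tiny in-memory dataset. The variable names `ssh` and `mask` and the array sizes are made up for illustration; the real dataset is dask-backed and far larger, so this only shows the mechanics, not the timings.

```python
import numpy as np
import xarray as xr

# Tiny stand-in for the real dataset (names and sizes are illustrative).
dsx = xr.Dataset(
    {"ssh": (("num_lines", "num_pixels"), np.arange(12.0).reshape(4, 3))},
    coords={
        "longitude": (
            ("num_lines", "num_pixels"),
            np.linspace(-50.0, 150.0, 12).reshape(4, 3),
        )
    },
)

sel = (dsx["longitude"] > 0) & (dsx["longitude"] < 100)

# Dataset.where rejects a bare Variable, so store the computed mask in the
# dataset and pass it back as a DataArray.
dsx["mask"] = sel.variable.compute()
subset = dsx.where(dsx["mask"], drop=True)

print(subset["ssh"].values)
```

With `drop=True`, rows and columns where the mask is entirely False are removed from the result.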

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset.where performances regression. 1575938277
1449714522 https://github.com/pydata/xarray/issues/7516#issuecomment-1449714522 https://api.github.com/repos/pydata/xarray/issues/7516 IC_kwDOAMm_X85WaONa Thomas-Z 1492047 2023-03-01T09:43:27Z 2023-03-01T09:43:27Z CONTRIBUTOR

    sel = (dsx["longitude"] > 0) & (dsx["longitude"] < 100)
    sel.compute()

This "compute" finishes, but takes more than 80 seconds on both versions with huge memory consumption (it loads the 4 coordinates and the result itself).

I know xarray has to keep more information about coordinates and dimensions, but doing this with just dask arrays:

    sel2 = (dsx["longitude"].data > 0) & (dsx["longitude"].data < 100)
    sel2.compute()

takes less than 6 seconds.
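
Both expressions yield the same boolean values; they differ only in what is carried along with the result. A small sketch with a NumPy-backed stand-in (the real arrays are dask-backed, so the timings are not reproduced here, and the names and sizes are illustrative):

```python
import numpy as np
import xarray as xr

lon = xr.DataArray(
    np.linspace(-50.0, 150.0, 12).reshape(4, 3),
    dims=("num_lines", "num_pixels"),
    name="longitude",
)

# Labelled version: the result is a DataArray that keeps dims (and coords).
sel = (lon > 0) & (lon < 100)

# Raw version: operates on the underlying array only.
sel2 = (lon.data > 0) & (lon.data < 100)

# Re-wrap the raw mask when a labelled object is needed later on.
sel2_labelled = xr.DataArray(sel2, dims=lon.dims)

print(np.array_equal(sel.values, sel2))
```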

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset.where performances regression. 1575938277
1447798846 https://github.com/pydata/xarray/issues/7516#issuecomment-1447798846 https://api.github.com/repos/pydata/xarray/issues/7516 IC_kwDOAMm_X85WS6g- Thomas-Z 1492047 2023-02-28T08:54:16Z 2023-02-28T11:24:11Z CONTRIBUTOR

Just tried it and it does not seem identical at all to what was happening earlier.

This is the kind of dataset I'm working with.

With this selection:

    sel = (dsx["longitude"] > 0) & (dsx["longitude"] < 100)

Old xarray takes a little less than 1 minute and less than 6GB of memory. New xarray with compute did not finish and had to be stopped before it consumed my 16GB of memory.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset.where performances regression. 1575938277
863271758 https://github.com/pydata/xarray/issues/1329#issuecomment-863271758 https://api.github.com/repos/pydata/xarray/issues/1329 MDEyOklzc3VlQ29tbWVudDg2MzI3MTc1OA== Thomas-Z 1492047 2021-06-17T14:08:54Z 2021-06-17T14:11:47Z CONTRIBUTOR

Hello,

Using the same code sample:

```
import numpy
import xarray

ds = xarray.Dataset(
    {"a": ("x", [])},
    coords={"x": numpy.zeros(shape=0, dtype="M8[ns]")},
)

ds.to_netcdf("/tmp/test.nc")

xarray.open_dataset("/tmp/test.nc")
```

It works on xarray 0.17 but does not work anymore with xarray 0.18 & 0.18.2.

This addition seems to be responsible (coming from this commit).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cannot open NetCDF file if dimension with time coordinate has length 0 (`ValueError` when decoding CF datetime) 217216935
674361521 https://github.com/pydata/xarray/pull/4333#issuecomment-674361521 https://api.github.com/repos/pydata/xarray/issues/4333 MDEyOklzc3VlQ29tbWVudDY3NDM2MTUyMQ== Thomas-Z 1492047 2020-08-15T07:20:53Z 2020-08-15T07:20:53Z CONTRIBUTOR

My pleasure. I've been a user for a few years now, I'll gladly give something back whenever I can.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support explicitly setting a dimension order with to_dataframe() 676696822
672084936 https://github.com/pydata/xarray/pull/4333#issuecomment-672084936 https://api.github.com/repos/pydata/xarray/issues/4333 MDEyOklzc3VlQ29tbWVudDY3MjA4NDkzNg== Thomas-Z 1492047 2020-08-11T16:49:19Z 2020-08-11T16:49:19Z CONTRIBUTOR

Do we want DataArray.to_dataframe to be consistent with Dataset.to_dataframe regarding the default dimension ordering (i.e. alphabetically) or do we want to keep the current behavior (DataArray.dims order)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support explicitly setting a dimension order with to_dataframe() 676696822
672030936 https://github.com/pydata/xarray/pull/4333#issuecomment-672030936 https://api.github.com/repos/pydata/xarray/issues/4333 MDEyOklzc3VlQ29tbWVudDY3MjAzMDkzNg== Thomas-Z 1492047 2020-08-11T15:51:02Z 2020-08-11T15:54:40Z CONTRIBUTOR

Hello,

I actually followed @shoyer's suggestion to use the to_dask_dataframe parameter name.

And I just realized I only did half the work. I'll add this parameter to DataArray.to_dataframe if you validate this name.
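
For illustration, a short sketch of what explicit ordering looks like on `Dataset.to_dataframe`. It assumes the parameter is called `dim_order`, as in `to_dask_dataframe`, and that the installed xarray version already includes it; the dataset itself is made up.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"foo": (("x", "y"), np.arange(6).reshape(2, 3))},
    coords={"x": ["a", "b"], "y": [0, 1, 2]},
)

# Assumption: the parameter is named dim_order, as in to_dask_dataframe.
# The MultiIndex levels follow the requested order instead of the default one.
df = ds.to_dataframe(dim_order=["y", "x"])

print(df)
```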

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support explicitly setting a dimension order with to_dataframe() 676696822
410792506 https://github.com/pydata/xarray/issues/2304#issuecomment-410792506 https://api.github.com/repos/pydata/xarray/issues/2304 MDEyOklzc3VlQ29tbWVudDQxMDc5MjUwNg== Thomas-Z 1492047 2018-08-06T17:47:23Z 2019-01-09T15:18:36Z CONTRIBUTOR

To explain the full context and why it became a problem for us:

We're experimenting with the parquet format (via pyarrow) and we first did something like: netcdf file -> netcdf4 -> pandas -> pyarrow -> pandas (when read later on).

We're now looking at xarray and the huge ease of access it offers to netcdf-like data, and we tried something similar: netcdf file -> xarray -> pandas -> pyarrow -> pandas (when read later on).

Our problem appears when we read and compare the data stored with these 2 approaches. The difference between the 2 was, sometimes, larger than expected/acceptable (10e-6 for float32 if I'm not mistaken). We're not constraining any type and are letting the system and modules decide how to encode what, and in the end we have significantly different values.

There might be something wrong in our process, but it originates here with this float32/float64 choice, so we thought it might be a problem.

Thanks for taking the time to look into this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray  343659822
432546977 https://github.com/pydata/xarray/issues/2501#issuecomment-432546977 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDQzMjU0Njk3Nw== Thomas-Z 1492047 2018-10-24T07:38:31Z 2018-10-24T07:38:31Z CONTRIBUTOR

Thank you for looking into this.

I just want to point out that I'm not that much concerned with the "slow performance" but much more with the memory consumption and the limitation it implies.

```python
from glob import glob
import xarray as xr

all_files = glob('...TP110.nc')
display(xr.open_dataset(all_files[0]))
display(xr.open_dataset(all_files[1]))
```

<xarray.Dataset> Dimensions: (meas_ind: 40, time: 2871, wvf_ind: 128) Coordinates: * time (time) datetime64[ns] 2017-06-19T14:24:20.792036992 ... 2017-06-19T15:14:38.491743104 * meas_ind (meas_ind) int8 0 1 2 3 4 ... 36 37 38 39 * wvf_ind (wvf_ind) int8 0 1 2 3 ... 125 126 127 lat (time) float64 ... lon (time) float64 ... lon_40hz (time, meas_ind) float64 ... lat_40hz (time, meas_ind) float64 ... Data variables: time_40hz (time, meas_ind) datetime64[ns] ... surface_type (time) float32 ... rad_surf_type (time) float32 ... qual_alt_1hz_range (time) float32 ... qual_alt_1hz_swh (time) float32 ... qual_alt_1hz_sig0 (time) float32 ... qual_alt_1hz_off_nadir_angle_wf (time) float32 ... qual_inst_corr_1hz_range (time) float32 ... qual_inst_corr_1hz_swh (time) float32 ... qual_inst_corr_1hz_sig0 (time) float32 ... qual_rad_1hz_tb_k (time) float32 ... qual_rad_1hz_tb_ka (time) float32 ... alt_state_flag_acq_mode_40hz (time, meas_ind) float32 ... alt_state_flag_tracking_mode_40hz (time, meas_ind) float32 ... orb_state_flag_diode (time) float32 ... orb_state_flag_rest (time) float32 ... ecmwf_meteo_map_avail (time) float32 ... trailing_edge_variation_flag (time) float32 ... trailing_edge_variation_flag_40hz (time, meas_ind) float32 ... ice_flag (time) float32 ... interp_flag_mean_sea_surface (time) float32 ... interp_flag_mdt (time) float32 ... interp_flag_ocean_tide_sol1 (time) float32 ... interp_flag_ocean_tide_sol2 (time) float32 ... interp_flag_meteo (time) float32 ... alt (time) float64 ... alt_40hz (time, meas_ind) float64 ... orb_alt_rate (time) float32 ... range (time) float64 ... range_40hz (time, meas_ind) float64 ... range_used_40hz (time, meas_ind) float32 ... range_rms (time) float32 ... range_numval (time) float32 ... number_of_iterations (time, meas_ind) float32 ... net_instr_corr_range (time) float64 ... model_dry_tropo_corr (time) float32 ... model_wet_tropo_corr (time) float32 ... rad_wet_tropo_corr (time) float32 ... iono_corr_gim (time) float32 ... 
sea_state_bias (time) float32 ... swh (time) float32 ... swh_40hz (time, meas_ind) float32 ... swh_used_40hz (time, meas_ind) float32 ... swh_rms (time) float32 ... swh_numval (time) float32 ... net_instr_corr_swh (time) float32 ... sig0 (time) float32 ... sig0_40hz (time, meas_ind) float32 ... sig0_used_40hz (time, meas_ind) float32 ... sig0_rms (time) float32 ... sig0_numval (time) float32 ... agc (time) float32 ... agc_rms (time) float32 ... agc_numval (time) float32 ... net_instr_corr_sig0 (time) float32 ... atmos_corr_sig0 (time) float32 ... off_nadir_angle_wf (time) float32 ... off_nadir_angle_wf_40hz (time, meas_ind) float32 ... tb_k (time) float32 ... tb_ka (time) float32 ... mean_sea_surface (time) float64 ... mean_topography (time) float64 ... geoid (time) float64 ... bathymetry (time) float64 ... inv_bar_corr (time) float32 ... hf_fluctuations_corr (time) float32 ... ocean_tide_sol1 (time) float64 ... ocean_tide_sol2 (time) float64 ... ocean_tide_equil (time) float32 ... ocean_tide_non_equil (time) float32 ... load_tide_sol1 (time) float32 ... load_tide_sol2 (time) float32 ... solid_earth_tide (time) float32 ... pole_tide (time) float32 ... wind_speed_model_u (time) float32 ... wind_speed_model_v (time) float32 ... wind_speed_alt (time) float32 ... rad_water_vapor (time) float32 ... rad_liquid_water (time) float32 ... ice1_range_40hz (time, meas_ind) float64 ... ice1_sig0_40hz (time, meas_ind) float32 ... ice1_qual_flag_40hz (time, meas_ind) float32 ... seaice_range_40hz (time, meas_ind) float64 ... seaice_sig0_40hz (time, meas_ind) float32 ... seaice_qual_flag_40hz (time, meas_ind) float32 ... ice2_range_40hz (time, meas_ind) float64 ... ice2_le_sig0_40hz (time, meas_ind) float32 ... ice2_sig0_40hz (time, meas_ind) float32 ... ice2_sigmal_40hz (time, meas_ind) float32 ... ice2_slope1_40hz (time, meas_ind) float64 ... ice2_slope2_40hz (time, meas_ind) float64 ... ice2_mqe_40hz (time, meas_ind) float32 ... ice2_qual_flag_40hz (time, meas_ind) float32 ... 
mqe_40hz (time, meas_ind) float32 ... peakiness_40hz (time, meas_ind) float32 ... ssha (time) float32 ... tracker_40hz (time, meas_ind) float64 ... tracker_used_40hz (time, meas_ind) float32 ... tracker_diode_40hz (time, meas_ind) float64 ... pri_counter_40hz (time, meas_ind) float64 ... qual_alt_1hz_off_nadir_angle_pf (time) float32 ... off_nadir_angle_pf (time) float32 ... off_nadir_angle_rain_40hz (time, meas_ind) float32 ... uso_corr (time) float64 ... internal_path_delay_corr (time) float64 ... modeled_instr_corr_range (time) float32 ... doppler_corr (time) float32 ... cog_corr (time) float32 ... modeled_instr_corr_swh (time) float32 ... internal_corr_sig0 (time) float32 ... modeled_instr_corr_sig0 (time) float32 ... agc_40hz (time, meas_ind) float32 ... agc_corr_40hz (time, meas_ind) float32 ... scaling_factor_40hz (time, meas_ind) float64 ... epoch_40hz (time, meas_ind) float64 ... width_leading_edge_40hz (time, meas_ind) float64 ... amplitude_40hz (time, meas_ind) float64 ... thermal_noise_40hz (time, meas_ind) float64 ... seaice_epoch_40hz (time, meas_ind) float64 ... seaice_amplitude_40hz (time, meas_ind) float64 ... ice2_epoch_40hz (time, meas_ind) float64 ... ice2_amplitude_40hz (time, meas_ind) float64 ... ice2_mean_amplitude_40hz (time, meas_ind) float64 ... ice2_thermal_noise_40hz (time, meas_ind) float64 ... ice2_slope_40hz (time, meas_ind) float64 ... signal_to_noise_ratio (time) float32 ... waveforms_40hz (time, meas_ind, wvf_ind) float32 ... Attributes: Conventions: CF-1.1 title: GDR - Expertise dataset institution: CNES source: radar altimeter history: 2017-07-21 08:25:07 : Creation contact: CNES aviso@oceanobs.com, EUMETSAT ops@... references: L1 library=V4.5p1, L2 library=V5.5p2, ... processing_center: SALP reference_document: SARAL/ALTIKA Products Handbook, SALP-M... 
mission_name: SARAL altimeter_sensor_name: ALTIKA radiometer_sensor_name: ALTIKA_RAD doris_sensor_name: DGXX cycle_number: 110 absolute_rev_number: 22545 pass_number: 1 absolute_pass_number: 109219 equator_time: 2017-06-19 14:49:32.128000 equator_longitude: 227.77 first_meas_time: 2017-06-19 14:24:20.792037 last_meas_time: 2017-06-19 15:14:38.491743 xref_altimeter_level1: ALK_ALT_1PaS20170619_154722_20170619_1... xref_radiometer_level1: ALK_RAD_1PaS20170619_154643_20170619_1... xref_altimeter_characterisation: ALK_CHA_AXVCNE20131115_120000_20100101... xref_radiometer_characterisation: ALK_CHR_AXVCNE20110207_180000_20110101... xref_altimeter_ltm: ALK_CAL_AXXCNE20170720_110014_20130102... xref_doris_uso: SRL_OS1_AXXCNE20170720_083800_20130226... xref_orbit_data: SRL_VOR_AXVCNE20170720_111700_20170618... xref_pf_data: SRL_VPF_AXVCNE20170720_111800_20170618... xref_pole_location: SMM_POL_AXXCNE20170721_071500_19870101... xref_gim_data: SRL_ION_AXPCNE20170620_074756_20170619... xref_mog2d_data: SMM_MOG_AXVCNE20170709_191501_20170619... xref_orf_data: SRL_ORF_AXXCNE20170720_083800_20160704... xref_meteorological_files: SMM_APA_AXVCNE20170619_170611_20170619... ellipsoid_axis: 6378136.3 ellipsoid_flattening: 0.0033528131778969 <xarray.Dataset> Dimensions: (meas_ind: 40, time: 2779, wvf_ind: 128) Coordinates: * time (time) datetime64[ns] 2017-06-19T15:14:39.356848 ... 2017-06-19T16:04:56.808873920 * meas_ind (meas_ind) int8 0 1 2 3 4 ... 36 37 38 39 * wvf_ind (wvf_ind) int8 0 1 2 3 ... 125 126 127 lat (time) float64 ... lon (time) float64 ... lon_40hz (time, meas_ind) float64 ... lat_40hz (time, meas_ind) float64 ... Data variables: time_40hz (time, meas_ind) datetime64[ns] ... surface_type (time) float32 ... rad_surf_type (time) float32 ... qual_alt_1hz_range (time) float32 ... qual_alt_1hz_swh (time) float32 ... qual_alt_1hz_sig0 (time) float32 ... qual_alt_1hz_off_nadir_angle_wf (time) float32 ... qual_inst_corr_1hz_range (time) float32 ... 
qual_inst_corr_1hz_swh (time) float32 ... qual_inst_corr_1hz_sig0 (time) float32 ... qual_rad_1hz_tb_k (time) float32 ... qual_rad_1hz_tb_ka (time) float32 ... alt_state_flag_acq_mode_40hz (time, meas_ind) float32 ... alt_state_flag_tracking_mode_40hz (time, meas_ind) float32 ... orb_state_flag_diode (time) float32 ... orb_state_flag_rest (time) float32 ... ecmwf_meteo_map_avail (time) float32 ... trailing_edge_variation_flag (time) float32 ... trailing_edge_variation_flag_40hz (time, meas_ind) float32 ... ice_flag (time) float32 ... interp_flag_mean_sea_surface (time) float32 ... interp_flag_mdt (time) float32 ... interp_flag_ocean_tide_sol1 (time) float32 ... interp_flag_ocean_tide_sol2 (time) float32 ... interp_flag_meteo (time) float32 ... alt (time) float64 ... alt_40hz (time, meas_ind) float64 ... orb_alt_rate (time) float32 ... range (time) float64 ... range_40hz (time, meas_ind) float64 ... range_used_40hz (time, meas_ind) float32 ... range_rms (time) float32 ... range_numval (time) float32 ... number_of_iterations (time, meas_ind) float32 ... net_instr_corr_range (time) float64 ... model_dry_tropo_corr (time) float32 ... model_wet_tropo_corr (time) float32 ... rad_wet_tropo_corr (time) float32 ... iono_corr_gim (time) float32 ... sea_state_bias (time) float32 ... swh (time) float32 ... swh_40hz (time, meas_ind) float32 ... swh_used_40hz (time, meas_ind) float32 ... swh_rms (time) float32 ... swh_numval (time) float32 ... net_instr_corr_swh (time) float32 ... sig0 (time) float32 ... sig0_40hz (time, meas_ind) float32 ... sig0_used_40hz (time, meas_ind) float32 ... sig0_rms (time) float32 ... sig0_numval (time) float32 ... agc (time) float32 ... agc_rms (time) float32 ... agc_numval (time) float32 ... net_instr_corr_sig0 (time) float32 ... atmos_corr_sig0 (time) float32 ... off_nadir_angle_wf (time) float32 ... off_nadir_angle_wf_40hz (time, meas_ind) float32 ... tb_k (time) float32 ... tb_ka (time) float32 ... mean_sea_surface (time) float64 ... 
mean_topography (time) float64 ... geoid (time) float64 ... bathymetry (time) float64 ... inv_bar_corr (time) float32 ... hf_fluctuations_corr (time) float32 ... ocean_tide_sol1 (time) float64 ... ocean_tide_sol2 (time) float64 ... ocean_tide_equil (time) float32 ... ocean_tide_non_equil (time) float32 ... load_tide_sol1 (time) float32 ... load_tide_sol2 (time) float32 ... solid_earth_tide (time) float32 ... pole_tide (time) float32 ... wind_speed_model_u (time) float32 ... wind_speed_model_v (time) float32 ... wind_speed_alt (time) float32 ... rad_water_vapor (time) float32 ... rad_liquid_water (time) float32 ... ice1_range_40hz (time, meas_ind) float64 ... ice1_sig0_40hz (time, meas_ind) float32 ... ice1_qual_flag_40hz (time, meas_ind) float32 ... seaice_range_40hz (time, meas_ind) float64 ... seaice_sig0_40hz (time, meas_ind) float32 ... seaice_qual_flag_40hz (time, meas_ind) float32 ... ice2_range_40hz (time, meas_ind) float64 ... ice2_le_sig0_40hz (time, meas_ind) float32 ... ice2_sig0_40hz (time, meas_ind) float32 ... ice2_sigmal_40hz (time, meas_ind) float32 ... ice2_slope1_40hz (time, meas_ind) float64 ... ice2_slope2_40hz (time, meas_ind) float64 ... ice2_mqe_40hz (time, meas_ind) float32 ... ice2_qual_flag_40hz (time, meas_ind) float32 ... mqe_40hz (time, meas_ind) float32 ... peakiness_40hz (time, meas_ind) float32 ... ssha (time) float32 ... tracker_40hz (time, meas_ind) float64 ... tracker_used_40hz (time, meas_ind) float32 ... tracker_diode_40hz (time, meas_ind) float64 ... pri_counter_40hz (time, meas_ind) float64 ... qual_alt_1hz_off_nadir_angle_pf (time) float32 ... off_nadir_angle_pf (time) float32 ... off_nadir_angle_rain_40hz (time, meas_ind) float32 ... uso_corr (time) float64 ... internal_path_delay_corr (time) float64 ... modeled_instr_corr_range (time) float32 ... doppler_corr (time) float32 ... cog_corr (time) float32 ... modeled_instr_corr_swh (time) float32 ... internal_corr_sig0 (time) float32 ... 
modeled_instr_corr_sig0 (time) float32 ... agc_40hz (time, meas_ind) float32 ... agc_corr_40hz (time, meas_ind) float32 ... scaling_factor_40hz (time, meas_ind) float64 ... epoch_40hz (time, meas_ind) float64 ... width_leading_edge_40hz (time, meas_ind) float64 ... amplitude_40hz (time, meas_ind) float64 ... thermal_noise_40hz (time, meas_ind) float64 ... seaice_epoch_40hz (time, meas_ind) float64 ... seaice_amplitude_40hz (time, meas_ind) float64 ... ice2_epoch_40hz (time, meas_ind) float64 ... ice2_amplitude_40hz (time, meas_ind) float64 ... ice2_mean_amplitude_40hz (time, meas_ind) float64 ... ice2_thermal_noise_40hz (time, meas_ind) float64 ... ice2_slope_40hz (time, meas_ind) float64 ... signal_to_noise_ratio (time) float32 ... waveforms_40hz (time, meas_ind, wvf_ind) float32 ... Attributes: Conventions: CF-1.1 title: GDR - Expertise dataset institution: CNES source: radar altimeter history: 2017-07-21 08:25:19 : Creation contact: CNES aviso@oceanobs.com, EUMETSAT ops@... references: L1 library=V4.5p1, L2 library=V5.5p2, ... processing_center: SALP reference_document: SARAL/ALTIKA Products Handbook, SALP-M... mission_name: SARAL altimeter_sensor_name: ALTIKA radiometer_sensor_name: ALTIKA_RAD doris_sensor_name: DGXX cycle_number: 110 absolute_rev_number: 22546 pass_number: 2 absolute_pass_number: 109220 equator_time: 2017-06-19 15:39:46.492000 equator_longitude: 35.21 first_meas_time: 2017-06-19 15:14:39.356848 last_meas_time: 2017-06-19 16:04:56.808874 xref_altimeter_level1: ALK_ALT_1PaS20170619_154722_20170619_1... xref_radiometer_level1: ALK_RAD_1PaS20170619_154643_20170619_1... xref_altimeter_characterisation: ALK_CHA_AXVCNE20131115_120000_20100101... xref_radiometer_characterisation: ALK_CHR_AXVCNE20110207_180000_20110101... xref_altimeter_ltm: ALK_CAL_AXXCNE20170720_110014_20130102... xref_doris_uso: SRL_OS1_AXXCNE20170720_083800_20130226... xref_orbit_data: SRL_VOR_AXVCNE20170720_111700_20170618... xref_pf_data: SRL_VPF_AXVCNE20170720_111800_20170618... 
xref_pole_location: SMM_POL_AXXCNE20170721_071500_19870101... xref_gim_data: SRL_ION_AXPCNE20170620_074756_20170619... xref_mog2d_data: SMM_MOG_AXVCNE20170709_191501_20170619... xref_orf_data: SRL_ORF_AXXCNE20170720_083800_20160704... xref_meteorological_files: SMM_APA_AXVCNE20170619_170611_20170619... ellipsoid_axis: 6378136.3 ellipsoid_flattening: 0.0033528131778969

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
411385081 https://github.com/pydata/xarray/issues/2304#issuecomment-411385081 https://api.github.com/repos/pydata/xarray/issues/2304 MDEyOklzc3VlQ29tbWVudDQxMTM4NTA4MQ== Thomas-Z 1492047 2018-08-08T12:18:02Z 2018-08-22T07:14:58Z CONTRIBUTOR

So, a more complete example showing this problem. NetCDF file used in the example : test.nc.zip

````python
from netCDF4 import Dataset
import xarray as xr
import numpy as np
import pandas as pd

d = Dataset("test.nc")
v = d.variables['var']

print(v)
# <class 'netCDF4._netCDF4.Variable'>
# int16 var(idx)
#     _FillValue: 32767
#     scale_factor: 0.01
# unlimited dimensions:
# current shape = (2,)
# filling on

df_nc = pd.DataFrame(data={'var': v[:]})

print(df_nc)
#      var
# 0  21.94
# 1  27.04

ds = xr.open_dataset("test.nc")
df_xr = ds['var'].to_dataframe()

# Comparing both dataframes with float32 precision (1e-6)
mask = np.isclose(df_nc['var'], df_xr['var'], rtol=0, atol=1e-6)

print(mask)
# [False  True]

print(df_xr)
#            var
# idx
# 0    21.939999
# 1    27.039999

# Changing the type and rounding the xarray dataframe
df_xr2 = df_xr.astype(np.float64).round(int(np.ceil(-np.log10(ds['var'].encoding['scale_factor']))))
mask = np.isclose(df_nc['var'], df_xr2['var'], rtol=0, atol=1e-6)

print(mask)
# [ True  True]

print(df_xr2)
#        var
# idx
# 0    21.94
# 1    27.04
````

As you can see, the problem appears early in the process (not related to the way data are stored in parquet later on) and yes, rounding values does solve it.
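
The same effect can be sketched without any file, using NumPy only. This assumes the decoding amounts to multiplying the packed int16 values by the scale factor in either float32 or float64; it does not claim to reproduce xarray's exact decoding path.

```python
import numpy as np

raw = np.array([2194, 2704], dtype=np.int16)  # packed values, as stored on disk
scale_factor = 0.01

# Decode once in float64 and once in float32 (assumed decoding: raw * scale).
decoded64 = raw.astype(np.float64) * scale_factor
decoded32 = raw.astype(np.float32) * np.float32(scale_factor)

# Number of decimals implied by the scale factor, as in the example above.
decimals = int(np.ceil(-np.log10(scale_factor)))
rounded = np.round(decoded32.astype(np.float64), decimals)

mask_before = np.isclose(decoded64, decoded32, rtol=0, atol=1e-6)
mask_after = np.isclose(decoded64, rounded, rtol=0, atol=1e-6)

print(mask_before)
print(mask_after)
```

Rounding to the number of decimals implied by the scale factor brings the float32-decoded values back within the tolerance.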

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray  343659822
410963647 https://github.com/pydata/xarray/issues/2346#issuecomment-410963647 https://api.github.com/repos/pydata/xarray/issues/2346 MDEyOklzc3VlQ29tbWVudDQxMDk2MzY0Nw== Thomas-Z 1492047 2018-08-07T07:37:06Z 2018-08-07T07:37:06Z CONTRIBUTOR

I was kind of expecting to get the order shown when looking at the dims property, but I understand your point and it makes sense.

Two things are still bothering me though:
- ds['foo'].to_dataframe() and ds[['foo']].to_dataframe() might have different results
- if we want a specific order, we have to apply reorder_levels and sort_index, which can be quite expensive.

For the first point, I don't think anything should be done: it's a special case, and even if it could be easily tested it might be ugly. For the second point, I would not change anything about the way the order is defined now: it's consistent and easily predictable. Instead, I would add an optional parameter to to_dataframe() (the one from _to_dataframe(ordered_dims)) to allow users to get the order they want.
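
For reference, the reorder_levels plus sort_index step mentioned above looks roughly like this in pandas. The frame is a made-up stand-in for the output of to_dataframe(); on large frames the sort is the expensive part.

```python
import numpy as np
import pandas as pd

# Stand-in for a to_dataframe() result indexed by dimensions ("x", "y").
index = pd.MultiIndex.from_product([["a", "b"], [0, 1, 2]], names=["x", "y"])
df = pd.DataFrame({"foo": np.arange(6)}, index=index)

# Swap the index levels, then sort so that rows follow the new order.
reordered = df.reorder_levels(["y", "x"]).sort_index()

print(reordered)
```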

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset/DataArray to_dataframe() dimensions order mismatch. 347895055
410675562 https://github.com/pydata/xarray/issues/2304#issuecomment-410675562 https://api.github.com/repos/pydata/xarray/issues/2304 MDEyOklzc3VlQ29tbWVudDQxMDY3NTU2Mg== Thomas-Z 1492047 2018-08-06T11:19:30Z 2018-08-06T11:19:30Z CONTRIBUTOR

You're right when you say

> Note that it's very easy to later convert from float32 to float64, e.g., by writing ds.astype(np.float64).

You'll have a float64 in the end, but you won't get your precision back, and that might be a problem in some cases.
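
A quick NumPy-only illustration of this point, using the two values from the example file: widening float32 back to float64 keeps the rounding error that was introduced when the value was first stored as float32.

```python
import numpy as np

exact = np.array([21.94, 27.04])       # float64 reference values
stored32 = exact.astype(np.float32)    # what a float32 decode keeps
back64 = stored32.astype(np.float64)   # widening cast afterwards

# The cast changes the dtype but cannot restore the lost digits.
error = np.abs(back64 - exact)

print(back64.dtype)
print(error)
```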

I understand the benefits of using float32 on the memory side but it is kind of a problem for us each time we have variables using scale factors.

I'm surprised this issue (if considered as one) does not bother more people.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray  343659822

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);