
issue_comments


11 rows where user = 23199378 sorted by updated_at descending


issue (8)

  • to_dataframe (pandas) usage question · 3
  • Choose time units in output netcdf · 2
  • Set a default _FillValue of NaN for float types · 1
  • Request: implement unsigned integer type for xarray resample skipna · 1
  • Need better user control of _FillValue attribute in NetCDF files · 1
  • Resample / upsample behavior diverges from pandas · 1
  • The name of the conda environment in the contributing guide is generic · 1
  • In the contribution instructions, the py36.yml fails to set up · 1

user (1)

  • mmartini-usgs · 11

author_association (1)

  • NONE · 11
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sort key, descending), author_association, body, reactions, performed_via_github_app, issue
id 512590460 · html_url https://github.com/pydata/xarray/issues/3109#issuecomment-512590460 · issue_url https://api.github.com/repos/pydata/xarray/issues/3109 · node_id MDEyOklzc3VlQ29tbWVudDUxMjU5MDQ2MA== · user mmartini-usgs (23199378) · created_at 2019-07-17T22:21:39Z · updated_at 2019-07-17T22:21:39Z · author_association NONE

Hello @shoyer, I hit this while following your most excellent "how to contribute" page during the SciPy sprints, and it turns out to be a function of my local setup, so it isn't really an issue. Others in the room did not have trouble with it. @rabernat explained the situation, so perhaps close this and leave the explanation for others who might hit it. The py37.yml build worked for me on Windows.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: In the contribution instructions, the py36.yml fails to set up (467736580)
id 511133949 · html_url https://github.com/pydata/xarray/issues/3107#issuecomment-511133949 · issue_url https://api.github.com/repos/pydata/xarray/issues/3107 · node_id MDEyOklzc3VlQ29tbWVudDUxMTEzMzk0OQ== · user mmartini-usgs (23199378) · created_at 2019-07-13T16:10:58Z · updated_at 2019-07-13T16:10:58Z · author_association NONE

Seconded the name xarray_test_env, since you do use xarray_docs for the docs environment.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: The name of the conda environment in the contributing guide is generic (467735590)
id 340535370 · html_url https://github.com/pydata/xarray/issues/1631#issuecomment-340535370 · issue_url https://api.github.com/repos/pydata/xarray/issues/1631 · node_id MDEyOklzc3VlQ29tbWVudDM0MDUzNTM3MA== · user mmartini-usgs (23199378) · created_at 2017-10-30T18:11:58Z · updated_at 2017-10-30T18:11:58Z · author_association NONE

Thanks for posting this @jhamman. It's really helping me understand what is going on with my data when I use xarray. My understanding of pandas is that it should not be interpolating by default; however, I am downsampling, and that behavior is described for upsampling (in Python for Data Analysis).
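
To make the downsampling/upsampling distinction concrete, a minimal pandas sketch (the series here is invented for illustration):

    import numpy as np
    import pandas as pd

    # Six hourly points, purely illustrative.
    s = pd.Series(np.arange(6.0),
                  index=pd.date_range("2017-01-01", periods=6, freq="h"))

    # Downsampling: each 3-hour bin is aggregated; nothing is interpolated.
    print(s.resample("3h").mean())

    # Upsampling: the new 30-minute slots appear as NaN unless a fill
    # method or interpolation is explicitly requested.
    print(s.resample("30min").asfreq())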

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: Resample / upsample behavior diverges from pandas (265056503)
id 332942206 · html_url https://github.com/pydata/xarray/issues/1598#issuecomment-332942206 · issue_url https://api.github.com/repos/pydata/xarray/issues/1598 · node_id MDEyOklzc3VlQ29tbWVudDMzMjk0MjIwNg== · user mmartini-usgs (23199378) · created_at 2017-09-28T19:38:42Z · updated_at 2017-09-28T19:38:42Z · author_association NONE

There is also the philosophical problem of fill values for coordinate variables. To be true to reality, one would really want to add an interpolated value that fills whatever gap or bad value exists. That seems to be out of scope for xarray, though.

I'm fine with a flag that controls only the coordinate data. That said, for the rest of the variables we avoid NaN in _FillValue; we use 1E35. So there you could give the user a choice of default fill value. It seems Pythonic to give the user flexibility, and the minute you satisfy us, another use case will come along with conflicting requirements. So you could use a flag and make it the user's choice, not xarray's concern.

It also depends on where in the process one cleans up one's data: reduce first, then QA/QC, or QA/QC first, then reduce. We do both; it depends on the instrument.
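
For what it's worth, a per-variable fill value can already be requested through the encoding argument of to_netcdf; a minimal sketch with an invented variable name:

    import numpy as np
    import xarray as xr

    # Invented example data with one gap.
    ds = xr.Dataset({"u": ("time", np.array([1.0, np.nan, 3.0]))})

    # Ask the netCDF writer to encode missing values as 1e35 instead of NaN.
    ds.to_netcdf("out.nc", encoding={"u": {"_FillValue": 1e35}})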

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: Need better user control of _FillValue attribute in NetCDF files (261403591)
id 332833849 · html_url https://github.com/pydata/xarray/pull/1165#issuecomment-332833849 · issue_url https://api.github.com/repos/pydata/xarray/issues/1165 · node_id MDEyOklzc3VlQ29tbWVudDMzMjgzMzg0OQ== · user mmartini-usgs (23199378) · created_at 2017-09-28T13:20:12Z · updated_at 2017-09-28T13:20:12Z · author_association NONE

It is not desirable for us to have _FillValue = NaN for dimensions and coordinate variables.

In trying to use xarray, _FillValue was carefully kept off these variables and dimensions when the un-resampled file was created, and then it was found to appear during the to_netcdf operation. This happens in spite of mask_and_scale=False being used with xr.open_dataset.

I would expect downstream code to have trouble with coordinates that don't make logical sense (time or place being NaN, for instance). We would prefer NOT to instantiate coordinate variable data with any fill value. Keeping NaNs out of coordinate variables, dimensions, and minima and maxima is part of our QA/QC process to avoid downstream issues.
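
A minimal sketch of one way to keep _FillValue off a coordinate on write, assuming a recent xarray (file and variable names are invented):

    import xarray as xr

    ds = xr.open_dataset("in.nc", mask_and_scale=False)

    # An explicit _FillValue of None asks the netCDF writer not to attach
    # a _FillValue attribute to the time coordinate at all.
    ds.to_netcdf("out.nc", encoding={"time": {"_FillValue": None}})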

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: Set a default _FillValue of NaN for float types (195832230)
id 328127982 · html_url https://github.com/pydata/xarray/issues/1562#issuecomment-328127982 · issue_url https://api.github.com/repos/pydata/xarray/issues/1562 · node_id MDEyOklzc3VlQ29tbWVudDMyODEyNzk4Mg== · user mmartini-usgs (23199378) · created_at 2017-09-08T15:03:07Z · updated_at 2017-09-08T15:03:07Z · author_association NONE

I'm generating netCDF4 files with a variety of variables whose values represent things that are integer in nature and can't physically be negative. So, to save space and provide as much dynamic range in each variable as possible, I was careful to use float, int, and uint types as appropriate. Then I could not use resample with skipna, because uint was not supported. I have gone back and made the uints into ints; skipna now works on my data, but I may now need 64-bit instead of 32-bit types, which defeats my efforts to keep the files as small as possible. It seems to me that if skipna is implemented for one or two types it should work for all of them. It's also quite possible I've mistaken something: I'm an experienced programmer in C and MATLAB, very new to Python, and I have learned the hard way that what's obvious in C or C++ is not so in Python.
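
A sketch of the interim cast described above, with invented names; the point is only that a signed copy restores skipna:

    import numpy as np
    import xarray as xr

    ds = xr.open_dataset("counts.nc")  # invented file name

    # Cast the unsigned variable to the narrowest signed type that still
    # holds its range (uint16 fits in int32), then resample with skipna.
    ds["pings"] = ds["pings"].astype(np.int32)
    hourly = ds["pings"].resample(time="1h").mean(skipna=True)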

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: Request: implement unsigned integer type for xarray resample skipna (256243636)
id 326068119 · html_url https://github.com/pydata/xarray/issues/1534#issuecomment-326068119 · issue_url https://api.github.com/repos/pydata/xarray/issues/1534 · node_id MDEyOklzc3VlQ29tbWVudDMyNjA2ODExOQ== · user mmartini-usgs (23199378) · created_at 2017-08-30T17:50:11Z · updated_at 2017-08-30T17:50:11Z · author_association NONE

Many thanks! I will try this. OK, since you asked for more details:

I have used xarray resample successfully on a file with ~3 million single-ping ADCP ensembles, 17 variables holding 1D and 2D data. Three lines of code and a handful of minutes to reduce that, on a middling laptop. Amazing.

Unintended behaviors from resample that I need to figure out (see the sketch below):
-- A time offset is introduced: hourly mean data is offset by 16 hours in one use case. 16 hours is odd; I may have my time set up wrong, or it is a start transient from a filter. I've been reading code trying to figure out exactly what kind of algorithm pandas uses in resample (as best I can tell, this is where xarray gets its resample method).
-- NaNs are introduced. I think I need to learn how to set mask_and_scale to prevent this.
-- When outputting to netCDF, the "time" dimension got added to a variable that didn't and shouldn't have it.
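
Two things that may be worth ruling out for the offset and the NaNs, sketched with invented names against a recent xarray:

    import xarray as xr

    # mask_and_scale=False stops xarray from replacing _FillValue matches
    # with NaN when the file is opened.
    ds = xr.open_dataset("adcp.nc", mask_and_scale=False)

    # Being explicit about bin edges and labels rules out one source of
    # apparent time shifts in hourly means.
    hourly = ds.resample(time="1h", closed="left", label="left").mean()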

On the menu next to learn/use:
-- Multi-indexing and data reshaping
-- Slicing
-- Separating an uneven time base into separate time series
-- Calculations involving several of the variables at the same time (e.g. using xarray to perform the ADCP beam-to-earth rotations)

Where is all this going? Ultimately to produce mean current profile time series for data release (think hourly time-depth shaped dataframes) and bursts of hourly data (think hourly time-depth-sample shaped dataframes) on which to perform a variety of wave analyses.

This is my learn-Python project, so apologies for the non-Pythonic approach. I also need to preserve backwards compatibility with existing code and conventions (EPIC historically; CF and THREDDS going forward). The project is here: https://github.com/mmartini-usgs/ADCPy

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: to_dataframe (pandas) usage question (253407851)
id 325773402 · html_url https://github.com/pydata/xarray/issues/1534#issuecomment-325773402 · issue_url https://api.github.com/repos/pydata/xarray/issues/1534 · node_id MDEyOklzc3VlQ29tbWVudDMyNTc3MzQwMg== · user mmartini-usgs (23199378) · created_at 2017-08-29T19:31:00Z · updated_at 2017-08-29T19:31:00Z · author_association NONE

Hello Ryan,

I have read a bit about dask. Am I missing the Pandas Panel analog in Dask?

My data is in netCDF4, and the files can have as many as 17 variables or more. It's not clear how to get this easily into dask. In pandas, I think the entire netCDF file equates to a Panel; a single variable would be a DataFrame.

Rather than wandering around in the weeds, I could use a hint here. Do I really need to open the netCDF4 file, then iterate over my variables and deal them into a series of dask DataFrames? That seems very un-Pythonic.

I tried this... presumably, per http://www.unidata.ucar.edu/software/netcdf/docs/interoperability_hdf5.html, I can open a netCDF4 file as "HDF5" using dask. Let's try a dask example (http://dask.pydata.org/en/latest/examples/dataframe-hdf5.html) with one of my netCDF files:

import dask.dataframe as dd

df = dd.read_hdf('reallybignetCDF4file.nc', key='/c')  # this does not work

Thanks, Marinna
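
For later readers, a hedged sketch of the usual route with a recent xarray: let xarray open the netCDF file with dask chunks and convert the whole Dataset at once, rather than going through read_hdf (file name and chunk size are invented):

    import xarray as xr

    # Open lazily with dask-backed arrays.
    ds = xr.open_dataset("reallybignetCDF4file.nc", chunks={"time": 100000})

    # All variables at once, as a single dask DataFrame.
    df = ds.to_dask_dataframe()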

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: to_dataframe (pandas) usage question (253407851)
id 325458780 · html_url https://github.com/pydata/xarray/issues/1534#issuecomment-325458780 · issue_url https://api.github.com/repos/pydata/xarray/issues/1534 · node_id MDEyOklzc3VlQ29tbWVudDMyNTQ1ODc4MA== · user mmartini-usgs (23199378) · created_at 2017-08-28T19:45:53Z · updated_at 2017-08-28T19:45:53Z · author_association NONE

Many thanks, I will go learn about dask dataframes. Marinna

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: to_dataframe (pandas) usage question (253407851)
id 323180681 · html_url https://github.com/pydata/xarray/issues/1324#issuecomment-323180681 · issue_url https://api.github.com/repos/pydata/xarray/issues/1324 · node_id MDEyOklzc3VlQ29tbWVudDMyMzE4MDY4MQ== · user mmartini-usgs (23199378) · created_at 2017-08-17T20:10:34Z · updated_at 2017-08-17T20:10:34Z · author_association NONE

Thanks - it turns out the workaround is to open the file with decode_times=False.

Is this a bug, or a feature?

In the notebook, those variables called time and time2 are not datetime objects (I demonstrate that). In fact, there are no datetime objects in the notebook's scope, unless you count the CF time variable that is part of the open Dataset. And yet time and time2 seem to be treated as datetimes when the dataset is written to a file. Or is it that the CF time in the Dataset is getting re-encoded somehow, in spite of encoding=None? Why would that fail? Would you explain what is going on here?

I have updated my gist with the .info() information here:
https://gist.github.com/anonymous/ff055f732029585605b965f282685d73
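
A minimal sketch of the workaround described above (file names invented):

    import xarray as xr

    # With decode_times=False the CF time variable stays plain numbers,
    # so nothing is decoded to datetimes and re-encoded on write.
    ds = xr.open_dataset("input.nc", decode_times=False)
    ds.to_netcdf("output.nc")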

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: Choose time units in output netcdf (216626776)
id 322853199 · html_url https://github.com/pydata/xarray/issues/1324#issuecomment-322853199 · issue_url https://api.github.com/repos/pydata/xarray/issues/1324 · node_id MDEyOklzc3VlQ29tbWVudDMyMjg1MzE5OQ== · user mmartini-usgs (23199378) · created_at 2017-08-16T18:06:56Z · updated_at 2017-08-16T18:06:56Z · author_association NONE

I think I have a similar problem.

I have a netCDF file with CF time. I'm trying to add two variables with a different kind of time (EPIC time and time2). These variables should not be datetime; they should be ordinary numbers.

Even when I don't include any units, it chokes on "units". Even with encoding=None I get: ValueError: Failed hard to prevent overwriting key 'units'. The gist is here: https://gist.github.com/anonymous/4614dfb55778cd0d51427daf6503cd8c

It's a time-specific thing, because if I do the same operation with a different variable in the same file (read a variable, do something with it, then save it), it works fine. One clue is that it still carries the attributes to the new file (generally desired behavior). But my EPIC time and time2 above, which are NOT datetime and have NO attributes, should work the same way.

This could be my misunderstanding of xarray and time operations (I'm a Python newbie); if so, I would appreciate someone setting me straight. It's preventing me from using xarray.
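
For anyone hitting the same ValueError, one hedged guess at a workaround, assuming the conflicting 'units' key arrives through encoding inherited from the source file (all names invented):

    import numpy as np
    import xarray as xr

    ds = xr.open_dataset("input.nc")

    # Invented EPIC-style plain-number time variable.
    ds["time2"] = ("time", np.arange(ds.sizes["time"], dtype="float64"))

    # Drop any inherited encoding so to_netcdf does not find a
    # conflicting 'units' entry when it writes the variable.
    ds["time2"].encoding = {}
    ds.to_netcdf("output.nc")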

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  issue: Choose time units in output netcdf (216626776)

Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);