html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/4167#issuecomment-647900586,https://api.github.com/repos/pydata/xarray/issues/4167,647900586,MDEyOklzc3VlQ29tbWVudDY0NzkwMDU4Ng==,4338975,2020-06-23T04:31:10Z,2020-06-23T04:31:10Z,NONE,"The warning asked me to set transpose_coords=False to keep the current behavior, so my default position is to do that so my code keeps working as is, but ffill doesn't take transpose_coords as a parameter, or does it (#3824)? I'm doing this inside a map_blocks call and have already established a template; if the dims are swapped on the returned object the process fails, because the result cannot be combined when it doesn't match the expected template. So yes, I added a transpose after the fill, but this does make things more complex: depending on how the user has subset the data, the dimensions will change, so that will require extra code to get the dimensions before and then to transpose after.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,642832962
https://github.com/pydata/xarray/issues/2022#issuecomment-402408267,https://api.github.com/repos/pydata/xarray/issues/2022,402408267,MDEyOklzc3VlQ29tbWVudDQwMjQwODI2Nw==,4338975,2018-07-04T08:41:47Z,2018-07-04T08:41:47Z,NONE,"My use case for this is appending Argo float data to an existing zarr store. At the moment I have 800+ netcdf files that need transforming before they can be added or read by xarray as plain *.nc files. Currently I read the first file, transform it, and add it to a zarr store using .to_zarr; then I read the remaining files and append each variable using zarr's append function. This is probably not a good way to go, but it's all I could figure out for the moment. @shoyer I think it would be useful to have a straight append mode: `to_zarr(....,mode='a+')`","{""total_count"": 5, ""+1"": 5, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,309227775
https://github.com/pydata/xarray/issues/2265#issuecomment-402327882,https://api.github.com/repos/pydata/xarray/issues/2265,402327882,MDEyOklzc3VlQ29tbWVudDQwMjMyNzg4Mg==,4338975,2018-07-04T00:25:33Z,2018-07-04T00:25:33Z,NONE,"@jhamman thanks, I'll add to the discussion there and close this issue.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,337733183
https://github.com/pydata/xarray/issues/2265#issuecomment-402155326,https://api.github.com/repos/pydata/xarray/issues/2265,402155326,MDEyOklzc3VlQ29tbWVudDQwMjE1NTMyNg==,4338975,2018-07-03T13:22:24Z,2018-07-03T13:24:04Z,NONE,"@spencerkclark yes, that helps very much, and it's a great example of how to answer a question! I'm learning so much from this group. Is there a way of appending an xarray dataset onto an existing zarr array?
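(For reference: xarray later gained exactly this; from v0.12.2 onwards, to_zarr accepts mode='a' together with an append_dim. A minimal sketch with toy data, where the argo.zarr store path is illustrative only:)

```
import numpy as np
import xarray as xr

# Two toy 'profiles' whose other dimensions match (a requirement for appending).
ds1 = xr.Dataset({'TEMP': (('N_PROF', 'N_LEVELS'), np.random.rand(1, 10))})
ds2 = xr.Dataset({'TEMP': (('N_PROF', 'N_LEVELS'), np.random.rand(1, 10))})

ds1.to_zarr('argo.zarr', mode='w')                       # first write creates the store
ds2.to_zarr('argo.zarr', mode='a', append_dim='N_PROF')  # later writes append profiles
```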
That's why I've been accessing zarr directly; what I'm trying to do is build a zarr file of all the Argo float profiles and add new ones as they arrive.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,337733183
https://github.com/pydata/xarray/issues/2256#issuecomment-401745899,https://api.github.com/repos/pydata/xarray/issues/2256,401745899,MDEyOklzc3VlQ29tbWVudDQwMTc0NTg5OQ==,4338975,2018-07-02T10:03:36Z,2018-07-02T10:03:36Z,NONE,"As an update, chunking could be improved: I've crunched over 800 floats into the structure, with 140k profiles, and even though the levels are expanded to 3000 (way overkill) the space on disk is 1/3 the original size, and could be less than 1/4 if chunking were set nicely to prevent super-small file sizes. I can now access any profile by an index, so I might be happy!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,336458472
https://github.com/pydata/xarray/issues/2256#issuecomment-401728326,https://api.github.com/repos/pydata/xarray/issues/2256,401728326,MDEyOklzc3VlQ29tbWVudDQwMTcyODMyNg==,4338975,2018-07-02T09:18:08Z,2018-07-02T09:19:18Z,NONE,"@rabernat thanks so far for all the help. So if pickle is not the way forward then I need to resize casts so they all have the same dimensions. So I came up with the following code:

```
import glob

import numpy as np
import xarray as xr
import zarr


def expand_levels(dataset, maxlevel=1500):
    # Pad the N_LEVELS dimension with NaN so every profile has the same length.
    newds = xr.Dataset()
    blankstack = np.empty((dataset.N_PROF.size, maxlevel - dataset.N_LEVELS.size))
    blankstack[:] = np.nan
    newds['N_PROF'] = dataset.N_PROF.values
    newds['N_LEVELS'] = np.arange(maxlevel).astype('int64')
    newds['N_PARAM'] = dataset.N_PARAM
    newds['N_CALIB'] = dataset.N_CALIB
    for varname, da in dataset.data_vars.items():
        if 'N_PROF' in da.dims:
            if 'N_LEVELS' in da.dims:
                newds[varname] = xr.DataArray(np.hstack((da.values, blankstack)),
                                              dims=da.dims, name=da.name, attrs=da.attrs)
            elif 'N_HISTORY' not in da.dims:
                newds[varname] = da
    newds.attrs = dataset.attrs
    return newds


def append_to_zarr(dataset, zarrfile):
    # Append each variable onto the matching array in an existing zarr group.
    for varname, da in dataset.data_vars.items():
        zarrfile[varname].append(da.values)


files = list(glob.iglob(r'D:\argo\csiro\*\*_prof.nc', recursive=True))
expand_levels(xr.open_dataset(files[0]), 3000).to_zarr(r'D:\argo\argo.zarr', mode='w')
za = zarr.open(r'D:\argo\argo.zarr', mode='a')  # 'a', so the store just written is not clobbered
for f in files[1:]:
    print(f)
    append_to_zarr(expand_levels(xr.open_dataset(f), 3000), za)
```

This basically appends NaN on the end of the profiles to get them all the same length, then appends them into the zarr structure. This is very experimental; I just wanted to see how appending them all to big arrays would work. It might be better to save resized netcdf files and then open them all at once and do a to_zarr? (A shorter padding alternative is sketched below.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,336458472
https://github.com/pydata/xarray/issues/2256#issuecomment-401195638,https://api.github.com/repos/pydata/xarray/issues/2256,401195638,MDEyOklzc3VlQ29tbWVudDQwMTE5NTYzOA==,4338975,2018-06-28T22:46:32Z,2018-06-28T22:47:09Z,NONE,"Yes, I agree Zarr is best for large arrays etc.; that's kind of why I ended up on the array of xray objects idea. I guess that was sort of creating an object store in zarr. What I'd like to offer is a simple set of analytical tools based on jupyter, allowing for easy processing of float data and getting away from the download-and-process pattern.
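(An aside on the expand_levels code above: xarray 0.15 and later can do the same NaN padding in one call with Dataset.pad. A rough sketch under that assumption; unlike the version above it keeps the N_HISTORY variables, so those may still need dropping:)

```
import xarray as xr

def expand_levels(dataset, maxlevel=1500):
    # Pad the trailing end of N_LEVELS with missing values (NaN for float variables).
    return dataset.pad(N_LEVELS=(0, maxlevel - dataset.sizes['N_LEVELS']))
```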
I'm still trying to find the best way to do this, as Argo data does not neatly fall into any one system because of its lack of homogeneity.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,336458472
https://github.com/pydata/xarray/issues/2256#issuecomment-400910725,https://api.github.com/repos/pydata/xarray/issues/2256,400910725,MDEyOklzc3VlQ29tbWVudDQwMDkxMDcyNQ==,4338975,2018-06-28T04:56:48Z,2018-06-28T04:57:33Z,NONE,"@jhamman Ah, thanks for that, it looks interesting. Is there a way of specifying that in .to_zarr()?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,336458472
https://github.com/pydata/xarray/issues/2256#issuecomment-400909462,https://api.github.com/repos/pydata/xarray/issues/2256,400909462,MDEyOklzc3VlQ29tbWVudDQwMDkwOTQ2Mg==,4338975,2018-06-28T04:46:26Z,2018-06-28T04:46:26Z,NONE,"> I am still confused about what you are trying to achieve. What do you mean by ""cache""? Is your goal to compress the data so that it uses less space on disk? Or is it to provide a more ""analysis ready"" format?

I'd like to have both ;)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,336458472
https://github.com/pydata/xarray/issues/2256#issuecomment-400908763,https://api.github.com/repos/pydata/xarray/issues/2256,400908763,MDEyOklzc3VlQ29tbWVudDQwMDkwODc2Mw==,4338975,2018-06-28T04:40:29Z,2018-06-28T04:40:29Z,NONE,"No worries, at the moment I'm in play mode; everything is pretty much new to me!

OK, the aim of this little setup is to be able to do things like compare floats with those nearby, or create a climatology for a local area from Argo profiles; for example, produce a report for every operational Argo float each cycle and feed that to some kind of AI/ML system to detect bad data in near real time. So initially I need a platform on which I can easily data-mine historical floats. Now with the pickle solution the entire data set can be accessed with a very small footprint.

Why zarr? I seem to remember reading that reading from/writing to HDF5 was limited when compression was turned on. Plus I like the way zarr does things; it looks a lot more fault tolerant.

Keep asking the questions, they are very valuable. Are you going to the Pangeo meeting?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,336458472
https://github.com/pydata/xarray/issues/2256#issuecomment-400906158,https://api.github.com/repos/pydata/xarray/issues/2256,400906158,MDEyOklzc3VlQ29tbWVudDQwMDkwNjE1OA==,4338975,2018-06-28T04:20:28Z,2018-06-28T04:20:28Z,NONE,"With the Pickle solution I end up with 31 files in 3 folders, with a size on disk of 1.2 MB, storing 250 profiles of a single float.

I'm new to github and open source! Thanks for the time and the edit!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,336458472
https://github.com/pydata/xarray/issues/2256#issuecomment-400905262,https://api.github.com/repos/pydata/xarray/issues/2256,400905262,MDEyOklzc3VlQ29tbWVudDQwMDkwNTI2Mg==,4338975,2018-06-28T04:12:47Z,2018-06-28T04:18:07Z,NONE,"Yes, I agree with you. I started out with ds.to_zarr for each file; the problem was that each property of the cycle, e.g. lat and long, ended up in its own file.
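(On the earlier question of specifying this in .to_zarr(): per-variable chunk sizes can be passed through the encoding argument; making each chunk span the whole variable gives one chunk file per variable. A sketch only; the file and store names are made up:)

```
import xarray as xr

ds = xr.open_dataset('D1901324_prof.nc')  # made-up example file name
# One chunk covering each whole (non-scalar) variable -> far fewer tiny files.
encoding = {name: {'chunks': ds[name].shape}
            for name in ds.data_vars if ds[name].ndim > 0}
ds.to_zarr('float.zarr', mode='w', encoding=encoding)
```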
One float with 250 cycles ended up as over 70,000 small files on my file system; because of cluster size they occupied over 100 MB of hard disk. As there are over 4000 floats, lots of small files are not going to be viable.

`cycles[int(ds.CYCLE_NUMBER.values[0])-1]=ds` Yep, this line is funny. CYCLE_NUMBER increments with each cycle and starts at 1. Sometimes a cycle might be delayed and added at a later date, so I did not want to assume that the list of files had been sorted into the order of the float cycles; instead I want to build an array of cycles in order. Also, if a file is replaced by a newer version then I want it to overwrite the profile in the array.

```
Dimensions:                       (N_CALIB: 1, N_HISTORY: 9, N_LEVELS: 69, N_PARAM: 3, N_PROF: 1)
Dimensions without coordinates:   N_CALIB, N_HISTORY, N_LEVELS, N_PARAM, N_PROF
Data variables:
    DATA_TYPE                     object ...
    FORMAT_VERSION                object ...
    HANDBOOK_VERSION              object ...
    REFERENCE_DATE_TIME           object ...
    DATE_CREATION                 object ...
    DATE_UPDATE                   object ...
    PLATFORM_NUMBER               (N_PROF) object ...
    PROJECT_NAME                  (N_PROF) object ...
    PI_NAME                       (N_PROF) object ...
    STATION_PARAMETERS            (N_PROF, N_PARAM) object ...
    CYCLE_NUMBER                  (N_PROF) float64 ...
    DIRECTION                     (N_PROF) object ...
    DATA_CENTRE                   (N_PROF) object ...
    DC_REFERENCE                  (N_PROF) object ...
    DATA_STATE_INDICATOR          (N_PROF) object ...
    DATA_MODE                     (N_PROF) object ...
    PLATFORM_TYPE                 (N_PROF) object ...
    FLOAT_SERIAL_NO               (N_PROF) object ...
    FIRMWARE_VERSION              (N_PROF) object ...
    WMO_INST_TYPE                 (N_PROF) object ...
    JULD                          (N_PROF) datetime64[ns] ...
    JULD_QC                       (N_PROF) object ...
    JULD_LOCATION                 (N_PROF) datetime64[ns] ...
    LATITUDE                      (N_PROF) float64 ...
    LONGITUDE                     (N_PROF) float64 ...
    POSITION_QC                   (N_PROF) object ...
    POSITIONING_SYSTEM            (N_PROF) object ...
    PROFILE_PRES_QC               (N_PROF) object ...
    PROFILE_TEMP_QC               (N_PROF) object ...
    PROFILE_PSAL_QC               (N_PROF) object ...
    VERTICAL_SAMPLING_SCHEME      (N_PROF) object ...
    CONFIG_MISSION_NUMBER         (N_PROF) float64 ...
    PRES                          (N_PROF, N_LEVELS) float32 ...
    PRES_QC                       (N_PROF, N_LEVELS) object ...
    PRES_ADJUSTED                 (N_PROF, N_LEVELS) float32 ...
    PRES_ADJUSTED_QC              (N_PROF, N_LEVELS) object ...
    TEMP                          (N_PROF, N_LEVELS) float32 ...
    TEMP_QC                       (N_PROF, N_LEVELS) object ...
    TEMP_ADJUSTED                 (N_PROF, N_LEVELS) float32 ...
    TEMP_ADJUSTED_QC              (N_PROF, N_LEVELS) object ...
    PSAL                          (N_PROF, N_LEVELS) float32 ...
    PSAL_QC                       (N_PROF, N_LEVELS) object ...
    PSAL_ADJUSTED                 (N_PROF, N_LEVELS) float32 ...
    PSAL_ADJUSTED_QC              (N_PROF, N_LEVELS) object ...
    PRES_ADJUSTED_ERROR           (N_PROF, N_LEVELS) float32 ...
    TEMP_ADJUSTED_ERROR           (N_PROF, N_LEVELS) float32 ...
    PSAL_ADJUSTED_ERROR           (N_PROF, N_LEVELS) float32 ...
    PARAMETER                     (N_PROF, N_CALIB, N_PARAM) object ...
    SCIENTIFIC_CALIB_EQUATION     (N_PROF, N_CALIB, N_PARAM) object ...
    SCIENTIFIC_CALIB_COEFFICIENT  (N_PROF, N_CALIB, N_PARAM) object ...
    SCIENTIFIC_CALIB_COMMENT      (N_PROF, N_CALIB, N_PARAM) object ...
    SCIENTIFIC_CALIB_DATE         (N_PROF, N_CALIB, N_PARAM) object ...
    HISTORY_INSTITUTION           (N_HISTORY, N_PROF) object ...
    HISTORY_STEP                  (N_HISTORY, N_PROF) object ...
    HISTORY_SOFTWARE              (N_HISTORY, N_PROF) object ...
    HISTORY_SOFTWARE_RELEASE      (N_HISTORY, N_PROF) object ...
    HISTORY_REFERENCE             (N_HISTORY, N_PROF) object ...
    HISTORY_DATE                  (N_HISTORY, N_PROF) object ...
    HISTORY_ACTION                (N_HISTORY, N_PROF) object ...
    HISTORY_PARAMETER             (N_HISTORY, N_PROF) object ...
    HISTORY_START_PRES            (N_HISTORY, N_PROF) float32 ...
    HISTORY_STOP_PRES             (N_HISTORY, N_PROF) float32 ...
    HISTORY_PREVIOUS_VALUE        (N_HISTORY, N_PROF) float32 ...
    HISTORY_QCTEST                (N_HISTORY, N_PROF) object ...
Attributes:
    title:                Argo float vertical profile
    institution:          CSIRO
    source:               Argo float
    history:              2013-07-30T09:13:35Z creation;2014-08-18T19:33:14Z ...
    references:           http://www.argodatamgt.org/Documentation
    user_manual_version:  3.1
    Conventions:          Argo-3.1 CF-1.6
    featureType:          trajectoryProfile
```

A single float file ends up as 194 small files in 68 directories, with a total size of 30.4 KB (31,223 bytes) but a size on disk of 776 KB (794,624 bytes).

I have tried `ds = xr.open_mfdataset(r""C:\Users\mor582\Documents\projects\argo\D1901324\*_*.nc"")` but it fails with: `ValueError: arguments without labels along dimension 'N_HISTORY' cannot be aligned because they have different dimension sizes: {9, 11, 6}`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,336458472
https://github.com/pydata/xarray/issues/2256#issuecomment-400901163,https://api.github.com/repos/pydata/xarray/issues/2256,400901163,MDEyOklzc3VlQ29tbWVudDQwMDkwMTE2Mw==,4338975,2018-06-28T03:41:10Z,2018-06-28T03:41:10Z,NONE,"Thanks, yep, my goal is to provide a simple online notebook that can be used to process/qa/qc Argo float data. I'd like to create a system that works intuitively with the current file structure and not build a database of values on top of it. Here's a first go with some code:

```
def processfloat(floatpath, zarrpath):
    root = zarr.open(zarrpath, mode='a')
    filenames = glob.glob(floatpath)
    for file in filenames:
        ds = xr.open_dataset(file)
        platform = ds.PLATFORM_NUMBER.values[0].strip()
        # One zarr group per platform; cycles stored as pickled objects.
        float = root.get(platform)
        if float is None:
            float = root.create_group(platform)
        cycles = float.get('cycles')
        if cycles is None:
            cycles = float.zeros('cycles', shape=1, chunks=10, dtype=object,
                                 object_codec=numcodecs.Pickle())
        while len(cycles)