issues: 1592154849
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1592154849 | I_kwDOAMm_X85e5lrh | 7542 | `OSError: [Errno -70] NetCDF: DAP server error` when `parallel=True` on a cluster | 5797727 | open | 0 | 1 | 2023-02-20T16:27:11Z | 2023-03-20T17:53:39Z | NONE | What is your issue? Hi, I am trying to access the MERRA-2 dataset using The code runs well if @betolink suspected that the workers don’t know the authentication and suggested that I do something like what is mentioned in @rsignell's issue. Which would involve adding It is important to say that Has anyone faced this problem before or has any guesses on how to solve this issue?
```python
# ----------------------------------
# Import Python modules
# ----------------------------------
import warnings
warnings.filterwarnings("ignore")
import xarray as xr
import matplotlib.pyplot as plt
from calendar import monthrange

create_cluster = True
parallel = True
upload_file = True

if create_cluster:
    # --------------------------------------
    # Creating 50 workers with 1core and 2Gb each
    # --------------------------------------
    import os
    from dask_jobqueue import SLURMCluster
    from dask.distributed import Client
    from dask.distributed import WorkerPlugin

# ---------------------------------
# Read data
# ---------------------------------
# MERRA-2 collection (hourly)
collection_shortname = 'M2T1NXAER'
collection_longname = 'tavg1_2d_aer_Nx'
collection_number = 'MERRA2_400'

# Open dataset
# Read selected days in the same month and year
month = 1  # January
day_beg = 1
day_end = 31

# Note that collection_number is MERRA2_401 in a few cases, refer to "Records of MERRA-2 Data Reprocessing and Service Changes"
if year == 2020 and month == 9:
    collection_number = 'MERRA2_401'

# OPeNDAP URL
url = 'https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2/{}.{}/{}/{:0>2d}'.format(collection_shortname, MERRA2_version, year, month)
files_month = ['{}/{}.{}.{}{:0>2d}{:0>2d}.nc4'.format(url, collection_number, collection_longname, year, month, days) for days in range(day_beg, day_end+1, 1)]

# Get the number of files
len_files_month = len(files_month)
print("{} files to be opened:".format(len_files_month))
print("files_month", files_month)

# Read dataset URLs
ds = xr.open_mfdataset(files_month, parallel=parallel)

# View metadata (function like ncdump -c)
ds
```
As this deals with HPCs, I also posted on the Pangeo forum: https://discourse.pangeo.io/t/access-ges-disc-nasa-dataset-using-xarray-and-dask-on-a-cluster/3195/1 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7542/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
13221727 | issue |
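
The URL-building step in the issue body above can be isolated and checked without touching the network or the cluster. The following is a minimal sketch, not the reporter's code: the function name `merra2_daily_urls` and its parameters are hypothetical, `version` stands in for the `MERRA2_version` variable whose definition does not appear in the issue, and only the one `MERRA2_401` case shown in the issue (September 2020) is encoded.

```python
from calendar import monthrange

BASE_URL = "https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2"


def merra2_daily_urls(year, month, version, shortname="M2T1NXAER",
                      longname="tavg1_2d_aer_Nx", day_beg=1, day_end=None):
    """Return one OPeNDAP URL per day from day_beg to day_end (inclusive).

    Hypothetical helper: `version` replaces the issue's MERRA2_version,
    whose value is not given there, so the caller must supply it.
    """
    # monthrange returns (weekday of the first day, number of days in month).
    days_in_month = monthrange(year, month)[1]
    if day_end is None:
        day_end = days_in_month
    if not 1 <= day_beg <= day_end <= days_in_month:
        raise ValueError("day range falls outside the requested month")

    # The issue says MERRA2_401 applies "in a few cases"; only the case it
    # actually shows is handled here. Other cases are not assumed.
    if (year, month) == (2020, 9):
        collection_number = "MERRA2_401"
    else:
        collection_number = "MERRA2_400"

    month_url = f"{BASE_URL}/{shortname}.{version}/{year}/{month:02d}"
    return [
        f"{month_url}/{collection_number}.{longname}.{year}{month:02d}{day:02d}.nc4"
        for day in range(day_beg, day_end + 1)
    ]
```

Deriving the last day from `monthrange` avoids requesting days that do not exist when `month` is changed from January, which a hard-coded `day_end = 31` would do. The resulting list could be passed to `xr.open_mfdataset` exactly as `files_month` is in the issue.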