issues: 1592154849

`OSError: [Errno -70] NetCDF: DAP server error` when `parallel=True` on a cluster

pydata/xarray issue #7542 · state: open · created 2023-02-20T16:27:11Z · updated 2023-03-20T17:53:39Z · 1 comment

What is your issue?

Hi,

I am trying to access the MERRA-2 dataset via OPeNDAP links in xarray. The code below is based on a tutorial that @betolink sent me as an example.

The code runs well with `parallel=False`, but returns `OSError: [Errno -70] NetCDF: DAP server error` if I set `parallel=True`, regardless of whether I create the cluster or not.

@betolink suspected that the workers don't have the authentication credentials and suggested doing something like what is mentioned in @rsignell's issue.

This would involve adding `client.register_worker_plugin(UploadFile('~/.netrc'))` after creating the client. I tested that as well, but it returned the same error. In the code below I had to replace `~/.netrc` with the full path because otherwise it raised a file-not-found error.
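As a side note, one way to sanity-check that the credentials file itself is well formed is to parse it with Python's stdlib `netrc` module, which reads the same `machine`/`login`/`password` format the netCDF/curl stack resolves credentials from. A minimal sketch (the host name and the throwaway file below are purely illustrative, not from my actual setup):

```python
import netrc
import os
import tempfile

# Illustrative .netrc contents; urs.earthdata.nasa.gov is the usual
# Earthdata login host, but substitute whatever host your setup uses.
sample = "machine urs.earthdata.nasa.gov login someuser password somepass\n"

with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as f:
    f.write(sample)
    path = f.name

# netrc.netrc raises NetrcParseError if the file is malformed;
# authenticators() returns (login, account, password) for a host.
auth = netrc.netrc(path)
login, _, password = auth.authenticators("urs.earthdata.nasa.gov")
print(login)

os.unlink(path)
```

If `authenticators()` returns `None` for the host, the entry is missing and the DAP request will go out unauthenticated.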

It is important to note that `parallel=True` works fine on my local computer running Ubuntu via WSL.

Has anyone faced this problem before, or does anyone have a guess at how to solve it?

```python
# ----------------------------------
# Import Python modules
# ----------------------------------
import warnings

warnings.filterwarnings("ignore")

import xarray as xr
import matplotlib.pyplot as plt
from calendar import monthrange

create_cluster = True
parallel = True
upload_file = True

if create_cluster:
    # --------------------------------------
    # Creating 50 workers with 1 core and 2 GB each
    # --------------------------------------
    import os
    from dask_jobqueue import SLURMCluster
    from dask.distributed import Client
    from dask.distributed import WorkerPlugin

    class UploadFile(WorkerPlugin):
        """A WorkerPlugin to upload a local file to workers.

        Parameters
        ----------
        filepath: str
            A path to the file to upload

        Examples
        --------
        >>> client.register_worker_plugin(UploadFile(".env"))
        """

        def __init__(self, filepath):
            """Initialize the plugin by reading in the data from the given file."""
            self.filename = os.path.basename(filepath)
            self.dirname = os.path.dirname(filepath)
            with open(filepath, "rb") as f:
                self.data = f.read()

        async def setup(self, worker):
            if not os.path.exists(self.dirname):
                os.mkdir(self.dirname)
            os.chdir(self.dirname)
            with open(self.filename, "wb+") as f:
                f.write(self.data)
            return os.listdir()

    cluster = SLURMCluster(cores=1, memory="40GB")
    cluster.scale(jobs=10)

    client = Client(cluster)  # Connect this local process to remote workers
    if upload_file:
        client.register_worker_plugin(UploadFile('/home/isimoesdesousa/.netrc'))

# ---------------------------------
# Read data
# ---------------------------------

# MERRA-2 collection (hourly)
collection_shortname = 'M2T1NXAER'
collection_longname = 'tavg1_2d_aer_Nx'
collection_number = 'MERRA2_400'
MERRA2_version = '5.12.4'
year = 2020

# Open dataset
# Read selected days in the same month and year
month = 1  # January
day_beg = 1
day_end = 31

# Note that collection_number is MERRA2_401 in a few cases, refer to
# "Records of MERRA-2 Data Reprocessing and Service Changes"
if year == 2020 and month == 9:
    collection_number = 'MERRA2_401'

# OPeNDAP URL
url = 'https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2/{}.{}/{}/{:0>2d}'.format(
    collection_shortname, MERRA2_version, year, month)
files_month = ['{}/{}.{}.{}{:0>2d}{:0>2d}.nc4'.format(
    url, collection_number, collection_longname, year, month, days)
    for days in range(day_beg, day_end + 1)]

# Get the number of files
len_files_month = len(files_month)

# Print
print("{} files to be opened:".format(len_files_month))
print("files_month", files_month)

# Read dataset URLs
ds = xr.open_mfdataset(files_month, parallel=parallel)

# View metadata (similar to ncdump -c)
ds
```
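For reference, the URL-construction step can be reproduced standalone with no server access, which makes it easy to confirm the file names are what the server expects. A minimal sketch of that step alone (same pattern as above, except the day count is derived from `calendar.monthrange` instead of hard-coding `day_end`, which is an illustrative tweak, not part of my script):

```python
from calendar import monthrange

# Same collection parameters as in the script above.
collection_shortname = 'M2T1NXAER'
collection_longname = 'tavg1_2d_aer_Nx'
collection_number = 'MERRA2_400'
MERRA2_version = '5.12.4'
year, month = 2020, 1

# monthrange returns (weekday of day 1, number of days in month).
n_days = monthrange(year, month)[1]

base = ('https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2/'
        f'{collection_shortname}.{MERRA2_version}/{year}/{month:02d}')
files_month = [f'{base}/{collection_number}.{collection_longname}.'
               f'{year}{month:02d}{day:02d}.nc4'
               for day in range(1, n_days + 1)]

print(len(files_month))   # 31 for January
print(files_month[0])
```

Pasting `files_month[0]` into a browser (while logged in to Earthdata) is a quick way to check the URL pattern independently of Dask.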

As this deals with HPCs, I also posted on the Pangeo forum: https://discourse.pangeo.io/t/access-ges-disc-nasa-dataset-using-xarray-and-dask-on-a-cluster/3195/1

