
issues


4 rows where state = "open", type = "issue" and user = 5797727, sorted by updated_at descending

#6360: Multidimensional `interpolate_na()`
id: 1169750048 · node_id: I_kwDOAMm_X85FuPgg · user: iuryt (5797727) · state: open · locked: 0 · comments: 4 · created_at: 2022-03-15T14:27:46Z · updated_at: 2023-09-28T11:51:20Z · author_association: NONE

Is your feature request related to a problem?

I think that having a way to run a multidimensional interpolation for filling missing values would be awesome.

The code snippet below creates some sample data and shows the problem I am having now. If the data has some orientation, we can't simply interpolate each dimension separately.

```python
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr

n = 30
x = xr.DataArray(np.linspace(0, 2*np.pi, n), dims=['x'])
y = xr.DataArray(np.linspace(0, 2*np.pi, n), dims=['y'])
z = np.sin(x) * xr.ones_like(y)

mask = xr.DataArray(np.random.randint(0, 1+1, (n, n)).astype('bool'), dims=['x', 'y'])

kw = dict(add_colorbar=False)

fig, ax = plt.subplots(1, 3, figsize=(11, 3))
z.plot(ax=ax[0], **kw)
z.where(mask).plot(ax=ax[1], **kw)
z.where(mask).interpolate_na('x').plot(ax=ax[2], **kw)
```

I tried to use advanced interpolation for that, but it doesn't look like the best solution.

```python
zs = z.where(mask).stack(k=['x', 'y'])
zs = zs.where(np.isnan(zs), drop=True)
xi, yi = zs.k.x.drop('k'), zs.k.y.drop('k')
zi = z.interp(x=xi, y=yi)

fig, ax = plt.subplots()
z.where(mask).plot(ax=ax, **kw)
ax.scatter(xi, yi, c=zi, **kw, linewidth=1, edgecolor='k')
```

which returns:

(figure omitted)

Describe the solution you'd like

Simply `z.interpolate_na(['x','y'])`

Describe alternatives you've considered

I could extract the data to numpy and interpolate using `scipy.interpolate.griddata`, but this is not the way xarray should work.
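For reference, a minimal sketch of that scipy fallback, assuming the `z` and `mask` arrays from the snippet above (the `z_filled` name is purely illustrative):

```python
import numpy as np
from scipy.interpolate import griddata

zm = z.where(mask)  # masked field from the snippet above

# Index grids for the 2-D array (the example arrays carry no coordinate values)
xx, yy = np.meshgrid(np.arange(zm.sizes['x']), np.arange(zm.sizes['y']), indexing='ij')
valid = ~np.isnan(zm.values)

# Fill the gaps by 2-D linear interpolation from the valid points;
# points outside the convex hull of the valid data remain NaN
filled = griddata((xx[valid], yy[valid]), zm.values[valid], (xx, yy), method='linear')

# Wrap the result back into a DataArray with the original dims
z_filled = zm.copy(data=filled)
```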

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6360/reactions",
    "total_count": 11,
    "+1": 9,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 2
}
repo: xarray (13221727) · type: issue
#7542: `OSError: [Errno -70] NetCDF: DAP server error` when `parallel=True` on a cluster
id: 1592154849 · node_id: I_kwDOAMm_X85e5lrh · user: iuryt (5797727) · state: open · locked: 0 · comments: 1 · created_at: 2023-02-20T16:27:11Z · updated_at: 2023-03-20T17:53:39Z · author_association: NONE

What is your issue?

Hi,

I am trying to access the MERRA-2 dataset using OPeNDAP links in xarray. The code below is based on a tutorial that @betolink sent me as an example.

The code runs fine with `parallel=False`, but raises `OSError: [Errno -70] NetCDF: DAP server error` when I set `parallel=True`, whether or not I create the cluster.

@betolink suspected that the workers don't know about the authentication and suggested I do something like what is mentioned in @rsignell's issue.

That would involve adding `client.register_worker_plugin(UploadFile('~/.netrc'))` after creating the client. I tested that as well, but it returned the same error. In the code below I had to replace `~/.netrc` with the full path because it was raising a file-not-found error.
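As a quick sanity check (a sketch, assuming the connected `client` from the code below), one can ask every worker whether the credentials file is actually visible on its side:

```python
import os

# dask.distributed.Client.run executes a function on every worker and
# returns a dict keyed by worker address
print(client.run(lambda: os.path.exists(os.path.expanduser("~/.netrc"))))
```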

It is worth noting that `parallel=True` works fine on my local computer running Ubuntu under WSL.

Has anyone faced this problem before, or have any guesses on how to solve it?

```python
# ----------------------------------
# Import Python modules
# ----------------------------------
import warnings

warnings.filterwarnings("ignore")

import xarray as xr
import matplotlib.pyplot as plt
from calendar import monthrange

create_cluster = True
parallel = True
upload_file = True

if create_cluster:
    # --------------------------------------
    # Creating 50 workers with 1 core and 2 GB each
    # --------------------------------------
    import os
    from dask_jobqueue import SLURMCluster
    from dask.distributed import Client
    from dask.distributed import WorkerPlugin

    class UploadFile(WorkerPlugin):
        """A WorkerPlugin to upload a local file to workers.

        Parameters
        ----------
        filepath: str
            A path to the file to upload

        Examples
        --------
        >>> client.register_worker_plugin(UploadFile(".env"))
        """

        def __init__(self, filepath):
            """Initialize the plugin by reading in the data from the given file."""
            self.filename = os.path.basename(filepath)
            self.dirname = os.path.dirname(filepath)
            with open(filepath, "rb") as f:
                self.data = f.read()

        async def setup(self, worker):
            if not os.path.exists(self.dirname):
                os.mkdir(self.dirname)
            os.chdir(self.dirname)
            with open(self.filename, "wb+") as f:
                f.write(self.data)
            return os.listdir()

    cluster = SLURMCluster(cores=1, memory="40GB")
    cluster.scale(jobs=10)

    client = Client(cluster)  # Connect this local process to remote workers
    if upload_file:
        client.register_worker_plugin(UploadFile('/home/isimoesdesousa/.netrc'))

# ---------------------------------
# Read data
# ---------------------------------

# MERRA-2 collection (hourly)
collection_shortname = 'M2T1NXAER'
collection_longname = 'tavg1_2d_aer_Nx'
collection_number = 'MERRA2_400'
MERRA2_version = '5.12.4'
year = 2020

# Open dataset
# Read selected days in the same month and year
month = 1  # January
day_beg = 1
day_end = 31

# Note that collection_number is MERRA2_401 in a few cases, refer to
# "Records of MERRA-2 Data Reprocessing and Service Changes"
if year == 2020 and month == 9:
    collection_number = 'MERRA2_401'

# OPeNDAP URL
url = 'https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2/{}.{}/{}/{:0>2d}'.format(
    collection_shortname, MERRA2_version, year, month)
files_month = ['{}/{}.{}.{}{:0>2d}{:0>2d}.nc4'.format(
    url, collection_number, collection_longname, year, month, days)
    for days in range(day_beg, day_end + 1, 1)]

# Get the number of files
len_files_month = len(files_month)

# Print
print("{} files to be opened:".format(len_files_month))
print("files_month", files_month)

# Read dataset URLs
ds = xr.open_mfdataset(files_month, parallel=parallel)

# View metadata (like ncdump -c)
ds
```

As this involves HPC systems, I also posted on the Pangeo forum: https://discourse.pangeo.io/t/access-ges-disc-nasa-dataset-using-xarray-and-dask-on-a-cluster/3195/1

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7542/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#6922: Support for matplotlib mosaic using variable names
id: 1340669247 · node_id: I_kwDOAMm_X85P6P0_ · user: iuryt (5797727) · state: open · locked: 0 · comments: 1 · created_at: 2022-08-16T17:31:23Z · updated_at: 2022-08-17T05:35:39Z · author_association: NONE

Is your feature request related to a problem?

This is not related to any problem, but I think it would be nice to support passing a matplotlib mosaic whose keys are the variables you want to plot in different panels, and have xarray parse that into the figure.

Describe the solution you'd like

Something like

```python
import matplotlib.pyplot as plt
import xarray as xr
import numpy as np

n = 200
t = np.linspace(0, 3*2*np.pi, n)
ds = xr.Dataset({letter: (("s", "t"), np.sin(t) + 0.5*np.random.randn(3, n))
                 for letter in "A B C D E".split()})
ds = ds.assign_coords(t=t, s=range(3))

mosaic = [
    ["A", "A", "B", "B", "C", "C"],
    ["X", "D", "D", "E", "E", "X"],
]

kw = dict(x="t", hue="s", add_legend=False)
ds.plot.line(mosaic=mosaic, empty_sentinel="X", **kw)
```

Describe alternatives you've considered

I have a code snippet that generates similar results, but with more code.

```python
import matplotlib.pyplot as plt
import xarray as xr
import numpy as np

n = 200
t = np.linspace(0, 3*2*np.pi, n)
ds = xr.Dataset({letter: (("s", "t"), np.sin(t) + 0.5*np.random.randn(3, n))
                 for letter in "A B C D E".split()})
ds = ds.assign_coords(t=t, s=range(3))

mosaic = [
    ["A", "A", "B", "B", "C", "C"],
    ["X", "D", "D", "E", "E", "X"],
]

kw = dict(x="t", hue="s", add_legend=False)
fig = plt.figure(constrained_layout=True, figsize=(8, 4))
ax = fig.subplot_mosaic(mosaic, empty_sentinel="X")
for key in ds:
    ds[key].plot.line(ax=ax[key], **kw)
```

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6922/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#4787: Pass `locator` argument to matplotlib when calling `plot.contourf`
id: 782848816 · node_id: MDU6SXNzdWU3ODI4NDg4MTY= · user: iuryt (5797727) · state: open · locked: 0 · comments: 3 · created_at: 2021-01-10T16:04:26Z · updated_at: 2021-01-11T17:41:52Z · author_association: NONE

Is your feature request related to a problem? Please describe.

Every time I need a contourf plot I have to call matplotlib directly, because the `locator` argument is not passed from `xarray.plot.contourf` through to matplotlib.

Describe the solution you'd like

Given `ds`, an `xarray.DataArray`, I want the behaviour described here when passing `locator=ticker.LogLocator()` to `ds.plot.contourf`.

Describe alternatives you've considered

I usually have to do `plt.contourf(ds.dim_0, ds.dim_1, ds.values, locator=ticker.LogLocator())`.
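A minimal runnable version of that workaround is sketched below; the data is made up for illustration, and the commented-out `da.plot.contourf(...)` call is the behaviour this issue requests, not something xarray supported at the time:

```python
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from matplotlib import ticker

# A strictly positive field spanning several orders of magnitude
da = xr.DataArray(
    np.exp(5 * np.random.rand(50, 50)),
    dims=['dim_0', 'dim_1'],
    coords={'dim_0': np.arange(50), 'dim_1': np.arange(50)},
)

# Current workaround: call matplotlib directly so `locator` takes effect
plt.contourf(da.dim_0, da.dim_1, da.values, locator=ticker.LogLocator())
plt.colorbar()

# Requested behaviour:
# da.plot.contourf(locator=ticker.LogLocator())
```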

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4787/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue


Table schema

```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```