
issues


3 rows where repo = 13221727 (xarray) and user = 12818667 (JamiePringle), sorted by updated_at descending

#6640: to_zarr fails for large dimensions; sensitive to exact dimension size and chunk size

  • id: 1249638836 · node_id: I_kwDOAMm_X85Ke_m0 · type: issue · repo: xarray (13221727)
  • user: JamiePringle (12818667) · author_association: NONE
  • state: closed (state_reason: completed) · locked: 0 · comments: 5
  • created_at: 2022-05-26T14:22:20Z · updated_at: 2023-10-14T20:29:50Z · closed_at: 2023-10-14T20:29:49Z

What happened?

Using dask 2022.05.0, zarr 2.11.3, and xarray 2022.3.0: when I create a large empty dataset and try to save it in the zarr format with to_zarr, it fails with the error below. Frankly, I am not sure whether the problem is with xarray or zarr, but as documented in the attached code, when I create the same dataset directly with zarr, it works just fine.

```
File ~/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py:2101, in Array._decode_chunk(self, cdata, start, nitems, expected_shape)
   2099 # ensure correct chunk shape
   2100 chunk = chunk.reshape(-1, order='A')
-> 2101 chunk = chunk.reshape(expected_shape or self._chunks, order=self._order)
   2103 return chunk

ValueError: cannot reshape array of size 234506 into shape (235150,)
```

To show that this is not a zarr issue, I have made the same output directly with zarr in the example code below. It is in the `else` clause of the code.

Note well: I have included a value of numberOfDrifters that has the problem, and one that does not. Please see the comments where numberOfDrifters is defined.

What did you expect to happen?

I expected a zarr dataset to be created. I cannot work around the problem with a chunk size of 1 because of memory issues. I would prefer to create the zarr dataset with xarray so that it has the metadata needed to load it easily back into xarray.
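As an illustrative workaround sketch (not part of the original report, and assuming the sizes defined in the example below): the array can be created directly with zarr, as in the `else` branch of the example, and given the `_ARRAY_DIMENSIONS` attribute that xarray's zarr backend uses to attach dimension names, so the store can still be opened with xr.open_zarr.

```Python
# Hypothetical workaround sketch: write with zarr directly, but attach the
# _ARRAY_DIMENSIONS attribute so xarray can recover the dimension names.
import zarr
import xarray as xr

numberOfDrifters = 120067029   # values copied from the example below
maxNumObs = 1
chunkSize = 10000

store = zarr.DirectoryStore('dataPaths/jnk_makeWithZarrDims.zarr')  # hypothetical path
root = zarr.group(store=store)
arr = root.empty(name='time', shape=(numberOfDrifters, maxNumObs),
                 dtype='float32', chunks=(chunkSize, maxNumObs))
# xarray's zarr backend looks for this attribute to name the dimensions
arr.attrs['_ARRAY_DIMENSIONS'] = ['traj', 'obs']

ds = xr.open_zarr('dataPaths/jnk_makeWithZarrDims.zarr')  # dims should come back as traj/obs
print(ds)
```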

Minimal Complete Verifiable Example

```Python
from numpy import *
import xarray as xr
import dask
import dask.array   # needed so dask.array.empty is available
import zarr

dtype=float32
chunkSize=10000
maxNumObs=1

numberOfDrifters=120396431 #2008 This size WORKS

numberOfDrifters=120067029 #2007 This size FAILS

# if True, make zarr with xarray; if False, make it directly with zarr
if True: #make xarray data set, then write to zarr
    coords={'traj':(['traj'],arange(numberOfDrifters)),'obs':(['obs'],arange(maxNumObs))}
    emptyArray=dask.array.empty(shape=(numberOfDrifters,maxNumObs),dtype=dtype,chunks=(chunkSize,maxNumObs))
    var='time'
    data_vars={}
    attrs={}
    data_vars[var]=(['traj','obs'],emptyArray,attrs)
    dataOut=xr.Dataset(data_vars,coords,{})
    print('done defining data set, now writing')

    #now save to zarr dataset
    dataOut.to_zarr('dataPaths/jnk_makeWithXarray.zarr','w')
    print('done writing')
    zarrInXarray=zarr.open('dataPaths/jnk_makeWithXarray.zarr','r')
    print('done opening')

else: #make with zarr
    store=zarr.DirectoryStore('dataPaths/jnk_makeWithZarr.zarr')
    root=zarr.group(store=store)
    root.empty(shape=(numberOfDrifters,maxNumObs),name='time',dtype=dtype,chunks=(chunkSize,maxNumObs))
    print('done writing')
    zarrInZarr=zarr.open('dataPaths/jnk_makeWithZarr.zarr','r')
    print('done opening')
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
Traceback (most recent call last):
  File "/data/plumehome/pringle/workfiles/oceanparcels/makeCommunityConnectivity/breakXarray.py", line 26, in <module>
    dataOut.to_zarr('dataPaths/jnk_makeWithXarray.zarr','w')
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/core/dataset.py", line 2036, in to_zarr
    return to_zarr(
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/backends/api.py", line 1431, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/backends/api.py", line 1119, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/backends/zarr.py", line 534, in store
    self.set_variables(
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/backends/zarr.py", line 613, in set_variables
    writer.add(v.data, zarr_array, region)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/xarray/backends/common.py", line 154, in add
    target[region] = source
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1285, in __setitem__
    self.set_basic_selection(pure_selection, value, fields=fields)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1380, in set_basic_selection
    return self._set_basic_selection_nd(selection, value, fields=fields)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1680, in _set_basic_selection_nd
    self._set_selection(indexer, value, fields=fields)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1732, in _set_selection
    self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1994, in _chunk_setitem
    self._chunk_setitem_nosync(chunk_coords, chunk_selection, value,
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 1999, in _chunk_setitem_nosync
    cdata = self._process_for_setitem(ckey, chunk_selection, value, fields=fields)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 2049, in _process_for_setitem
    chunk = self._decode_chunk(cdata)
  File "/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib/python3.9/site-packages/zarr/core.py", line 2101, in _decode_chunk
    chunk = chunk.reshape(expected_shape or self._chunks, order=self._order)
ValueError: cannot reshape array of size 234506 into shape (235150,)
```

Anything else we need to know?

No response

Environment

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:22:55) [GCC 10.3.0]
    python-bits: 64
    OS: Linux
    OS-release: 5.13.0-41-generic
    machine: x86_64
    processor: x86_64
    byteorder: little
    LC_ALL: None
    LANG: en_US.UTF-8
    LOCALE: ('en_US', 'UTF-8')
    libhdf5: 1.12.1
    libnetcdf: 4.8.1
    xarray: 2022.3.0
    pandas: 1.4.1
    numpy: 1.20.3
    scipy: 1.8.0
    netCDF4: 1.5.8
    pydap: None
    h5netcdf: None
    h5py: None
    Nio: None
    zarr: 2.11.3
    cftime: 1.6.0
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: None
    cfgrib: None
    iris: None
    bottleneck: None
    dask: 2022.05.0
    distributed: 2022.5.0
    matplotlib: 3.5.1
    cartopy: 0.20.2
    seaborn: None
    numbagg: None
    fsspec: 2022.02.0
    cupy: None
    pint: None
    sparse: None
    setuptools: 61.2.0
    pip: 22.0.4
    conda: None
    pytest: 7.1.1
    IPython: 8.2.0
    sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6640/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#7028: .to_zarr() or .to_netcdf slow and uses excess memory when datetime64[ns] variable in output; a reproducible example

  • id: 1371466778 · node_id: I_kwDOAMm_X85Rvuwa · type: issue · repo: xarray (13221727)
  • user: JamiePringle (12818667) · author_association: NONE
  • state: closed (state_reason: completed) · locked: 0 · comments: 3
  • created_at: 2022-09-13T13:32:29Z · updated_at: 2022-11-03T15:40:53Z · closed_at: 2022-11-03T15:40:52Z

What happened?

This bug report is a reproducible example, with code, of an issue that may underlie #7018, #2912, and other bug reports of slow performance and memory exhaustion when using .to_zarr() or .to_netcdf(). I think this has been hard to track down because it only occurs for large datasets. I have included code that replicates the problem without the need to download a large dataset.

The problem is that saving an xarray dataset which includes a variable of type datetime64[ns] is several orders of magnitude slower (!!) and uses a great deal more memory (!!) than saving the same dataset with that variable stored as another type. The workaround is obvious -- turn off time decoding and treat time as a float64. But this is inelegant, and I think this problem has led to many unanswered questions on the issues page, such as the ones above.
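For concreteness, here is a minimal sketch of that workaround (a hypothetical helper, not code from the original report): convert the datetime64[ns] variable to float64 seconds since a reference time before writing, and skip time decoding when reading the store back.

```Python
# Sketch of the workaround described above (hypothetical helper):
# store "time" as float64 seconds since an epoch instead of datetime64[ns].
import numpy as np
import xarray as xr

def encode_time_as_float(ds, time_var='time', epoch='2000-01-01'):
    '''Return a copy of ds with time_var converted to float64 seconds since epoch.'''
    seconds = (ds[time_var] - np.datetime64(epoch)) / np.timedelta64(1, 's')
    return ds.assign({time_var: seconds.astype('float64')})

# usage (dataOut and outputDir as in the benchmark code below):
#   dataOut = encode_time_as_float(dataOut)
#   dataOut.to_zarr(outputDir)                      # no datetime64 variable to encode
#   xr.open_zarr(outputDir, decode_times=False)     # keep time as float on read
```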

If I save a dataset whose structure (based on my use case, the ocean-parcels Lagrangian particle tracker) is:

    <xarray.Dataset>
    Dimensions:  (trajectory: 953536, obs: 245)
    Dimensions without coordinates: trajectory, obs
    Data variables:
        time     (trajectory, obs) float32 dask.array<chunksize=(50000, 10), meta=np.ndarray>
        age      (trajectory, obs) float32 dask.array<chunksize=(50000, 10), meta=np.ndarray>
        lat      (trajectory, obs) float32 dask.array<chunksize=(50000, 10), meta=np.ndarray>
        lon      (trajectory, obs) float32 dask.array<chunksize=(50000, 10), meta=np.ndarray>
        z        (trajectory, obs) float32 dask.array<chunksize=(50000, 10), meta=np.ndarray>

I have no problems, even if the dataset is much larger than the machine's memory. However, if I change the time variable to have the data type datetime64[ns],

    <xarray.Dataset>
    Dimensions:  (trajectory: 953536, obs: 245)
    Dimensions without coordinates: trajectory, obs
    Data variables:
        time     (trajectory, obs) datetime64[ns] dask.array<chunksize=(50000, 10), meta=np.ndarray>
        age      (trajectory, obs) float32 dask.array<chunksize=(50000, 10), meta=np.ndarray>
        lat      (trajectory, obs) float32 dask.array<chunksize=(50000, 10), meta=np.ndarray>
        lon      (trajectory, obs) float32 dask.array<chunksize=(50000, 10), meta=np.ndarray>
        z        (trajectory, obs) float32 dask.array<chunksize=(50000, 10), meta=np.ndarray>

the time it takes to write this dataset becomes much greater, and it increases much more quickly with the size of the "trajectory" dimension than in the case where time has type "float64". The increase is NOT in the writing time, but in the time it takes to compute the dask graph. As the time to compute the graph increases, the memory usage also increases, finally leading to memory exhaustion as the dataset gets larger. This can be seen in the attached figure, which shows the time to create the graph with dataOut.to_zarr(outputDir,compute=False) and the time to write the data with delayedObj.compute(). By the time the dataset is 10 million long in the first dimension, the dataset with datetime64[ns] takes 4 orders of magnitude longer (!!!) to compute -- hours instead of seconds!

To recreate this graph, and to see very simple code that replicates this problem, see the attached Python code. Note that the directory you run it in should have at least 30 GB free for the dataset it writes, and on machines with less than 256 GB of memory it will crash before completing, after exhausting the memory. However, the latest figure is saved to jnk_out.png on each iteration, and you can always change the largest size it attempts to create.

SmallestExample_zarrOutProblem.zip

What did you expect to happen?

I expect that the time to save a dataset with .to_zarr or .to_netcdf does not change dramatically if one of the variables has a datetime64[ns] type.

Minimal Complete Verifiable Example

```Python
# this code is also included as a zip file above.

import xarray as xr
from pylab import *
from numpy import *

from glob import glob
from os import path

import time
import dask
import dask.array   # needed so dask.array.zeros is available
from dask.diagnostics import ProgressBar
import shutil
import pickle

# this is a minimal code that illustrates the issue with .to_zarr() or .to_netcdf
# when writing a dataset with datetime64 data

# outputDir is the name of the zarr output; it should be set to a location on a
# fast filesystem with enough space
outputDir='./testOut.zarr'

def testToZarr(dimensions,haveTimeType=True):
    '''This code writes out an empty dataset with the dimensions specified in
    the "dimensions" argument, and returns the time it took to create the dask
    delayed object and the time it took to compute the delayed object.

    if haveTimeType is True, the "time" variable has type datetime64
    '''

    if haveTimeType:
        #specify the type of variables to be written out. Each has dimensions (trajectory,obs)
        varType={'time':dtype('datetime64[ns]'),
                 'age':dtype('float32'),
                 'lat':dtype('float32'),
                 'lon':dtype('float32'),
                 'z':dtype('float32'),
                 }
    else:
        varType={'time':dtype('float32'),
                 'age':dtype('float32'),
                 'lat':dtype('float32'),
                 'lon':dtype('float32'),
                 'z':dtype('float32'),
                 }

    #now make an empty dataset
    dataOut=xr.Dataset()

    #now add the empty variables
    for v in varType:
        vEmpty=dask.array.zeros((dimensions['trajectory'],dimensions['obs']),dtype=varType[v])
        dataOut=dataOut.assign({v:(('trajectory','obs'),vEmpty)})

    #chunk data
    chunksize={'trajectory':5*int(1e4),'obs':10}
    print('   chunking dataset to',chunksize)
    dataOut=dataOut.chunk(chunksize)

    #create dask delayed object, and time how long it took
    tic=time.time()
    if True: #write to zarr
        delayedObj=dataOut.to_zarr(outputDir,compute=False)
    else: #write to netCDF
        delayedObj=dataOut.to_netcdf(outputDir,compute=False)
    createGraphTime=time.time()-tic
    print('   created graph in',createGraphTime)

    #execute the delayed object, and see how long it took. Use progress bar
    tic=time.time()
    with ProgressBar():
        results=delayedObj.compute()
    writeOutTime=time.time()-tic
    print('   wrote data in',writeOutTime)

    return createGraphTime,writeOutTime,dataOut

# now lets do some benchmarking
if __name__ == "__main__":
    figure(1,figsize=(10.0,8.0))
    clf()
    style.use('ggplot')

    #make a vector that is the size of the first dimension
    #("trajectory") of the data set
    trajectoryVec=logspace(log10(1000),log10(1.4e8),20).astype(int)

    #pre-allocate variables to store results
    createGraphTimeVec_time=0.0*trajectoryVec+nan
    writeOutTimeVec_time=0.0*trajectoryVec+nan

    createGraphTimeVec_noTime=0.0*trajectoryVec+nan
    writeOutTimeVec_noTime=0.0*trajectoryVec+nan

    #get data for various array sizes
    for n in range(len(trajectoryVec)):

        #write out data, and benchmark time
        dimensions={'trajectory':trajectoryVec[n],'obs':245}
        print('starting to write file of dimensions',dimensions,'for case with time variable')

        shutil.rmtree(outputDir,ignore_errors=True)
        createGraphTimeVec_time[n],writeOutTimeVec_time[n],dataOut_time=testToZarr(dimensions,haveTimeType=True)

        shutil.rmtree(outputDir,ignore_errors=True)
        createGraphTimeVec_noTime[n],writeOutTimeVec_noTime[n],dataOut_noTime=testToZarr(dimensions,haveTimeType=False)

        print('   done')

        #now plot output
        clf()

        subplot(2,1,1)
        loglog(trajectoryVec,createGraphTimeVec_time,'r-o',label='with datetime64 variable')
        loglog(trajectoryVec,createGraphTimeVec_noTime,'b-*',label='without datetime64 variable')
        #xlabel('size of first dimensions')
        ylabel('seconds')
        title('time to create dask graph',fontsize='medium')
        legend()
        firstAx=axis()

        subplot(2,1,2)
        loglog(trajectoryVec,writeOutTimeVec_time,'r-o',label='with datetime64 variable')
        loglog(trajectoryVec,writeOutTimeVec_noTime,'b-*',label='without datetime64 variable')
        axis(xmin=firstAx[0],xmax=firstAx[1])
        xlabel('size of first dimensions')
        ylabel('seconds')
        title('time to write data to disk',fontsize='medium')
        legend()

        draw()
        show()
        pause(0.01)

        #save the figure each time, since this code can crash as the
        #size of the dataset gets larger and the dataset with
        #datetime64[ns] causes .to_zarr() to exhaust all the memory
        savefig('jnk_out.png',dpi=100)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

Note -- I see the same thing on my linux machine

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:41:22) [Clang 13.0.1 ]
    python-bits: 64
    OS: Darwin
    OS-release: 21.6.0
    machine: arm64
    processor: arm
    byteorder: little
    LC_ALL: None
    LANG: en_US.UTF-8
    LOCALE: ('en_US', 'UTF-8')
    libhdf5: None
    libnetcdf: None
    xarray: 2022.6.0
    pandas: 1.4.3
    numpy: 1.23.2
    scipy: 1.9.0
    netCDF4: None
    pydap: None
    h5netcdf: None
    h5py: None
    Nio: None
    zarr: 2.12.0
    cftime: None
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: None
    cfgrib: None
    iris: None
    bottleneck: None
    dask: 2022.8.1
    distributed: 2022.8.1
    matplotlib: 3.5.3
    cartopy: 0.20.3
    seaborn: None
    numbagg: None
    fsspec: 2022.7.1
    cupy: None
    pint: None
    sparse: None
    flox: None
    numpy_groupies: None
    setuptools: 65.3.0
    pip: 22.2.2
    conda: None
    pytest: None
    IPython: 8.4.0
    sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7028/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

#6976: dataset.sel inconsistent results when argument is a list or a slice

  • id: 1358960570 · node_id: I_kwDOAMm_X85RABe6 · type: issue · repo: xarray (13221727)
  • user: JamiePringle (12818667) · author_association: NONE
  • state: open · locked: 0 · comments: 5
  • created_at: 2022-09-01T14:40:34Z · updated_at: 2022-09-02T14:13:27Z

What happened?

I am not sure if what I report is a bug; however, it is certainly not what I expect from a careful reading of the documentation, and I wonder if it is leading to some issues I describe below.

I am working with a large dataset produced by merging the output of runs made by different MPI processes. There are two coordinates, ("trajectory","obs"). All of the "obs" in the dataset are in order, but the "trajectory" coordinate is not in order. I made a smaller dataset that illustrates the issue below by reducing the number of "obs" from 250 to 2; this dataset can be found at http://oxbow.sr.unh.edu/data/smallExample.zarr.zip . This dataset looks like:

    Dimensions:     (trajectory: 39363539, obs: 2)
    Coordinates:
      * obs         (obs) int32 0 1
      * trajectory  (trajectory) int64 100 210 227 ... 39363210 39363255 39363379
    Data variables:
        age         (trajectory, obs) float32 dask.array<chunksize=(50000, 2), meta=np.ndarray>
        lat         (trajectory, obs) float64 dask.array<chunksize=(50000, 2), meta=np.ndarray>
        lon         (trajectory, obs) float64 dask.array<chunksize=(50000, 2), meta=np.ndarray>
        time        (trajectory, obs) datetime64[ns] dask.array<chunksize=(40625, 2), meta=np.ndarray>
        z           (trajectory, obs) float64 dask.array<chunksize=(50000, 2), meta=np.ndarray>

Note that the trajectory coordinate is not in order; this is due to how the problem is partitioned into MPI jobs.

If I want an ordered set of trajectories, say trajectories [1,2,3,4,5,6,7,8,9,10], and I do this with subSet=dataIn.sel(trajectory=arange(1,11)), I get what I would expect from the documentation: the first through 10th trajectories, in order:

    <xarray.Dataset>
    Dimensions:     (trajectory: 10, obs: 2)
    Coordinates:
      * obs         (obs) int32 0 1
      * trajectory  (trajectory) int64 1 2 3 4 5 6 7 8 9 10
    Data variables:
        age         (trajectory, obs) float32 dask.array<chunksize=(1, 2), meta=np.ndarray>
        ....

But if I use the slice operator to specify what I want, I get something very different: dataIn.sel(trajectory=slice(1,11)) returns 2567339 trajectories, starting at the location of trajectory coordinate 1 in dataIn and extending to the location of trajectory coordinate 11 in dataIn:

    <xarray.Dataset>
    Dimensions:     (trajectory: 2567339, obs: 2)
    Coordinates:
      * obs         (obs) int32 0 1
      * trajectory  (trajectory) int64 1 27 57 59 ... 39363486 39363495 39363528 11
    Data variables:
        age         (trajectory, obs) float32 dask.array<chunksize=(17944, 2), meta=np.ndarray>
        ...

This is not what I expect -- as I understand the documentation, .sel should work in coordinate space, and I would expect dataIn.sel(trajectory=slice(1,11)) and subSet=dataIn.sel(trajectory=arange(1,11)) to return the same thing. If I am wrong in this interpretation, perhaps a documentation update would be helpful.
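To make the difference concrete, here is a tiny self-contained sketch (illustrative data, not from the original report) with a deliberately unordered coordinate; list selection looks up each label individually, while slice selection takes the positional span between the two end labels:

```Python
# Minimal illustration (hypothetical data) of .sel with a list vs a slice on a
# non-monotonic coordinate.
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {'age': ('trajectory', np.arange(5.0))},
    coords={'trajectory': [3, 1, 4, 2, 5]},   # not in order, like the dataset above
)

print(ds.sel(trajectory=[1, 2]).trajectory.values)
# -> [1 2]      each label is looked up individually

print(ds.sel(trajectory=slice(1, 2)).trajectory.values)
# -> [1 4 2]    everything between the position of label 1 and the position of label 2
```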

I have had all sorts of issues with the full dataset, including .to_zarr(dataset, compute=False) failing to return a delayed object because it used all the memory, .sortby(['trajectory']) failing with memory exhaustion, etc. I wonder if the issue reported here could be at the root of many of these problems. On a side note, the failure of .sortby(['trajectory']) makes re-ordering the dataset difficult, and I would be happy to hear any suggestions on that front (one possibility is sketched below).
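One possible alternative to .sortby() is sketched here (an untested assumption, not from the original report): compute the sort order from the in-memory trajectory coordinate alone and apply it with .isel, which keeps the dask-backed variables lazy.

```Python
# Sketch of reordering without .sortby(): sort on the (in-memory) coordinate
# only, then index the dask-backed variables lazily.
import numpy as np
import xarray as xr

dataIn = xr.open_zarr('smallExample.zarr')
order = np.argsort(dataIn['trajectory'].values)   # coordinate is loaded, data variables are not
dataSorted = dataIn.isel(trajectory=order)        # still lazy / dask-backed
# dataSorted.to_zarr('smallExample_sorted.zarr')  # hypothetical output path; writing may still be slow
```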

What did you expect to happen?

See above for full descriptions

Minimal Complete Verifiable Example

```Python
# get data from http://oxbow.sr.unh.edu/data/smallExample.zarr.zip
import xarray as xr
from numpy import arange

dataIn=xr.open_zarr('smallExample.zarr')
print(dataIn.sel(trajectory=arange(1,11)))
print(dataIn.sel(trajectory=slice(1,11)))
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]
    python-bits: 64
    OS: Linux
    OS-release: 5.15.0-46-generic
    machine: x86_64
    processor: x86_64
    byteorder: little
    LC_ALL: None
    LANG: en_US.UTF-8
    LOCALE: ('en_US', 'UTF-8')
    libhdf5: 1.12.2
    libnetcdf: 4.8.1
    xarray: 2022.6.0
    pandas: 1.4.3
    numpy: 1.23.2
    scipy: 1.9.0
    netCDF4: 1.6.0
    pydap: None
    h5netcdf: None
    h5py: None
    Nio: None
    zarr: 2.12.0
    cftime: 1.6.1
    nc_time_axis: None
    PseudoNetCDF: None
    rasterio: None
    cfgrib: None
    iris: None
    bottleneck: None
    dask: 2022.8.1
    distributed: 2022.8.1
    matplotlib: 3.5.3
    cartopy: 0.20.3
    seaborn: None
    numbagg: None
    fsspec: 2022.7.1
    cupy: None
    pint: None
    sparse: None
    flox: None
    numpy_groupies: None
    setuptools: 65.2.0
    pip: 22.2.2
    conda: None
    pytest: 7.1.2
    IPython: 8.4.0
    sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6976/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);