
issue_comments: 350375750

html_url: https://github.com/pydata/xarray/pull/1528#issuecomment-350375750
issue_url: https://api.github.com/repos/pydata/xarray/issues/1528
id: 350375750
node_id: MDEyOklzc3VlQ29tbWVudDM1MDM3NTc1MA==
user: 703554
created_at: 2017-12-08T21:24:45Z
updated_at: 2017-12-08T22:27:47Z
author_association: CONTRIBUTOR
issue: 253136694

body:

Just to confirm: if writes are aligned with chunk boundaries in the destination array, then no locking is required.
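
For illustration, here is a minimal sketch of a chunk-aligned write to a zarr array; the store path 'example.zarr', the shape, and the chunking are arbitrary placeholders. Because the slice covers whole chunks, concurrent writers to disjoint chunk-aligned regions would not need a lock.

```python
import numpy as np
import zarr

# Destination array chunked in blocks of 10 whole rows
# ('example.zarr' and the shapes are placeholders).
z = zarr.open('example.zarr', mode='w', shape=(100, 100),
              chunks=(10, 100), dtype='f8')

# Rows 20-40 span chunks 2 and 3 exactly, so this write touches
# whole chunks only and requires no inter-writer locking.
z[20:40, :] = np.random.random((20, 100))
```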

Also, if you're going to be moving large datasets into cloud storage and doing distributed computing, it may be worth investigating compressors and compressor options, as a good compression ratio can make a big difference when network bandwidth is the limiting factor. I would suggest using the Blosc compressor with cname='zstd'. I would also suggest using shuffle; the Blosc codec in the latest numcodecs has an AUTOSHUFFLE option, which applies byte shuffle to arrays with an item size greater than 1 byte and bit shuffle to arrays with a 1-byte item size. I would also experiment with the compression level (clevel) to see how speed balances against compression ratio. E.g., Blosc(cname='zstd', clevel=5, shuffle=Blosc.AUTOSHUFFLE) may be a good starting point (see the sketch below). The default compressor, Blosc(cname='lz4', ...), is optimised for fast local storage, so speed is very good but the compression ratio is moderate; this may not be the best choice for distributed computing.
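
As a starting point, here is a minimal sketch of configuring that compressor with numcodecs and attaching it to a zarr array; the store path 'data.zarr', the shape, and the chunking are arbitrary placeholders, and clevel should be tuned against your own data.

```python
import numpy as np
import zarr
from numcodecs import Blosc

# zstd with automatic shuffle selection; experiment with clevel
# to balance speed against compression ratio.
compressor = Blosc(cname='zstd', clevel=5, shuffle=Blosc.AUTOSHUFFLE)

# Create a chunked destination array using this compressor
# ('data.zarr', shape, and chunks are placeholders).
z = zarr.open('data.zarr', mode='w', shape=(10000, 10000),
              chunks=(1000, 1000), dtype='f4', compressor=compressor)
z[:1000, :1000] = np.ones((1000, 1000), dtype='f4')
```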

reactions:
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}