issue_comments
17 rows where user = 6130352 sorted by updated_at descending
---

**839846969** · eric-czech (6130352) · NONE · created 2021-05-12T15:04:26Z · updated 2021-05-12T15:04:26Z
https://github.com/pydata/xarray/issues/5286#issuecomment-839846969
Issue: Zarr chunks would overlap multiple dask chunks (884209406)

Thanks @shoyer, good to know!

Reactions: none
---

**832659229** · eric-czech (6130352) · NONE · created 2021-05-05T12:47:43Z · updated 2021-05-05T12:47:43Z
https://github.com/pydata/xarray/issues/5261#issuecomment-832659229
Issue: Export ufuncs from DataArray API (876394165)

Makes sense. It is a somewhat awkward distinction to teach, though, to those who wouldn't appreciate `__array_ufunc__` protocol compliance, especially since most of the other functionality we rely on (reductions such as max, min, and sum, plus concat, merge, indexing, filtering, etc.) comes through the Xarray APIs alone.

Reactions: none
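A minimal sketch of the distinction being discussed, assuming only numpy and xarray: reductions are methods on the xarray objects themselves, while ufuncs are reached through numpy and handled via the `__array_ufunc__` protocol.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, 4.0, 9.0], dims="x")

# Reductions come through the xarray API directly:
da.max()
da.sum()

# Ufuncs are applied via numpy; xarray participates through the
# __array_ufunc__ protocol and still returns a DataArray:
np.sqrt(da)
```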
---

**828726383** · eric-czech (6130352) · NONE · created 2021-04-28T19:38:26Z · updated 2021-04-28T19:38:26Z
https://github.com/pydata/xarray/issues/5229#issuecomment-828726383
Issue: Index level naming bug with `concat` (869792877)

Yep, that fixed it after upgrading to pandas 1.2.4. Thanks!

Reactions: none
---

**740993933** · eric-czech (6130352) · NONE · created 2020-12-08T20:38:44Z · updated 2020-12-08T20:39:23Z
https://github.com/pydata/xarray/issues/4663#issuecomment-740993933
Issue: Fancy indexing a Dataset with dask DataArray triggers multiple computes (759709924)

Oo nice, great to know about that.

Defining a general strategy for handling unknown chunk sizes seems like a good umbrella for it. I would certainly mention the multiple executions though, since that seems somewhat orthogonal. Have there been prior discussions about the fact that dask doesn't support consecutive slicing operations well (i.e. applying filters one after the other)? I am wondering how far off that is in dask versus simply trying to support the current behavior well. In other words, forcing evaluation of indexer arrays may be the practical solution for the foreseeable future, provided xarray didn't do so more than once.

Reactions: none
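For context, a minimal sketch of the unknown-chunk-size problem referred to above, using plain dask (array sizes and values are arbitrary): indexing with a lazy boolean mask yields chunks of unknown (NaN) size, while computing the indexer first keeps chunk sizes known.

```python
import dask.array as da

x = da.random.random(10, chunks=5)
mask = x > 0.5            # a lazy boolean indexer

y = x[mask]               # chunk sizes are now unknown
print(y.chunks)           # ((nan, nan),)

y2 = x[mask.compute()]    # forcing evaluation of the indexer first
print(y2.chunks)          # concrete sizes, e.g. ((3, 2),)
```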
---

**689046158** · eric-czech (6130352) · NONE · created 2020-09-08T18:06:23Z · updated 2020-09-08T18:06:23Z
https://github.com/pydata/xarray/issues/4412#issuecomment-689046158
Issue: Dataset.encode_cf function (696047530)

Ok, thanks @dcherian! I'll try that (feel free to close this).

Reactions: none
---

**686752919** · eric-czech (6130352) · NONE · created 2020-09-03T20:38:47Z · updated 2020-09-03T20:38:47Z
https://github.com/pydata/xarray/issues/4405#issuecomment-686752919
Issue: open_zarr: concat_characters has no effect when dtype=U1 (692238160)

Np! Sounds good.

Reactions: none
---

**686745468** · eric-czech (6130352) · NONE · created 2020-09-03T20:29:21Z · updated 2020-09-03T20:29:21Z
https://github.com/pydata/xarray/issues/4405#issuecomment-686745468
Issue: open_zarr: concat_characters has no effect when dtype=U1 (692238160)

Hm, got it. Should I close this out then, or might there be something awry given that concatenation doesn't work with U1 types?

Reactions: none
---

**686716048** · eric-czech (6130352) · NONE · created 2020-09-03T19:40:53Z · updated 2020-09-03T19:40:53Z
https://github.com/pydata/xarray/issues/4405#issuecomment-686716048
Issue: open_zarr: concat_characters has no effect when dtype=U1 (692238160)

Also, out of curiosity, do you know why that's True by default?

Reactions: none
---

**686715024** · eric-czech (6130352) · NONE · created 2020-09-03T19:38:36Z · updated 2020-09-03T19:38:36Z
https://github.com/pydata/xarray/issues/4405#issuecomment-686715024
Issue: open_zarr: concat_characters has no effect when dtype=U1 (692238160)

🤦 lol yes that works. Should …

```python
chrs = np.array([
    ['A', 'B'],
    ['C', 'D'],
    ['E', 'F'],
], dtype='U1')
ds = xr.Dataset(dict(x=(('dim0', 'dim1'), chrs)))
ds.to_zarr('/tmp/test.zarr', mode='w')
xr.open_zarr('/tmp/test.zarr', concat_characters=True).x.compute()

# No concatenation occurs:
# <xarray.DataArray 'x' (dim0: 3, dim1: 2)>
# array([['A', 'B'],
#        ['C', 'D'],
#        ['E', 'F']], dtype='<U1')
# Dimensions without coordinates: dim0, dim1
```

Basically, what does …

Reactions: none
---

**612978261** · eric-czech (6130352) · NONE · created 2020-04-13T16:36:32Z · updated 2020-04-13T16:36:32Z
https://github.com/pydata/xarray/issues/3959#issuecomment-612978261
Issue: Extending Xarray for domain-specific toolkits (597475005)

Thanks again @keewis! I moved the static typing discussion to https://github.com/pydata/xarray/issues/3967. This is closed out now as far as I'm concerned.

Reactions: +1 × 1
---

**612513722** · eric-czech (6130352) · NONE · created 2020-04-11T21:07:07Z · updated 2020-04-11T21:39:42Z
https://github.com/pydata/xarray/issues/3959#issuecomment-612513722
Issue: Extending Xarray for domain-specific toolkits (597475005)

Thanks @keewis! I like those ideas, so I experimented a bit and found a few things.

Is there any reason not to put the name of the type into …

I would love to try to use something like that. I couldn't get it to work either when trying to have a TypedDict that represents entire datasets, so I tried creating them for …

```python
MyDict = TypedDict('MyDict', {'x': str})
v1: MyDict = MyDict(x='x')

# This is fine:
v2: Mapping = v1

# But this doesn't work (a notable example since it's used in xr.Dataset):
v2: Mapping[Hashable, Any] = v1
# error: Incompatible types in assignment (expression has type "MyDict",
#        variable has type "Mapping[Hashable, Any]")

# And neither do any of these:
v2: dict = v1
# error: Incompatible types in assignment (expression has type "MyDict",
#        variable has type "Dict[Any, Any]")
v2: Mapping[str, str] = v1
# error: Incompatible types in assignment (expression has type "MyDict",
#        variable has type "Mapping[str, str]")
```

Going the other direction isn't possible at all (i.e. from …):

```python
ds = xr.Dataset(data_vars=MyTypedDict(data=...))

# Now assume a user wants to use data_vars/coords with type safety:
data_vars: MyTypedDict = ds.data_vars  # This doesn't work
```

Generics seem like a decent solution to all these problems, but it would obviously involve a lot of type annotation changes:

```python
# Ideally, xarray.typing would help specify more specific constraints,
# but this works with what exists today:
GenotypeDataVars = TypedDict('GenotypeDataVars', {'data': DataArray, 'mask': DataArray})
GenotypeCoords = TypedDict('GenotypeCoords', {'variant': DataArray, 'sample': DataArray})

D = TypeVar('D', bound=Mapping)
C = TypeVar('C', bound=Mapping)

# Assume xr.Dataset was written something like this instead:
class Dataset(Generic[D, C]):
    ...

ds1: Dataset[GenotypeDataVars, GenotypeCoords] = Dataset(
    GenotypeDataVars(data=xr.DataArray(), mask=xr.DataArray()),
    GenotypeCoords(variant=xr.DataArray(), sample=xr.DataArray())
)

# Types should then be preserved even if xarray is constantly redefining
# new instances in internal functions:
ds2: Dataset[GenotypeDataVars, GenotypeCoords] = type(ds1)(ds1.data_vars, ds1.coords)  # This is OK
```

Anyways, my takeaways from everything on the thread so far are: …

Reactions: none
---

**612050871** · eric-czech (6130352) · NONE · created 2020-04-10T14:23:20Z · updated 2020-04-10T14:24:35Z
https://github.com/pydata/xarray/issues/3959#issuecomment-612050871
Issue: Extending Xarray for domain-specific toolkits (597475005)

Thanks @keewis, that would work, though I think it leads to an awkward result if I'm understanding correctly. Here's what I'm imagining:

```python
from genetics import api

# These are different types of data structures I originally
# wanted to model as classes:
ds1 = api.create_genotype_call_dataset(...)
ds2 = api.create_genotype_probability_dataset(...)
ds3 = api.create_haplotype_call_dataset(...)
# ds1, ds2, and ds3 are now just xr.Dataset instances

# For each of these different types of datasets I have separate accessors
# that expose dataset-type-specific analytical methods:
@xr.register_dataset_accessor("genotype_calls")
class GenotypeCallAccessor:
    def __init__(self, ds):
        self.ds = ds

@xr.register_dataset_accessor("genotype_probabilities")
class GenotypeProbabilityAccessor:
    ...  # This also has some "analyze" method

@xr.register_dataset_accessor("haplotype_calls")
class HaplotypeCallAccessor:
    ...  # This also has some "analyze" method

# *** Now, how do I prevent this? ***
ds1.haplotype_calls.analyze()
# ds1 is really genotype call data, so it shouldn't be possible
# to do a haplotype analysis on it
```

Is there a way to make accessors available on an xr.Dataset based on some conditions about the dataset itself? That still seems like a bad solution, but I think it would help me here. I was trying to think of some way to use static structural subtyping, but I don't see how that could ever work with accessors, given that 1) they're attached at runtime and 2) all accessors are available on ALL Dataset instances, regardless of whether or not I know only certain things should be possible based on their content.

If accessors are the only way Xarray plans to facilitate extension, has anyone managed to enable static type analysis on their extensions? In my case, I'd be happy to have any kind of safety, whether it's static or monkey-patched in at runtime, but I'm curious whether making static analysis impossible was part of the discussion in deciding on accessors.

Reactions: none
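One runtime-only pattern worth noting here (a hypothetical sketch, not an answer from the thread): an accessor can validate the dataset's contents in its `__init__` and refuse to work on datasets that don't look right. The `haplotype_call` variable name is invented for illustration, and this gives no static guarantees.

```python
import xarray as xr

@xr.register_dataset_accessor("haplotype_calls")
class HaplotypeCallAccessor:
    def __init__(self, ds):
        # Runtime check only; static analysis still can't see this.
        if "haplotype_call" not in ds.data_vars:
            raise ValueError("not a haplotype call dataset")
        self.ds = ds

    def analyze(self):
        ...
```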
---

**611973587** · eric-czech (6130352) · NONE · created 2020-04-10T10:20:37Z · updated 2020-04-10T10:22:16Z
https://github.com/pydata/xarray/issues/3959#issuecomment-611973587
Issue: Extending Xarray for domain-specific toolkits (597475005)

That works for documenting the methods, but I'm more concerned with documenting how to build the Dataset in the first place. Specifically, this would mean describing how to construct several arrays relating to genotype calls, phasing information, variant call quality scores, individual pedigree info, etc., and all these domain-specific things can have some pretty nuanced relationships, so I think describing how to create a sensible Dataset with them will be a big part of the learning curve for users. I want to essentially override the constructor docs for Dataset and make them more specific to our use cases. I can't see a good way to do that with accessors, since the dataset would already need to have been created.

It is, or at least I'd like not to preclude the checks from doing things like checking min/max values and asserting conditions along axes (e.g. sums to 1).

Will do.

Reactions: none
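A hypothetical sketch of the kind of validation described above (the `genotype_probability` variable and `genotype` dimension names are invented for illustration): a range check on values plus an assertion along an axis.

```python
import numpy as np
import xarray as xr

def check_genotype_probabilities(ds: xr.Dataset) -> None:
    """Validate a dataset carrying per-genotype probabilities."""
    gp = ds["genotype_probability"]
    # Min/max checks:
    if float(gp.min()) < 0.0 or float(gp.max()) > 1.0:
        raise ValueError("probabilities must lie in [0, 1]")
    # Condition along an axis: probabilities sum to 1 per site/sample:
    np.testing.assert_allclose(gp.sum(dim="genotype"), 1.0)
```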
---

**611950517** · eric-czech (6130352) · NONE · created 2020-04-10T09:08:06Z · updated 2020-04-10T09:08:06Z
https://github.com/pydata/xarray/issues/3959#issuecomment-611950517
Issue: Extending Xarray for domain-specific toolkits (597475005)

Thanks @TomNicholas, some thoughts on your points:

I'm ok with the subtype being lost after running some methods. I saw that, so I'm assuming all functions that do anything with the data structures take and return Xarray objects alone.

Accessors could work, but the issues I see with them are: …

```python
ds.haplo.do_custom_analysis_1()

# Do something with coords/indexes that causes a new Dataset to be created,
# e.g. ds.reset_index ultimately hits
# https://github.com/pydata/xarray/blob/1eedc5c146d9e6ebd46ab2cc8b271e51b3a25959/xarray/core/dataset.py#L882
# which creates a new Dataset:
ds = ds.reset_index()

# The …
```

Reactions: none
---

**605697466** · eric-czech (6130352) · NONE · created 2020-03-29T20:37:29Z · updated 2020-03-29T20:37:29Z
https://github.com/pydata/xarray/issues/1194#issuecomment-605697466
Issue: Use masked arrays while preserving int (199188476)

I agree; I have this same issue with large genotyping data arrays, which often contain tiny integers and have some degree of missingness in nearly 100% of raw datasets. Are there recommended workarounds now? I am thinking of always using Datasets instead of DataArrays, with mask arrays to accompany every data array, but I'm not sure if that's the best interim solution.

Reactions: none
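A hypothetical sketch of the workaround described above (variable and dimension names are invented): keep the small integer dtype and carry missingness in a parallel boolean mask variable, rather than letting NaN coerce the data to float.

```python
import numpy as np
import xarray as xr

calls = np.array([[0, 1], [2, 0]], dtype="int8")
missing = np.array([[False, True], [False, False]])

ds = xr.Dataset({
    "call": (("variant", "sample"), calls),          # stays int8
    "call_mask": (("variant", "sample"), missing),   # True where missing
})

# Mask-aware reduction; only this derived result upcasts to float:
mean = ds["call"].where(~ds["call_mask"]).mean()
```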
---

**604580582** · eric-czech (6130352) · NONE · created 2020-03-26T17:51:34Z · updated 2020-03-26T17:51:34Z
https://github.com/pydata/xarray/issues/3791#issuecomment-604580582
Issue: Self joins with non-unique indexes (569176457)

That'll work, thanks @keewis! FWIW, the number of use cases I've found concerning my initial question, where there are repeated index values on both sides of the join, is way lower.

Reactions: none
---

**604464873** · eric-czech (6130352) · NONE · created 2020-03-26T14:32:40Z · updated 2020-03-26T14:34:34Z
https://github.com/pydata/xarray/issues/3791#issuecomment-604464873
Issue: Self joins with non-unique indexes (569176457)

Hey @mrocklin (cc @max-sixty), sure thing. My original question was about how to implement a join in the typical relational-algebra sense, where rows with identical values in the join clause are repeated, but I think I have an even simpler problem that is much more common in our workflows (and touches on how duplicated index values are supported). For example, I'd like to do something like this:

```python
import xarray as xr
import numpy as np
import pandas as pd

# Assume we have a dataset of 3 individuals, one of African
# ancestry and two of European ancestry:
a = pd.DataFrame({'pop_name': ['AFR', 'EUR', 'EUR'], 'sample_id': [1, 2, 3]})

# Join on ancestry to get population size:
b = pd.DataFrame({'pop_name': ['AFR', 'EUR'], 'pop_size': [10, 100]})
pd.merge(a, b, on='pop_name')
```

|    | pop_name | sample_id | pop_size |
|----|----------|-----------|----------|
| 0  | AFR      | 1         | 10       |
| 1  | EUR      | 2         | 100      |
| 2  | EUR      | 3         | 100      |

With xarray, the closest equivalent to this I can find is:

```python
a = xr.DataArray(
    data=[1, 2, 3],
    dims='x',
    coords=dict(pop_name=('x', ['AFR', 'EUR', 'EUR'])),
    name='sample_id'
).set_index(dict(x='pop_name'))
# <xarray.DataArray 'sample_id' (x: 3)>
# array([1, 2, 3])
# Coordinates:
#   * x        (x) object 'AFR' 'EUR' 'EUR'

b = xr.DataArray(
    data=[10, 100],
    dims='x',
    coords=dict(pop_name=('x', ['AFR', 'EUR'])),
    name='pop_size'
).set_index(dict(x='pop_name'))
# <xarray.DataArray 'pop_size' (x: 2)>
# array([100, 10])
# Coordinates:
#   * x        (x) object 'EUR' 'AFR'

xr.merge([a, b])
# InvalidIndexError: Reindexing only valid with uniquely valued Index objects
```

The above does exactly what I want as long as the population names being used as the coordinate to merge on are unique, but that obviously doesn't make sense if those names correspond to a bunch of individuals in one of a small number of populations.

The larger context for this is that genetic data itself is typically some 2+ dimensional array, with the first two dimensions corresponding to genomic sites and people. Xarray is perfect for carrying around the extra information relating to those dimensions as coordinates, but being able to attach new coordinate values by joins to external tables is important. Am I missing something obvious in the API that will do this? Or am I likely better off converting DataArrays to DataFrames, doing my operations with some DataFrame API, and then converting back?

Reactions: none
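A minimal sketch of the DataFrame round trip mentioned at the end of that comment, reusing the `a` and `b` frames from the pandas example above (indexing the result by `sample_id` is an arbitrary choice for illustration):

```python
# Join in pandas, then come back to xarray:
df = pd.merge(a, b, on='pop_name')
ds = df.set_index('sample_id').to_xarray()
# `pop_name` and `pop_size` are now data variables aligned
# along a unique `sample_id` dimension
```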
---

```sql
CREATE TABLE [issue_comments] (
    [html_url] TEXT,
    [issue_url] TEXT,
    [id] INTEGER PRIMARY KEY,
    [node_id] TEXT,
    [user] INTEGER REFERENCES [users]([id]),
    [created_at] TEXT,
    [updated_at] TEXT,
    [author_association] TEXT,
    [body] TEXT,
    [reactions] TEXT,
    [performed_via_github_app] TEXT,
    [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```