issues: 207021356

id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
207021356	MDU6SXNzdWUyMDcwMjEzNTY=	1262	Logical DTypes	306380	open	0			11	2017-02-12T01:26:23Z	2020-12-26T14:26:00Z		MEMBER				tl;dr: Can XArray enable user-defined logical dtypes on top of physical NumPy arrays?
The Need for New Datatypes
NumPy's dtypes (int, float, etc.) are appropriate for many, but not all, cases. There are a variety of situations where we want numpy-like array semantics (broadcasting, memory layout) but with different element properties. Use cases include the following:
Datetimes with timezones
Categorical values (such as for land-use in climate data)
IPv4 or IPv6 addresses
...
Currently dtypes need to be added directly to the NumPy source code. This is a high barrier for many community members, requires general approval (there can be only one datetime implementation) (this is good and bad), and limits experimentation. There is value to supporting user-definable datatypes.
This is hard to do in NumPy
Ideally we would implement extensible user-defined dtypes within NumPy (and there may be long-standing plans to do just this). However, changing NumPy today is hard, both because it's hard to find developers who are comfortable operating at that level and because the backwards compatibility pressure on NumPy is large.
So as an alternative, we might consider lightly wrapping NumPy arrays in a new object that also includes extra dtype information. For example, we might wrap an int64 numpy array with some datetime/timezone metadata to achieve a logical datetime array using a physical int64 array. We continue using NumPy as is but use this higher layer when necessary for more complex dtypes.
However, "lightly wrapping" NumPy arrays is hard to do while still maintaining a closed system where all operations remain consistent (raw NumPy arrays inevitably leak through). Additionally, asking communities to switch to new libraries is socially quite challenging.
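To make the "logical dtype on a physical array" idea concrete, here is a minimal sketch of the categorical use case. It is not taken from the issue and is not how XArray or NumPy implement anything: the class `CategoricalArray` and its methods are hypothetical names, and the only library assumed is NumPy. Integer codes are the physical int64 storage, and a tuple of labels is the logical dtype metadata.

```python
import numpy as np


class CategoricalArray:
    """Hypothetical wrapper: physical int64 codes plus logical category labels."""

    def __init__(self, codes, categories):
        self.codes = np.asarray(codes, dtype=np.int64)  # physical storage
        self.categories = tuple(categories)             # logical dtype metadata

    @classmethod
    def from_labels(cls, labels, categories):
        # Map each label to its integer code, keeping the input's shape.
        categories = tuple(categories)
        lookup = {label: i for i, label in enumerate(categories)}
        flat = [lookup[label] for label in np.asarray(labels, dtype=object).ravel()]
        codes = np.array(flat, dtype=np.int64).reshape(np.shape(labels))
        return cls(codes, categories)

    def __getitem__(self, key):
        # Indexing stays inside the wrapper, so the metadata is preserved.
        result = self.codes[key]
        if np.ndim(result) == 0:
            return self.categories[int(result)]
        return CategoricalArray(result, self.categories)

    def __eq__(self, other):
        # Comparing against a label yields a plain NumPy boolean array.
        return self.codes == self.categories.index(other)

    def to_labels(self):
        # Decode the integer codes back into an object array of labels.
        return np.array(self.categories, dtype=object)[self.codes]


land_use = CategoricalArray.from_labels(
    [["forest", "urban"], ["water", "forest"]],
    categories=["forest", "urban", "water"],
)
forest_mask = land_use == "forest"  # boolean array with the same shape
first_row = land_use[0]             # still a CategoricalArray
```

The sketch also shows the leak described above: any operation the wrapper does not intercept, such as `land_use.codes + 1`, returns a raw NumPy array with no category metadata attached.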
XArray is well placed
Fortunately XArray appears to have already solved some of these technical and social challenges. XArray lightly wraps NumPy arrays in a consistent manner. NumPy-like operations on XArrays remain XArrays. Interactions with other NumPy arrays are well defined. XArray has also attracted an active user/developer community and has attained general respect from the broader ecosystem. XArray seems to be hackable, benefits from a decently active community, and is not yet under as much backwards compatibility pressure.
So question: Is it sensible to add logical dtype information to XArray? Can this be done with only moderate effort and maintenance costs to the XArray project? If the answer is "yes, probably", then what is the right way to go about this?	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1262/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }			13221727	issue

Links from other tables

1 row from issues_id in issues_labels
11 rows from issue in issue_comments