Design: Should type aliases be used for Ibis types? #16
Comments
Good thoughts here, appreciate them a lot. I thought about this similarly, and I think in fact if you look through the git history I even used to have something similar. I think the [...] For the [...]
Thoughts?
For [...] Unfortunately, I don't know if it's possible to define this strictly as a type (or maybe there's a Python trick I don't know?). The way that I define LabelingTable as a subclass of Table for the purposes of duck typing is a bit... not great?
Regarding modules, I would put shared types at the top level, but keep submodule-specific types within the relevant submodules. Each (sub)module could have a [...]
Note that I haven't yet worked on a project that made widespread use of types and duck typing (rather than abstract classes), so my idea is kind of experimental. It could be tried out in a single submodule before deciding if it's a good idea.
For the other questions:
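On the question of defining LabelingTable strictly as a type: one possible "Python trick" is a structural type via `typing.Protocol`. The following is a minimal sketch, not something Mismo does; `FakeTable` is a hypothetical stand-in used only so the example runs without Ibis, and the `Any` annotations are placeholders for whatever column type would actually be used.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class LabelingTable(Protocol):
    """Structural type: anything exposing `record_id` and `label` qualifies."""

    record_id: Any  # placeholder for a column type
    label: Any  # placeholder for a column type


class FakeTable:
    """Hypothetical stand-in for a table; it does NOT inherit from LabelingTable."""

    def __init__(self, record_id, label):
        self.record_id = record_id
        self.label = label


table = FakeTable(record_id=[1, 2], label=["a", "b"])

# The runtime check only looks for the presence of the two attributes.
print(isinstance(table, LabelingTable))
```

One caveat with this approach: attributes that are resolved dynamically (for example through `__getattr__`) may not be visible to static type checkers or to the runtime check, so it would need to be tried against real Ibis tables before relying on it.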
Note: this issue is not super important, but I think it's something to think about to help clarify some aspects of the code.
Most Mismo modules rely on Ibis types from ibis.expr.types. This introduces a direct dependency on that module throughout the code, and requires users to refer to this external dependency to understand the code.
Would it be preferable for Mismo to have its own types module that only imports what it needs? This would provide a few advantages:
For example, Mismo cluster metrics currently assume that Table objects representing membership vectors have the two columns "record_id" and "label". This could be formalized by defining a Table subtype that can be documented in a single place, as follows:
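A minimal sketch of such a subtype, under stated assumptions: `Table` and `Column` below are stand-ins written for this example (not the real ibis.expr.types classes) so that it runs without Ibis installed, and the helper `n_clusters` is hypothetical.

```python
Column = list  # stand-in for an Ibis column type (assumption)


class Table:
    """Stand-in for ibis.expr.types.Table that exposes columns as attributes."""

    def __init__(self, **columns):
        self._columns = columns

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails; look the name up
        # among the columns, mimicking attribute-style column access.
        try:
            return self.__dict__["_columns"][name]
        except KeyError:
            raise AttributeError(name) from None


class LabelingTable(Table):
    """A table representing a membership vector.

    record_id: identifier of a record.
    label: identifier of the cluster the record belongs to.
    """

    # Annotations only: they document the expected columns and create no
    # actual class attributes.
    record_id: Column
    label: Column


def n_clusters(labeling: LabelingTable) -> int:
    """Hypothetical helper that relies on the documented `label` column."""
    return len(set(labeling.label))


membership = LabelingTable(record_id=[1, 2, 3], label=["a", "a", "b"])
print(n_clusters(membership))
```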
This isn't technically duck typing (but does it matter, given that runtime Python doesn't generally care about nominal types?), and we're a bit liberal with our use of class attributes to describe what we want to be class attributes. But we can document the meaning and then use table.record_id and table.label in functions that expect a membership vector as input.
Additionally, we could have Mismo submodules each have a types submodule, where important interface definitions are placed and documented. For instance, all metrics functions have a shared interface, expecting two LabelingTable objects as input. This could be defined in metrics.types to make it a stable interface definition that only has to be documented in one place (instead of having to re-explain it for each metrics function individually). This could be done as follows:
In the metrics module, things can be kept as they are or, alternatively, we could define things as follows:
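A sketch of how the shared interface and a metric conforming to it could look. Everything here is illustrative: `ClusterMetric` is a hypothetical alias name, `LabelingTable` is a plain dataclass stand-in rather than an Ibis-backed table, and `rand_index` is a simplified pure-Python metric written for this example, not Mismo's implementation.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable


@dataclass
class LabelingTable:
    """Stand-in for the membership-vector table type (assumption)."""

    record_id: list
    label: list


# Hypothetical content of metrics/types.py: the interface, documented once.
# A metric takes two labelings of the same records and returns a score.
ClusterMetric = Callable[[LabelingTable, LabelingTable], float]


def rand_index(prediction: LabelingTable, reference: LabelingTable) -> float:
    """Fraction of record pairs on which the two labelings agree.

    Assumes both tables cover the same record ids and contain at least
    two records.
    """
    pred = dict(zip(prediction.record_id, prediction.label))
    ref = dict(zip(reference.record_id, reference.label))
    pairs = list(combinations(sorted(pred), 2))
    # A pair "agrees" when both labelings put it together, or both split it.
    agree = sum((pred[a] == pred[b]) == (ref[a] == ref[b]) for a, b in pairs)
    return agree / len(pairs)


# The annotation lets a type checker verify that the function fits the interface.
metric: ClusterMetric = rand_index

prediction = LabelingTable(record_id=[1, 2, 3], label=["a", "a", "b"])
reference = LabelingTable(record_id=[1, 2, 3], label=["x", "y", "y"])
print(metric(prediction, reference))
```

With an alias like this, each metrics function would only need to document what it computes, while the input contract lives in one place.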
These changes shouldn't affect users, as they can continue to use duck typing to extend things as they want. They don't need to know about the types objects and can use the metrics functions directly.