Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create pydantic models for the different fusion callers #227

Open
jarbesfeld opened this issue Jan 16, 2025 · 4 comments
Open

Create pydantic models for the different fusion callers #227

jarbesfeld opened this issue Jan 16, 2025 · 4 comments
Assignees
Labels
enhancement New feature or request priority:medium Medium priority

Comments

@jarbesfeld
Copy link
Contributor

Feature description

The translators currently have a lot of arguments which can make them a bit unwieldy to use. We should create pydantic classes for these methods instead to make supplying data to these methods easier

Use case

Some of the translators have 10+ required arguments, which can make using these functions challenging.

Acceptance Criteria

New pydantic models are created for the different fusion callers, and the new models are imported and used as input in translators.py

Proposed solution

No response

Alternatives considered

No response

Implementation details

No response

Potential Impact

No response

Additional context

No response

Contribution

Yes, I can create a PR for this feature.

@jsstevenson
Copy link
Member

Isn't this just displacing complexity behind another abstraction layer rather than reducing or streamlining it?

@jarbesfeld
Copy link
Contributor Author

@jsstevenson Yes I would agree that the complexity remains the same. I wasn't sure if it was best practice to have the translator functions have so many parameters? For example, when adding support for the fusion metadata, the arriba caller has 19 parameters.

@jsstevenson
Copy link
Member

jsstevenson commented Jan 16, 2025

It's not a bad idea, but I'd want to be thinking about ways in which the individual components can naturally be coupled in a way that reduces repetition or otherwise looks smoother. For example, if a function has args along the lines of (gene_1, gene_2, chrom_1, chrom_2, pos_1, pos_2) then maybe there's an object like a partner that has gene, chrom, and pos properties, so then the arguments are just (partner1, partner2).

Even better if there is some repetition across different callers so you can reuse some of those abstractions.

@jarbesfeld jarbesfeld changed the title Create pydantic models for the different fusion callers Refactor translator to improve readability Jan 16, 2025
@jarbesfeld
Copy link
Contributor Author

jarbesfeld commented Jan 16, 2025

@jsstevenson I think that is a good idea. I can imagine creating one class to hold the genomic breakpoint data + transcript-specific metadata (i.e. Partner) and one class to hold all associated fusion metadata (i.e. FusionMetadata), where every parameter would be optional due to the absence of an intersection between the metadata output from the different callers.

The one issue is that some callers combine the genomic breakpoint data into one column (e.g. STAR-Fusion could output chr1:154170465:-​ while JAFFA would output the data in separate columns). This was the main reason why we do caller-specific processing of the data in each caller before we run the methods in fusor

@jarbesfeld jarbesfeld changed the title Refactor translator to improve readability Create pydantic models for the different fusion callers Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request priority:medium Medium priority
Projects
None yet
Development

No branches or pull requests

2 participants