cBioPortal is a powerful platform for exploring cancer genomics data, but its rich interface can be challenging for new users to navigate and interpret. An integrated Copilot (AI assistant powered by LLMs) could guide users through key pages of cBioPortal, answering questions and providing contextual help in real-time. This Copilot concept can be applied to multiple parts of the portal, including:
Query Page: Assists users in selecting relevant studies and genomic profiles for their analysis through natural language suggestions or explanations of available options.
Study View Page: Helps users navigate study-level data (genomic alterations, clinical summaries, etc.), highlighting important findings and answering questions about the cohort’s genomic and clinical landscape.
Results Page: Guides users through complex visualizations (OncoPrint, Plots, Mutations, Mutual Exclusivity, etc.), explaining charts and results, and helping interpret findings (e.g. what a mutual exclusivity plot means or how to download data).
Comparison Page: Assists in comparing patient groups (e.g. mutated vs. wild-type cases) by explaining differences in genomic or clinical characteristics and helping users set up or interpret group comparisons.
Patient View Page: Helps users understand a single patient’s data by summarizing the patient’s “journey” (diagnosis, treatments, outcomes) and explaining specific mutations or treatment history in context.
While the Copilot could eventually span all five pages above, the primary goal for GSoC 2025 is to achieve deep integration on one page rather than superficial support across all of them, and we recommend that proposals focus on a single page. By narrowing the scope, we aim to deliver a fully functional and insightful assistant experience that can later be extended to other parts of cBioPortal. The goal is to fully understand the technical feasibility and showcase the Copilot’s capabilities.
Benefits to the Community
Improved User Experience: This project will make cBioPortal more intuitive and user-friendly, especially for novices. An AI Copilot can lower the learning curve by providing on-demand explanations and guidance, allowing researchers and clinicians to focus on insights rather than figuring out the tool.
Innovation in Bioinformatics Tools: Implementing an LLM-driven assistant in cBioPortal will be a novel proof-of-concept for how AI can enhance bioinformatics software. It could inspire similar features in other scientific platforms and spark further development of intelligent user assistance in open-source tools.
Efficiency in Data Analysis: Researchers and clinicians will be able to navigate large genomic datasets more efficiently with the Copilot’s help. For example, instead of manually searching documentation, a user could ask the Copilot questions like “How many patients have this alteration?” or “What does this clinical term mean?” and get quick answers. This accelerates data interpretation and could lead to faster hypothesis generation and insights.
Community and Educational Value: A well-documented Copilot feature can also serve as an educational resource. New contributors can learn from the implementation how to integrate AI into web applications. Users of cBioPortal will indirectly learn more about genomics as the Copilot explains concepts to them. In addition, the feature shows the community that cBioPortal is staying at the forefront by adopting modern AI assistance capabilities.
Overall, the Copilot integration aims to enhance cBioPortal’s functionality and accessibility, ensuring that the research community can leverage the portal’s full potential with greater ease and understanding. The project’s outcome will not only benefit current users but also lay the groundwork for future expansions of AI-driven assistance across the platform.
Approach
We will develop an interactive Copilot panel or chatbot UI within the chosen cBioPortal page. The Copilot will leverage a Large Language Model (LLM) to understand user questions or actions and provide helpful, context-specific responses. Key steps include:
Backend Integration (Java): Extend cBioPortal’s Java backend if necessary to gather the relevant context from the portal (such as current study details, selected patient data, or query parameters) and make it available to the Copilot. This may involve reusing data already fetched by the current page, or creating new API endpoints or services to retrieve metadata (e.g. study descriptions, gene annotations, patient clinical data) on the fly.
API Development: Ensure that all information the Copilot needs (e.g. definitions of genomic terms, interpretation of plots, data summaries) is either already fetched by the page, or can be retrieved through cBioPortal’s APIs. We might enhance existing APIs or add new ones to provide richer context. This will enable the LLM to ground its answers in real cBioPortal data.
LLM Integration: Use an LLM to power the Copilot’s natural language understanding and generation. The Copilot will be prompted with the context from the current page and the user’s query. We will likely use a hosted LLM service or an open-source model, depending on what’s feasible within the project (ensuring compliance with any data privacy or hosting constraints).
Prompt Engineering: Craft effective prompts and conversation flows so that the LLM provides accurate and relevant assistance. This involves instructing the LLM on cBioPortal-specific roles (e.g. “You are an expert assistant for a cancer genomics portal…”) and feeding it context like “the user is looking at OncoPrint for Study X” or “the user selected patient Y with these mutations”. Prompts will be refined iteratively to handle different user questions (from basic “what is shown in this plot?” to complex “how do I find patients with KRAS mutations who were treated with sotorasib?”) and to ensure the responses are correct, concise, and helpful.
UI/UX Integration: Design a user-friendly interface element on the page (such as a chat window or help sidebar) where users can interact with the Copilot. The UI will display the Copilot’s guidance and allow the user to ask follow-up questions. It should feel like a seamless part of cBioPortal, with context-aware behavior (for example, suggesting what to do next or providing explanations without the user always needing to ask).
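As a rough illustration of the LLM Integration and Prompt Engineering steps above, the sketch below assembles a single prompt from a role instruction, the current page's context, and the user's question. It is only a sketch under stated assumptions: the class name `CopilotPromptBuilder`, the prompt layout, and the context keys are hypothetical and do not come from the cBioPortal codebase, and the actual call to an LLM service is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; none of these names exist in cBioPortal.
final class CopilotPromptBuilder {

    // Role instruction modeled on the example wording in the proposal.
    // The second sentence is an assumed guardrail, not a project requirement.
    static final String SYSTEM_ROLE =
        "You are an expert assistant for a cancer genomics portal. "
        + "Answer only from the context below and say so when it is insufficient.";

    // Builds one prompt string: role, page name, context entries, then the question.
    static String build(String pageName, Map<String, String> context, String question) {
        StringBuilder prompt = new StringBuilder();
        prompt.append(SYSTEM_ROLE).append("\n\n");
        prompt.append("Page: ").append(pageName).append('\n');
        prompt.append("Context:\n");
        for (Map.Entry<String, String> entry : context.entrySet()) {
            prompt.append("- ").append(entry.getKey())
                  .append(": ").append(entry.getValue()).append('\n');
        }
        prompt.append("\nUser question: ").append(question.trim()).append('\n');
        return prompt.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the context entries in insertion order.
        Map<String, String> context = new LinkedHashMap<>();
        context.put("study", "Study X");
        context.put("tab", "OncoPrint");
        System.out.println(build("Results Page", context, "What is shown in this plot?"));
    }
}
```

Keeping prompt assembly in one small, pure function makes it straightforward to unit-test and to reuse when the Copilot is extended to other pages, since only the context map changes per page.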
Throughout development, we will test the Copilot’s responses for accuracy and helpfulness. We will also document how the context is gathered and used, to make it easier to expand the Copilot to the other pages in the future. If time permits after achieving a solid integration on the primary page, we can create a minimal prototype on one of the other pages to demonstrate scalability.
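One inexpensive way to support the accuracy testing mentioned above is a grounding check that flags any number in a Copilot answer that does not appear in the context it was given. The sketch below is an assumption about how such a check might look, not part of the project plan; the class `AnswerGroundingCheck` is hypothetical, and the check is intentionally crude (it would also flag legitimately derived values such as percentages, which then need manual review).

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test utility; not part of cBioPortal.
final class AnswerGroundingCheck {

    // Matches integers and simple decimals such as "120" or "0.05".
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:\\.\\d+)?");

    // Collects the distinct numeric tokens in a text, in order of first appearance.
    static Set<String> numbersIn(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher matcher = NUMBER.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    // Returns the numbers that occur in the answer but nowhere in the context.
    static Set<String> ungroundedNumbers(String context, String answer) {
        Set<String> unsupported = numbersIn(answer);
        unsupported.removeAll(numbersIn(context));
        return unsupported;
    }

    public static void main(String[] args) {
        String context = "Study X: 120 patients, 30 with KRAS mutations";
        String answer = "30 of 120 patients (25%) carry a KRAS mutation.";
        // Any value printed here was not present verbatim in the context.
        System.out.println(ungroundedNumbers(context, answer));
    }
}
```

A check like this cannot confirm that an answer is correct, but it can cheaply surface responses that deserve a closer look during prompt iteration.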
Technologies
Java: Used for back-end development in cBioPortal. We will use Java to integrate the Copilot with server-side logic, ensuring the LLM has access to necessary data through cBioPortal’s backend.
REST APIs: Developing and extending APIs will be crucial for retrieving page-specific metadata and context. This project will likely involve creating new API calls or augmenting existing ones.
LLM (Large Language Model): The core of the Copilot’s intelligence. We will utilize an LLM (such as GPT-based models or similar) to interpret user queries and generate helpful responses. The model could be accessed via an API (e.g. OpenAI, Anthropic, or a local model) depending on feasibility and permissions.
Prompt Engineering: Techniques for constructing effective prompts and handling the conversation with the LLM. This includes providing context, controlling the tone and detail of responses, and guiding the model to stay on topic (for instance, focusing on genomic data interpretation rather than general knowledge).
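To make the REST API item above more concrete, the following sketch shows how a Copilot backend component might request study metadata to use as context. It assumes the portal's public REST API exposes a `/api/studies/{studyId}` route and uses `brca_tcga` purely as an example identifier; both should be checked against the portal's API documentation. The class `StudyContextClient` and its methods are hypothetical, and the JSON response is returned as a raw string rather than parsed.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical client; not part of the cBioPortal codebase.
final class StudyContextClient {

    // Assumed base URL of the public portal's REST API.
    static final String BASE_URL = "https://www.cbioportal.org/api";

    // Builds a GET request for one study's metadata.
    static HttpRequest studyRequest(String baseUrl, String studyId) {
        // URLEncoder targets form encoding, so turn '+' back into '%20' for a path segment.
        String encoded = URLEncoder.encode(studyId, StandardCharsets.UTF_8).replace("+", "%20");
        return HttpRequest.newBuilder(URI.create(baseUrl + "/studies/" + encoded))
            .header("Accept", "application/json")
            .timeout(Duration.ofSeconds(10))
            .GET()
            .build();
    }

    // Sends the request and returns the raw JSON body, failing on any non-200 status.
    static String fetchStudyJson(HttpClient client, String baseUrl, String studyId)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
            client.send(studyRequest(baseUrl, studyId), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Performs a real network call; "brca_tcga" is only an example study ID.
        System.out.println(fetchStudyJson(HttpClient.newHttpClient(), BASE_URL, "brca_tcga"));
    }
}
```

Separating request construction from sending keeps the URL logic testable without network access, and the base URL parameter allows the same code to point at a local or private portal instance.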
Expected Outcome
By the end of the project we expect to deliver:
Copilot on One Key Page: A fully functional, context-aware Copilot integrated into one of the major cBioPortal pages (as identified in the scope). Users should be able to interact with it to get assistance specific to that page (e.g. asking “What does this chart mean?” on the Results page and receiving a useful explanation).
Enhanced Metadata Access: Any necessary enhancements to cBioPortal’s data retrieval (such as new API endpoints or backend methods) to support the Copilot’s functionality will be implemented. This ensures the Copilot has the data it needs.
Documentation and Guides: Clear documentation of the Copilot’s implementation and usage. This will include how to configure or extend it to other pages, instructions for developers to maintain or improve it, and possibly a short user guide section on how to use the Copilot feature.
Proof-of-Concept for Expansion: Although we focus on one page, the project will serve as a template for future Copilot integrations in cBioPortal. The outcome will include insights or even a demo on how the Copilot could be rolled out to the other pages (for instance, noting what additional data would be needed for those pages), providing a foundation for continued development after GSoC.
Difficulty
Difficulty Level: Advanced – This project is challenging because it involves cutting-edge integration of AI (LLMs) with a complex bioinformatics platform. The student will need to be comfortable working across the full stack (front-end UI, back-end Java, and external AI services) and handling the uncertainties of LLM interactions. Software engineering experience and some understanding of the biological context will help in navigating this project’s complexity.
Skills Required
LLM Integration: Experience or willingness to learn how to integrate large language model APIs or libraries. This includes handling API calls, parsing model responses, and possibly fine-tuning prompts or using libraries for better context management.
Prompt Engineering: Skill in crafting and refining prompts to get useful outputs from the LLM. The student should be prepared to experiment with prompt phrasing and conversation structure to improve the Copilot’s accuracy and usefulness.
Java Development: Strong skills in Java for modifying cBioPortal’s backend, adding new endpoints, and ensuring the application remains robust and efficient with the new features.
API Design & Development: Ability to design clear and efficient APIs. The student should be able to extend cBioPortal’s REST API or backend services to provide the data the Copilot needs.
Web Integration (Frontend): Familiarity with web development (JavaScript/React) to embed the Copilot interface seamlessly into the cBioPortal UI. While the project description emphasizes backend and LLM work, some frontend work will be needed for the user interface.
Bioinformatics Domain Knowledge (Preferred): While not strictly required, familiarity with cancer genomics concepts and portals like cBioPortal will be very beneficial. Understanding terms like “oncoprint”, “mutation frequency”, or clinical data will help in guiding the LLM and interpreting what users might ask.
Potential Mentors