Susan VanderPlas edited this page Mar 5, 2015 · 21 revisions

Generate on-the-fly visualizations of genealogical data

Summary:

Re-factor the phyViz package into a new package called ggenealogy.

Genealogists study parent-child relationships among groups of organisms. Visual representations of these relationships help scientists understand the historical changes that gave rise to novel and desirable traits in a lineage. In crops, for example, desirable modifications include an increase in protein yield or in disease resistance. Lineages can also be examined to trace detrimental traits, for instance to determine the origin of hazardous traits in rapidly evolving viruses.

Although visual methods exist for genealogical data structures, further tools are needed that are tailored to the particular needs that arise when scientists want to make informed decisions while visualizing their data.

Description:

  • Draw interactive genealogy trees from a database of known lineages.
  • Select the number of generations of ancestors and descendants to show around a given variety.
  • Show the shortest path between two given varieties, superimposed over the full lineage structure.
  • Obtain graph theory measures of the full lineage structure.
  • Produce color matrix plots of variables for a subset of varieties.

The current phyViz package can be installed from GitHub:

> library(devtools)
> install_github("dicook/phyViz")
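
To make the "generations of ancestors and descendants" item in the description more concrete, here is a minimal base R sketch. It does not use the phyViz API. The toy data frame, its column names (`child`, `parent`), and the helper `get_relatives` are all hypothetical and chosen only for illustration.

```r
# Hypothetical toy genealogy: one row per parent-child relationship.
# The column names `child` and `parent` are assumptions, not the package's schema.
geneal <- data.frame(
  child  = c("C", "C", "D", "E", "E"),
  parent = c("A", "B", "C", "C", "F"),
  stringsAsFactors = FALSE
)

# Collect the relatives of `variety` that lie within `gen` generations.
# "ancestors" walks child -> parent; "descendants" walks parent -> child.
get_relatives <- function(geneal, variety, gen,
                          direction = c("ancestors", "descendants")) {
  direction <- match.arg(direction)
  from <- if (direction == "ancestors") "child" else "parent"
  to   <- if (direction == "ancestors") "parent" else "child"

  result <- data.frame(label = character(0), generation = integer(0),
                       stringsAsFactors = FALSE)
  frontier <- variety
  for (g in seq_len(gen)) {
    nxt <- unique(geneal[[to]][geneal[[from]] %in% frontier])
    # Record each relative once, at the closest generation where it appears.
    nxt <- setdiff(nxt, c(variety, result$label))
    if (length(nxt) == 0) break
    result <- rbind(result,
                    data.frame(label = nxt, generation = g,
                               stringsAsFactors = FALSE))
    frontier <- nxt
  }
  result
}

get_relatives(geneal, "D", gen = 2, direction = "ancestors")
get_relatives(geneal, "C", gen = 1, direction = "descendants")
```

The result is a plain data frame of labels and generation numbers, so it could be handed to a plotting layer that places each variety by its distance from the focal variety.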

Related work:

Potential tasks:

  • Incorporate Shiny to allow users to examine genealogy visualization tools in a more interactive way.
  • Organize nodes horizontally by time of emergence, and vertically by a given additional variable.
  • Test the package on multiple toy datasets.
  • Enhance the flexibility of potential data input types.
    • The currently required input format is a data frame of parent-child relationships.
    • Possible additional input types include the Purdy Notation System.
  • Test and adapt the plotting tools so that they can be used for both exploratory data analysis and publication purposes.
  • Integrate ggenealogy functions with the ggplot2 and ggbio packages.

Skills required:

Knowledge of genealogical data structures, igraph, ggplot2, shiny, and devtools.

Test:

Install the phyViz package. Test it out by reporting the number of steps between the soybean lines Clark and Calland.
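
As a rough illustration of what "number of steps" could mean here, the sketch below counts the parent-child edges on the shortest path between two varieties. It assumes that "steps" means edges on the shortest undirected path. It runs on a hypothetical toy data frame, not on the soybean data, and `count_steps` is an illustrative helper, not a phyViz function.

```r
# Hypothetical toy genealogy (not the soybean data); column names are assumptions.
geneal <- data.frame(
  child  = c("C", "C", "D", "E", "E"),
  parent = c("A", "B", "C", "C", "F"),
  stringsAsFactors = FALSE
)

# Breadth-first search over the genealogy, treating each parent-child
# relationship as an undirected edge. Returns the number of edges on the
# shortest path, or NA if the two varieties are not connected.
count_steps <- function(geneal, v1, v2) {
  if (v1 == v2) return(0L)
  visited  <- v1
  frontier <- v1
  steps    <- 0L
  while (length(frontier) > 0) {
    steps <- steps + 1L
    # Neighbours are the parents and the children of the current frontier.
    nbrs <- unique(c(
      geneal$parent[geneal$child %in% frontier],
      geneal$child[geneal$parent %in% frontier]
    ))
    frontier <- setdiff(nbrs, visited)
    if (v2 %in% frontier) return(steps)
    visited <- c(visited, frontier)
  }
  NA_integer_
}

count_steps(geneal, "A", "F")  # path A - C - E - F
```

For the actual test, the same question would be asked of the package's soybean genealogy using the package's own functions.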

Mentor:

Di Cook (visnut {at} gmail {dot} com), Michelle Graham (magraham {at} iastate {dot} edu), and Susan Vanderplas (srvanderplas {at} gmail {dot} com) as backup mentor

Reference:
