IBS clustering constraint: maximum cluster size (--mc, --mcc) #8

Open
davidonlaptop opened this issue Apr 21, 2015 · 0 comments

Description

This is similar to plink's --cluster option combined with --mc or --mcc. See the wiki page on the IBS-MDS Process and the diagram for the Genome file.

More information on the --cluster and --genome-full options can be found in the "Pairwise IBD estimation" section of the plink manual.

The input file is the model created in #3.

This feature adds a constraint on the --cluster option described in issue #7.

Analysis

Add a comment to this issue with:

  • the plink version used as reference (1.07 or 1.90 beta 3)
  • the relevant C++ function name(s) and the file name(s) where they appear
  • the structure of the hh file generated by plink (not always present)
  • the algorithm and/or mathematical formula used to compute:
    • the --mc option
    • the --mcc option
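As a starting point for the Analysis comment, here is a minimal Scala sketch of the two constraints expressed as a merge check. It assumes the semantics as we read them in the plink documentation: --mc N caps the number of individuals in a cluster, and --mcc A B caps the number of cases (A) and controls (B) per cluster. Both should be confirmed against the plink source. The name `MergeLimits` and its fields are hypothetical, and treating samples with a missing phenotype as neither case nor control is an assumption.

```scala
// Hypothetical merge check for plink-style cluster constraints.
// Phenotypes use plink's usual coding: 2 = case, 1 = control.
// Any other value (e.g. 0 or -9) is treated as missing and ignored by
// the case/control limit (an assumption to verify against plink).
final case class MergeLimits(
    maxSize: Option[Int] = None,                // analogue of --mc N
    maxCasesControls: Option[(Int, Int)] = None // analogue of --mcc A B
) {
  /** True if merging clusters `a` and `b` (phenotype codes of their members) respects every limit. */
  def canMerge(a: Seq[Int], b: Seq[Int]): Boolean = {
    val merged = a ++ b
    // --mc: total cluster size after the merge must not exceed the cap.
    val sizeOk = maxSize.forall(merged.size <= _)
    // --mcc: case count and control count are capped separately.
    val caseControlOk = maxCasesControls.forall { case (maxCases, maxControls) =>
      merged.count(_ == 2) <= maxCases && merged.count(_ == 1) <= maxControls
    }
    sizeOk && caseControlOk
  }
}
```

With `MergeLimits(maxSize = Some(2))`, a case and a control may merge, but a third individual may not join them. With `MergeLimits(maxCasesControls = Some((1, 1)))`, two cases may never share a cluster.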

Design

Add a comment to this issue describing how this will be implemented in Spark, and how it differs from plink.

Implementation

The implementation should use:

  • Scala
  • Spark RDD
  • Spark MLlib / GraphX (if appropriate)
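The following sketch shows where such a constraint could plug into the clustering loop. It is a plain-Scala, single-machine agglomerative loop that repeatedly merges the closest pair of clusters the constraint allows, and stops when no allowed pair remains. It assumes complete linkage (the distance between two clusters is the largest pairwise distance between their members) over a symmetric pairwise IBS distance matrix. The linkage rule and the stopping criterion are assumptions to check against plink during the Analysis step. `ConstrainedClustering`, `linkage` and `cluster` are hypothetical names, and `canMerge` can be any predicate over two clusters of sample indices.

```scala
// Hypothetical constrained agglomerative clustering (complete linkage assumed).
object ConstrainedClustering {
  /** Complete-linkage distance: the largest pairwise distance between members of `a` and `b`. */
  def linkage(dist: Array[Array[Double]], a: Vector[Int], b: Vector[Int]): Double =
    (for (i <- a; j <- b) yield dist(i)(j)).max

  /**
   * Starts from singleton clusters and repeatedly merges the closest pair
   * that `canMerge` allows; stops when no allowed pair is left.
   */
  def cluster(
      dist: Array[Array[Double]],
      canMerge: (Vector[Int], Vector[Int]) => Boolean
  ): Vector[Vector[Int]] = {
    var clusters: Vector[Vector[Int]] = dist.indices.map(Vector(_)).toVector
    var merging = true
    while (merging) {
      // All admissible pairs with their linkage distance.
      val candidates = for {
        x <- clusters.indices
        y <- (x + 1) until clusters.size
        if canMerge(clusters(x), clusters(y))
      } yield (linkage(dist, clusters(x), clusters(y)), x, y)
      if (candidates.isEmpty) merging = false
      else {
        val (_, x, y) = candidates.minBy(_._1)
        val merged = clusters(x) ++ clusters(y)
        // Drop the two merged clusters and append their union.
        clusters = clusters.zipWithIndex.collect { case (c, i) if i != x && i != y => c } :+ merged
      }
    }
    clusters
  }
}
```

For example, `ConstrainedClustering.cluster(d, (a, b) => a.size + b.size <= 2)` mimics a --mc 2 limit. This naive loop recomputes every linkage on each round, so it is only meant to pin down the semantics. In a Spark version, one option might be to compute the pairwise distances as an RDD and run the inherently sequential merge loop on the driver. That is a possibility for the Design comment, not a settled choice.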
@davidonlaptop changed the title from "IBS clustering constraint: cluster size maximums (--mc, --mcc)" to "IBS clustering constraint: maximum cluster size (--mc, --mcc)" on Apr 21, 2015
@davidonlaptop modified the milestone: 0.3 on Apr 23, 2015