Cython/Python implementation of the Halko algorithm

This is a fast Python/Cython implementation of the Halko algorithm (randomized SVD) for genotype data. It takes PLINK binary files (*.bed, *.bim, *.fam) as input. For simplicity, missing genotypes are filled in by mean imputation.
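As a rough illustration of the mean imputation step, the sketch below replaces each missing genotype with the mean allele count of its SNP. It assumes a NumPy array of shape (M SNPs, N samples) holding allele counts 0, 1, 2 with np.nan marking missing calls. The function name mean_impute is hypothetical, and this is not halkoSVD's actual Cython code.

```python
import numpy as np


def mean_impute(G):
    """Return a copy of G with each NaN replaced by its SNP (row) mean.

    G: (M SNPs x N samples) allele counts 0/1/2, np.nan = missing call.
    """
    G = np.asarray(G, dtype=np.float64)
    observed = ~np.isnan(G)
    counts = observed.sum(axis=1)
    sums = np.where(observed, G, 0.0).sum(axis=1)
    # Mean over observed calls only; SNPs with no observed calls default to 0.0
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    # np.where builds a new array, so the caller's G is left untouched
    return np.where(observed, G, means[:, None])
```

Filling with the per-SNP mean means an imputed entry becomes exactly zero once the SNP is centered, so missing calls add nothing to the decomposition.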

It is inspired by the lovely PCAone software, which is well worth a look!

Installation

# Option 1: Build and install via PyPI
pip install halkoSVD

# Option 2: Download source and install via pip
git clone https://github.com/Rosemeis/halkoSVD.git
cd halkoSVD
pip install .

# Option 3: Download source and install in a new Conda environment
git clone https://github.com/Rosemeis/halkoSVD.git
conda env create -f halkoSVD/environment.yml
conda activate halkoSVD

You can now run the program with the halkoSVD command.

Quick usage

Provide halkoSVD with the shared file prefix of the PLINK files via --bfile (e.g. input for input.bed, input.bim and input.fam).

# Check help message of the program
halkoSVD -h

# Extract the top 10 PCs
halkoSVD --bfile input --threads 32 --pca 10 --out halko
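The --bfile prefix expands to three files. The sketch below shows one way to decode them with NumPy, following the PLINK 1 SNP-major .bed layout: three magic bytes, then 2 bits per genotype, four samples per byte (lowest bits first), with each SNP padded to a whole byte. The helper names decode_bed and read_plink_bed are hypothetical, the sketch counts copies of the first .bim allele, and it loads the whole file into memory, which a tool processing SNPs in batches would avoid.

```python
import numpy as np

# 2-bit code -> count of the first (.bim A1) allele:
# 00 hom A1 -> 2, 01 missing -> nan, 10 het -> 1, 11 hom A2 -> 0
_LOOKUP = np.array([2.0, np.nan, 1.0, 0.0])
_MAGIC = [0x6C, 0x1B, 0x01]


def decode_bed(raw, n_samples, n_snps):
    """Decode raw .bed bytes into an (n_snps x n_samples) float array."""
    data = np.frombuffer(raw, dtype=np.uint8)
    if data[:3].tolist() != _MAGIC:
        raise ValueError("not a SNP-major PLINK .bed file")
    bytes_per_snp = (n_samples + 3) // 4
    packed = data[3:].reshape(n_snps, bytes_per_snp)
    # Split every byte into four 2-bit codes, lowest bits = first sample
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, :, None] >> shifts) & 0b11
    # Flatten per SNP and drop the padding codes in the last byte
    codes = codes.reshape(n_snps, -1)[:, :n_samples]
    return _LOOKUP[codes]


def read_plink_bed(prefix):
    """Read <prefix>.bed, sizing it from the line counts of .fam and .bim."""
    with open(f"{prefix}.fam") as f:
        n_samples = sum(1 for _ in f)
    with open(f"{prefix}.bim") as f:
        n_snps = sum(1 for _ in f)
    with open(f"{prefix}.bed", "rb") as f:
        raw = f.read()
    return decode_bed(raw, n_samples, n_snps)
```

For example, read_plink_bed("input") would decode the input.bed/.bim/.fam trio used in the command above, with np.nan marking missing calls.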

Options

--seed, random seed for reproducibility (default: 42)
--power, number of power iterations (default: 10)
--batch, number of SNPs to process per batch (default: 8192)
--loadings, also save the SNP loadings
--raw, output only the eigenvectors, without the FID/IID columns
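To see what --power, --seed and --batch control, here is a minimal NumPy sketch of randomized truncated SVD in the style of Halko, Martinsson and Tropp. It assumes A is an already imputed and normalized (M SNPs x N samples) array held in memory. The function names and the oversample parameter are hypothetical; only the defaults for power, batch and seed mirror the options above. Batching here merely splits the matrix products over SNP rows, whereas a real implementation would decode and normalize each batch on the fly.

```python
import numpy as np


def _matmul_batched(A, X, batch):
    """Compute A @ X one block of SNP rows at a time."""
    out = np.empty((A.shape[0], X.shape[1]))
    for start in range(0, A.shape[0], batch):
        out[start:start + batch] = A[start:start + batch] @ X
    return out


def _rmatmul_batched(A, Y, batch):
    """Compute A.T @ Y by accumulating over blocks of SNP rows."""
    out = np.zeros((A.shape[1], Y.shape[1]))
    for start in range(0, A.shape[0], batch):
        out += A[start:start + batch].T @ Y[start:start + batch]
    return out


def halko_svd(A, k, power=10, batch=8192, seed=42, oversample=10):
    """Randomized truncated SVD of an M x N matrix A.

    Returns U (M x k, per-SNP), S (k,), V (N x k, per-sample).
    """
    rng = np.random.default_rng(seed)
    n_cols = k + oversample
    # Random test matrix; its image A @ omega samples the range of A
    omega = rng.standard_normal((A.shape[1], n_cols))
    Q, _ = np.linalg.qr(_matmul_batched(A, omega, batch))
    # Power iterations sharpen the leading subspace; QR after every
    # product keeps the basis orthonormal and numerically stable
    for _ in range(power):
        Z, _ = np.linalg.qr(_rmatmul_batched(A, Q, batch))
        Q, _ = np.linalg.qr(_matmul_batched(A, Z, batch))
    # Small projected matrix B = Q.T @ A, computed as (A.T @ Q).T
    B = _rmatmul_batched(A, Q, batch).T
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], S[:k], Vt[:k].T
```

In this sketch, V holds one row per sample (the eigenvectors) and U one row per SNP, comparable to the SNP loadings that --loadings saves. Only the thin products with A touch the full matrix, which is why processing it in SNP batches keeps memory use bounded.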
