Software

This page contains links to software packages that were developed by our group.

If you have questions, please post them to the [OXSTATGEN MAIL LIST].

Some of these programs are licensed for academic use only. Please read the [LICENCE].

Tensor and Matrix Decomposition

SDA4D

A program for sparse Bayesian 4D tensor decomposition

Christopher Gill and Jonathan Marchini (2020) Four-Dimensional Sparse Bayesian Tensor Decomposition for Gene Expression Data [bioRxiv]

[Source code and documentation]

SDA

A program for sparse Bayesian matrix and tensor decomposition.

Victoria Hore, Ana Viñuela, Alfonso Buil, Julian Knight, Mark I McCarthy, Kerrin Small, Jonathan Marchini. Tensor decomposition for multi-tissue gene expression experiments. Nature Genetics 10.1038/ng.3624 [Journal]

[Executables and documentation] [Licence]

FastICA

R package that implements the FastICA algorithm. This is now hosted on CRAN and maintained by others.

[R package]

GxE Interactions

LEMMA

LEMMA (Linear Environment Mixed Model Analysis) is a whole-genome regression method for flexible modeling of gene-environment interactions in large datasets such as the UK Biobank.

The method estimates a linear combination of environmental variables, called an environmental score (ES), that interacts with genetic markers throughout the genome, and provides a readily interpretable way to examine the combined effect of many environmental variables. The ES can be used both to estimate the proportion of phenotypic variance attributable to GxE effects, and also to test for GxE effects at genetic variants across the genome.
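As a rough illustration of the environmental score idea, the sketch below forms an ES as a weighted sum of environmental variables and multiplies it by a genotype dosage to give a GxE covariate at one variant. It is not the LEMMA implementation: the function names, the toy data, and the weights are all hypothetical, and LEMMA estimates the weights from the data rather than taking them as given.

```python
def environmental_score(env_rows, weights):
    """One score per individual: weighted sum of that individual's environmental variables."""
    return [sum(e * w for e, w in zip(row, weights)) for row in env_rows]


def interaction_covariate(scores, dosages):
    """GxE covariate at a single variant: ES times genotype dosage, per individual."""
    return [s * g for s, g in zip(scores, dosages)]


# Toy data (hypothetical): 3 individuals, 2 environmental variables.
env = [[1.0, 0.0],
       [0.5, 2.0],
       [-1.0, 1.0]]
weights = [0.5, 0.25]   # assumed known here; in practice these are estimated
dosages = [0, 1, 2]     # genotype dosages at one variant

scores = environmental_score(env, weights)
gxe = interaction_covariate(scores, dosages)
print(scores)
print(gxe)
```

Because the ES is a single number per individual, it can be inspected directly to see which environmental variables carry the largest weights.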

Matthew Kerin and Jonathan Marchini (2020) Inferring Gene-by-Environment Interactions with a Bayesian Whole-Genome Regression Model. American Journal of Human Genetics [Journal]

[Documentation and Source Code] (freely available under an MIT licence)

GPLEMMA

GPLEMMA (Gaussian Prior Linear Environment Mixed Model Analysis) is a non-linear randomized Haseman-Elston regression method for flexible modeling of gene-environment interactions in large datasets such as the UK Biobank.

The method simultaneously estimates a linear combination of environmental variables, called an environmental score (ES), that interacts with genetic markers throughout the genome, and its associated heritability. Estimation of the ES provides a readily interpretable way to examine the combined effect of many environmental variables.

Matthew Kerin and Jonathan Marchini (2020) Non-linear randomized Haseman-Elston regression for estimation of gene-environment heritability. Bioinformatics [Journal]

The GPLEMMA method is implemented as an option in the LEMMA code.

[Documentation and Source Code] (freely available under an MIT licence)

Imputation

IMPUTE 5

IMPUTE 5 is a genotype imputation method that can scale to reference panels with millions of samples. It builds on the observation made in IMPUTE 2 that accuracy is optimized by using a custom subset of haplotypes when imputing each individual. It achieves fast, accurate, and memory-efficient imputation by selecting haplotypes using the Positional Burrows-Wheeler Transform (PBWT). By using the PBWT data structure at genotyped markers, IMPUTE 5 identifies locally best-matching haplotypes and long identical-by-state segments. The method then uses the selected haplotypes as conditioning states within the IMPUTE model.

IMPUTE 5 is up to 30x faster than MINIMAC4 and up to 3x faster than BEAGLE5.1.
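To give a feel for why the PBWT helps with haplotype selection, the sketch below builds the positional prefix ordering site by site and then reads off the haplotypes adjacent to a target in that ordering. Adjacent haplotypes share the longest matches ending at the last processed site. This is only a toy illustration under simplifying assumptions: the function names and data are hypothetical, the target sits in the same panel as the other haplotypes, and the selection procedure in IMPUTE 5 itself is more elaborate.

```python
def pbwt_order(haplotypes):
    """Order haplotype indices by reversed prefix after the last site.

    At each site the current order is stably partitioned into carriers of
    allele 0 followed by carriers of allele 1 (the PBWT prefix-array update).
    """
    order = list(range(len(haplotypes)))
    for k in range(len(haplotypes[0])):
        zeros = [i for i in order if haplotypes[i][k] == 0]
        ones = [i for i in order if haplotypes[i][k] == 1]
        order = zeros + ones
    return order


def neighbours(order, target, n_each_side=1):
    """Haplotypes adjacent to `target` in the ordering: candidate best matches."""
    pos = order.index(target)
    lo = max(0, pos - n_each_side)
    hi = min(len(order), pos + n_each_side + 1)
    return [h for h in order[lo:hi] if h != target]


# Toy panel (hypothetical): 5 haplotypes at 4 biallelic sites.
haps = [
    [0, 1, 0, 1],  # haplotype 0, used as the target below
    [1, 1, 0, 1],  # haplotype 1: matches the target over the last three sites
    [0, 0, 1, 0],  # haplotype 2
    [0, 1, 1, 0],  # haplotype 3
    [1, 0, 0, 1],  # haplotype 4: matches the target over the last two sites
]

order = pbwt_order(haps)
selected = neighbours(order, target=0)
print(order)
print(selected)
```

Each site is handled in a single pass over the haplotypes, so the ordering is built in time proportional to the number of haplotypes times the number of sites, which is what makes this kind of search practical for very large panels.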

S. Rubinacci, O. Delaneau, J. Marchini (2019) Genotype imputation using the Positional Burrows Wheeler Transform PLoS Genetics [Journal]

IMPUTE 5 is freely available for academic use only. To see rules for non-academic use see the [Licence] (also included with each software download).

[Binary executables and documentation]

IMPUTE 4

A program for efficient genotype imputation. IMPUTE 4 implements the haploid imputation options included in IMPUTE 2, but is much faster and more memory efficient. It was written to impute genotypes for the UK Biobank dataset, which consists of genetic data on ~500,000 individuals.

IMPUTE 4 is freely available for academic use only. To see rules for non-academic use see the [Licence] (also included with each software download).

[Binary executables and documentation]

IMPUTE 2

A program for genotype imputation and phasing in genome-wide association studies and fine-mapping studies based on a dense set of marker data (such as 1000 Genomes Project haplotypes).

IMPUTE 2 is freely available for academic use only. To see rules for non-academic use see the [Licence] (also included with each software download).

[Software and Documentation]

MVNCALL

A program for genotype calling and phasing using next-generation sequencing reads and a haplotype scaffold.

[Software and Documentation]

MVNCALL is freely available for academic use only. To see rules for non-academic use see the [Licence] (also included with each software download).

Haplotype estimation

SHAPEIT 4

A program for accurate and efficient phasing of very large genetic datasets.

[Software and Documentation] [Paper]

SHAPEIT 3

A program for accurate and efficient phasing of very large genetic datasets. It was used to phase the ~500,000 individuals in the UK Biobank dataset.

SHAPEIT 3 introduces several important extensions to the vanilla SHAPEIT algorithm to enable this scalability.

[Software and Documentation] [Paper]

SHAPEIT 2

A program for accurate and efficient phasing of genetic datasets.

[Software and Documentation] [Paper]

SHAPEIT2 is freely available for academic use only. See the [SHAPEIT2 website] for more details.

PHASING SERVER

A service for accurately phasing sequenced samples using large haplotype reference panels, such as the Haplotype Reference Consortium dataset.

[Website] [Paper]

DUOHMM

A program that works together with SHAPEIT to infer haplotypes accurately in pedigrees, estimate recombination events, and detect genotyping errors.

[Source code] [Documentation]

Association testing

BGENIE

A program for multi-trait GWAS, designed for the BGEN format files used to store the UK Biobank genetic data on ~500,000 individuals. [Software] [Documentation] [Paper]

BGENIE is freely available for academic use only.

SNPTEST

A program for frequentist and Bayesian tests of SNP association with binary (case-control) and quantitative phenotypes that takes genotype uncertainty into account.

[Software and Documentation]

SNPTEST is freely available for academic use only. To see rules for non-academic use see the [Licence] (also included with each software download).

SBAT

A program for Sparse Bayesian Association Testing that can fit multi-trait linear mixed models.

[Software]

SBAT is freely available for academic use only.

META

A program to carry out meta-analysis of genetic studies.

[Software and Documentation]

GENECLUSTER

A program for location and detection of unobserved causal loci in fine-mapping experiments and genome-wide association studies.

[Software and Documentation]

PHENIX

PHENIX (PHENotype Imputation eXpediated) is a method for imputing missing phenotypes where samples have an arbitrary level of relatedness. The method can also be used for dimensionality reduction of multiple traits in related samples. The resulting latent traits can be tested as phenotypes, and can result in an increase in power.

[Software and Documentation]

HAPGEN

A program to simulate case control datasets at linked SNP markers conditional upon a set of known haplotypes.

[Software and Documentation]

GWAPOWER

An R package for assessing the power of genome-wide association studies using commercially available genotyping chips. The package encapsulates extensive simulation results generated by our program HAPGEN and described fully in the paper

[Software and Documentation]

Oxford Brain Imaging Genetics (BIG) server

A PheWeb browser for 3,144 GWAS of brain imaging-derived phenotypes in the UK Biobank, based on Elliott, L. et al. (2018).

[Browser]

Population structure

MULTIMIX

A program for admixture deconvolution using multiple (i.e. > 2) phased or unphased ancestral panels.

[Software and Documentation]

MULTIMIX is freely available for academic use only. To see rules for non-academic use see the [Licence] (also included with each software download).

GWAS Quality Control

QCTOOL

A program for carrying out SNP and sample quality control (QC) for genome-wide association studies.

[Software and Documentation]

GTOOL

A program for (a) generating subsets of genotype data, and (b) converting genotype data between the PED file format and the FILE FORMAT used by SNPTEST and IMPUTE.

[Software and Documentation]

Genotype calling

CHIAMANTE

A joint genotype calling algorithm for array and sequence data. This method can also be used to call genotypes from array data alone.

[Software and Documentation]

CHIAMO

A genotype calling algorithm for multi-cohort studies.

[Software and Documentation]

Brain Imaging

AnalyzeFMRI

An R package for visualisation and analysis of fMRI data.

[Software and Documentation]