BGENIE is a program for efficient GWAS of multiple continuous traits and for PHEWAS, with many features designed and optimized for large-scale analysis:
- BGENIE is built upon the BGEN library. It takes BGEN files as input and avoids repeated decompression and conversion of these files when analyzing multiple continuous phenotypes.
- It was written for the analysis of the UK Biobank dataset, which is stored in the BGEN v1.2 file format. This dataset consists of genetic data on ~500,000 individuals, ~93 million autosomal variants, and thousands of phenotypes.
- It works with indexed BGEN files, giving fast access to any SNP or group of SNPs. This feature facilitates very fast PHEWAS.
- BGENIE uses the Eigen matrix library and OpenMP to carry out as many of the linear algebra operations in parallel as possible. For example, estimation of effect sizes of large numbers of SNPs can be carried out in parallel using matrix operations, and indexing of missing data values is used to allow for fast estimation of standard errors.
- It has built-in functionality to apply PCA or ICA (using the fastICA algorithm) to multiple phenotypes and to use the resulting transformed phenotypes for testing via GWAS.
If you use BGENIE in your research, please cite the following publication:
Studies using BGENIE:
Several studies have used BGENIE to carry out genome-wide association studies.
Elliott et al. (2017) The genetic basis of human brain structure and function: 1,262 genome-wide associations found from 3,144 GWAS of multimodal brain imaging phenotypes from 9,707 UK Biobank participants.
28 July (v1.2) : added features --include_rsids, --scale_phenotypes, --scale_genotypes, --dosage flag, --dump_phenotypes
10 July (v1.1) : Improvements to performance when using threading
14 Jun (v1.0) : First release
BGENIE performs a linear association test between SNP/phenotype pairs in the provided data. A basic command to run GWAS on all the phenotypes is:
bgenie --bgen example.bgen --pheno example.pheno --out example.out
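As a rough illustration of what a linear association test between one SNP and one phenotype involves, the sketch below fits a simple linear regression of phenotype on genotype dosage and returns the effect size, its standard error, and the t-statistic. It is not BGENIE's implementation, which uses Eigen and OpenMP. The function name `linear_association`, the use of `None` or NaN to mark missing phenotypes, and the absence of covariates are all assumptions made for this example.

```python
import math


def linear_association(dosages, phenotype):
    """Simple linear regression of one phenotype on one SNP's dosages.

    Illustrative only: samples whose phenotype is None or NaN are skipped.
    Returns (beta, standard_error, t_statistic).
    """
    # Keep only samples with a non-missing phenotype value.
    pairs = [
        (g, y)
        for g, y in zip(dosages, phenotype)
        if y is not None and not math.isnan(y)
    ]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 non-missing samples")

    mean_g = sum(g for g, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Centered sums of squares and cross-products.
    sxx = sum((g - mean_g) ** 2 for g, _ in pairs)
    if sxx == 0:
        raise ValueError("SNP is monomorphic among non-missing samples")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in pairs)

    # Least-squares slope, residual variance, and standard error.
    beta = sxy / sxx
    rss = sum(((y - mean_y) - beta * (g - mean_g)) ** 2 for g, y in pairs)
    sigma2 = rss / (n - 2)
    se = math.sqrt(sigma2 / sxx)

    # A perfect fit has zero standard error; report a signed infinity.
    t = beta / se if se > 0 else math.copysign(math.inf, beta)
    return beta, se, t
```

Calling `linear_association([0, 1, 2], [0.0, 1.0, 1.0])` gives an effect size of 0.5 per unit of dosage. A production tool would vectorize this across many SNPs and phenotypes rather than looping over pairs.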
If you wish to restrict the analysis to a range of SNPs by position (useful for splitting the genome into multiple jobs), you can use the --range option, for example:
bgenie --bgen example.bgen --pheno example.pheno --out example.out --range 22 20000000 21000000
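To split a region into multiple jobs, one option is to generate one command per chunk. The sketch below only builds command strings from the options shown above (--bgen, --pheno, --out, --range). The helper name `range_commands`, the output file naming pattern, and the assumption that both ends of --range are inclusive are illustrative and not taken from the BGENIE documentation; adjust the chunk boundaries if the range semantics differ.

```python
def range_commands(bgen, pheno, chrom, start, end, chunk_size, out_prefix):
    """Build one bgenie command per position chunk on a chromosome.

    Hypothetical helper: chunks are [chunk_start, chunk_end] and do not
    overlap, assuming --range treats both bounds as inclusive.
    """
    commands = []
    for chunk_start in range(start, end + 1, chunk_size):
        # The last chunk is truncated at the requested end position.
        chunk_end = min(chunk_start + chunk_size - 1, end)
        # Output naming scheme is an assumption made for this example.
        out = f"{out_prefix}.chr{chrom}.{chunk_start}-{chunk_end}.out"
        commands.append(
            f"bgenie --bgen {bgen} --pheno {pheno} --out {out} "
            f"--range {chrom} {chunk_start} {chunk_end}"
        )
    return commands


if __name__ == "__main__":
    # Print one command per 1 Mb chunk of a 2 Mb region on chromosome 22.
    for cmd in range_commands(
        "example.bgen", "example.pheno", 22, 20000000, 21999999, 1000000, "example"
    ):
        print(cmd)
```

Each printed line can then be submitted as a separate job to a cluster scheduler or run in a shell.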
If you wish to analyse just a single SNP, you can select it using the --rsid option, for example:
bgenie --bgen example.bgen --pheno example.pheno --out example.out --rsid rs573069994
A full list of arguments and details of file formats are listed here.
Software registration and license:
Please join the OXSTATGEN mailing list and post any questions there.