Haplotype estimation for biobank scale datasets

SHAPEIT3 is a new version of the SHAPEIT algorithm designed specifically for phasing very large microarray datasets. It was used to phase the 500,000 individuals in the UK Biobank dataset.

SHAPEIT3 introduces several important extensions to the vanilla SHAPEIT algorithm to enable this scalability:

  • a fast clustering routine to identify conditioning haplotypes in sub-quadratic time
  • early stopping of the HMM when perfect haplotype matches are found
  • a redesigned MCMC routine for better performance

These features only become relevant at very large sample sizes. If your dataset has fewer than 20,000 samples, we recommend using SHAPEIT2 instead.
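As a minimal sketch of the recommendation above, the following shell helpers count the samples in a dataset and pick a tool accordingly. The function names choose_phaser and count_samples are hypothetical and not part of SHAPEIT. The sketch assumes the input is a PLINK binary fileset whose .fam file holds one sample per line, and it takes the 20,000 cut-off directly from the text.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with SHAPEIT): choose a phasing tool
# from the sample count, using the 20,000-sample guideline stated above.

count_samples() {
    # $1 = PLINK fileset prefix; assumes one sample per line in "$1.fam"
    wc -l < "$1.fam" | tr -d ' '
}

choose_phaser() {
    # $1 = number of samples
    if [ "$1" -lt 20000 ]; then
        echo shapeit2
    else
        echo shapeit3
    fi
}

# Example use (commented out so the file can be sourced safely):
#   n=$(count_samples example/gwas)
#   echo "Samples: $n, suggested tool: $(choose_phaser "$n")"
```

The cut-off is a guideline rather than a hard limit, so treat the result as a suggestion and adjust the threshold to suit your own data.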


If you use SHAPEIT3 in your research, please cite the following publication:

O’Connell, J., Sharp, K., Shrine, N., Wain, L., Hall, I., Tobin, M., Zagury, J.F., Delaneau, O. and Marchini, J. (2016). Haplotype estimation for biobank scale datasets. Nature Genetics 48, 817–820.


SHAPEIT3 commands are very similar to SHAPEIT2 commands, with a few additional arguments that enable its fast scaling. Simply enabling the --fast flag should be sufficient for most users:

shapeit3 \
-B example/gwas \
-M genetic_map.txt \
-O out \
--threads 2 \
--fast \
--cluster-size 500

A full list of arguments is given here. In addition to the standard SHAPEIT2 parameters, SHAPEIT3 accepts the following arguments:

  • --fast: enables fast mode, which turns on both the fast conditioning haplotype search and the early stopping HMM. We recommend enabling it when the sample size is >15,000.
  • --cluster-size 4000: the size of the clusters used in the haplotype search. Both accuracy and computation time increase with this value. We have found that 4000 provides a good trade-off.
  • --early-stopping: skips HMM iterations in a window if a perfect match with a conditioning haplotype is found.
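The guidance in this list can be sketched as a small shell function that emits the extra SHAPEIT3 flags for a given sample count. The function name shapeit3_extra_args is hypothetical. The sketch uses only the flags documented above, with the >15,000 threshold and the cluster size of 4000 taken from the text. It assumes that --fast alone is enough to turn on early stopping, as the --fast description states.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with SHAPEIT): print the additional
# SHAPEIT3 flags suggested above for a given number of samples.

shapeit3_extra_args() {
    # $1 = number of samples
    if [ "$1" -gt 15000 ]; then
        # --fast enables the fast haplotype search and early stopping;
        # 4000 is the cluster size the text reports as a good trade-off.
        printf '%s\n' "--fast --cluster-size 4000"
    fi
    # At or below the threshold nothing is printed, so no flags are added.
}

# Example use (commented out; requires shapeit3 on the PATH).
# The command substitution is left unquoted on purpose so that the
# flags are split into separate words:
#   shapeit3 -B example/gwas -M genetic_map.txt -O out --threads 2 \
#       $(shapeit3_extra_args 500000)
```

Keeping the flag selection in one function makes it easy to try a different cluster size when tuning the balance between accuracy and run time.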

Software registration and license:

SHAPEIT3 is freely available for academic use only. For the rules governing non-academic use, see the LICENCE file (also included with each software download).

Software and licence can be downloaded here.


21 July (v1.0) : First release
