Basic Usage

SHAPEIT3 commands are very similar to SHAPEIT2 commands, with a few additional arguments that enable its fast scaling. For most users, enabling the --fast flag should be sufficient:

shapeit3 -B example/gwas -M genetic_map.txt -O out --fast --thread 2 --cluster-size 500

A full list of arguments is given below. In addition to the standard SHAPEIT2 parameters, SHAPEIT3 accepts the following arguments:

Option              Description
--fast              Turn on fast mode. Greater speed with a small decrease in accuracy (suitable for N>10,000).
--cluster-size arg  Cluster size for the fast conditioning haplotype search. Accuracy and computation time increase with this value. We have found that 4000 provides a good tradeoff.
--early-stopping    Do not perform HMM calculations when perfect matches are found with conditioning haplotypes.
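
As a rough sketch of how these three arguments combine, the shell snippet below assembles a fast-mode command and prints it before running it. The input and output names reuse the basic usage example; the helper name fast_phase_cmd, the decision to pass --early-stopping, and the thread count are illustrative assumptions, not recommendations from the SHAPEIT3 authors.

```shell
# Sketch only: fast_phase_cmd is a hypothetical helper, not part of SHAPEIT3.
# Usage: fast_phase_cmd <bed_prefix> <genetic_map> <out_prefix> <cluster_size> <threads>
fast_phase_cmd() {
  echo "shapeit3 -B $1 -M $2 -O $3 --fast --cluster-size $4 --early-stopping --thread $5"
}

# File names come from the basic usage example; 4000 is the cluster size
# the option description reports as a good tradeoff.
cmd=$(fast_phase_cmd example/gwas genetic_map.txt out 4000 2)
echo "$cmd"

# Execute only if shapeit3 is on PATH; otherwise this stays a dry run.
# Word splitting of $cmd is intentional (assumes paths without spaces).
if command -v shapeit3 >/dev/null 2>&1; then
  $cmd
fi
```

Printing the command first makes it easy to check the flags before committing to a long phasing run.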

Full list of options

Basic options:

Option                                                 Description
-H [ --help ]                                          Produce help message.
--seed arg (=1484822686)                               Seed of the random number generator.
-T [ --thread ] arg (=1)                               Number of threads used for phasing.
-L [ --output-log ] arg (=shapeit_date_time_UUID.log)  Log file containing a duplicate of the screen output.

Subsetting options:

Option                        Description
--exclude-snp arg             File containing the positions of the SNPs to exclude in input/output files.
--include-snp arg             File containing the positions of the SNPs to include in input/output files.
--exclude-ind arg             File containing the IDs of the individuals to exclude in input/output files.
--include-ind arg             File containing the IDs of the individuals to include in input/output files.
--input-from arg (=0)         First physical position to consider in input files.
--input-to arg (=1000000000)  Last physical position to consider in input files.
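
To illustrate the position-based subsetting options, here is a minimal sketch that restricts phasing to one physical interval. The helper name region_cmd, the interval boundaries, and the output prefix out_region are hypothetical; only the flags themselves come from the tables in this document.

```shell
# Sketch only: region_cmd is a hypothetical helper, not part of SHAPEIT3.
# Usage: region_cmd <bed_prefix> <genetic_map> <out_prefix> <from_bp> <to_bp>
region_cmd() {
  echo "shapeit3 -B $1 -M $2 -O $3 --input-from $4 --input-to $5"
}

# Hypothetical region: physical positions 20,000,000 to 25,000,000.
FROM=20000000
TO=25000000

# Guard against an inverted interval before building the command.
if [ "$FROM" -gt "$TO" ]; then
  echo "input-from must not exceed input-to" >&2
  exit 1
fi

# Dry run: the command is printed, not executed.
cmd=$(region_cmd example/gwas genetic_map.txt out_region "$FROM" "$TO")
echo "$cmd"
```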

Input files options:

Option                  Description
-B [ --input-bed ] arg  Unphased genotypes in PLINK BED/BIM/FAM format.
-M [ --input-map ] arg  Genetic map in HapMap format.
-G [ --input-gen ] arg  Unphased genotypes in IMPUTE2 GEN/SAMPLE format.
--input-thr arg (=0.9)  Probability threshold used to call a genotype in GEN file.
-R [ --input-ref ] arg  Reference set of haplotypes in HAPS/SAMPLE format.

MCMC options:

Option            Description
--burn arg (=7)   Number of burn-in MCMC iterations.
--prune arg (=8)  Number of pruning MCMC iterations.
--main arg (=20)  Number of main MCMC iterations.
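
For a sense of scale, the sketch below writes the three defaults out explicitly and adds them up. It assumes the burn-in, pruning, and main stages run one after another so that the total is their sum; that ordering is an assumption here, and the file names are the ones from the basic usage example.

```shell
# Defaults as listed in the MCMC options table.
BURN=7
PRUNE=8
MAIN=20

# Assumption: the three stages run sequentially, so the total is their sum.
TOTAL=$((BURN + PRUNE + MAIN))
echo "total MCMC iterations: $TOTAL"

# Equivalent explicit command line (dry run: printed, not executed).
echo "shapeit3 -B example/gwas -M genetic_map.txt -O out --burn $BURN --prune $PRUNE --main $MAIN"
```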

Model options:

Option                          Description
-c [ --cluster_burn ] arg (=1)  Frequency of clustering in burn-in iterations.
-S [ --states ] arg (=100)      Number of hidden states used for phasing.
--effective-size arg (=15000)   Effective size of the population.
--rho arg (=0.0004)             Constant recombination rate.
--fast                          Turn on fast mode. Greater speed with a small decrease in accuracy (suitable for N>10,000).
--cluster-size arg              Cluster size for the fast conditioning haplotype search. Accuracy and computation time increase with this value. We have found that 4000 provides a good tradeoff.
--early-stopping                Do not perform HMM calculations when perfect matches are found with conditioning haplotypes.
-W [ --window ] arg (=2)        Mean size of the windows in which conditioning haplotypes are defined.

Output file options:

Option                   Description
-O [ --output-max ] arg  Phased haplotypes in IMPUTE2 format.