IMPUTE 4 USAGE

This is a list of the command-line options for IMPUTE 4.

-h <file.hap.gz> (Required argument) Reference panel haplotype file, with one row per SNP and one column per haplotype. All alleles must be coded as 0 or 1. Publicly available reference panels can be found at:

 https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#reference
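
As a quick sanity check on a haplotype file before running IMPUTE 4, the short Python sketch below reads it and verifies that every allele is 0 or 1 and that every row has the same number of haplotypes. It assumes whitespace-separated columns; read_hap and read_hap_gz are hypothetical helper names, not part of IMPUTE 4.

```python
import gzip
import io


def read_hap(handle):
    """Parse -h haplotype data: one row per SNP, one column per haplotype.

    Assumes whitespace-separated 0/1 codes. Returns a list of rows (lists of
    ints) and raises ValueError on any other code or on ragged rows.
    """
    rows = []
    n_haps = None
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if any(f not in ("0", "1") for f in fields):
            raise ValueError(f"line {line_no}: alleles must be coded as 0 or 1")
        if n_haps is None:
            n_haps = len(fields)  # the first row fixes the haplotype count
        elif len(fields) != n_haps:
            raise ValueError(
                f"line {line_no}: expected {n_haps} haplotypes, got {len(fields)}"
            )
        rows.append([int(f) for f in fields])
    return rows


def read_hap_gz(path):
    """Open a gzip-compressed .hap.gz file in text mode and parse it."""
    with gzip.open(path, "rt") as fh:
        return read_hap(fh)


# In-memory example: 3 SNPs x 4 haplotypes (2 individuals).
haps = read_hap(io.StringIO("0 1 1 0\n1 1 0 0\n0 0 0 1\n"))
print(len(haps), len(haps[0]))  # 3 4
```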

-l <file.legend> (Required argument) Reference panel legend file(s) with information about the SNPs in the -h file(s). Each file should have four columns: rsID, physical position (in base pairs), allele 0, and allele 1. The last two columns specify the alleles underlying the 0/1 coding in the corresponding -h file; these alleles can take values in {A,C,G,T}. Each legend file should also have a header line with an unbroken character string for each column (e.g., “rsID position a0 a1”).
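
A minimal sketch of a legend reader that enforces the layout described above (a header line, four columns, alleles in {A,C,G,T}) and uses allele 0 and allele 1 to decode a 0/1 code from the -h file. It checks only the format exactly as stated here; read_legend, decode, and the example rsIDs and positions are made up for illustration.

```python
import io

VALID_ALLELES = {"A", "C", "G", "T"}


def read_legend(handle):
    """Parse a -l legend file as described above.

    Expects a header line, then four whitespace-separated columns per SNP:
    rsID, position (bp), allele 0, allele 1. Returns a list of tuples.
    """
    header = handle.readline().split()
    if len(header) != 4:
        raise ValueError("header must have one unbroken string per column")
    snps = []
    for line_no, line in enumerate(handle, start=2):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(fields)}")
        rsid, pos, a0, a1 = fields
        if a0 not in VALID_ALLELES or a1 not in VALID_ALLELES:
            raise ValueError(f"line {line_no}: alleles must be one of A, C, G, T")
        snps.append((rsid, int(pos), a0, a1))
    return snps


def decode(code, a0, a1):
    """Map a 0/1 code from the -h file back to its nucleotide."""
    return a1 if code == 1 else a0


legend = read_legend(io.StringIO(
    "rsID position a0 a1\n"
    "rs_example1 10583 G A\n"
    "rs_example2 10611 C G\n"
))
rsid, pos, a0, a1 = legend[0]
print(rsid, pos, decode(1, a0, a1))  # rs_example1 10583 A
```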

-g <file> (Required argument) Phased samples to be imputed. The file has one row per SNP. The first five columns are ID1, ID2, position, allele 1 and allele 2. The remaining columns come in pairs, two columns (haplotypes) per individual. The only allowed values in the haplotype columns are 0 and 1.
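
The sketch below parses rows in this format and groups the haplotype columns into one pair per individual. It assumes whitespace-separated columns and deliberately does not assume which of the two alleles the codes 0 and 1 refer to; read_phased_samples and the example rows are hypothetical.

```python
import io


def read_phased_samples(handle):
    """Parse the -g format: ID1 ID2 position allele1 allele2, then one
    pair of 0/1 haplotype columns per individual.

    Returns one dict per SNP; 'haplotypes' holds one (hap1, hap2) tuple
    per individual.
    """
    snps = []
    n_ind = None
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) < 5:
            raise ValueError(f"line {line_no}: need the 5 leading columns")
        id1, id2, pos, allele1, allele2 = fields[:5]
        calls = fields[5:]
        if len(calls) % 2:
            raise ValueError(f"line {line_no}: haplotype columns must come in pairs")
        if any(c not in ("0", "1") for c in calls):
            raise ValueError(f"line {line_no}: haplotype values must be 0 or 1")
        if n_ind is None:
            n_ind = len(calls) // 2  # the first row fixes the sample count
        elif len(calls) // 2 != n_ind:
            raise ValueError(f"line {line_no}: expected {n_ind} individuals")
        pairs = [(int(calls[i]), int(calls[i + 1])) for i in range(0, len(calls), 2)]
        snps.append({
            "id1": id1, "id2": id2, "position": int(pos),
            "alleles": (allele1, allele2), "haplotypes": pairs,
        })
    return snps


# Two made-up SNPs for two individuals.
rows = read_phased_samples(io.StringIO(
    "snpA rs_example1 10583 G A 0 1 1 1\n"
    "snpB rs_example2 10611 C G 0 0 1 0\n"
))
print(rows[0]["haplotypes"])  # [(0, 1), (1, 1)]
```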

-m <file> (Required argument) Genetic map file: a fine-scale recombination map for the region to be analyzed. This file should have three columns: physical position (in base pairs), recombination rate between the current position and the next position in the map (in cM/Mb), and genetic map position (in cM). The file should also have a header line with an unbroken character string for each column (e.g., “position COMBINED_rate(cM/Mb) Genetic_Map(cM)”). All of our reference panel download packages come with appropriate recombination map files.
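
The rate and genetic-position columns are linked: between two consecutive map rows, genetic position grows by the rate (cM/Mb) times the distance (Mb). The Python sketch below uses that relationship to check a map for consistency and to look up the genetic position of an arbitrary base-pair position. The map values are invented, and clamping positions outside the map to its end points is a choice of this sketch, not documented IMPUTE 4 behavior.

```python
import bisect
import io


def read_genetic_map(handle):
    """Parse a -m map: header, then position (bp), rate (cM/Mb), map position (cM)."""
    handle.readline()  # skip the header line
    positions, rates, cms = [], [], []
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        positions.append(int(fields[0]))
        rates.append(float(fields[1]))
        cms.append(float(fields[2]))
    return positions, rates, cms


def genetic_position(positions, rates, cms, bp):
    """Genetic position (cM) of physical position bp.

    The rate in row i applies from positions[i] up to positions[i + 1], so
    cM increases by rates[i] * (bp - positions[i]) / 1e6 within that span.
    Positions outside the map are clamped to its ends (a sketch-only choice).
    """
    if bp <= positions[0]:
        return cms[0]
    if bp >= positions[-1]:
        return cms[-1]
    i = bisect.bisect_right(positions, bp) - 1
    return cms[i] + rates[i] * (bp - positions[i]) / 1e6


def is_consistent(positions, rates, cms, tol=1e-6):
    """Check that each cM value equals the previous one plus rate * Mb."""
    return all(
        abs(cms[i] + rates[i] * (positions[i + 1] - positions[i]) / 1e6 - cms[i + 1]) <= tol
        for i in range(len(positions) - 1)
    )


gmap = read_genetic_map(io.StringIO(
    "position COMBINED_rate(cM/Mb) Genetic_Map(cM)\n"
    "1000000 2.0 0.0\n"
    "2000000 0.5 2.0\n"
    "3000000 0.0 2.5\n"
))
print(is_consistent(*gmap))              # True
print(genetic_position(*gmap, 1500000))  # 1.0
```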

-int <s> <e> (Required argument) Interval to impute: s = start position in base pairs, e = end position in base pairs.

-buffer <kb> (Optional parameter. Default is 250.) Width of the buffer region, in kb, on each side of the interval. The buffer is included in the computation to avoid edge effects at the interval boundaries.
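
To see which base pairs -int and -buffer cover together, here is a small sketch that computes the padded window and classifies a position as inside the interval, inside the buffer, or outside both. It assumes the buffer is measured in kb and that both ends are inclusive; analysis_window and region_of are hypothetical names.

```python
def analysis_window(start_bp, end_bp, buffer_kb=250):
    """Return the (lo, hi) bp range spanned by -int start_bp end_bp plus
    a buffer of buffer_kb kilobases on each side.

    Assumes -buffer is in kb, as its <kb> placeholder suggests. Clamping the
    lower bound at 0 is this sketch's own choice.
    """
    if start_bp > end_bp:
        raise ValueError("interval start must not exceed its end")
    pad = buffer_kb * 1000
    return max(0, start_bp - pad), end_bp + pad


def region_of(pos, start_bp, end_bp, buffer_kb=250):
    """Classify a position as 'interval', 'buffer', or 'outside'."""
    lo, hi = analysis_window(start_bp, end_bp, buffer_kb)
    if start_bp <= pos <= end_bp:
        return "interval"
    if lo <= pos <= hi:
        return "buffer"
    return "outside"


print(analysis_window(20_000_000, 25_000_000))        # (19750000, 25250000)
print(region_of(19_900_000, 20_000_000, 25_000_000))  # buffer
```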

-no_maf_align (Flag) DON’T TRY to align strand-ambiguous SNPs by matching allele frequencies between the samples and the reference panel. Either this option or -maf_align must be specified.

-maf_align (Flag) DO TRY to align strand-ambiguous SNPs by matching allele frequencies between the samples and the reference panel. Either this option or -no_maf_align must be specified.
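
Strand-ambiguous SNPs are the A/T and C/G SNPs: each allele is the complement of the other, so the allele letters alone cannot reveal a strand flip. The sketch below shows one common frequency-matching heuristic, which flips a SNP when the sample allele frequency is closer to the reference frequency of the opposite allele. It is a generic illustration, not IMPUTE 4's internal alignment rule, and all names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele1, allele2):
    """True for A/T and C/G SNPs, whose alleles are each other's complements,
    so a strand flip cannot be detected from the allele letters alone."""
    return COMPLEMENT.get(allele1) == allele2


def allele_frequency(codes):
    """Frequency of the allele coded 1 across a list of 0/1 haplotype codes."""
    return sum(codes) / len(codes)


def should_flip(sample_freq, ref_freq):
    """Heuristic: flip when the sample frequency is closer to the reference
    frequency of the *other* allele than to that of the same allele.

    Both arguments are frequencies of the same-labelled allele.
    """
    return abs(sample_freq - (1.0 - ref_freq)) < abs(sample_freq - ref_freq)


sample_f = allele_frequency([1, 1, 1, 0, 1, 1, 0, 1, 1, 1])  # 0.8
print(is_strand_ambiguous("A", "T"), is_strand_ambiguous("A", "G"))  # True False
print(should_flip(sample_f, 0.2))  # True
```

Frequencies near 0.5 carry little strand information, so a practical version of this heuristic would likely leave such SNPs unaligned rather than guess.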

-Ne <int> (Optional parameter. Default is 20000.) Effective population size. This is an internal model parameter and is safe to leave at the default of 20000.

-o <file> (Required argument) Output file prefix.

-o_gz (Optional flag) Compress the output files with gzip.
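
Putting the options together, the sketch below assembles an argument list from the flags documented above. The executable name impute4 and all file names are placeholders, and option order is assumed not to matter.

```python
import shlex


def build_impute4_args(hap, legend, gen, gmap, start, end, out, *,
                       maf_align=True, buffer_kb=None, ne=None,
                       gzip_out=False, executable="impute4"):
    """Assemble an argument list from the options documented above.

    'executable' is a placeholder; substitute the name of your IMPUTE 4 binary.
    """
    args = [
        executable,
        "-h", hap,
        "-l", legend,
        "-g", gen,
        "-m", gmap,
        "-int", str(start), str(end),
        "-maf_align" if maf_align else "-no_maf_align",  # one of the two is required
        "-o", out,
    ]
    if buffer_kb is not None:
        args += ["-buffer", str(buffer_kb)]  # omit to use the default of 250
    if ne is not None:
        args += ["-Ne", str(ne)]  # omit to use the default of 20000
    if gzip_out:
        args.append("-o_gz")
    return args


args = build_impute4_args(
    "ref.hap.gz", "ref.legend", "study.haps", "genetic_map.txt",
    20_000_000, 25_000_000, "out/chunk1", gzip_out=True,
)
print(shlex.join(args))
```

The list can be passed directly to subprocess.run(args, check=True), which avoids shell quoting issues; shlex.join is used here only to print a copy-pasteable command line.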