Title: | Summaries and Population Structure Analyses of Genetic Data |
---|---|
Description: | A toolkit for analyzing stratified population genetic data. Functions are provided for summarizing and checking loci (haploid, diploid, and polyploid), single stranded DNA sequences, calculating most population subdivision metrics, and running external programs such as structure and fastsimcoal. The package is further described in Archer et al (2016) <doi:10.1111/1755-0998.12559>. |
Authors: | Eric Archer [aut, cre], Paula Adams [aut], Brita Schneiders [aut], Sarina Fernandez [aut], Warren Asfazadour [aut] |
Maintainer: | Eric Archer <[email protected]> |
License: | GNU General Public License |
Version: | 2.5.01 |
Built: | 2024-11-27 02:55:03 UTC |
Source: | https://github.com/EricArcher/strataG |
Calculate allele frequencies or proportions for each locus.
alleleFreqs(g, by.strata = FALSE, type = c("freq", "prop"))
alleleFreqs(g, by.strata = FALSE, type = c("freq", "prop"))
g |
a gtypes object. |
by.strata |
logical determining if results should be returned by strata? |
type |
return counts ( |
A list of allele frequencies for each locus. Each element is a
vector (by.strata = FALSE
) or matrix (by.strata = TRUE
)
with the frequency or proportion of each allele.
If g
is a haploid object with sequences, the function will run
labelHaplotypes
if sequences aren't already grouped by
haplotype. The gtypes
object used with haplotype assignments and
unassigned individuals will be stored in attr(*, "gtypes")
.
Eric Archer [email protected]
data(msats.g) f <- alleleFreqs(msats.g) f$D11t # Frequencies for Locus D11t f.pop <- alleleFreqs(msats.g, TRUE, "prop") f.pop$EV94[, "Coastal"] # Proportions of EV94 alleles in the Coastal population
data(msats.g) f <- alleleFreqs(msats.g) f$D11t # Frequencies for Locus D11t f.pop <- alleleFreqs(msats.g, TRUE, "prop") f.pop$EV94[, "Coastal"] # Proportions of EV94 alleles in the Coastal population
Split loci stored in one column to two columns for each allele in a matrix of diploid data.
alleleSplit(x, sep = NULL)
alleleSplit(x, sep = NULL)
x |
a matrix or data.frame containing diploid data. Every column represents one locus with two alleles. |
sep |
separator used between alleles of a locus. If |
matrix with alleles for each locus in one column split into separate columns.
Eric Archer [email protected]
# A sample SNP data set with no separators between nucleotides in a genotype snps <- do.call(cbind, lapply(1:3, function(i) { a1 <- sample(c("A", "G"), 10, rep = TRUE) a2 <- sample(c("A", "G"), 10, rep = TRUE) paste(a1, a2, sep = "") })) colnames(snps) <- paste("Loc", LETTERS[1:3], sep = "_") snps alleleSplit(snps) # A sample microsatellie data set with alleles separated by "/" alleles <- seq(100, 150, 2) msats <- do.call(cbind, lapply(1:3, function(i) { a1 <- sample(alleles, 10, rep = TRUE) a2 <- sample(alleles, 10, rep = TRUE) paste(a1, "/", a2, sep = "") })) colnames(msats) <- paste("Loc", LETTERS[1:3], sep = "_") msats alleleSplit(msats, sep = "/")
# A sample SNP data set with no separators between nucleotides in a genotype snps <- do.call(cbind, lapply(1:3, function(i) { a1 <- sample(c("A", "G"), 10, rep = TRUE) a2 <- sample(c("A", "G"), 10, rep = TRUE) paste(a1, a2, sep = "") })) colnames(snps) <- paste("Loc", LETTERS[1:3], sep = "_") snps alleleSplit(snps) # A sample microsatellie data set with alleles separated by "/" alleles <- seq(100, 150, 2) msats <- do.call(cbind, lapply(1:3, function(i) { a1 <- sample(alleles, 10, rep = TRUE) a2 <- sample(alleles, 10, rep = TRUE) paste(a1, "/", a2, sep = "") })) colnames(msats) <- paste("Loc", LETTERS[1:3], sep = "_") msats alleleSplit(msats, sep = "/")
Calculate allelic richness for each locus.
allelicRichness(g, by.strata = FALSE)
allelicRichness(g, by.strata = FALSE)
g |
a gtypes object. |
by.strata |
logical - return results grouped by strata? |
a data frame with the allelic richness of each locus calculated as the number of alleles divided by the number of samples without missing data at that locus.
Eric Archer [email protected]
data(msats.g) allelicRichness(msats.g)
data(msats.g) allelicRichness(msats.g)
Read an Arlequin-formatted project input file (.arp). Convert
.arp data into gtypes
object. Write an input file from a
gtypes
object.
arlequinRead(file) arp2gtypes(arp, avoid.dups = FALSE) arlequinWrite(g, file = NULL, locus = 1, haploid.microsat = FALSE) read.arlequin(file) write.arlequin(g, file = NULL, locus = 1)
arlequinRead(file) arp2gtypes(arp, avoid.dups = FALSE) arlequinWrite(g, file = NULL, locus = 1, haploid.microsat = FALSE) read.arlequin(file) write.arlequin(g, file = NULL, locus = 1)
file |
filename of an arlequin project (.arp) file. See
|
arp |
a list of arlequin profile information and data as returned from
|
avoid.dups |
logical. Should sample identifiers be combined with
strata names to avoid duplicate identifiers between strata? If set to
|
g |
a gtypes object. |
locus |
numeric or character designation of which locus to write for haploid data. |
haploid.microsat |
logical. If |
arlequinRead |
parses a .arp file. |
arp2gtypes |
converts list from parsed .arp file to gtypes. |
arlequinWrite |
writes gtypes to .arp file. |
a list containing:
file |
name and full path of .arp file that was read. |
profile.info |
list containing parameters in [[Profile]]
section of .arp file. All parameters are provided. Parameters unset in
.arp file are set to default values. |
data.info |
list containing data from [[Data]] section
of .arp file. Can contain elements for haplotype.definition
(a data.frame), distance.matrix (a matrix), sample.data
(a data.frame), or genetic.structure (a list). |
a gtypes object.
the filename of the .arp file that was written.
arp2gtypes()
will not create a gtypes
object for
Arlequin projects with relative frequency data (DataType=FREQUENCY
and FREQUENCY=REL
). If DataType=DNA
and
GenotypicData=0
, sequences for each haplotype or individual are
assumed to be from a single locus.
Eric Archer [email protected]
Excoffier, L.G. Laval, and S. Schneider (2005)
Arlequin ver. 3.0: An integrated software package for population genetics
data analysis. Evolutionary Bioinformatics Online 1:47-50.
Available at http://cmpg.unibe.ch/software/arlequin3/
# write test microsat data .arp file f <- arlequinWrite(msats.g, tempfile()) # read .arp file and show structure msats.arp <- arlequinRead(f) print(str(msats.arp)) # convert parsed data to gtypes object msats.arp.g <- arp2gtypes(msats.arp) msats.arp.g # compare to original msats.g
# write test microsat data .arp file f <- arlequinWrite(msats.g, tempfile()) # read .arp file and show structure msats.arp <- arlequinRead(f) print(str(msats.arp)) # convert parsed data to gtypes object msats.arp.g <- arp2gtypes(msats.arp) msats.arp.g # compare to original msats.g
gtypes
to data.frame or matrixCreate a formatted data.frame or matrix from a gtypes object.
## S4 method for signature 'gtypes' as.data.frame( x, one.col = FALSE, sep = "/", ids = TRUE, strata = TRUE, sort.alleles = TRUE, coded.snps = FALSE, ref.allele = NULL, ... ) ## S4 method for signature 'gtypes' as.matrix( x, one.col = FALSE, sep = "/", ids = TRUE, strata = TRUE, sort.alleles = TRUE, ... )
## S4 method for signature 'gtypes' as.data.frame( x, one.col = FALSE, sep = "/", ids = TRUE, strata = TRUE, sort.alleles = TRUE, coded.snps = FALSE, ref.allele = NULL, ... ) ## S4 method for signature 'gtypes' as.matrix( x, one.col = FALSE, sep = "/", ids = TRUE, strata = TRUE, sort.alleles = TRUE, ... )
x |
a gtypes object. |
one.col |
logical. If |
sep |
character to use to separate alleles if |
ids |
logical. include a column for individual identifiers ( |
strata |
logical. include a column for current statification
( |
sort.alleles |
logical. for non-haploid objects, should alleles be
sorted in genotypes or left in original order? (only takes affect if
|
coded.snps |
return diploid SNPs coded as 0 (reference allele homozygote), 1 (heterozygote), or 2 (alternate allele homozygote). If this is 'TRUE', the data is diploid, and all loci are biallelic, a data frame of coded genotypes will be returned with one column per locus. |
ref.allele |
an optional vector of reference alleles for each SNP.
Only used if 'coded.snps = TRUE'. If provided, it must be at least as long
as there are biallelic SNPs in |
... |
additional arguments to be passed to or from methods. |
A data.frame
or matrix
with one row per individual.
Eric Archer [email protected]
data(msats.g) # with defaults (alleles in multiple columns, with ids and stratification) df <- as.data.frame(msats.g) str(df) # one column per locus onecol.df <- as.data.frame(msats.g, one.col = TRUE) str(onecol.df) # just the genotypes genotypes.df <- as.data.frame(msats.g, one.col = TRUE, ids = FALSE, strata = FALSE) str(genotypes.df) # as a matrix instead genotypes.mat <- as.matrix(msats.g) str(genotypes.mat)
data(msats.g) # with defaults (alleles in multiple columns, with ids and stratification) df <- as.data.frame(msats.g) str(df) # one column per locus onecol.df <- as.data.frame(msats.g, one.col = TRUE) str(onecol.df) # just the genotypes genotypes.df <- as.data.frame(msats.g, one.col = TRUE, ids = FALSE, strata = FALSE) str(genotypes.df) # as a matrix instead genotypes.mat <- as.matrix(msats.g) str(genotypes.mat)
Convert a set of sequences to a multidna object if possible.
as.multidna(x, ...)
as.multidna(x, ...)
x |
a valid set of sequences: character matrix, list of
character vectors, |
... |
arguments to pass to |
Eric Archer [email protected]
# convert list of character vectors data(dolph.seqs) list.mdna <- as.multidna(dolph.seqs) list.mdna # convert gtypes object data(dloop.g) gtype.mdna <- as.multidna(dloop.g) gtype.mdna
# convert list of character vectors data(dolph.seqs) list.mdna <- as.multidna(dolph.seqs) list.mdna # convert gtypes object data(dloop.g) gtype.mdna <- as.multidna(dloop.g) gtype.mdna
Calculate nucleotide base frequencies along a sequence.
baseFreqs(x, bases = NULL, ignore = c("n", "x", "-", "."), simplify = TRUE)
baseFreqs(x, bases = NULL, ignore = c("n", "x", "-", "."), simplify = TRUE)
x |
a gtypes object with aligned sequences or a list of aligned DNA sequences. |
bases |
character vector of bases. Must contain valid IUPAC codes. If
|
ignore |
a character vector of bases to ignore when calculating site frequencies. |
simplify |
if there is a single locus, return result in a simplified
form? If |
For each gene, a list containing:
site.freqs |
a matrix of base frequencies at each site. |
base.freqs |
a vector of overall base proportion composition. |
Eric Archer [email protected]
data(dloop.g) bf <- baseFreqs(dloop.g) # Frequencies of first 10 sites bf$site.freqs[, 1:10] # Base composition bf$base.freqs
data(dloop.g) bf <- baseFreqs(dloop.g) # Frequencies of first 10 sites bf$site.freqs[, 1:10] # Base composition bf$base.freqs
A data.frame of position information for SNPs to be phased
data(bowhead.snp.position)
data(bowhead.snp.position)
data.frame
Morin, P.A., Archer, F.I., Pease, V.L., Hancock-Hanser, B.L., Robertson, K.M., Huebinger, R.M., Martien, K.K., Bickham, J.W., George, J.C., Postma, L.D., Taylor, B.L., 2012. Empirical comparison of single nucleotide polymorphisms and microsatellites for population and demographic analyses of bowhead whales. Endangered Species Research 19, 129-147.
A data.frame of 42 SNPs with sample ids and stratification
data(bowhead.snps)
data(bowhead.snps)
data.frame
Morin, P.A., Archer, F.I., Pease, V.L., Hancock-Hanser, B.L., Robertson, K.M., Huebinger, R.M., Martien, K.K., Bickham, J.W., George, J.C., Postma, L.D., Taylor, B.L., 2012. Empirical comparison of single nucleotide polymorphisms and microsatellites for population and demographic analyses of bowhead whales. Endangered Species Research 19, 129-147.
Run CLUMPP to aggregate multiple STRUCTURE runs.
clumpp( sr, k, align.algorithm = "greedy", sim.stat = "g", greedy.option = "ran.order", repeats = 100, order.by.run = 0, label = NULL, delete.files = TRUE )
clumpp( sr, k, align.algorithm = "greedy", sim.stat = "g", greedy.option = "ran.order", repeats = 100, order.by.run = 0, label = NULL, delete.files = TRUE )
sr |
result from |
|||||
k |
choice of k in |
|||||
align.algorithm |
algorithm to be used for aligning the runs. Can be "full.search", "greedy", or "large.k". |
|||||
sim.stat |
pairwise matrix similarity statistic to be used. Can be "g" or "g.prime". |
|||||
greedy.option |
input order of runs to be tested. Required if
|
|||||
repeats |
the number of input orders of runs to be tested. Only used if
|
|||||
order.by.run |
permute the clusters according to the cluster order of
a specific run. Set this parameter to a number from 1 to the number of
runs in |
|||||
label |
label to use for input and output files. |
|||||
delete.files |
logical. Delete all files when CLUMPP is finished? |
CLUMPP is not included with strataG
and must be downloaded
separately. Additionally, it must be installed such that it can be run from
the command line in the current working directory. See the vignette
for external.programs
.
Eric Archer [email protected]
Mattias Jakobsson and Noah A. Rosenberg. 2007. CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23(14):1801-1806. Available at http://web.stanford.edu/group/rosenberglab/clumppDownload.html
Return a consensus sequence from set of aligned sequences, introducing IUPAC ambiguity codes where necessary.
createConsensus(x, ignore.gaps = FALSE, simplify = TRUE)
createConsensus(x, ignore.gaps = FALSE, simplify = TRUE)
x |
a gtypes object with aligned sequences or a list of aligned DNA sequences. |
ignore.gaps |
logical. Ignore gaps at a site when creating consensus.
If |
simplify |
if there is a single locus, return result in a simplified
form? If |
A character vector of the consensus sequence.
Eric Archer [email protected]
data(dolph.seqs) createConsensus(dolph.seqs)
data(dolph.seqs) createConsensus(dolph.seqs)
Load allelic data from a data.frame or matrix into a gtypes object.
df2gtypes( x, ploidy, id.col = 1, strata.col = 2, loc.col = 3, sequences = NULL, schemes = NULL, description = NULL, other = NULL )
df2gtypes( x, ploidy, id.col = 1, strata.col = 2, loc.col = 3, sequences = NULL, schemes = NULL, description = NULL, other = NULL )
x |
a matrix or data.frame of genetic data. |
ploidy |
number of number of columns in |
id.col |
column name or number where individual sample ids are stored.
If |
strata.col |
column name or number where stratification is stored. If
|
loc.col |
column number of first allele of first locus. |
sequences |
a list, matrix, |
schemes |
an optional data.frame of stratification schemes. |
description |
a label for the object (optional). |
other |
a list to carry other related information (optional). |
The genetic data in x
starting at loc.col
should be
formatted such that every consecutive ploidy
columns represent
alleles of one locus. Locus names are taken from the column names in
x
and should be formatted with the same root locus name, with
unique suffixes representing allels (e.g., for Locus1234: Locus1234.1
and Locus1234.2, or Locus1234_A and Locus1234_B).
If sequences are provided in sequences
, then they should be named
and match haplotype labels in loc.col
of x
. If multiple
genes are given as a multidna, then they should have the
same names as column names in x
from loc.col
to the end.
a gtypes object.
Eric Archer [email protected]
gtypes.initialize, sequence2gtypes, as.data.frame.gtypes, gtypes2genind, gtypes2loci, gtypes2phyDat
#--- create a diploid (microsatellite) gtypes object data(dolph.msats) ms.g <- df2gtypes(dolph.msats, ploidy = 2, strata.col = NULL, loc.col = 2) ms.g #' #--- create a haploid sequence (mtDNA) gtypes object data(dolph.strata) data(dolph.haps) seq.df <- dolph.strata[ c("id", "broad", "dLoop")] dl.g <- df2gtypes(seq.df, ploidy = 1, sequences = dolph.haps) dl.g
#--- create a diploid (microsatellite) gtypes object data(dolph.msats) ms.g <- df2gtypes(dolph.msats, ploidy = 2, strata.col = NULL, loc.col = 2) ms.g #' #--- create a haploid sequence (mtDNA) gtypes object data(dolph.strata) data(dolph.haps) seq.df <- dolph.strata[ c("id", "broad", "dLoop")] dl.g <- df2gtypes(seq.df, ploidy = 1, sequences = dolph.haps) dl.g
A gtypes object of 126 samples and 33 haplotypes.
data(dloop.g)
data(dloop.g)
gtypes
Lowther-Thieleking J.L., F.I. Archer, A.R. Lang, and D.W. Weller. 2015. Genetic variation of coastal and offshore bottlenose dolphins, Tursiops truncatus, in the eastern North Pacific Ocean. Marine Mammal Science 31:1-20
A list of 33 aligned d-loop haplotypes
data(dolph.haps)
data(dolph.haps)
list
Lowther-Thieleking J.L., F.I. Archer, A.R. Lang, and D.W. Weller. 2015. Genetic variation of coastal and offshore bottlenose dolphins, Tursiops truncatus, in the eastern North Pacific Ocean. Marine Mammal Science 31:1-20
A data.frame of 126 samples and 4 microsatellite loci
data(dolph.msats)
data(dolph.msats)
data.frame
Lowther-Thieleking J.L., F.I. Archer, A.R. Lang, and D.W. Weller. 2015. Genetic variation of coastal and offshore bottlenose dolphins, Tursiops truncatus, in the eastern North Pacific Ocean. Marine Mammal Science 31:1-20
A list of 126 aligned control region sequences
data(dolph.seqs)
data(dolph.seqs)
list
Lowther-Thieleking J.L., F.I. Archer, A.R. Lang, and D.W. Weller. 2015. Genetic variation of coastal and offshore bottlenose dolphins, Tursiops truncatus, in the eastern North Pacific Ocean. Marine Mammal Science 31:1-20
A data.frame of 126 samples with assignment of samples to either broad-scale or fine-scale stratifications and mtDNA haplotype designations
data(dolph.strata)
data(dolph.strata)
data.frame
Lowther-Thieleking J.L., F.I. Archer, A.R. Lang, and D.W. Weller. 2015. Genetic variation of coastal and offshore bottlenose dolphins, Tursiops truncatus, in the eastern North Pacific Ocean. Marine Mammal Science 31:1-20
Identify duplicate or very similar genotypes.
dupGenotypes(g, num.shared = 0.8)
dupGenotypes(g, num.shared = 0.8)
g |
a gtypes object. |
num.shared |
either number of loci or percentage of loci two individuals must share to be considered duplicate individuals. |
if no duplicates are present, the result is NULL
, otherwise
a data frame with the following columns is returned:
ids.1, ids.2 |
sample ids. |
strata.1, strata.2 |
sample stratification. |
mismatch.loci |
loci where the two samples do not match. |
num.loci.genotyped |
number of loci genotyped for both samples. |
num.loci.shared |
number of loci shared (all alleles the same) between both samples. |
prop.loci.shared |
proportion of loci genotyped for both samples that are shared. |
Eric Archer [email protected]
data(msats.g) # identify potential duplicates in Coastal strata coastal <- msats.g[, , "Coastal"] coastal.5 <- coastal[getIndNames(coastal)[1:5], , ] dupes <- dupGenotypes(coastal.5) dupes
data(msats.g) # identify potential duplicates in Coastal strata coastal <- msats.g[, , "Coastal"] coastal.5 <- coastal[getIndNames(coastal)[1:5], , ] dupes <- dupGenotypes(coastal.5) dupes
Calculate first and second order rates of changes of LnPr(K) from STRUCTURE results based on Evanno et al. 2005.
evanno(sr, plot = TRUE)
evanno(sr, plot = TRUE)
sr |
output from a call to |
plot |
logical. Generate a plot of Evanno metrics? |
a list with:
df |
data.frame with Evanno log-likelihood metrics for each value of K. |
plots |
list of four ggplot objects for later plotting. |
Eric Archer [email protected]
Evanno, G., Regnaut, S., and J. Goudet. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology 14:2611-2620.
## Not run: data(msats.g) # Run STRUCTURE sr <- structureRun(msats, k.range = 1:4, num.k.rep = 10) # Calculate Evanno metrics evno <- evanno(sr) evno ## End(Not run)
## Not run: data(msats.g) # Run STRUCTURE sr <- structureRun(msats, k.range = 1:4, num.k.rep = 10) # Calculate Evanno metrics evno <- evanno(sr) evno ## End(Not run)
Expand haplotypes to a single sequence per individual.
expandHaplotypes(g)
expandHaplotypes(g)
g |
a haploid gtypes object with sequences. |
a gtypes
object with sequences expanded and renamed so there
is one sequence per individual. Sequence names are set to individual sample
IDs.
Eric Archer [email protected]
data(dloop.g) # Haplotypes have already been labelled dloop.g # Haplotypes expanded to individual sequences (num.alleles == num.samples) expanded.g <- expandHaplotypes(dloop.g) expanded.g
data(dloop.g) # Haplotypes have already been labelled dloop.g # Haplotypes expanded to individual sequences (num.alleles == num.samples) expanded.g <- expandHaplotypes(dloop.g) expanded.g
Read and write FASTA formatted files of sequences.
read.fasta(file) write.fasta(x, file = NULL)
read.fasta(file) write.fasta(x, file = NULL)
file |
a FASTA-formatted file of sequences. |
x |
a list or a matrix of DNA sequences (see |
a set of sequences in DNAbin format
invisbly, name(s) of file(s) written
Eric Archer [email protected]
Summarize fixed base pair differences between strata.
fixedDifferences( g, count.indels = TRUE, consec.indels.as.one = TRUE, bases = c("a", "c", "g", "t", "-") )
fixedDifferences( g, count.indels = TRUE, consec.indels.as.one = TRUE, bases = c("a", "c", "g", "t", "-") )
g |
a gtypes object. |
count.indels |
logical. Count indels when evaluating sites for fixed differences? |
consec.indels.as.one |
logical. If |
bases |
a character vector of valid bases to consider. |
a list with components:
list of sites with fixed differences for each pair of strata
data.frame of number of sites fixed between each pair of strata
Eric Archer <[email protected]>
data(dloop.g) fd <- fixedDifferences(dloop.g) fd
data(dloop.g) fd <- fixedDifferences(dloop.g) fd
Identify fixed sites among sequences.
fixedSites(x, bases = c("a", "c", "g", "t", "-"), simplify = TRUE)
fixedSites(x, bases = c("a", "c", "g", "t", "-"), simplify = TRUE)
x |
a |
bases |
a character vector of valid bases to consider. |
simplify |
if there is a single locus, return result in a simplified
form? If |
a vector of fixed sites. Element names are site positions in the original sequence.
Eric Archer <[email protected]>
data(dolph.haps) fixedSites(dolph.haps)
data(dolph.haps) fixedSites(dolph.haps)
Create a data frame of stratified individuals and their haplotypes from a frequency table
freq2GenData( freq.mat, hap.col = NULL, freq.col = 1, id.label = NULL, hap.label = NULL )
freq2GenData( freq.mat, hap.col = NULL, freq.col = 1, id.label = NULL, hap.label = NULL )
freq.mat |
a matrix or data.frame containing haplotypic frequencies with strata as column names. |
hap.col |
a number giving the column providing haplotype
labels or a vector the same length as freq.mat. If |
freq.col |
a number giving the first column containing haplotype frequencies. |
id.label |
character to label sample IDs with in resulting data.frame. |
hap.label |
character to label haplotypes with in resulting data.frame. |
a data frame with one row per sample and columns for id, strata, and haplotype, suitable for use in df2gtypes.
Eric Archer [email protected]
hap.freqs <- data.frame( haps = c("hap1", "hap2", "hap3"), pop1 = rmultinom(1, 50, prob = c(0.1, 0.2, 0.7)), pop2 = rmultinom(1, 25, prob = c(0.5, 0.4, 0.1)) ) gen.data <- freq2GenData(hap.freqs, hap.col = 1, freq.col = 2) x <- df2gtypes(gen.data, ploidy = 1) summary(x)
hap.freqs <- data.frame( haps = c("hap1", "hap2", "hap3"), pop1 = rmultinom(1, 50, prob = c(0.1, 0.2, 0.7)), pop2 = rmultinom(1, 25, prob = c(0.5, 0.4, 0.1)) ) gen.data <- freq2GenData(hap.freqs, hap.col = 1, freq.col = 2) x <- df2gtypes(gen.data, ploidy = 1) summary(x)
These functions specify and format simulation parameters used to write fastsimcoal2 parameter or template files, parameter estimation files, parameter definition files, and site frequency spectrum files.
fscDeme(deme.size, sample.size, sample.time = 0, inbreeding = 0, growth = 0) fscSettingsDemes(..., ploidy = 2) fscEvent( event.time = 0, source = 0, sink = 0, prop.migrants = 1, new.size = 1, new.growth = 0, migr.mat = 0 ) fscSettingsEvents(...) fscSettingsMigration(...) fscBlock_dna( sequence.length, mut.rate, recomb.rate = 0, transition.rate = 1/3, chromosome = 1 ) fscBlock_microsat( num.loci, mut.rate, recomb.rate = 0, gsm.param = 0, range.constraint = 0, chromosome = 1 ) fscBlock_snp(sequence.length, mut.rate, recomb.rate = 0, chromosome = 1) fscBlock_standard(num.loci, mut.rate, recomb.rate = 0, chromosome = 1) fscBlock_freq(mut.rate, outexp = TRUE) fscSettingsGenetics(..., num.chrom = NULL) fscEstParam( name, is.int = TRUE, distr = c("unif", "logunif"), min = NA, max = NA, value = NA, output = TRUE, bounded = FALSE, reference = FALSE ) fscSettingsEst(..., obs.sfs, rules = NULL, sfs.type = c("maf", "daf")) fscSettingsDef(mat)
fscDeme(deme.size, sample.size, sample.time = 0, inbreeding = 0, growth = 0) fscSettingsDemes(..., ploidy = 2) fscEvent( event.time = 0, source = 0, sink = 0, prop.migrants = 1, new.size = 1, new.growth = 0, migr.mat = 0 ) fscSettingsEvents(...) fscSettingsMigration(...) fscBlock_dna( sequence.length, mut.rate, recomb.rate = 0, transition.rate = 1/3, chromosome = 1 ) fscBlock_microsat( num.loci, mut.rate, recomb.rate = 0, gsm.param = 0, range.constraint = 0, chromosome = 1 ) fscBlock_snp(sequence.length, mut.rate, recomb.rate = 0, chromosome = 1) fscBlock_standard(num.loci, mut.rate, recomb.rate = 0, chromosome = 1) fscBlock_freq(mut.rate, outexp = TRUE) fscSettingsGenetics(..., num.chrom = NULL) fscEstParam( name, is.int = TRUE, distr = c("unif", "logunif"), min = NA, max = NA, value = NA, output = TRUE, bounded = FALSE, reference = FALSE ) fscSettingsEst(..., obs.sfs, rules = NULL, sfs.type = c("maf", "daf")) fscSettingsDef(mat)
deme.size |
the number of individuals in the deme. |
sample.size |
the number of samples to take. |
sample.time |
the number of generations in the past at which samples are taken. |
inbreeding |
the inbreeding coefficient for the deme |
growth |
the growth rate of the deme. |
... |
a set of comma-separated values for settings. See Notes for more information. |
ploidy |
the desired ploidy of the final data. |
event.time |
the number of generations before present at which the historical event happened. |
source |
the source deme (the first listed deme has index 0). |
sink |
the sink deme. |
prop.migrants |
the expected proportion of migrants to move from the source to the sink deme. |
new.size |
the new size for the sink deme, relative to its size in the previous (later in time) generation. |
new.growth |
the new growth rate for the sink deme. |
migr.mat |
the number of the new migration matrix to be used further
back in time. The matrices are those supplied to the
|
sequence.length |
number of base pairs to use for each block. |
mut.rate |
per base pair or locus mutation rate. |
recomb.rate |
recombination rate between adjacent markers. No effect for SNPs. |
transition.rate |
dna: fraction of substitutions that are transitions. |
chromosome |
number or character identifying which chromosome the marker is on. |
num.loci |
number of loci to simulate. |
gsm.param |
value of the geometric parameter for a Generalized Stepwise Mutation (GSM) model. This value represents the proportion of mutations that will change the allele size by more than one step. Values between 0 and 1 are required. A value of 0 is for a strict Stepwise Mutation Model (SMM). |
range.constraint |
|
outexp |
logical describing if the expected site frequency spectrum given the estimated parameters should be output? |
num.chrom |
the number of chromosomes to be simulated. If this is
specified and not the same as the number of linkage blocks specified by the
|
name |
name of the parameter being specified. Must match a name used in one of the simulation settings functions. |
is.int |
logical specifying whether or not the parameter is an integer. |
distr |
a character string giving the distribution to use to select initial values for
parameter estimation. Can be |
min , max
|
minimum and maximum values for the distribution specified in |
value |
character string giving the value that the complex parameter is to take. |
output |
logical indicating if estimates for the parameter should be output. |
bounded |
logical indicating whether to treat the parameter as a bounded estimate. |
reference |
logical indicating whether the parameter is to be used as a reference. |
obs.sfs |
vector, matrix, or list containing observed SFS to use for parameter estimation. |
rules |
character vector giving rules for the parameter estimation. |
sfs.type |
type of SFS to write. Can be |
mat |
numeric matrix or data frame with values of parameters to use in place of parameter names in simulation. |
All settings must be passed to fscWrite
using one
of the fscSettingsXXX
functions. Most of these functions in turn
take as their input comma-separated values which are the result of specific
fscXXX
functions:
comma-separated instances of fscDeme(). If names are given for each deme, these names will be used in the parsed output.
comma-separated instances of fscEvent().
comma-separated migration matrices.
comma-separated instances of fscBlock_dna(), fscBlock_microsat(), fscBlock_snp(), fscBlock_standard(), or fscBlock_freq(). SNPs are simulated as a DNA sequence with a transiton rate of 1. 'fscBlock_freq()' can only be used by itself and in parameter estimation simulations.
comma-separated instances of fscEstParam()
as well as site frequency spectra (obs.sfs
) and
parameter rules rules
.
fastsimcoal2
is not included with 'strataG' and must be downloaded
separately. Additionally, it must be installed such that it can be run
from the command line in the current working directory.
The function fscTutorial()
will open a detailed tutorial on the
interface in your web browser.
Eric Archer [email protected]
Excoffier, L. and Foll, M (2011) fastsimcoal: a continuous-time
coalescent simulator of genomic diversity under arbitrarily complex
evolutionary scenarios Bioinformatics 27: 1332-1334.
Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V.C.,
and M. Foll (2013) Robust demographic inference from genomic and SNP data.
PLOS Genetics, 9(10):e1003905.
http://cmpg.unibe.ch/software/fastsimcoal2/
# three demes with optional names demes <- fscSettingsDemes( Large = fscDeme(10000, 10), Small = fscDeme(2500, 10), Medium = fscDeme(5000, 3, 1500) ) # four historic events events <- fscSettingsEvents( fscEvent(event.time = 2000, source = 1, sink = 2, prop.migrants = 0.05), fscEvent(2980, 1, 1, 0, 0.04), fscEvent(3000, 1, 0), fscEvent(15000, 0, 2, new.size = 3) ) # four genetic blocks of different types on three chromosomes. genetics <- fscSettingsGenetics( fscBlock_snp(10, 1e-6, chromosome = 1), fscBlock_dna(10, 1e-5, chromosome = 1), fscBlock_microsat(3, 1e-4, chromosome = 2), fscBlock_standard(5, 1e-3, chromosome = 3) ) #' same four genetic blocks of different types with same structure repeated three times. genetics <- fscSettingsGenetics( fscBlock_snp(10, 1e-6), fscBlock_dna(10, 1e-5), fscBlock_microsat(3, 1e-4), fscBlock_standard(5, 1e-3), num.chrom = 3 )
# three demes with optional names demes <- fscSettingsDemes( Large = fscDeme(10000, 10), Small = fscDeme(2500, 10), Medium = fscDeme(5000, 3, 1500) ) # four historic events events <- fscSettingsEvents( fscEvent(event.time = 2000, source = 1, sink = 2, prop.migrants = 0.05), fscEvent(2980, 1, 1, 0, 0.04), fscEvent(3000, 1, 0), fscEvent(15000, 0, 2, new.size = 3) ) # four genetic blocks of different types on three chromosomes. genetics <- fscSettingsGenetics( fscBlock_snp(10, 1e-6, chromosome = 1), fscBlock_dna(10, 1e-5, chromosome = 1), fscBlock_microsat(3, 1e-4, chromosome = 2), fscBlock_standard(5, 1e-3, chromosome = 3) ) #' same four genetic blocks of different types with same structure repeated three times. genetics <- fscSettingsGenetics( fscBlock_snp(10, 1e-6), fscBlock_dna(10, 1e-5), fscBlock_microsat(3, 1e-4), fscBlock_standard(5, 1e-3), num.chrom = 3 )
Read arlequin formatted output or parameter estimation files generated by fastsimcoal
fscReadArp( p, sim = c(1, 1), marker = c("all", "snp", "microsat", "dna", "standard"), chrom = NULL, sep.chrom = FALSE, drop.mono = FALSE, as.genotypes = TRUE, one.col = FALSE, sep = "/", coded.snps = FALSE ) fscReadParamEst(p) fscReadSFS(p, sim = 1) fsc2gtypes(p, marker = c("dna", "snp", "microsat"), concat.dna = TRUE, ...)
fscReadArp( p, sim = c(1, 1), marker = c("all", "snp", "microsat", "dna", "standard"), chrom = NULL, sep.chrom = FALSE, drop.mono = FALSE, as.genotypes = TRUE, one.col = FALSE, sep = "/", coded.snps = FALSE ) fscReadParamEst(p) fscReadSFS(p, sim = 1) fsc2gtypes(p, marker = c("dna", "snp", "microsat"), concat.dna = TRUE, ...)
p |
list of fastsimcoal parameters output from |
sim |
one or two-element numberic vector giving the number of the
simulation replicate (and sub-replicate) to read. For example, |
marker |
type of marker to return. |
chrom |
numerical vector giving chromosomes to return. If |
sep.chrom |
return a list of separate chromosomes? |
drop.mono |
return only polymorphic loci? |
as.genotypes |
return data as genotypes? If |
one.col |
return genotypes with one column per locus? If |
sep |
character to use to separate alleles if |
coded.snps |
return diploid SNPs coded as 0 (major allele homozygote), 1
(heterozygote), or 2 (minor allele homozygote). If this is |
concat.dna |
logical. concatenate multiple DNA blocks into single locus? |
... |
arguments to be passed to |
Reads and parses Arlequin-formatted .arp output files
created by fastsimcoal2
. Returns a data frame of genotypes, with
individuals created by combining haplotypes based on the stored value of
ploidy specified when the simulation was run.
Reads and parses files output from a
fastsimcoal2
run conducted for parameter estimation. Returns a list
of data frames and vectors containing the data from each file.
Reads site frequency spectra generated from
fastsimcoal2
. Returns a list of the marginal and joint SFS, the
polymorphic sites, and the estimated maximum likelihood of the SFS."
Creates a gtypes object from fastsimcoal2 output.
fastsimcoal2
is not included with 'strataG' and must be
downloaded separately. Additionally, it must be installed such that it can
be run from the command line in the current working directory.
The function fscTutorial()
will open a detailed tutorial on the
interface in your web browser.
Eric Archer [email protected]
Excoffier, L. and Foll, M (2011) fastsimcoal: a continuous-time
coalescent simulator of genomic diversity under arbitrarily complex
evolutionary scenarios Bioinformatics 27: 1332-1334.
Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V.C.,
and M. Foll (2013) Robust demographic inference from genomic and SNP data.
PLOS Genetics, 9(10):e1003905.
http://cmpg.unibe.ch/software/fastsimcoal2/
## Not run: #' # three demes with optional names demes <- fscSettingsDemes( Large = fscDeme(10000, 10), Small = fscDeme(2500, 10), Medium = fscDeme(5000, 3, 1500) ) # four historic events events <- fscSettingsEvents( fscEvent(event.time = 2000, source = 1, sink = 2, prop.migrants = 0.05), fscEvent(2980, 1, 1, 0, 0.04), fscEvent(3000, 1, 0), fscEvent(15000, 0, 2, new.size = 3) ) # four genetic blocks of different types on three chromosomes. genetics <- fscSettingsGenetics( fscBlock_snp(10, 1e-6, chromosome = 1), fscBlock_dna(10, 1e-5, chromosome = 1), fscBlock_microsat(3, 1e-4, chromosome = 2), fscBlock_standard(5, 1e-3, chromosome = 3) ) params <- fscWrite(demes = demes, events = events, genetics = genetics) # runs 100 replicates, converting all DNA sequences to 0/1 SNPs # will also output the MAF site frequency spectra (SFS) for all SNP loci. params <- fscRun(params, num.sim = 100, dna.to.snp = TRUE, num.cores = 3) # extracting only microsattelite loci from simulation replicate 1 msats <- fscReadArp(params, marker = "microsat") # read SNPs from simulation replicate 5 with genotypes coded as 0/1 snp.5 <- fscReadArp(params, sim = 1, marker = "snp", coded.snps = TRUE # read SFS for simulation 20 sfs.20 <- fscReadSFS(params, sim = 20) ## End(Not run)
## Not run: #' # three demes with optional names demes <- fscSettingsDemes( Large = fscDeme(10000, 10), Small = fscDeme(2500, 10), Medium = fscDeme(5000, 3, 1500) ) # four historic events events <- fscSettingsEvents( fscEvent(event.time = 2000, source = 1, sink = 2, prop.migrants = 0.05), fscEvent(2980, 1, 1, 0, 0.04), fscEvent(3000, 1, 0), fscEvent(15000, 0, 2, new.size = 3) ) # four genetic blocks of different types on three chromosomes. genetics <- fscSettingsGenetics( fscBlock_snp(10, 1e-6, chromosome = 1), fscBlock_dna(10, 1e-5, chromosome = 1), fscBlock_microsat(3, 1e-4, chromosome = 2), fscBlock_standard(5, 1e-3, chromosome = 3) ) params <- fscWrite(demes = demes, events = events, genetics = genetics) # runs 100 replicates, converting all DNA sequences to 0/1 SNPs # will also output the MAF site frequency spectra (SFS) for all SNP loci. params <- fscRun(params, num.sim = 100, dna.to.snp = TRUE, num.cores = 3) # extracting only microsattelite loci from simulation replicate 1 msats <- fscReadArp(params, marker = "microsat") # read SNPs from simulation replicate 5 with genotypes coded as 0/1 snp.5 <- fscReadArp(params, sim = 1, marker = "snp", coded.snps = TRUE # read SFS for simulation 20 sfs.20 <- fscReadSFS(params, sim = 20) ## End(Not run)
Run a fastsimcoal simulation.
fscRun( p, num.sims = 1, dna.to.snp = FALSE, max.snps = 0, sfs.type = c("maf", "daf"), nonpar.boot = NULL, all.sites = TRUE, inf.sites = FALSE, no.arl.output = FALSE, num.loops = 20, min.num.loops = 20, brentol = 0.01, trees = FALSE, num.cores = 1, seed = NULL, quiet = TRUE, exec = "fsc26" ) fscCleanup(label, folder = ".") fscTutorial()
fscRun( p, num.sims = 1, dna.to.snp = FALSE, max.snps = 0, sfs.type = c("maf", "daf"), nonpar.boot = NULL, all.sites = TRUE, inf.sites = FALSE, no.arl.output = FALSE, num.loops = 20, min.num.loops = 20, brentol = 0.01, trees = FALSE, num.cores = 1, seed = NULL, quiet = TRUE, exec = "fsc26" ) fscCleanup(label, folder = ".") fscTutorial()
p |
list of fastsimcoal input parameters and output produced by fscWrite. |
num.sims |
number of simulation replicates to run. |
dna.to.snp |
convert DNA sequences to numerical SNPs? |
max.snps |
maximum number of SNPs to retain. |
sfs.type |
type of site frequency spectrum to compute for each population sample: 'daf' = derived allele frequency (unfolded), 'maf' = minor allele frequency (folded). |
nonpar.boot |
number of bootstraps to perform on polymorphic sites to extract SFS. |
all.sites |
retain all sites? If |
inf.sites |
use infinite sites model? If |
no.arl.output |
do not output arlequin files. |
num.loops |
number of loops (ECM cycles) to be performed when estimating parameters from SFS. Default is 20. |
min.num.loops |
number of loops (ECM cycles) for which the likelihood is computed on both monomorphic and polymorphic sites. Default is 20. |
brentol |
Tolerance level for Brent optimization. Smaller value imply more precise estimations, but require more computation time. Default = 0.01. Value is restricted between 1e-5 and 1e-1. |
trees |
output NEXUS formatted coalescent trees for all replicates? |
num.cores |
number of cores to use. If set to |
seed |
random number seed for simulation. |
quiet |
logical indicating if fastsimcoal2 should be run in quiet mode. |
exec |
name of fastsimcoal executable. |
label |
character string of file run labels prefixes. |
folder |
character string giving the root working folder where input files and output resides |
Runs the fastsimcoal2
simulation and returns a
list containing run parameters and a data frame used by
fscRead
to parse the genotypes generated (if
Arlequin-formatted output was requested).
Deletes all files associated with the simulation
identified by label
.
fastsimcoal2
is not included with 'strataG' and must be
downloaded separately. Additionally, it must be installed such that it can
be run from the command line in the current working directory.
The function fscTutorial()
will open a detailed tutorial on the
interface in your web browser.
Eric Archer [email protected]
Excoffier, L. and Foll, M (2011) fastsimcoal: a continuous-time
coalescent simulator of genomic diversity under arbitrarily complex
evolutionary scenarios Bioinformatics 27: 1332-1334.
Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V.C.,
and M. Foll (2013) Robust demographic inference from genomic and SNP data.
PLOS Genetics, 9(10):e1003905.
http://cmpg.unibe.ch/software/fastsimcoal2/
## Not run: #' # three demes with optional names demes <- fscSettingsDemes( Large = fscDeme(10000, 10), Small = fscDeme(2500, 10), Medium = fscDeme(5000, 3, 1500) ) # four historic events events <- fscSettingsEvents( fscEvent(event.time = 2000, source = 1, sink = 2, prop.migrants = 0.05), fscEvent(2980, 1, 1, 0, 0.04), fscEvent(3000, 1, 0), fscEvent(15000, 0, 2, new.size = 3) ) # four genetic blocks of different types on three chromosomes. genetics <- fscSettingsGenetics( fscBlock_snp(10, 1e-6, chromosome = 1), fscBlock_dna(10, 1e-5, chromosome = 1), fscBlock_microsat(3, 1e-4, chromosome = 2), fscBlock_standard(5, 1e-3, chromosome = 3) ) params <- fscWrite(demes = demes, events = events, genetics = genetics) # runs 100 replicates, converting all DNA sequences to 0/1 SNPs # will also output the MAF site frequency spectra (SFS) for all SNP loci. params <- fscRun(params, num.sim = 100, dna.to.snp = TRUE, num.cores = 3) ## End(Not run)
## Not run: #' # three demes with optional names demes <- fscSettingsDemes( Large = fscDeme(10000, 10), Small = fscDeme(2500, 10), Medium = fscDeme(5000, 3, 1500) ) # four historic events events <- fscSettingsEvents( fscEvent(event.time = 2000, source = 1, sink = 2, prop.migrants = 0.05), fscEvent(2980, 1, 1, 0, 0.04), fscEvent(3000, 1, 0), fscEvent(15000, 0, 2, new.size = 3) ) # four genetic blocks of different types on three chromosomes. genetics <- fscSettingsGenetics( fscBlock_snp(10, 1e-6, chromosome = 1), fscBlock_dna(10, 1e-5, chromosome = 1), fscBlock_microsat(3, 1e-4, chromosome = 2), fscBlock_standard(5, 1e-3, chromosome = 3) ) params <- fscWrite(demes = demes, events = events, genetics = genetics) # runs 100 replicates, converting all DNA sequences to 0/1 SNPs # will also output the MAF site frequency spectra (SFS) for all SNP loci. params <- fscRun(params, num.sim = 100, dna.to.snp = TRUE, num.cores = 3) ## End(Not run)
Write files necessary to run a fastsimcoal2
simulation.
fscWrite( demes, genetics, migration = NULL, events = NULL, est = NULL, def = NULL, label = "strataG.fsc", use.wd = FALSE )
fscWrite( demes, genetics, migration = NULL, events = NULL, est = NULL, def = NULL, label = "strataG.fsc", use.wd = FALSE )
demes |
matrix of deme sampling information created by the
|
genetics |
data.frame specifying loci to simulate created by the
|
migration |
a list of matrices giving the migration rates
between pairs of demes created by the |
events |
matrix of historical events created by the
|
est |
list of parameter estimation definitions and rules generated by
the |
def |
matrix of parameter values to substitute into the model generated
by the |
label |
character string used to label output files for the simulation. |
use.wd |
use current working directory for input and output? If
|
Writes input files for fastsimcoal2
and returns a list of
input parameters, input file, and input filenames. This list is the primary
input to fscRun
.
fastsimcoal2
is not included with 'strataG' and must be
downloaded separately. Additionally, it must be installed such that it can
be run from the command line in the current working directory.
The function fscTutorial()
will open a detailed tutorial on the
interface in your web browser.
Eric Archer [email protected]
Excoffier, L. and Foll, M (2011) fastsimcoal: a continuous-time
coalescent simulator of genomic diversity under arbitrarily complex
evolutionary scenarios Bioinformatics 27: 1332-1334.
Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V.C.,
and M. Foll (2013) Robust demographic inference from genomic and SNP data.
PLOS Genetics, 9(10):e1003905.
http://cmpg.unibe.ch/software/fastsimcoal2/
## Not run: #' # three demes with optional names demes <- fscSettingsDemes( Large = fscDeme(10000, 10), Small = fscDeme(2500, 10), Medium = fscDeme(5000, 3, 1500) ) # four historic events events <- fscSettingsEvents( fscEvent(event.time = 2000, source = 1, sink = 2, prop.migrants = 0.05), fscEvent(2980, 1, 1, 0, 0.04), fscEvent(3000, 1, 0), fscEvent(15000, 0, 2, new.size = 3) ) # four genetic blocks of different types on three chromosomes. genetics <- fscSettingsGenetics( fscBlock_snp(10, 1e-6, chromosome = 1), fscBlock_dna(10, 1e-5, chromosome = 1), fscBlock_microsat(3, 1e-4, chromosome = 2), fscBlock_standard(5, 1e-3, chromosome = 3) ) params <- fscWrite(demes = demes, events = events, genetics = genetics) ## End(Not run)
## Not run: #' # three demes with optional names demes <- fscSettingsDemes( Large = fscDeme(10000, 10), Small = fscDeme(2500, 10), Medium = fscDeme(5000, 3, 1500) ) # four historic events events <- fscSettingsEvents( fscEvent(event.time = 2000, source = 1, sink = 2, prop.migrants = 0.05), fscEvent(2980, 1, 1, 0, 0.04), fscEvent(3000, 1, 0), fscEvent(15000, 0, 2, new.size = 3) ) # four genetic blocks of different types on three chromosomes. genetics <- fscSettingsGenetics( fscBlock_snp(10, 1e-6, chromosome = 1), fscBlock_dna(10, 1e-5, chromosome = 1), fscBlock_microsat(3, 1e-4, chromosome = 2), fscBlock_standard(5, 1e-3, chromosome = 3) ) params <- fscWrite(demes = demes, events = events, genetics = genetics) ## End(Not run)
Calculate Fu's Fs for a set of sequences to test for selection.
fusFs(x)
fusFs(x)
x |
set of DNA sequences or a haploid gtypes object with sequences. |
Currently, this function is limited to calculating Fs for fewer than
approximately 172 sequences due to numerical overflow issues. NaN
will be returned for larger data sets. Statistical significance (p-values)
of Fs must be calculated with case-specific simulations.
Eric Archer [email protected]
Fu, Y-X. 1997. Statistical tests of neutrality of mutations against population growth, hitchiking and background selection. Genetics 147:915-925.
data(dolph.seqs) fusFs(dolph.seqs)
data(dolph.seqs) fusFs(dolph.seqs)
Run a GELATo test to evaluate assignment likelihoods of groups of samples.
gelato(g, unknown.strata, nrep = 1000, min.sample.size = 5, num.cores = 1) gelatoPlot(gelato.result, unknown = NULL, main = NULL)
gelato(g, unknown.strata, nrep = 1000, min.sample.size = 5, num.cores = 1) gelatoPlot(gelato.result, unknown = NULL, main = NULL)
g |
a gtypes object. |
unknown.strata |
a character vector listing to assign. Strata must
occur in |
nrep |
number of permutation replicates for Fst distribution. |
min.sample.size |
minimum number of samples to use to characterize knowns. If the known sample size would be smaller than this after drawing an equivalent number of unknowns for self-assignment, then the comparison is not done. |
num.cores |
The number of cores to use to distribute replicates over.
If set to |
gelato.result |
the result of a call to |
unknown |
the names of the unknown strata in the |
main |
main label for top of plot.#' |
A list with the following elements:
assign.prob |
a data.frame of assignment probabilities. |
likelihoods |
a list of likelihoods. |
Eric Archer [email protected]
O'Corry-Crowe, G., W. Lucey, F.I. Archer, and B. Mahoney. 2015. The genetic ecology and population origins of the beluga whales of Yakutat Bay. Marine Fisheries Review 77(1):47-58
## Not run: data(msats.g) # Run GELATo analysis gelato.fine <- gelato(msats.g, unk = "Offshore.South", nrep = 20, num.cores = 2) gelato.fine # Plot results gelatoPlot(gelato.fine, "Offshore.South") ## End(Not run)
## Not run: data(msats.g) # Run GELATo analysis gelato.fine <- gelato(msats.g, unk = "Offshore.South", nrep = 20, num.cores = 2) gelato.fine # Plot results gelatoPlot(gelato.fine, "Offshore.South") ## End(Not run)
Format output files and run GENEPOP. Filenames used are returned so that output files can be viewed or read and parsed into R.
genepop( g, output.ext = "", show.output = F, label = "genepop.run", dem = 10000, batches = 100, iter = 5000, other.settings = "", input.fname = "loc_data.txt", exec = "Genepop" ) genepopWrite(g, label = NULL)
genepop( g, output.ext = "", show.output = F, label = "genepop.run", dem = 10000, batches = 100, iter = 5000, other.settings = "", input.fname = "loc_data.txt", exec = "Genepop" ) genepopWrite(g, label = NULL)
g |
a |
output.ext |
character string to use as extension for output files. |
show.output |
logical. Show GENEPOP output on console? |
label |
character string to use to label GENEPOP input and output files. |
dem |
integer giving the number of MCMC dememorisation or burnin steps. |
batches |
integer giving number of MCMC batches. |
iter |
integer giving number of MCMC iterations. |
other.settings |
character string of optional GENEPOP command line arguments. |
input.fname |
character string to use for input file name. |
exec |
name of Genepop executable |
genepop
a list with a vector of the locus names and a vector of the input and output filenames.
genepopWrite
a list with the filename written and a vector mapping locus names in the file to the original locus names.
GENEPOP is not included with strataG
and must be downloaded
separately. Additionally, it must be installed such that it can be run from
the command line in the current working directory. See the vignette
for external.programs
for installation instructions.
Eric Archer [email protected]
GENEPOP 4.3 (08 July 2014; Rousset, 2008)
http://kimura.univ-montp2.fr/~rousset/Genepop.htm
## Not run: # Estimate Nm for the microsatellite data data(msats.g) # Run Genepop for Option 4 results <- genepop(msats.g, output.ext = ".PRI", other.settings = "MenuOptions=4") # Locus name mapping and files results # Show contents of output file file.show(results$files["output.fname"]) ## End(Not run)
## Not run: # Estimate Nm for the microsatellite data data(msats.g) # Run Genepop for Option 4 results <- genepop(msats.g, output.ext = ".PRI", other.settings = "MenuOptions=4") # Locus name mapping and files results # Show contents of output file file.show(results$files["output.fname"]) ## End(Not run)
gtypes
ClassAn S4 class storing multi-allelic locus or sequence data along with a current stratification and option stratification schemes.
data
a data.table where the first column contains the sample ID
(ids
). The second column contains the sample stratification
(strata
). The third column to the end contains the allelic data as
one column per locus. Alleles are on multiple rows per column with sample
IDs duplicated for all alleles. Column names are unique locus names.
sequences
a multidna object.
ploidy
integer representing the ploidy of the data. There are ploidy * the number of individuals rows in 'data'.
schemes
a data.frame with stratification schemes in each column. The rownames are individual names and must match the 'id' column of the 'data' slot. Each column is a factor.
description
a label for the object (optional).
other
a slot to carry other related information - currently unused in analyses (optional).
Eric Archer [email protected]
df2gtypes
, sequence2gtypes
,
gtypes.accessors
, gtypes.initialize
#--- create a diploid (microsatellite) gtypes object data(dolph.msats) data(dolph.strata) strata.schemes <- dolph.strata[, c("broad", "fine")] rownames(strata.schemes) <- dolph.strata$id msats.g <- new("gtypes", gen.data = dolph.msats[, -1], ploidy = 2, ind.names = dolph.msats[, 1], schemes = strata.schemes) msats.g #--- create a haploid sequence (mtDNA) gtypes object and label haplotypes data(dolph.seqs) dloop.haps <- cbind(dLoop = dolph.strata$id) rownames(dloop.haps) <- dolph.strata$id dloop.g <- new("gtypes", gen.data = dloop.haps, ploidy = 1, schemes = strata.schemes, sequences = dolph.seqs, strata = "fine") dloop.g labelHaplotypes(dloop.g, "Hap.")
#--- create a diploid (microsatellite) gtypes object data(dolph.msats) data(dolph.strata) strata.schemes <- dolph.strata[, c("broad", "fine")] rownames(strata.schemes) <- dolph.strata$id msats.g <- new("gtypes", gen.data = dolph.msats[, -1], ploidy = 2, ind.names = dolph.msats[, 1], schemes = strata.schemes) msats.g #--- create a haploid sequence (mtDNA) gtypes object and label haplotypes data(dolph.seqs) dloop.haps <- cbind(dLoop = dolph.strata$id) rownames(dloop.haps) <- dolph.strata$id dloop.g <- new("gtypes", gen.data = dloop.haps, ploidy = 1, schemes = strata.schemes, sequences = dolph.seqs, strata = "fine") dloop.g labelHaplotypes(dloop.g, "Hap.")
gtypes
AccessorsAccessors for slots in gtypes objects.
## S4 method for signature 'gtypes' getNumInd(x, by.strata = FALSE, ...) ## S4 method for signature 'gtypes' getNumLoci(x, ...) getNumStrata(x, ...) ## S4 method for signature 'gtypes' getNumStrata(x, ...) getIndNames(x, ...) ## S4 method for signature 'gtypes' getIndNames(x, by.strata = FALSE, ...) getLociNames(x, ...) ## S4 method for signature 'gtypes' getLociNames(x, ...) getAlleleNames(x, ...) ## S4 method for signature 'gtypes' getAlleleNames(x, ...) getStrataNames(x, ...) ## S4 method for signature 'gtypes' getStrataNames(x, ...) getPloidy(x, ...) ## S4 method for signature 'gtypes' getPloidy(x, ...) getStrata(x, ...) ## S4 method for signature 'gtypes' getStrata(x) setStrata(x) <- value ## S4 replacement method for signature 'gtypes' setStrata(x) <- value getSchemes(x, ...) ## S4 method for signature 'gtypes' getSchemes(x, ...) setSchemes(x) <- value ## S4 replacement method for signature 'gtypes' setSchemes(x) <- value getSequences(x, ...) ## S4 method for signature 'gtypes' getSequences( x, as.haplotypes = FALSE, seqName = NULL, as.multidna = FALSE, simplify = TRUE, ... ) getDescription(x, ...) ## S4 method for signature 'gtypes' getDescription(x, ...) setDescription(x) <- value ## S4 replacement method for signature 'gtypes' setDescription(x) <- value getOther(x, ...) ## S4 method for signature 'gtypes' getOther(x, value = NULL, ...) setOther(x, name) <- value ## S4 replacement method for signature 'gtypes' setOther(x, name) <- value ## S4 method for signature 'gtypes,ANY,ANY,ANY' x[i, j, k, ..., quiet = TRUE, drop = FALSE]
## S4 method for signature 'gtypes' getNumInd(x, by.strata = FALSE, ...) ## S4 method for signature 'gtypes' getNumLoci(x, ...) getNumStrata(x, ...) ## S4 method for signature 'gtypes' getNumStrata(x, ...) getIndNames(x, ...) ## S4 method for signature 'gtypes' getIndNames(x, by.strata = FALSE, ...) getLociNames(x, ...) ## S4 method for signature 'gtypes' getLociNames(x, ...) getAlleleNames(x, ...) ## S4 method for signature 'gtypes' getAlleleNames(x, ...) getStrataNames(x, ...) ## S4 method for signature 'gtypes' getStrataNames(x, ...) getPloidy(x, ...) ## S4 method for signature 'gtypes' getPloidy(x, ...) getStrata(x, ...) ## S4 method for signature 'gtypes' getStrata(x) setStrata(x) <- value ## S4 replacement method for signature 'gtypes' setStrata(x) <- value getSchemes(x, ...) ## S4 method for signature 'gtypes' getSchemes(x, ...) setSchemes(x) <- value ## S4 replacement method for signature 'gtypes' setSchemes(x) <- value getSequences(x, ...) ## S4 method for signature 'gtypes' getSequences( x, as.haplotypes = FALSE, seqName = NULL, as.multidna = FALSE, simplify = TRUE, ... ) getDescription(x, ...) ## S4 method for signature 'gtypes' getDescription(x, ...) setDescription(x) <- value ## S4 replacement method for signature 'gtypes' setDescription(x) <- value getOther(x, ...) ## S4 method for signature 'gtypes' getOther(x, value = NULL, ...) setOther(x, name) <- value ## S4 replacement method for signature 'gtypes' setOther(x, name) <- value ## S4 method for signature 'gtypes,ANY,ANY,ANY' x[i, j, k, ..., quiet = TRUE, drop = FALSE]
x |
a gtypes object. |
by.strata |
logical - return results by strata? |
... |
other arguments passed from generics (ignored). |
value |
value being assigned by accessor. |
as.haplotypes |
return sequences as haplotypes? If |
seqName |
the name (or number) of a set of sequences from the
|
as.multidna |
return sequences as a multidna object? If
|
simplify |
if 'getSequences()' would return a single locus, return it as a 'DNAbin' object ('TRUE'), or a single element named list ('FALSE'). |
name |
name of the value going into the |
i , j , k
|
subsetting slots for individuals ( |
quiet |
suppress warnings about unmatched requested individuals, loci, or strata? |
drop |
if |
Indexing a gtypes
object with integers, characters, or
logicals with the [
operator follows the same rules as normal indexing
in R. The order that individuals, loci, and strata are chosen is the order
returned by getIndNames
, getLocNames
, and getStrataNames
respectively. If unstratified samples are present, they can be selected as a
group either by including NA
in the character or numeric vector of the
k
slot, or by providing a logical vector based on
is.na(strata(g))
to the i
slot.
number of individuals
number of loci
number of strata
vector of individual/sample names
vector of locus names
vector of strata names for current scheme
number of alleles at each locus
contents of
@other
slot
return or modify the current stratification
return or modify the current stratification schemes
return a list of alleles at each locus
return the multidna object in the
@sequences
slot. See getSequences
to extract
individual genes or sequences from this object
return the object's description
Eric Archer [email protected]
#--- create a diploid (microsatellite) gtypes object data(msats.g) msats.g <- stratify(msats.g, "fine") getNumStrata(msats.g) getStrataNames(msats.g) getNumLoci(msats.g) getLociNames(msats.g) # reassign all samples to two randomly chosen strata new.strata <- sample(c("A", "B"), getNumInd(msats.g), rep = TRUE) names(new.strata) <- getIndNames(msats.g) setStrata(msats.g) <- new.strata msats.g #--- a sequence example library(ape) data(woodmouse) genes <- list(gene1=woodmouse[,1:500], gene2=woodmouse[,501:965]) x <- new("multidna", genes) wood.g <- sequence2gtypes(x) new.strata <- sample(c("A", "B"), getNumInd(wood.g), rep = TRUE) names(new.strata) <- getIndNames(wood.g) setStrata(wood.g) <- new.strata wood.g # get the multidna sequence object multi.seqs <- getSequences(wood.g, as.multidna = TRUE) class(multi.seqs) # "multidna" # get a list of DNAbin objects dnabin.list <- getSequences(wood.g) class(dnabin.list) # "list" # get a DNAbin object of the first locus dnabin.1 <- getSequences(wood.g)[[1]] class(dnabin.1) # "DNAbin" # getting and setting values in the `other` slot: getOther(dloop.g) setOther(dloop.g, "timestamp") <- timestamp() setOther(dloop.g, "Author") <- "Hoban Washburne" getOther(dloop.g) getOther(dloop.g, "timestamp") setOther(dloop.g, "Author") <- NULL getOther(dloop.g)
#--- create a diploid (microsatellite) gtypes object data(msats.g) msats.g <- stratify(msats.g, "fine") getNumStrata(msats.g) getStrataNames(msats.g) getNumLoci(msats.g) getLociNames(msats.g) # reassign all samples to two randomly chosen strata new.strata <- sample(c("A", "B"), getNumInd(msats.g), rep = TRUE) names(new.strata) <- getIndNames(msats.g) setStrata(msats.g) <- new.strata msats.g #--- a sequence example library(ape) data(woodmouse) genes <- list(gene1=woodmouse[,1:500], gene2=woodmouse[,501:965]) x <- new("multidna", genes) wood.g <- sequence2gtypes(x) new.strata <- sample(c("A", "B"), getNumInd(wood.g), rep = TRUE) names(new.strata) <- getIndNames(wood.g) setStrata(wood.g) <- new.strata wood.g # get the multidna sequence object multi.seqs <- getSequences(wood.g, as.multidna = TRUE) class(multi.seqs) # "multidna" # get a list of DNAbin objects dnabin.list <- getSequences(wood.g) class(dnabin.list) # "list" # get a DNAbin object of the first locus dnabin.1 <- getSequences(wood.g)[[1]] class(dnabin.1) # "DNAbin" # getting and setting values in the `other` slot: getOther(dloop.g) setOther(dloop.g, "timestamp") <- timestamp() setOther(dloop.g, "Author") <- "Hoban Washburne" getOther(dloop.g) getOther(dloop.g, "timestamp") setOther(dloop.g, "Author") <- NULL getOther(dloop.g)
gtypes
And genind
objects.Convert a gtypes
object to a genind
object
and vice-versa.
gtypes2genind(x, type = c("codom", "PA")) genind2gtypes(x)
gtypes2genind(x, type = c("codom", "PA")) genind2gtypes(x)
x |
|
type |
a character string indicating the type of marker for genind objects: 'codom' stands for 'codominant' (e.g. microstallites, allozymes); 'PA' stands for 'presence/absence' markers (e.g. AFLP, RAPD). |
Eric Archer [email protected]
initialize.gtypes, df2gtypes, sequence2gtypes, as.data.frame.gtypes, gtypes2loci
data(msats.g) # Convert to genind gi <- gtypes2genind(msats.g) gi # Convert to gtypes gt <- genind2gtypes(gi) gt
data(msats.g) # Convert to genind gi <- gtypes2genind(msats.g) gi # Convert to gtypes gt <- genind2gtypes(gi) gt
gtypes
And genlight
objects.Convert a gtypes
object to a genlight
object
and vice-versa.
gtypes2genlight(x) genlight2gtypes(x)
gtypes2genlight(x) genlight2gtypes(x)
x |
Eric Archer [email protected]
initialize.gtypes, df2gtypes, sequence2gtypes, as.data.frame.gtypes, gtypes2loci, gtypes2genind
data(msats.g) # Create simple simulated SNPs gl1 <- adegenet::glSim(n.ind = 100, n.snp.nonstruc = 1000, ploidy = 2) gl1 # Convert to gtypes gt <- genlight2gtypes(gl1) gt # Convert back to genlight gl2 <- gtypes2genlight(gt) gl2
data(msats.g) # Create simple simulated SNPs gl1 <- adegenet::glSim(n.ind = 100, n.snp.nonstruc = 1000, ploidy = 2) gl1 # Convert to gtypes gt <- genlight2gtypes(gl1) gt # Convert back to genlight gl2 <- gtypes2genlight(gt) gl2
gtypes
And loci
objects.Convert a gtypes
object to a loci
object.
gtypes2loci(x, sep = "/") loci2gtypes(x, description = NULL, sep = "/")
gtypes2loci(x, sep = "/") loci2gtypes(x, description = NULL, sep = "/")
x |
a gtypes or |
sep |
character used to separate alleles at a locus. |
description |
a label for the |
Eric Archer [email protected]
initialize.gtypes, df2gtypes, sequence2gtypes, as.data.frame.gtypes, gtypes2genind
data(msats.g) # Convert to loci lc <- gtypes2loci(msats.g) lc # Convert to gtypes gt <- loci2gtypes(lc) gt
data(msats.g) # Convert to loci lc <- gtypes2loci(msats.g) lc # Convert to gtypes gt <- loci2gtypes(lc) gt
gtypes
And phyDat
objects.Convert a gtypes
object to a phyDat
object.
gtypes2phyDat(x, locus = 1) phyDat2gtypes(x, ...)
gtypes2phyDat(x, locus = 1) phyDat2gtypes(x, ...)
x |
a gtypes or |
locus |
name or number of locus to convert. |
... |
optional arguments passed to |
Eric Archer [email protected]
initialize.gtypes, df2gtypes, sequence2gtypes, as.data.frame.gtypes, as.matrix.gtypes, gtypes2genind, gtypes2loci
data(dloop.g) # Convert to phDat pd <- gtypes2phyDat(dloop.g) pd # Convert to gtypes gt <- phyDat2gtypes(pd) gt
data(dloop.g) # Convert to phDat pd <- gtypes2phyDat(dloop.g) pd # Convert to gtypes gt <- phyDat2gtypes(pd) gt
Calculate observed and heterozygosity.
heterozygosity(g, by.strata = FALSE, type = c("expected", "observed"))
heterozygosity(g, by.strata = FALSE, type = c("expected", "observed"))
g |
a gtypes object. |
by.strata |
logical - return results by strata? |
type |
return |
If g
is a haploid object with sequences, the value for
expected heterozygosity (= haplotpyic diversity) will be returned.
Eric Archer [email protected]
data(msats.g) # Expected heterozygosity heterozygosity(msats.g, type = "expected") # Observed heterozygosity by strata heterozygosity(msats.g, FALSE, "observed")
data(msats.g) # Expected heterozygosity heterozygosity(msats.g, type = "expected") # Observed heterozygosity by strata heterozygosity(msats.g, FALSE, "observed")
Calculate Hardy-Weinberg equilibrium p-values.
hweTest( g, use.genepop = FALSE, which = c("Proba", "excess", "deficit"), enumeration = FALSE, dememorization = 10000, batches = 20, num.rep = 5000, delete.files = TRUE, label = NULL )
hweTest( g, use.genepop = FALSE, which = c("Proba", "excess", "deficit"), enumeration = FALSE, dememorization = 10000, batches = 20, num.rep = 5000, delete.files = TRUE, label = NULL )
g |
a gtypes object. |
use.genepop |
logical. Use GENEPOP to calculate HWE p-values?
If |
which , enumeration , dememorization , batches
|
parameters for GENEPOP MCMC
HWE procedure as defined in |
num.rep |
the number of replicates for the Monte Carlo procedure for
|
delete.files |
logical. Delete GENEPOP files when done? |
label |
character string to use to label GENEPOP files. |
a vector of p-values for each locus.
Eric Archer [email protected]
data(msats.g) hweTest(msats.g)
data(msats.g) hweTest(msats.g)
gtypes
ConstructorCreate a new gtypes object using
new("gtypes", ...)
, where '...
' are arguments
documented below.
## S4 method for signature 'gtypes' initialize( .Object, gen.data, ploidy, ind.names = NULL, sequences = NULL, strata = NULL, schemes = NULL, description = NULL, other = NULL, remove.sequences = FALSE )
## S4 method for signature 'gtypes' initialize( .Object, gen.data, ploidy, ind.names = NULL, sequences = NULL, strata = NULL, schemes = NULL, description = NULL, other = NULL, remove.sequences = FALSE )
.Object |
the object skeleton, automatically generated when
calling |
gen.data |
a vector, matrix, or data.frame containing the alleles at each locus. See below for more details. |
ploidy |
ploidy of the loci. |
ind.names |
an optional vector of individual sample names. |
sequences |
an optional multidna object containing sequences represented by each locus. |
strata |
an optional stratification scheme from |
schemes |
an optional data.frame of stratification schemes. |
description |
an optional description for the object. |
other |
other optional information to include. |
remove.sequences |
logical. If |
For multi-allele loci, the gen.data
argument should be
formatted such that every consecutive ploidy
columns represent
alleles of one locus. Locus names are taken from the column names in
gen.data
and should be formatted with the same root locus name, with
unique suffixes representing alleles (e.g., for Locus1234: Locus1234.1
and Locus1234.2, or Locus1234_A and Locus1234_B).
If gen.data
is a vector it is assumed to represent haplotypes of a
haploid marker.
Sample names can be either in the rownames of gen.data
or given
separately in ind.names
. If ind.names
are provided, these are
used in lieu of rownames in gen.data
.
If schemes
has a column named 'id
', it will be used to match
to sample names in gen.data
. Otherwise, if rownames are present in
schemes
, a column named 'id
' will be created from them.
If sequences are provided in sequences
, then they should be named
and match values in the haplotype column in gen.data
. If multiple
genes are given as a multidna object, it is assumed that they
are in the same order as the columns in gen.data
.
Eric Archer [email protected]
gtypes
Test if object is gtypes
is.gtypes(x)
is.gtypes(x)
x |
R object to be tested. |
Logical stating if 'x' is a gtypes
object.
Eric Archer [email protected]
data(msats.g) is.gtypes(msats.g) # TRUE data(dolph.msats) is.gtypes(dolph.msats) # FALSE
data(msats.g) is.gtypes(msats.g) # TRUE data(dolph.msats) is.gtypes(dolph.msats) # FALSE
Calculate the correct IUPAC code for a vector of nucleotides.
iupacCode(bases, ignore.gaps = FALSE) validIupacCodes(bases) iupacMat()
iupacCode(bases, ignore.gaps = FALSE) validIupacCodes(bases) iupacMat()
bases |
character vector containing valid nucleotides or IUPAC codes. |
ignore.gaps |
logical. Ignore gaps at a site when creating consensus. If true, then bases with a gap are removed before consensus is calculated. If false and a gap is present, then the result is a gap. |
iupacCode |
a character representing the correct IUPAC code
bases . |
validIupacCodes |
a character vector of all valid IUPAC
codes for bases . |
iupacMat |
a logical matrix identifying valid IUPAC codes. |
Eric Archer [email protected]
iupacCode(c("a", "a", "g")) iupacCode(c("t", "c", "g")) validIupacCodes(c("c", "t", "c", "c")) validIupacCodes(c("c", "y", "c", "c")) validIupacCodes(c("a", "g", "t", "a"))
iupacCode(c("a", "a", "g")) iupacCode(c("t", "c", "g")) validIupacCodes(c("c", "t", "c", "c")) validIupacCodes(c("c", "y", "c", "c")) validIupacCodes(c("a", "g", "t", "a"))
Identify and group sequences that share the same haplotype.
labelHaplotypes(x, prefix = NULL, use.indels = TRUE) ## Default S3 method: labelHaplotypes(x, prefix = NULL, use.indels = TRUE) ## S3 method for class 'list' labelHaplotypes(x, ...) ## S3 method for class 'character' labelHaplotypes(x, ...) ## S3 method for class 'gtypes' labelHaplotypes(x, ...)
labelHaplotypes(x, prefix = NULL, use.indels = TRUE) ## Default S3 method: labelHaplotypes(x, prefix = NULL, use.indels = TRUE) ## S3 method for class 'list' labelHaplotypes(x, ...) ## S3 method for class 'character' labelHaplotypes(x, ...) ## S3 method for class 'gtypes' labelHaplotypes(x, ...)
x |
sequences in a |
prefix |
a character string giving prefix to be applied to numbered haplotypes. If NULL, haplotypes will be labeled with the first label from original sequences. |
use.indels |
logical. Use indels when comparing sequences? |
... |
arguments to be passed to |
If any sequences contain ambiguous bases (N's) they are first
removed. Then haplotypes are assigned based on the remaining
sequences. The sequences with N's that were removed are then assigned to
the new haplotypes if it can be done unambiguously (they match only one
haplotype with 0 differences once the N's have been removed). If this
can't be done they are assigned NAs and listed in the
unassigned
element.
For character
, list
, or DNAbin
, a list with the following elements:
named vector (DNAbin
) or list of named vectors
(multidna
) of haplotypes for each sequence in x
.
DNAbin
or multidna
object containing
sequences for each haplotype.
data.frame
listing closest matching haplotypes
for unassignable sequences with N's and the minimum number of
substitutions between the two. Will be NULL
if no sequences
remain unassigned.
For gtypes
, a new gtypes
object with unassigned individuals
stored in the @other slot in an element named 'haps.unassigned'
(see
getOther
).
Eric Archer [email protected]
# create 5 example short haplotypes haps <- c( H1 = "ggctagct", H2 = "agttagct", H3 = "agctggct", H4 = "agctggct", H5 = "ggttagct" ) # draw and label 100 samples sample.seqs <- sample(names(haps), 100, rep = TRUE) ids <- paste(sample.seqs, 1:length(sample.seqs), sep = "_") sample.seqs <- lapply(sample.seqs, function(x) strsplit(haps[x], "")[[1]]) names(sample.seqs) <- ids # add 1-2 random ambiguities with.error <- sample(1:length(sample.seqs), 10) for(i in with.error) { num.errors <- sample(1:2, 1) sites <- sample(1:length(sample.seqs[[i]]), num.errors) sample.seqs[[i]][sites] <- "n" } hap.assign <- labelHaplotypes(sample.seqs, prefix = "Hap.") hap.assign
# create 5 example short haplotypes haps <- c( H1 = "ggctagct", H2 = "agttagct", H3 = "agctggct", H4 = "agctggct", H5 = "ggttagct" ) # draw and label 100 samples sample.seqs <- sample(names(haps), 100, rep = TRUE) ids <- paste(sample.seqs, 1:length(sample.seqs), sep = "_") sample.seqs <- lapply(sample.seqs, function(x) strsplit(haps[x], "")[[1]]) names(sample.seqs) <- ids # add 1-2 random ambiguities with.error <- sample(1:length(sample.seqs), 10) for(i in with.error) { num.errors <- sample(1:2, 1) sites <- sample(1:length(sample.seqs[[i]]), num.errors) sample.seqs[[i]][sites] <- "n" } hap.assign <- labelHaplotypes(sample.seqs, prefix = "Hap.") hap.assign
'landscape2gtypes' creates a gtypes object from an Rmetasim landscape object. 'landscape2df' creates a data.frame.
landscape2gtypes(Rland) landscape2df(Rland)
landscape2gtypes(Rland) landscape2df(Rland)
Rland |
rmetasim landscape object |
Eric Archer [email protected]
Calculate linkage disequilibrium p-values using GENEPOP.
LDgenepop( g, dememorization = 10000, batches = 100, iterations = 5000, delete.files = TRUE, label = NULL )
LDgenepop( g, dememorization = 10000, batches = 100, iterations = 5000, delete.files = TRUE, label = NULL )
g |
a gtypes object. |
dememorization , batches , iterations
|
parameters for GENEPOP MCMC
LD procedure as defined in |
delete.files |
logical. Delete GENEPOP input and output files when done? |
label |
character string to use to label GENEPOP input and output files. |
data.frame of disequilibrium estimates between pairs of loci
Eric Archer [email protected]
## Not run: data(msats.g) msats.ld <- LDgenepop(msats.g) head(msats.ld) ## End(Not run)
## Not run: data(msats.g) msats.ld <- LDgenepop(msats.g) head(msats.ld) ## End(Not run)
Estimate Ne from linkage disequilibrium based on Pearson correlation approximation following Waples et al 2016. Adapted from code by R. Waples and W. Larson.
ldNe( g, maf.threshold = 0, by.strata = FALSE, ci = 0.95, drop.missing = FALSE, num.cores = 1 )
ldNe( g, maf.threshold = 0, by.strata = FALSE, ci = 0.95, drop.missing = FALSE, num.cores = 1 )
g |
a gtypes object. |
maf.threshold |
smallest minimum allele frequency permitted to include a locus in calculation of Ne. |
by.strata |
apply the 'maf.threshold' by strata. If 'TRUE' then loci that are below this threshold in any strata will be removed from the calculation of Ne for all strata. Loci below 'maf.threshold' within a stratum are always removed for calculations of Ne for that stratum. |
ci |
central confidence interval. |
drop.missing |
drop loci with missing genotypes? If 'FALSE', a slower procedure is used where individuals with missing genotypes are removed in a pairwise fashion. |
num.cores |
The number of cores to use to distribute computations over.
If set to |
a data.frame with one row per strata and the following columns:
stratum
stratum being summarized
S
harmonic mean of sample size across pairwise comparisons of loci
num.comp
number of pairwise loci comparisons used
mean.rsq
mean r^2 over all loci
mean.E.rsq
mean expected r^2 over all loci
Ne
estimated Ne
param.lci, param.uci
parametric lower and upper CIs
Eric Archer [email protected]
Waples, R.S. 2006. A bias correction for estimates of effective
population size based on linkage disequilibrium at unlinked gene loci.
Conservation Genetics 7:167-184.
Waples RK, Larson WA, and Waples RS. 2016. Estimating contemporary
effective population size in non-model species using linkage
disequilibrium across thousands of loci. Heredity 117:233-240;
doi:10.1038/hdy.2016.60
Check nucleotide sites for low frequency substitutions.
lowFreqSubs(x, min.freq = 3, motif.length = 10, simplify = TRUE)
lowFreqSubs(x, min.freq = 3, motif.length = 10, simplify = TRUE)
x |
a |
min.freq |
minimum frequency of base to be flagged. |
motif.length |
length of motif around low frequency base to output. |
simplify |
if there is a single locus, return result in a simplified
form? If |
data.frame listing id, site number, and motif around low frequency base call.
Eric Archer [email protected]
data(dolph.haps) lowFreqSubs(dolph.haps)
data(dolph.haps) lowFreqSubs(dolph.haps)
Calculate minor allele frequencies for each locus.
maf(g, by.strata = FALSE, maf.within = FALSE)
maf(g, by.strata = FALSE, maf.within = FALSE)
g |
a gtypes object. |
by.strata |
logical - return results grouped by strata? |
maf.within |
if |
A vector or matrix of minor allele frequencies at each locus.
Eric Archer [email protected]
data(msats.g) maf(msats.g) # minor allele identified from all indivudals maf(msats.g, by.strata = TRUE) # minor allele identified within each strata maf(msats.g, by.strata = TRUE, maf.within = TRUE)
data(msats.g) maf(msats.g) # minor allele identified from all indivudals maf(msats.g, by.strata = TRUE) # minor allele identified within each strata maf(msats.g, by.strata = TRUE, maf.within = TRUE)
Align a set of sequences using the MAFFT executable.
mafft( x, op = 3, ep = 0.123, maxiterate = 0, quiet = TRUE, num.cores = 1, opts = "--auto", simplify = TRUE )
mafft( x, op = 3, ep = 0.123, maxiterate = 0, quiet = TRUE, num.cores = 1, opts = "--auto", simplify = TRUE )
x |
a list or a matrix of DNA sequences
(see |
op |
gap opening penalty. |
ep |
offset value, which works like gap extension penalty. |
maxiterate |
number cycles of iterative refinement are performed. |
quiet |
logical. Run MAFFT quietly? |
num.cores |
The number of cores to use. If set to |
opts |
character string other options to provide to command line. |
simplify |
if |
a DNAbin
object with aligned sequences.
MAFFT is not included with strataG
and must be downloaded
separately. Additionally, it must be installed such that it can be run from
the command line in the current working directory. See the vignette
for external.programs
for installation instructions.
Eric Archer [email protected]
Katoh, M., Kumar, M. 2002. MAFFT: a novel method for rapid multiple sequence
alignment based on fast Fourier transform. Nucleic Acids Res. 30:3059-3066.
Available at: http://mafft.cbrc.jp/alignment/software
## Not run: data(dolph.seqs) dolph.aln <- mafft(dolph.seqs, op = 3, ep = 2) dolph.aln ## End(Not run)
## Not run: data(dolph.seqs) dolph.aln <- mafft(dolph.seqs, op = 3, ep = 2) dolph.aln ## End(Not run)
Run MavericK clustering algorithm
maverickRun( g, params = NULL, label = "MavericK_files", data_fname = "data.txt", param_fname = "parameters.txt", exec = "Maverick1.0.5" )
maverickRun( g, params = NULL, label = "MavericK_files", data_fname = "data.txt", param_fname = "parameters.txt", exec = "Maverick1.0.5" )
g |
a gtypes object. |
params |
a list specifying parameters for MavericK. All parameters are available and can be specified by partial matching. The function will automatically specify parameters related to data formatting (data, headerRow_on, missingData, ploidy, ploidyCol_on, popCol_on), so those will be ignored. For a full list of available parameters and their definitions, see the MavericK documentation distributed with the program. |
label |
folder where input and output files will be written to. |
data_fname |
file name of data input file. |
param_fname |
file name of parameters file. |
exec |
name of executable for MavericK. |
MavericK is not included with strataG
and must be downloaded
separately. It can be obtained from http://www.bobverity.com/.
Additionally, it must be installed such that it can be run from
the command line in the current working directory. See the vignette
for external.programs
for OS-specific installation instructions.
Eric Archer [email protected]
Robert Verity and Richard Nichols. (2016) Estimating the number of
subpopulations (K) in structured populations. Genetics
Robert Verity and Richard Nichols. (2016) Documentation for MavericK
software: Version 1.0
Read and write MEGA formatted files.
read.mega(file) write.mega( g, file = NULL, label = NULL, line.width = 60, locus = 1, as.haplotypes = TRUE )
read.mega(file) write.mega( g, file = NULL, label = NULL, line.width = 60, locus = 1, as.haplotypes = TRUE )
file |
a MEGA-formatted file of sequences. |
g |
a gtypes object. |
label |
label for MEGA filename (.meg). If |
line.width |
width of sequence lines. |
locus |
number or name of locus to write. |
as.haplotypes |
output sequences as haplotypes? If |
for read.mega
, a list of:
title of MEGA file
DNA sequences in DNAbin
format
Eric Archer [email protected]
Sudhir Kumar, Glen Stecher, and Koichiro Tamura (2015) MEGA7: Molecular Evolutionary Genetics Analysis version 7.0. Molecular Biology and Evolution (submitted). Available at: http://www.megasoftware.net
Finds the set of sequences that are on the edges of the cloud of distances. These are the ones that have the greatest mean pairwise distance and greatest variance in distances.
mostDistantSequences( x, num.seqs = NULL, model = "raw", pairwise.deletion = TRUE, as.haplotypes = TRUE, simplify = TRUE )
mostDistantSequences( x, num.seqs = NULL, model = "raw", pairwise.deletion = TRUE, as.haplotypes = TRUE, simplify = TRUE )
x |
a set of sequences or a gtypes object with sequences. |
num.seqs |
number of sequences to return. If |
model |
a character string specifying the evolutionary model to be used. See dist.dna for more information. |
pairwise.deletion |
a logical indicating whether to delete sites with missing data. See dist.dna for more information. |
as.haplotypes |
treat sequences as haplotypes ( |
simplify |
if there is a single locus, return result in a simplified
form? If |
a vector of the num.seqs
sequence names that are the most
divergent sorted from greatest to least distant.
Eric Archer [email protected]
data(dolph.haps) mostDistantSequences(dolph.haps, 5)
data(dolph.haps) mostDistantSequences(dolph.haps, 5)
Finds the set of sequences that represent the requested number of clusters.
mostRepresentativeSequences( x, num.seqs = NULL, model = "raw", pairwise.deletion = TRUE, as.haplotypes = TRUE, simplify = TRUE )
mostRepresentativeSequences( x, num.seqs = NULL, model = "raw", pairwise.deletion = TRUE, as.haplotypes = TRUE, simplify = TRUE )
x |
a |
num.seqs |
number of sequences to return. If |
model |
a character string specifying the evolutionary model to be used. See dist.dna for more information. |
pairwise.deletion |
a logical indicating whether to delete sites with missing data. See dist.dna for more information. |
as.haplotypes |
treat sequences as haplotypes ( |
simplify |
if there is a single locus, return result in a simplified
form? If |
a vector of the sequence names.
Eric Archer [email protected]
data(dolph.seqs) mostRepresentativeSequences(dolph.seqs, 5) mostRepresentativeSequences(dolph.seqs, 3)
data(dolph.seqs) mostRepresentativeSequences(dolph.seqs, 5) mostRepresentativeSequences(dolph.seqs, 3)
Calculate Garza-Williamson M ratio (bottleneck) statistic for microsattelite data.
mRatio(g, by.strata = FALSE, rpt.size = 8:2)
mRatio(g, by.strata = FALSE, rpt.size = 8:2)
g |
a gtypes object. |
by.strata |
calculate ratio for each stratum separately? |
rpt.size |
set of values to check for allele repeat size. Function will
use the largest common denominator found in this vector or return
|
The function will only compute the metric for microastellite loci,
which is defined as loci with allele labels that can be converted to
numeric values in their entirety and have a fixed repeat size. NA
is
returned for all loci that do not have all numeric alleles.
NA
will also be returned if a locus is monomorphic, the locus has no
genotypes, or a minimum repeat size cannot be found for all alleles at a
locus.
Eric Archer [email protected]
Garza, J.C. and E.G. Williamson. 2001. Detection of reduction in population size using data form microsatellite loci. Molecular Ecology 10(2):305-318.
data(msats.g) m.by.strata <- mRatio(msats.g, TRUE) m.by.strata m.overall <- mRatio(msats.g, FALSE) m.overall
data(msats.g) m.by.strata <- mRatio(msats.g, TRUE) m.by.strata m.overall <- mRatio(msats.g, FALSE) m.overall
A gtypes object of 126 samples and 4 microsatellite loci
data(msats.g)
data(msats.g)
gtypes
Lowther-Thieleking J.L., F.I. Archer, A.R. Lang, and D.W. Weller. 2015. Genetic variation of coastal and offshore bottlenose dolphins, Tursiops truncatus, in the eastern North Pacific Ocean. Marine Mammal Science 31:1-20
Calcuate frequency-based Nei's Da for haploid or diploid data.
neiDa(g)
neiDa(g)
g |
a gtypes object. |
Returns Nei's Da for each pair of strata.
Eric Archer [email protected]
Nei et al 1983 Accuracy of Estimated Phylogenetic Trees from
Molecular Data. J Mol Evol 19:153-170 (eqn 7)
Nei, M., and S. Kumar (2000) Molecular Evolution and Phylogenetics.
Oxford University Press, Oxford. (pp. 268, eqn 13.6)
data(msats.g) neiDa(msats.g)
data(msats.g) neiDa(msats.g)
Calculate distributions of between- and within-strata
nucleotide divergence (sequence distance), which includes
Nei's (usually referred to as "nucleotide diversity") and
Nei's dA between strata.
nucleotideDivergence(g, probs = c(0, 0.025, 0.5, 0.975, 1), model = "raw", ...)
nucleotideDivergence(g, probs = c(0, 0.025, 0.5, 0.975, 1), model = "raw", ...)
g |
a gtypes object. |
probs |
a numeric vector of probabilities of the pairwise distance
distributions with values in |
model |
evolutionary model to be used. see |
... |
other arguments passed to |
a list with summaries of the $within
and $between
strata
pairwise distances including Nei's dA (in $between
).
Nei's is the mean between-strata divergence.
Eric Archer [email protected]
Nei, M., and S. Kumar (2000) Molecular Evolution and Phylogenetics. Oxford University Press, Oxford. (dA: pp. 256, eqn 12.67)
data(dloop.g) nd <- nucleotideDivergence(dloop.g) nd$within nd$between
data(dloop.g) nd <- nucleotideDivergence(dloop.g) nd$within nd$between
Calculate nucleotide diversity for set of sequences. Note that
this is NOT Nei's nucleotide diversity
(usually referred to as ). Nei's
is the mean number
of nucleotide differences between sequences. See
nucleotideDivergence
for this value.
nucleotideDiversity(x, bases = c("a", "c", "g", "t"), simplify = TRUE)
nucleotideDiversity(x, bases = c("a", "c", "g", "t"), simplify = TRUE)
x |
a set of sequences or a gtypes object with sequences. |
bases |
nucleotides to consider when calculating diversity. |
simplify |
if |
Vector of diversity of nucleotides by site.
Eric Archer [email protected]
data(dloop.g) nd <- nucleotideDiversity(dloop.g) quantile(nd)
data(dloop.g) nd <- nucleotideDiversity(dloop.g) quantile(nd)
Return the number of alleles for each locus.
numAlleles(g, by.strata = FALSE)
numAlleles(g, by.strata = FALSE)
g |
a gtypes object. |
by.strata |
logical - return results grouped by strata? |
vector of number of alleles per locus.
Eric Archer [email protected]
data(msats.g) numAlleles(msats.g)
data(msats.g) numAlleles(msats.g)
Return the number of individuals genotyped for each locus.
numGenotyped(g, by.strata = FALSE, prop = FALSE)
numGenotyped(g, by.strata = FALSE, prop = FALSE)
g |
a gtypes object. |
by.strata |
logical - return results grouped by strata? |
prop |
logical determining whether to return proportion missing. |
vector of number of alleles per locus.
Eric Archer [email protected]
data(msats.g) numGenotyped(msats.g)
data(msats.g) numGenotyped(msats.g)
Calculate the number of individuals with missing data by locus.
numMissing(g, by.strata = FALSE, prop = FALSE)
numMissing(g, by.strata = FALSE, prop = FALSE)
g |
a gtypes object. |
by.strata |
logical - return results grouped by strata? |
prop |
logical determining whether to return proportion missing. |
a vector of loci with number (or, if prop = TRUE
,
the proportion) of individuals missing data for at least one allele.
Eric Archer [email protected]
data(msats.g) numMissing(msats.g) numMissing(msats.g, prop = TRUE)
data(msats.g) numMissing(msats.g) numMissing(msats.g, prop = TRUE)
Permute the strata slot within a gtypes object.
permuteStrata(g)
permuteStrata(g)
g |
a gtypes object. |
a gtypes object with the strata randomly permuted.
Eric Archer [email protected]
data(msats.g) msats.g <- stratify(msats.g, "fine") summary(msats.g) ran.msats <- permuteStrata(msats.g) summary(ran.msats)
data(msats.g) msats.g <- stratify(msats.g, "fine") summary(msats.g) ran.msats <- permuteStrata(msats.g) summary(ran.msats)
Run PHASE to estimate the phase of loci in diploid data.
phase( g, loci, positions = NULL, type = NULL, num.iter = 1e+05, thinning = 100, burnin = 1e+05, model = "new", ran.seed = NULL, final.run.factor = NULL, save.posterior = FALSE, in.file = "phase_in", out.file = "phase_out", delete.files = TRUE ) phaseReadSample(out.file, type) phaseReadPair(out.file) phaseWrite( g, loci, positions = NULL, type = rep("S", length(loci)), in.file = "phase_in" ) phasePosterior(ph.res, keep.missing = TRUE) phaseFilter(ph.res, thresh = 0.5, keep.missing = TRUE)
phase( g, loci, positions = NULL, type = NULL, num.iter = 1e+05, thinning = 100, burnin = 1e+05, model = "new", ran.seed = NULL, final.run.factor = NULL, save.posterior = FALSE, in.file = "phase_in", out.file = "phase_out", delete.files = TRUE ) phaseReadSample(out.file, type) phaseReadPair(out.file) phaseWrite( g, loci, positions = NULL, type = rep("S", length(loci)), in.file = "phase_in" ) phasePosterior(ph.res, keep.missing = TRUE) phaseFilter(ph.res, thresh = 0.5, keep.missing = TRUE)
g |
a gtypes object. |
loci |
vector or data.frame of loci in 'g' that are to be phased. If a
data.frame, it should have columns named
|
positions |
position along chromosome of each locus. |
type |
type of each locus. |
num.iter |
number of PHASE MCMC iterations. |
thinning |
number of PHASE MCMC iterations to thin by. |
burnin |
number of PHASE MCMC iterations for burnin. |
model |
PHASE model type. |
ran.seed |
PHASE random number seed. |
final.run.factor |
optional. |
save.posterior |
logical. Save posterior sample in output list? |
in.file |
name to use for PHASE input file. |
out.file |
name to use for PHASE output files. |
delete.files |
logical. Delete PHASE input and output files when done? |
ph.res |
result from |
keep.missing |
logical. T = keep missing data from original data set. F = Use estimated genotypes from PHASE. |
thresh |
minimum probability for a genotype to be selected (0.5 - 1). |
phase |
runs PHASE assuming that the executable is installed properly and available on the command line. |
phaseWrite |
writes a PHASE formatted file. |
phaseReadPair |
reads the '_pair' output file. |
phaseReadSample |
reads the '_sample' output file. |
phaseFilter |
filters the result from phase.run to
extract one genotype for each sample. |
phasePosterior |
create a data.frame of all genotypes for each posterior sample. |
a list containing:
locus.name |
new locus name, which is a combination of loci in group. |
gtype.probs |
a data.frame listing the estimated genotype for every sample along with probability. |
orig.gtypes |
the original gtypes object for the composite loci. |
posterior |
a list of num.iter data.frames
representing posterior sample of genotypes for each sample. |
a list with the input filename and the gtypes object used.
a data.frame of genotype probabilities.
a list of data.frames representing the posterior sample of genotypes for one set of loci for each sample.
a matrix of genotypes for each sample.
a list of data.frames representing the posterior sample of all genotypes for each sample.
PHASE is not included with strataG
and must be downloaded
separately. Additionally, it must be installed such that it can be run from
the command line in the current working directory. See the vignette
for external.programs
for installation instructions.
Eric Archer [email protected]
Stephens, M., and Donnelly, P. (2003). A comparison of Bayesian methods for haplotype reconstruction from population genotype data. American Journal of Human Genetics 73:1162-1169. Available at: http://stephenslab.uchicago.edu/software.html#phase
## Not run: data(bowhead.snps) data(bowhead.snp.position) snps <- df2gtypes(bowhead.snps, ploidy = 2, description = "Bowhead SNPS") summary(snps) # Run PHASE on all data phase.results <- phase(snps, bowhead.snp.position, num.iter = 100, save.posterior = FALSE) # Filter phase results filtered.results <- phaseFilter(phase.results, thresh = 0.5) # Convert phased genotypes to gtypes ids <- rownames(filtered.results) strata <- bowhead.snps$Stock[match(ids, bowhead.snps$LABID)] filtered.df <- cbind(id = ids, strata = strata, filtered.results) phased.snps <- df2gtypes(filtered.df, ploidy = 2, description = "Bowhead phased SNPs") summary(phased.snps) ## End(Not run)
## Not run: data(bowhead.snps) data(bowhead.snp.position) snps <- df2gtypes(bowhead.snps, ploidy = 2, description = "Bowhead SNPS") summary(snps) # Run PHASE on all data phase.results <- phase(snps, bowhead.snp.position, num.iter = 100, save.posterior = FALSE) # Filter phase results filtered.results <- phaseFilter(phase.results, thresh = 0.5) # Convert phased genotypes to gtypes ids <- rownames(filtered.results) strata <- bowhead.snps$Stock[match(ids, bowhead.snps$LABID)] filtered.df <- cbind(id = ids, strata = strata, filtered.results) phased.snps <- df2gtypes(filtered.df, ploidy = 2, description = "Bowhead phased SNPs") summary(phased.snps) ## End(Not run)
Collection of classical population genetics equations.
wrightFst(Ne, dispersal, gen.time, ploidy) numGensEq(fst, Ne, gen.time) fstToNm(fst, ploidy) expectedNumAlleles(n, theta, ploidy)
wrightFst(Ne, dispersal, gen.time, ploidy) numGensEq(fst, Ne, gen.time) fstToNm(fst, ploidy) expectedNumAlleles(n, theta, ploidy)
Ne |
Effective population size. |
dispersal |
Migration rate in terms of probability of an individual migrating in a generation. |
gen.time |
Number of generations since ancestral population. |
ploidy |
Ploidy of the locus. |
fst |
value of Fst at equilibrium. |
n |
Sample size. |
theta |
Product of effective population size (Ne) and mutation rate (mu). |
Calculate Wright's Fst from Ne, dispersal, and generation time.
Calculate the number of generations to equilibrium based on a an ideal Wright model.
Calculate Nm (number of migrants per generation) for a given value of Fst.
Calculate the expected number of alleles in a sample of a given size and value of theta.
a two element vector with the expected number of alleles
(num.alleles
) and variance (var.num.alleles
).
Eric Archer [email protected]
Ewens, W. 1972. The sampling theory of selectively neutral alleles. Theoretical Population Biology 3:87-112. Eqns. 11 and 24.
dispersal <- seq(0.05, 0.8, by = 0.05) fst <- wrightFst(100, dispersal, 20, 2) plot(dispersal, fst, type = "l") numGensEq(0.15, 100, 20) numGensEq(0.3, 100, 20) numGensEq(0.15, 50, 20) fst <- seq(0.001, 0.2, length.out = 100) Nm <- fstToNm(fst, 2) plot(fst, Nm, type = "l") expectedNumAlleles(20, 1, 2) # double the samples expectedNumAlleles(40, 1, 2) # for a haploid locus expectedNumAlleles(40, 1, 1) # double theta expectedNumAlleles(40, 2, 1)
dispersal <- seq(0.05, 0.8, by = 0.05) fst <- wrightFst(100, dispersal, 20, 2) plot(dispersal, fst, type = "l") numGensEq(0.15, 100, 20) numGensEq(0.3, 100, 20) numGensEq(0.15, 50, 20) fst <- seq(0.001, 0.2, length.out = 100) Nm <- fstToNm(fst, 2) plot(fst, Nm, type = "l") expectedNumAlleles(20, 1, 2) # double the samples expectedNumAlleles(40, 1, 2) # for a haploid locus expectedNumAlleles(40, 1, 1) # double theta expectedNumAlleles(40, 2, 1)
Conduct multiple tests of population structure / differentiation.
Overall tests can be conducted for the current stratification scheme
(overallTest()
), or can be conducted for all unique
pairs of strata (pairwiseTest()
). All statistics appropriate to
the ploidy of the data are estimated at once. See Note
for a
description of each statistic.
overallTest( g, nrep = 1000, by.locus = FALSE, hap.locus = 1, quietly = FALSE, max.cores = 1, ... ) pairwiseTest( g, nrep = 1000, by.locus = FALSE, hap.locus = 1, quietly = FALSE, max.cores = 1, ... ) pairwiseMatrix(pws, stat, locus = "All") pairwiseSummary(pws, locus = "All")
overallTest( g, nrep = 1000, by.locus = FALSE, hap.locus = 1, quietly = FALSE, max.cores = 1, ... ) pairwiseTest( g, nrep = 1000, by.locus = FALSE, hap.locus = 1, quietly = FALSE, max.cores = 1, ... ) pairwiseMatrix(pws, stat, locus = "All") pairwiseSummary(pws, locus = "All")
g |
a |
nrep |
number specifying number of permutation replicates to use for permutation test. |
by.locus |
return by-locus values of statistics? If |
hap.locus |
which locus to use if |
quietly |
logical. print progress to screen? |
max.cores |
the maximum number of cores to use to distribute replicates
for permutation tests over. If set to |
... |
parameters passed to |
pws |
a list returned from a call to |
stat |
the name of a statistic in the |
locus |
the name of a single locus. If |
overallTest()
a list containing:
strata.freq
a table of the sample sizes for each stratum
result
an array with the statistic estimate and p-value
for each statistic. If by.locus = FALSE
or g
is a haploid dataset,
this is a two-dimensional array, with one row per statistic,
statistic estimate in the first column and permutation test p-value
in the second column. If by.locus = TRUE
and g
has ploidy > 1,
then this is a three-dimensional array where the first dimension
is loci, second dimension is statistics, and third dimension is
statistic estimate and p-value.
pairwiseTest()
a list containing a list of results as described above
for overallTest()
for each pairwise comparison.
pairwiseMatrix()
a matrix summarizing a chosen statistic
(stat
) for a chosen locus (locus
) between pairs of strata
with the statistic estimate in the lower left and the p-value in the upper right.
pairwiseSummary()
a data frame summarizing all pairwise statistics and p-values along with strata sample sizes.
The computed statistics are:
CHIsq |
chi-squared estimate measuring random allele frequency distribution distributed across strata (haploid and diploid) |
Ho, Hs, Ht |
Nei and Kumar 2002 :
observed heterozygosity (Ho ),
within population diversity (Hs ), overall diversity (Ht )
|
Ht_prime |
description |
Dst |
description |
Dst_prime |
description |
Fst |
For haploid data, equivalent to PHIst with pairwise distances set to 1. For diploid data, |
Fst_prime |
description |
Fis |
description |
Gst_prime |
description |
Gst_dbl_prime |
description |
Dest, Dest_Chao |
population differentiation (Jost 2008) |
wcFit, wcFst, wcFit |
(Weir and Cockerham 1984) |
PHIst |
Haploid AMOVA estimate of differentiation derived
from matrix of pairwise distances between sequences.
See dist.dna for details on distance computation.
(Excoffier et al 1992) |
Eric Archer [email protected]
Excoffier, L., Smouse, P.E. and Quattro, J.M. 1992. Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131:479–491. Jost, L. 2008. GST and its relatives do not measure differentiation. Molecular Ecology 17:4015-4026. Nei M. and Chesser R. 1983. Estimation of fixation indexes and gene diversities. Annals of Human Genetics 47:253-259. Nei M. 1987. Molecular Evolutionary Genetics. Columbia University Press Weir, B.S. and Cockerham, C.C. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38:1358–1370. Weir, B.S. and Hill, W.G. 2002. Estimating F-statistics. Annual Review of Genetics 36:721–750.
# An overall test with microsatellite data data(msats.g) ovl <- overallTest(msats.g, nrep = 100) ovl #' A pairwise test with control region sequences data(dloop.g) pws <- pairwiseTest(dloop.g, nrep = 100) pws
# An overall test with microsatellite data data(msats.g) ovl <- overallTest(msats.g, nrep = 100) ovl #' A pairwise test with control region sequences data(dloop.g) pws <- pairwiseTest(dloop.g, nrep = 100) pws
The number of private alleles in each strata and locus.
privateAlleles(g)
privateAlleles(g)
g |
a gtypes object. |
matrix with the number of private alleles in each strata at each locus. This is the number of alleles only present in one stratum.
Eric Archer [email protected]
data(msats.g) privateAlleles(msats.g)
data(msats.g) privateAlleles(msats.g)
Calculate the proportion of alleles that are unique.
propUniqueAlleles(g, by.strata = FALSE)
propUniqueAlleles(g, by.strata = FALSE)
g |
a gtypes object. |
by.strata |
logical - return results grouped by strata? |
a vector of the proportion of unique (occuring only in one individual) alleles for each locus.
Eric Archer [email protected]
data(msats.g) propUniqueAlleles(msats.g)
data(msats.g) propUniqueAlleles(msats.g)
A wrapper for fread
that sets
common values for missing data and removes blank lines.
readGenData(file, na.strings = c("NA", "", "?", "."), ...)
readGenData(file, na.strings = c("NA", "", "?", "."), ...)
file |
filename of .csv file. |
na.strings |
see |
... |
other arguments passed to |
a data.frame
.
Eric Archer [email protected]
Remove sequences not used by samples.
removeUnusedSequences(g)
removeUnusedSequences(g)
g |
a gtypes object. |
a new gtypes object with unused sequences removed.
Eric Archer [email protected]
gtypes
Create a gtypes object from sequence data.
sequence2gtypes( x, strata = NULL, seq.names = NULL, schemes = NULL, description = NULL, other = NULL )
sequence2gtypes( x, strata = NULL, seq.names = NULL, schemes = NULL, description = NULL, other = NULL )
x |
DNA sequences as a character matrix, a |
strata |
a vector or factor giving stratification for each sequence. If not provided all individuals are assigned to the same stratum (Default). |
seq.names |
names for each set of sequences. If not provided default names are generated. |
schemes |
an optional data.frame of stratification schemes. |
description |
an optional label for the object. |
other |
a list to carry other related information (optional). |
a gtypes object.
Eric Archer [email protected]
gtypes.initialize, as.matrix.gtypes, as.data.frame.gtypes, gtypes2genind, gtypes2loci, gtypes2phyDat
#--- create a haploid sequence (mtDNA) gtypes object data(dolph.strata) data(dolph.seqs) strata <- dolph.strata$fine names(strata) <- dolph.strata$ids dloop.fine <- sequence2gtypes(dolph.seqs, strata, seq.names = "dLoop", description = "dLoop: fine-scale stratification")
#--- create a haploid sequence (mtDNA) gtypes object data(dolph.strata) data(dolph.seqs) strata <- dolph.strata$fine names(strata) <- dolph.strata$ids dloop.fine <- sequence2gtypes(dolph.seqs, strata, seq.names = "dLoop", description = "dLoop: fine-scale stratification")
Calculate likelihood of each sequence based on gamma distribution of pairwise distances.
sequenceLikelihoods( x, model = "N", pairwise.deletion = FALSE, n = NULL, plot = TRUE, simplify = TRUE, ... )
sequenceLikelihoods( x, model = "N", pairwise.deletion = FALSE, n = NULL, plot = TRUE, simplify = TRUE, ... )
x |
a |
model |
a character string specifying the evolutionary model to be
used. Passed to |
pairwise.deletion |
a logical indicating whether to delete the
sites with missing data in a pairwise way. Passed to
|
n |
number of sequences with lowest delta(log-likelihoods) to plot. Defaults to all sequences Set to 0 to supress plotting. |
plot |
generate a plot of top |
simplify |
if there is a single locus, return result in a simplified
form? If |
... |
arguments passed from other functions (ignored). |
Fits a Gamma distribution to the pairwise distances of sequences and calculates the log-likelihood for each (sum of all pairwise log-likelihoods for that sequence). Sequences that are extremely different from all others will have low log-likelihoods. Values returned as delta(log-likelhoods) = difference of log-likelihoods from maximum observed values.
vector of delta(log-Likelihoods) for each sequence, sorted from smallest to largest, and a plot of their distributions.
Eric Archer [email protected]
data(dolph.haps) sequenceLikelihoods(dolph.haps)
data(dolph.haps) sequenceLikelihoods(dolph.haps)
Calculate the SFS from a data frame of SNP genotypes
sfs( x, strata.col = 2, locus.col = 3, fsc.dimnames = TRUE, sort.strata = TRUE, na.action = c("fail", "filter") )
sfs( x, strata.col = 2, locus.col = 3, fsc.dimnames = TRUE, sort.strata = TRUE, na.action = c("fail", "filter") )
x |
a data frame of SNP genotypes where the first two columns are id and strata designations and SNPs start on the third column. SNP genotypes are coded as integers where 0 and 2 are the major and minor homozygotes and 1 is the heterozygote. |
strata.col |
column number that strata designations are in. |
locus.col |
column number that loci start in. All columns after this are assumed to be loci. |
fsc.dimnames |
format matrix dimnames for fastsimcoal2? If |
sort.strata |
if |
na.action |
action to take if genotypes are missing for some samples.
If |
A list of the marginal (1D) and joint (2D) site frequency spectra.
If only one stratum is present, then $marginal
will be NULL
.
Eric Archer [email protected]
Simulate a haplotypic frequency distribution based on a specified gamma distribution.
simGammaHaps(pop.size, num.haps, shape, scale, return.freq = TRUE, plot = TRUE)
simGammaHaps(pop.size, num.haps, shape, scale, return.freq = TRUE, plot = TRUE)
pop.size |
size of population. |
num.haps |
number of haplotypes to generate. |
shape , scale
|
parameters of Gamma distribution (see |
return.freq |
logical. Return frequency table of haplotypes? If |
plot |
logical. Show plot of haplotypic frequency distribution? |
Frequency table of haplotypes.
Eric Archer [email protected]
haps <- simGammaHaps(1000, 15, 1, 2.5) print(haps)
haps <- simGammaHaps(1000, 15, 1, 2.5) print(haps)
Return a list of gtypes
for each stratum.
strataSplit(g, strata = NULL, remove.sequences = FALSE)
strataSplit(g, strata = NULL, remove.sequences = FALSE)
g |
a gtypes object. |
strata |
a character vector giving a subset of strata to select.
If |
remove.sequences |
logical. If |
A named list where each element is a gtypes
object
for a single stratum in g
.
Eric Archer [email protected]
data(msats.g) # Proportion of unique alleles in each stratum msats.list <- strataSplit(msats.g) lapply(msats.list, propUniqueAlleles)
data(msats.g) # Proportion of unique alleles in each stratum msats.list <- strataSplit(msats.g) lapply(msats.list, propUniqueAlleles)
Choose a new stratification scheme from the schemes
slot in a gtypes object.
stratify(g, scheme = NULL, drop = TRUE)
stratify(g, scheme = NULL, drop = TRUE)
g |
a gtypes object. |
scheme |
either the column name of a stratification scheme stored
in the data.frame of the |
drop |
remove samples not assigned to a stratum? (those assigned |
A new gtypes object with an updated strata
slot.
If scheme
is a vector or factor and has names, then the
they will be used to match with getIndNames
of g
.
Otherwise scheme
should be the same length as the number of
samples in g
or values in scheme
will be recycled as
necessary.
Eric Archer [email protected]
data(msats.g) msats.g broad.msats <- stratify(msats.g, "broad") broad.msats
data(msats.g) msats.g broad.msats <- stratify(msats.g, "broad") broad.msats
Run STRUCTURE to assess group membership of samples.
structureRun( g, k.range = NULL, num.k.rep = 1, label = NULL, delete.files = TRUE, exec = "structure", ... ) structureWrite( g, label = NULL, maxpops = getNumStrata(g), burnin = 1000, numreps = 1000, noadmix = TRUE, freqscorr = FALSE, randomize = TRUE, seed = 0, pop.prior = NULL, locpriorinit = 1, maxlocprior = 20, gensback = 2, migrprior = 0.05, pfrompopflagonly = TRUE, popflag = NULL, inferalpha = FALSE, alpha = 1, unifprioralpha = TRUE, alphamax = 20, alphapriora = 0.05, alphapriorb = 0.001, ... ) structureRead(file, pops = NULL)
structureRun( g, k.range = NULL, num.k.rep = 1, label = NULL, delete.files = TRUE, exec = "structure", ... ) structureWrite( g, label = NULL, maxpops = getNumStrata(g), burnin = 1000, numreps = 1000, noadmix = TRUE, freqscorr = FALSE, randomize = TRUE, seed = 0, pop.prior = NULL, locpriorinit = 1, maxlocprior = 20, gensback = 2, migrprior = 0.05, pfrompopflagonly = TRUE, popflag = NULL, inferalpha = FALSE, alpha = 1, unifprioralpha = TRUE, alphamax = 20, alphapriora = 0.05, alphapriorb = 0.001, ... ) structureRead(file, pops = NULL)
g |
a gtypes object. |
k.range |
vector of values to for |
num.k.rep |
number of replicates for each value in |
label |
label to use for input and output files |
delete.files |
logical. Delete all files when STRUCTURE is finished? |
exec |
name of executable for STRUCTURE. Defaults to "structure". |
... |
arguments to be passed to |
maxpops |
number of groups. |
burnin |
number of iterations for MCMC burnin. |
numreps |
number of MCMC replicates. |
noadmix |
logical. No admixture? |
freqscorr |
logical. Correlated frequencies? |
randomize |
randomize. |
seed |
set random seed. |
pop.prior |
a character specifying which population prior model to use: "locprior" or "usepopinfo". |
locpriorinit |
parameterizes locprior parameter r - how
informative the populations are. Only used when |
maxlocprior |
specifies range of locprior parameter r. Only used
when |
gensback |
integer defining the number of generations back to test for
immigrant ancestry. Only used when |
migrprior |
numeric between 0 and 1 listing migration prior. Only used
when |
pfrompopflagonly |
logical. update allele frequencies from individuals
specified by |
popflag |
a vector of integers (0, 1) or logicals identifiying whether
or not to use strata information. Only used when |
inferalpha |
logical. Infer the value of the model parameter # from the
data; otherwise is fixed at the value |
alpha |
Dirichlet parameter for degree of admixture. This is the initial
value if |
unifprioralpha |
logical. Assume a uniform prior for |
alphamax |
maximum for uniform prior on |
alphapriora , alphapriorb
|
parameters of Gamma prior on |
file |
name of the output file from STRUCTURE. |
pops |
vector of population labels to be used in place of numbers in STRUCTURE file. |
structureRun
a list where each element is a
list with results from structureRead
and a vector of the filenames
used
structureWrite
a vector of the filenames used by STRUCTURE
structureRead
a list containing:
summary
new locus name, which is a combination of loci in group
q.mat
data.frame of assignment probabilities for each id
prior.anc
list of prior ancestry estimates for each individual where population priors were used
files
vector of input and output files used by STRUCTURE
label
label for the run
STRUCTURE is not included with strataG
and must be downloaded
separately. Additionally, it must be installed such that it can be run from
the command line in the current working directory. See the vignette for
external.programs
for installation instructions.
Eric Archer [email protected]
Pritchard, J.K., M. Stephens, P. Donnelly. 2000. Inference of
population structure using multilocus genotype data. Genetics
155:945-959.
http://web.stanford.edu/group/pritchardlab/structure.html
## Not run: data(msats.g) # Run STRUCTURE sr <- structureRun(msats.g, k.range = 1:4, num.k.rep = 10) # Calculate Evanno metrics evno <- evanno(sr) evno # Run CLUMPP to combine runs for K = 2 q.mat <- clumpp(sr, k = 3) q.mat # Plot CLUMPP results structurePlot(q.mat) ## End(Not run)
## Not run: data(msats.g) # Run STRUCTURE sr <- structureRun(msats.g, k.range = 1:4, num.k.rep = 10) # Calculate Evanno metrics evno <- evanno(sr) evno # Run CLUMPP to combine runs for K = 2 q.mat <- clumpp(sr, k = 3) q.mat # Plot CLUMPP results structurePlot(q.mat) ## End(Not run)
Plot Q-matrix from a call to structure
or
clumpp
.
structurePlot( q.mat, pop.col = 3, prob.col = 4, sort.probs = TRUE, label.pops = TRUE, col = NULL, horiz = TRUE, type = NULL, legend.position = c("top", "left", "right", "bottom", "none"), plot = TRUE )
structurePlot( q.mat, pop.col = 3, prob.col = 4, sort.probs = TRUE, label.pops = TRUE, col = NULL, horiz = TRUE, type = NULL, legend.position = c("top", "left", "right", "bottom", "none"), plot = TRUE )
q.mat |
matrix or data.frame of assignment probabilities. |
pop.col |
column number identifying original population designations. |
prob.col |
column number of first assignment probabilities to first
group. It is assumed that the remainder of columns
( |
sort.probs |
logical. Sort individuals by probabilities within
populations? If |
label.pops |
logical. Label the populations on the plot? |
col |
colors to use for each group. |
horiz |
logical. Plot bars horizontally. |
type |
either |
legend.position |
the position of the legend ( |
plot |
display plot? |
invisibly, the ggplot object
Eric Archer [email protected]
Conducts a suite of data summaries. Summarizes missing data and
homozygosity by individual and locus, and looks for duplicate genotypes
(see dupGenotypes
). For sequence data, identifies low
frequency substitutions (see lowFreqSubs
), and computes
sequence likelihoods (see sequenceLikelihoods
).
summarizeAll(g, write.files = FALSE, label = NULL, ...)
summarizeAll(g, write.files = FALSE, label = NULL, ...)
g |
a gtypes object. |
write.files |
logical determining whether to write .csv files of summaries |
label |
optional label for output folder and prefix for files. |
... |
optional arguments to pass on to summary functions. |
If write.files = TRUE
, files are written for by-sample and
by-locus summaries, and duplicate genotypes if any are found. If
sequences are present, files are written identifying low frequency
substitutions and sequence likelihoods.
The return value is a list with the following elements:
data.frame of by-sample summaries
by.locus
data.frame of by-locus summaries
dup.df
data.frame identifying potential duplicates
by.seq
list of low frequency substitutions and haplotype likelihoods for each gene
Eric Archer [email protected]
summarizeInds
, summarizeLoci
,
dupGenotypes
, lowFreqSubs
,
sequenceLikelihoods
Compile standard by-individual summaries.
summarizeInds(g)
summarizeInds(g)
g |
a gtypes object. |
A data.frame with rows for each sample and columns containing:
id
The individual id
stratum
The stratum of the individual
num.loci.missing.genotypes
The number of genotypes missing
pct.loci.missing.genotypes
The proportion of genotypes missing
pct.loci.homozygous
The proportion of loci homozygous
Eric Archer [email protected]
data(msats.g) summarizeInds(msats.g)
data(msats.g) summarizeInds(msats.g)
Compile standard by-locus summaries.
summarizeLoci(g, by.strata = FALSE)
summarizeLoci(g, by.strata = FALSE)
g |
a gtypes object. |
by.strata |
logical. If |
A matrix with rows for each locus and columns containing summaries of:
num.genotyped
The number of samples genotyped
prop.genotyped
The proportion of samples genotyped
num.alleles
The number of alleles in the locus
allelic.richness
The allelic richness of the locus
prop.unique.alleles
Proportion of alleles found in a single sample
expt.heterozygosity
Expected heterozygosity
obsvd.heterozygosity
Observed heterozygosity
Eric Archer [email protected]
data(msats.g) msats.g <- stratify(msats.g, "fine") summarizeLoci(msats.g)
data(msats.g) msats.g <- stratify(msats.g, "fine") summarizeLoci(msats.g)
Summaries for each sequence.
summarizeSeqs(x)
summarizeSeqs(x)
x |
a |
a matrix listing the start and end positions of each sequence (excluding beginning and trailing N's), the length, the number of N's, and the number of indels.
Eric Archer [email protected]
library(apex) data(woodmouse) summarizeSeqs(woodmouse)
library(apex) data(woodmouse) summarizeSeqs(woodmouse)
Generate a summary of a gtypes
object.
## S4 method for signature 'gtypes' summary(object, ...)
## S4 method for signature 'gtypes' summary(object, ...)
object |
a gtypes object. |
... |
other arguments (ignored). |
a list with the following elements:
num.ind
number of individuals
num.loc
number of loci
num.strata
number of strata
unstratified
number of unstratified samples
schemes
names of stratification schemes
allele.freqs
a list with tables of allele frequencies by strata
strata.smry
a by-strata data.frame summarizing haplotypes or loci
locus.smry
a data.frame summarizing each locus for
non-haploid objects, NULL
for haploid objects
seq.smry
a summary of the sequence length and base frequencies
Eric Archer [email protected]
Calculate Tajima's D for a set of sequences to test for selection.
tajimasD(x, CI = 0.95)
tajimasD(x, CI = 0.95)
x |
set of DNA sequences or a haploid gtypes object with sequences. |
CI |
desired central confidence interval. |
A named vector with the estimate for D
and
the p.value
that it is different from 0.
Eric Archer [email protected]
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.
data(dolph.seqs) tajimasD(dolph.seqs)
data(dolph.seqs) tajimasD(dolph.seqs)
Calculate theta from heterozygosity of each locus.
theta(g, by.strata = FALSE)
theta(g, by.strata = FALSE)
g |
a gtypes object. |
by.strata |
logical - return results grouped by strata? |
Calculates theta for each locus using the
theta.h
function.
vector of theta values for each locus.
Eric Archer [email protected]
data(msats.g) theta(msats.g)
data(msats.g) theta(msats.g)
Calculate transition/transversion ratio. Test substitution type of two bases.
TiTvRatio(x) subType(b1, b2) isTi(b1, b2) isTv(b1, b2)
TiTvRatio(x) subType(b1, b2) isTi(b1, b2) isTv(b1, b2)
x |
a gtypes object with aligned sequences or a list of aligned DNA sequences. |
b1 , b2
|
two bases to be compared. |
TiTvRatio
: a vector providing the number of
transitions (Ti
), transversions (Tv
), and the
transition/transversion ratio (Ti.Tv.ratio
).subType
: either "ti" for transition, or "tv" for transversion.isTi
and isTv
: a logical identifying whether
the b1
to b2
is a transition or transversion.
Eric Archer [email protected]
data(dolph.seqs) TiTvRatio(dolph.seqs) subType("a", "c") isTi("a", "c") isTv("a", "c")
data(dolph.seqs) TiTvRatio(dolph.seqs) subType("a", "c") isTi("a", "c") isTv("a", "c")
Removes N's from beginning and end of sequences.
trimNs(x)
trimNs(x)
x |
a |
sequences with beginning and trailing N's removed.
Eric Archer [email protected]
test.seqs <- list( A = c(rep("n", 5), "a", "c", "g", "t", rep("n", 3)), B = c(rep("n", 3), "a", "c", "g", "t", rep("n", 5)), C = c("a", "c", "g", "t", rep("n", 8)) ) test.seqs trimmed <- trimNs(test.seqs) as.character(trimmed)
test.seqs <- list( A = c(rep("n", 5), "a", "c", "g", "t", rep("n", 3)), B = c(rep("n", 3), "a", "c", "g", "t", rep("n", 5)), C = c("a", "c", "g", "t", rep("n", 8)) ) test.seqs trimmed <- trimNs(test.seqs) as.character(trimmed)
Identify variable sites among sequences.
variableSites(x, bases = c("a", "c", "g", "t", "-"), simplify = TRUE)
variableSites(x, bases = c("a", "c", "g", "t", "-"), simplify = TRUE)
x |
a gtypes object with sequences,
a |
bases |
character vector of bases to consider. |
simplify |
if there is a single locus, return result in a simplified
form? If |
A list with:
a DNAbin
object composed of variable sites.
a matrix of base pair frequencies by site.
Eric Archer [email protected]
data(dolph.haps) variableSites(dolph.haps)
data(dolph.haps) variableSites(dolph.haps)
Write NEXUS File for SNAPP
write.nexus.snapp(g, file = "snapp.data.nex")
write.nexus.snapp(g, file = "snapp.data.nex")
g |
a gtypes object. |
file |
the filename the NEXUS file to output. |
Eric Archer [email protected]
gtypes
Write a gtypes object to file(s).
writeGtypes( g, label = NULL, folder = NULL, by.strata = TRUE, as.frequency = FALSE, freq.type = c("freq", "prop"), as.haplotypes = TRUE, ... )
writeGtypes( g, label = NULL, folder = NULL, by.strata = TRUE, as.frequency = FALSE, freq.type = c("freq", "prop"), as.haplotypes = TRUE, ... )
g |
a gtypes object. |
label |
label for filename(s). Default is the gtypes description if present. |
folder |
folder where file(s) should be written to. If |
by.strata |
if |
as.frequency |
logical indicating if haploid data should be output as frequency tables. |
freq.type |
if |
as.haplotypes |
write sequences as haplotypes ( |
... |
optional arguments controlling what information is included in the genotype file and how it is formatted passed to as.matrix. |
Writes a comma-delimited (.csv) file of genotypes and if sequences
are present, a .fasta file for each locus. If haploid and as.frequency
is TRUE
, then frequency tables for each locus are written to
separate files.
Eric Archer [email protected]
## Not run: # Write microsatellites with one column per locus data(msats.g) writeGtypes(msats.g, one.col = TRUE) # Write control region data as frequency tables data(dloop.g) writeGtypes(dloop.g, as.frequency = TRUE) ## End(Not run)
## Not run: # Write microsatellites with one column per locus data(msats.g) writeGtypes(msats.g, one.col = TRUE) # Write control region data as frequency tables data(dloop.g) writeGtypes(dloop.g, as.frequency = TRUE) ## End(Not run)