Title: | Estimation and Tests of Hierarchical F-Statistics |
---|---|
Description: | Estimates hierarchical F-statistics from haploid or diploid genetic data with any numbers of levels in the hierarchy, following the algorithm of Yang (Evolution(1998), 52:950). Tests via randomisations the significance of each F and variance components, using the likelihood-ratio statistics G (Goudet et al. (1996) <https://academic.oup.com/genetics/article/144/4/1933/6017091>). Estimates genetic diversity statistics for haploid and diploid genetic datasets in various formats, including inbreeding and coancestry coefficients, and population specific F-statistics following Weir and Goudet (2017) <https://academic.oup.com/genetics/article/206/4/2085/6072590>. |
Authors: | Jerome Goudet [aut, cre], Thibaut Jombart [aut], Zhian N. Kamvar [ctb], Eric Archer [ctb], Olivier Hardy [ctb] |
Maintainer: | Jerome Goudet <jerome.goudet@unil.ch> |
License: | GPL (>=2) |
Version: | 0.5-11 |
Built: | 2024-11-20 20:33:47 UTC |
Source: | https://github.com/jgx65/hierfstat |
Calculates corrected Assignment Index as described in Goudet etal. (2002)
AIc(dat)
AIc(dat)
dat |
a data frane with nlocs+1 columns, |
aic The corrected assignment index of each individual
Jerome Goudet jerome.goudet@unil.ch
Goudet J, Perrin N, Waser P (2002) Tests for sex-biased dispersal using bi-parentally inherited genetic markers 11, 1103:1114
Counts the number of copies of the different alleles at each locus and population
allele.count(data,diploid=TRUE)
allele.count(data,diploid=TRUE)
data |
A data frame containing the population of origin in the first column and the genotypes in the following ones |
diploid |
Whether the data are from diploid individuals |
A list of tables, –each with np (number of populations) columns and nl (number of loci) rows– of the count of each allele
Jerome Goudet jerome.goudet@unil.ch
data(gtrunchier) allele.count(gtrunchier[,-2])
data(gtrunchier) allele.count(gtrunchier[,-2])
Estimates allelic richness, the rarefied allelic counts, per locus and population
allelic.richness(data,min.n=NULL,diploid=TRUE)
allelic.richness(data,min.n=NULL,diploid=TRUE)
data |
A data frame, with as many rows as individuals. The first column contains the population to which the individual belongs, the following to the different loci |
min.n |
The number of alleles down to which the number of alleles should be rarefied. The default is the minimum number of individuals genotyped (times 2 for diploids) |
diploid |
a boolean specifying wether individuals are diploid (default) or haploid |
min.all |
The number of alleles used for rarefaction |
Ar |
A table with as many rows as loci and columns as populations containing the rarefied allele counts |
Jerome Goudet jerome.goudet@unil.ch
El Mousadik A. and Petit R.J. (1996) High level of genetic differentiation for allelic richness among populations of the argan tree argania spinosa skeels endemic to Morocco. Theoretical and Applied Genetics, 92:832-839
Hurlbert S.H. (1971) The nonconcept of species diversity: a critique and alternative parameters. Ecology, 52:577-586
Petit R.J., El Mousadik A. and Pons O. (1998) Identifying populations for conservation on the basis of genetic markers. Conservation Biology, 12:844-855
data(gtrunchier) allelic.richness(gtrunchier[,-1])
data(gtrunchier) allelic.richness(gtrunchier[,-1])
Estimates individual counts, allelic frequencies, observed heterozygosities and genetic diversities per locus and population. Also Estimates mean observed heterozygosities, mean gene diversities within population Hs, Gene diversities overall Ht and corrected Htp, and Dst, Dstp. Finally, estimates Fst and Fstp as well as Fis following Nei (1987) per locus and overall loci
basic.stats(data,diploid=TRUE,digits=4) ## S3 method for class 'basic.stats' print(x,...) Hs(data,...) Ho(data,...)
basic.stats(data,diploid=TRUE,digits=4) ## S3 method for class 'basic.stats' print(x,...) Hs(data,...) Ho(data,...)
data |
a data frame where the first column contains the population to which the different individuals belong, and the following columns contain the genotype of the individuals -one locus per column- |
diploid |
Whether individuals are diploids (default) or haploids |
digits |
how many digits to print out in the output (default is 4) |
x |
an object of class basic.stats |
... |
further arguments to pass to print.bas.stats |
n.ind.samp |
A table –with np (number of populations) columns and nl (number of loci) rows– of genotype counts |
pop.freq |
A list containing allele frequencies. Each element of the list is one locus. For each locus, Populations are in columns and alleles in rows |
Ho |
A table –with np (number of populations) columns and nl (number of loci) rows– of observed heterozygosities |
Hs |
A table –with np (number of populations) columns and nl (number of loci) rows– of observed gene diversities |
Fis |
A table –with np (number of populations) columns and nl (number of loci) rows–of observed Fis |
perloc |
A table –with as many rows as loci– containing basic statistics Ho, Hs, Ht, Dst, Ht', Dst', Fst, Fst' ,Fis, Dest |
overall |
Basic statistics averaged over loci |
For the perloc and overall tables (see value section), the following statistics, defined in eq.7.38– 7.43 pp.164–5 of Nei (1987) are estimated:
The observed heterozygosity
where represents the proportion of homozygote
in sample
and
the number of samples.
The within population gene diversity (sometimes misleadingly called expected heterozygosity):
where and
The overall gene diversity
where .
The amount of gene diversity among samples
.(This is not the same as Nei's
,
Nei's
is an estimator of
based on allele frequencies only)
Last, a measure of population
differentiation as defined by Jost (2008) is also given
Here, the are unweighted by sample size. These statistics are
estimated for each locus and an overall loci estimates is also given, as the
unweighted average of the per locus estimates. In this way, monomorphic loci
are accounted for (with estimated value of 0) in the overall estimates.
Note that the equations used here all rely on genotypic rather than allelic number and are corrected for heterozygosity.
Jerome Goudet jerome.goudet@unil.ch
Nei M. (1987) Molecular Evolutionary Genetics. Columbia University Press
Jost L (2008) GST and its relatives do not measure differentiation. Molecular Ecology, 17, 4015-4026.
Nei M, Chesser R (1983) Estimation of fixation indexes and gene diversities. Annals of Human Genetics, 47, 253-259.
data(gtrunchier) basic.stats(gtrunchier[,-1]) Hs(gtrunchier[,-2]) Ho(gtrunchier[,-2])
data(gtrunchier) basic.stats(gtrunchier[,-1]) Hs(gtrunchier[,-2]) Ho(gtrunchier[,-2])
Estimates pairwise kinships (coancestries) and individual inbreeding coefficient using Weir and Goudet (2017) beta estimator.
beta.dosage(dos,inb=TRUE,Mb=FALSE,matching=FALSE) kas.dosage(dos, inb = TRUE, Mb = FALSE, matching = FALSE)
beta.dosage(dos,inb=TRUE,Mb=FALSE,matching=FALSE) kas.dosage(dos, inb = TRUE, Mb = FALSE, matching = FALSE)
dos |
A matrix of 0, 1 and 2s with loci (SNPs) in columns and individuals in rows. Missing values are allowed |
inb |
whether individual inbreeding coefficient should be estimated (rather than self-coancestries) |
Mb |
whether to output the mean matching of off-diagonal elements |
matching |
if |
This function is written for dosage data, i.e., how many doses of an allele (0, 1 or 2) an individual carries.
It should be use for bi-allelic markers only (e.g. SNPs), although you might "force" a k multiallelic locus to k biallelic
loci (see fstat2dos
).
Matching proportions can be obtained by the following equation:
By default (inb=TRUE
) the inbreeding coefficient is returned on the main diagonal.
With inb=FALSE
, self coancestries are reported.
Twice the betas with self-coancestries on the diagonal gives the Genomic Relationship Matrix (GRM)
Following a suggestion from Olivier Hardy, missing data are removed from the estimation procedure, rather than imputed (this is taken care off automatically)
if Mb
=FALSE, a matrix of pairwise kinships and inbreeding coefficients (if inb
=TRUE) or self-coancestries
(inb
=FALSE);
if Mb
=TRUE, a list with elements inb
(whether inbreeding coefficients rather than kinships should
be returned on the main diagonal),
MB
(the average off-diagonal matching) and betas
the kinships or inbreeding coefficients.
Jerome Goudet jerome.goudet@unil.ch
Weir, BS and Goudet J. 2017 A Unified Characterization of Population Structure and Relatedness. Genetics (2017) 206:2085
Goudet, J., Kay, T. and Weir BS. 2018 How to estimate kinship. Molecular Ecology 27:4121.
## Not run: dos<-matrix(sample(0:2,size=10000,replace=TRUE),ncol=100) beta.dosage(dos,inb=TRUE) #matrix of kinship/inbreeding coeff data(gtrunchier) beta.dosage(fstat2dos(gtrunchier[,-c(1:2)])) #individual inbreeding coefficients dat<-sim.genot(size=100,nbloc=100,nbal=20,mig=0.01,f=c(0,0.3,0.7)) hist(diag(beta.dosage(fstat2dos(dat[,-1]))),breaks=-10:100/100,main="",xlab="",ylab="") abline(v=c(0.0,0.3,0.7),col="red") #only 20 loci hist(diag(beta.dosage(fstat2dos(dat[,2:21]))),breaks=-5:20/20,main="",xlab="",ylab="") abline(v=c(0.0,0.3,0.7),col="red") ## End(Not run)
## Not run: dos<-matrix(sample(0:2,size=10000,replace=TRUE),ncol=100) beta.dosage(dos,inb=TRUE) #matrix of kinship/inbreeding coeff data(gtrunchier) beta.dosage(fstat2dos(gtrunchier[,-c(1:2)])) #individual inbreeding coefficients dat<-sim.genot(size=100,nbloc=100,nbal=20,mig=0.01,f=c(0,0.3,0.7)) hist(diag(beta.dosage(fstat2dos(dat[,-1]))),breaks=-10:100/100,main="",xlab="",ylab="") abline(v=c(0.0,0.3,0.7),col="red") #only 20 loci hist(diag(beta.dosage(fstat2dos(dat[,2:21]))),breaks=-5:20/20,main="",xlab="",ylab="") abline(v=c(0.0,0.3,0.7),col="red") ## End(Not run)
s per population and a bootstrap confidence intervalEstimate populations (Population specific FST) or individual coancestries and a bootstrap confidence interval, assuming random mating
betas(dat,nboot=0,lim=c(0.025,0.975),diploid=TRUE,betaijT=FALSE) ## S3 method for class 'betas' print(x, digits = 4, ...)
betas(dat,nboot=0,lim=c(0.025,0.975),diploid=TRUE,betaijT=FALSE) ## S3 method for class 'betas' print(x, digits = 4, ...)
dat |
data frame with genetic data and pop identifier |
nboot |
number of bootstrap samples. |
lim |
width of the bootstrap confidence interval |
diploid |
whether the data comes from a diploid organism |
betaijT |
whether to estimate individual coancestries |
x |
a betas object |
digits |
number of digits to print |
... |
further arguments to pass to print |
If betaijT=TRUE, and the first column contains a unique identifier for each individual, the function returns the matrix of individual coancestries/kinships. Individual inbreeding coefficients can be obtained by multiplying by 2 the diagonal and substracting 1.
Hi Within population gene diversities (complement to 1 of matching probabilities)
Hb Between populations gene diversities
betaiovl Average over loci (Population specific FSTs), Table 3 of
Weir and Goudet, 2017 (Genetics)
betaW Average of the betaiovl over loci (overall population FST)
ci The bootstrap confidence interval of population specific FSTs (only if more than 100 bootstraps requested AND if more than 10 loci are present)
if betaijT=TRUE, return the matrix of pairwise kinships only.
print(betas)
: print function for betas class
Jerome Goudet jerome.goudet@unil.ch
Weir and Goudet, 2017 (Genetics) A unified characterization of population structure and relatedness.
fs.dosage
, beta.dosage
for Fst estimates (not assuming Random Mating)
and kinship estimates from dosage data, respectively
## Not run: #3 different population sizes lead to 3 different betais dat<-sim.genot(size=40,N=c(50,200,1000),nbloc=50,nbal=10) betas(dat,nboot=100) #individual coancestries from the smallest population are large ind.coan<-betas(cbind(1:120,dat[,-1]),betaij=T) diag(ind.coan$betaij)<-NA graphics::image(1:120,1:120,ind.coan$betaij,xlab="Inds",ylab="Inds") ## End(Not run)
## Not run: #3 different population sizes lead to 3 different betais dat<-sim.genot(size=40,N=c(50,200,1000),nbloc=50,nbal=10) betas(dat,nboot=100) #individual coancestries from the smallest population are large ind.coan<-betas(cbind(1:120,dat[,-1]),betaij=T) diag(ind.coan$betaij)<-NA graphics::image(1:120,1:120,ind.coan$betaij,xlab="Inds",ylab="Inds") ## End(Not run)
Converts bi-allelic SNPs hierfstat format to dosage format, the number of alternate allele copies at a locus for an individual, i.e. 11 -> 0; 12 or 21 >1 and 22 ->2
biall2dos(dat,diploid=TRUE)
biall2dos(dat,diploid=TRUE)
dat |
a hierfstat data frame without the first column (the population identifier), individuals in rows, columns with individual genotypes encoded as 11, 12, 21 and 22 |
diploid |
whether the data set is from a diploid organism |
a matrix containing allelic dosages
## Not run: biall2dos(sim.genot(nbal=2,nbloc=10)[,-1]) # a 10 column matrix ## End(Not run)
## Not run: biall2dos(sim.genot(nbal=2,nbloc=10)[,-1]) # a 10 column matrix ## End(Not run)
Estimates bootstrap confidence intervals for pairwise betas FST estimates.
boot.ppbetas(dat=dat,nboot=100,quant=c(0.025,0.975),diploid=TRUE,digits=4)
boot.ppbetas(dat=dat,nboot=100,quant=c(0.025,0.975),diploid=TRUE,digits=4)
dat |
A data frame containing population of origin as the first column and multi-locus genotypes in following columns |
nboot |
the number of bootstrap samples to draw |
quant |
the limit of the confidence intervals |
diploid |
whether the data is from a diploid (default) or haploid organism |
digits |
how many digits to print out |
a matrix with upper limit of the bootstrap CI above the diagonal and lower limit below the diagonal
## Not run: data(gtrunchier) boot.ppbetas(gtrunchier[,-2]) ## End(Not run)
## Not run: data(gtrunchier) boot.ppbetas(gtrunchier[,-2]) ## End(Not run)
Performs bootstrapping over loci of population's Fis
boot.ppfis(dat=dat,nboot=100,quant=c(0.025,0.975),diploid=TRUE,dig=4,...)
boot.ppfis(dat=dat,nboot=100,quant=c(0.025,0.975),diploid=TRUE,dig=4,...)
dat |
a genetic data frame |
nboot |
number of bootstraps |
quant |
quantiles |
diploid |
whether diploid data |
dig |
digits to print |
... |
further arguments to pass to the function |
call |
function call |
fis.ci |
Bootstrap ci of Fis per population |
Jerome Goudet jerome.goudet@unil.ch
dat<-sim.genot(nbpop=4,nbloc=20,nbal=10,f=c(0,0.2,0.4,0.6)) boot.ppfis(dat)
dat<-sim.genot(nbpop=4,nbloc=20,nbal=10,f=c(0,0.2,0.4,0.6)) boot.ppfis(dat)
Performs bootstrapping over loci of pairwise Fst using Weir and Cockerham (1984) estimator of Fst
boot.ppfst(dat=dat,nboot=100,quant=c(0.025,0.975),diploid=TRUE,...)
boot.ppfst(dat=dat,nboot=100,quant=c(0.025,0.975),diploid=TRUE,...)
dat |
a genetic data frame |
nboot |
number of bootstraps |
quant |
the quantiles for bootstrapped ci |
diploid |
whether data are from diploid organisms |
... |
further arguments to pass to the function |
call |
call to the function |
ll |
lower limit ci |
ul |
upper limit ci |
vc.per.loc |
for each pair of population, the variance components per locus |
Jerome Goudet jerome.goudet@unil.ch
data(gtrunchier) x<-boot.ppfst(gtrunchier[,-2]) x$ll x$ul
data(gtrunchier) x<-boot.ppfst(gtrunchier[,-2]) x$ll x$ul
Provides a bootstrap confidence interval (over loci) for sums of the different variance components (equivalent to gene diversity estimates at the different levels), and the derived F-statistics, as suggested by Weir and Cockerham (1984). Will not run with less than 5 loci. Raymond and Rousset (199X) points out shortcomings of this method.
boot.vc(levels=levels,loci=loci,diploid=TRUE,nboot=1000,quant=c(0.025,0.5,0.975))
boot.vc(levels=levels,loci=loci,diploid=TRUE,nboot=1000,quant=c(0.025,0.5,0.975))
levels |
a data frame containing the different levels (factors) from the outermost (e.g. region) to the innermost before the individual |
loci |
a data frame containing the different loci |
diploid |
Specify whether the data are coming from diploid or haploid organisms (diploid is the default) |
nboot |
Specify the number of bootstrap to carry out. Default is 1000 |
quant |
Specify which quantile to produce. Default is c(0.025,0.5,0.975) giving the percentile 95% CI and the median |
boot |
a data frame with the bootstrapped variance components. Could be used for obtaining bootstrap ci of statistics not listed here. |
res |
a data frame with the bootstrap derived statistics. H stands for gene diversity, F for F-statistics |
ci |
Confidence interval for each statistic. |
Raymond M and Rousset F, 1995. An exact test for population differentiation. Evolution. 49:1280-1283
Weir, B.S. (1996) Genetic Data Analysis II. Sinauer Associates.
Weir BS and Cockerham CC, 1984. Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370.
#load data set data(gtrunchier) boot.vc(gtrunchier[,c(1:2)],gtrunchier[,-c(1:2)],nboot=100)
#load data set data(gtrunchier) boot.vc(gtrunchier[,c(1:2)],gtrunchier[,-c(1:2)],nboot=100)
A simple diploid dataset, with allele encoded as one digit number. Up to 4 alleles per locus
data(cont.isl)
data(cont.isl)
A data frame with 150 rows and 6 columns:
Population identifier, from 1 to 3
genotype at loc.1
genotype at loc.2
genotype at loc.3
genotype at loc.4
genotype at loc.5
...
generated with function sim.genot()
data(cont.isl) allele.count(cont.isl)
data(cont.isl) allele.count(cont.isl)
A simple diploid dataset, with alleles encoded as two digits numbers. Up to 99 alleles per locus
data(cont.isl99)
data(cont.isl99)
A data frame with 150 rows and 6 columns:
Population identifier, from 1 to 3
genotype at loc.1
genotype at loc.2
genotype at loc.3
genotype at loc.4
genotype at loc.5
...
generated with function sim.genot(nbal=99)
data(cont.isl99) allele.count(cont.isl99)
data(cont.isl99) allele.count(cont.isl99)
A dataset containing microsatellite genotypes, population and sex of 140 Crocidura russula individuals
data(crocrussula)
data(crocrussula)
Favre et al. (1997) Female-biased dispersal in the monogamous mammal Crocidura russula: evidence from field data and microsatellite patterns. Proceedings of the Royal Society, B (264): 127-132
Goudet J, Perrin N, Waser P (2002) Tests for sex-biased dispersal using bi-parentally inherited genetic markers 11, 1103:1114
data(crocrussula) aic<-AIc(crocrussula$genot) boxplot(aic~crocrussula$sex) sexbias.test(crocrussula$genot,crocrussula$sex)
data(crocrussula) aic<-AIc(crocrussula$genot) boxplot(aic~crocrussula$sex) sexbias.test(crocrussula$genot,crocrussula$sex)
A simple diploid dataset, with allele encoded as one digit number
data(diploid)
data(diploid)
A data frame with 44 rows and 6 columns:
Population identifier, from 1 to 6
genotype at loc-1 (only allele 4 present)
genotype at loc-1 (alleles 3 and 4)
genotype at loc-1 (alleles 2, 3 and 4)
genotype at loc-1 (alleles 1, 2, 3 and 4)
genotype at loc-1 (only allele 4)
...
Given in Weir, B.S. Genetic Data Analysis. Sinauer
data(diploid) basic.stats(diploid)
data(diploid) basic.stats(diploid)
Example data set with 4 levels, one diploid and one haploid locus
data(exhier)
data(exhier)
lev1 |
outermost level |
lev2 |
level 2 |
lev3 |
Level 3 |
lev4 |
Level 4 |
diplo |
Diploid locus |
haplo |
Haploid locus |
data(exhier) varcomp(exhier[,1:5]) varcomp(exhier[,c(1:4,6)],diploid=FALSE)
data(exhier) varcomp(exhier[,1:5]) varcomp(exhier[,c(1:4,6)],diploid=FALSE)
Reports individual inbreeding coefficients, Population specific and pairwise Fsts, and Fiss from dosage data
fs.dosage(dos, pop, matching = FALSE) ## S3 method for class 'fs.dosage' plot(x, ...) ## S3 method for class 'fs.dosage' print(x, digits = 4, ...) fst.dosage(dos, pop, matching = FALSE) fis.dosage(dos, pop, matching = FALSE) pairwise.fst.dosage(dos, pop, matching = FALSE)
fs.dosage(dos, pop, matching = FALSE) ## S3 method for class 'fs.dosage' plot(x, ...) ## S3 method for class 'fs.dosage' print(x, digits = 4, ...) fst.dosage(dos, pop, matching = FALSE) fis.dosage(dos, pop, matching = FALSE) pairwise.fst.dosage(dos, pop, matching = FALSE)
dos |
either a matrix with snps columns and individuals in rows containing allelic dosage (number [0,1 or 2] of alternate alleles); or a square matrix with as many rows and columns as the number of individuals and containing the proportion of matching alleles |
pop |
a vector containing the identifier of the population to which the individual in the corresponding row belongs |
matching |
logical:TRUE if dos is a square matrix of allelic matching; FALSE otherwise |
x |
a fs.dosage object |
... |
further arguments to pass |
digits |
number of digits to print |
Fi list of individual inbreeding coefficients, estimated with the reference being the population to which the individual belongs.
FsM matrix containing population specific FSTs on the diagonal. The off diagonal elements contains the average of the kinships for pairs of individuals, one from each population, relative to the mean kinship for pairs of individuals between populations.
Fst2x2 matrix containing pairwise FSTs
Fs The first row contains population specific and overall Fis, the second row population specific
(average over loci) FSTs and overall Fst
(see Table 3 of
Weir and Goudet, 2017 (Genetics))
plot(fs.dosage)
: Plot function for fs.dosage class
print(fs.dosage)
: Print function for fs.dosage class
Jerome Goudet jerome.goudet@unil.ch
Weir, BS and Goudet J. 2017 A Unified Characterization of Population Structure and Relatedness. Genetics (2017) 206:2085
## Not run: dos<-matrix(sample(0:2,size=10000,replace=TRUE),ncol=100) fs.dosage(dos,pop=rep(1:5,each=20)) plot(fs.dosage(dos,pop=rep(1:5,each=20))) ## End(Not run)
## Not run: dos<-matrix(sample(0:2,size=10000,replace=TRUE),ncol=100) fs.dosage(dos,pop=rep(1:5,each=20)) plot(fs.dosage(dos,pop=rep(1:5,each=20))) ## End(Not run)
Converts a hierfstat genetic data frame to dosage. For each allele at each locus, allelic dosage (number of copies of the allele) is reported. The column name is the allele identifier
fstat2dos(dat,diploid=TRUE)
fstat2dos(dat,diploid=TRUE)
dat |
data frame with genetic data without the first column (population identifier) |
diploid |
whether the data set is from a diploid organism |
a matrix with columns (where
is the number of alleles
at locus l), as many rows as individuals, and containing the number of copies (dosage) of the
corresponding allele
## Not run: dat<-sim.genot(nbal=5,nbloc=10) dos<-fstat2dos(dat[,-1]) dim(dos) wc(dat) fst.dosage(dos,pop=dat[,1]) ## End(Not run)
## Not run: dat<-sim.genot(nbal=5,nbloc=10) dos<-fstat2dos(dat[,-1]) dim(dos) wc(dat) fst.dosage(dos,pop=dat[,1]) ## End(Not run)
Calculates the likelihood ratio G-statistic on a contingency table of alleles at one locus X sampling unit. The sampling unit could be any hierarchical level
g.stats(data,diploid=TRUE)
g.stats(data,diploid=TRUE)
data |
a two-column data frame. The first column contains the sampling unit, the second the genotypes |
diploid |
Whether the data are from diploid (default) organisms |
obs |
Observed contingency table |
exp |
Expected number of allelic observations |
X.squared |
The chi-squared statistics, |
g.stats |
The likelihood ratio statistics, |
Jerome Goudet, DEE, UNIL, CH-1015 Lausanne Switzerland
Goudet J., Raymond, M., DeMeeus, T. and Rousset F. (1996) Testing differentiation in diploid populations. Genetics. 144: 1933-1940
Goudet J. (2005). Hierfstat, a package for R to compute and test variance components and F-statistics. Molecular Ecology Notes. 5:184-186
Petit E., Balloux F. and Goudet J.(2001) Sex-biased dispersal in a migratory bat: A characterization using sex-specific demographic parameters. Evolution 55: 635-640.
data(gtrunchier) attach(gtrunchier) g.stats(data.frame(Patch,L21.V))
data(gtrunchier) attach(gtrunchier) g.stats(data.frame(Patch,L21.V))
Calculates the likelihood ratio G-statistic on a contingency table of alleles at one locus X sampling unit, and sums this statistic over the loci provided. The sampling unit could be any hierarchical level (patch, locality, region,...). By default, diploid data are assumed
g.stats.glob(data,diploid=TRUE)
g.stats.glob(data,diploid=TRUE)
data |
a data frame made of nl+1 column, nl being the number of loci. The first column contains the sampling unit, the others the multi-locus genotype. Only complete multi-locus genotypes are kept for calculation |
diploid |
Whether the data are from diploid (default) organisms |
g.stats.l |
Per locus likelihood ratio statistic |
g.stats |
Overall loci likelihood ratio statistic |
Jerome Goudet, DEE, UNIL, CH-1015 Lausanne Switzerland
Goudet J. (2005). Hierfstat, a package for R to compute and test variance components and F-statistics. Molecular Ecology Notes. 5:184-186
Goudet J., Raymond, M., DeMeeus, T. and Rousset F. (1996) Testing differentiation in diploid populations. Genetics. 144: 1933-1940
Petit E., Balloux F. and Goudet J.(2001) Sex-biased dispersal in a migratory bat: A characterization using sex-specific demographic parameters. Evolution 55: 635-640.
g.stats
, samp.within
,samp.between
.
## Not run: data(gtrunchier) attach(gtrunchier) nperm<-99 nobs<-length(Patch) gglobs.o<-vector(length=(nperm+1)) gglobs.p<-vector(length=(nperm+1)) gglobs.l<-vector(length=(nperm+1)) gglobs.o[nperm+1]<-g.stats.glob(data.frame(Patch,gtrunchier[,-c(1,2)]))$g.stats gglobs.p[nperm+1]<-g.stats.glob(data.frame(Patch,gtrunchier[,-c(1,2)]))$g.stats gglobs.l[nperm+1]<-g.stats.glob(data.frame(Locality,gtrunchier[,-c(1,2)]))$g.stats for (i in 1:nperm) #careful, might take a while { gglobs.o[i]<-g.stats.glob(data.frame(Patch,gtrunchier[sample(Patch),-c(1,2)]))$g.stats gglobs.p[i]<-g.stats.glob(data.frame(Patch,gtrunchier[samp.within(Locality),-c(1,2)]))$g.stats gglobs.l[i]<-g.stats.glob(data.frame(Locality,gtrunchier[samp.between(Patch),-c(1,2)]))$g.stats } #p-value of first test (among patches) p.globs.o<-sum(gglobs.o>=gglobs.o[nperm+1])/(nperm+1) #p-value of second test (among patches within localities) p.globs.p<-sum(gglobs.p>=gglobs.p[nperm+1])/(nperm+1) #p-value of third test (among localities) p.globs.l<-sum(gglobs.l>=gglobs.l[nperm+1])/(nperm+1) #Are alleles associated at random among patches p.globs.o #Are alleles associated at random among patches within localities? #Tests differentiation among patches within localities p.globs.p #Are alleles associated at random among localities, keeping patches as one unit? #Tests differentiation among localities p.globs.l ## End(Not run)
## Not run: data(gtrunchier) attach(gtrunchier) nperm<-99 nobs<-length(Patch) gglobs.o<-vector(length=(nperm+1)) gglobs.p<-vector(length=(nperm+1)) gglobs.l<-vector(length=(nperm+1)) gglobs.o[nperm+1]<-g.stats.glob(data.frame(Patch,gtrunchier[,-c(1,2)]))$g.stats gglobs.p[nperm+1]<-g.stats.glob(data.frame(Patch,gtrunchier[,-c(1,2)]))$g.stats gglobs.l[nperm+1]<-g.stats.glob(data.frame(Locality,gtrunchier[,-c(1,2)]))$g.stats for (i in 1:nperm) #careful, might take a while { gglobs.o[i]<-g.stats.glob(data.frame(Patch,gtrunchier[sample(Patch),-c(1,2)]))$g.stats gglobs.p[i]<-g.stats.glob(data.frame(Patch,gtrunchier[samp.within(Locality),-c(1,2)]))$g.stats gglobs.l[i]<-g.stats.glob(data.frame(Locality,gtrunchier[samp.between(Patch),-c(1,2)]))$g.stats } #p-value of first test (among patches) p.globs.o<-sum(gglobs.o>=gglobs.o[nperm+1])/(nperm+1) #p-value of second test (among patches within localities) p.globs.p<-sum(gglobs.p>=gglobs.p[nperm+1])/(nperm+1) #p-value of third test (among localities) p.globs.l<-sum(gglobs.l>=gglobs.l[nperm+1])/(nperm+1) #Are alleles associated at random among patches p.globs.o #Are alleles associated at random among patches within localities? #Tests differentiation among patches within localities p.globs.p #Are alleles associated at random among localities, keeping patches as one unit? #Tests differentiation among localities p.globs.l ## End(Not run)
Estimates one of several genetic distances among all pairs of populations.
genet.dist(dat,diploid=TRUE,method="Dch")
genet.dist(dat,diploid=TRUE,method="Dch")
dat |
A data frame containing population of origin as the first column and multi-locus genotypes in following columns |
diploid |
whether the data is from a diploid (default) or haploid organism. |
method |
One of “Dch”,“Da”,“Ds”,“Fst”,“Dm”,“Dr”,“Cp” or “X2”, all described in Takezaki and Nei (1996). Additionally “Nei87” and “WC84” return pairwise FSTs estimated following Nei (1987) pairwise.neifst and Weir & Cockerham (1984) pp.fst respectively |
the method argument specify which genetic distance to use, among eight, all briefly described in Takezaki and Nei (1996)
“Dch” By default, Cavalli-Sforza and Edwards Chord distance (eqn 6 in the reference) is returned. This distance is used as default since Takezaki & Nei (1996) found that it was the best to retrieve the relation among samples.
“Da” This is Nei's et al genetic distance (eqn 7), performing nearly as well as “Dch”
“Ds” Nei's standard genetic distance (eqn 1). Increases linearly with diverence time but has larger variance
“Fst” Latter's and also approximately Reynolds et al Genetic distance (eqn 3)
“Dm” Nei's minimum distance (eqn 2)
“Dr” Rogers's distance (eqn 4)
“Cp” Prevosti et al's distance (eqn 5)
“X2” Sanghvi's distance (eqn 8)
“Nei87” see pairwise.neifst
“WC84” see pairwise.WCfst
A matrix of pairwise genetic distance
Jerome Goudet jerome.goudet@unil.ch
Takezaki & Nei (1996) Genetic distances and reconstruction of Phylogenetic trees from microsatellite DNA. Genetics 144:389-399
Nei, M. (1987) Molecular Evolutionary Genetics. Columbia University Press
Weir B.S. and Cockerham C.C. (1984) Estimating F-Statistics for the Analysis of Population Structure. Evolution 38:1358
pairwise.WCfst
pairwise.neifst
data(gtrunchier) genet.dist(gtrunchier[,-1]) genet.dist(gtrunchier[,-1],method="Dr")
data(gtrunchier) genet.dist(gtrunchier[,-1]) genet.dist(gtrunchier[,-1],method="Dr")
Converts genind objects from adegenet into a hierfstat data frame
genind2hierfstat(dat,pop=NULL)
genind2hierfstat(dat,pop=NULL)
dat |
a genind object |
pop |
a vector containing the population to which each individual belongs. If pop=NULL, pop taken from slot pop of the genind object |
a data frame with nloci+1 columns and ninds rows. The first column contains the population identifier, the following the genotypes at each locus
## Not run: library(adegenet) data(nancycats) genind2hierfstat(nancycats) basic.stats(nancycats) genet.dist(nancycats) data(H3N2) basic.stats(genind2hierfstat(H3N2,pop=rep(1,dim(H3N2@tab)[1])),diploid=FALSE) ## End(Not run)
## Not run: library(adegenet) data(nancycats) genind2hierfstat(nancycats) basic.stats(nancycats) genet.dist(nancycats) data(H3N2) basic.stats(genind2hierfstat(H3N2,pop=rep(1,dim(H3N2@tab)[1])),diploid=FALSE) ## End(Not run)
Separates the input vector of diploid genotypes in two vectors each containing one allele, and returns a vector of length 2*length(y) with the second part being the second allele
genot2al(y)
genot2al(y)
y |
the diploid genotypes at one locus |
returns a vector of length 2*length(y), with the second half of the vector containing the second alleles
Jerome Goudet, DEE, UNIL, CH-1015 Lausanne Switzerland
Goudet J. (2004). A library for R to compute and test variance components and F-statistics. In Prep
data(gtrunchier) genot2al(gtrunchier[,4])
data(gtrunchier) genot2al(gtrunchier[,4])
Converts diploid genotypic data into allelic data
getal(data)
getal(data)
data |
a data frame where the first column contains the population to which the different individuals belong, and the following columns contain the genotype of the individuals -one locus per column- |
data.al |
a new data frame, with twice as many row as the input data frame and one extra column. each row of the first half of the data frame contains the first allele for each locus, and each row of the second half of the data frame contains the second allel at the locus. The extra column in second position corresponds to the identifier of the individual to which the allele belongs |
Jerome Goudet jerome.goudet@unil.ch
data(gtrunchier) getal(data.frame(gtrunchier[,-2]))
data(gtrunchier) getal(data.frame(gtrunchier[,-2]))
Converts a data frame of genotypic diploid data with as many lines as individuals (ni) and as many columns as loci (nl) into an array [ni,nl,2] of allelic data
getal.b(data)
getal.b(data)
data |
a data frame with ni rows and nl columns. Each line encodes one individual, each column contains the genotype at one locus of the individual |
an array [ni,nl,2] of alleles. The two alleles are stored in the third dimension of the array
Jerome Goudet jerome.goudet@unil.ch
data(gtrunchier) #multilocus diploid genotype of the first individual gtrunchier[1,-c(1:2)] #the diploid genotype splitted in its two constituent alleles getal.b(gtrunchier[,-c(1:2)])[1,,]
data(gtrunchier) #multilocus diploid genotype of the first individual gtrunchier[1,-c(1:2)] #the diploid genotype splitted in its two constituent alleles getal.b(gtrunchier[,-c(1:2)])[1,,]
Converts a Genetic Relationship Matrix (GRM) to a kinship matrix
grm2kinship(x)
grm2kinship(x)
x |
a square (GRM) matrix |
a kinship matrix
Jerome Goudet jerome.goudet@unil.ch
Data set consisting of the microsatellite genotypes of 370 Galba truncatula, a tiny freshwater snail, collecting from different localities and several patches within localities in Western Switzerland.
data(gtrunchier)
data(gtrunchier)
Locality |
Identifier of the locality of origin |
Patch |
Identifier of the patch of origin |
L21.V |
Genotype at locus L21.V. For instance the first individual carries allele 2 and 2 at this locus gtrunchier\$L21.V[1] |
L37.J |
Genotype at locus L37.J |
L20.B |
Genotype at locus L20.B |
L29.V |
Genotype at locus L29.V |
L36.B |
Genotype at locus L36.B |
L16.J |
Genotype at locus L16.J |
Trouve S., L. Degen et al. (2000) Microsatellites in the hermaphroditic snail, Lymnaea truncatula, intermediate host of the liver fluke, Fasciola hepatica.Molecular Ecology 9: 1662-1664.
Trouve S., Degen L. and Goudet J. (2005) Ecological components and evolution of selfing in the freshwater snail Galba truncatula. Journal of Evolutionary Biology. 18, 358-370
This package contains functions to estimate hierarchical F-statistics for any number of hierarchical levels using the method described in Yang (1998). It also contains functions allowing to test the significance of population differentiation at any given level using the likelihood ratio G-statistic, showed previoulsly to be the most powerful statistic to test for differnetiation (Goudet et al., 1996) . The difficulty in a hierarchical design is to identify which units should be permutted. Functions samp.within and samp.between give permutations of a sequence that allows reordering of the observations in the original data frame. An exemple of application is given in the help page for function g.stats.glob.
Hierfstat includes now all the capabilities of Fstat, and many others. A new serie of functions implementing the statistics described in Weir and Goudet (2017) and Goudet et al. (2018) (beta.dosage, fs.dosage) have been written to deal with large genomic data sets and take as input a matrix of allelic dosages, the number of alternate alleles an individual carries at a locus.
Several functions have been written to simulate genetic data, or to import them from existing sofwares such as quantiNemo or Hudson's ms
Hierfstat links easily with the gaston, SNPRelate and adegenet packages, among others.
Jerome Goudet jerome.goudet@unil.ch
Goudet J. (2005) Hierfstat, a package for R to compute and test variance components and F-statistics. Molecular Ecology Notes. 5:184-186
Goudet J., Raymond, M., DeMeeus, T. and Rousset F. (1996) Testing differentiation in diploid populations. Genetics. 144: 1933-1940
Weir B.S. and Goudet J. (2017) A Unified Characterization of Population Structure and Relatedness. Genetics. 206: 2085-2103
Goudet J., Kay T. and Weir B.S. (2018) How to estimate kinship. Molecular Ecology. 27: 4121:4135
Weir, B.S. (1996) Genetic Data Analysis II. Sinauer Associates.
Yang, R.C. (1998) Estimating hierarchical F-statistics. Evolution 52(4):950-956
Counts the number of individual genotyped per locus and population
ind.count(data)
ind.count(data)
data |
a data frame containing the population of origin in the first column and the genotypes in the following ones |
A table –with np (number of populations) columns and nl (number of loci) rows– of genotype counts
Jerome Goudet jerome.goudet@unil.ch
data(gtrunchier) ind.count(gtrunchier[,-2])
data(gtrunchier) ind.count(gtrunchier[,-2])
Carry out a PCA on the centered, unscaled matrix of individual's allele frequencies.
indpca(dat,ind.labels=NULL,scale=FALSE) ## S3 method for class 'indpca' print(x,...) ## S3 method for class 'indpca' plot(x,eigen=FALSE,ax1=1,ax2=2,...)
indpca(dat,ind.labels=NULL,scale=FALSE) ## S3 method for class 'indpca' print(x,...) ## S3 method for class 'indpca' plot(x,eigen=FALSE,ax1=1,ax2=2,...)
dat |
A data frame with population of origin as first column, and genotypes in following columns. |
ind.labels |
a vector of labels for the different individuals |
scale |
whether to standardize each column to variance 1 or to leave it as is (default) |
x |
an indpca object |
eigen |
whether to plot in an additional windows screeplot of the inertias for the different axes |
ax1 |
which PCA coordinates to plot on the x axis |
ax2 |
which PCA coordinates to plot on the y axis |
... |
further arguments to pass to print or plot |
An object of class indpca
with components
call |
The function call |
ipca |
an object of class pca and dudi (see dudi.pca) in package ade4 |
mati |
the original non centered matrice of individuals X alleles frequencies |
Jerome Goudet jerome.goudet@unil.ch
##not run data(gtrunchier) x<-indpca(gtrunchier[,-2],ind.labels=gtrunchier[,2]) plot(x,col=gtrunchier[,1],cex=0.7)
##not run data(gtrunchier) x<-indpca(gtrunchier[,-2],ind.labels=gtrunchier[,2]) plot(x,col=gtrunchier[,1],cex=0.7)
genind
Test if the argument to the function is of class
genind
, a class from the adegenet
library)
is.genind(dat)
is.genind(dat)
dat |
an object |
TRUE or FALSE
Converts a kinship matrix to a distance matrix
kinship2dist(x)
kinship2dist(x)
x |
A square matrix containg kinship coefficients |
A distance matrix
Jerome Goudet jerome.goudet@unil.ch
Converts a kinship matrix to a Genetic Relation Matrix (GRM)
kinship2grm(x)
kinship2grm(x)
x |
a square matrix containing kinship coefficients |
for off-diagonal elements, ; for diagonal elements,
a GRM matrix
Jerome Goudet jerome.goudet@unil.ch
## Not run: dos<-matrix(sample(0:2,replace=TRUE,size=1000),nrow=10) #dosage matrix for 10 inds at 100 loci ks<-beta.dosage(dos) # kinship matrix kinship2grm(ks) ## End(Not run)
## Not run: dos<-matrix(sample(0:2,replace=TRUE,size=1000),nrow=10) #dosage matrix for 10 inds at 100 loci ks<-beta.dosage(dos) # kinship matrix kinship2grm(ks) ## End(Not run)
Shifts a kinship matrix
kinshipShift(x,shift=NULL)
kinshipShift(x,shift=NULL)
x |
a square matrix |
shift |
the amount by which the elements of x should be shifted. if |
The kinship matrix produced by beta.dosage
is relative to the average kinship
of the set of individuals analysed ().
Another reference point might be useful, for instance to avoid negative kinship values, one might
want to shift the matrix by
.
the shifted kinship matrix
Jerome Goudet jerome.goudet@unil.ch
creates a vector from a matrix
mat2vec(mat,upper=FALSE)
mat2vec(mat,upper=FALSE)
mat |
a symmetric matrix |
upper |
whether the upper triangular matrix is to be copied to the vector |
a vector
{ mat2vec(matrix(1:16,nrow=4)) mat2vec(matrix(1:16,nrow=4),upper=TRUE) }
{ mat2vec(matrix(1:16,nrow=4)) mat2vec(matrix(1:16,nrow=4),upper=TRUE) }
Estimates matching (or Allele Sharing) between pairs of individuals (for each locus, gives 1 if the two individuals are homozygous for the same allele, 0 if they are homozygous for a different allele, and 1/2 if at least one individual is heterozygous. Matching is the average of these 0, 1/2 and 1s)
matching(dos) AlleleSharing(dos)
matching(dos) AlleleSharing(dos)
dos |
A matrix of 0, 1 and 2s with loci (SNPs) in columns and individuals in rows. missing values are allowed |
This function is written for dosage data, i.e., how many doses of an allele (0, 1 or 2) an individual carries.
It should be use for bi-allelic markers only (e.g. SNPs), although you might "force" a k multiallelic locus to k
biallelic loci (see fstat2dos
).
a matrix of pairwise matching / Allele Sharing
ms
program in a BED
objectImport the output of the ms
program into a BED
object, as defined in the
gaston package
ms2bed(fname,chrom=NULL)
ms2bed(fname,chrom=NULL)
fname |
the name of the text file containing |
chrom |
the number of chromosomes (replicates) to import (default to all) |
ms2bed
relies on ms2dos
.
The population identifier is in bed@ped$famid
a bed object
ms
outputImport the output of the ms
program into suitable format for further manipulations
ms2dos(fname,chrom=NULL)
ms2dos(fname,chrom=NULL)
fname |
a text file containing the output of the |
chrom |
the chromosomes (replicates) to be imported. default to all |
alldat a matrix with as many row as (haploid) individuals and as many columns as SNPs
bim a data frame with two components chr contains the chromosome (replicate) id; pos contains the SNPs positions on the chromosome
Counts the number of different alleles at each locus and population
nb.alleles(data,diploid=TRUE)
nb.alleles(data,diploid=TRUE)
data |
A data frame containing the population of origin in the first column and the genotypes in the following ones |
diploid |
whether individuals are diploid |
A table, –with np (number of populations) columns and nl (number of loci) rows– of the number of different alleles
Jerome Goudet jerome.goudet@unil.ch
data(gtrunchier) nb.alleles(gtrunchier[,-2])
data(gtrunchier) nb.alleles(gtrunchier[,-2])
Estimates pairwise betas according to Weir and Goudet (2017)
pairwise.betas(dat,diploid=TRUE)
pairwise.betas(dat,diploid=TRUE)
dat |
A data frame containing population of origin as the first column and multi-locus genotypes in following columns |
diploid |
whether the data is from a diploid (default) or haploid organism |
a matrix of pairwise betas
Jerome Goudet jerome.goudet@unil.ch
Weir, BS and Goudet J. 2017 A Unified Characterization of Population Structure and Relatedness. Genetics (2017) 206:2085
data(gtrunchier) pairwise.betas(gtrunchier[,-2],diploid=TRUE)
data(gtrunchier) pairwise.betas(gtrunchier[,-2],diploid=TRUE)
Estimate pairwise FSTs according to Nei (1987)
pairwise.neifst(dat,diploid=TRUE)
pairwise.neifst(dat,diploid=TRUE)
dat |
A data frame containing population of origin as the first column and multi-locus genotypes in following columns |
diploid |
whether the data is from a diploid (default) or haploid organism |
FST are calculated using Nei (87) equations for FST', as described in the note section of basic.stats
A matrix of pairwise FSTs
Jerome Goudet jerome.goudet@unil.ch
Nei, M. (1987) Molecular Evolutionary Genetics. Columbia University Press
pairwise.WCfst genet.dist basic.stats
data(gtrunchier) pairwise.neifst(gtrunchier[,-2],diploid=TRUE)
data(gtrunchier) pairwise.neifst(gtrunchier[,-2],diploid=TRUE)
Estimates pairwise FSTs according to Weir and Cockerham (1984)
pairwise.WCfst(dat,diploid=TRUE)
pairwise.WCfst(dat,diploid=TRUE)
dat |
A data frame containing population of origin as the first column and multi-locus genotypes in following columns |
diploid |
whether the data is from a diploid (default) or haploid organism |
FST are calculated using Weir & Cockerham (1984) equations for FST', as described in the note section of wc
A matrix of pairwise FSTs
Jerome Goudet jerome.goudet@unil.ch
Weir, B.S. (1996) Genetic Data Analysis II. Sinauer Associates.
Weir B.S. and Cockerham C.C. (1984) Estimating F-Statistics for the Analysis of Population Structure. Evolution 38:1358
data(gtrunchier) pairwise.WCfst(gtrunchier[,-2],diploid=TRUE)
data(gtrunchier) pairwise.WCfst(gtrunchier[,-2],diploid=TRUE)
principal coordinates analysis as described in Legendre & Legendre Numerical Ecology
pcoa(mat,plotit=TRUE,...)
pcoa(mat,plotit=TRUE,...)
mat |
a distance matrix |
plotit |
Whether to produce a plot of the pcoa |
... |
further arguments (graphical for instance) to pass to the function |
valp |
the eigen values of the pcoa |
vecp |
the eigen vectors of the pcoa (the coordinates of observations) |
eucl |
The cumulative euclidian distances among observations, |
Jerome Goudet jerome.goudet@unil.ch
data(gtrunchier) colo<-c("black","red","blue","yellow","orange","green") pcoa(as.matrix(genet.dist(gtrunchier[,-1])),col=rep(colo,c(5,5,4,5,5,5)))
data(gtrunchier) colo<-c("black","red","blue","yellow","orange","green") pcoa(as.matrix(genet.dist(gtrunchier[,-1])),col=rep(colo,c(5,5,4,5,5,5)))
) from dosage dataEstimates nucleotide diversity from a
dosage matrix
pi.dosage(dos,L=NULL)
pi.dosage(dos,L=NULL)
dos |
a ni X nl dosage matrix containing the number of derived/alternate alleles each individual carries at each SNP |
L |
the length of the sequence |
if L=NULL
(default), returns the sum over SNPs of nucleotide diversity;
otherwise return the average nucleotide diversity per nucleotide given the length L
of the sequence
Estimates allelic frequencies for each population and locus
pop.freq(dat,diploid=TRUE)
pop.freq(dat,diploid=TRUE)
dat |
a data frame where the first column contains the population to which the different individuals belong, and the following columns contain the genotype of the individuals -one locus per column- |
diploid |
specify whether the data set consists of diploid (default) or haploid data |
A list containing allele frequencies. Each element of the list is one locus. For each locus, Populations are in columns and alleles in rows
Jerome Goudet jerome.goudet@unil.ch
data(gtrunchier) pop.freq(gtrunchier[,-2])
data(gtrunchier) pop.freq(gtrunchier[,-2])
fst per pair following Weir and Cockerham (1984)
pp.fst(dat=dat,diploid=TRUE,...)
pp.fst(dat=dat,diploid=TRUE,...)
dat |
a genetic data frame |
diploid |
whether data from diploid organism |
... |
further arguments to pass to the function |
call |
function call |
fst.pp |
pairwise Fsts |
vc.per.loc |
for each pair of population, the variance components per locus |
Jerome Goudet jerome.goudet@unil.ch
Weir B.S. and Cockerham C.C. (1984) Estimating F-Statistics for the Analysis of Population Structure. Evolution 38:1358
Weir, B.S. (1996) Genetic Data Analysis II. Sinauer Associates.
wrapper to return per locus variance components between pairs of samples x & y
pp.sigma.loc(x,y,dat=dat,diploid=TRUE,...)
pp.sigma.loc(x,y,dat=dat,diploid=TRUE,...)
x , y
|
samples 1 and 2 |
dat |
a genetic data set |
diploid |
whether dats are diploid |
... |
further arguments to pass to the function |
sigma.loc |
variance components per locus |
Jerome Goudet jerome.goudet@unil.ch
print function for pp.fst
## S3 method for class 'pp.fst' print(x,...)
## S3 method for class 'pp.fst' print(x,...)
x |
an object of class pp.fst |
... |
further arguments to pass to the function |
Jerome Goudet jerome.goudet@unil.ch
Read QuantiNemo extended format for genotype files
Read QuantiNemo (http://www2.unil.ch/popgen/softwares/quantinemo/) genotype files extended format (option 2)
qn2.read.fstat(fname, na.s = c("NA","NaN"))
qn2.read.fstat(fname, na.s = c("NA","NaN"))
fname |
quantinemo file name |
na.s |
na string used |
dat a data frame with nloc+1 columns, the first being the population to which the individual belongs and the next being the genotypes, one column per locus; and ninds rows
sex the sex of the individuals
Jerome Goudet jerome.goudet@unil.ch
Neuenschwander S, Michaud F, Goudet J (2019) QuantiNemo 2: a Swiss knife to simulate complex demographic and genetic scenarios, forward and backward in time. Bioinformatics 35:886
Neuenschwander S, Hospital F, Guillaume F, Goudet J (2008) quantiNEMO: an individual-based program to simulate quantitative traits with explicit genetic architecture in a dynamic metapopulation. Bioinformatics 24:1552
dat<-qn2.read.fstat(system.file("extdata","qn2_sex.dat",package="hierfstat")) sexbias.test(dat[[1]],sex=dat[[2]])
dat<-qn2.read.fstat(system.file("extdata","qn2_sex.dat",package="hierfstat")) sexbias.test(dat[[1]],sex=dat[[2]])
Imports a FSTAT data file into R. The data frame created is made of nl+1 columns, nl being the number of loci. The first column corresponds to the Population identifier, the following columns contains the genotypes of the individuals.
read.fstat(fname, na.s = c("0","00","000","0000","00000","000000","NA"))
read.fstat(fname, na.s = c("0","00","000","0000","00000","000000","NA"))
fname |
a file in the FSTAT format (http://www.unil.ch/popgen/softwares/fstat.htm): The file must have the following format: The first line contains 4 numbers: the number of samples, np , the number of loci, nl, the highest number used to label an allele, nu, and a 1 if the code for alleles is a one digit number (1-9), a 2 if code for alleles is a 2 digit number (01-99) or a 3 if code for alleles is a 3 digit number (001-999). These 4 numbers need to be separated by any number of spaces. |
The first line is immediately followed by nl lines, each containing the name of a locus, in the order they will appear in the rest of the file.
On line nl+2, a series of numbers as follow:
1 0102 0103 0101 0203 0 0303
The first number identifies the sample to which the individual belongs, the second is the genotype of the individual at the first locus, coded with a 2 digits number for each allele, the third is the genotype at the second locus, until locus nl is entered (in the example above, nl=6). Missing genotypes are encoded with 0, 00, 0000, 000000 or NA. Note that 0001 or 0100 are not a valid format, as both alleles at a locus have to be known, otherwise, the genotype is considered as missing. No empty lines are needed between samples.
na.s |
The strings that correspond to the missing value. You should note have to change this |
a data frame containing the desired data, in a format adequate to pass to varcomp
Goudet J. (1995). FSTAT (Version 1.2): A computer program to calculate F- statistics. Journal of Heredity 86:485-486
Goudet J. (2005). Hierfstat, a package for R to compute and test variance components and F-statistics. Molecular Ecology Notes. 5:184-186
read.fstat(paste(path.package("hierfstat"),"/extdata/diploid.dat",sep="",collapse=""))
read.fstat(paste(path.package("hierfstat"),"/extdata/diploid.dat",sep="",collapse=""))
Imports a FSTAT data file into R. The data frame created is made of nl+1 columns, nl being the number of loci. The first column corresponds to the Population identifier, the following columns contains the genotypes of the individuals.
read.fstat.data(fname, na.s = c("0","00","000","0000","00000","000000","NA"))
read.fstat.data(fname, na.s = c("0","00","000","0000","00000","000000","NA"))
fname |
a file in the FSTAT format (http://www.unil.ch/popgen/softwares/fstat.htm): The file must have the following format: The first line contains 4 numbers: the number of samples, np , the number of loci, nl, the highest number used to label an allele, nu, and a 1 if the code for alleles is a one digit number (1-9), a 2 if code for alleles is a 2 digit number (01-99) or a 3 if code for alleles is a 3 digit number (001-999). These 4 numbers need to be separated by any number of spaces. |
The first line is immediately followed by nl lines, each containing the name of a locus, in the order they will appear in the rest of the file.
On line nl+2, a series of numbers as follow:
1 0102 0103 0101 0203 0 0303
The first number identifies the sample to which the individual belongs, the second is the genotype of the individual at the first locus, coded with a 2 digits number for each allele, the third is the genotype at the second locus, until locus nl is entered (in the example above, nl=6). Missing genotypes are encoded with 0, 00, 0000, 000000 or NA. Note that 0001 or 0100 are not a valid format, as both alleles at a locus have to be known, otherwise, the genotype is considered as missing. No empty lines are needed between samples.
na.s |
The strings that correspond to the missing value. You should note have to change this |
a data frame containing the desired data, in a format adequate to pass to varcomp
Goudet J. (1995). FSTAT (Version 1.2): A computer program to calculate F- statistics. Journal of Heredity 86:485-486
Goudet J. (2005). Hierfstat, a package for R to compute and test variance components and F-statistics. Molecular Ecology Notes. 5:184-186
read.fstat.data(paste(path.package("hierfstat"),"/extdata/diploid.dat",sep="",collapse=""))
read.fstat.data(paste(path.package("hierfstat"),"/extdata/diploid.dat",sep="",collapse=""))
With argument what="SNP", each site is read as a SNP, with the ancestral allele encoded as 0 and the alternate allele encoded as 1. If the ms output file contains several replicates, the different replicates will be collated together. Hence, the number of loci is the sum of all sites from all replicates.
read.ms(fname,what=c("SNP","Haplotype"))
read.ms(fname,what=c("SNP","Haplotype"))
fname |
file name containing ms output |
what |
whether to read ms output as SNPs or haplotypes |
With argument what="Haplotype", each different sequence from a replicate is read as a haplotype, by converting it first to a factor, and then to an integer. There will be as many loci as there are replicates, and the number of alleles per locus will be the number of different haplotypes in the corresponding replicate.
alldat a data frame with nloc+1 columns, the first being the population to which the individual belongs and the next being the genotypes, one column per locus; and one row per (haploid) individual.
Jerome Goudet jerome.goudet@unil.ch
Hudson, R. R. (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18 : 337-338.
## Not run: datH<-read.ms(system.file("extdata","2pops_asspop.txt",package="hierfstat"),what="Haplotype") dim(datH) head(datH[,1:10] datS<-read.ms(system.file("extdata","2pops_asspop.txt",package="hierfstat"),what="SNP") dim(datS) head(datS[,1:10]) ## End(Not run)
## Not run: datH<-read.ms(system.file("extdata","2pops_asspop.txt",package="hierfstat"),what="Haplotype") dim(datH) head(datH[,1:10] datS<-read.ms(system.file("extdata","2pops_asspop.txt",package="hierfstat"),what="SNP") dim(datS) head(datS[,1:10]) ## End(Not run)
Reads a https://samtools.github.io/hts-specs/Variant Call Format (VCF) file into a BED object, retaining bi-allelic SNPs only
read.VCF(fname,BiAllelic=TRUE,...)
read.VCF(fname,BiAllelic=TRUE,...)
fname |
VCF file name. The VCF file can be compressed (VCF.gz) |
BiAllelic |
Logical. If TRUE, only bi-allelic SNPs are retained, otherwise, all variant are kept |
... |
other arguments to pass to the function |
A bed.matrix-class
object
filepath <-system.file("extdata", "LCT.vcf.gz", package="gaston") x1 <- read.VCF( filepath ) x1
filepath <-system.file("extdata", "LCT.vcf.gz", package="gaston") x1 <- read.VCF( filepath ) x1
Used to generate a permutation of a sequence 1:length(lev). blocks of observations are permutted, according to the vector lev passed to the function.
samp.between(lev)
samp.between(lev)
lev |
a vector containing the groups to be permuted. |
a vector 1:length(lev) (with blocks defined by data) randomly permuted. Usually, one passes the result to reorder observations in a data set in order to carry out permutation-based tests
Jerome Goudet, DEE, UNIL, CH-1015 Lausanne Switzerland
Goudet J. (2005). Hierfstat, a package for R to compute and test variance components and F-statistics. Molecular Ecology Notes. 5:184-186
samp.between(rep(1:4,each=4)) #for an application see example in g.stats.glob
samp.between(rep(1:4,each=4)) #for an application see example in g.stats.glob
Used to generate a permutation of a sequence 1:length(inner.lev). blocks of observations defined by inner.lev are permutted within blocks defined by outer.lev
samp.between.within(inner.lev, outer.lev)
samp.between.within(inner.lev, outer.lev)
inner.lev |
a vector containing the groups to be permuted. |
outer.lev |
a vector containing teh blocks within which observations are to be kept. |
a vector 1:length(lev) (with blocks defined by data) randomly permuted. Usually, one passes the result to reorder observations in a data set in order to carry out permutation-based tests
Used to generate a permutation of a sequence 1:length(lev). observations are permutted within blocks, according to the vector lev passed to the function.
samp.within(lev)
samp.within(lev)
lev |
a vector containing the group to which belongs the observations to be permuted. |
a vector 1:length(lev) (with blocks defined by
lev
) randomly permuted. Usually, one passes the result to reorder observations in a data set in order to carry out permutation-based tests.
Jerome Goudet, DEE, UNIL, CH-1015 Lausanne Switzerland
Goudet J. (2005). Hierfstat, a package for R to compute and test variance components and F-statistics. Molecular Ecology Notes. 5:184-186
samp.within(rep(1:4,each=4)) #for an application see example in g.stats.glob
samp.within(rep(1:4,each=4)) #for an application see example in g.stats.glob
Test whether one sex disperses more than the other using the method described in Goudet etal. (2002)
sexbias.test(dat,sex,nperm=NULL,test="mAIc",alternative="two.sided")
sexbias.test(dat,sex,nperm=NULL,test="mAIc",alternative="two.sided")
dat |
a data frame with n.locs+1 columns and n.inds rows |
sex |
a vector containing the individual's sex |
nperm |
the number of permutation to carry out |
test |
one of "mAIc" (default), "vAIc","FIS" or "FST" |
alternative |
one of "two.sided" (default),"less" or "greater" |
call the function call
res the observation for each sex
statistic the observed statistic for the chosen test
p.value the p-value of the hypothesis
Jerome Goudet jerome.goudet@unil.ch
Goudet J, Perrin N, Waser P (2002) Tests for sex-biased dispersal using bi-parentally inherited genetic markers 11, 1103:1114
data(crocrussula) sexbias.test(crocrussula$genot,crocrussula$sex) dat<-qn2.read.fstat(system.file("extdata","qn2_sex.dat",package="hierfstat")) sexbias.test(dat[[1]],sex=dat[[2]]) ## Not run: sexbias.test(crocrussula$genot,crocrussula$sex,nperm=1000) sexbias.test(dat[[1]],sex=dat[[2]],nperm=100,test="FST",alternative="greater") ## End(Not run)
data(crocrussula) sexbias.test(crocrussula$genot,crocrussula$sex) dat<-qn2.read.fstat(system.file("extdata","qn2_sex.dat",package="hierfstat")) sexbias.test(dat[[1]],sex=dat[[2]]) ## Not run: sexbias.test(crocrussula$genot,crocrussula$sex,nperm=1000) sexbias.test(dat[[1]],sex=dat[[2]],nperm=100,test="FST",alternative="greater") ## End(Not run)
Simulates frequencies, for internal use only
Simulates genotypes from several individuals in several populations at several loci in an island model at equilibrium. The islands may differ in size and inbreeding coeeficients.
sim.genot(size=50,nbal=4,nbloc=5,nbpop=3,N=1000,mig=0.001,mut=0.0001,f=0)
sim.genot(size=50,nbal=4,nbloc=5,nbpop=3,N=1000,mig=0.001,mut=0.0001,f=0)
size |
The number of individuals to sample per population |
nbal |
The maximum number of alleles present at a locus |
nbloc |
The number of loci to simulate |
nbpop |
The number of populations to simulate |
N |
The population sizes for each island |
mig |
the proportion of migration among islands |
mut |
The loci mutation rate |
f |
the inbreeding coefficient for each island |
a data frame with nbpop*size lines and nbloc+1 columns. Individuals are in rows and genotypes in columns, the first column being the population identifier
Jerome Goudet jerome.goudet@unil.ch
## Not run: dat<-sim.genot(nbpop=4,nbal=20,nbloc=10,mig=0.001,mut=0.0001,N=c(100,100,1000,1000),f=0) betas(dat)$betaiovl ## End(Not run)
## Not run: dat<-sim.genot(nbpop=4,nbal=20,nbloc=10,mig=0.001,mut=0.0001,N=c(100,100,1000,1000),f=0) betas(dat)$betaiovl ## End(Not run)
This function allows to simulate genetic data from a metapopulation model, where each population can have a different size and a different inbreeding coefficient, and migration between each population is given in a migration matrix.
This function simulates genetic data under a migration matrix model.
Each population sends a proportion of migrant alleles
to population
and receives a proportion of migrant alleles
from population
.
sim.genot.metapop.t(size=50,nbal=4,nbloc=5,nbpop=3,N=1000, mig=diag(3),mut=0.0001,f=0,t=100)
sim.genot.metapop.t(size=50,nbal=4,nbloc=5,nbpop=3,N=1000, mig=diag(3),mut=0.0001,f=0,t=100)
size |
the number of sampled individuals per population |
nbal |
the number of alleles per locus (maximum of 99) |
nbloc |
the number of loci to simulate |
nbpop |
the number of populations to simulate |
N |
the effective population sizes of each population. If only one number, all populations are assumed to be of the same size |
mig |
a matrix with nbpop rows and columns giving the migration rate from population i (in row) to population j (in column). Each row must sum to 1. |
mut |
the mutation rate of the loci |
f |
the inbreeding coefficient for each population |
t |
the number of generation since the islands were created |
In this model, can be written as a function of population size
, migration rate
, mutation rate
and
.
The rational is as follows:
With probability , 2 alleles from 2 different individuals in
the current generation are sampled from the same individual of the previous
generation:
-Half the time, the same allele is drawn from the parent;
-The other half, two different alleles are drawn, but they are identical in
proportion .
-With probability , the 2 alleles are drawn from different
individuals in the previous generation, in which case they are identical in
proportion
.
This holds providing that neither alleles have mutated or migrated. This is
the case with probability .
If an allele is a mutant, then its coancestry with another allele
is 0.
Note also that the mutation scheme assumed is the infinite allele (or site)
model. If the number of alleles is finite (as will be the case in what follows),
the corresponding mutation model is the K-allele model and the mutation rate
has to be adjusted to .
Continue derivation
A data frame with size*nbpop rows and nbloc+1 columns. Each row is an individual, the first column contains the identifier of the population to which the individual belongs, the following nbloc columns contain the genotype for each locus.
Jerome Goudet jerome.goudet@unil.ch
#2 populations psize<-c(10,1000) mig.mat<-matrix(c(0.99,0.01,0.1,0.9),nrow=2,byrow=TRUE) dat<-sim.genot.metapop.t(nbal=10,nbloc=100,nbpop=2,N=psize,mig=mig.mat,mut=0.00001,t=100) betas(dat)$betaiovl # Population specific estimator of FST #1D stepping stone ## Not run: np<-10 m<-0.2 mig.mat<-diag(np)*(1-m) diag(mig.mat[-1,-np])<-m/2 diag(mig.mat[-np,-1])<-m/2 mig.mat[1,1:2]<-c(1-m/2,m/2) mig.mat[np,(np-1):np]<-c(m/2,1-m/2) dat<-sim.genot.metapop.t(nbal=10,nbloc=50,nbpop=np,mig=mig.mat,t=400) pcoa(as.matrix(genet.dist(dat))) # principal coordinates plot ## End(Not run)
#2 populations psize<-c(10,1000) mig.mat<-matrix(c(0.99,0.01,0.1,0.9),nrow=2,byrow=TRUE) dat<-sim.genot.metapop.t(nbal=10,nbloc=100,nbpop=2,N=psize,mig=mig.mat,mut=0.00001,t=100) betas(dat)$betaiovl # Population specific estimator of FST #1D stepping stone ## Not run: np<-10 m<-0.2 mig.mat<-diag(np)*(1-m) diag(mig.mat[-1,-np])<-m/2 diag(mig.mat[-np,-1])<-m/2 mig.mat[1,1:2]<-c(1-m/2,m/2) mig.mat[np,(np-1):np]<-c(m/2,1-m/2) dat<-sim.genot.metapop.t(nbal=10,nbloc=50,nbpop=np,mig=mig.mat,t=400) pcoa(as.matrix(genet.dist(dat))) # principal coordinates plot ## End(Not run)
This function allows to simulate genetic data from a non-equilibrium continent-island model, where each island can have a different size and a different inbreeding coefficient.
This function simulates genetic data under the continent-islands model (IIM=TRUE)
or the finite island model (IIM=FALSE).
In the IIM, a continent of
infinite size sends migrants to islands of finite sizes at a rate
. Alleles can also mutate to a new state at a rate
. Under this model,
the expected
, can be calculated and compared to empirical
estimates.
sim.genot.t(size=50,nbal=4,nbloc=5,nbpop=3,N=1000, mig=0.001,mut=0.0001,f=0,t=100,IIM=TRUE)
sim.genot.t(size=50,nbal=4,nbloc=5,nbpop=3,N=1000, mig=0.001,mut=0.0001,f=0,t=100,IIM=TRUE)
size |
the number of sampled individuals per island |
nbal |
the number of alleles per locus (maximum of 99) |
nbloc |
the number of loci to simulate |
nbpop |
the number of islands to simulate |
N |
the effective population sizes of each island. If only one number, all islands are assumed to be of the same size |
mig |
the migration rate from the continent to the islands |
mut |
the mutation rate of the loci |
f |
the inbreeding coefficient for each island |
t |
the number of generation since the islands were created |
IIM |
whether to simulate a continent island Model (default) or a migrant pool island Model |
In this model, can be written as a function of population size
, migration rate
, mutation rate
and
.
The rational is as follows:
With probability , 2 alleles from 2 different individuals in
the current generation are sampled from the same individual of the previous
generation:
-Half the time, the same allele is drawn from the parent;
-The other half, two different alleles are drawn, but they are identical in
proportion .
-With probability , the 2 alleles are drawn from different
individuals in the previous generation, in which case they are identical in
proportion
.
This holds providing that neither alleles have mutated or migrated. This is
the case with probability .
If an allele is a mutant or a migrant, then its coancestry with another allele
is 0 in the infinite continent-islands model (it is not the case in the finite island model).
Note also that the mutation scheme assumed is the infinite allele (or site)
model. If the number of alleles is finite (as will be the case in what follows),
the corresponding mutation model is the K-allele model and the mutation rate
has to be adjusted to .
Lets substitute for
and
for
.
The expectation of ,
can be written as:
which reduces to if
.
Transition equations for in the migrant-pool island model (IIM=FALSE) are given in Rouseet (1996).
Currently, the migrant pool is made of equal contribution from each island, irrespective of their size.
A data frame with size*nbpop rows and nbloc+1 columns. Each row is an individual, the first column contains the island to which the individual belongs, the following nbloc columns contain the genotype for each locus.
Jerome Goudet jerome.goudet@unil.ch
Rousset, F. (1996) Equilibrium values of measures of population subdivision for stepwise mutation processes. Genetics 142:1357
psize<-c(100,1000,10000,100000,1000000) dat<-sim.genot.t(nbal=4,nbloc=20,nbpop=5,N=psize,mig=0.001,mut=0.0001,t=100) summary(wc(dat)) #Weir and cockerham overall estimators of FST & FIS betas(dat) # Population specific estimator of FST
psize<-c(100,1000,10000,100000,1000000) dat<-sim.genot.t(nbal=4,nbloc=20,nbpop=5,N=psize,mig=0.001,mut=0.0001,t=100) summary(wc(dat)) #Weir and cockerham overall estimators of FST & FIS betas(dat) # Population specific estimator of FST
Subsample a given number of individuals from a FSTAT data frame
subsampind(dat,sampsize = 10)
subsampind(dat,sampsize = 10)
dat |
A data frame with population of origin as first column, and genotypes in following columns. |
sampsize |
the number of individuals to sample in each population. |
A data frame with population of origin as first column, and genotypes in following columns. Each population is made of at most sampsize individuals
Jerome Goudet jerome.goudet@unil.ch
data(gtrunchier) subsampind(gtrunchier[,-1],6) # check the warning
data(gtrunchier) subsampind(gtrunchier[,-1],6) # check the warning
Estimates Tajima's D from dosage data
TajimaD.dosage(dos)
TajimaD.dosage(dos)
dos |
a ni X nl dosage matrix containing the number of derived/alternate alleles each individual carries at each SNP |
Tajima's D (eqn 38 of Tajima, 1989)
Tajima F. 1989 Statistical Method for Testing the Neutral Mutation Hypothesis by DNA Polymorphism. Genetics 123:585-595.
Tests the significance of the effect of test.lev on genetic differentiation
test.between(data, test.lev, rand.unit, nperm, ...)
test.between(data, test.lev, rand.unit, nperm, ...)
data |
a data frame containing the genotypes for the different loci |
test.lev |
A vector containing the units from which to construct the contingency tables |
rand.unit |
A vector containing the assignment of each observation to the units to be permutted |
nperm |
The number of permutations to carry out for the test |
... |
Mainly here to allow passing diploid=FALSE if necessary |
g.star |
A vector containing all the generated g-statistics, the last one beeing the observed |
p.val |
The p-value associated with the test |
Jerome Goudet jerome.goudet@unil.ch
data(gtrunchier) attach(gtrunchier) #test whether the locality level has a significant effect on genetic structuring test.between(gtrunchier[,-c(1,2)],test.lev=Locality,rand.unit=Patch)
data(gtrunchier) attach(gtrunchier) #test whether the locality level has a significant effect on genetic structuring test.between(gtrunchier[,-c(1,2)],test.lev=Locality,rand.unit=Patch)
Tests, using permutations of rand.unit within units defined by the vector within the significance of the contingency tables allele X (levels of test.lev)
test.between.within(data, within, test.lev, rand.unit, nperm, ...)
test.between.within(data, within, test.lev, rand.unit, nperm, ...)
data |
a data frame containing the genotypes for the different loci |
within |
A vector containing the units in which to keep the observations |
test.lev |
A vector containing the units from which to construct the contingency tables |
rand.unit |
A vector containing the assignment of each observation to the units to be permutted |
nperm |
The number of permutations to carry out for the test |
... |
Mainly here to allow passing diploid=FALSE if necessary |
g.star |
A vector containing all the generated g-statistics, the last one beeing the observed |
p.val |
The p-value associated with the test |
Jerome Goudet jerome.goudet@unil.ch
data(yangex) attach(yangex) #tests for the effect of spop on genetic structure test.between.within(data.frame(genot),within=pop,test=spop,rand=sspop)
data(yangex) attach(yangex) #tests for the effect of spop on genetic structure test.between.within(data.frame(genot),within=pop,test=spop,rand=sspop)
Tests the significance of the effect of level on genetic differentiation
test.g(data = data, level, nperm = 100,...)
test.g(data = data, level, nperm = 100,...)
data |
a data frame containing the genotypes for the different loci |
level |
A vector containing the assignment of each observation to its level |
nperm |
The number of permutations to carry out for the test |
... |
Mainly here to allow passing diploid=FALSE if necessary |
g.star |
A vector containing all the generated g-statistics, the last one beeing the observed |
p.val |
The p-value associated with the test |
Jerome Goudet jerome.goudet@unil.ch
data(gtrunchier) attach(gtrunchier) test.g(gtrunchier[,-c(1,2)],Locality)
data(gtrunchier) attach(gtrunchier) test.g(gtrunchier[,-c(1,2)],Locality)
Tests the significance of the effect of inner.level on genetic differentiation within blocks defined by outer.level
test.within(data, within, test.lev, nperm, ...)
test.within(data, within, test.lev, nperm, ...)
data |
a data frame containing the genotypes for the different loci |
within |
A vector containing the units in which to keep the observations |
test.lev |
A vector containing the units from which to construct the contingency tables |
nperm |
The number of permutations to carry out for the test |
... |
Mainly here to allow passing diploid=FALSE if necessary |
g.star |
A vector containing all the generated g-statistics, the last one beeing the observed |
p.val |
The p-value associated with the test |
Jerome Goudet jerome.goudet@unil.ch
data(gtrunchier) attach(gtrunchier) #tests whether the patch level has a significant effect on genetic structure test.within(gtrunchier[,-c(1,2)],within=Locality,test.lev=Patch)
data(gtrunchier) attach(gtrunchier) #tests whether the patch level has a significant effect on genetic structure test.within(gtrunchier[,-c(1,2)],within=Locality,test.lev=Patch)
from dosage dataEstimates , where
is the number of segregating sites
in a set of sequences and
.
theta.Watt.dosage(dos,L=NULL)
theta.Watt.dosage(dos,L=NULL)
dos |
a ni X nl dosage matrix containing the number of derived/alternate alleles each individual carries at each SNP |
L |
the length of the sequence |
if L=NULL
(default), returns , else return
Estimates variance components for each allele for a (fully) hierarchical random design defined by all but the last column of the data frame data, the last column containing the genetic data to analyse. Columns for the hierarchical design should be given from the outermost to the innermost before the individual (e.g. continent, region, population, patch,...)
varcomp(data,diploid=TRUE)
varcomp(data,diploid=TRUE)
data |
a data frame that contains the different factors from the outermost (e.g. region) to the innermost before the individual. the last column of the data frame 'data' contains the locus to analyse, which can be multiallelic. Missing data are allowed. |
diploid |
a boolean stating whether the data come from diploid (TRUE=default) or haploid (FALSE) organisms |
The format for genotypes is simply the code for the 2 alleles put one behind the other, without space in between. For instance if allele 1 at the locus has code 23 and allele 2 39, the genotype format is 2339.
df |
the degrees of freedom for each level |
k |
the k matrix, the coefficients associated with the variance components |
res |
the variance components for each allele |
overall |
the variance components summed over alleles |
F |
a matrix of hierarchical F-statistics type-coefficients
with the first line corresponding to |
Jerome Goudet, DEE, UNIL, CH-1015 Lausanne Switzerland
http://www.unil.ch/popgen/people/jerome.htm
Goudet J. (2005). Hierfstat, a package for R to compute and test variance components and F-statistics. Molecular Ecology Notes. 5:184-186
Weir, B.S. (1996) Genetic Data Analysis II. Sinauer Associates.
Yang, R.C. (1998). Estimating hierarchical F-statistics. Evolution 52(4):950-956
#load data set data(gtrunchier) attach(gtrunchier) # varcomp(data.frame(Locality,Patch,L21.V))
#load data set data(gtrunchier) attach(gtrunchier) # varcomp(data.frame(Locality,Patch,L21.V))
Return multilocus estimators of variance components and F-statistics
varcomp.glob(levels=levels,loci=loci,diploid=TRUE)
varcomp.glob(levels=levels,loci=loci,diploid=TRUE)
levels |
a data frame containing the different levels (factors) from the outermost (e.g. region) to the innermost before the individual |
loci |
a data frame containing the different loci |
diploid |
Specify whether the data are coming from diploid or haploid organisms (diploid is the default) |
loc |
The variance components for each locus |
overall |
The variance components summed over all loci |
F |
a matrix of hierarchical F-statistics type-coefficients
with the first line corresponding to |
Jerome Goudet DEE, UNIL, CH-1015 Lausanne Switzerland
Weir, B.S. (1996) Genetic Data Analysis II. Sinauer Associates.
Yang, R.C. (1998). Estimating hierarchical F-statistics. Evolution 52(4):950-956
Goudet J. (2005). Hierfstat, a package for R to compute and test variance components and F-statistics. Molecular Ecology Notes. 5:184-186
#load data set data(gtrunchier) attach(gtrunchier) varcomp.glob(data.frame(Locality,Patch),gtrunchier[,-c(1,2)])
#load data set data(gtrunchier) attach(gtrunchier) varcomp.glob(data.frame(Locality,Patch),gtrunchier[,-c(1,2)])
Fills a triangular matrix from the inputed vector
vec2mat(x,diag=FALSE,upper=FALSE)
vec2mat(x,diag=FALSE,upper=FALSE)
x |
a vector |
diag |
whether the vector contains the diagonal elements |
upper |
whether the vector contains the upper trinagular matrix elements |
a matrix
{ vec2mat(1:10) vec2mat(1:10,diag=TRUE) vec2mat(1:10,upper=TRUE) }
{ vec2mat(1:10) vec2mat(1:10,diag=TRUE) vec2mat(1:10,upper=TRUE) }
Computes Weir and Cockerham estimates of Fstatistics
wc(ndat,diploid=TRUE,pol=0.0) ## S3 method for class 'wc' print(x,...)
wc(ndat,diploid=TRUE,pol=0.0) ## S3 method for class 'wc' print(x,...)
ndat |
data frame with first column indicating population of origin and following representing loci |
diploid |
Whether data are diploid |
pol |
level of polymorphism reqesuted for inclusion. Note used for now |
x |
an object of class wc |
... |
further arguments to pass to print.wc |
sigma |
variance components of allele frequencies for each allele, in the order among populations, among individuals within populations and within individuals |
sigma.loc |
variance components per locus |
per.al |
FST and FIS per allele |
per.loc |
FST and FIS per locus |
FST |
FST overall loci |
FIS |
FIS overall loci |
Jerome Goudet jerome.goudet@unil.ch
data(gtrunchier) wc(gtrunchier[,-1])
data(gtrunchier) wc(gtrunchier[,-1])
write the genotypes in a format suitable for analysis with bayescan
write.bayescan(dat=dat,diploid=TRUE,fn="dat.bsc")
write.bayescan(dat=dat,diploid=TRUE,fn="dat.bsc")
dat |
a genotype data frame |
diploid |
whether the dataset is diploid or haploid |
fn |
file name for output |
a text file fn is written in the current directory
Jerome Goudet jerome.goudet@unil.ch
Foll M and OE Gaggiotti (2008) Genetics 180: 977-993
http://cmpg.unibe.ch/software/BayeScan/
Write a data frame to a text file in the fstat data format, see read.fstat
write.fstat(dat,fname="genotypes.dat")
write.fstat(dat,fname="genotypes.dat")
dat |
A data frame with first column containing the population identifier and remaining columns containing genotypes |
fname |
The name of teh text file to which the data frame should be written |
None
Jerome Goudet
Goudet J. (1995). FSTAT (Version 1.2): A computer program to calculate F- statistics. Journal of Heredity 86:485-486
## Not run: data(gtrunchier) write.fstat(gtrunchier[,-1],"galba.dat") ## End(Not run)
## Not run: data(gtrunchier) write.fstat(gtrunchier[,-1],"galba.dat") ## End(Not run)
write a ped and a map file suitable for analysis with PLINK
write.ped(dat, ilab = NULL, pop = NULL, fname = "dat",na.str="0",f.id=NULL,m.id=NULL,loc.pos=NULL,sex=NULL)
write.ped(dat, ilab = NULL, pop = NULL, fname = "dat",na.str="0",f.id=NULL,m.id=NULL,loc.pos=NULL,sex=NULL)
dat |
a hierfstat data frame. if pop=NULL, the first column should contain the population identifier, otherwise it contains genotypes at the first locus |
ilab |
individual labels |
pop |
population id |
fname |
filename for ped file |
na.str |
character string to use for missing values |
f.id |
father id. default to unknown |
m.id |
mother id. default to unknown |
loc.pos |
the loci position default to unknown |
sex |
the individual sex. default to unknown |
a map file containing the loci positions
a ped file containing genotypes etc...
Chang et al. (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets
Write a genotype data set to a file in the structure format
write.struct(dat,ilab=NULL,pop=NULL,MARKERNAMES=FALSE,MISSING=-9,fname="dat.str")
write.struct(dat,ilab=NULL,pop=NULL,MARKERNAMES=FALSE,MISSING=-9,fname="dat.str")
dat |
a genotype dataframe |
ilab |
an (optional) column with individual labels |
pop |
an (optional) column with population identifiers |
MARKERNAMES |
whether to add a row with marker names. If TRUE, takes the loci names from dat |
MISSING |
The code for missing alleles |
fname |
a string containing the file name (default to "dat.str") |
a text file in the structure format
Jerome Goudet jerome.goudet@unil.ch
Pritchard JK etal. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945-959
Reproduce the example data set used in Yang's paper appendix. The genotype (column genot) is invented
data(exhier)
data(exhier)
pop |
outermost level |
spop |
sub pop level |
sspop |
sub sub pop level |
genot |
dummy diploid genotype |
Yang, R.C. (1998). Estimating hierarchical F-statistics. Evolution 52(4):950-956
data(yangex) varcomp(yangex) #the k matrix should be the same as matrix (A2) in Yang's appendix, p. 956
data(yangex) varcomp(yangex) #the k matrix should be the same as matrix (A2) in Yang's appendix, p. 956