The goal of {epikit} is to provide miscellaneous functions for applied epidemiologists. This is a product of the R4EPIs project; learn more at https://r4epis.netlify.com.
You can install {epikit} from CRAN (see details for the latest version):
If there is a bugfix or feature that is not yet on CRAN, you can install it via the {drat} package:
You can also install the in-development version from GitHub using the {remotes} package (but there’s no guarantee that it will be stable):
The {epikit} was primarily designed to house convenience functions for applied epidemiologists to use in tidying their reports. The functions in {epikit} come in a few categories:
A couple of functions are dedicated to constructing age categories and partitioning them into separate chunks.
age_categories()
takes in a vector of numbers and
returns formatted age categories.group_age_categories()
will take a data frame with
different age categories in columns (e.g. years, months, weeks) and
combine them into a single column, selecting the column with the lowest
priority.library("knitr")
library("magrittr")
set.seed(1)
x <- sample(0:100, 20, replace = TRUE)
y <- ifelse(x < 2, sample(48, 20, replace = TRUE), NA)
df <- data.frame(
age_years = age_categories(x, upper = 80),
age_months = age_categories(y, upper = 16, by = 6)
)
df %>%
group_age_categories(years = age_years, months = age_months)
#> age_years age_months age_category
#> 1 60-69 <NA> 60-69 years
#> 2 30-39 <NA> 30-39 years
#> 3 0-9 16+ 16+ months
#> 4 30-39 <NA> 30-39 years
#> 5 80+ <NA> 80+ years
#> 6 40-49 <NA> 40-49 years
#> 7 10-19 <NA> 10-19 years
#> 8 80+ <NA> 80+ years
#> 9 50-59 <NA> 50-59 years
#> 10 50-59 <NA> 50-59 years
#> 11 80+ <NA> 80+ years
#> 12 80+ <NA> 80+ years
#> 13 20-29 <NA> 20-29 years
#> 14 50-59 <NA> 50-59 years
#> 15 70-79 <NA> 70-79 years
#> 16 0-9 <NA> 0-9 years
#> 17 70-79 <NA> 70-79 years
#> 18 70-79 <NA> 70-79 years
#> 19 80+ <NA> 80+ years
#> 20 30-39 <NA> 30-39 years
The inline functions make it easier to print estimates with confidence intervals in reports with the correct number of digits.
fmt_ci()
formats confidence intervals from three
numbers. (e.g. fmt_ci(50, 10, 80)
produces 50.00% (CI
10.00-80.00)fmt_pci()
formats confidence intervals from three
fractions, multiplying by 100 beforehand.The confidence interval manipulation functions take in a data frame and combine their confidence intervals into a single character string much like the inline functions do. There are two flavors:
merge_ci_df()
and merge_pci_df()
will
merge just the values of the confidence interval and leave the estimate
alone. Note: this WILL remove the lower and upper columns.unite_ci()
merges both the confidence interval and the
estimate into a single character column. This generally has more options
than merge_ci()
This is useful for reporting models:
fit <- lm(100/mpg ~ disp + hp + wt + am, data = mtcars)
df <- data.frame(v = names(coef(fit)), e = coef(fit), confint(fit), row.names = NULL)
names(df) <- c("variable", "estimate", "lower", "upper")
print(df)
#> variable estimate lower upper
#> 1 (Intercept) 0.740647656 -0.774822875 2.256118188
#> 2 disp 0.002702925 -0.002867999 0.008273849
#> 3 hp 0.005274547 -0.001400580 0.011949674
#> 4 wt 1.001303136 0.380088737 1.622517536
#> 5 am 0.155814790 -0.614677730 0.926307310
# unite CI has more options
unite_ci(df, "slope (CI)", estimate, lower, upper, m100 = FALSE, percent = FALSE)
#> variable slope (CI)
#> 1 (Intercept) 0.74 (-0.77-2.26)
#> 2 disp 0.00 (-0.00-0.01)
#> 3 hp 0.01 (-0.00-0.01)
#> 4 wt 1.00 (0.38-1.62)
#> 5 am 0.16 (-0.61-0.93)
# merge_ci just needs to know where the estimate is
merge_ci_df(df, e = 2)
#> variable estimate ci
#> 1 (Intercept) 0.740647656 (-0.77-2.26)
#> 2 disp 0.002702925 (-0.00-0.01)
#> 3 hp 0.005274547 (-0.00-0.01)
#> 4 wt 1.001303136 (0.38-1.62)
#> 5 am 0.155814790 (-0.61-0.93)
If you need a quick function to determine the number of breaks you
need for a grouping or color scale, you can use
find_breaks()
. This will always start from 1, so that you
can include zero in your scale when you need to.
To quickly pull together population counts for use in surveys or
demographic pyramids the gen_population()
function can
help. If you only know the proportions in each group the function will
convert this to counts for you - whereas if you have counts, you can
type those in directly. The default proportions are based on Doctors
Without Borders general emergency intervention standard values.
# get population counts based on proportion, stratified
gen_population(groups = c("0-4","5-14","15-29","30-44","45+"),
strata = c("Male", "Female"),
proportions = c(0.079, 0.134, 0.139, 0.082, 0.067))
#> Warning in gen_population(groups = c("0-4", "5-14", "15-29", "30-44", "45+"), : Given proportions (or counts) is not the same as
#> groups multiplied by strata length, they will be repeated to match
#> # A tibble: 10 × 4
#> groups strata proportions n
#> <fct> <fct> <dbl> <dbl>
#> 1 0-4 Male 0.079 79
#> 2 5-14 Male 0.134 134
#> 3 15-29 Male 0.139 139
#> 4 30-44 Male 0.082 82
#> 5 45+ Male 0.067 67
#> 6 0-4 Female 0.079 79
#> 7 5-14 Female 0.134 134
#> 8 15-29 Female 0.139 139
#> 9 30-44 Female 0.082 82
#> 10 45+ Female 0.067 67
Type in counts directly to get the groups in a data frame.
# get population counts based on counts, stratified - type out counts
# for each group and strata
gen_population(groups = c("0-4","5-14","15-29","30-44","45+"),
strata = c("Male", "Female"),
counts = c(20, 10, 30, 40, 0, 0, 40, 30, 20, 20))
#> # A tibble: 10 × 4
#> groups strata proportions n
#> <fct> <fct> <dbl> <dbl>
#> 1 0-4 Male 0.0952 20
#> 2 5-14 Male 0.0476 10
#> 3 15-29 Male 0.143 30
#> 4 30-44 Male 0.190 40
#> 5 45+ Male 0 0
#> 6 0-4 Female 0 0
#> 7 5-14 Female 0.190 40
#> 8 15-29 Female 0.143 30
#> 9 30-44 Female 0.0952 20
#> 10 45+ Female 0.0952 20
These functions all modify the appearance of a table displayed in a
report and work best with the knitr::kable()
function.
rename_redundant()
renames redundant columns with a
single name. (e.g. hopitalized_percent
and
confirmed_percent
can both be renamed to
%
)augment_redundant()
is similar to
rename_redundant()
, but it modifies the redundant column
names (e.g. hospitalized_n
and confirmed_n
can
become hospitalized (n)
and
confirmed (n)
)merge_ci()
combines estimate, lower bound, and upper
bound columns into a single column.
df <- data.frame(
`a n` = 1:6,
`a prop` = round((1:6) / 6, 2),
`a deff` = round(pi, 2),
`b n` = 6:1,
`b prop` = round((6:1) / 6, 2),
`b deff` = round(pi * 2, 2),
check.names = FALSE
)
knitr::kable(df)
a n | a prop | a deff | b n | b prop | b deff |
---|---|---|---|---|---|
1 | 0.17 | 3.14 | 6 | 1.00 | 6.28 |
2 | 0.33 | 3.14 | 5 | 0.83 | 6.28 |
3 | 0.50 | 3.14 | 4 | 0.67 | 6.28 |
4 | 0.67 | 3.14 | 3 | 0.50 | 6.28 |
5 | 0.83 | 3.14 | 2 | 0.33 | 6.28 |
6 | 1.00 | 3.14 | 1 | 0.17 | 6.28 |
df %>%
rename_redundant("%" = "prop", "Design Effect" = "deff") %>%
augment_redundant(" (n)" = " n$") %>%
knitr::kable()
a (n) | % | Design Effect | b (n) | % | Design Effect |
---|---|---|---|---|---|
1 | 0.17 | 3.14 | 6 | 1.00 | 6.28 |
2 | 0.33 | 3.14 | 5 | 0.83 | 6.28 |
3 | 0.50 | 3.14 | 4 | 0.67 | 6.28 |
4 | 0.67 | 3.14 | 3 | 0.50 | 6.28 |
5 | 0.83 | 3.14 | 2 | 0.33 | 6.28 |
6 | 1.00 | 3.14 | 1 | 0.17 | 6.28 |