---
title: "chensus"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{chensus}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup, warning=FALSE, message=FALSE}
library(chensus)
library(dplyr)
```
# Introduction

The `chensus` package estimates population frequencies, means, proportions and confidence intervals from surveys conducted by the Swiss Federal Statistical Office (FSO):

- structural survey: *Strukturerhebung* (SE) / *relevé structurel* (RS),
- mobility and transport survey: *Mikrozensus Mobilität und Verkehr* (MZMV) / *Microrecensement mobilité et transports* (MRMT).

In this vignette, we demonstrate the main features of the package using the built-in `nhanes` dataset, which contains a subset of data from the [National Health and Nutrition Examination Survey](https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.htm) for the period 2015-2016 (see `?nhanes` and `vignette("nhanes")` for details). Its structure is similar to that of FSO survey data: it contains `strata` and `weights` columns as well as demographic variables such as `gender` and `household_size`.

# Structural Survey
## Total Estimates
Suppose we want to estimate the population in the `nhanes` dataset by gender and birth country. We can use the main analysis function `se_total()`:

```{r}
se_total(
  data = nhanes,
  weight = weights,
  strata = strata,
  gender, birth_country
)
```
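
For intuition, the point estimate of a population total in a weighted survey is the sum of the survey weights of the respondents in each group. The chunk below is a minimal sketch of that idea on a made-up data frame (`toy` and its values are hypothetical). It leaves out strata and confidence intervals entirely, and it is not a description of how `se_total()` is implemented.

```{r}
library(dplyr)

# Hypothetical toy survey: one row per respondent; `weights` is the number
# of people in the population that each respondent represents
toy <- tibble(
  gender  = c("Female", "Female", "Male", "Male", "Male"),
  weights = c(120, 80, 100, 150, 50)
)

# Per group: number of observations and sum of weights (the point estimate)
toy_totals <- toy |>
  group_by(gender) |>
  summarise(occ = n(), total = sum(weights))

toy_totals
```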
Column names can be passed programmatically with the help of `rlang`'s `!!sym()` and `!!!syms()` in the function call:

```{r}
w <- "weights"
s <- "strata"
v <- c("gender", "birth_country")
se_total(
  data = nhanes,
  strata = !!sym(s),
  weight = !!sym(w),
  !!!syms(v)
)
```
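
The same injection pattern can be tried out on plain `dplyr` verbs. The sketch below uses a made-up data frame and hypothetical names (`toy`, `w_col`, `v_cols`) to show that passing column names as strings through `sym()` and `syms()` gives the same result as typing the bare column names.

```{r}
library(dplyr)

# Hypothetical toy data and column names stored as strings
toy <- tibble(
  gender  = c("Female", "Female", "Male"),
  country = c("A", "B", "A"),
  weights = c(10, 20, 30)
)
w_col  <- "weights"
v_cols <- c("gender", "country")

# syms() turns each string into a symbol and !!! splices them in as
# separate arguments; sym() converts one string and !! injects it in place
by_string <- toy |>
  group_by(!!!rlang::syms(v_cols)) |>
  summarise(total = sum(!!rlang::sym(w_col)), .groups = "drop")

# Equivalent call written with bare column names
by_name <- toy |>
  group_by(gender, country) |>
  summarise(total = sum(weights), .groups = "drop")

all.equal(by_string, by_name)
```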
We can also estimate the population in parallel for multiple groups:

```{r}
se_total_map(
  nhanes,
  weight = weights,
  strata = strata,
  gender, birth_country
)
```
To estimate the population for every combination of grouping variables, including no grouping and partial grouping, we can use `se_total_ogd()`, a wrapper around the main `se_total()` function:

```{r}
se_total_ogd(nhanes, strata = strata, weight = weights, gender, birth_country)
```
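
"All combinations including no or partial grouping" means every subset of the grouping variables, so two variables give four groupings. The base R sketch below enumerates those subsets. It only illustrates the set of groupings; whether `se_total_ogd()` builds them this way internally is not something this vignette states, and the object names are hypothetical.

```{r}
vars <- c("gender", "birth_country")

# Subsets of size 1, 2, ..., plus the empty set for "no grouping"
groupings <- c(
  list(character(0)),
  unlist(
    lapply(seq_along(vars), function(k) combn(vars, k, simplify = FALSE)),
    recursive = FALSE
  )
)

# One label per grouping; "(none)" stands for the ungrouped total
grouping_labels <- vapply(
  groupings,
  function(g) if (length(g) == 0) "(none)" else paste(g, collapse = " + "),
  character(1)
)

grouping_labels
```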
## Proportion Estimates
We can also estimate the proportion of males and females by birth country in the `nhanes` survey:
```{r}
se_prop(
  data = nhanes,
  gender,
  birth_country,
  weight = weights,
  strata = strata
)
```
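
As a rough illustration of a weighted proportion, the sketch below divides each group's weighted total by the overall weighted total on a made-up data frame. The data and the choice of denominator (the whole toy population) are assumptions made for this example; the denominator that `se_prop()` actually uses with several grouping variables should be checked in its documentation.

```{r}
library(dplyr)

# Hypothetical toy survey
toy <- tibble(
  gender  = c("Female", "Female", "Male", "Male"),
  weights = c(30, 30, 25, 15)
)

# Weighted total per gender, then its share of the overall weighted total
toy_props <- toy |>
  count(gender, wt = weights, name = "total") |>
  mutate(prop = total / sum(total))

toy_props
```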
We can also display total and proportion estimates in a single table and apply the FSO publication format, which qualifies the reliability of estimates and hides confidential ones (fewer than five observations):

```{r}
se_total_prop(
  data = nhanes,
  gender,
  birth_country,
  weight = weights,
  strata = strata
) |>
  fso_flag_mask()
```
## Mean Estimates
To estimate the mean household size instead, we can use the function `se_mean()`:

```{r}
se_mean(
  data = nhanes,
  variable = household_size,
  strata = strata,
  weight = weights
)
```
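
The point estimate behind a weighted mean is the weighted sum of the values divided by the sum of the weights. The sketch below checks this by hand against base R's `weighted.mean()` on made-up numbers (`toy_size` and `toy_weights` are hypothetical). It does not cover strata or confidence intervals and is not a description of `se_mean()` internals.

```{r}
# Hypothetical household sizes and survey weights
toy_size    <- c(1, 2, 4, 3)
toy_weights <- c(100, 200, 50, 150)

# Weighted mean: sum(w * x) / sum(w)
manual   <- sum(toy_weights * toy_size) / sum(toy_weights)
built_in <- weighted.mean(toy_size, toy_weights)

c(manual = manual, built_in = built_in)
```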
The wrapper function `se_mean_ogd()` does the same for all possible combinations of the grouping variables `gender` and `interview_lang`:

```{r}
se_mean_ogd(
  nhanes,
  variable = household_size,
  strata = strata,
  weight = weights,
  gender, interview_lang
)
```
The FSO format can be applied here as well:

```{r}
nhanes |>
  se_mean_ogd(
    variable = household_size,
    gender, birth_country,
    strata = strata,
    weight = weights
  ) |>
  fso_flag_mask(lang = "en") # default is "de"; "fr" and "it" are also available
```
# Mobility Survey
To estimate the mean household income, we can use `mzmv_mean()`:

```{r}
mzmv_mean(
  data = nhanes,
  variable = annual_household_income,
  weight = weights
)
```
To group the estimate by gender, we can use `mzmv_mean_map()` (note that the `variable` argument must be quoted here):

```{r}
mzmv_mean_map(
  data = nhanes,
  variable = "annual_household_income",
  gender,
  weight = weights
)
```
# Flagging Estimate Reliability
`fso_flag_mask()` applies the FSO's reliability rules for survey estimates, based on the number of observations (`occ`). It flags estimates of low reliability and masks estimates when the sample size is too small (`occ <= 4`), as follows:

| Number of observations | Status of the estimate      |
|------------------------|-----------------------------|
| `occ <= 4`             | No estimate (confidential)  |
| `5 <= occ <= 49`       | Estimate of low reliability |
| `occ > 49`             | Reliable estimate           |

```{r}
results <- nhanes |>
  se_total(
    strata = strata,
    weight = weights,
    gender,
    birth_country,
    interview_lang,
    edu_level
  )

results |>
  filter(occ < 60) |>
  fso_flag_mask() |>
  select(gender, birth_country, interview_lang, occ, total, ci, obs_status)
```
```
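
The thresholds listed in this section can be reproduced on made-up numbers with base R's `cut()`. This is only a sketch of the rule: the data frame `toy_est`, the `status` column and its labels are hypothetical, and the actual output columns and wording of `fso_flag_mask()` may differ.

```{r}
# Hypothetical estimates with their number of observations
toy_est <- data.frame(
  occ   = c(3, 4, 5, 49, 50, 200),
  total = c(310, 420, 515, 5100, 5300, 21000)
)

# Right-closed intervals: (-Inf, 4], (4, 49], (49, Inf)
toy_est$status <- cut(
  toy_est$occ,
  breaks = c(-Inf, 4, 49, Inf),
  labels = c("confidential", "low reliability", "reliable")
)

# Mask the estimate where it is confidential
toy_est$total[toy_est$occ <= 4] <- NA

toy_est
```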