---
title: "chensus"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{chensus}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, warning=FALSE, message=FALSE}
library(chensus)
library(dplyr)
```

# Introduction

The `chensus` package estimates population frequencies, means, proportions and confidence intervals from surveys conducted by the Swiss Federal Statistical Office (FSO):

-   structural survey: *Strukturerhebung* (SE) / *relevé structurel* (RS),
-   mobility and transport survey: *Mikrozensus Mobilität und Verkehr* (MZMV) / *Microrecensement mobilité et transports* (MRMT).

In this vignette, we demonstrate the main features of the package using the built-in `nhanes` dataset, which contains a subset of data from the [National Health and Nutrition Examination Survey](https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.htm) for the period 2015-2016 (see `?nhanes` and `vignette("nhanes")` for details). Its structure is similar to FSO survey data in that it contains `strata` and `weights` columns and demographic features such as `gender` and `household_size`.

# Structural Survey

## Total Estimates

Suppose we want to estimate the population in the `nhanes` dataset by gender and birth country.
We can use the main analysis function `se_total()`:

```{r}
se_total(
  data = nhanes,
  weight = weights,
  strata = strata,
  gender, birth_country
)
```

Column names can be passed programmatically with the help of `rlang`'s `!!sym()` and `!!!syms()` in the function call:

```{r}
w <- "weights"
s <- "strata"
v <- c("gender", "birth_country")
se_total(
  data = nhanes,
  strata = !!sym(s),
  weight = !!sym(w),
  !!!syms(v)
)
```

We can also estimate the population in parallel for multiple groups:

```{r}
se_total_map(
  nhanes,
  weight = weights,
  strata = strata,
  gender, birth_country
)
```

If we wish to estimate the population for every combination of the grouping variables, including no grouping and partial grouping, we can use `se_total_ogd()`, a wrapper around the main `se_total()` function:

```{r}
se_total_ogd(nhanes, strata = strata, weight = weights, gender, birth_country)
```

## Proportion Estimates

We can also estimate the proportion of males and females by birth country in the `nhanes` survey:

```{r}
se_prop(
  data = nhanes,
  gender, birth_country,
  weight = weights,
  strata = strata
)
```

We can also display total and proportion estimates in a single table using the FSO format.
The FSO publication format qualifies the reliability of estimates and hides confidential ones (fewer than five observations):

```{r}
se_total_prop(
  data = nhanes,
  gender, birth_country,
  weight = weights,
  strata = strata
) |>
  fso_flag_mask()
```

## Mean Estimates

To estimate the mean household size instead, we can use the function `se_mean()`:

```{r}
se_mean(
  data = nhanes,
  variable = household_size,
  strata = strata,
  weight = weights
)
```

or the wrapper function `se_mean_ogd()` for all possible combinations of the grouping variables `gender` and `interview_lang`:

```{r}
se_mean_ogd(
  nhanes,
  variable = household_size,
  strata = strata,
  weight = weights,
  gender, interview_lang
)
```

and, with the FSO format applied, for the grouping variables `gender` and `birth_country`:

```{r}
nhanes |>
  se_mean_ogd(
    variable = household_size,
    gender, birth_country,
    strata = strata,
    weight = weights
  ) |>
  fso_flag_mask(lang = "en") # Default is "de"; other options: "fr", "it"
```

# Mobility Survey

To estimate the mean annual household income, we can use `mzmv_mean()`:

```{r}
mzmv_mean(
  data = nhanes,
  variable = annual_household_income,
  weight = weights
)
```

and, grouped by gender, `mzmv_mean_map()` (note that the `variable` argument must be quoted here):

```{r}
mzmv_mean_map(
  data = nhanes,
  variable = "annual_household_income",
  gender,
  weight = weights
)
```

# Flagging Estimate Reliability

`fso_flag_mask()` applies the FSO's reliability rules for survey estimates, based on the number of observations (`occ`). It flags estimates of low reliability and masks them when the sample size is too small (`occ <= 4`), as follows:

| Observations (`occ`) | Reliability                 |
|----------------------|-----------------------------|
| `occ <= 4`           | No estimate (confidential)  |
| `5 <= occ <= 49`     | Estimate of low reliability |
| `occ > 49`           | Reliable estimate           |

```{r}
results <- nhanes |>
  se_total(
    strata = strata,
    weight = weights,
    gender, birth_country, interview_lang, edu_level
  )

results |>
  filter(occ < 60) |>
  fso_flag_mask() |>
  select(gender, birth_country, interview_lang, occ, total, ci, obs_status)
```
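
The thresholds in the table above can be sketched in a few lines of base R. This is only an illustration of the rule, not how `fso_flag_mask()` is implemented: `classify_occ()` and its labels are hypothetical and are not part of `chensus`.

```{r}
# Hypothetical helper (not part of chensus): maps a number of observations
# to the three reliability classes listed in the table above.
classify_occ <- function(occ) {
  as.character(
    cut(
      occ,
      breaks = c(-Inf, 4, 49, Inf), # right-closed: (-Inf, 4], (4, 49], (49, Inf)
      labels = c("confidential", "low reliability", "reliable")
    )
  )
}

# Values on either side of each threshold
classify_occ(c(4, 5, 49, 50))
```

Because `cut()` uses right-closed intervals by default, an estimate based on exactly 4 observations is classed as confidential and one based on exactly 49 as low reliability, which matches the `<=` comparisons in the table.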