| Title: | Estimate Totals, Means, Proportions and Confidence Intervals of the Swiss Federal Statistical Office's Surveys |
|---|---|
| Description: | Estimates population totals, means, proportions and confidence intervals from the structural and mobility surveys of the Swiss Federal Statistical Office (FSO). Supply data from the "Strukturerhebung" / "relevé structurel" or from the "Mikrozensus für Mobilität und Verkehr" / "Microrecensement mobilité et transports" to obtain these estimates. |
| Authors: | Souad Guemghar [aut, cre], Amt für Daten und Statistik, Basel-Landschaft [cph, fnd] |
| Maintainer: | Souad Guemghar |
| License: | GPL (>= 3) |
| Version: | 2.1.0 |
| Built: | 2026-05-29 10:28:32 UTC |
| Source: | https://github.com/afds-bl/chensus |
fso_flag_mask applies the Swiss Federal Statistical Office (FSO) reliability rules for survey estimates,
based on the number of observations (occ). It flags low-reliability estimates and masks them when the sample size is too small
(occ <= 4).
    fso_flag_mask(data, lang = c("de", "fr", "it", "en"))
data |
A data frame or tibble. |
lang |
A character string for the language of the estimate reliability description, one of "de", "fr", "it", "en". Defaults to German if omitted. |
FSO estimate reliability criteria:
occ <= 4: No estimate (confidential).
4 < occ <= 49: Estimate of low reliability.
occ > 49: Reliable estimate.
A tibble containing the original data with masked estimates when occ <= 4 and one additional column:
character column classifying reliability of estimates.
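The thresholds above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper `classify_reliability()`, the English labels, the `reliability` column name, and the use of `NA` for masked values are all assumptions made for the example.

```r
# Hypothetical helper (not part of chensus): classify reliability from `occ`
# using the thresholds listed above.
classify_reliability <- function(occ) {
  as.character(cut(
    occ,
    breaks = c(-Inf, 4, 49, Inf),   # intervals (-Inf, 4], (4, 49], (49, Inf)
    labels = c("No estimate (confidential)",
               "Estimate of low reliability",
               "Reliable estimate"),
    right = TRUE
  ))
}

df <- data.frame(occ = c(3, 10, 60), mean_income = c(4000, 4200, 4500))
df$reliability <- classify_reliability(df$occ)

# ASSUMPTION: masking is shown here as replacing the estimate with NA
df$mean_income[df$occ <= 4] <- NA
df
```

Using `cut()` with right-closed intervals keeps the boundary cases (4 and 49) in the lower class, which matches the `<=` comparisons in the criteria.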
df <- data.frame(occ = c(3, 10, 60), mean_income = c(4000, 4200, 4500)) fso_flag_mask(df)df <- data.frame(occ = c(3, 10, 60), mean_income = c(4000, 4200, 4500)) fso_flag_mask(df)
mzmv_mean() estimates the means, proportions and confidence
intervals of FSO mobility surveys.
    mzmv_mean(data, ..., weight, cf = 1.14, alpha = 0.1)
data |
A data frame or tibble. |
... |
Names of variables to be estimated. Can be passed unquoted (e.g., |
weight |
Unquoted or quoted name of the sampling weights column. For programmatic use
with a string variable (e.g., |
cf |
Numeric correction factor of the confidence interval, supplied by FSO. Default is 1.14. |
alpha |
Numeric significance level for confidence intervals. Default is 0.1 (90% CI). |
Tibble (number of rows is length of variable) with the following columns:
id: estimated item
occ: number of survey responses
wmean: weighted mean estimate
ci: confidence interval estimate
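To make the roles of `alpha` and `cf` concrete, here is a rough base R sketch of a weighted mean with a scaled confidence interval half-width. The toy vectors `x` and `w` are invented, and the variance and standard error formulas are assumptions chosen for illustration; the formula actually prescribed by the FSO may differ.

```r
# Toy data (made up for illustration)
x <- c(10, 20, 30, 40)   # observed values
w <- c(1, 2, 2, 1)       # sampling weights

alpha <- 0.1             # significance level, i.e. a 90% confidence interval
cf <- 1.14               # correction factor

z <- qnorm(1 - alpha / 2)        # two-sided normal quantile
wmean <- weighted.mean(x, w)     # weighted mean estimate

# ASSUMPTION: simple weighted variance and standard error, for illustration only
wvar <- sum(w * (x - wmean)^2) / sum(w)
se <- sqrt(wvar / length(x))
ci <- cf * z * se                # half-width, inflated by the correction factor

c(occ = length(x), wmean = wmean, ci = ci)
```

A smaller `alpha` gives a larger quantile `z` and therefore a wider interval; `cf` scales the half-width by a constant factor.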
See mzmv_mean_map for estimates on a set of conditions.
    # Estimate two means
    mzmv_mean(
      data = nhanes,
      annual_household_income, annual_family_income,
      weight = weights
    )
    # Programmatic use with strings
    v <- c("annual_household_income", "annual_family_income")
    mzmv_mean(nhanes, weight = "weights", !!!rlang::syms(v))
mzmv_mean_map() estimates weighted means and confidence intervals for a set of features of the mobility survey, optionally grouped by one or more variables.
    mzmv_mean_map(data, variable, ..., weight, cf = 1.14, alpha = 0.1)
data |
A data frame or tibble. |
variable |
Character vector of variable names to be estimated. Must be quoted (e.g., |
... |
Grouping variables. Can be passed unquoted (e.g., |
weight |
Unquoted or quoted name of the sampling weights column (must exist in |
cf |
Numeric correction factor for the confidence interval. Default is 1.14. |
alpha |
Numeric significance level for confidence intervals. Default is 0.1 (90% CI). |
A tibble with columns:
Name of the estimated variable.
Name of the grouping variable.
Value of the grouping variable.
Number of cases or observations.
Weighted mean.
Confidence interval.
    # Multiple quoted variables
    mzmv_mean_map(
      nhanes,
      variable = c("annual_family_income", "annual_household_income"),
      gender, birth_country,
      weight = weights
    )
    # No grouping variables
    mzmv_mean_map(
      nhanes,
      variable = "annual_family_income",
      weight = weights
    )
    # Programmatic use
    wt <- "weights"
    mzmv_mean_map(
      nhanes,
      variable = "annual_family_income",
      gender, birth_country,
      weight = !!rlang::sym(wt)
    )
Demographic survey data from NHANES 2015 to 2016, with data on 9971 participants, including sampling weights.
    nhanes
A data frame with 9971 rows and 13 variables:
SDMVPSU - Masked variance pseudo-PSU
WTINT2YR - Full sample 2 year interview weight
SDMVSTRA - Masked variance pseudo-stratum
RIAGENDR - Gender
RIDAGEYR - Age in years at screening
DMDBORN4 - Country of birth
DMDMARTL - Marital status
SIALANG - Language of interview
DMDHREDU - Household reference person's education level
DMDHHSIZ - Total number of people in the Household
DMDFMSIZ - Total number of people in the Family
INDHHIN2 - Annual household income
INDFMIN2 - Annual family income
The data sets provided in this package are derived from the NHANES database and have been adapted for educational purposes. As such, they are NOT suitable for use as a research database. For research purposes, you should download original data files from the NHANES website and follow the analysis instructions given there.
    library(dplyr)
    glimpse(nhanes)
    nhanes |> dplyr::count(edu_level)
se_mean() estimates the means of numeric variables along with variance
and confidence intervals for FSO's structural survey.
    se_mean(data, variable, ..., strata, weight, alpha = 0.05)
data |
A data frame or tibble. |
variable |
Unquoted or quoted name of the numeric variable whose mean is to be estimated.
Programmatic usage (e.g., using |
... |
Optional grouping variables. Can be passed unquoted (e.g., |
strata |
Unquoted or quoted name of the strata column. Defaults to |
weight |
Unquoted or quoted name of the sampling weights column. For programmatic use
with a string variable (e.g., |
alpha |
Numeric significance level for confidence intervals. Default is 0.05 (95% CI). |
A tibble with columns:
Sample size (number of observations) per group.
Estimated mean of the specified numeric variable, named dynamically.
Estimated variance of the mean (vhat) and its standard deviation (stand_dev, square root of the variance).
Confidence interval: half-width (ci), lower (ci_l) and upper (ci_u) bounds.
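The relationship between the returned columns can be sketched in a few lines of base R. The estimate and variance below are invented numbers, and the use of a normal quantile for the half-width is an assumption; only the identity stand_dev = sqrt(vhat) and the meaning of ci, ci_l and ci_u come from the description above.

```r
estimate <- 42     # made-up mean estimate
vhat <- 0.25       # made-up estimated variance of the mean
alpha <- 0.05      # 95% confidence interval

stand_dev <- sqrt(vhat)       # standard deviation of the estimate
z <- qnorm(1 - alpha / 2)     # ASSUMPTION: normal quantile for the interval
ci <- z * stand_dev           # half-width
ci_l <- estimate - ci         # lower bound
ci_u <- estimate + ci         # upper bound

c(stand_dev = stand_dev, ci = ci, ci_l = ci_l, ci_u = ci_u)
```

Reporting the half-width alongside both bounds lets readers either quote the estimate as "value plus or minus ci" or use the interval directly.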
    # Direct column references (unquoted)
    se_mean(
      data = nhanes,
      variable = age,
      strata = strata,
      weight = weights,
      gender, birth_country
    )
    # Quoted column names
    se_mean(
      data = nhanes,
      variable = "age",
      strata = "strata",
      weight = "weights",
      gender, birth_country
    )
    # Programmatic use with strings
    v <- "age"
    wt <- "weights"
    vars <- c("gender", "birth_country")
    se_mean(
      data = nhanes,
      variable = !!rlang::sym(v),
      strata = strata,
      weight = !!rlang::sym(wt),
      !!!rlang::syms(vars)
    )
se_mean_ogd estimates survey means of a continuous variable for every combination of the supplied grouping variables,
using se_mean internally and returning results in a format suitable for Open Government Data (OGD).
The output includes means for each combination of grouping variables, as well as for the overall population.
    se_mean_ogd(data, variable, ..., strata, weight, alpha = 0.05)
data |
A data frame or tibble. |
variable |
Variable to estimate the mean for (unquoted or programmatic). |
... |
Grouping variables (unquoted or programmatic). |
strata |
Stratification variable (unquoted or programmatic). Defaults to "zone" if omitted. |
weight |
Sampling weights variable (unquoted or programmatic). |
alpha |
Significance level for confidence intervals. Default is 0.05 (95% CI). |
A tibble with survey mean estimates for all combinations of grouping variables. Grouping variables are converted to factors with "Total" representing the overall group.
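The "all combinations plus Total" layout described above can be illustrated with a small base R sketch. It does not call the package: the toy data frame, the helper `estimate_subset()` and the plain weighted mean are assumptions used only to show how inactive grouping variables collapse to "Total".

```r
# Toy data (made up for illustration)
df <- data.frame(
  gender = c("f", "f", "m", "m"),
  region = c("north", "south", "north", "south"),
  size = c(2, 4, 3, 5),
  w = c(1, 1, 2, 2)
)
groups <- c("gender", "region")

# Every subset of the grouping variables, including the empty set (overall total)
subsets <- c(
  list(character(0)),
  unlist(lapply(seq_along(groups),
                function(k) combn(groups, k, simplify = FALSE)),
         recursive = FALSE)
)

# Hypothetical helper: weighted mean for one subset of active grouping variables
estimate_subset <- function(active) {
  d <- df
  # Collapse the inactive grouping variables to "Total"
  for (g in setdiff(groups, active)) d[[g]] <- "Total"
  parts <- split(d, interaction(d[groups], drop = TRUE, sep = "|"))
  do.call(rbind, lapply(parts, function(p) {
    data.frame(p[1, groups, drop = FALSE],
               occ = nrow(p),
               mean_size = weighted.mean(p$size, p$w))
  }))
}

result <- do.call(rbind, lapply(subsets, estimate_subset))
rownames(result) <- NULL
result
```

With two grouping variables of two levels each, the result has nine rows: one overall row, two rows per single variable, and four rows for the full cross-classification.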
se_total_ogd, se_prop_ogd, se_ogd_wrapper, se_mean
    # Unquoted variables
    se_mean_ogd(nhanes, variable = household_size, strata = strata, weight = weights, gender)
    # Programmatic use
    var <- "household_size"
    wt <- "weights"
    vars <- "gender"
    se_mean_ogd(
      nhanes,
      variable = !!rlang::sym(var),
      strata = strata,
      weight = !!rlang::sym(wt),
      !!!rlang::syms(vars)
    )
OGD Wrapper for Structural Survey Estimation Functions
    se_ogd_wrapper(
      data,
      core_fun,
      ...,
      strata,
      weight,
      alpha = 0.05,
      variable = NULL,
      show_internal = FALSE
    )
data |
A data frame or tibble. |
core_fun |
The core estimation function to use, one of |
... |
Grouping variables (unquoted or programmatic). |
strata |
Stratification variable (unquoted or programmatic). |
weight |
Sampling weights variable (unquoted or programmatic). |
alpha |
Significance level for confidence intervals. |
variable |
(Optional) Variable to estimate mean for (only needed for se_mean). |
show_internal |
Show internal estimates of variance, standard deviation (and percent confidence interval for |
A tibble with estimates for all combinations of grouping variables.
se_prop() estimates the proportions and confidence intervals for each level of one or multiple categorical variables
of FSO's structural survey, by first converting columns into dummy variables and then estimating proportions and confidence intervals.
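The dummy-variable idea mentioned above can be sketched in base R: each level of a categorical column becomes a 0/1 indicator, and the weighted proportion is the weighted mean of that indicator. The toy data and column names are invented, and the sketch leaves out stratification and variance estimation.

```r
# Toy data (made up for illustration)
df <- data.frame(
  lang = c("de", "fr", "de", "it", "de"),
  w = c(10, 20, 30, 20, 20)
)

levels_lang <- sort(unique(df$lang))

# One 0/1 indicator column per level of the categorical variable
dummies <- sapply(levels_lang, function(l) as.numeric(df$lang == l))

# Weighted proportion = weighted mean of each indicator column
prop <- colSums(dummies * df$w) / sum(df$w)
occ <- colSums(dummies)   # unweighted number of observations per level

data.frame(lang = levels_lang, occ = occ, prop = prop, row.names = NULL)
```

Because every observation falls in exactly one level, the proportions of a single variable sum to one.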
    se_prop(data, ..., strata, weight, alpha = 0.05)
data |
A data frame or tibble. |
... |
Categorical variables. Can be passed unquoted (e.g., |
strata |
Unquoted or quoted name of the strata column. Defaults to |
weight |
Unquoted or quoted name of the sampling weights column. For programmatic use
with a string variable (e.g., |
alpha |
Numeric significance level for confidence intervals. Default is 0.05 (95% CI). |
A tibble with proportion estimates for all grouping column combinations, including:
Sample size (number of observations) per group.
Estimated proportion of the specified categorical variable in the corresponding group.
Estimated variance of the mean (vhat) and its standard deviation (stand_dev, square root of the variance).
Confidence interval: half-width (ci), lower (ci_l) and upper (ci_u) bounds.
    # Direct column references (unquoted)
    se_prop(
      data = nhanes,
      interview_lang, birth_country,
      strata = strata,
      weight = weights
    )
    # Quoted column names
    se_prop(
      data = nhanes,
      "interview_lang", gender, "birth_country",
      strata = "strata",
      weight = weights
    )
    # Programmatic use with strings
    wt <- "weights"
    vars <- c("interview_lang", "gender", "birth_country")
    se_prop(
      data = nhanes,
      strata = strata,
      weight = !!rlang::sym(wt),
      !!!rlang::syms(vars)
    )
se_prop_ogd estimates survey proportions for every combination of the supplied grouping variables,
using se_prop internally and returning results in a format suitable for Open Government Data (OGD).
The output includes proportions for each combination of grouping variables, as well as for the overall population.
    se_prop_ogd(data, ..., strata, weight, alpha = 0.05)
data |
A data frame or tibble. |
... |
Grouping variables (unquoted or programmatic). |
strata |
Stratification variable (unquoted or programmatic). Defaults to "zone" if omitted. |
weight |
Sampling weights variable (unquoted or programmatic). |
alpha |
Significance level for confidence intervals. Default is 0.05 (95% CI). |
A tibble with survey proportion estimates for all combinations of grouping variables. Grouping variables are converted to factors with "Total" representing the overall group.
se_total_ogd, se_mean_ogd, se_ogd_wrapper, se_prop
    # Unquoted variables
    se_prop_ogd(nhanes, strata = strata, weight = weights, gender, birth_country)
    # Programmatic use
    wt <- "weights"
    vars <- c("gender", "birth_country")
    se_prop_ogd(nhanes, strata = strata, weight = !!rlang::sym(wt), !!!rlang::syms(vars))
se_total() estimates the totals and confidence intervals of FSO structural surveys.
    se_total(data, ..., strata, weight, alpha = 0.05)
data |
A data frame or tibble. |
... |
Optional grouping variables. Can be passed unquoted (e.g., |
strata |
Unquoted or quoted name of the strata column. Defaults to |
weight |
Unquoted or quoted name of the sampling weights column. For programmatic use
with a string variable (e.g., |
alpha |
Numeric significance level for confidence intervals. Default is 0.05 (95% CI). |
The condition argument has been deprecated and is no longer supported.
Please use ... to pass grouping variables either unquoted or programmatically using rlang:
* Interactive use:
se_total(data, weight = my_weight, group1, group2)
* Programmatic use:
weight_var <- "my_weight"
group_vars <- c("group1", "group2")
se_total(data, weight = !!rlang::sym(weight_var), !!!rlang::syms(group_vars))
A tibble with total estimates for all grouping column combinations, including:
Value of the grouping variables passed in ....
number of observations in survey sample.
population estimate.
Estimated variance of the total (vhat) and its standard deviation (stand_dev, square root of the variance).
Confidence interval: half-width (ci), percentage of the total (ci_per), lower (ci_l) and upper (ci_u) bounds.
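As a rough illustration of the columns above, the base R sketch below computes a sample count and a population estimate per group and expresses a half-width as a percentage of the total. The data and half-widths are invented. Treating the population estimate as the sum of the sampling weights and ci_per as 100 * ci / total are assumptions for the example, not a statement of the package's internals.

```r
# Toy data (made up for illustration)
df <- data.frame(
  gender = c("f", "m", "f", "m", "f"),
  w = c(100, 150, 120, 130, 80)
)

occ <- tapply(df$w, df$gender, length)   # observations in the sample per group
total <- tapply(df$w, df$gender, sum)    # ASSUMPTION: total = sum of weights

ci <- c(f = 30, m = 28)                  # made-up half-widths
ci_per <- 100 * ci / total               # ASSUMPTION: half-width as % of the total

data.frame(gender = names(total),
           occ = as.vector(occ),
           total = as.vector(total),
           ci = as.vector(ci),
           ci_per = as.vector(ci_per))
```

Expressing the half-width relative to the total makes the precision of groups of very different sizes directly comparable.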
se_total_map(), se_total_ogd().
    # One grouping variable
    se_total(
      data = nhanes,
      strata = strata,
      weight = weights,
      gender
    )
    # Multiple grouping variables
    se_total(
      data = nhanes,
      strata = strata,
      weight = weights,
      gender, marital_status, birth_country
    )
    # Programmatic use and quoted variables
    v <- c("gender", "marital_status", "birth_country")
    se_total(
      nhanes,
      weight = "weights",
      strata = "strata",
      !!!rlang::syms(v)
    )
se_total_map() applies se_total() to a data frame for each of several grouping variables, returning a combined tibble of results.
    se_total_map(data, ..., strata, weight, alpha = 0.05)
data |
A data frame or tibble. |
... |
One or more grouping variables. Can be passed unquoted (e.g., |
strata |
Unquoted or quoted name of the strata column. Defaults to |
weight |
Unquoted or quoted name of the sampling weights column. For programmatic use
with a string variable (e.g., |
alpha |
Numeric significance level for confidence intervals. Default is 0.05 (95% CI). |
This wrapper function computes totals and confidence intervals for each grouping variable of the structural survey data separately, in a single call.
This function iterates over each grouping variable supplied via ..., applies se_total() to the data grouped by that variable,
and combines the results into a single tibble. The grouping variable is renamed to value and its name is stored in the variable column for clarity.
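The iterate-and-combine pattern described above can be sketched in base R. The helper `total_by()` is a hypothetical stand-in for se_total() that only sums weights per level; it ignores strata and variance, and the toy data are invented.

```r
# Toy data (made up for illustration)
df <- data.frame(
  gender = c("f", "m", "f", "m"),
  region = c("north", "north", "south", "south"),
  w = c(100, 150, 120, 130)
)

# Hypothetical stand-in for se_total(): weighted total per level of one variable
total_by <- function(data, var, weight) {
  totals <- tapply(data[[weight]], data[[var]], sum)
  counts <- tapply(data[[weight]], data[[var]], length)
  data.frame(
    variable = var,            # name of the grouping variable
    value = names(totals),     # level of the grouping variable
    occ = as.vector(counts),
    total = as.vector(totals)
  )
}

# Apply the helper to each grouping variable in turn and stack the results
result <- do.call(
  rbind,
  lapply(c("gender", "region"), function(v) total_by(df, v, "w"))
)
result
```

Storing the variable name in one column and its level in another gives every per-variable result the same shape, so the pieces can be stacked into one long table.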
A tibble with results for each grouping variable, including:
The name of the grouping variable.
The value of the grouping variable.
Sample size for the group.
Estimated total for the group.
Estimated variance of the total (vhat) and its standard deviation (stand_dev, square root of the variance).
Confidence interval: half-width (ci), percentage of the total (ci_per), lower (ci_l) and upper (ci_u) bounds.
    # Unquoted variables
    se_total_map(
      nhanes,
      weight = weights,
      strata = strata,
      gender, marital_status, birth_country
    )
    # Programmatic use and quoted variables
    v <- c("gender", "marital_status", "birth_country")
    se_total_map(
      nhanes,
      weight = "weights",
      strata = "strata",
      !!!rlang::syms(v)
    )
se_total_ogd estimates survey totals for every combination of the supplied grouping variables,
using se_total internally and returning results in a format suitable for Open Government Data (OGD).
The output includes totals for each combination of grouping variables, as well as for the overall population.
    se_total_ogd(data, ..., strata, weight, alpha = 0.05)
data |
A data frame or tibble. |
... |
Grouping variables (unquoted or programmatic). |
strata |
Stratification variable (unquoted or programmatic). Defaults to "zone" if omitted. |
weight |
Sampling weights variable (unquoted or programmatic). |
alpha |
Significance level for confidence intervals. Default is 0.05 (95% CI). |
A tibble with survey estimates for all combinations of grouping variables. Grouping variables are converted to factors with "Total" representing the overall group.
se_prop_ogd, se_mean_ogd, se_ogd_wrapper, se_total
    # Unquoted variables
    se_total_ogd(nhanes, strata = strata, weight = weights, gender, birth_country)
    # Programmatic use
    wt <- "weights"
    vars <- c("gender", "birth_country")
    se_total_ogd(nhanes, strata = strata, weight = !!rlang::sym(wt), !!!rlang::syms(vars))
se_total_prop is a wrapper function for se_total() and se_prop() which estimates totals and proportions for categorical variables.
    se_total_prop(data, ..., strata, weight, alpha = 0.05)
data |
A data frame or tibble. |
... |
Optional grouping variables (unquoted). |
strata |
The name of the strata variable (default is "zone"). |
weight |
The name of the weight variable. |
alpha |
Significance level for confidence intervals. Default is 0.05. |
A tibble with joined total and proportion estimates.
    se_total_prop(
      data = nhanes,
      interview_lang, gender, birth_country,
      strata = strata,
      weight = weights
    )
se_total_prop_ogd estimates totals and proportions for each combination
of grouping variables using se_total_prop, taking stratification and weighting into account,
and returns the results in a format compatible with Open Government Data (OGD) standards.
    se_total_prop_ogd(data, ..., strata, weight, alpha = 0.05)
data |
A data frame or tibble containing the survey data. |
... |
Grouping variables (unquoted or programmatic) to compute combinations of totals and proportions. |
strata |
Stratification variable (unquoted or programmatic). Defaults to |
weight |
Sampling weight variable (unquoted or programmatic). |
alpha |
Significance level for confidence intervals. Default is 0.05 (for 95% CI). |
A tibble with totals and proportions for all combinations of the specified grouping variables. The output includes confidence intervals and handles missing values by representing them as "Total".
se_total_prop, se_ogd_wrapper, se_total_ogd, se_prop_ogd
    # With unquoted variables
    se_total_prop_ogd(nhanes, gender, birth_country, strata = strata, weight = weights)
    # Programmatic usage
    vars <- c("gender", "birth_country")
    wt <- "weights"
    se_total_prop_ogd(nhanes, !!!rlang::syms(vars), strata = strata, weight = !!rlang::sym(wt))