To run projections with {propop}, you need a starting population and projection parameters. If you already have this information, you need to ensure that the input files have the required structure. If you don’t have the relevant data, you can download them from the Federal Statistical Office (FSO). This vignette explains how to get the data and how to prepare the relevant information to run population projections with {propop}.
If you don’t have the information and data required to run propop::propop() (or propop::project_raw()), you can download most of the data from STAT-TAB. More specifically, information from the following tables is needed:
Table ID | Parameters expressed as… | Variables required for projection |
---|---|---|
px-x-0104020000_101 | number of people (reference scenario) | |
px-x-0104020000_102 | number of people (high growth scenario) | |
px-x-0104020000_103 | number of people (low growth scenario) | |
px-x-0104020000_109 | rates / probabilities (five scenarios) | |
px-x-0104020000_106 | share of newborns with Swiss nationality born to non-Swiss mothers | |
Constant parameters not directly available from STAT-TAB must be provided as arguments | | |
The propop package provides two convenience functions to download data from the FSO.
To get the starting population for a spatial unit, you must use the spelling defined in the corresponding FSO table. The entries in the FSO tables may contain special characters, and the spelling may vary between FSO tables. BFS::bfs_get_metadata() is helpful to identify the required spelling(s) (see further down on this page).
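To illustrate how metadata can reveal the required spelling, here is a minimal sketch in base R. It searches a toy stand-in for the tidied metadata; the `toy_metadata` entries and the helper `find_spelling()` are made up for illustration and are not part of propop or BFS. With real data, the tidied result of BFS::bfs_get_metadata() would take the place of the toy table.

```r
# Toy stand-in for tidied FSO metadata; entries are illustrative only
toy_metadata <- data.frame(
  text = "Kanton (-) / Bezirk (>>) / Gemeinde (......)",
  valueTexts = c(
    "Schweiz", "- Zürich", "- Aargau", ">> Bezirk Aarau", "......Aarau"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return every label that contains `pattern` literally
find_spelling <- function(metadata, pattern) {
  metadata$valueTexts[grepl(pattern, metadata$valueTexts, fixed = TRUE)]
}

find_spelling(toy_metadata, "Aargau")
#> [1] "- Aargau"
```

The returned label, including the leading "- ", is what would be passed on as the spatial unit.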
Here’s an example of how to get the population for the canton of Aargau:
library(propop)
ag_population <- get_population(
  number_fso = "px-x-0102010000_101",
  year_first = 2022,
  year_last = 2022,
  spatial_units = "- Aargau"
)
Next, get the parameters for a sample canton (make sure to use the same spelling as in the FSO tables; see the comment above). The code that follows assumes that the result is stored in `ag_parameters`.
The projection can be run as follows:
# select reference scenario
ag_parameters_ref <- ag_parameters |>
  dplyr::filter(scen == "reference")
propop(
  parameters = ag_parameters_ref,
  year_first = 2023,
  year_last = 2026,
  age_groups = 101,
  fert_first = 16,
  fert_last = 50,
  share_born_female = 100 / 205,
  population = ag_population,
  subregional = FALSE,
  binational = TRUE
)
Note of caution: The functions will work as long as the FSO’s API and the underlying data structure remain stable. Changes to the API are likely to break them.
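Because a download can fail when the API changes, it may be worth wrapping the call so that a failure is reported without stopping the whole script. The following is only a sketch of that idea: `try_download()`, `broken_download()`, and `working_download()` are hypothetical names used for illustration, and the two toy functions stand in for a real download call.

```r
# Hypothetical wrapper: run a download function and return NULL on failure
try_download <- function(download_fun, ...) {
  tryCatch(
    download_fun(...),
    error = function(e) {
      message("Download failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Toy stand-ins for a download that fails and one that succeeds
broken_download <- function(...) stop("unexpected API response")
working_download <- function(...) data.frame(n = 1)

failed <- try_download(broken_download)
succeeded <- try_download(working_download)

is.null(failed)
is.data.frame(succeeded)
```

A `NULL` result can then serve as the signal to fall back on the manual steps.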
In case the above shouldn’t work or if you want to retrace the necessary steps manually, we also provide a step-by-step description of how to get the population data and the projection parameters from the FSO.
To download the data, we need the following packages: BFS, dplyr, tidyr, stringr, and DT (all of them are called in the code below).
To make the data download faster, save disk space, and avoid filtering after the download, it is advisable to specify and download only the information that we really need. To prepare such a customised, reduced data download, the instructions from the BFS package are very helpful.
Following these instructions, we can use the text and valueTexts variables to generate a query dimension object for each table and to download the data (see following subsections). To illustrate, for table px-x-0104020000_101, we can obtain the metadata as follows:
metadata <- BFS::bfs_get_metadata(number_bfs = "px-x-0104020000_101")
metadata_tidy <- metadata |>
  select(-valueTexts) |>
  unnest_longer(values) |>
  dplyr::mutate(
    valueTexts = metadata |>
      select(valueTexts) |>
      unnest_longer(valueTexts) |>
      pull(valueTexts)
  ) |>
  select(code, text, values, valueTexts, everything())
head(metadata_tidy)
#> # A tibble: 6 × 6
#> code text values valueTexts elimination title
#> <chr> <chr> <chr> <chr> <lgl> <chr>
#> 1 Kanton Kanton 0 Schweiz TRUE Szenarien zur Bevölkerungsentwi…
#> 2 Kanton Kanton 1 Zürich TRUE Szenarien zur Bevölkerungsentwi…
#> 3 Kanton Kanton 2 Bern / Berne TRUE Szenarien zur Bevölkerungsentwi…
#> 4 Kanton Kanton 3 Luzern TRUE Szenarien zur Bevölkerungsentwi…
#> 5 Kanton Kanton 4 Uri TRUE Szenarien zur Bevölkerungsentwi…
#> 6 Kanton Kanton 5 Schwyz TRUE Szenarien zur Bevölkerungsentwi…
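The tidying step pairs each code in `values` with its label in `valueTexts`. The following base R sketch shows the same idea on a toy list; `toy_values` and `toy_texts` are invented for illustration and only mimic the shape of the metadata, not its actual content.

```r
# Toy codes and labels, one vector per variable (illustrative only)
toy_values <- list(
  Kanton = c("0", "1", "2"),
  Geschlecht = c("0", "1")
)
toy_texts <- list(
  Kanton = c("Schweiz", "Zürich", "Bern / Berne"),
  Geschlecht = c("Mann", "Frau")
)

# Pairing by position only works if both lists line up per variable
stopifnot(identical(lengths(toy_values), lengths(toy_texts)))

# One row per code, with the matching label next to it
toy_tidy <- data.frame(
  code = rep(names(toy_values), times = lengths(toy_values)),
  values = unlist(toy_values, use.names = FALSE),
  valueTexts = unlist(toy_texts, use.names = FALSE),
  stringsAsFactors = FALSE
)
toy_tidy
```

The length check matters because codes and labels are matched purely by position.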
Although the structure of the first three tables should be identical, the low growth scenario (_103) contains different meta information and requires some changes.
Some of the FSO’s expectations are expressed as “number of people” parameters (first three entries in the table). These parameters indicate how many people the FSO expects to do certain things (e.g., how many 64-year-old Swiss men will emigrate to another country in 2043).
To prepare the download of these parameters, we can specify the following query:
# Specify the elements to download
dim1 <- metadata_tidy |>
  dplyr::filter(
    text == "Kanton" & # canton
      valueTexts %in% c("Aargau")
  )
dim2 <- metadata_tidy |>
  dplyr::filter(
    text == "Geschlecht" & # sex
      valueTexts %in% c(
        "Mann", # male
        "Frau" # female
      )
  )
dim3 <- metadata_tidy |>
  dplyr::filter(
    text == "Alter" & # get each age group
      !(valueTexts %in% "Alter - Total") # but exclude "Total"
  )
dim4 <- metadata_tidy |>
  dplyr::filter(text == "Jahr") # get all years
# adapt to the different structure of the "low" scenario table
dim4_103 <- metadata_tidy |>
  dplyr::filter(text == "Jahr") |> # get all years
  dplyr::mutate(values = as.character(0:31))
dim5 <- metadata_tidy |>
  dplyr::filter(
    text == "Staatsangehörigkeit (Kategorie)" & # nationality
      valueTexts %in% c(
        "Schweiz", # Swiss
        "Ausland" # foreign / international
      )
  )
dim6 <- metadata_tidy |>
  dplyr::filter(
    text == "Beobachtungseinheit" & # parameters for projection
      valueTexts %in% c(
        "Einwanderungen", # international immigration
        "Auswanderungen", # international emigration
        "Interkantonale Zuwanderungen", # inter-cantonal immigration
        "Interkantonale Abwanderungen" # inter-cantonal emigration
      )
  )
# build dimensions list object
dimensions <- list(
dim1$values,
dim2$values,
dim3$values,
dim4$values,
dim5$values,
dim6$values
)
# add names
names(dimensions) <- c(
unique(dim1$code),
unique(dim2$code),
unique(dim3$code),
unique(dim4$code),
unique(dim5$code),
unique(dim6$code)
)
# version for _103
# build dimensions list object
dimensions_103 <- list(
dim1$values,
dim2$values,
dim3$values,
dim4_103$values,
dim5$values,
dim6$values
)
# add names
names(dimensions_103) <- c(
unique(dim1$code),
unique(dim2$code),
unique(dim3$code),
unique(dim4_103$code),
unique(dim5$code),
unique(dim6$code)
)
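The two steps above (collect the `values` of each dimension, then name them after the dimension's `code`) follow the same pattern every time. As a sketch of that pattern, the hypothetical helper `build_query()` below does both in one call; it is not part of propop or BFS, and the toy data frames only imitate filtered metadata.

```r
# Hypothetical helper: turn filtered metadata pieces into a named query list
build_query <- function(...) {
  dims <- list(...)
  query <- lapply(dims, function(d) d$values)
  # each piece must belong to exactly one variable, otherwise vapply() fails
  names(query) <- vapply(dims, function(d) unique(d$code), character(1))
  query
}

# Toy stand-ins for filtered metadata (codes are illustrative only)
toy_dim1 <- data.frame(
  code = "Kanton", values = "19", stringsAsFactors = FALSE
)
toy_dim2 <- data.frame(
  code = "Geschlecht", values = c("1", "2"), stringsAsFactors = FALSE
)

toy_query <- build_query(toy_dim1, toy_dim2)
str(toy_query)

# A variant for a table with a different coding: swap a single element
toy_query_alt <- toy_query
toy_query_alt[["Geschlecht"]] <- "1"
```

A side effect of using `vapply()` is that an empty filter result (for example, caused by a misspelled label) raises an error early instead of producing an incomplete query.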
Using the above specifications, we can download the FSO “number of people” parameters as follows:
# reference scenario
fso_numbers_r <- BFS::bfs_get_data(
  number_bfs = "px-x-0104020000_101",
  query = dimensions
) |>
  rename(value = paste0(
    "Szenarien zur Bevölkerungsentwicklung der Kantone 2020-2050,",
    " Referenzszenario AR-00-2020 - zukünftige Bevölkerungsentwicklung"
  )) |>
  dplyr::mutate(scen = "reference")
# high growth scenario
fso_numbers_h <- BFS::bfs_get_data(
  number_bfs = "px-x-0104020000_102",
  query = dimensions
) |>
  rename(value = paste0(
    "Szenarien zur Bevölkerungsentwicklung der Kantone 2020-2050,",
    " 'hohes' Szenario BR-00-2020 - zukünftige Bevölkerungsentwicklung"
  )) |>
  dplyr::mutate(scen = "high")
# low growth scenario
fso_numbers_l <- BFS::bfs_get_data(
  number_bfs = "px-x-0104020000_103",
  query = dimensions_103
) |>
  rename(value = paste0(
    "Szenarien zur Bevölkerungsentwicklung der Kantone 2020-2050,",
    " 'tiefes' Szenario CR-00-2020 - zukünftige Bevölkerungsentwicklung"
  )) |>
  dplyr::mutate(scen = "low")
# combine into a single data frame
fso_numbers_raw <- full_join(fso_numbers_r, fso_numbers_h) |>
  full_join(fso_numbers_l)
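Before the “number of people” parameters can be merged with other parameters, the German labels need to be recoded into short names. The sketch below shows one way to do this in base R on toy data. The names `emi_n`, `interc_imm`, and `interc_emi` are the ones used in the merging code later on this page; `imm_int` is an assumed name, and the age labels and numbers are invented for illustration.

```r
# Toy stand-in for the combined download; labels and numbers are illustrative
toy_numbers_raw <- data.frame(
  Geschlecht = c("Mann", "Frau", "Mann", "Frau"),
  Alter = c("0 Jahre", "1 Jahr", "64 Jahre", "99 Jahre und mehr"),
  Beobachtungseinheit = c(
    "Einwanderungen", "Auswanderungen",
    "Interkantonale Zuwanderungen", "Interkantonale Abwanderungen"
  ),
  value = c(12, 7, 30, 25),
  stringsAsFactors = FALSE
)

# Lookup vectors: German label -> short name
sex_lookup <- c(Mann = "m", Frau = "f")
parameter_lookup <- c(
  "Einwanderungen" = "imm_int", # assumed name, not confirmed on this page
  "Auswanderungen" = "emi_n",
  "Interkantonale Zuwanderungen" = "interc_imm",
  "Interkantonale Abwanderungen" = "interc_emi"
)

toy_numbers <- data.frame(
  sex = unname(sex_lookup[toy_numbers_raw$Geschlecht]),
  # keep only the leading digits of the age label
  age = as.numeric(sub("^([0-9]+).*$", "\\1", toy_numbers_raw$Alter)),
  fso_parameter = unname(parameter_lookup[toy_numbers_raw$Beobachtungseinheit]),
  value = toy_numbers_raw$value,
  stringsAsFactors = FALSE
)
toy_numbers
```

A label that is missing from a lookup vector becomes `NA`, which makes unexpected categories easy to spot.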
The FSO indicates some of its expectations as “rates” or “probabilities” (row four in the overview table at the top). To illustrate, these parameters could indicate the likelihood that a 24-year-old Swiss woman will have a child in the year 2034.
Before we can download the data, we again need the metadata:
metadata <- BFS::bfs_get_metadata(number_bfs = "px-x-0104020000_109")
metadata_tidy <- metadata |>
select(-valueTexts) |>
unnest_longer(values) |>
dplyr::mutate(
valueTexts = metadata |>
select(valueTexts) |>
unnest_longer(valueTexts) |>
pull(valueTexts)
) |>
select(code, text, values, valueTexts, everything())
head(metadata_tidy)
#> # A tibble: 6 × 6
#> code text values valueTexts elimination title
#> <chr> <chr> <chr> <chr> <lgl> <chr>
#> 1 Kanton Kanton 0 Zürich NA Szenarien zur Bevölkerungsentwi…
#> 2 Kanton Kanton 1 Bern / Berne NA Szenarien zur Bevölkerungsentwi…
#> 3 Kanton Kanton 2 Luzern NA Szenarien zur Bevölkerungsentwi…
#> 4 Kanton Kanton 3 Uri NA Szenarien zur Bevölkerungsentwi…
#> 5 Kanton Kanton 4 Schwyz NA Szenarien zur Bevölkerungsentwi…
#> 6 Kanton Kanton 5 Obwalden NA Szenarien zur Bevölkerungsentwi…
To download the “rate” and “probability” parameters (row four in the overview table), we can use the following specifications:
# Specify the elements to download
dim1 <- metadata_tidy |>
  dplyr::filter(
    text == "Kanton" & # canton
      valueTexts %in% c("Aargau")
  )
dim2 <- metadata_tidy |>
  dplyr::filter(
    text == "Szenario-Variante" & # scenario
      valueTexts %in% c(
        "Referenzszenario AR-00-2020", # reference scenario
        "'hohes' Szenario BR-00-2020", # high growth
        "'tiefes' Szenario CR-00-2020" # low growth
      )
  )
dim3 <- metadata_tidy |>
  dplyr::filter(
    text == "Staatsangehörigkeit (Kategorie)" & # nationality
      valueTexts %in% c(
        "Schweiz", # Swiss
        "Ausland" # foreign / international
      )
  )
dim4 <- metadata_tidy |>
  dplyr::filter(
    text == "Geschlecht" & # sex
      valueTexts %in% c(
        "Mann", # male
        "Frau" # female
      )
  )
dim5 <- metadata_tidy |>
  dplyr::filter(
    text == "Alter" & # all 1-year age groups
      !(valueTexts %in% "Alter - Total") # but exclude "Total"
  )
dim6 <- metadata_tidy |>
  dplyr::filter(text == "Jahr") # get all years
dim7 <- metadata_tidy |>
  dplyr::filter(
    text == "Beobachtungseinheit" & # parameter types
      valueTexts %in% c(
        "Geburtenziffern", # births
        "Prospektive Sterbewahrscheinlichkeiten", # mortality
        "Auswanderungsziffern", # international emigration
        "Interkantonale Abwanderungsziffern", # inter-cantonal emigration
        "Einbürgerungsziffern" # acquisition of Swiss citizenship
      )
  )
# build dimensions list object
dimensions <- list(
dim1$values,
dim2$values,
dim3$values,
dim4$values,
dim5$values,
dim6$values,
dim7$values
)
# add names
names(dimensions) <- c(
unique(dim1$code),
unique(dim2$code),
unique(dim3$code),
unique(dim4$code),
unique(dim5$code),
unique(dim6$code),
unique(dim7$code)
)
Using the above specifications, we can download the FSO “rate” parameters as follows:
# Download rate parameters
fso_rates_raw <- BFS::bfs_get_data(
number_bfs = "px-x-0104020000_109",
query = dimensions
)
We need to process the data to ensure that the structure of the rate parameters conforms to the expectations of the projection function:
# Bring variable names and factor levels into the format required later
fso_rates <- fso_rates_raw |>
  dplyr::rename(
    nat = "Staatsangehörigkeit (Kategorie)",
    sex = Geschlecht,
    age = Alter,
    year = Jahr,
    fso_parameter = Beobachtungseinheit,
    scen = "Szenario-Variante",
    value =
      "Szenarien zur Bevölkerungsentwicklung der Kantone 2020-2050 - Ziffern"
  ) |>
  # change factor levels
  dplyr::mutate(
    scen = case_match(
      scen,
      "Referenzszenario AR-00-2020" ~ "reference",
      "'hohes' Szenario BR-00-2020" ~ "high",
      "'tiefes' Szenario CR-00-2020" ~ "low"
    ),
    nat = case_match(
      nat,
      "Schweiz" ~ "ch",
      "Ausland" ~ "int"
    ),
    sex = case_when(
      sex == "Mann" ~ "m",
      sex == "Frau" ~ "f"
    ),
    age = as.numeric(stringr::str_extract(age, "\\d+")),
    fso_parameter = case_match(
      fso_parameter,
      "Prospektive Sterbewahrscheinlichkeiten" ~ "mor",
      "Auswanderungsziffern" ~ "emi",
      "Interkantonale Abwanderungsziffern" ~ "intercant",
      "Einbürgerungsziffern" ~ "acq",
      "Geburtenziffern" ~ "birth_rate"
    )
  )
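Now we can merge the “number of people” and “rate” parameters, make the data frame wider, and compute the required parameter inter-cantonal net migration. In the code below, fso_numbers stands for the processed “number of people” parameters and fso_births_int_ch for the share of newborns with Swiss nationality born to non-Swiss mothers (table px-x-0104020000_106):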
projection_parameters <- dplyr::full_join(fso_rates, fso_numbers) |>
  tidyr::pivot_wider(names_from = fso_parameter, values_from = value) |>
  # compute inter-cantonal net migration
  dplyr::mutate(mig_ch = interc_imm - interc_emi) |>
  left_join(fso_births_int_ch, by = c("year", "scen")) |>
  # add mandatory column spatial_unit
  dplyr::mutate(spatial_unit = "Aargau") |>
  # remove unnecessary variables
  dplyr::select(-c(Kanton, intercant, emi_n, interc_imm, interc_emi)) |>
  dplyr::arrange(year)
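To make the net migration step concrete, here is a small worked example with invented numbers. It is a sketch only: base R's `reshape()` stands in for `tidyr::pivot_wider()` so that the example runs without additional packages, and `toy_long` is not real FSO data.

```r
# Toy long-format parameters: one row per year and parameter (made-up numbers)
toy_long <- data.frame(
  year = c(2024, 2024, 2025, 2025),
  fso_parameter = c("interc_imm", "interc_emi", "interc_imm", "interc_emi"),
  value = c(120, 95, 110, 130),
  stringsAsFactors = FALSE
)

# Make the data wider: one column per parameter
toy_wide <- reshape(
  toy_long,
  idvar = "year",
  timevar = "fso_parameter",
  direction = "wide"
)
# reshape() prefixes the new columns with "value."; strip that prefix
names(toy_wide) <- sub("^value\\.", "", names(toy_wide))

# Net migration = inter-cantonal immigration minus inter-cantonal emigration
toy_wide$mig_ch <- toy_wide$interc_imm - toy_wide$interc_emi
toy_wide[, c("year", "interc_imm", "interc_emi", "mig_ch")]
```

In this toy case the net migration is 25 for 2024 (120 - 95) and -20 for 2025 (110 - 130); a negative value means that more people leave for other cantons than arrive from them.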
Show parameters for one demographic group for the year 2024:
In addition to the parameters, the projection function propop also requires a starting population. To prepare the corresponding query, we again start with the metadata:
metadata_pop <- BFS::bfs_get_metadata(number_bfs = "px-x-0102010000_101")
metadata_pop_tidy <- metadata_pop |>
select(-valueTexts) |>
unnest_longer(values) |>
mutate(
valueTexts = metadata_pop |>
select(valueTexts) |>
unnest_longer(valueTexts) |>
pull(valueTexts)
) |>
select(code, text, values, valueTexts, everything())
We can now specify which levels of the variables we want:
# Specify the elements to download
dim1 <- metadata_pop_tidy |>
dplyr::filter(
text == "Kanton (-) / Bezirk (>>) / Gemeinde (......)" & # Canton
valueTexts %in% c("- Aargau")
)
dim2 <- metadata_pop_tidy |>
dplyr::filter(
text == "Jahr" & # year
valueTexts %in% c("2018")
)
dim3 <- metadata_pop_tidy |>
dplyr::filter(
text == "Bevölkerungstyp" & # permanent
valueTexts %in% "Ständige Wohnbevölkerung"
)
dim4 <- metadata_pop_tidy |>
dplyr::filter(
text == "Staatsangehörigkeit (Kategorie)" & # nationality
valueTexts %in% c("Schweiz", "Ausland")
)
dim5 <- metadata_pop_tidy |>
dplyr::filter(
text == "Geschlecht" & # sex
valueTexts %in% c("Mann", "Frau")
)
dim6 <- metadata_pop_tidy |>
dplyr::filter(
text == "Alter" & # age
!(valueTexts %in% "Alter - Total")
) # exclude "Total"
# build dimensions list object
dimensions <- list(
dim1$values,
dim2$values,
dim3$values,
dim4$values,
dim5$values,
dim6$values
)
# add names
names(dimensions) <- c(
unique(dim1$code),
unique(dim2$code),
unique(dim3$code),
unique(dim4$code),
unique(dim5$code),
unique(dim6$code)
)
Using the above specifications, we can download the FSO “population” as follows:
# Download population
fso_pop_raw <- BFS::bfs_get_data(
  number_bfs = "px-x-0102010000_101", # population table
  query = dimensions
)
We now process the data to ensure that the population data conforms to the structure expected by propop::propop():
# Bring variable names and factor levels into the format required later
starting_population <- fso_pop_raw |>
dplyr::select(-"Bevölkerungstyp") |>
dplyr::rename(
year = Jahr,
Kanton = "Kanton (-) / Bezirk (>>) / Gemeinde (......)",
nat = "Staatsangehörigkeit (Kategorie)",
sex = Geschlecht,
age = Alter,
n = "Ständige und nichtständige Wohnbevölkerung"
) |>
# change factor levels
mutate(
Kanton = stringr::str_remove_all(Kanton, "- "),
nat = case_match(
nat,
"Schweiz" ~ "ch",
"Ausland" ~ "int"
),
sex = case_when(
sex == "Mann" ~ "m",
sex == "Frau" ~ "f"
),
age = as.numeric(stringr::str_extract(age, "\\d+"))
) |>
dplyr::rename(spatial_unit = Kanton)
starting_population |>
DT::datatable()
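Before running the projection, a quick structural check of the starting population can catch incomplete downloads. The sketch below assumes the column names produced above (year, spatial_unit, nat, sex, age, n), two nationalities, two sexes, and 101 one-year age groups. `check_population()` is a hypothetical helper written for this illustration; propop::propop() may apply different or additional checks.

```r
# Hypothetical helper: TRUE if every nat / sex / age combination occurs once
check_population <- function(pop, age_groups = 101) {
  required <- c("year", "spatial_unit", "nat", "sex", "age", "n")
  missing_cols <- setdiff(required, names(pop))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # 2 nationalities x 2 sexes x age groups, per spatial unit and year
  expected_rows <- age_groups * 2 * 2 *
    length(unique(pop$spatial_unit)) * length(unique(pop$year))
  combos <- unique(pop[, c("year", "spatial_unit", "nat", "sex", "age")])
  nrow(pop) == expected_rows && nrow(combos) == nrow(pop)
}

# Toy population with the expected structure (the counts are made up)
toy_pop <- expand.grid(
  year = 2018,
  spatial_unit = "Aargau",
  nat = c("ch", "int"),
  sex = c("m", "f"),
  age = 0:100,
  stringsAsFactors = FALSE
)
toy_pop$n <- 100

check_population(toy_pop)
check_population(toy_pop[-1, ])
```

The first call returns `TRUE` (404 rows, each combination once); the second returns `FALSE` because one row is missing.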
Now that the parameters and the starting population are available, we can run the population projections (see vignette `run_projections` for more details). The result is shown for one demographic group.
# only keep reference scenario
projection_parameters_ref <- projection_parameters |>
  filter(scen == "reference")
# run the projection with the parameters and starting population prepared above
results_clean <- propop(
  parameters = projection_parameters_ref,
  year_first = 2019,
  year_last = 2030,
  age_groups = 101,
  fert_first = 16,
  fert_last = 50,
  share_born_female = 100 / 205,
  population = starting_population,
  subregional = FALSE,
  binational = TRUE
)
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#> always returns an ungrouped data frame and adjust accordingly.
#> ℹ The deprecated feature was likely used in the propop package.
#> Please report the issue at <https://github.com/statistik-aargau/propop>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
results_clean |>
  # select demographic group
  dplyr::filter(sex == "f" & nat == "int" & age == 49) |>
  dplyr::mutate(across(n, \(x) sprintf(fmt = "%.0f", x))) |>
  DT::datatable(filter = "top")