propop
provides a function to compare
projections against a benchmark. The benchmark can be the
actual population development (i.e., official records) or a base model
(e.g., when testing alternative models or when comparing against the
official FSO forecast).
Evaluations can be carried out individually per observation unit or like a summary statistics aggregated across all observations (e.g., all years, demographic groups, spatial units).
To run the evaluation function, we need some benchmark data and the results from the projection.
Here we use the population records from the canton of Aargau 2019-2022 as benchmark (2018 will serve as starting population):
data_benchmark <- get_population(
number_fso = "px-x-0102010000_101",
year_first = 2018,
year_last = 2022,
spatial_units = "- Aargau"
)
Get the parameters for the projection:
Run the population projection for 2019-2022:
data_projected <- propop(
parameters = data_parameters |>
dplyr::filter(scen == "reference"),
year_first = 2019,
year_last = 2050,
age_groups = 101,
fert_first = 16,
fert_last = 50,
share_born_female = 100 / 205,
# population records from 2018 as starting point
population = data_benchmark |>
dplyr::filter(year == 2018),
subregional = FALSE,
binational = TRUE
)
Let’s first compare the projected population growth against the
recorded population development (= benchmark) using one-year age classes
(default option). Make sure to provide matching data frames, especially
in terms of the year ranges and spatial units. To remove the starting
population (2018) from the projection results, you could also use
drop_start_year = TRUE
.
As a result of prepare_evaluation()
, you get the
recorded (n_bench
) and projected (n_proj
)
population for each demographic group and year as data frame columns
next to each other:
# Combine and pre-process the data
combined <- prepare_evaluation(
# only keep years from projected period
data_benchmark = data_benchmark |> dplyr::filter(year > 2018),
data_projected = data_projected |> dplyr::filter(year > 2018 & year <= 2022)
)
# Show combined data
combined
#> # A tibble: 1,616 × 7
#> year spatial_unit nat sex age n_bench n_proj
#> <int> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 2019 Aargau ch m 0 2622 2510
#> 2 2019 Aargau ch m 1 2593 2551
#> 3 2019 Aargau ch m 2 2635 2631
#> 4 2019 Aargau ch m 3 2596 2609
#> 5 2019 Aargau ch m 4 2683 2670
#> 6 2019 Aargau ch m 5 2737 2738
#> 7 2019 Aargau ch m 6 2636 2633
#> 8 2019 Aargau ch m 7 2609 2594
#> 9 2019 Aargau ch m 8 2642 2632
#> 10 2019 Aargau ch m 9 2657 2654
#> # ℹ 1,606 more rows
Based on the difference between the observed and projected number of
people, compute_measures()
then calculates the error and
several performance metrics:
evaluation_1 <- compute_measures(combined)
#> Warning message:
#> The following columns of the output have `Inf` values:
#> ! "pe" and "ape".
#> ℹ `Inf` values are probably caused by divisions by zero(e.g., when `n_bench` = 0).
#> Consider using age groups thatinclude more than 1 year.
# Create table
evaluation_1 |>
# select demographic group
dplyr::filter(sex == "m" & nat == "ch" & age == 27) |>
# round to two digits
dplyr::mutate(across(pe:ape, \(x) sprintf(fmt = "%.2f", x))) |>
DT::datatable(
filter = "none",
options = list(dom = 't'),
caption = htmltools::tags$caption(
style = "caption-side: top; text-align: left; font-weight: bold",
"Performance measures for projections with 1-year age classes for the
canton of Aargau 2019-2022; only one demographic group is displayed."
)
)
To obtain summary statistics for the whole projection model, you can
use the aggregate_measures()
function:
aggregate_measures(evaluation_1) |>
# round to two digits
dplyr::mutate(across(mpe:ape_under_5, \(x) sprintf(fmt = "%.2f", x))) |>
DT::datatable(
options = list(dom = 't'),
caption = htmltools::tags$caption(
style = "caption-side: top; text-align: left; font-weight: bold",
"Evaluation measures of projection with one-year age classes, aggregated
across all observations."
))
#> Warning message:
#> The following columns in `data` have `Inf` values:
#> ! "pe" and "ape".
#> ℹ `Inf` values are probably caused by divisions by zero (e.g., when `n_bench` = 0). Consider using larger age groups that include more than 1 year.
#> Warning message:
#> The following columns in the output have `Inf` values:
#> ! "mpe" and "mape".
Note the warning message: Several evaluation
measures have Inf
values. The reason for this is that it is
not allowed to divide by 0
.
One-year age classes often lead to small group sizes. This is
particularly problematic when the benchmark is equal to 0; then several
measures cannot be computed because divisions by zero are not allowed
(Inf
values are returned).
A solution to this problem is to use larger age groups that contain several years.
Instead of using one-year age classes, you can conduct the evaluation
for the commonly used age groups 0-19 year olds, 20-64 year olds, and
over 64 year olds using the option
age_groups = "age_groups_3"
.
# Combine and pre-process the data
combined_grouped <- prepare_evaluation(
# only keep years from projected period
data_benchmark = data_benchmark |> dplyr::filter(year > 2018),
data_projected = data_projected |> dplyr::filter(year > 2018 & year <= 2022),
age_groups = "age_groups_3"
)
# Show combined data
combined_grouped
#> # A tibble: 48 × 7
#> year spatial_unit age sex nat n_bench n_proj
#> <int> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 2019 Aargau age_00_19 f ch 49279 49129
#> 2 2019 Aargau age_00_19 f int 17818 18118
#> 3 2019 Aargau age_00_19 m ch 52389 52076
#> 4 2019 Aargau age_00_19 m int 19323 19566
#> 5 2019 Aargau age_20_64 f ch 151860 151781
#> 6 2019 Aargau age_20_64 f int 56115 56380
#> 7 2019 Aargau age_20_64 m ch 149279 149263
#> 8 2019 Aargau age_20_64 m int 66770 67029
#> 9 2019 Aargau age_65_plus f ch 59729 59847
#> 10 2019 Aargau age_65_plus f int 6311 6332
#> # ℹ 38 more rows
# Compute the performance measures
evaluation_2 <- compute_measures(combined_grouped)
evaluation_2 |>
# select demographic group
dplyr::filter(sex == "m" & nat == "ch" & age == "age_20_64") |>
# round to two digits
dplyr::mutate(across(pe:ape, \(x) sprintf(fmt = "%.2f", x))) |>
DT::datatable(
options = list(dom = 't'),
caption = htmltools::tags$caption(
style = "caption-side: top; text-align: left; font-weight: bold",
"Performance measures for projections with three age classes for the
canton of Aargau 2019-2022; only one demographic group is displayed."
)
)
You can again use aggregate_measures()
to obtain a
summary of the evaluation across all observations:
aggregate_measures(evaluation_2) |>
# round to two digits
dplyr::mutate(across(mpe:ape_under_5, \(x) sprintf(fmt = "%.2f", x))) |>
DT::datatable(
options = list(dom = 't'),
caption = htmltools::tags$caption(
style = "caption-side: top; text-align: left; font-weight: bold",
"Evaluation measures for projection with three age classes, aggregated
across all observations."
))
Furthermore, you can use weights. When using the weighted metric,
adjusted absolute percentage errors weigh less heavily in smaller groups
than in larger groups. The adjusted metric w_ape
considers
that smaller groups tend to have larger forecast errors.
This is illustrated in the following table: The unweigthed percentage
errors (ape
) 0.39/0.36 become 0.24/0.07 when they are
weighted by the total number of people per group
(w_ape
).
evaluation_3 <- compute_measures(combined_grouped, weight_groups = c("age"))
evaluation_3 |>
dplyr::filter(year == 2019 & ape > .34 & ape < .39) |>
dplyr::mutate(across(pe:w_ape, \(x) sprintf(fmt = "%.2f", x))) |>
DT::datatable(
options = list(dom = 't'),
caption = htmltools::tags$caption(
style = "caption-side: top; text-align: left; font-weight: bold",
"Selected results showing the effect of using weights
(ape versus w_ape)."
),
) |>
DT::formatStyle(
columns = c('ape', 'w_ape'),
backgroundColor = '#96D4FF'
)
You can also use aggregate_measures()
to obtain a
summary of the evaluation with weighted groups:
propop
versus FSOInstead of comparing the propop projection with population records,
we can also use the evaluation function to check how close the
propop
projection comes to FSO’s projection from the 2020
model.
In the example data, the differences are extremely small (>= 1):
fso_propop <- prepare_evaluation(
data_benchmark = data_parameters |>
dplyr::filter(scen == "reference" & year > 2018) |>
dplyr::mutate(n = n_projected),
data_projected = data_projected |>
dplyr::filter(year > 2018),
age_groups = "age_groups_3"
) |>
compute_measures()
fso_propop |>
dplyr::mutate(across(pe:ape, \(x) sprintf(fmt = "%.2f", x))) |>
DT::datatable(
filter = "top",
caption = htmltools::tags$caption(
style = "caption-side: top; text-align: left; font-weight: bold",
"Comparison of projetions from FSO versus `propop`."
)
)