---
title: "Run projections"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Run projections}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(propop)

# load package data
data("fso_parameters")
data("fso_population")
```

# Overview

With `propop::propop()`, you can run population projections for a single region
or for several regions. The function applies the Cohort Component Method and is
tailored to the Swiss context: the package was built to run with the information
provided by the Federal Statistical Office (FSO). To run the function, you need
to provide:

-   a data frame with the **starting population**, that is, the most up-to-date
number of people for each demographic group before the first projection year; for
example, the population data included in `propop` refer to 31 December 2018,
and the first projection year is 2019;

-   a data frame containing model **parameters**, that is, information about how
key demographic variables such as mortality are expected to develop in the future;

-   global arguments which do not change over time or across demographic groups.

Importantly, the structure of the two data frames (number, names, and types of
columns) must correspond exactly to the **specifications** shown in
[this vignette](prepare_data.html). Among other things, it is **mandatory** to
provide two levels for sex and, if nationalities are distinguished, two levels
for nationality (see "No distinction between nationalities" below). The function
is more flexible with respect to age groups. Although the examples use 1-year age
groups ranging from 0 to 100 (where 100 also includes everyone older), the model
should also run with aggregated groups (e.g., 0-19, 20-64, and 65+ year olds),
provided that the information in the population and parameter data frames is
compatible (e.g., the parameters are aggregated to the same age groups).
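
As a sketch of what aggregated age groups could look like, the chunk below
collapses 1-year age groups into 0-19, 20-64, and 65+ with base R. The data frame
`toy_pop` and its counts are hypothetical; only the column names `age` and `n`
mirror the population data. Rate-type parameters (e.g., mortality) cannot simply
be summed like counts and would need, for instance, population-weighted averages,
which this sketch does not cover.

``` {r sketch-age-groups}
# Hypothetical toy population: 1-year age groups 0-100 with made-up counts
toy_pop <- data.frame(age = 0:100, n = 10)

# Assign each age to an aggregated group; right = FALSE creates the
# intervals [0, 20), [20, 65) and [65, Inf), i.e., 0-19, 20-64 and 65+
toy_pop$age_group <- cut(
  toy_pop$age,
  breaks = c(0, 20, 65, Inf),
  labels = c("0-19", "20-64", "65+"),
  right = FALSE
)

# Sum the counts within each aggregated group
toy_pop_agg <- aggregate(n ~ age_group, data = toy_pop, FUN = sum)
toy_pop_agg
```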

# Projection for a single region

The package `propop` includes population data for the canton of Aargau from 
2018 and the FSO parameters from the [population development scenarios 2020](https://www.bfs.admin.ch/bfs/en/home/statistics/catalogues-databases.assetdetail.14963221.html). 
Using these resources, we can project the population of the canton as a whole
in 1-year age groups for the period 2019-2030.

The start and end of women's fertile period, the proportion of babies born
female, and the share of babies born to mothers who are not Swiss are stable
parameters that are passed to `propop::propop()` as arguments. The call below
does not set them explicitly and thus uses the function's default values.
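
To illustrate how a fertile period and a proportion of female births can enter a
projection, here is a minimal sketch of the births component for one year. All
values and the names `fertile_start`, `fertile_end`, and `share_female` are
hypothetical; they are not the argument names or defaults of `propop::propop()`,
and the package's actual computation may differ (e.g., in how it accounts for
the mothers' nationality).

``` {r sketch-births}
# Hypothetical female population (1000 women per 1-year age group)
toy_women <- data.frame(age = 0:100, n = 1000)
# Made-up birth rates; deliberately non-zero outside the fertile period to
# show that only ages inside the period contribute to births
toy_women$birth_rate <- ifelse(
  toy_women$age >= 20 & toy_women$age <= 39, 0.05, 0.01
)

fertile_start <- 15   # hypothetical first fertile age
fertile_end <- 49     # hypothetical last fertile age
share_female <- 0.488 # hypothetical proportion of babies born female

# Births = sum over fertile ages of (number of women x birth rate)
in_fertile_period <- toy_women$age >= fertile_start & toy_women$age <= fertile_end
births <- sum(toy_women$n[in_fertile_period] * toy_women$birth_rate[in_fertile_period])

# Split the births by sex
births_female <- births * share_female
births_male <- births - births_female
c(total = births, female = births_female, male = births_male)
```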
\

``` {r project-canton}
projection_canton_2030 <- propop(
  parameters = fso_parameters,
  year_first = 2019,
  year_last = 2030,
  population = fso_population,
  subregional = FALSE,
  binational = TRUE
)

projection_canton_2030 |>
  DT::datatable(filter = "top")
```
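
The Cohort Component Method mentioned in the overview updates the population
year by year: people survive according to mortality, move up one age group,
newborns enter at age 0, and migration is added. The chunk below is a
deliberately simplified sketch of one such step for a single sex with four
hypothetical age groups (the last one open-ended). It ignores nationality, deaths
of newborns in their birth year, and the exact timing of migration, so it does
not reproduce what `propop::propop()` computes internally.

``` {r sketch-cohort-component}
# Hypothetical population at the start of the year for ages 0, 1, 2 and 3+
pop_start <- c(100, 100, 100, 100)
# Hypothetical mortality rates by age, births during the year and
# net migration (immigrants minus emigrants) by age at the end of the year
mor <- c(0.005, 0.001, 0.001, 0.05)
births <- 90
net_mig <- c(2, 1, 1, 0)

# 1) Survival: remove the expected deaths in each age group
survivors <- pop_start * (1 - mor)

# 2) Ageing: newborns enter at age 0, ages 0 and 1 move up one year, and the
#    survivors aged 2 join the open-ended group 3+ together with its survivors
pop_aged <- c(births, survivors[1], survivors[2], survivors[3] + survivors[4])

# 3) Migration: add net migration by age
pop_end <- pop_aged + net_mig
pop_end
```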

\

# Projection for multiple subregions

To project the population development for subregions within a superordinate entity 
(e.g., districts or municipalities within a canton), we need input data with 
multiple regions. Since the example data cover only the canton as a whole, we
create five fictitious subregions:

``` {r prepare-subregions}
# fso parameters for fictitious subregions
fso_parameters_sub <- fso_parameters |>
  # duplicate each row 5 times
  tidyr::uncount(5) |>
  # assign the 5 copies of each row to subregions 1 to 5
  dplyr::mutate(spatial_unit = rep(1:5, times = nrow(fso_parameters))) |>
  # note: parameters given as absolute numbers (rather than rates) are copied
  # unchanged here; in a real application, they would have to be divided by the
  # number of regions (= 5), otherwise the duplication inflates the population
  dplyr::mutate(spatial_unit = as.character(spatial_unit))

# fso population for fictitious subregions
fso_population_sub <- fso_population |>
  dplyr::rename(n_tot = n) |>
  # duplicating rows 5 times
  tidyr::uncount(5) |>
  # create 5 subregions
  dplyr::mutate(spatial_unit = rep(1:5, times = nrow(fso_population))) |>
  dplyr::mutate(
    # Create fictitious n for each subregion
    n = dplyr::case_match(
      spatial_unit,
      1 ~ round(n_tot * 0.3),
      2 ~ round(n_tot * 0.25),
      3 ~ round(n_tot * 0.2),
      4 ~ round(n_tot * 0.15),
      5 ~ round(n_tot * 0.1),
      .default = NA
    ),
    .keep = "all"
  ) |>
  dplyr::mutate(spatial_unit = as.character(spatial_unit)) |>
  dplyr::select(-n_tot)
```
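
Because each subregion's count is rounded separately, the subregional counts of
a demographic group do not necessarily add up to the original count, especially
for small groups. The following sketch applies the shares from the chunk above
to a few made-up totals to show the effect. Note that R's `round()` rounds
halves to the nearest even number, so 0.5 becomes 0.

``` {r sketch-rounding}
# Subregion shares used above; they add up to 1
shares <- c(0.3, 0.25, 0.2, 0.15, 0.1)
stopifnot(isTRUE(all.equal(sum(shares), 1)))

# Made-up totals for a few demographic groups
n_tot <- c(1, 2, 13, 100)

# Split each total by the shares, round as above and add the parts up again
n_split_sum <- vapply(n_tot, function(x) sum(round(x * shares)), numeric(1))
data.frame(n_tot, n_split_sum, difference = n_split_sum - n_tot)
```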

We can then run the projection for the subregions and show the results for
a selected group (14-year-old non-Swiss males):

``` {r project-subregions}
projection_subregions_2030 <- propop(
  parameters = fso_parameters_sub,
  year_first = 2019,
  year_last = 2030,
  population = fso_population_sub,
  subregional = FALSE,
  binational = TRUE
)

projection_subregions_2030 |>
  dplyr::filter(sex == "m" & nat == "int" & age == 14) |>
  DT::datatable(filter = "top")
```
   
\

When information about migration within the superordinate entity is 
available (e.g., people moving between municipalities), `subregional` can be set
to `TRUE` to adjust the population size in each subregion accordingly. This
requires `imm_can` as an additional column in the parameter data frame.
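
The sketch below does not reproduce how `propop::propop()` uses `imm_can`; it is
only a generic illustration of migration between subregions. Each of five
hypothetical subregions loses a share of its residents to internal moves, and
all movers are redistributed according to hypothetical destination shares.
Because internal moves only shift people around, the total of the superordinate
entity stays unchanged. All names and values are made up.

``` {r sketch-internal-migration}
# Hypothetical population of five subregions
pop <- c(300, 250, 200, 150, 100)
# Hypothetical share of residents who move within the canton
out_rate <- c(0.02, 0.03, 0.01, 0.04, 0.05)
# Hypothetical distribution of all internal movers across destination
# subregions; for simplicity, it does not exclude the region of origin
dest_share <- c(0.4, 0.2, 0.2, 0.1, 0.1)

movers_out <- pop * out_rate
movers_in <- sum(movers_out) * dest_share
pop_new <- pop - movers_out + movers_in

# Internal migration changes the distribution, not the total
stopifnot(isTRUE(all.equal(sum(pop_new), sum(pop))))
pop_new
```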

\

# No distinction between nationalities

It is possible to run projections without distinguishing between Swiss and 
non-Swiss nationals. The simplest way to achieve this is to provide population 
data and parameters without the nationality-specific columns (i.e., remove `nat`
from both data frames and `acq` and `births_int_ch` from the parameters).

Let's adapt the input data accordingly. To keep things simple, we run the 
projection for only one of the two nationalities. Using `propop::propop()` like
this in a real setting requires more preparation (e.g., deriving a single value
when parameters differ between Swiss and non-Swiss people).
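
One possible way to derive such a single value, assumed here rather than
prescribed by `propop`, is a population-weighted average of the
nationality-specific parameters within each demographic group. The sketch uses
made-up mortality rates and counts for one age and sex group; the data frame
`toy_group` and its column names are illustrative only.

``` {r sketch-weighted-parameter}
# Made-up mortality rates and population counts for one age/sex group
toy_group <- data.frame(
  nat = c("ch", "int"),
  mor = c(0.010, 0.006),
  n = c(800, 200)
)

# Weight each nationality's rate by its population count
mor_combined <- weighted.mean(toy_group$mor, w = toy_group$n)
mor_combined
```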


``` {r prepare-single-nationality}
fso_parameters_int <- fso_parameters |>
  # drop Swiss people
  dplyr::filter(nat == "int") |>
  # remove `nat`, `acq` and `births_int_ch` from `parameters`
  dplyr::select(-c(nat, acq, births_int_ch))

fso_population_int <- fso_population |>
  # drop Swiss people
  dplyr::filter(nat == "int") |>
  # remove `nat` from `population`
  dplyr::select(-nat)
```

When calling `propop::propop()`, you need to set `binational = FALSE`.

``` {r project-int}
projection_int <- propop(
  parameters = fso_parameters_int,
  year_first = 2019,
  year_last = 2030,
  population = fso_population_int,
  subregional = FALSE,
  binational = FALSE
)

projection_int |>
  DT::datatable(filter = "top")
```
\

# Interpretation of the output

The data frame returned by `propop::propop()` contains the number of people (`n`)
per demographic group for the base year and for each projected year.
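
To get a quick overview of a result, the counts can be summed per year. The
sketch below uses a small made-up data frame instead of the real output and
assumes that the year is stored in a column named `year`; check `names()` of the
returned data frame before adapting it.

``` {r sketch-yearly-totals}
# Made-up excerpt shaped like a projection result: one row per year and sex
toy_output <- data.frame(
  year = rep(c(2019, 2020), each = 2),
  sex = rep(c("f", "m"), times = 2),
  n = c(100, 95, 102, 97)
)

# Total population per year
totals <- aggregate(n ~ year, data = toy_output, FUN = sum)
# Absolute change compared with the previous year
totals$change <- c(NA, diff(totals$n))
totals
```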