# Population Projections using R - Including dynamic graphical visualisations

A textbook on population projections with R.

**Farid FLICI**

*Last Revision: 2020-04-30*

Farid FLICI (2020).

*Populations Projections Using R - Including Dynamic graphical Visualisations*. A textbook published with Gitbook, Available at: [https://farid-flici.gitbook.io/pop-proj-dz/], Version of 2020-04-30.In this textbook, we are going to illustrate how to perform populations projections using the

**Cohort-Component Method**using simple R functions and without using population projections specific*Packages*such as`popdemo`

. We use the algerian population data for our case study. Then, we are going to show how to carry-out practical plots of population pyramid.The most practical way to make population projections consists of using the

**Cohort-Compnent Method**. This methods consists of treating the baseline population (population at time$0$

) to be composed of different cohorts born in different years. So, their actual ages go from age $0$

to the maximum surviving age, that we note $w$

. Then, numbers within each cohort are projected using `prospective life tables`

, and each year, new-borns are added in age $0$

. When migration and emigration data are available, population numbers are adjusted accordingly. That's where **Component**comes from; population dynamics is driven from the expected evolution of**three**components: Mortality, Fertility and Immigration.If we set

$P_{x,t}$

to be the population aged $x$

in the begenning of the year $t$

, and $q_{x,t}$

to be the Age-Specific Mortality Rate (ASMR) corresponding to $x$

and $t$

, the year-to-year evolution of the the population within each cohort can be driven using the equation: $P_{x+1,t+1}=P_{x,t}*(1-q_{x,t})$

The new-borns during the year

$t$

are added to the population of the year $t+1$

as the population aged $0$

. To estimate the number of new-borns during the year $t$

, noted $B_t$

, the mid-year population of females at the procreation ages, i.e., $15-49$

years, needs to be multiplied by the Age-Specific Fertility Rates (ASFRs), $f_{x,t}$

. The mid-year population can be approximated by the average population between the begenning of the years $t$

and $t+1$

. We can write:$B_{t}=\sum_{x=15}^{49} \frac{P^{f}{x,t}+P^{f}{x,t+1}}{2} *f_{x,t}$

Then, the number of new-borns are to be split out into

$boys$

and $girls$

. If we set "$a$

" to be the number of boys corresponding to $one$

girl among new-borns, we can split $B_t$

into $B_t^m$

and $B_t^f$

, with $m$

designs males and $f$

females, using the equation:$B_{t} = B_{t}^m+B_{t}^f=\frac{a}{1+a}*B_{t}+\frac{1}{1+a}*B_{t}$

$$a$$can be estimated based on the historical recorded values.

Then,

$B^{m}_{t}$

and $B^f_t$

are introduced as populations, of males and females, at age 0 in the begenning of the year $t+1$

.The implementation of the Cohort-Component Method requires to make available 4 types of data: a Baseline population, projected mortality surfaces (for males and females), a projected fertility surface, and the part of males for

$1$

female among new-borns.All datasets need to be extended on the period going from the reference yars, 2015 in our case, to the horizon of the projection, 2070 in our case.

We associated the datasets to this textbook as embded files which can be downloaded from the links provided below.

- Baseline population: We use the population of 2015, by detailed ages, as a baseline population.

baseline_pop_dz_2015.txt

2KB

Text

Baseline Population, Country: Algeria, Year: 2015, Sex: male and female, Source: ONS, interpolated by Flici (2017)

This population pyramid, for males and females, can also be downloaded fom :[http://www.cread.dz/Actuarial_Demography/Donnees_Site/Detailed_Ages_Proj.xlsx]

- The projected Age-Specific mortality Rates (ASMRs) from 0 to 120 years for the period from 2015 to 2017 were driven from the coherent mortality foracst of Flici (2016a) following the methodology of Hyndman et al. (2013).

mort_forecast_males.txt

79KB

Text

Projected ASMRs, Sex: Male, Ages: 0-120, Years: 2015-2070, Source: Flici (2016a)

mort_forecast_females.txt

79KB

Text

Projected ASMRs, Sex: Female, Ages: 0-120, Years: 2015-2070, Source: Flici (2016a)

The dataset of ASMRs of males and females can also be downloaded from [ http://www.cread.dz/Actuarial_Demography/Donnees_Site/Projected_ASMRs.xlsx].

- The projected Age-Specific Fertility Rates (ASFRs) for females ages 15-49 years along the period from 2015-2070 are estimated by Flici (2016b) following the methodology of Lee (1993), and can be downloaded from [http://www.cread.dz/Actuarial_Demography/Donnees_Site/Projected_ASFRs.xlsx]

fert_forecast.txt

23KB

Text

Projected ASFRs, Sex: Female, Ages: 15-49, Years: 2015-2070, Source: Flici (2016b)

- The immigration sold is supposed to be null.
- The number of males corresponding to 1 female among newborns is equal to 1.045 according to historical data.

**1.3.1. Baseline population pyramid**

For the needs of this work, we use the population pyramid of Algeria in the begening of 2015 for males and females for the ages 0-99 years old.

Baseline Population Dataset on Excel

We upload the data file:

Baseline_pop_dz_2015 <-

read.table("https://firebasestorage.googleapis.com/v0/b/gitbook-28427.appspot.com/o/assets%2F-M69GC8Q928dOzF6PVLf%2F-M69J6T3gp8XoKgj42Q6%2F-M69JWbNPOq-h1hHgkYv%2Fbaseline_pop_dz_2015.txt?alt=media&token=7735f959-b6c4-43c1-a5c5-294e4825a144", header=T)

We separate males and females into different datasheets, namely PM15 and PF15 as accronyms of "Population of males 2015" and "Population of females 2015".

PM15<-as.matrix(Baseline_pop_dz_2015$males[1:100])

rownames(PM15)<-c(0:99)

PF15<-as.matrix(Baseline_pop_dz_2015$females[1:100])

rownames(PF15)<-c(0:99)

Baseline Population of males on R

**1.3.2. Mortality forecast**

We need to upload the mortality surface for males and females.

Mortality data matrix on Excel

**Male mortality**

MM<- read.table("https://firebasestorage.googleapis.com/v0/b/gitbook-28427.appspot.com/o/assets%2F-M69GC8Q928dOzF6PVLf%2F-M69J6T3gp8XoKgj42Q6%2F-M69J_65GSqJfyB1hQZ4%2FMort_forecast_males.txt?alt=media&token=ce303331-93ca-40ce-b2a3-fb50dfcc0bea", header=T)

MM<-as.matrix(MM[1:121,2:57])

rownames(MM)<-c(0:120)

colnames(MM)<-c(2015:2070)

Mortality Data matrix being uploaded to R

**Female Mortality**

MF<- read.table("https://firebasestorage.googleapis.com/v0/b/gitbook-28427.appspot.com/o/assets%2F-M69GC8Q928dOzF6PVLf%2F-M69J6T3gp8XoKgj42Q6%2F-M69JamiW2fVfbZC0gia%2FMort_forecast_females.txt?alt=media&token=8454d5f6-3270-45a0-826c-09de6bfe750d", header=T)

MF<- as.matrix(MF[1:121,2:57])

rownames(MF)<-c(0:120)

colnames(MF)<-c(2015:2070)

**1.3.3. Fertility Forecast**

Fertility Data matrix on Excel

Upload the data file, and we name it as (FR) as "Fertility".

FR<-read.table("https://firebasestorage.googleapis.com/v0/b/gitbook-28427.appspot.com/o/assets%2F-M69GC8Q928dOzF6PVLf%2F-M69J6T3gp8XoKgj42Q6%2F-M69JcVOtWsH-GQUmIiu%2FFert_forecast.txt?alt=media&token=cc90ae5b-dcae-49d9-86e6-b9244fd4549d", header=T)

FR<- as.matrix(FR[1:35, 2:57])

rownames(FR)<-c(15:49)

colnames(FR)<-c(2015:2070)

Fertility Data Matrix being uploaded to R

First, we create an ampty matrix to receive the projection results (PopM for males and PopF for females). This matrix should be of a dimension (

`n=121 * t = 56`

).PopM<-matrix(0,nrow=121,ncol=56)

rownames(PopM)<-c(0:120)

colnames(PopM)<-c(2015:2070)

Then,

PopF<-matrix(0,nrow=121,ncol=56)

rownames(PopF)<-c(0:120)

colnames(PopF)<-c(2015:2070)

Creating an empty datamatrix on R

Then, we copy the PM15 into the first row in PopM (PF15 into PopF[,1])

PopM[1:121,1]<-rbind(PM15,as.matrix(rep(0,21),ncol=1,nrow=21))

PopF[1:121,1]<-rbind(PF15,as.matrix(rep(0,21),ncol=1,nrow=21))

PopM<-as.matrix(PopM)

PopF<-as.matrix(PopF)

The projection of population number by age is deduced by a year to year approach by applying the survival probabilities. If we not

$q_{xt}$

to be the probablity to die (Age Specific Mortality Rates) between the ages $x$

and $x+1$

during the year $t$

, the population at age $x+1$

during the year $t + 1$

noted $P_{x+1;t+1}$

is deduced from : $$P_{x+1,t+1}=P_{x,t}*(1-q_{x,t})$$.The population at age

$0$

in the begenning of the year $t$

is estimated by the total of newborns during the previous years, i.e., year $t-1$

, (noted $B_{t-1}$

) on the basis of combining the population (Mid-year population rather than that of the begening of the year) of females at fertility age ($15-49$

years : noted $PFM_{x;t-1}$

with the age specific fertility rates ASFRs (noted $f_{x;t-1}$

). We can write :$$B_{t-1}=\sum_{s=15}^{49} PFM_{s,t-1}*f_{s,t-1}$$

We estimate the Mid-year population (PFM as an average of the populations at the begening and the end of the previous year). It makes:

$$B_{t-1}=\sum_{s=15}^{49} \frac{PF_{s,t-1}+PF_{s,t}}{2}*f_{s,t-1}$$

In order to separate males and females among the newborns, we introduce (define)

$a$

which represents the number of males aong newborn corresponding to $1$

female.a=1.045

The part of males among a cohort

$B$

of newborns is equal to $$B*\frac{a}{1+a}$$ while the number of females is equal to $$B*\frac{1}{1+a}$$.The newborns during the year

$t-1$

are introduced as the population of age $0$

in the begenning of year $t$

.for (i in 2 : 56) {

for (j in 2: 121) {

PopM[j,i]<-PopM[j-1,i-1]*(1-MM[j-1,i-1])

PopF[j,i]<-PopF[j-1,i-1]*(1-MF[j-1,i-1])

}

PopM[1,i]<-as.matrix(t(PopF[16:50,i-1]+PopF[16:50,i])/2)%*%as.matrix(FR[,i-1])*(a/(1+a))

PopF[1,i]<-as.matrix(t(PopF[16:50,i-1]+PopF[16:50,i])/2)%*%as.matrix(FR[,i-1])*(1/(1+a))

}

The projection matrix being filled out

First, we need to call

`ggplot`

library(ggplot2)

A first example

i<-40

# We set 40 just as an example

year1<-as.character(2015+i)

pyram<-cbind.data.frame(seq(0,110,1),-PopM[1:111,i],PopF[1:111,i])

colnames(pyram)<-c("age","males","females")

A<-ggplot(pyram, aes(x=age))

B<-A+ geom_bar(aes(y=males),fill="blue",stat="identity",width=0.75)

C<- B+ geom_bar(aes(y=females),fill="red",stat="identity",width=0.75)

print(C)

Population Pyramid - a first example

by adding

`+ coord_flip()`

i<-40

year1<-as.character(2015+i)

pyram<-cbind.data.frame(seq(0,110,1),-PopM[1:111,i],PopF[1:111,i])

colnames(pyram)<-c("age","males","females")

A<-ggplot(pyram, aes(x=age))

B<-A+ geom_bar(aes(y=males),fill="blue",stat="identity",width=0.75)

C<- B+ geom_bar(aes(y=females),fill="red",stat="identity",width=0.75)

D<- C+coord_flip()

print(D)

Population pyramid being rotated 90°

by adding

`scale_y_continuous(breaks = seq(-600000, 600000, 200000), labels = paste(as.character(c(seq(6, 0, -2),seq(2,6,2))), "x105"), limits=c(-650000,650000))`

It makes

i<-40

year1<-as.character(2015+i)

pyram<-cbind.data.frame(seq(0,110,1),-PopM[1:111,i],PopF[1:111,i])

colnames(pyram)<-c("age","males","females")

A<-ggplot(pyram, aes(x=age))

B<-A+ geom_bar(aes(y=males),fill="blue",stat="identity",width=0.75)

C<- B+ geom_bar(aes(y=females),fill="red",stat="identity",width=0.75)

D<- C+coord_flip()

E<-D+ scale_y_continuous(breaks = seq(-600000, 600000, 200000),

labels = paste(as.character(c(seq(6, 0, -2),seq(2,6,2))), "x105"),

limits=c(-650000,650000))

print(E)

Population pyramid with axes being redefined

by adding

`xlab("Age") + ylab("Population number")`

It makes :

i<-40

year1<-as.character(2015+i)

pyram<-cbind.data.frame(seq(0,110,1),-PopM[1:111,i],PopF[1:111,i])

colnames(pyram)<-c("age","males","females")

A<-ggplot(pyram, aes(x=age))

B<-A+ geom_bar(aes(y=males),fill="blue",stat="identity",width=0.75)

C<- B+ geom_bar(aes(y=females),fill="red",stat="identity",width=0.75)

D<- C+coord_flip()

E<-D+ scale_y_continuous(breaks = seq(-600000, 600000, 200000),

labels = paste(as.character(c(seq(6, 0, -2),seq(2,6,2))), "x105"),

limits=c(-650000,650000))

F<-E+ xlab("Age") + ylab("Population number")

print(F

Population pyramid with axes being labelled

by adding

`ggtitle(paste("Population Pyramid of Algeria:", year1)`

It makes :

i<-40

year1<-as.character(2015+i)

pyram<-cbind.data.frame(seq(0,110,1),-PopM[1:111,i],PopF[1:111,i])

colnames(pyram)<-c("age","males","females")

A<-ggplot(pyram, aes(x=age))

B<-A+ geom_bar(aes(y=males),fill="blue",stat="identity",width=0.75)

C<- B+ geom_bar(aes(y=females),fill="red",stat="identity",width=0.75)

D<- C+coord_flip()

E<-D+ scale_y_continuous(breaks = seq(-600000, 600000, 200000),

labels = paste(as.character(c(seq(6, 0, -2),seq(2,6,2))), "x105"),

limits=c(-650000,650000))

F<-E+ xlab("Age") + ylab("Population number")

G<-F+ ggtitle(paste("Population Pyramid of Algeria:", year1)))

print(G)

Population pyramid with title being added

by adding

`annotate('text', x = 95, y = 250000, label = 'Females', size = 5, colour="red") + annotate('text', x = 95, y = - 250000, label = 'Males', size = 5, colour="blue")`

It makes :

i<-40

year1<-as.character(2015+i)

pyram<-cbind.data.frame(seq(0,110,1),-PopM[1:111,i],PopF[1:111,i])

colnames(pyram)<-c("age","males","females")

A<-ggplot(pyram, aes(x=age))

B<-A+ geom_bar(aes(y=males),fill="blue",stat="identity",width=0.75)

C<- B+ geom_bar(aes(y=females),fill="red",stat="identity",width=0.75)

D<- C+coord_flip()

E<-D+ scale_y_continuous(breaks = seq(-600000, 600000, 200000),

labels = paste(as.character(c(seq(6, 0, -2),seq(2,6,2))), "x105"),

limits=c(-650000,650000))

F<-E+ xlab("Age") + ylab("Population number")

G<-F+ ggtitle(paste("Population Pyramid of Algeria:", year1)))

H<-G+ annotate('text', x = 95, y = 250000, label = 'Females', size = 5,

colour="red") + annotate('text', x = 95, y = - 250000, label = 'Males', size = 5,

colour="blue")

print(H)

Population pyramid with legend/text being added

by adding

`theme(plot.title = element_text(hjust = 0.5,size=12,face="bold"), panel.border = element_rect(colour = "black",fill=NA,size=1), axis.text = element_text(size=12,face="bold"), panel.background = element_rect(fill="white")))`

It makes :

i<-40

year1<-as.character(2015+i)

pyram<-cbind.data.frame(seq(0,110,1),-PopM[1:111,i],PopF[1:111,i])

colnames(pyram)<-c("age","males","females")

A<-ggplot(pyram, aes(x=age))

B<-A+ geom_bar(aes(y=males),fill="blue",stat="identity",width=0.75)

C<- B+ geom_bar(aes(y=females),fill="red",stat="identity",width=0.75)

D<- C+coord_flip()

E<-D+ scale_y_continuous(breaks = seq(-600000, 600000, 200000),

labels = paste(as.character(c(seq(6, 0, -2),seq(2,6,2))), "x105"),

limits=c(-650000,650000))

F<-E+ xlab("Age") + ylab("Population number")

G<-F+ ggtitle(paste("Population Pyramid of Algeria:", year1)))

H<-G+ annotate('text', x = 95, y = 250000, label = 'Females', size = 5,

colour="red")

I<-H")+theme(plot.title = element_text(hjust = 0.5,size=12,face="bold"),

panel.border = element_rect(colour = "black",fill=NA,size=1), axis.text =

element_text(size=12,face="bold"))

print(H)

Other options

In order to change the background color from gray to white we just need to put:

`panel.background = element_rect(fill=NA) into +theme()`

It makes :

i<-40

year1<-as.character(2015+i)

pyram<-cbind.data.frame(seq(0,110,1),-PopM[1:111,i],PopF[1:111,i])

colnames(pyram)<-c("age","males","females")

A<-ggplot(pyram, aes(x=age))

B<-A+ geom_bar(aes(y=males),fill="blue",stat="identity",width=0.75)

C<- B+ geom_bar(aes(y=females),fill="red",stat="identity",width=0.75)

D<- C+coord_flip()

E<-D+ scale_y_continuous(breaks = seq(-600000, 600000, 200000),

labels = paste(as.character(c(seq(6, 0, -2),seq(2,6,2))), "x105"),

limits=c(-650000,650000))

F<-E+ xlab("Age") + ylab("Population number")

G<-F+ ggtitle(paste("Population Pyramid of Algeria:", year1)))

H<-G+ annotate('text', x = 95, y = 250000, label = 'Females', size = 5, colour="red")

I<-H+theme(plot.title = element_text(hjust = 0.5,size=12,face="bold"),

panel.border = element_rect(colour = "black",fill=NA,size=1), axis.text =

element_text(size=12,face="bold"), panel.background = element_rect(fill= NA))

print(H)

Population pyramid with white background

Required packages:

library(animation)

library(dplyr)

library(ggthemes)

The code to run:

saveGIF({

for (i in 1:56)

{

year1<-as.character(2014+i)

pyram<-cbind.data.frame(seq(0,110,1),-PopM[1:111,i],PopF[1:111,i])

colnames(pyram)<-c("age","males","females")

A<-ggplot(pyram,aes(x=age))+geom_bar(aes(y=males),fill="blue",stat=

"identity",width=0.75)+geom_bar(aes(y=females),fill="red",stat="identity",

width=0.75)+ coord_flip()+ scale_y_continuous(breaks= seq(-600000,600000,

200000), labels = paste(as.character(c(seq(6,0,-2),seq(2,6,2))),"x105"),

limits=c(-650000,650000))+xlab("Age")+ylab("Population number")+ggtitle(

paste("Population Pyramid of Algeria:", year1))+annotate('text', x=95, y=

250000,label='Females', size = 5,colour="red")+annotate('text', x=95, y=

250000, label='Females', size = 5, colour="red")+annotate('text', x=95,

y =-250000, label='Males', size = 5, colour="blue")+ theme(plot.title =

element_text(hjust = 0.5, size=12, face="bold"), panel.border =

element_rect(colour = "black", fill=NA,size=1),axis.text = element_text(

size=12, face="bold"))

print(A)

}}

, movie.name = 'pyram.gif', interval = 0.3, ani.width = 1100, ani.height = 820)

After having executed the R-code above, a message about the progression of the GIF creation is posted on the screen:

Gif creating message

Once you get this message, the gif file can be found in the same directory as the R-project location. To open it, you should use any internet browser or to put it on power point full screen.

Dynamic population pyramid

In order to visualize the dynamic pyramid, you can insert the GIF plot into a power point file and to make it on full screen view or you can just open the GIF using any internet browser.

**Converter Error Message**

Sometimes, when trying to run the code to create the GIF, an error message appears and it concerns the converter

`ImageMagick`

. This last is not a part of R, and i is an independent graphical tool which allow the creation of a GIF correctely. What do you need to do in such a case, is to install `ImageMagic`

version 7 or higher because the old versions work with `converter.exe`

which is not adapted to the newest versions of R-studio. The compatible converter is `magick`

. This application can be downloaded from: [www.imagemagick.org/script/download.php].Then, you need to run the following code in R to update the location of the converter being installed:

`ani.options(convert = 'C:/PROGRA`

~`1/ImageMagick-7.0.7-Q8/convert.exe)`

Before to run again the `saveGIF()`

.- Download historical and the projected series of life expectancy at birth for males and females in Algeria from : [http://www.cread.dz/Actuarial_Demography/Donnees_Site/Total_Fertility_Rate.xlsx]
- plot the observed and the projected Total Fertility Rate

- Now, make it dynami

- Download historical and the projected series of life expectancy at birth for males and females in Algeria from : [http://www.cread.dz/Actuarial_Demography/Donnees_ Site/Life_exp_coherent_forecast.xlsx]
- Do it, plot it

- Make it movING

- Flici, F. (2016a). Coherent mortality forecasting for the Algerian population. Presented at Samos Conference in Actuarial Sciences and Finance, Samos, Greece (May).
- Flici, F. (2016b). Projection des taux de fécondité de la population algérienne á l'horizon 2050. MPRA Paper No. 99077, posted 12 Mar 2020. [https://mpra.ub.uni-muenchen.de/99077/1/MPRA_paper_99077.pdf]
- Flici, F. (2017). Longevity and pension plan sustainability in Algerie: Taking the re- tirees mortality experience into account. Doctoral dissertation, Higher National School of Statistics and Applied Economics (ENSSEA), Kolea, Algeria.
- Hyndman, R. J., Booth, H., & Yasmeen, F. (2013). Coherent mortality forecasting: the product-ratio method with functional time series models. Demography, 50 (1), 261-283.
- Lee, R. D. (1993). Modeling and forecasting the time series of US fertility: Age distribution, range, and ultimate level. International Journal of Forecasting, 9(2), 187-202.

Last modified 1yr ago