Insurance regulation relies on the conditional tail expectation (CTE) of a loss random variable for specifying required capital as well as for the valuation of liabilities. Hence, understanding statistical inference for the CTE measure is an important aspect of actuarial education.
A key property of the non-parametric estimator of the CTE is its asymptotic normality; see Manistre and Hancock (2005)[1] and Ahn and Shyamalkumar (2011)[2].
Traditional proofs of this result rely on various forms of the functional delta method[1], which places them outside the scope of even masters-level actuarial education.
In this poster, we provide an intuitive proof of this result that relies instead on the ordinary delta method, making it accessible to most actuaries and actuarial students.
For a given "nice" distribution \(F\), the \(\alpha\)-th level percentile (VaR), \(q_\alpha\), is defined by
\[ Pr\left(X>q_{\alpha}\right)=1-\alpha, \]
where \(X \sim F\); the \(\alpha\)-th level CTE is defined by
\[ CTE_{\alpha}=\mathbb{E}\left[X|X>q_{\alpha}\right]. \]
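As a concrete illustration (ours, not one of the poster's exhibits), for a unit exponential loss both quantities are available in closed form; the short base-R sketch below checks the closed forms against numerical integration.

# Illustration (ours): VaR and CTE for X ~ Exp(1) at alpha = 0.95.
# Memorylessness gives E[X | X > q] = q + 1, a fact also used in the
# figure code later in this poster.
alpha <- 0.95
q_alpha <- qexp(alpha)        # VaR: -log(1 - alpha), about 2.996
cte_closed <- q_alpha + 1     # CTE in closed form, about 3.996
# Cross-check by numerically integrating E[X | X > q_alpha]
cte_numeric <- integrate(function(x) x * dexp(x),
                         lower = q_alpha, upper = Inf)$value / (1 - alpha)
c(VaR = q_alpha, CTE = cte_closed, CTE_check = cte_numeric)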
In practice, since \(F\) is unknown, an estimate of the CTE is required for regulatory and risk management purposes. Typically, a random sample \(X_1, ..., X_n\) from \(F\) is available for this purpose. The commonly used estimator of the CTE is its empirical counterpart,
\[ CTE_{n:\alpha}=\frac{1}{n-\left\lfloor n\alpha\right\rfloor }\sum_{i=\left\lfloor n\alpha\right\rfloor +1}^{n}Y_{i} \]
where
\[ Y_1 \leq Y_2 \leq \cdots \leq Y_n \]
are the sample order statistics. The estimation error as well as confidence intervals for the CTE rely on the following asymptotic result:
\[\begin{equation} \sqrt{n}\left(CTE_{n:\alpha}-CTE_{\alpha}\right) \overset{d}{\rightarrow} N\left(0,\sigma_\alpha^2\right), \tag{1} \label{eqn:mainthm} \end{equation}\]
where \(\sigma_\alpha^2=\eta_\alpha^2+ \gamma_\alpha^2\) with \[ \eta_\alpha^2=\frac{Var\left(X|X>q_{\alpha}\right)}{1-\alpha}, \]
and
\[ \gamma_\alpha^2=\frac{\alpha}{1-\alpha}\left(\mathbb{E}\left(X|X>q_{\alpha}\right)-q_{\alpha}\right)^{2}. \]
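To see \(\eqref{eqn:mainthm}\) in action, here is a minimal sketch (ours, for illustration; the variable names are ours) of the empirical estimator together with the plug-in standard error and confidence interval that the result suggests.

# A minimal sketch (ours): the empirical CTE and a plug-in asymptotic
# standard error based on (1), for Exp(1) losses.
set.seed(1)
alpha <- 0.95
n <- 10000
y <- sort(rexp(n))                       # sorted sample: order statistics
tail_obs <- y[(floor(n * alpha) + 1):n]  # the n - floor(n * alpha) largest
cte_hat <- mean(tail_obs)                # empirical CTE

# Plug-in estimates of eta^2 and gamma^2 from (1)
q_hat <- y[floor(n * alpha)]             # sample VaR
eta2_hat <- var(tail_obs) / (1 - alpha)
gamma2_hat <- (alpha / (1 - alpha)) * (cte_hat - q_hat)^2
se_hat <- sqrt((eta2_hat + gamma2_hat) / n)

# Approximate 95% confidence interval; the true CTE is qexp(0.95) + 1 ~= 3.996
c(estimate = cte_hat, lower = cte_hat - 1.96 * se_hat,
  upper = cte_hat + 1.96 * se_hat)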
The lack of an accessible proof prevents most actuaries from gaining full insight into it. Our proof works by repairing a seemingly straightforward but paradoxical argument, and thus not only lends insight into this pivotal result in actuarial practice but, more importantly, fortifies actuaries' intuition for similar asymptotic results.
In the following, \(N_{n:x}\) denotes the number of \(X_i\)s larger than \(x\), and \(w(x)\) denotes \(\mathbb{E}\left(X|X>x\right)\). Note that the \(N_{n:x}\) observations larger than \(x\) form a random sample from \(F\) left-truncated at \(x\), and their (partially) normalized mean is given by
\[ \sqrt{N_{n:x}}\left(\frac{1}{N_{n:x}}\sum_{i=1}^{n}X_{i}I_{X_i>x}- w(x)\right). \tag{2} \label{paradox} \]
Now, since \(N_{n:x}\) approaches infinity with \(n\), the ordinary CLT implies that the expression in \(\eqref{paradox}\) converges in distribution to \(N\left(0,Var\left(X|X>x\right)\right)\). Moreover, since \(Y_{\lfloor n\alpha\rfloor}\) is consistent for \(q_\alpha\), and \[ \frac{1}{N_{n:Y_{\lfloor n\alpha\rfloor}}}\sum_{i=1}^{n}X_{i}I_{X_i>Y_{\left\lfloor n\alpha\right\rfloor}}=CTE_{n:\alpha}, \] it is tempting to argue, albeit erroneously, that \[ \sqrt{n}\left(CTE_{n:\alpha}-CTE_{\alpha}\right) \overset{d}{\rightarrow} N\left(0,\eta_\alpha^2\right). \] So what is wrong with the above argument, and how can we fix it to yield the correct conclusion \(\eqref{eqn:mainthm}\)? We do this below.
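To see numerically that the tempting conclusion understates the variability, consider a small Monte Carlo check (ours, for illustration) with Exp(1) losses and \(\alpha=0.95\): there \(\eta_\alpha^2=1/0.05=20\) while \(\sigma_\alpha^2=20+19=39\), and the simulated variance sides with \(\sigma_\alpha^2\).

# Monte Carlo check (ours): for Exp(1) and alpha = 0.95, the variance of
# sqrt(n) * (CTE_{n:alpha} - CTE_alpha) is near sigma^2 = 39, not eta^2 = 20
set.seed(1)
alpha <- 0.95
n <- 5000
cte_true <- qexp(alpha) + 1     # memorylessness: E[X | X > q] = q + 1
z <- replicate(2000, {
  y <- sort(rexp(n))
  cte_n <- mean(y[(floor(n * alpha) + 1):n])
  sqrt(n) * (cte_n - cte_true)
})
var(z)                          # close to 39, the value of sigma_alpha^2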
In the above, we argued that the expression in \(\eqref{paradox}\) has an asymptotic normal limit for any \(x\) (in the support of \(F\)). Under mild conditions, this convergence is uniform in \(x\), courtesy of the Berry-Esseen Theorem. In other words, we can replace \(x\) in \(\eqref{paradox}\) by \(x_n\), with \(\{x_n\}_{n\geq1}\) satisfying \(\lim x_n=x\), and yet maintain the same normal limit.
Towards replacing \(x\) by \(Y_{\lfloor n\alpha\rfloor}\), we note that \(N_{n:Y_{\lfloor n\alpha\rfloor}}=n-\lfloor n \alpha \rfloor\), and that these \(N_{n:Y_{\lfloor n\alpha\rfloor}}\) observations, conditioned on \(Y_{\lfloor n\alpha\rfloor}\), form a random sample from \(F\) left-truncated at \(Y_{\lfloor n\alpha\rfloor}\). Combining these observations with the uniform convergence results in
\[ \sqrt{n}\left(CTE_{n:\alpha}-w(Y_{\lfloor n\alpha\rfloor})\right) \overset{d}{\rightarrow} N\left(0,\eta_\alpha^2\right). \tag{3} \label{BEeqn} \]
Figure 1 demonstrates the uniformity in the convergence alluded to above via simulation.
R code for Figure 1
library(ggplot2)
library(RColorBrewer)
library(shinyWidgets)
library(shiny)

### Figure 1
my_color <- brewer.pal(5, "Accent")

# Prepare empty data frame for plotting
dataset1 <- data.frame(z_nx = double(), sample_size = integer())

# Build the data set: for each n, simulate 3000 samples of Exp(1) losses
# and compute the normalized truncated mean (2) at the fixed thresholds
# q_0.94 and q_0.96, and at the random threshold Y_{floor(0.95 n)}
for (n in c(100, 200, 500, 1000, 5000)) {
  x <- matrix(rexp(n * 3000), nrow = 3000)
  x <- apply(x, 1, sort)  # each column is now one sorted sample of size n
  f <- function(k, q_a) { k[which(k > q_a)] }  # observations above threshold
  for (alpha in c(0.94, 0.95, 0.96)) {
    if (alpha == 0.95) {
      # The n - floor(0.95 n) observations above the sample 95% quantile,
      # matching the estimator CTE_{n:alpha} in the text
      a <- colMeans(x[(floor(n * 0.95) + 1):n, ])
      dataset1 <- rbind.data.frame(
        dataset1,
        data.frame(z_nx = a,
                   k = n - floor(n * 0.95),
                   alpha_value = "sample q_0.95",
                   # w(y) = y + 1 for Exp(1), by memorylessness
                   exp_q = x[floor(n * 0.95), ] + 1,
                   sample_size = n))
    } else {
      result <- apply(x, 2, f, q_a = qexp(alpha))
      k <- unlist(lapply(result, length))
      a <- unlist(lapply(result, mean))
      a <- ifelse(is.nan(a), 0, a)  # guard against empty truncated samples
      dataset1 <- rbind.data.frame(
        dataset1,
        data.frame(z_nx = a,
                   k = k,
                   alpha_value = paste("q_", alpha),
                   exp_q = qexp(alpha) + 1,
                   sample_size = n))
    }
  }
}

ui <- pageWithSidebar(
  headerPanel('Uniformity a la Berry-Esseen Theorem'),
  sidebarPanel(
    sliderTextInput(inputId = "sampleSize1",
                    label = "Sample size n:",
                    animate = TRUE,
                    grid = TRUE,
                    choices = c(100, 200, 500, 1000, 5000))
  ),
  mainPanel(
    plotOutput('plot1')
  )
)

server <- function(input, output, session) {
  dataset1_f <- reactive({
    dataset1[dataset1$sample_size == input$sampleSize1, ]
  })
  output$plot1 <- renderPlot({
    p <- ggplot(dataset1_f()) +
      # Estimated densities of the normalized truncated means
      geom_density(mapping = aes(x = sqrt(k) * (z_nx - exp_q),
                                 color = factor(alpha_value)),
                   size = 1) +
      # Overlay the standard normal density for reference
      stat_function(fun = dnorm,
                    args = list(mean = 0, sd = 1),
                    size = 0.7,
                    color = "black") +
      geom_hline(yintercept = 0, size = 1) +
      theme(legend.position = "bottom", legend.box = "horizontal",
            legend.text = element_text(size = 14)) +
      xlim(-3, 3) +
      ylim(0, 0.5) +
      labs(y = "Estimated density",
           x = "",
           color = "Threshold values: ") +
      # Factor levels sort as q_0.94, q_0.96, sample quantile; label
      # accordingly (the sample quantile is at level 0.95)
      scale_colour_manual(labels = expression(q[0.94], q[0.96],
                                              hat(q)[0.95]),
                          values = my_color[c(5, 1, 3)])
    print(p)
  }, height = 300)
}

shinyApp(ui, server)
Towards establishing \(\eqref{eqn:mainthm}\) from \(\eqref{BEeqn}\), we note that
\[ \begin{aligned} \sqrt{n}\left(CTE_{n:\alpha}-CTE_{\alpha}\right) &=\sqrt{n}\left(CTE_{n:\alpha}-w(Y_{\lfloor n\alpha\rfloor})\right)\\ &\phantom{=}+ \sqrt{n} \left(w(Y_{\lfloor n\alpha\rfloor }) - \mathbb{E}\left(X|X>q_{\alpha}\right)\right). \end{aligned} \tag{4} \label{decom} \]
The two terms on the right in \(\eqref{decom}\) are asymptotically independent. The reason is that the first term is asymptotically independent of \(Y_{\lfloor n\alpha\rfloor}\): asymptotically, it depends on the threshold only through its almost sure limit \(q_\alpha\). This is illustrated in Figure 2.
R code for Figure 2
library(ggplot2)
library(RColorBrewer)
library(shinyWidgets)
library(shiny)

### Figure 2
my_color <- brewer.pal(5, "Accent")

# Prepare empty data frame for plotting
dataset2 <- data.frame(z_nx = double(), sample_size = integer())

# Build the data set: the two (standardized) terms of the decomposition (4)
for (n in c(100, 200, 500, 1000, 5000)) {
  x <- matrix(rexp(n * 3000), nrow = 3000)
  x <- apply(x, 1, sort)  # each column is one sorted sample of size n
  # First term: sqrt(n) * (CTE_{n:alpha} - w(Y)), with w(y) = y + 1 for Exp(1)
  f_term <- sqrt(n) * (colMeans(x[(floor(n * 0.95) + 1):n, ]) -
                         (x[floor(n * 0.95), ] + 1))
  # Second term: sqrt(n) * (w(Y) - w(q_alpha))
  s_term <- sqrt(n) * ((x[floor(n * 0.95), ] + 1) - (qexp(0.95) + 1))
  dataset2 <- rbind.data.frame(
    dataset2,
    # Scale each term to unit asymptotic variance:
    # eta^2 = 1 / 0.05 and gamma^2 = 0.95 / 0.05 for Exp(1), alpha = 0.95
    data.frame(first_t = f_term * sqrt(0.05),
               second_t = s_term * sqrt(0.05 / 0.95),
               sample_size = n))
}

ui <- pageWithSidebar(
  headerPanel('Asymptotic Indep. of the Two Terms'),
  sidebarPanel(
    sliderTextInput(inputId = "sampleSize2",
                    label = "Sample size n:",
                    animate = TRUE,
                    grid = TRUE,
                    choices = c(100, 200, 500, 1000, 5000))
  ),
  mainPanel(
    plotOutput('plot2')
  )
)

server <- function(input, output, session) {
  dataset2_f <- reactive({
    dataset2[dataset2$sample_size == input$sampleSize2, ]
  })
  output$plot2 <- renderPlot({
    q <- ggplot(data = dataset2_f(),
                aes(x = first_t, y = second_t)) +
      theme_light() +
      # Scatter of the two terms with an estimated joint density overlay
      geom_point(fill = "lightgray",
                 size = 0.5,
                 alpha = 0.2) +
      stat_density_2d(aes(fill = after_stat(level)), geom = "polygon",
                      colour = "white",
                      alpha = 0.7) +
      scale_fill_distiller(palette = "Spectral", direction = -1) +
      xlim(-4, 4) +
      ylim(-4, 4) +
      labs(y = "The second term",
           x = "The first term") +
      theme(strip.text = element_text(size = 15, color = "black")) +
      theme(legend.position = "bottom")
    print(q)
  }, height = 300)
}

shinyApp(ui, server)
Hence, all that remains is to derive the asymptotic distribution of the second term in \(\eqref{decom}\). That this limiting distribution is non-degenerate, in other words that this term cannot be ignored, resolves the paradox.
The asymptotic distribution of the second term derives from that of \(Y_{\lfloor n\alpha \rfloor}\):
\[ \sqrt{n}\left(Y_{\left\lfloor n\alpha\right\rfloor }-q_{\alpha}\right)\overset{d}{\rightarrow}N\left(0,\frac{\alpha\left(1-\alpha\right)}{f^{2}\left(q_{\alpha}\right)}\right), \]
where \(f\) is the density of \(X\). This is so because the second term is a smooth function of \(Y_{\lfloor n\alpha\rfloor}\); the function \(w(\cdot)\) satisfies
\[ w\left(x\right)=x+\frac{\int_{x}^{\infty}S\left(z\right)dz}{S\left(x\right)}, \]
where \(S\) is the survival function of \(X\). By the ordinary delta method[3], we now have \[ \sqrt{n} \left(w(Y_{\lfloor n\alpha\rfloor}) - \mathbb{E}\left(X|X>q_{\alpha}\right)\right) \overset{d}{\rightarrow} N\left(0,\gamma_\alpha^2\right). \]
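Filling in the delta-method step: since \(\int_{x}^{\infty}S(z)dz\) has derivative \(-S(x)\), differentiating \(w\) gives
\[ w'(x)=1+\frac{-S(x)^{2}+f(x)\int_{x}^{\infty}S(z)dz}{S(x)^{2}}=\frac{f(x)\left(w(x)-x\right)}{S(x)}, \]
and so, using \(S(q_{\alpha})=1-\alpha\),
\[ \gamma_\alpha^2=\left(w'(q_{\alpha})\right)^{2}\cdot\frac{\alpha\left(1-\alpha\right)}{f^{2}\left(q_{\alpha}\right)}=\frac{\alpha}{1-\alpha}\left(w(q_{\alpha})-q_{\alpha}\right)^{2}. \]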
The independence of the two terms in \(\eqref{decom}\) and their asymptotic normality together yield \(\eqref{eqn:mainthm}\). For the traditional approach in our setting, see [1]; in the setting of importance sampling, see [2]. Figure 3 compares, via simulation, the relative contributions of the two terms under a light-tailed (exponential) and a heavy-tailed (Pareto) loss distribution.
R code for Figure 3
library(ggplot2)
library(RColorBrewer)
library(shinyWidgets)
library(shiny)
library(dplyr)
library(tidyr)

### Figure 3
my_color <- brewer.pal(5, "Accent")

# Prepare empty data frame for plotting
dataset3 <- data.frame(z_nx = double(), sample_size = integer())

# Build the data set at alpha = 0.95, n = 1000
n <- 1000

# Light tail: Exp(1), for which w(y) = y + 1
x <- matrix(rexp(n * 10000), nrow = 10000)
x <- apply(x, 1, sort)
f_term <- sqrt(n) * (colMeans(x[(floor(n * 0.95) + 1):n, ]) -
                       (x[floor(n * 0.95), ] + 1))
s_term <- sqrt(n) * ((x[floor(n * 0.95), ] + 1) - (qexp(0.95) + 1))
dataset3 <- rbind.data.frame(
  dataset3,
  data.frame(first_t = f_term,
             second_t = s_term,
             dist = "Exp. dist.",
             sample_size = n))

# Heavy tail: Pareto (Lomax) quantile; theta is chosen so that the
# Pareto 95% VaR matches qexp(0.95)
qpareto <- function(p, theta = 2.687376, alpha = 4) {
  theta * ((1 - p)^(-1 / alpha) - 1)
}
# w(x) = E(X | X > x) for the Pareto distribution
cte_pareto <- function(x, theta = 2.687376, alpha = 4) {
  x + (x + theta) / (alpha - 1)
}
x <- matrix(qpareto(runif(n * 10000)), nrow = 10000)
x <- apply(x, 1, sort)
f_term <- sqrt(n) * (colMeans(x[(floor(n * 0.95) + 1):n, ]) -
                       cte_pareto(x[floor(n * 0.95), ]))
s_term <- sqrt(n) * (cte_pareto(x[floor(n * 0.95), ]) -
                       cte_pareto(qpareto(0.95)))
dataset3 <- rbind.data.frame(
  dataset3,
  data.frame(first_t = f_term,
             second_t = s_term,
             dist = "Pareto dist.",
             sample_size = n))

# Long format: first term, second term, and their sum ("Conv.")
dataset_orig <- dataset3 %>% mutate(total = first_t + second_t)
dataset_orig <- select(dataset_orig, first_t, second_t, total, dist)
dataset3 <- dataset_orig %>% gather('first_t', 'second_t', 'total',
                                    key = "terms", value = "values")

ui <- pageWithSidebar(
  headerPanel('Tale of the Tails: Impact on the Terms'),
  sidebarPanel(
    # Input: select the distribution type
    radioButtons("dist", "Distribution type:",
                 c("Exponential (Light tail)" = "Exp. dist.",
                   "Pareto (Heavy tail)" = "Pareto dist."))
  ),
  mainPanel(
    plotOutput('plot3')
  )
)

server <- function(input, output, session) {
  dataset3_f <- reactive({
    dataset3[dataset3$dist == input$dist, ]
  })
  dat_text1 <- reactive({
    data.frame(label = paste(
      "Variance \n 1st: ",
      round(var(dataset_orig$first_t[dataset_orig$dist == input$dist]), 2),
      "\n 2nd: ",
      round(var(dataset_orig$second_t[dataset_orig$dist == input$dist]), 2),
      "\n Conv.: ",
      round(var(dataset_orig$total[dataset_orig$dist == input$dist]), 2)))
  })
  output$plot3 <- renderPlot({
    r <- ggplot(data = dataset3_f()) +
      theme_light() +
      # Estimated densities of the two terms and their sum
      geom_density(mapping = aes(x = values, color = terms),
                   size = 1) +
      geom_hline(yintercept = 0, size = 1) +
      theme(legend.position = "bottom", legend.box = "horizontal",
            legend.text = element_text(size = 14)) +
      scale_colour_manual(labels = expression(paste(1^"st", " term"),
                                              paste(2^"nd", " term"),
                                              "Conv."),
                          values = my_color[c(1, 3, 5)]) +
      labs(y = "Estimated density",
           x = "",
           color = "") +
      xlim(-35, 35) +
      ylim(0, 0.1) +
      theme(strip.text = element_text(size = 19, color = "black")) +
      # Annotate with the simulated variances of each term and of their sum
      geom_text(data = dat_text1(),
                mapping = aes(x = 24, y = 0.08, label = label))
    print(r)
  }, height = 300)
}

shinyApp(ui, server)
While we appeal to the Berry-Esseen Theorem, which isn't covered at the MS level, its statement as a guarantee of uniform convergence to normality is easily understood.
Its use does require a finite third moment; while we believe that the line of argument can be executed under finite variance alone, that is beside the central focus of our efforts.
This approach was successful with a cohort of seniors and MS students last spring, providing the motivation for this poster.
References
[1] Manistre, B. J., & Hancock, G. H. (2005). Variance of the CTE estimator. North American Actuarial Journal, 9(2), 129-156.
[2] Ahn, J. Y., & Shyamalkumar, N. D. (2011). Large sample behavior of the CTE and VaR estimators under importance sampling. North American Actuarial Journal, 15(3), 393-416.
[3] Klugman, S. A., Panjer, H. H., & Willmot, G. E. (2012). Loss models: From data to decisions (Vol. 715). John Wiley & Sons.