Asymptotic Normality for the Sample CTE Revisited

Publication
Actuarial Research Conference 2018

Insurance regulation relies on the conditional tail expectation (CTE) of a loss random variable for specifying required capital as well as for valuation of liabilities. Hence, understanding statistical inference for the CTE is an important aspect of actuarial education.

  • A key property of the non-parametric estimator of the CTE is its asymptotic normality; see Manistre and Hancock (2005) [1] and Ahn and Shyamalkumar (2011) [2].

  • Traditional proofs of this result rely on various forms of the functional delta method [3], which puts the proof outside the scope of even master's-level actuarial education.

  • In this poster, we provide an intuitive proof of this result, which by relying instead on the ordinary delta method makes it accessible to most actuaries and actuarial students.

Introduction

For a given "nice" distribution \(F\), the \(\alpha\)-th level percentile (VaR), \(q_\alpha\), is defined by

\[ Pr\left(X>q_{\alpha}\right)=1-\alpha, \]

where \(X \sim F\); the \(\alpha\)-th level CTE is defined by

\[ CTE_{\alpha}=\mathbb{E}\left[X|X>q_{\alpha}\right]. \]

In practice, since \(F\) is unknown, an estimate of the CTE is required for regulatory and risk management purposes. Typically, a random sample \(X_1, ..., X_n\) from \(F\) is available for this purpose. The commonly used estimator of the CTE is its empirical counterpart,

\[ CTE_{n:\alpha}=\frac{1}{n-\left\lfloor n\alpha\right\rfloor }\sum_{i=\left\lfloor n\alpha\right\rfloor +1}^{n}Y_{i} \]

where

\[ Y_1 \leq Y_2 \leq \cdots \leq Y_n \]

are the sample order statistics. The estimation error as well as the confidence interval for CTE rely on the following asymptotic result:

\[\begin{equation} \sqrt{n}\left(CTE_{n:\alpha}-CTE_{\alpha}\right) \overset{d}{\rightarrow} N\left(0,\sigma_\alpha^2\right), \tag{1} \label{eqn:mainthm} \end{equation}\]

where \(\sigma_\alpha^2=\eta_\alpha^2+ \gamma_\alpha^2\) with \[ \eta_\alpha^2=\frac{Var\left(X|X>q_{\alpha}\right)}{1-\alpha}, \]

and

\[ \gamma_\alpha^2=\frac{\alpha}{1-\alpha}\left(\mathbb{E}\left(X|X>q_{\alpha}\right)-q_{\alpha}\right)^{2}. \]
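As a rough illustration of how \(\eqref{eqn:mainthm}\) is used in practice, the R sketch below computes the sample CTE together with a plug-in confidence interval. The function names `sample_cte` and `cte_sigma2` are hypothetical, and the plug-in variance estimate (sample tail variance and \(Y_{\lfloor n\alpha\rfloor}\) in place of their population counterparts) is an assumption of this sketch, not something prescribed above.

```r
# Sample CTE: mean of the n - floor(n * alpha) largest observations.
sample_cte <- function(x, alpha) {
  n <- length(x)
  k <- floor(n * alpha)
  y <- sort(x)
  mean(y[(k + 1):n])
}

# Plug-in estimate of sigma_alpha^2 = eta_alpha^2 + gamma_alpha^2
# (assumption: population quantities replaced by sample analogues).
cte_sigma2 <- function(x, alpha) {
  n <- length(x)
  k <- floor(n * alpha)
  y <- sort(x)
  tail_obs <- y[(k + 1):n]
  eta2 <- var(tail_obs) / (1 - alpha)
  gamma2 <- alpha / (1 - alpha) * (mean(tail_obs) - y[k])^2
  eta2 + gamma2
}

set.seed(1)
x <- rexp(10000)   # unit exponential losses, for which CTE_alpha = q_alpha + 1
alpha <- 0.95
est <- sample_cte(x, alpha)
se <- sqrt(cte_sigma2(x, alpha) / length(x))
ci <- est + c(-1, 1) * qnorm(0.975) * se   # approximate 95% interval
print(c(estimate = est, lower = ci[1], upper = ci[2], truth = qexp(alpha) + 1))
```

The interval is only as good as the normal approximation, so it should be read with caution for small samples or very high levels \(\alpha\).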

The lack of an accessible proof keeps most actuaries from gaining full insight into this result. Our proof essentially repairs a straightforward but paradoxical argument; in doing so it lends insight into this pivotal result of actuarial practice and, more importantly, strengthens actuaries' intuition for similar asymptotic results.

A Paradox

In the following, \(N_{n:x}\) denotes the number of \(X_i\)s larger than \(x\), and \(w(x)\) denotes \(\mathbb{E}\left(X|X>x\right)\). Note that, conditional on \(N_{n:x}\), the \(N_{n:x}\) observations larger than \(x\) form a random sample from \(F\) left-truncated at \(x\), and their partially normalized mean is given by

\[ \sqrt{N_{n:x}}\left(\frac{1}{N_{n:x}}\sum_{i=1}^{n}X_{i}I_{\{X_i>x\}}- w(x)\right). \tag{2} \label{paradox} \]

Now since \(N_{n:x}\) approaches infinity with \(n\), the ordinary CLT implies that the expression in \(\eqref{paradox}\) converges in distribution to \(N\left(0,Var\left(X|X>x\right)\right)\). Moreover, since \(Y_{\lfloor n\alpha\rfloor}\) is consistent for \(q_\alpha\), \(N_{n:Y_{\lfloor n\alpha\rfloor}}/n\) converges to \(1-\alpha\), and

\[ \frac{1}{N_{n:Y_{\lfloor n\alpha\rfloor}}}\sum_{i=1}^{n}X_{i}I_{\{X_i>Y_{\lfloor n\alpha\rfloor}\}}=CTE_{n:\alpha}, \]

it is tempting to argue, albeit erroneously, that

\[ \sqrt{n}\left(CTE_{n:\alpha}-CTE_{\alpha}\right) \overset{d}{\rightarrow} N\left(0,\eta_\alpha^2\right). \]

So what is wrong with the above argument, and how can we fix it to yield the correct conclusion \(\eqref{eqn:mainthm}\)? We address both questions below.
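The gap between the tempting limit and the correct one is easy to see numerically. The following is a minimal simulation sketch, assuming unit exponential losses (for which \(CTE_\alpha=q_\alpha+1\) and \(Var(X|X>q_\alpha)=1\)); the sample size, number of replications and seed are arbitrary choices made for this illustration.

```r
set.seed(2018)
n <- 1000        # sample size (arbitrary choice)
alpha <- 0.95
reps <- 5000     # number of simulated samples (arbitrary choice)
k <- floor(n * alpha)
cte_true <- qexp(alpha) + 1   # CTE of the unit exponential

# Normalized estimation error sqrt(n) * (CTE_{n:alpha} - CTE_alpha)
z <- replicate(reps, {
  y <- sort(rexp(n))
  sqrt(n) * (mean(y[(k + 1):n]) - cte_true)
})

eta2 <- 1 / (1 - alpha)         # Var(X | X > q) / (1 - alpha), equal to 20 here
gamma2 <- alpha / (1 - alpha)   # alpha / (1 - alpha) * (CTE - q)^2, equal to 19 here
print(c(simulated = var(z), tempting = eta2, correct = eta2 + gamma2))
```

Under these assumptions the simulated variance should land much closer to \(\eta_\alpha^2+\gamma_\alpha^2\) than to \(\eta_\alpha^2\) alone.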

Replacing \(x\) by \(Y_{\lfloor n\alpha\rfloor}\) is a Red Herring

In the above, we argued that the expression in \(\eqref{paradox}\) has an asymptotic normal limit for any \(x\) (in the support of \(F\)). Under mild conditions this convergence is uniform, a consequence of the Berry-Esseen theorem. In other words, we can replace \(x\) in \(\eqref{paradox}\) by \(x_n\), with \(\{x_n\}_{n\geq1}\) satisfying \(\lim_{n\to\infty} x_n=x\), and yet maintain the same normal limit.

Towards replacing \(x\) by \(Y_{\lfloor n\alpha\rfloor}\), we note that \(N_{n:Y_{\lfloor n\alpha\rfloor}}=n-\lfloor n \alpha \rfloor\) and that these \(N_{n:Y_{\lfloor n\alpha\rfloor}}\) observations, conditioned on \(Y_{\lfloor n\alpha\rfloor}\), form a random sample from \(F\) left-truncated at \(Y_{\lfloor n\alpha\rfloor}\). Combining these observations with the uniform convergence yields

\[ \sqrt{n}\left(CTE_{n:\alpha}-w(Y_{\lfloor n\alpha\rfloor})\right) \overset{d}{\rightarrow} N\left(0,\eta_\alpha^2\right). \tag{3} \label{BEeqn} \]

Figure 1 demonstrates the uniformity in the convergence alluded to above via simulation.

Figure 1

R code for Figure 1

library(ggplot2)
library(RColorBrewer)
library(shinyWidgets)
library(shiny)

### Figure1

my_color <- brewer.pal(5, "Accent")

# Prepare empty dataframe for plotting
dataset1 <- data.frame(z_nx = double(), sample_size = integer())
# Making data set

for (n in c(100, 200, 500, 1000, 5000)){
  x <- matrix(rexp(n * 3000), nrow= 3000);
  x <- apply(x,1,sort);
  
  f <- function(k, q_a){ k[which(k > q_a)] }
  
  for (alpha in c(0.94, 0.95, 0.96)){
    if (alpha == 0.95){
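      # Average the n - floor(n * 0.95) observations above the sample quantile,
      # matching the definition of CTE_{n:alpha}
      a <- colMeans(x[(floor(n * 0.95) + 1):n, ])
      
      dataset1 <- rbind.data.frame(dataset1,
                                  data.frame(z_nx = a,
                                             k = n - floor(n * 0.95),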
                                             alpha_value = "sample q_0.95",
                                             exp_q = x[floor(n * 0.95), ] + 1,
                                             sample_size = n ))
      
    } else {
      result <- apply(x, 2, f, q_a = qexp(alpha))
      k <- unlist(lapply(result,length))
      a <- unlist(lapply(result, mean))
      a <- ifelse(is.nan(a), 0, a)
      dataset1 <- rbind.data.frame(dataset1,
                                  data.frame(z_nx = a,
                                             k = k,
                                             alpha_value = paste("q_", alpha),
                                             exp_q = qexp(alpha) + 1,
                                             sample_size = n ))
    }
    
  }
}



ui <- pageWithSidebar(
  headerPanel('Uniformity a la Berry-Esseen Theorem'),
  sidebarPanel(
    sliderTextInput(inputId = "sampleSize1", 
                    label = "Sample size n:",
                    animate = TRUE,
                    grid = TRUE,
                    choices = c(100, 200, 500, 1000, 5000))
    
  ),
  mainPanel(
    plotOutput('plot1')
  )
)


server <- function(input, output, session) {
  
  dataset1_f <- reactive({
    dataset1[dataset1$sample_size == input$sampleSize1,]
  })
  
  output$plot1 <- renderPlot({
    
    p <- ggplot(dataset1_f()) +
      geom_density(mapping = aes(x =  sqrt(k) *(z_nx - exp_q),
                                 color = factor(alpha_value)),
                   size = 1) +
      # Add pdf of the standard Normal density
      stat_function(fun = dnorm,
                    args = list(mean = 0, sd = 1),
                    size = 0.7,
                    color = "black") +
      geom_hline(yintercept = 0, size = 1) +
      theme(legend.position="bottom", legend.box = "horizontal",
            legend.text = element_text(size = 14)) +
      xlim(-3, 3) +
      ylim(0, 0.5) +
      labs(
        y = "Estimated density",
        x = "" ,
        color = "Threshold values:  ")+
      # Factor levels sort alphabetically: "q_ 0.94", "q_ 0.96", "sample q_0.95"
      scale_colour_manual(labels = expression(q[0.94], q[0.96],
                                              hat(q)[0.95]),
                          values = my_color[c(5, 3, 1)])

    print(p)
    
    
  }, height=300)
  
}

shinyApp(ui, server)

Resolving the Paradox

Towards establishing \(\eqref{eqn:mainthm}\) from \(\eqref{BEeqn}\) we note that

\[ \begin{aligned} \sqrt{n}\left(CTE_{n:\alpha}-CTE_{\alpha}\right) &=\sqrt{n}\left(CTE_{n:\alpha}-w(Y_{\lfloor n\alpha\rfloor})\right)\\ &\phantom{=}+ \sqrt{n} \left(w(Y_{\lfloor n\alpha\rfloor }) - \mathbb{E}\left(X|X>q_{\alpha}\right)\right). \end{aligned} \tag{4} \label{decom} \]

The two terms on the right in \(\eqref{decom}\) are asymptotically independent. The second term is a function of \(Y_{\lfloor n\alpha\rfloor}\) alone, whereas the first term is asymptotically independent of \(Y_{\lfloor n\alpha\rfloor}\): its limiting distribution depends on \(Y_{\lfloor n\alpha\rfloor}\) only through the almost sure limit \(q_\alpha\). This is illustrated in Figure 2.

Figure 2

R code for Figure 2

library(ggplot2)
library(RColorBrewer)
library(shinyWidgets)
library(shiny)


### Figure2

my_color <- brewer.pal(5, "Accent")

# Prepare empty dataframe for plotting
dataset2 <- data.frame(z_nx = double(), sample_size = integer())

# Making data set
for (n in c(100, 200, 500, 1000, 5000)){
  x <- matrix(rexp(n * 3000), nrow= 3000);
  x <- apply(x,1,sort);
  
  # First term uses the n - floor(n * 0.95) observations above the sample quantile
  f_term <- sqrt(n) * (colMeans(x[(floor(n * 0.95) + 1):n, ]) - (x[floor(n * 0.95), ] + 1))
  s_term <- sqrt(n) * ((x[floor(n * 0.95), ] + 1) - (qexp(0.95) + 1))
  
  dataset2 <- rbind.data.frame(dataset2,
                              data.frame(first_t = f_term * sqrt(0.05),
                                         second_t = s_term * sqrt((0.05 / 0.95)),
                                         sample_size = n) )
}

ui <- pageWithSidebar(
  headerPanel('Asymptotic Indep. of the Two terms'),
  sidebarPanel(
    sliderTextInput(inputId = "sampleSize2", 
                    label = "Sample size n:",
                    animate = TRUE,
                    grid = TRUE,
                    choices = c(100, 200, 500, 1000, 5000))
    
  ),
  mainPanel(
    plotOutput('plot2')
  )
)

server <- function(input, output, session) {
  
  dataset2_f <- reactive({
    dataset2[dataset2$sample_size == input$sampleSize2,]
  })
  
  output$plot2 <- renderPlot({
    q <- ggplot(data = dataset2_f(),
           aes(x =  first_t, y = second_t)) +
      theme_light() +
      # Add estimated density
      geom_point(fill = "lightgray",
                 size = 0.5,
                 alpha = 0.2) +
      stat_density_2d(aes(fill = ..level..), geom = "polygon",
                      colour="white",
                      alpha = 0.7) +
      scale_fill_distiller(palette= "Spectral", direction=-1) +
      xlim(-4, 4) +
      ylim(-4, 4) +
      labs(y = "The second term",
           x = "The first term") +
      theme(strip.text = element_text(size = 15, color = "black")) +
      theme(legend.position = "bottom")
      
      print(q)
  }, height=300)

}

shinyApp(ui, server)

Hence all that remains to be derived is the asymptotic distribution of the second term in \(\eqref{decom}\). That this limiting distribution is non-degenerate, in other words this term cannot be ignored, resolves the paradox.
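A quick numerical check of both claims (near-zero dependence between the two terms and a non-degenerate second term) can be sketched as follows. It assumes unit exponential losses, so that \(w(x)=x+1\) as in the figure code; the sample size, replication count and seed are arbitrary choices for this illustration.

```r
set.seed(42)
n <- 1000        # sample size (arbitrary choice)
alpha <- 0.95
reps <- 5000     # number of simulated samples (arbitrary choice)
k <- floor(n * alpha)

# Each column holds the two terms of the decomposition for one sample
terms <- replicate(reps, {
  y <- sort(rexp(n))
  w_hat <- y[k] + 1   # w(x) = x + 1 for the unit exponential
  c(first  = sqrt(n) * (mean(y[(k + 1):n]) - w_hat),
    second = sqrt(n) * (w_hat - (qexp(alpha) + 1)))
})
first <- terms["first", ]
second <- terms["second", ]

# Theory for this example: eta^2 = 20, gamma^2 = 19, zero correlation
print(round(c(correlation = cor(first, second),
              var_first   = var(first),
              var_second  = var(second)), 2))
```

A sample correlation near zero is of course only a necessary sign of independence, not a proof of it; the scatter plot in Figure 2 gives the fuller picture.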

Ordinary Delta Method

The second term is a smooth function, \(w(\cdot)\), of \(Y_{\lfloor n\alpha \rfloor}\), so its asymptotic distribution derives from that of \(Y_{\lfloor n\alpha \rfloor}\):

\[ \sqrt{n}\left(Y_{\left\lfloor n\alpha\right\rfloor }-q_{\alpha}\right)\overset{d}{\rightarrow}N\left(0,\frac{\alpha\left(1-\alpha\right)}{f^{2}\left(q_{\alpha}\right)}\right), \]

where \(f\) is the density function of \(X\). The function \(w(\cdot)\) satisfies

\[ w\left(x\right)=x+\frac{\int_{x}^{\infty}S\left(z\right)dz}{S\left(x\right)}, \]

where \(S\) is the survival function of \(X\). By the ordinary delta method [4], we now have

\[ \sqrt{n} \left(w(Y_{\lfloor n\alpha\rfloor}) - \mathbb{E}\left(X|X>q_{\alpha}\right)\right) \overset{d}{\rightarrow} N\left(0,\gamma_\alpha^2\right). \]
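The delta-method step can be spelled out as follows. This is a sketch assuming that \(S\) is differentiable with \(S'=-f\) and that \(f(q_\alpha)>0\); the shorthand \(m(x)\) is introduced here only for convenience. Writing \(m(x)=\int_{x}^{\infty}S(z)dz\), so that \(m'(x)=-S(x)\) and \(w(x)=x+m(x)/S(x)\), we get

\[ w'(x)=1+\frac{-S(x)^{2}+f(x)\,m(x)}{S(x)^{2}}=\frac{f(x)\,m(x)}{S(x)^{2}}=\frac{f(x)\left(w(x)-x\right)}{S(x)}. \]

Since \(S(q_\alpha)=1-\alpha\) and \(w(q_\alpha)=\mathbb{E}\left(X|X>q_{\alpha}\right)\), the limiting variance is

\[ \left(w'(q_\alpha)\right)^{2}\frac{\alpha\left(1-\alpha\right)}{f^{2}\left(q_{\alpha}\right)}=\frac{f^{2}\left(q_{\alpha}\right)\left(\mathbb{E}\left(X|X>q_{\alpha}\right)-q_{\alpha}\right)^{2}}{\left(1-\alpha\right)^{2}}\cdot\frac{\alpha\left(1-\alpha\right)}{f^{2}\left(q_{\alpha}\right)}=\frac{\alpha}{1-\alpha}\left(\mathbb{E}\left(X|X>q_{\alpha}\right)-q_{\alpha}\right)^{2}=\gamma_\alpha^2. \]

Note that the density \(f(q_\alpha)\) cancels, which is why \(\gamma_\alpha^2\) involves no density term.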

Asymptotic Distribution of \(CTE_{n:\alpha}\)

The asymptotic independence of the two terms in \(\eqref{decom}\) and their asymptotic normality together yield \(\eqref{eqn:mainthm}\). For the traditional approach in our setting see [5], and in the setting of importance sampling see [6].

Figure 3

R code for Figure 3

library(ggplot2)
library(RColorBrewer)
library(shinyWidgets)
library(shiny)
library(markdown)
library(dplyr)
library(tidyr)


### Figure 3

my_color <- brewer.pal(5, "Accent")

# Prepare empty dataframe for plotting
dataset3 <- data.frame(z_nx = double(), sample_size = integer())

# Making data set
# alpha_value <- 0.95
n <- 1000
x <- matrix(rexp(n * 10000), nrow= 10000);
x <- apply(x,1,sort);

# First term uses the n - floor(n * 0.95) observations above the sample quantile
f_term <- sqrt(n) * (colMeans(x[(floor(n * 0.95) + 1):n, ]) - (x[floor(n * 0.95), ] + 1))
s_term <- sqrt(n) * ((x[floor(n * 0.95), ] + 1) - (qexp(0.95) + 1))

dataset3 <- rbind.data.frame(dataset3,
                            data.frame(first_t = f_term,
                                       second_t = s_term,
                                       dist = "Exp. dist.",
                                       sample_size = n) )

qpareto <- function(p, theta = 2.687376 , alpha = 4){
  theta * ((1-p)^(-1/alpha) - 1)
}

cte_pareto <- function(x, theta = 2.687376 , alpha = 4){
  x + (x + theta) / (alpha - 1)
}

x <- matrix(qpareto(runif(n * 10000)), nrow= 10000)
x <- apply(x,1,sort)

f_term <- sqrt(n) * (colMeans(x[(floor(n * 0.95) + 1):n, ]) -
                       cte_pareto(x[floor(n * 0.95), ]))
s_term <- sqrt(n) * (cte_pareto(x[floor(n * 0.95), ]) - 
                       cte_pareto(qpareto(0.95)))

dataset3 <- rbind.data.frame(dataset3,
                            data.frame(first_t = f_term,
                                       second_t = s_term,
                                       dist = "Pareto dist.",
                                       sample_size = n) )


dataset_orig <- dataset3 %>% mutate(total = first_t + second_t)
dataset_orig <- select(dataset_orig, first_t, second_t, total, dist)
dataset3 <- dataset_orig %>% gather('first_t', 'second_t', 'total',
                                    key = "terms", value = "values")

ui <- pageWithSidebar(
  headerPanel('Tale of the Tails: Impact on the Terms'),
  sidebarPanel(
    # Input: Select the random distribution type ----
    radioButtons("dist", "Distribution type:",
                 c("Exponential (Light tail)" = "Exp. dist.",
                   "Pareto (Heavy tail)" = "Pareto dist."))
  ),
  mainPanel(
    plotOutput('plot3')
  )
)

server <- function(input, output, session) {

  dataset3_f <- reactive({
    dataset3[dataset3$dist == input$dist,]
  })

  dat_text1 <- reactive({
    data.frame(label = c(paste("Variance \n 1st: ", 
                    round(var(dataset_orig$first_t[dataset_orig$dist == input$dist]),2),
                    "\n  2nd: ",
                    round(var(dataset_orig$second_t[dataset_orig$dist == input$dist]),2),
                    "\n  Conv.: ",
                    round(var(dataset_orig$total[dataset_orig$dist == input$dist]),2)))
    )
  })
  
  output$plot3 <- renderPlot({
    
    # Plotting
    r <- ggplot(data = dataset3_f()) +
      theme_light() +
      # Add estimated density
      geom_density(mapping = aes(x = values, color = terms),
                   size = 1) +
      geom_hline(yintercept = 0, size = 1) +
      theme(legend.position="bottom", legend.box = "horizontal",
            legend.text = element_text(size = 14)) +
      # Label the three densities: first term, second term, and their sum
      scale_colour_manual(labels = expression(paste(1^"st", " term"),
                                              paste(2^"nd", " term"),
                                              "Conv."),
                          values = my_color[c(1,3,5)]) +
      labs(
        y = "Estimated density",
        x = "" ,
        color = "")+
      xlim(-35, 35) +
      ylim(0, 0.1) +
      theme(strip.text = element_text(size = 19, color = "black")) +
      geom_text(
        data    = dat_text1(),
        mapping = aes(x = 24, y = 0.08, label = label)
      )
    print(r)
  }, height=300)
}

shinyApp(ui, server)

Epilogue

  • While we appeal to the Berry-Esseen theorem, which isn’t covered at the MS level, its statement as a guarantee of uniform convergence to normality is easily understood.

  • Its use does require a finite third moment; while we believe that the line of argument can be carried out under finite variance alone, this is beside the central focus of our efforts.

  • This approach was successful with a cohort of seniors and MS students last Spring, providing motivation for this poster.


  1. Manistre, B. J., & Hancock, G. H. (2005). Variance of the CTE estimator. North American Actuarial Journal, 9(2), 129-156.

  2. Ahn, J. Y., & Shyamalkumar, N. D. (2011). Large sample behavior of the CTE and VaR estimators under importance sampling. North American Actuarial Journal, 15(3), 393-416.

  3. Manistre, B. J., & Hancock, G. H. (2005). Variance of the CTE estimator. North American Actuarial Journal, 9(2), 129-156.

  4. Klugman, S. A., Panjer, H. H., & Willmot, G. E. (2012). Loss models: from data to decisions (Vol. 715). John Wiley & Sons.

  5. Manistre, B. J., & Hancock, G. H. (2005). Variance of the CTE estimator. North American Actuarial Journal, 9(2), 129-156.

  6. Ahn, J. Y., & Shyamalkumar, N. D. (2011). Large sample behavior of the CTE and VaR estimators under importance sampling. North American Actuarial Journal, 15(3), 393-416.

Issac Lee
PhD candidate

I believe anyone can learn anything with a proper education process.