9  Trapezoidal Integration

The examples in Chapter 9 require that the search path contains the following namespaces,

library(groupedHyperframe)

The chained trapezoidal rule of numerically approximating a definite integral is an ancient idea since the dawn of human civilization.

Let \(\{x_k; k=0,\cdots,N\}\) be a partition of the real interval \([a,b]\) such that \(a = x_0 < x_1 < \cdots < x_{N-1} < x_N = b\) and \(\Delta x_k= x_k - x_{k-1}\) be the length of the \(k\)-th sub-interval, then the chained-trapezoidal approximation of the integration of the function \(y=f(x)\) is,

\[ \begin{split} \int_a^b f(x)\,dx & \approx \sum_{k=1}^{N}{\dfrac{y_{k-1}+y_k}{2}}\Delta x_{k} \\ & = \left(\frac{y_0}{2}+\sum _{k=1}^{N-1}y_k+\frac{y_N}{2}\right)\Delta x, \quad\iff\ \forall k, \Delta x_{k}\equiv \Delta x \end{split} \tag{9.1}\]

Consider

Data: a toy example x \(=(x_0,x_1,x_2,x_3,x_4,x_5)^t\) and y \(=\{y_k = f(x_k);k=0,1,\cdots,5\}\)
x. = seq.int(from = 10L, to = 20L, by = 2L)
set.seed(23); (x = rnorm(n = length(x.), mean = x., sd = .5))
# [1] 10.09661 11.78266 14.45663 16.89669 18.49830 20.55375
set.seed(12); (y = rnorm(n = length(x.), mean = 1, sd = .2))
# [1] 0.7038865 1.3154339 0.8086511 0.8159990 0.6004716 0.9455408

Note that the authors intentionally make the toy example data x not equidistant, as neither the chained trapezoidal rule framework nor the functions pracma::trapz() and pracma::cumtrapz() (Borchers 2023, v2.4.4) require the equidistant assumption.

Function pracma::trapz() calculates a trapezoidal integration (Equation 9.1), visualized in Figure 9.1(A).

Review: function pracma::trapz() (Borchers 2023)
pracma::trapz(x, y)
# [1] 9.247523

In R nomenclature, the term “cumulative” indicates an inclusive scan, that

The ith element of the output is the operation result of the first-i-elements of the input

Function pracma::cumtrapz() calculates a cumulative trapezoidal integration. Note that the first element of the output is always 0, as a trapezoidal integration needs at least \(\left(x_0, y_0\right)\) and \(\left(x_1, y_1\right)\). The first 0-value is often removed for downstream analysis (Section 14.3).

Review: function pracma::cumtrapz() (Borchers 2023)
pracma::cumtrapz(x, y)
#          [,1]
# [1,] 0.000000
# [2,] 1.702340
# [3,] 4.542215
# [4,] 6.524337
# [5,] 7.658653
# [6,] 9.247523

9.1 Average Vertical Height

The average vertical height of a trapezoidal integration is the chained-trapezoid (Equation 9.1) divided by the length of \(x\)-domain \(b-a\),

\[ \begin{split} (b-a)^{-1}\displaystyle\int_a^b f(x)\,dx & \approx (b-a)^{-1} \sum_{k=1}^{N}{\dfrac{y_{k-1}+y_k}{2}}\Delta x_{k} \\ & = N^{-1}\left(\frac{y_0}{2}+\sum _{k=1}^{N-1}y_k+\frac{y_N}{2}\right), \quad\iff\ \forall k, \Delta x_{k}\equiv \Delta x \end{split} \tag{9.2}\]

The S3 generic function vtrapz() calculates the average vertical height of a trapezoidal integration (Equation 9.2). The (tentative) prefix v indicates “vertical”. The shaded rectangle in Figure 9.1(A) has the same area as the trapezoidal integration. Therefore, the authors define the vertical height of the shaded rectangle as the average vertical height of the trapezoidal integration.

Example: function vtrapz.default()
vtrapz(x, y)
# [1] 0.8843263

The S3 generic function cumvtrapz() calculates the cumulative average vertical height of a trapezoidal integration. The output has,

  • the 1st element of NaN, i.e., the result of 0 (trapzoidal integration) divided by 0 (length of \(x\)-domain). The first NaN-value is often removed for downstream analysis (Section 14.3);
  • the 2nd element of \((y_0+y_1)/2\);
  • the \((i+1)\)-th element of \(i^{-1}(y_0/2+\sum_{k=1}^{i-1}y_k+y_i/2)\), if \(\{x_k;k=0,\cdots,i\}\) are equidistant, i.e., \(\forall k\in\{1,\cdots,i\}\), \(\Delta x_{k}\equiv \Delta x\). Note that the R does not use zero-based numbering; i.e., R vector indices start at 1L instead of 0L;
  • the \((N+1)\)-th and last element of (Equation 9.2).
Example: function cumvtrapz.default()
cumvtrapz(x, y)
#           [,1]
# [1,]       NaN
# [2,] 1.0096602
# [3,] 1.0417859
# [4,] 0.9594490
# [5,] 0.9115603
# [6,] 0.8843263

Figure 9.1 visualizes the concept of (cumulative) average vertical height of a trapezoidal integration with

  1. the toy example of non-equidistant \((x_0,x_1,\cdots,x_5)^t\) and \((y_0,y_1,\cdots,y_5)^t\);
  2. the kernel stats::density(), using the elements $x and $y. Obviously, this practice is meaningless mathematically, as a density function has \(x\)-domain of \((-\infty, \infty)\), thus the average vertical height of the trapzoidal integration under a density curve should be zero. This practice is meaningful to an empirical density though, if we use the range of the actual observations as the \(x\)-domain.
  3. the empirical cumulative distribution function stats::ecdf(), using the objects x and y in the enclosing environment. Obviously, this practice is meaningless mathematically, as a distribution function has \(x\)-domain of \((-\infty, \infty)\), thus the average vertical height of the trapzoidal integration under a distribution curve should be zero. This practice is meaningful to an empirical distribution though, if we use the range of the actual observations as the \(x\)-domain.
  4. the conditional mean \(E(r)\) between the points and the marks (Schlather, Ribeiro, and Diggle 2003), from function spatstat.explore::Emark() (Section 13.1) with the \(r\)-vector of default length-513L (v3.5.3.3). The return of the pipeline <ppp.object> |> Emark() |> cumvtrapz.fv(),
    • indicates the cumulative average vertical height of the trapzoidal integration (cumvtrapz) of the conditional mean \(E(r)\);
    • is denoted by <numeric-mark>.E.cumvtrapz in the output of function summary_fv() (Section 3.3.1, Section 17.10);
    • was named “standardized area under the conditional mean curve (AUcMean)” (Chervoneva et al. 2021).
  5. the conditional variance \(V(r)\) between the points and the marks (Schlather, Ribeiro, and Diggle 2003), from function spatstat.explore::Vmark() (Section 13.1) with the \(r\)-vector of default length-513L (v3.5.3.3). The return of the pipeline <ppp.object> |> Vmark() |> cumvtrapz.fv(),
    • indicates the cumulative average vertical height of the trapzoidal integration (cumvtrapz) of the conditional variance \(V(r)\);
    • is denoted by <numeric-mark>.V.cumvtrapz in the output of function summary_fv() (Section 3.3.1, Section 17.10);
    • was named “standardized area under the conditional variance curve (AUcVar)” (Chervoneva et al. 2021).
  6. the receiver operating characteristic (ROC) curve of a ppp.object (Baddeley et al. 2025), from the S3 method dispatch spatstat.explore::roc.ppp() (Section 13.1). Note that the following two values are mathematically equivalent, although not numerically identical, because the \(x\)-domain is \([0,1]\) for an ROC curve,
    • the return of the pipeline <ppp.object> |> roc() |> vtrapz.fv(), i.e., the average vertical height of the trapezoidal integration under the ROC-curve; and
    • the return of the pipeline <ppp.object> |> auc(), i.e., the area-under-ROC-curve (AU-ROC) via the S3 method dispatch spatstat.explore::auc.ppp().
Figure: Panel (A), visualize vtrapz & cumvtrapz of a toy example
fig_vtrapz_1 = visualize_vtrapz(x, y) + 
  ggplot2::labs(title = '(A). a toy example', y = NULL)
Figure: Panel (B), visualize vtrapz & cumvtrapz of stats::density()
set.seed(12); fig_vtrapz_2 = rnorm(n = 1e3L) |> density() |> visualize_vtrapz() + 
  ggplot2::labs(title = '(B). stats::density()')
Figure: Panel (C), visualize vtrapz & cumvtrapz of stats::ecdf()
set.seed(27); fig_vtrapz_3 = rnorm(n = 1e3L) |> ecdf() |> visualize_vtrapz() + 
  ggplot2::labs(title = '(C). stats::ecdf()')
Figure: Panel (D), visualize cumvtrapz of spatstat.explore::Emark()
fig_vtrapz_4 = spatstat.data::spruces |>
  spatstat.explore::Emark() |>
  visualize_vtrapz(draw.rect = FALSE) +
  ggplot2::labs(title = '(D). spatstat.explore::Emark()')
Figure: Panel (E), visualize cumvtrapz of spatstat.explore::Vmark()
fig_vtrapz_5 = spatstat.data::spruces |>
  spatstat.explore::Vmark() |>
  visualize_vtrapz(draw.rect = FALSE) +
  ggplot2::labs(title = '(E). spatstat.explore::Vmark()')
Figure: Panel (F), visualize vtrapz & cumvtrapz of spatstat.explore::roc()
swedishpines_roc = spatstat.data::swedishpines |>
  spatstat.explore::roc.ppp(covariate = 'x')
swedishpines_roc |>
  vtrapz.fv()
# [1] 0.5407941
spatstat.data::swedishpines |> 
  spatstat.explore::auc.ppp(covariate = 'x')
# [1] 0.5407314
fig_vtrapz_6 = swedishpines_roc |> 
  visualize_vtrapz() + 
  ggplot2::labs(title = '(F). spatstat.explore::roc()')
Figure: Panels collaged, visualize vtrapz and cumvtrapz
(fig_vtrapz_1 + fig_vtrapz_2 + fig_vtrapz_3 + fig_vtrapz_4 + fig_vtrapz_5 + fig_vtrapz_6 +
  patchwork::plot_layout(ncol = 2L)) & 
  ggplot2::theme_minimal()
Figure 9.1: (Cumulative) average vertical height of trapezoidal integration