library(groupedHyperframe)11 Trapezoidal Integration
The examples in Chapter 11 require
The chained trapezoidal rule of numerically approximating a definite integral is an ancient idea since the dawn of human civilization.
Let \(\{x_k; k=0,\cdots,N\}\) be a partition of the real interval \([a,b]\) such that \(a = x_0 < x_1 < \cdots < x_{N-1} < x_N = b\) and \(\Delta x_k= x_k - x_{k-1}\) be the length of the \(k\)-th sub-interval, then the chained-trapezoidal approximation of the integration of the function \(y=f(x)\) is,
\[ \begin{split} \int_a^b f(x)\,dx & \approx \sum_{k=1}^{N}{\dfrac{y_{k-1}+y_k}{2}}\Delta x_{k} \\ & = \left(\frac{y_0}{2}+\sum _{k=1}^{N-1}y_k+\frac{y_N}{2}\right)\Delta x, \quad\iff\ \forall k, \Delta x_{k}\equiv \Delta x \end{split} \tag{11.1}\]
Consider a toy example x \(=(x_0,x_1,x_2,x_3,x_4,x_5)^t\) and y \(=\{y_k = f(x_k);k=0,1,\cdots,5\}\) (Listing 11.1),
x and y
x. = seq.int(from = 10L, to = 20L, by = 2L)
set.seed(23); x = rnorm(n = length(x.), mean = x., sd = .5)
set.seed(12); y = rnorm(n = length(x.), mean = 1, sd = .3)
list(x = x, y = y)
# $x
# [1] 10.09661 11.78266 14.45663 16.89669 18.49830 20.55375
#
# $y
# [1] 0.5558297 1.4731508 0.7129767 0.7239984 0.4007074 0.9183112Note that the author intentionally makes the toy example data x not equidistant, as neither the chained trapezoidal rule framework nor the functions pracma::trapz() and pracma::cumtrapz() (Borchers 2025, v2.4.6) require the equidistant assumption.
Listing 11.2 calculates the trapezoidal integration (Equation 11.1, Figure 11.1) of the toy example x and y (Listing 11.1).
pracma::trapz() (Borchers 2025) (Listing 11.1)
pracma::trapz(x, y)
# [1] 8.642715In R nomenclature, the term “cumulative” indicates an inclusive scan, that the ith element of the output is the operation result of the first-i-elements of the input. For example, Listing 11.3 is a summation, while Listing 11.4 is a cumulative summation.
base::sum()
sum(1:5)
# [1] 15base::cumsum()
cumsum(1:5)
# [1] 1 3 6 10 15Listing 11.5 calculates the cumulative trapezoidal integration of the toy example x and y (Listing 11.1). Note that the first element of the output is always 0, as a trapezoidal integration needs at least \(\left(x_0, y_0\right)\) and \(\left(x_1, y_1\right)\).
pracma::cumtrapz() (Borchers 2025) (Listing 11.1)
pracma::cumtrapz(x, y)
# [,1]
# [1,] 0.000000
# [2,] 1.710484
# [3,] 4.633309
# [4,] 6.386462
# [5,] 7.287131
# [6,] 8.64271511.1 (Cumulative) Average Vertical Height
The average vertical height of a trapezoidal integration is the chained-trapezoid (Equation 11.1) divided by the length of \(x\)-domain \(b-a\),
\[ \begin{split} (b-a)^{-1}\displaystyle\int_a^b f(x)\,dx & \approx (b-a)^{-1} \sum_{k=1}^{N}{\dfrac{y_{k-1}+y_k}{2}}\Delta x_{k} \\ & = N^{-1}\left(\frac{y_0}{2}+\sum _{k=1}^{N-1}y_k+\frac{y_N}{2}\right), \quad\iff\ \forall k, \Delta x_{k}\equiv \Delta x \end{split} \tag{11.2}\]
Function vtrapz() (Listing 11.6) calculates the average vertical height of a trapezoidal integration (Equation 11.2). The (tentative) prefix v indicates “vertical”. The shaded rectangle in Figure 11.1 has the same area as the trapezoidal integration. Therefore, the author defines the vertical height of the shaded rectangle as the average vertical height of the trapezoidal integration.
vtrapz() (Listing 11.1)
vtrapz(x, y)
# [1] 0.8264894The S3 generic function cumvtrapz() calculates the cumulative average vertical height of a trapezoidal integration. Package groupedHyperframe (v0.3.2.20251225) implements the following S3 methods (Table 11.1),
S3 methods of groupedHyperframe::cumvtrapz (v0.3.2.20251225)
| visible | isS4 | |
|---|---|---|
cumvtrapz.fv |
TRUE | FALSE |
cumvtrapz.fvlist |
TRUE | FALSE |
cumvtrapz.hyperframe |
TRUE | FALSE |
cumvtrapz.numeric |
TRUE | FALSE |
The S3 method cumvtrapz.numeric() takes a numeric-vector input of x. The output (Listing 11.7) has,
- the 1st element of
NaN, i.e., the result of0(trapzoidal integration) divided by0(length of \(x\)-domain). The firstNaN-value is often removed for downstream analysis (default parameter valuerm1 = TRUE); - the 2nd element of \((y_0+y_1)/2\);
- the \((i+1)\)-th element of \(i^{-1}(y_0/2+\sum_{k=1}^{i-1}y_k+y_i/2)\), if \(\{x_k;k=0,\cdots,i\}\) are equidistant, i.e., \(\forall k\in\{1,\cdots,i\}\), \(\Delta x_{k}\equiv \Delta x\). Note that the R does not use zero-based numbering; i.e., R
vectorindices start at1Linstead of0L; - the \((N+1)\)-th and last element of Equation 11.2 (Listing 11.6).
cumvtrapz.numeric() (Listing 11.1)
cumvtrapz(x, y, rm1 = FALSE)
# [,1]
# [1,] NaN
# [2,] 1.0144903
# [3,] 1.0626788
# [4,] 0.9391734
# [5,] 0.8673404
# [6,] 0.826489411.2 Visualization
The S3 generic function visualize_vtrapz() visualizes the concept of (cumulative) average vertical height of a trapezoidal integration. Package groupedHyperframe (v0.3.2.20251225) implements the following S3 methods (Table 11.2),
S3 methods of groupedHyperframe::visualize_vtrapz (v0.3.2.20251225)
| visible | isS4 | |
|---|---|---|
visualize_vtrapz.density |
TRUE | FALSE |
visualize_vtrapz.ecdf |
TRUE | FALSE |
visualize_vtrapz.function |
TRUE | FALSE |
visualize_vtrapz.fv |
TRUE | FALSE |
visualize_vtrapz.listof |
TRUE | FALSE |
visualize_vtrapz.loess |
TRUE | FALSE |
visualize_vtrapz.numeric |
TRUE | FALSE |
Figure 11.1 visualizes the toy example (Listing 11.1) of the non-equidistant \((x_0,x_1,\cdots,x_5)^t\) and \((y_0,y_1,\cdots,y_5)^t\).
visualize_vtrapz(x, y) +
ggplot2::labs(y = NULL) +
ggplot2::theme_minimal()
Figure 11.2 visualizes the kernel stats::density() of a random normal sample of size 1e3L, using the elements $x and $y. Obviously, this practice is meaningless mathematically, as a density function has an \(x\)-domain of \((-\infty, \infty)\), thus the (cumulative) average vertical height of the trapzoidal integration under a density curve should be a constant of zero. This practice is meaningful to an empirical density though, if we use the range of the actual observations as the \(x\)-domain.
stats::density()
set.seed(12); rnorm(n = 1e3L) |> density() |> visualize_vtrapz() +
ggplot2::theme_minimal()stats::density()
Figure 11.3 visualizes the empirical cumulative distribution function stats::ecdf() of a random normal sample of size 1e3L, using the objects x and y in the enclosing environment. Obviously, this practice is meaningless mathematically, as a distribution function has an \(x\)-domain of \((-\infty, \infty)\), thus the (cumulative) average vertical height of the trapzoidal integration under a distribution curve should be a constant of zero. This practice is meaningful to an empirical distribution though, if we use the range of the actual observations as the \(x\)-domain.
stats::ecdf()
set.seed(27); rnorm(n = 1e3L) |> ecdf() |> visualize_vtrapz() +
ggplot2::theme_minimal()stats::ecdf()
Figure 11.4 visualizes the local polynomial regression fit stats::loess() with the numeric response $dist and the numeric predictor $speed in the data set cars from package datasets shipped with R version 4.5.2 (2025-10-31).
stats::loess()
loess(dist ~ speed, data = datasets::cars) |> visualize_vtrapz() +
ggplot2::theme_minimal()stats::loess()
Figure 11.5 visualizes the receiver operating characteristic (ROC) curve of the point-pattern (Baddeley et al. 2025) swedishpines (Section 10.20). Note that the returns of,
- Listing 11.12, the average vertical height of the trapezoidal integration under the ROC-curve of a point-pattern; and
- Listing 11.13, the area-under-ROC-curve (AU-ROC) of a point-pattern,
are mathematically equivalent, although not numerically identical (as of package spatstat.explore v3.6.0.5), because the \(x\)-domain is \([0,1]\) for an ROC curve of a point-pattern.
vtrapz of ROC, via spatstat.explore::roc.ppp()
spatstat.data::swedishpines |>
spatstat.explore::roc.ppp(covariate = 'x') |>
spatstat.explore::with.fv(expr = {
vtrapz(x = p, y = raw)
})
# [1] 0.5407941spatstat.explore::auc.ppp()
spatstat.data::swedishpines |>
spatstat.explore::auc.ppp(covariate = 'x')
# [1] 0.5407314spatstat.explore::roc.ppp()
spatstat.data::swedishpines |>
spatstat.explore::roc.ppp(covariate = 'x') |>
visualize_vtrapz() +
ggplot2::theme_minimal()spatstat.explore::roc.ppp()
Figure 11.6 visualizes the conditional mean \(E(r)\) and variance \(V(r)\) between the points and the marks (Schlather, Ribeiro, and Diggle 2003) of the point-pattern spruces (Section 10.19) using the functions spatstat.explore::Emark() and spatstat.explore::Vmark() with an \(r\)-vector of default length-513L (v3.6.0.5).
spatstat.explore::Emark() and spatstat.explore::Vmark()
list(
spatstat.data::spruces |>
spatstat.explore::Emark(),
spatstat.data::spruces |>
spatstat.explore::Vmark()
) |>
visualize_vtrapz.listof(draw.rect = FALSE) &
ggplot2::theme_minimal()