33  Internal Utility Functions

The examples in Chapter 33 require that the search path contain the following namespaces.

library(groupedHyperframe)
library(groupedHyperframe.random)
library(maxEff)
# Registered S3 method overwritten by 'pROC':
#   method   from            
#   plot.roc spatstat.explore

33.1 'add_numeric_'

The internal class 'add_numeric_' defined in package maxEff v0.2.1 inherits from the class 'call', with the following additional attributes (a brief inspection sketch follows the list):

  • attr(., 'effsize'), a numeric scalar, the regression coefficient, i.e., the effect size effsize, of the additional numeric predictor
  • attr(., 'model'), the regression model with the additional numeric predictor
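These attributes can be inspected directly with base::attr(). Below is a minimal sketch, reusing the training models a0 from the examples that follow; the class of the stored model is not asserted here.

a0[[1L]] |> attr(which = 'effsize', exact = TRUE) # effect size of the 1st selected predictor
a0[[1L]] |> attr(which = 'model', exact = TRUE) |> class() # the stored regression model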

The S3 method dispatch base::print.default() displays each 'add_numeric_' object.

Example: training models a0, 1st element
a0[[1L]]
Example: training models a0, 2nd element
a0[[2L]]

The S3 method dispatch spatstat.geom::with.hyperframe() obtains the selected numeric predictors by passing the call to parameter ee.

Example: 1st selected numeric predictor
s0 |>
  with(ee = a0[[1L]]) |> # ?spatstat.geom::with.hyperframe
  summary.default()
s1 |>
  with(ee = a0[[1L]]) |> # ?spatstat.geom::with.hyperframe
  summary.default()
Example: 2nd selected numeric predictor
s0 |>
  with(ee = a0[[2L]]) |> # ?spatstat.geom::with.hyperframe
  summary.default()
s1 |>
  with(ee = a0[[2L]]) |> # ?spatstat.geom::with.hyperframe
  summary.default()

The S3 method dispatch predict.add_numeric_() is the workhorse of the S3 method dispatch predict.add_numeric().

Example: predict.add_numeric_(); predicted models a1, 1st element
a11 = a0[[1L]] |> 
  predict(newdata = s1)
stopifnot(identical(a1[[1L]], a11))
Example: predict.add_numeric_(); predicted models a1, 2nd element
a12 = a0[[2L]] |> 
  predict(newdata = s1)
stopifnot(identical(a1[[2L]], a12))  

33.2 'add_dummy_'

The internal class 'add_dummy_' defined in package maxEff v0.2.1 inherits from the class 'node1' (Chapter 21), with the following additional attributes (a brief inspection sketch follows the list):

  • attr(., 'p1'), a numeric scalar between 0 and 1, the probability of the additional logical predictor being TRUE in the training set
  • attr(., 'effsize'), a numeric scalar, the regression coefficient, i.e., the effect size effsize, of the additional logical predictor
  • attr(., 'model'), the regression model with the additional logical predictor
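These attributes can likewise be inspected with base::attr(); a minimal sketch, reusing the training models b0 from the examples that follow:

b0[[1L]] |> attr(which = 'p1', exact = TRUE)      # probability of TRUE in the training set
b0[[1L]] |> attr(which = 'effsize', exact = TRUE) # effect size of the additional logical predictor
b0[[1L]] |> attr(which = 'model', exact = TRUE) |> class() # the stored regression model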

The S3 method dispatch base::print.default() displays each 'add_dummy_' object.

Example: training models b0 in training set s0: 1st element
b0[[1L]]
Example: training models b0 in training set s0: 2nd element
b0[[2L]]
Example: training models c0 in test-subset of training set s0: 1st element
c0[[1L]]
Example: training models c0 in test-subset of training set s0: 2nd element
c0[[2L]]

The S3 method dispatch predict.node1() evaluates a dichotomizing rule in a hyperframe. Note that the user must call the S3 method dispatch predict.node1() explicitly; otherwise, the S3 generic stats::predict() would dispatch to predict.add_dummy_().
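A minimal sketch of the contrast; the comments only restate the dispatch behavior described above, and no return values are asserted here.

b0[[1L]] |> predict(newdata = s1)       # implicit dispatch: stats::predict() selects predict.add_dummy_()
b0[[1L]] |> predict.node1(newdata = s1) # explicit call: evaluates the dichotomizing rule in hyperframe s1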

Example: predict.node1(); 1st selected logical predictor
b0[[1L]] |> 
  predict.node1(newdata = s0) |>
  table() |> 
  addmargins()  
b0[[1L]] |> 
  predict.node1(newdata = s1) |>
  table() |> 
  addmargins()
Example: predict.node1(); 2nd selected logical predictor
b0[[2L]] |> 
  predict.node1(newdata = s0) |>
  table() |> 
  addmargins() 
b0[[2L]] |> 
  predict.node1(newdata = s1) |>
  table() |> 
  addmargins()  
Example: predict.node1(); 1st selected logical predictor via repeated partitions
c0[[1L]] |>
  predict.node1(newdata = s0) |>
  table() |> 
  addmargins()
c0[[1L]] |>
  predict.node1(newdata = s1) |>
  table() |> 
  addmargins()
Example: predict.node1(); 2nd selected logical predictor via repeated partitions
c0[[2L]] |>
  predict.node1(newdata = s0) |>
  table() |> 
  addmargins()
c0[[2L]] |>
  predict.node1(newdata = s1) |>
  table() |> 
  addmargins()

The S3 method dispatch predict.add_dummy_() is the workhorse of the S3 method dispatch predict.add_dummy().

Example: predict.add_dummy_(); predicted models b1: 1st element
b11 = b0[[1L]] |> 
  predict(newdata = s1)
stopifnot(identical(b1[[1L]], b11))
Example: predict.add_dummy_(); predicted models b1: 2nd element
b12 = b0[[2L]] |> 
  predict(newdata = s1)
stopifnot(identical(b1[[2L]], b12))  
Example: predict.add_dummy_(); predicted models c1: 1st element
c11 = c0[[1L]] |> 
  predict(newdata = s1)
stopifnot(identical(c1[[1L]], c11))
Example: predict.add_dummy_(); predicted models c1: 2nd element
c12 = c0[[2L]] |> 
  predict(newdata = s1)
stopifnot(identical(c1[[2L]], c12))  

33.3 grouped_rppp()

Function groupedHyperframe.random::grouped_rppp() implements the matrix parameterization using advanced R language operations. The code snippet that appears inside the call to grouped_rppp() in Section 4.2 cannot be evaluated outside of function grouped_rppp()!

Previously: p_Matern
set.seed(37); (n = sample(x = 1:4, size = 3L, replace = TRUE)) 
# [1] 2 3 4
set.seed(39); p_Matern = mapply(
  FUN = mvrnorm2, 
  mu = list(kappa = c(3,2), mu = c(10,5), scale = c(.4,.2), meanlog = c(3,5), sdlog = c(.4,.2)), 
  sd = list(kappa = .2, mu = .5, scale = .05, meanlog = .1, sdlog = .01), 
  MoreArgs = list(n = 3L), 
  SIMPLIFY = FALSE
) |>
  within.list(expr = {
    kappa = pmax(kappa, 1 + .Machine$double.eps)
    mu = pmax(mu, 1 + .Machine$double.eps)
    scale = pmax(scale, .Machine$double.eps)
    sdlog = pmax(sdlog, .Machine$double.eps)
  })
Advanced: without language operation
tryCatch(expr = {
  p_Matern |> 
    with.default(expr = {
      spatstat.random::rMatClust(kappa = kappa, scale = scale, mu = mu)
    })
}, error = identity)
# <simpleError: 'scale' should be a single number>

The native pipe operator |> successfully passes the code snippet into function grouped_rppp(), while the pipe operator magrittr::`%>%` (Bache and Wickham 2025, v2.0.4) does not!
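A minimal sketch of the parse-time difference that is presumably behind this contrast: the native pipe is rewritten by the R parser into an ordinary nested call, whereas %>% remains a call to the magrittr function in the parse tree, so the rMatClust() snippet is no longer directly visible to grouped_rppp().

quote(rMatClust(kappa = kappa, scale = scale, mu = mu) |> grouped_rppp(n = n))
# grouped_rppp(rMatClust(kappa = kappa, scale = scale, mu = mu), n = n)
quote(rMatClust(kappa = kappa, scale = scale, mu = mu) %>% grouped_rppp(n = n))
# rMatClust(kappa = kappa, scale = scale, mu = mu) %>% grouped_rppp(n = n)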

Advanced: language operation via native pipe |>
p_Matern |> 
  with.default(expr = {
    rMatClust(kappa = kappa, scale = scale, mu = mu) |> 
      grouped_rppp(n = n)
  })
# Grouped Hyperframe: ~g1/g2
# 
# 9 g2 nested in
# 3 g1
# 
# Preview of first 10 (or less) rows:
# 
#     ppp g1 g2
# 1 (ppp)  1  1
# 2 (ppp)  1  2
# 3 (ppp)  2  1
# 4 (ppp)  2  2
# 5 (ppp)  2  3
# 6 (ppp)  3  1
# 7 (ppp)  3  2
# 8 (ppp)  3  3
# 9 (ppp)  3  4
Advanced: language operation via magrittr::`%>%`
library(magrittr)
tryCatch(expr = {
  p_Matern |> 
    with.default(expr = {
      rMatClust(kappa = kappa, scale = scale, mu = mu) %>% 
        grouped_rppp(n = n)
    })
}, error = identity)
# <notSubsettableError in i[[1L]]: object of type 'symbol' is not subsettable>

33.4 mvrnorm2()

Function groupedHyperframe.random::mvrnorm2() is a wrapper of the multivariate normal simulator MASS::mvrnorm() (Venables and Ripley 2002) that accepts the standard deviation(s) \(\sigma\) via parameter sd:

  • parameter sd (\(\sigma\)) may be a numeric scalar, indicating equal variances \(\sigma^2\) on the diagonal and zero covariances;
  • parameter sd (\(\sigma\)) may be a numeric vector of the same length as parameter mu (\(\mu\)), indicating a diagonal variance-covariance matrix with zero covariances;
  • to specify a full variance-covariance matrix \(\Sigma\), use function MASS::mvrnorm() (Venables and Ripley 2002) directly.
Example: function mvrnorm2(), scalar \(\sigma\)
set.seed(12); a1 = MASS::mvrnorm(n = 3L, mu = c(0, 0), Sigma = diag(x = .9^2, nrow = 2L))
set.seed(12); a2 = mvrnorm2(n = 3L, mu = c(0, 0), sd = .9)
stopifnot(identical(a1, a2))
Example: function mvrnorm2(), vector \(\sigma\)
set.seed(42); b1 = MASS::mvrnorm(n = 3L, mu = c(0, 0), Sigma = diag(x = c(.9, 1.1)^2, nrow = 2L))
set.seed(42); b2 = mvrnorm2(n = 3L, mu = c(0, 0), sd = c(.9, 1.1))
stopifnot(identical(b1, b2))
Example: function mvrnorm2(), matrix \(\Sigma\)
(R = matrix(c(1, .5, .5, 1), nrow = 2L)) # correlation matrix
#      [,1] [,2]
# [1,]  1.0  0.5
# [2,]  0.5  1.0
(S = c(.9, 1.1) * R * rep(c(.9, 1.1), each = 2L)) # variance-covariance matrix
#       [,1]  [,2]
# [1,] 0.810 0.495
# [2,] 0.495 1.210
set.seed(23); c1 = MASS::mvrnorm(n = 3L, mu = c(0, 0), Sigma = S)
set.seed(23); c2 = mvrnorm2(n = 3L, mu = c(0, 0), Sigma = S)
stopifnot(identical(c1, c2))

33.5 statusPartition()

Function maxEff::statusPartition() (v0.2.1)

  1. splits a right-censored survival::Surv object by its survival status, i.e., observed vs. right-censored;
  2. partitions the observed and left-censored subjects, respectively, into test/training sets.

See Section 10.2 for the usage of the terms “split” vs. “partition”.

Consider the following toy example.

Data: right-censored Surv object capacitor_failure
capacitor_failure = survival::capacitor |> 
  with(expr = survival::Surv(time, status))
capacitor_failure
#  [1]  439   904  1092  1105   572   690   904  1090   315   315   439   628   258   258   347   588   959  1065  1065 
# [20] 1087   216   315   455   473   241   315   332   380   241   241   435   455  1105+ 1105+ 1105+ 1105+ 1090+ 1090+
# [39] 1090+ 1090+  628+  628+  628+  628+  588+  588+  588+  588+ 1087+ 1087+ 1087+ 1087+  473+  473+  473+  473+  380+
# [58]  380+  380+  380+  455+  455+  455+  455+

Function statusPartition() is intended to avoid the situation in which a Cox proportional hazards model survival::coxph() fitted to one or more of the partitioned data sets is degenerate because all subjects in that partition are censored.
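To see why such a partition is problematic, consider a minimal sketch with a hypothetical all-censored subset (the covariate x below is made up for illustration): with no observed events, the Cox partial likelihood carries no information about the effect of x, so the fit is degenerate.

x = c(1.2, .3, -.8) # hypothetical covariate
all_censored = survival::Surv(time = c(5, 7, 9), event = c(0, 0, 0))
tryCatch(expr = {
  survival::coxph(all_censored ~ x) # may warn or return an NA coefficient, depending on the survival version
}, warning = identity, error = identity)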

Example: statusPartition()
set.seed(12); id = capacitor_failure |>
  statusPartition(times = 1L, p = .5)
capacitor_failure[id[[1L]], 2L] |> 
  table() # balanced by survival status
# 
#  0  1 
# 16 16

Function statusPartition() is an extension of the popular function caret::createDataPartition(), which stratifies a Surv object by the quantiles of its survival time (as of package caret v7.0.1).

Review: caret::createDataPartition(), not balanced by survival status
set.seed(12); id0 = capacitor_failure |>
  caret::createDataPartition(times = 1L, p = .5)
capacitor_failure[id0[[1L]], 2L] |> 
  table()
# 
#  0  1 
# 19 14

33.6 rfactor()

Function groupedHyperframe.random::rfactor() is a wrapper of function base::sample.int(); a hand-rolled sketch follows the list below. Function rfactor()

  • has a first parameter n, the random sample size, in the style of functions stats::rlnorm(), stats::rnbinom(), etc.;
  • returns a factor.
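As a rough illustration of the wrapper, the following is only a plausible hand-rolled near-equivalent using base::sample.int() directly; the exact internal implementation of rfactor() is an assumption here, so seed-for-seed agreement is not guaranteed.

# a hypothetical near-equivalent of rfactor(n = 20L, prob = c(4, 2, 3))
sample.int(n = 3L, size = 20L, replace = TRUE, prob = c(4, 2, 3)) |>
  factor(levels = 1:3)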
Example: rfactor()
set.seed(18); rfactor(n = 20L, prob = c(4,2,3))
#  [1] 2 3 2 1 1 3 1 3 1 1 3 3 1 1 2 1 3 1 2 1
# Levels: 1 2 3
Example: rfactor() with levels
set.seed(18); rfactor(n = 20L, prob = c(4,2,3), levels = letters[1:3])
#  [1] b c b a a c a c a a c c a a b a c a b a
# Levels: a b c

33.7 .rppp()

Function groupedHyperframe.random::.rppp() (v0.2.0) implements the vectorized parameterization using advanced R language operations. The code snippet that appears inside the call to .rppp() in Section 4.1 cannot be evaluated outside of function .rppp()!

Advanced: without language operation
tryCatch(expr = {
  spatstat.random::rMatClust(kappa = c(10, 5), mu = c(8, 4), scale = c(.15, .06))
}, error = identity)
# <simpleError: 'scale' should be a single number>

As in Section 33.3, the native pipe operator |> successfully passes the code snippet into function .rppp(), while the pipe operator magrittr::`%>%` (Bache and Wickham 2025, v2.0.4) does not!

Advanced: language operation via native pipe |>
set.seed(12); r = rMatClust(kappa = c(10, 5), mu = c(8, 4), scale = c(.15, .06)) |>
  .rppp()
# Point-pattern simulated by `spatstat.random::rMatClust()`
# 
Advanced: language operation via magrittr::`%>%`
library(magrittr)
tryCatch(expr = {
  rMatClust(kappa = c(10, 5), mu = c(8, 4), scale = c(.15, .06)) %>% 
    .rppp()
}, error = identity)
# <notSubsettableError in i[[1L]]: object of type 'symbol' is not subsettable>