2  Grouped Hyper Data Frame

The examples in Chapter 2 require the following packages:

library(groupedHyperframe)
library(survival)
Search path and loaded namespaces on the author’s computer:
search()
#  [1] ".GlobalEnv"                "package:survival"          "package:groupedHyperframe" "package:stats"             "package:graphics"          "package:grDevices"         "package:utils"            
#  [8] "package:datasets"          "package:methods"           "Autoloads"                 "package:base"
loadedNamespaces() |> sort.int()
#  [1] "abind"             "base"              "cli"               "cluster"           "codetools"         "compiler"          "datasets"          "deldir"            "digest"           
# [10] "doParallel"        "dplyr"             "evaluate"          "farver"            "fastmap"           "fastmatrix"        "foreach"           "generics"          "geomtextpath"     
# [19] "GET"               "ggplot2"           "glue"              "goftest"           "graphics"          "grDevices"         "grid"              "gridExtra"         "groupedHyperframe"
# [28] "gtable"            "htmltools"         "htmlwidgets"       "iterators"         "jsonlite"          "knitr"             "lattice"           "lifecycle"         "magrittr"         
# [37] "Matrix"            "matrixStats"       "methods"           "nlme"              "otel"              "parallel"          "patchwork"         "pillar"            "pkgconfig"        
# [46] "polyclip"          "pracma"            "R6"                "RColorBrewer"      "rlang"             "rmarkdown"         "rstudioapi"        "S7"                "scales"           
# [55] "SpatialPack"       "spatstat.data"     "spatstat.explore"  "spatstat.geom"     "spatstat.random"   "spatstat.sparse"   "spatstat.univar"   "spatstat.utils"    "splines"          
# [64] "stats"             "survival"          "systemfonts"       "tensor"            "textshaping"       "tibble"            "tidyselect"        "tools"             "utils"            
# [73] "vctrs"             "viridisLite"       "xfun"              "yaml"

A hyper data frame (hyperframe, Chapter 26, package spatstat.geom, v3.6.1.16) contains columns that are either atomic vectors, as in a standard data frame, or lists of objects of the same class; the latter are referred to as hypercolumns. This data structure is particularly well suited to spatial analysis, such as medical imaging, where each element of a hypercolumn can hold the spatial information of a single image. For example, the hyper data frame demohyper (Section 10.8) from package spatstat.data (v3.1.9) contains a regular column Group, a point-pattern (ppp) hypercolumn Points, and a pixel-image (im) hypercolumn Image.

spatstat.data::demohyper
# Hyperframe:
#   Points Image Group
# 1  (ppp)  (im)     a
# 2  (ppp)  (im)     b
# 3  (ppp)  (im)     a
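
To make the distinction between regular columns and hypercolumns concrete, the following minimal sketch builds a small hyper data frame by hand with spatstat.geom::hyperframe(). The object names pts and h, the random point patterns, and the unit-square window are illustrative assumptions and are not part of demohyper.

```r
library(spatstat.geom)

# Three small random point patterns on the unit square (illustrative data only)
set.seed(1)
pts = replicate(n = 3L, expr = {
  ppp(x = runif(5), y = runif(5), window = owin(c(0, 1), c(0, 1)))
}, simplify = FALSE)

# `Group` is an atomic vector (regular column);
# `Points` is a list of objects of the same class (hypercolumn)
h = hyperframe(Group = c('a', 'b', 'a'), Points = pts)
h

# Each element of the hypercolumn is a complete point pattern
is.ppp(h$Points[[1L]])
```

Extracting a hypercolumn with $ returns the list of objects, so each element can be passed to any function that accepts its class.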

Package groupedHyperframe (v0.3.2.20251225) introduces the grouped hyper data frame, a hyper data frame augmented with a (nested) grouping structure (Chapter 25).

The author provides a toy dataset wrobel_lung, originally contributed by Dr. Julia Wrobel. Listing 2.1 creates a subset lung0 by dropping the columns x, y, and dapi. In lung0, the only columns whose values vary within the lowest-level group image_id (under the nested grouping structure ~patient_id/image_id) are hladr and phenotype.

Listing 2.1: Data frame lung0
lung0 = wrobel_lung |>
  within.data.frame(expr = {
    x = y = NULL
    dapi = NULL
  })
lung0
#                image_id    patient_id gender hladr phenotype    OS age
# 1     [40864,18015].im3 #01 0-889-121      F 0.115  CK-.CD8- 3488+  85
# 2     [40864,18015].im3 #01 0-889-121      F 0.239  CK-.CD8- 3488+  85
# 3     [40864,18015].im3 #01 0-889-121      F 0.268  CK-.CD8- 3488+  85
# 4     [40864,18015].im3 #01 0-889-121      F 0.245  CK-.CD8- 3488+  85
# 5     [40864,18015].im3 #01 0-889-121      F 0.127  CK+.CD8- 3488+  85
# 6     [40864,18015].im3 #01 0-889-121      F 0.136  CK+.CD8- 3488+  85
# ✂️ --- output truncated --- ✂️
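
The notion of a column that is non-identical within the lowest-level group can be checked with base R alone. The sketch below uses a hypothetical toy data frame toy that mimics the layout of lung0, and a hypothetical helper varies_within(); it only illustrates the idea and is not how package groupedHyperframe performs the check internally.

```r
# Toy data (hypothetical values), mimicking the layout of `lung0`
toy = data.frame(
  patient_id = c('p1', 'p1', 'p1', 'p1', 'p2', 'p2'),
  image_id   = c('i1', 'i1', 'i2', 'i2', 'i3', 'i3'),
  age        = c(85, 85, 85, 85, 66, 66),
  hladr      = c(.115, .239, .268, .245, .127, .136)
)

# TRUE if a column takes more than one value inside at least one group
varies_within = function(col, group) {
  any(tapply(col, INDEX = group, FUN = function(v) length(unique(v)) > 1L))
}

# One logical per column; only `hladr` varies within `image_id`
res = vapply(toy, FUN = varies_within, FUN.VALUE = NA, group = toy$image_id)
res
```

Columns flagged TRUE are the candidates for hypercolumns, because their values cannot be reduced to a single entry per image_id.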

Listing 2.2 creates a grouped hyper data frame lung_g from the data frame lung0 (Listing 2.1) by specifying a (nested) grouping structure (Section 18.1). The columns hladr and phenotype, which vary within image_id, become hypercolumns.

Listing 2.2: Grouped hyper data frame lung_g
lung_g = lung0 |> 
  as.groupedHyperframe(group = ~ patient_id/image_id)
lung_g
# Grouped Hyperframe: ~patient_id/image_id
# 
# 15 image_id nested in
# 3 patient_id
# 
#        hladr phenotype          image_id    patient_id gender    OS age
# 1  (numeric)  (factor) [40864,18015].im3 #01 0-889-121      F 3488+  85
# 2  (numeric)  (factor) [42689,19214].im3 #01 0-889-121      F 3488+  85
# 3  (numeric)  (factor) [42806,16718].im3 #01 0-889-121      F 3488+  85
# 4  (numeric)  (factor) [44311,17766].im3 #01 0-889-121      F 3488+  85
# ✂️ --- output truncated --- ✂️

Listing 2.3 computes the quantiles of each element in the numeric hypercolumn lung_g$hladr, then aggregates them at the biologically independent grouping level patient_id (Section 3.3.3, Section 3.4).

Listing 2.3: Aggregated quantiles
lung_g |>
  quantile(probs = seq.int(from = .01, to = .99, by = .01)) |>
  aggregate(by = ~ patient_id)
# Hyperframe:
#      patient_id gender    OS age hladr.quantile
# 1 #01 0-889-121      F 3488+  85      (numeric)
# 2 #02 1-037-393      M  1605  66      (numeric)
# 3 #03 2-080-378      M   176  84      (numeric)
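
The two steps in Listing 2.3 can be mimicked in base R on plain lists. The sketch below is only a conceptual illustration: the toy inputs hladr and patient_id are hypothetical, and the pointwise mean used for aggregation is an assumption of this sketch, not a statement about the default of the aggregate() method in package groupedHyperframe.

```r
# Hypothetical toy input: one numeric vector per image, two images per patient
set.seed(1)
hladr = list(i1 = runif(50), i2 = runif(60), i3 = runif(40), i4 = runif(70))
patient_id = c(i1 = 'p1', i2 = 'p1', i3 = 'p2', i4 = 'p2')

probs = seq.int(from = .01, to = .99, by = .01)

# Step 1: quantiles of each element, i.e., one vector of quantiles per image
q_image = lapply(hladr, FUN = quantile, probs = probs)

# Step 2: aggregate within patient
# (ASSUMPTION: pointwise mean across the images of the same patient)
q_patient = lapply(split(q_image, f = patient_id), FUN = function(q) {
  rowMeans(do.call(what = cbind, args = q))
})

# One aggregated quantile vector per patient
lengths(q_patient)
```

After aggregation there is one row per patient_id, which matches the three-row hyper data frame returned by Listing 2.3.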