library(groupedHyperframe)
library(survival)
2 Grouped Hyper Data Frame
The examples in Chapter 2 require that the search path contains the following namespaces.
Search path on the author’s computer running RStudio (Posit Team 2025):
search()
# [1] ".GlobalEnv" "package:survival" "package:groupedHyperframe" "package:stats"
# [5] "package:graphics" "package:grDevices" "package:utils" "package:datasets"
# [9] "package:methods" "Autoloads" "package:base"
A hyper data frame (hyperframe, Chapter 17, package spatstat.geom, v3.6.0.3) contains columns that are either atomic vectors, as in a standard data frame, or lists of objects of the same class, referred to as hypercolumns. This data structure is particularly well suited for spatial analysis contexts, such as medical imaging, where each element in a hypercolumn can represent the spatial information contained in a single image. For example, the dataset demohyper (Section 8.8) from package spatstat.data (v3.1.9) contains a regular column Group, a point-pattern (ppp) hypercolumn Points, and a pixel image (im) hypercolumn Image.
spatstat.data::demohyper
# Hyperframe:
# Points Image Group
# 1 (ppp) (im) a
# 2 (ppp) (im) b
# 3 (ppp) (im) a
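To make the idea of a hypercolumn concrete, the following is a minimal base-R sketch rather than the spatstat.geom implementation. It mimics the structure with an ordinary data frame that carries a list column. The names toy and Signal and all values are made up for illustration.

```r
# Toy stand-in for a hyper data frame, using base R only.
# `toy`, `Signal` and the numbers below are hypothetical.
toy <- data.frame(Group = c("a", "b", "a"))

# A list column: one object per row, all of the same class,
# but not necessarily of the same length.
toy$Signal <- list(
  c(0.1, 0.4, 0.3),
  c(0.7, 0.2),
  c(0.5, 0.9, 0.6, 0.8)
)

# Check the "same class" property that defines a hypercolumn.
classes <- vapply(toy$Signal, function(x) class(x)[1L], character(1))

# Elements may still differ in size.
sizes <- lengths(toy$Signal)
```

A real hyperframe goes further than this sketch: its hypercolumns can hold arbitrary objects such as point patterns (ppp) or pixel images (im), as in demohyper above.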
Package groupedHyperframe (v0.3.0.20251020) introduces the grouped hyper data frame, a hyper data frame augmented with a (nested) grouping structure (Chapter 16).
The authors provide a toy dataset wrobel_lung, originally contributed by Dr. Julia Wrobel. Consider a subset lung0, in which the only columns whose values vary within the lowest-level group image_id (under the nested grouping structure ~patient_id/image_id) are hladr and phenotype.
# drop the columns x, y and dapi by assigning NULL to them
lung0 = wrobel_lung |>
  within.data.frame(expr = {
    x = y = NULL
    dapi = NULL
  })
lung0 |>
  head(n = 7L)
# image_id patient_id gender hladr phenotype OS age
# 1 [40864,18015].im3 #01 0-889-121 F 0.115 CK-.CD8- 3488+ 85
# 2 [40864,18015].im3 #01 0-889-121 F 0.239 CK-.CD8- 3488+ 85
# 3 [40864,18015].im3 #01 0-889-121 F 0.268 CK-.CD8- 3488+ 85
# 4 [40864,18015].im3 #01 0-889-121 F 0.245 CK-.CD8- 3488+ 85
# 5 [40864,18015].im3 #01 0-889-121 F 0.127 CK+.CD8- 3488+ 85
# 6 [40864,18015].im3 #01 0-889-121 F 0.136 CK+.CD8- 3488+ 85
# 7 [40864,18015].im3 #01 0-889-121 F 0.481 CK-.CD8+ 3488+ 85
A grouped hyper data frame lung_g is created from the data frame lung0 by specifying a (nested) grouping structure (Section 12.1),
lung_g = lung0 |>
  as.groupedHyperframe(group = ~ patient_id/image_id)
lung_g
# Grouped Hyperframe: ~patient_id/image_id
#
# 15 image_id nested in
# 3 patient_id
#
# Preview of first 10 (or less) rows:
#
# hladr phenotype image_id patient_id gender OS age
# 1 (numeric) (factor) [40864,18015].im3 #01 0-889-121 F 3488+ 85
# 2 (numeric) (factor) [42689,19214].im3 #01 0-889-121 F 3488+ 85
# 3 (numeric) (factor) [42806,16718].im3 #01 0-889-121 F 3488+ 85
# 4 (numeric) (factor) [44311,17766].im3 #01 0-889-121 F 3488+ 85
# 5 (numeric) (factor) [45366,16647].im3 #01 0-889-121 F 3488+ 85
# 6 (numeric) (factor) [56576,16907].im3 #02 1-037-393 M 1605 66
# 7 (numeric) (factor) [56583,15235].im3 #02 1-037-393 M 1605 66
# 8 (numeric) (factor) [57130,16082].im3 #02 1-037-393 M 1605 66
# 9 (numeric) (factor) [57396,17896].im3 #02 1-037-393 M 1605 66
# 10 (numeric) (factor) [57403,16934].im3 #02 1-037-393 M 1605 66
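The conversion above turns many rows per image into one row per image, with the varying columns gathered into hypercolumns. A rough base-R sketch of that reshaping follows. It is not how as.groupedHyperframe() is implemented, and the data frame long, the column marker and all values are hypothetical.

```r
# Hypothetical long-format data: several rows per image.
long <- data.frame(
  patient_id = c("p1", "p1", "p1", "p1", "p2", "p2", "p2"),
  image_id   = c("i1", "i1", "i2", "i2", "i3", "i3", "i3"),
  age        = c(85, 85, 85, 85, 66, 66, 66),
  marker     = c(0.12, 0.24, 0.27, 0.25, 0.13, 0.14, 0.48)
)

# Columns that are constant within image_id collapse to one row per image.
wide <- unique(long[c("patient_id", "image_id", "age")])

# The column that varies within image_id becomes one numeric vector per image.
marker_by_image <- split(long$marker, f = long$image_id)

# Attach it as a list column, matched to the rows of `wide` by image_id.
wide$marker <- unname(marker_by_image[wide$image_id])
```

In this sketch wide has one row per image_id, and wide$marker plays the role of the (numeric) hypercolumn shown in the preview above.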
The pipeline . |> quantile() |> aggregate() (Section 3.3.2, Section 3.4) computes the quantiles of each element in the numeric hypercolumn lung_g$hladr, then aggregates them at the biologically independent grouping level patient_id.
lung_g |>
  quantile(probs = seq.int(from = .01, to = .99, by = .01)) |>
  aggregate(by = ~ patient_id)
# Hyperframe:
# patient_id gender OS age hladr.quantile
# 1 #01 0-889-121 F 3488+ 85 (numeric)
# 2 #02 1-037-393 M 1605 66 (numeric)
# 3 #03 2-080-378 M 176 84 (numeric)
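The two steps of this pipeline can be sketched in base R on toy data. This is an illustration under stated assumptions, not the package's code: the data frame images, the column marker and the values are hypothetical, and the pointwise mean used to combine images within a patient is an assumption, since this section does not state which summary aggregate() applies.

```r
# Hypothetical data: one row per image, with a numeric list column.
images <- data.frame(
  patient_id = c("p1", "p1", "p2"),
  image_id   = c("i1", "i2", "i3")
)
images$marker <- list(
  c(0.1, 0.2, 0.3, 0.4),
  c(0.2, 0.4, 0.6),
  c(0.5, 0.7)
)

probs <- c(0.25, 0.5, 0.75)

# Step 1: quantiles of each element of the list column (one vector per image).
q_by_image <- lapply(images$marker, quantile, probs = probs, names = FALSE)

# Step 2: combine the per-image quantile vectors within each patient.
# ASSUMPTION: pointwise mean across the images of the same patient.
q_by_patient <- lapply(
  split(q_by_image, f = images$patient_id),
  function(qs) Reduce(`+`, qs) / length(qs)
)
```

The result holds one quantile vector per patient, which corresponds to the hladr.quantile hypercolumn in the output above: one row per patient_id, each carrying a numeric vector of length(probs).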