15  groupedData

The examples in Chapter 15 require that the search path contains the following namespace,

library(groupedHyperframe)

Function nlme::groupedData() (Pinheiro, Bates, and R Core Team 2025, v3.1.168) creates a grouped data frame, i.e., an R object of S3 class 'groupedData'. Package groupedHyperframe implements the following S3 method dispatches for the class 'groupedData' (Listing 15.1, Table 15.1),

Listing 15.1: Table: S3 method dispatches groupedHyperframe::*.groupedData
methods2kable(class = 'groupedData', package = 'groupedHyperframe', all.names = TRUE)
Table 15.1: S3 method dispatches groupedHyperframe::*.groupedData (v0.3.0.20251020)
                                  visible  from               generic                                  isS4
as.groupedHyperframe.groupedData  TRUE     groupedHyperframe  groupedHyperframe::as.groupedHyperframe  FALSE
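
As a minimal sketch of what nlme::groupedData() produces, the following builds a grouped data frame from a small made-up data set. The data frame toy and its values are hypothetical; they only illustrate the formula interface response ~ covariate | group.

```r
# Hypothetical toy data: 2 subjects, 3 time points each
toy = data.frame(
  Subject = factor(rep(c('a', 'b'), each = 3L)),
  Time = rep(c(0, 1, 2), times = 2L),
  conc = c(NA, 9.5, 11.5, NA, 8.2, 10.1)
)

# The part after `|` in the formula defines the grouping structure
toy_gd = nlme::groupedData(conc ~ Time | Subject, data = toy)

# The result inherits from both 'groupedData' and 'data.frame'
inherits(toy_gd, what = 'groupedData')
inherits(toy_gd, what = 'data.frame')

# The grouping structure can be recovered as a one-sided formula
nlme::getGroupsFormula(toy_gd)
```

The one-sided formula returned by nlme::getGroupsFormula() has the same form as the ~Subject shown in the header of the printed groupedHyperframe in Section 15.1.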

15.1 Create groupedHyperframe

The S3 generic function as.groupedHyperframe() is introduced in Section 12.1. The S3 method dispatch as.groupedHyperframe.groupedData() converts a groupedData into a groupedHyperframe (Chapter 16) using its grouping structure.
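
For readers less familiar with S3, the sketch below shows how a generic dispatches to a method for the class 'groupedData'. The generic describe_grouping() and its methods are hypothetical names invented for this illustration; they are not part of package groupedHyperframe or nlme.

```r
# Hypothetical generic, for illustration only
describe_grouping = function(x, ...) UseMethod(generic = 'describe_grouping')

# Fallback for objects without a grouping structure
describe_grouping.default = function(x, ...) 'no grouping structure'

# Method selected when `x` inherits from class 'groupedData'
describe_grouping.groupedData = function(x, ...) {
  nlme::getGroupsFormula(x) |> deparse()
}

describe_grouping(nlme::Remifentanil) # dispatches to the groupedData method
describe_grouping(datasets::iris) # falls back to the default method
```

In the same way, a call to as.groupedHyperframe() on a groupedData object reaches as.groupedHyperframe.groupedData() without the user naming the method.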

Listing 15.2 converts the grouped data frame Remifentanil from package nlme (Pinheiro, Bates, and R Core Team 2025, v3.1.168) into a groupedHyperframe.

Data: Remifentanil
nlme::Remifentanil |>
  head(n = 3L)
# Grouped Data: conc ~ Time | Subject
#   ID Subject Time  conc  Rate      Amt   Age  Sex  Ht Wt    BSA     LBM
# 1  1       1  0.0    NA 71.99 107.9850 30.58 Male 171 72 1.8393 56.5075
# 2  1       1  1.5  9.51 71.99  35.9950 30.58 Male 171 72 1.8393 56.5075
# 3  1       1  2.0 11.50 71.99  37.4348 30.58 Male 171 72 1.8393 56.5075
Listing 15.2: Example: function as.groupedHyperframe.groupedData() on Remifentanil
Remifentanil_g = nlme::Remifentanil |> 
  as.groupedHyperframe()
Example: a grouped hyper data frame Remifentanil_g
Remifentanil_g
# Grouped Hyperframe: ~Subject
# <environment: 0x31352eab8>
# 
# 65 Subject
# 
# Preview of first 10 (or less) rows:
# 
#         Time      conc      Rate       Amt ID Subject Age    Sex  Ht   Wt    BSA     LBM
# 1  (numeric) (numeric) (numeric) (numeric) 30      30  21 Female 165 55.9 1.6095 42.8260
# 2  (numeric) (numeric) (numeric) (numeric) 21      21  24 Female 161 58.6 1.6131 43.0953
# 3  (numeric) (numeric) (numeric) (numeric) 25      25  32 Female 157 45.9 1.4278 36.4631
# 4  (numeric) (numeric) (numeric) (numeric) 23      23  23 Female 163 50.0 1.5215 39.5740
# 5  (numeric) (numeric) (numeric) (numeric) 29      29  25 Female 163 54.5 1.5782 41.7695
# 6  (numeric) (numeric) (numeric) (numeric) 28      28  30 Female 178 79.1 1.9708 55.4106
# 7  (numeric) (numeric) (numeric) (numeric) 32      32  54 Female 167 45.0 1.4806 37.4038
# 8  (numeric) (numeric) (numeric) (numeric) 64      64  55 Female 168 74.8 1.8455 50.6969
# 9  (numeric) (numeric) (numeric) (numeric) 22      22  33 Female 163 70.5 1.7607 47.7487
# 10 (numeric) (numeric) (numeric) (numeric) 45      45  72 Female 165 54.4 1.5910 42.1204
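
To get a feel for what a (numeric) placeholder in the preview stands for, the measurements can be split by the grouping factor in base R. This is only a conceptual sketch under the assumption that each such cell holds the within-subject values of that column; it is not the implementation used by as.groupedHyperframe.groupedData().

```r
d = nlme::Remifentanil

# Grouping factor implied by the groupedData formula, i.e., Subject
grp = nlme::getGroups(d)

# One numeric vector of concentrations per subject
conc_by_subject = split(d$conc, f = grp)

# Number of per-subject vectors
length(conc_by_subject)

# First few concentrations of the first subject
conc_by_subject[[1L]] |> head(n = 3L)
```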

Listing 15.3 converts the grouped data frame bdf from package nlme (Pinheiro, Bates, and R Core Team 2025, v3.1.168) into a groupedHyperframe.

Data: bdf
nlme::bdf |>
  head(n = 3L)
# Grouped Data: langPOST ~ IQ.verb | schoolNR
# <environment: 0x1275389d0>
#   schoolNR pupilNR IQ.verb  IQ.perf sex Minority repeatgr aritPRET classNR aritPOST langPRET langPOST ses denomina
# 1        1   17001    15.0 12.33333   0        N        0       14     180       24       36       46  23        1
# 2        1   17002    14.5 10.00000   0        Y        0       12     180       19       36       45  10        1
# 3        1   17003     9.5 11.00000   0        N        0       10     180       24       33       33  15        1
#   schoolSES satiprin natitest meetings currmeet mixedgra percmino aritdiff homework classsiz groupsiz IQ.ver.cen
# 1        11  3.42857        0      1.7  1.83333        0       60       12  2.33333       29       29   3.165938
# 2        11  3.42857        0      1.7  1.83333        0       60       12  2.33333       29       29   2.665938
# 3        11  3.42857        0      1.7  1.83333        0       60       12  2.33333       29       29  -2.334062
#   avg.IQ.ver.cen grpSiz.cen
# 1      -1.514062   5.899432
# 2      -1.514062   5.899432
# 3      -1.514062   5.899432
Listing 15.3: Example: function as.groupedHyperframe.groupedData() on bdf
bdf_g = nlme::bdf |> 
  as.groupedHyperframe()
Example: a grouped hyper data frame bdf_g
bdf_g
# Grouped Hyperframe: ~schoolNR
# <environment: 0x125827318>
# 
# 131 schoolNR
# 
# Preview of first 10 (or less) rows:
# 
#     pupilNR   IQ.verb   IQ.perf      sex Minority  repeatgr  aritPRET   classNR  aritPOST  langPRET  langPOST       ses
# 1  (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric) (numeric)
# 2  (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric) (numeric)
# 3  (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric) (numeric)
# 4  (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric) (numeric)
# 5  (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric) (numeric)
# 6  (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric) (numeric)
# 7  (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric) (numeric)
# 8  (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric) (numeric)
# 9  (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric) (numeric)
# 10 (factor) (numeric) (numeric) (factor) (factor) (ordered) (numeric) (numeric) (numeric) (numeric) (numeric) (numeric)
#    mixedgra  percmino  homework  classsiz  groupsiz IQ.ver.cen grpSiz.cen schoolNR denomina schoolSES satiprin natitest
# 1  (factor) (numeric) (numeric) (numeric) (numeric)  (numeric)  (numeric)       47        3        11  2.85714        0
# 2  (factor) (numeric) (numeric) (numeric) (numeric)  (numeric)  (numeric)      103        3        12  3.00000        1
# 3  (factor) (numeric) (numeric) (numeric) (numeric)  (numeric)  (numeric)        2        1        11  3.00000        0
# 4  (factor) (numeric) (numeric) (numeric) (numeric)  (numeric)  (numeric)      123        2        20  3.14286        1
# 5  (factor) (numeric) (numeric) (numeric) (numeric)  (numeric)  (numeric)       10        1        15  3.00000        0
# 6  (factor) (numeric) (numeric) (numeric) (numeric)  (numeric)  (numeric)      258        3        14  2.71429        0
# 7  (factor) (numeric) (numeric) (numeric) (numeric)  (numeric)  (numeric)       27        1        17  3.28571        0
# 8  (factor) (numeric) (numeric) (numeric) (numeric)  (numeric)  (numeric)       12        1        20  3.14286        0
# 9  (factor) (numeric) (numeric) (numeric) (numeric)  (numeric)  (numeric)      109        1        15  3.28571        0
# 10 (factor) (numeric) (numeric) (numeric) (numeric)  (numeric)  (numeric)      192        3        19  3.57143        1
#    meetings currmeet aritdiff avg.IQ.ver.cen
# 1   2.11111  1.83333       13     -3.5215621
# 2   2.10000  2.16667       17     -5.0840621
# 3   1.60000  1.66667       27     -2.8340621
# 4   2.10000  2.00000       27     -2.0840621
# 5   2.60000  2.66667       27     -1.3340621
# 6   2.00000  2.00000       13     -1.1912049
# 7   1.10000  1.00000        9     -0.4054907
# 8   2.70000  2.66667       17     -2.4007288
# 9   1.60000  1.33333       17     -1.0090621
# 10  3.20000  3.16667       16     -1.0840621

Converting a groupedData (or data.frame) with a substantial amount of duplicated information into a groupedHyperframe does not necessarily reduce the memory allocation, because the hyperframe object carries additional auxiliary information. Even when it does reduce the memory allocation, a groupedHyperframe does not reduce the saved file size much, and may even increase it, compared to a data.frame when xz compression is used for both.

Advanced: Reducing memory allocation
unclass(object.size(Remifentanil_g) / object.size(nlme::Remifentanil))
# [1] 0.3094831
f = replicate(n = 2L, expr = tempfile(fileext = '.rds'))
Remifentanil_g |> saveRDS(file = f[1L], compress = 'xz')
nlme::Remifentanil |> saveRDS(file = f[2L], compress = 'xz')
file.size(f[1L]) / file.size(f[2L])
# [1] 1.221687
Advanced: Not reducing memory allocation
unclass(object.size(bdf_g) / object.size(nlme::bdf))
# [1] 25.78725
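
A ratio this large can look surprising. One possible contributor, stated here as an assumption and not as a description of how groupedHyperframe stores its columns, is that a factor split by group keeps its full set of levels in every piece, and object.size() does not detect sharing between list elements. The base-R sketch below uses hypothetical data (the names id and g are illustrative) to show the effect.

```r
# Hypothetical data: 2000 rows, each with its own factor level, in 100 groups
id = sprintf(fmt = 'id%04d', seq_len(2000L)) |> factor()
g = rep(seq_len(100L), each = 20L)

# split() keeps all levels of `id` in every piece
pieces = split(id, f = g)
nlevels(pieces[[1L]])

# object.size() counts the levels once per piece, so the list looks much larger
unclass(object.size(pieces) / object.size(id))

# Dropping unused levels in each piece removes most of that overhead
pieces_drop = pieces |> lapply(FUN = droplevels)
unclass(object.size(pieces_drop) / object.size(id))
```

Whether this applies to bdf_g depends on the internals of the package; the sketch only shows that per-group storage of factors can inflate the size reported by object.size().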