Skip to contents

Final Production Wrapper for GBM (Tunable & Robust). Estimates a Cox proportional hazards model via gradient boosting. Uses the Breslow estimator with a step-function approach for the baseline hazard. Includes internal safeguards against C++ crashes and small cross-validation folds.

Usage

surv.gbm(
  time,
  event,
  X,
  newdata,
  new.times,
  obsWeights,
  id,
  n.trees = 1000,
  interaction.depth = 2,
  shrinkage = 0.01,
  cv.folds = 5,
  n.minobsinnode = 10,
  ...
)

Arguments

time

Observed follow-up time.

event

Observed event indicator.

X

Training covariate data.frame.

newdata

Test covariate data.frame to use for prediction.

new.times

Times at which to obtain the predicted survivals.

obsWeights

Observation weights.

id

Optional cluster/individual ID indicator.

n.trees

Integer specifying the total number of trees to fit (default: 1000).

interaction.depth

Maximum depth of variable interactions (default: 2).

shrinkage

A shrinkage parameter applied to each tree (default: 0.01).

cv.folds

Number of cross-validation folds to perform internally for optimal tree selection (default: 5).

n.minobsinnode

Minimum number of observations in the trees terminal nodes (default: 10).

...

Additional arguments passed to gbm.

Value

A list containing:

  • fit: The fitted model object (e.g., the raw coxph or xgb.Booster object). If the model fails to fit, this may be an object of class try-error.

  • pred: A numeric matrix of cross-validated survival predictions evaluated at the specified new.times grid.

Examples

if (requireNamespace("gbm", quietly = TRUE)) {
  data("metabric", package = "SuperSurv")
  dat <- metabric[1:30, ]
  x_cols <- grep("^x", names(dat))[1:3]
  X <- dat[, x_cols, drop = FALSE]
  newX <- X[1:5, , drop = FALSE]
  times <- seq(50, 150, by = 50)

  fit <- surv.gbm(
    time = dat$duration,
    event = dat$event,
    X = X,
    newdata = newX,
    new.times = times,
    obsWeights = rep(1, nrow(dat)),
    id = NULL,
    n.trees = 20,
    interaction.depth = 1,
    shrinkage = 0.05,
    cv.folds = 0,
    n.minobsinnode = 3
  )

  dim(fit$pred)
}
#> OOB generally underestimates the optimal number of iterations although predictive performance is reasonably competitive. Using cv_folds>1 when calling gbm usually results in improved predictive performance.
#> [1] 5 3