Final Production Wrapper for GBM (Tunable & Robust)
Estimates a Cox proportional hazards model via gradient boosting. Uses the Breslow estimator with a step-function approach for the baseline hazard. Includes internal safeguards against C++ crashes and small cross-validation folds.
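The Breslow step-function idea mentioned above can be illustrated in base R. This is a minimal sketch, not the package's internal code: the helper `breslow_cumhaz()` and the vector `lp` (fitted log relative hazards, such as those a boosted Cox model would produce) are hypothetical names introduced here, and the toy data are made up for illustration.

```r
# Breslow cumulative baseline hazard as a right-continuous step function.
# `lp` is a hypothetical vector of fitted log relative hazards (one per row).
breslow_cumhaz <- function(time, event, lp, weights = rep(1, length(time))) {
  event_times <- sort(unique(time[event == 1]))
  risk <- weights * exp(lp)
  increments <- vapply(event_times, function(t) {
    d <- sum(weights[time == t & event == 1])  # weighted events at t
    r <- sum(risk[time >= t])                  # weighted risk set at t
    d / r
  }, numeric(1))
  # stepfun() needs one more y value than x; H0 is 0 before the first event
  stepfun(event_times, c(0, cumsum(increments)))
}

# Toy training data (illustrative only)
time  <- c(2, 4, 4, 7, 9)
event <- c(1, 1, 0, 1, 0)
lp    <- c(0.3, -0.2, 0.1, 0.0, -0.4)

H0 <- breslow_cumhaz(time, event, lp)

# Survival for new observations: S(t | x) = exp(-H0(t) * exp(lp_new))
new_lp    <- c(0, 0.5)
new_times <- c(3, 5, 8)
surv <- exp(-outer(exp(new_lp), H0(new_times)))  # rows = observations, cols = times
dim(surv)
```

The result has one row per new observation and one column per requested time, which matches the shape of `pred` described under Value.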
Usage
surv.gbm(
time,
event,
X,
newdata,
new.times,
obsWeights,
id,
n.trees = 1000,
interaction.depth = 2,
shrinkage = 0.01,
cv.folds = 5,
n.minobsinnode = 10,
...
)
Arguments
- time
Observed follow-up time.
- event
Observed event indicator.
- X
Training covariate data.frame.
- newdata
Test covariate data.frame to use for prediction.
- new.times
Times at which to obtain the predicted survivals.
- obsWeights
Observation weights.
- id
Optional cluster/individual ID indicator.
- n.trees
Integer specifying the total number of trees to fit (default: 1000).
- interaction.depth
Maximum depth of variable interactions (default: 2).
- shrinkage
A shrinkage parameter applied to each tree (default: 0.01).
- cv.folds
Number of cross-validation folds to perform internally for optimal tree selection (default: 5).
- n.minobsinnode
Minimum number of observations in the terminal nodes of each tree (default: 10).
- ...
Additional arguments passed to gbm.
Value
A list containing:
- fit: The fitted model object (e.g., the raw coxph or xgb.Booster object). If the model fails to fit, this may be an object of class try-error.
- pred: A numeric matrix of cross-validated survival predictions evaluated at the specified new.times grid.
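Because `fit` may be a try-error, callers may want to check it before using the result. The sketch below uses a hypothetical stand-in list rather than a real `surv.gbm()` call; in particular, what `pred` contains after a failed fit is an assumption made only for illustration.

```r
# Hypothetical stand-in for a surv.gbm() result whose model fit failed;
# the contents of `pred` in that situation are an assumption.
res <- list(
  fit  = try(stop("simulated fitting failure"), silent = TRUE),
  pred = matrix(NA_real_, nrow = 5, ncol = 3)
)

fit_ok <- !inherits(res$fit, "try-error")
if (!fit_ok) {
  # conditionMessage() recovers the original error text from the try-error
  msg <- conditionMessage(attr(res$fit, "condition"))
  message("Model did not fit: ", msg)
}
```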
Examples
if (requireNamespace("gbm", quietly = TRUE)) {
data("metabric", package = "SuperSurv")
dat <- metabric[1:30, ]
x_cols <- grep("^x", names(dat))[1:3]
X <- dat[, x_cols, drop = FALSE]
newX <- X[1:5, , drop = FALSE]
times <- seq(50, 150, by = 50)
fit <- surv.gbm(
time = dat$duration,
event = dat$event,
X = X,
newdata = newX,
new.times = times,
obsWeights = rep(1, nrow(dat)),
id = NULL,
n.trees = 20,
interaction.depth = 1,
shrinkage = 0.05,
cv.folds = 0,
n.minobsinnode = 3
)
dim(fit$pred)
}
#> OOB generally underestimates the optimal number of iterations although predictive performance is reasonably competitive. Using cv_folds>1 when calling gbm usually results in improved predictive performance.
#> [1] 5 3
