To provide high flexibility, the base implementation of compboost is
written in C++, which lets us make use of object-oriented programming.
These C++ classes can be used in R after exposing them as S4 classes.
To hide this layer of abstraction, we wrapped these S4 classes in a
single R6 wrapper class.
All in all, the class system of compboost is a mixture
of raw exposed S4 classes and the convenience class written
in R6. As usual in object-oriented programming, the
classes work on a reference basis. This introduction aims to show how
these references work and how they can be accessed.
The class most affected by references is the Response class.
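The difference between reference and copy semantics can be illustrated in base R without compboost. The following is a minimal sketch, assuming a toy stand-in for a response object: `make_toy_response` and `toy_fit` are hypothetical names and not part of the compboost API. An environment is used because R6 objects are built on environments and are therefore passed by reference.

```r
# Toy stand-in for a response object (hypothetical, not the compboost API).
# Environments are passed by reference, as R6 objects are.
make_toy_response = function(n) {
  self = new.env()
  self$prediction = rep(0, n)  # predictions start at zero
  self
}

# Toy "fit" that shifts the stored predictions of whatever it receives.
toy_fit = function(response, shift) {
  response$prediction = response$prediction + shift
  invisible(NULL)
}

resp_ref = make_toy_response(3)            # reference semantics (environment)
resp_copy = list(prediction = rep(0, 3))   # copy semantics (plain list)

toy_fit(resp_ref, -1.5)
toy_fit(resp_copy, -1.5)

resp_ref$prediction   # changed: the function modified the caller's object
resp_copy$prediction  # unchanged: the function modified a local copy
```

The environment-based object is altered by the function call even though nothing is returned, whereas the plain list is left as it was.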
Response Class
The target variable is represented as an object that inherits from
the Response class. Depending on the target type, we want
different transformations of the internally predicted scores. For
instance, in a binary classification task the score \(\hat{f}(x) \in \mathbb{R}\) is transformed
to the \([0,1]\) scale by the
logistic function:
\[ \hat{\pi}(x) = \frac{1}{1 + \exp(-\hat{f}(x))} \]
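As a quick illustration of this transformation, here is a small base R sketch. The helper `logistic` is an illustrative name introduced here and is not a compboost function. The comparison with `plogis()` from the stats package only confirms that both compute the same formula.

```r
# Logistic transformation of a real-valued score to the [0, 1] scale.
# `logistic` is an illustrative helper, not a compboost function.
logistic = function(f) 1 / (1 + exp(-f))

scores = c(-2, 0, 2)
pi_hat = logistic(scores)
pi_hat

# Base R's plogis() computes the same transformation.
all.equal(pi_hat, plogis(scores))
```

A score of zero is mapped to 0.5, negative scores to values below 0.5, and positive scores to values above 0.5.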
To show how references work here, we first define a
ResponseBinaryClassif object. To do so, we use the
mtcars dataset, create a new binary target variable that labels cars as
fast (\(\text{qsec} < 17\)) or slow (\(\text{qsec} \geq 17\)),
and create the response object:
df = mtcars[, c("mpg", "disp", "hp", "drat", "wt")]
df$qsec_cat = ifelse(mtcars$qsec < 17, "fast", "slow")
obj_response = ResponseBinaryClassif$new("qsec_cat", "fast", df$qsec_cat)
obj_response
#>
#> Binary classification response of target "qsec_cat" and threshold 0.5
#> ResponseBinaryClassifPrinter
To access the underlying representation of the response (here a
binary variable), one can use $getResponse(). When a new response
object is initialized, the prediction \(\hat{f} \in \mathbb{R}\) is set
to zero. We can also use the response object to calculate the
transformed predictions \(\hat{\pi} \in [0,1]\):
knitr::kable(head(data.frame(
target = df$qsec_cat,
target_representation = obj_response$getResponse(),
prediction_initialization = obj_response$getPrediction(),
prediction_transformed = obj_response$getPredictionTransform()
)))

| target | target_representation | prediction_initialization | prediction_transformed |
|---|---|---|---|
| fast | 1 | 0 | 0.5 |
| slow | -1 | 0 | 0.5 |
| slow | -1 | 0 | 0.5 |
| slow | -1 | 0 | 0.5 |
| slow | -1 | 0 | 0.5 |
| slow | -1 | 0 | 0.5 |
In the case of binary classification, the response object can also calculate label predictions by using a specified threshold \(a\): \[ \hat{y} = \begin{cases} 1 & \text{if} \ \ \hat{\pi}(x) \geq a \\ -1 & \text{otherwise} \end{cases} \]
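The thresholding rule can be sketched in base R as follows. The helper `predict_label` is an illustrative name and not a compboost function. The negative class is coded as -1, matching the target representation in the table above.

```r
# Map transformed predictions to labels in {-1, 1} using threshold a.
# `predict_label` is an illustrative helper, not a compboost function.
predict_label = function(pi_hat, a = 0.5) ifelse(pi_hat >= a, 1, -1)

pi_hat = c(0.5, 0.5, 0.8)
predict_label(pi_hat)           # threshold 0.5
predict_label(pi_hat, a = 0.6)  # threshold 0.6
```

Because the comparison is \(\geq\), a transformed prediction of exactly 0.5 is labeled positive at threshold 0.5 and negative at threshold 0.6.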
The default threshold here is 0.5:
obj_response$getThreshold()
#> [1] 0.5
head(obj_response$getPredictionResponse())
#> [,1]
#> [1,] 1
#> [2,] 1
#> [3,] 1
#> [4,] 1
#> [5,] 1
#> [6,] 1
After setting the threshold to 0.6, every observation is predicted as negative, because all transformed predictions are still 0.5:
obj_response$setThreshold(0.6)
head(obj_response$getPredictionResponse())
#> [,1]
#> [1,] -1
#> [2,] -1
#> [3,] -1
#> [4,] -1
#> [5,] -1
#> [6,] -1
So far, this behavior has nothing to do with references. During
the fitting of a component-wise boosting model, however, these predictions are
adjusted over and over again by the Compboost object. This
is where the reference comes in:
cboost = boostSplines(data = df, target = obj_response,
iterations = 2000L, trace = 500L)
#> 1/2000 risk = 0.59 time = 0
#> 500/2000 risk = 0.22 time = 16384
#> 1000/2000 risk = 0.15 time = 44252
#> 1500/2000 risk = 0.12 time = 82352
#> 2000/2000 risk = 0.1 time = 129766
#>
#>
#> Train 2000 iterations in 0 Seconds.
#> Final risk based on the train set: 0.1
Looking at the predictions again shows how they differ from the values before training. During the fitting process, the model updated the predictions stored in the response object:
knitr::kable(head(data.frame(
target = df$qsec_cat,
prediction = obj_response$getPrediction(),
prediction_transformed = obj_response$getPredictionTransform(),
prediction_response = obj_response$getPredictionResponse()
)))

| target | prediction | prediction_transformed | prediction_response |
|---|---|---|---|
| fast | -0.0082833 | 0.4979292 | -1 |
| slow | -1.0690678 | 0.2555804 | -1 |
| slow | -3.1015214 | 0.0430445 | -1 |
| slow | -5.9979559 | 0.0024777 | -1 |
| slow | -2.2784370 | 0.0929246 | -1 |
| slow | -2.7290270 | 0.0612821 | -1 |
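The columns of this table are linked by the logistic function and the threshold of 0.6 set earlier. As a rough consistency check, the sketch below recomputes the last two columns from the printed scores in base R. It does not call compboost, and the variable names are chosen here for illustration. Small deviations are possible because the printed scores are rounded.

```r
# Scores copied from the prediction column of the table above.
prediction = c(-0.0082833, -1.0690678, -3.1015214,
               -5.9979559, -2.2784370, -2.7290270)

# Logistic transformation, then labeling with the threshold 0.6.
prediction_transformed = plogis(prediction)
prediction_response = ifelse(prediction_transformed >= 0.6, 1, -1)

round(prediction_transformed, 7)
prediction_response
```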
