compboost
Created by Daniel Schalk
compboost
Implemented in C++.
Usable from R by exposing the important classes.
C++ documentation generated by doxygen.
compboost
Built on C++, with the interface from C++ to R using Rcpp.
RcppArmadillo for linear algebra operations.
C++:
Some operations are expensive in R; in C++ they aren't.
C++
train.test = "
arma::mat trainTest (arma::mat X, arma::mat y)
{
  return arma::inv(X.t() * X) * X.t() * y;
}
"
trainTestR = function (X, y) { return (solve(t(X) %*% X) %*% t(X) %*% y) }
Rcpp::cppFunction(code = train.test, depends = "RcppArmadillo", rebuild = TRUE)
n = 50000
p = 40
X = matrix(rnorm(n * p), nrow = n, ncol = p)
y = matrix(runif(n), nrow = n, ncol = 1)
# Equivalent to spline train:
microbenchmark::microbenchmark(
  "C++" = trainTest(X, y),
  "R" = trainTestR(X, y)
)
## Unit: milliseconds
##  expr      min       lq      mean    median        uq       max neval cld
##   C++ 40.19887 41.17449  42.16829  41.70505  42.99132  46.74248   100   a
##     R 95.91568 98.72515 111.33488 102.18383 105.70319 211.14321   100   b
C++
HTML, CSS and JavaScript
HTML to explore the model.
JavaScript and D3.
R
Methods (predict, print, summary, plot) for a compboost class.
compboost
The QuadraticLoss class for:
C++ class
C++ wrapper class
R S4 class
From C++ to R.
C++ Classes: Abstract Loss
class Loss
{
  public:
    /// Specific loss function
    virtual arma::vec definedLoss (const arma::vec&, const arma::vec&) const = 0;

    /// Gradient of loss functions for pseudo residuals
    virtual arma::vec definedGradient (const arma::vec&, const arma::vec&) const = 0;

    /// Constant initialization of the empirical risk
    virtual double constantInitializer (const arma::vec&) const = 0;

    virtual ~Loss ();

  protected:
    /// Custom offset:
    double custom_offset;

    /// Tag if a custom offset is used
    bool use_custom_offset = false;

    /// Weights:
    arma::vec weights;
};
C++ Classes: Child QuadraticLoss
class QuadraticLoss : public Loss
{
  public:
    /// Default Constructor
    QuadraticLoss ();

    /// Constructor to initialize custom offset
    QuadraticLoss (const double&);

    /// Specific loss function
    arma::vec definedLoss (const arma::vec&, const arma::vec&) const;

    /// Gradient of loss functions for pseudo residuals
    arma::vec definedGradient (const arma::vec&, const arma::vec&) const;

    /// Constant initialization of the empirical risk
    double constantInitializer (const arma::vec&) const;
};
C++ Classes: Child QuadraticLoss Implementation
QuadraticLoss::QuadraticLoss () { }

QuadraticLoss::QuadraticLoss (const double& custom_offset0)
{
  custom_offset = custom_offset0;
  use_custom_offset = true;
}

arma::vec QuadraticLoss::definedLoss (const arma::vec& true_value,
  const arma::vec& prediction) const
{
  return arma::pow(true_value - prediction, 2) / 2;
}

arma::vec QuadraticLoss::definedGradient (const arma::vec& true_value,
  const arma::vec& prediction) const
{
  return prediction - true_value;
}

double QuadraticLoss::constantInitializer (const arma::vec& true_value) const
{
  if (use_custom_offset) { return custom_offset; }
  return arma::mean(true_value);
}
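The three member functions above can be tried out without Armadillo. The following is a minimal sketch, not compboost code: it assumes std::vector in place of arma::vec, and the free functions quadraticLoss, quadraticGradient and quadraticConstantInitializer are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-ins for the QuadraticLoss members, using std::vector
// instead of arma::vec so that the sketch has no dependencies.

// Pointwise loss L(y, f) = (y - f)^2 / 2.
std::vector<double> quadraticLoss (const std::vector<double>& true_value,
  const std::vector<double>& prediction)
{
  assert(true_value.size() == prediction.size());
  std::vector<double> out(true_value.size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    const double diff = true_value[i] - prediction[i];
    out[i] = diff * diff / 2;
  }
  return out;
}

// Pointwise derivative of the loss with respect to the prediction: f - y.
std::vector<double> quadraticGradient (const std::vector<double>& true_value,
  const std::vector<double>& prediction)
{
  assert(true_value.size() == prediction.size());
  std::vector<double> out(true_value.size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = prediction[i] - true_value[i];
  }
  return out;
}

// Constant that minimizes the summed quadratic loss: the mean of y.
double quadraticConstantInitializer (const std::vector<double>& true_value)
{
  assert(!true_value.empty());
  const double sum = std::accumulate(true_value.begin(), true_value.end(), 0.0);
  return sum / static_cast<double>(true_value.size());
}
```

The pseudo residuals used in gradient boosting are the negative of this gradient, that is, true_value - prediction for the quadratic loss.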
C++ Wrapper Classes
class QuadraticLossWrapper : public LossWrapper
{
  public:
    QuadraticLossWrapper () { obj = new loss::QuadraticLoss(); }
    QuadraticLossWrapper (double custom_offset) { obj = new loss::QuadraticLoss(custom_offset); }

    arma::vec testLoss (arma::vec& true_value, arma::vec& prediction) {
      return obj->definedLoss(true_value, prediction);
    }
    arma::vec testGradient (arma::vec& true_value, arma::vec& prediction) {
      return obj->definedGradient(true_value, prediction);
    }
    double testConstantInitializer (arma::vec& true_value) {
      return obj->constantInitializer(true_value);
    }
};
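The definition of LossWrapper is not shown on the slides. The sketch below illustrates the wrapper pattern under the assumption that LossWrapper holds a pointer named obj to the abstract Loss. The classes are simplified to scalar arguments, and std::unique_ptr is swapped in for the raw pointer so that the wrapped object is freed automatically; this is an illustration of the ownership pattern, not the compboost implementation.

```cpp
#include <cassert>
#include <memory>

// Simplified scalar versions of the loss classes, only to show the pattern.
class Loss
{
  public:
    virtual double definedLoss (double true_value, double prediction) const = 0;
    virtual ~Loss () = default;
};

class QuadraticLoss : public Loss
{
  public:
    double definedLoss (double true_value, double prediction) const override
    {
      const double diff = true_value - prediction;
      return diff * diff / 2;
    }
};

// Assumed shape of the wrapper base class: it owns a pointer to the
// abstract Loss. std::unique_ptr deletes the object with the wrapper.
class LossWrapper
{
  public:
    virtual ~LossWrapper () = default;

  protected:
    std::unique_ptr<Loss> obj;
};

// The wrapper creates the concrete loss and forwards calls to it.
class QuadraticLossWrapper : public LossWrapper
{
  public:
    QuadraticLossWrapper () { obj.reset(new QuadraticLoss()); }

    double testLoss (double true_value, double prediction) const
    {
      assert(obj != nullptr);
      return obj->definedLoss(true_value, prediction);
    }
};
```

The wrapper only forwards calls, so the core loss classes stay free of any interface code.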
Rcpp Module
RCPP_EXPOSED_CLASS(LossWrapper);
RCPP_MODULE (loss_module)
{
  using namespace Rcpp;

  class_< LossWrapper > ("Loss")
    .constructor ()
  ;

  class_< QuadraticLossWrapper > ("QuadraticLoss")
    .derives< LossWrapper > ("Loss")
    .constructor ()
    .constructor < double > ()
    .method("testLoss", &QuadraticLossWrapper::testLoss,
      "Test the defined loss function of the loss")
    .method("testGradient", &QuadraticLossWrapper::testGradient,
      "Test the defined gradient of the loss")
    .method("testConstantInitializer", &QuadraticLossWrapper::testConstantInitializer,
      "Test the constant initializer function of the loss")
  ;
}
R S4 Classes
QuadraticLoss
## C++ class 'QuadraticLoss' <0000000010de5960>
## Constructors:
##     QuadraticLoss()
##     QuadraticLoss(double)
##
## Fields: No public fields exposed by this class
##
## Methods:
##     double testConstantInitializer(arma::Col< double >)
##         docstring : Test the constant initializer function of the loss
##     arma::Col< double > testGradient(arma::Col< double >, arma::Col< double >)
##         docstring : Test the defined gradient of the loss
##     arma::Col< double > testLoss(arma::Col< double >, arma::Col< double >)
##         docstring : Test the defined loss function of the loss
##
q.loss = QuadraticLoss$new()
q.loss$testConstantInitializer(1:10)
## 5.5
Benchmarks run with batchtools.
compboost as well as mboost stores the inverse to boost the performance of the algorithm.
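Why storing an inverse helps can be sketched as follows. The slides do not say which inverse is meant; the sketch assumes it is the inverse of the cross product of a base-learner's design matrix, which stays the same in every iteration while the target (the pseudo residuals) changes. LinearBaseLearner and Coefficients are hypothetical names, and the base-learner is reduced to an intercept and one slope so that the 2 x 2 inverse can be written in closed form.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Coefficients
{
  double intercept;
  double slope;
};

// Hypothetical linear base-learner with design matrix X = [1, x].
class LinearBaseLearner
{
  public:
    // Compute and store (X^T X)^{-1} once, when the data is known.
    explicit LinearBaseLearner (const std::vector<double>& x0) : x(x0)
    {
      const double n = static_cast<double>(x.size());
      double sx = 0.0, sxx = 0.0;
      for (double xi : x) { sx += xi; sxx += xi * xi; }
      const double det = n * sxx - sx * sx;
      assert(det != 0.0);  // needs at least two distinct x values
      inv00 = sxx / det;
      inv01 = -sx / det;
      inv11 = n / det;
    }

    // Each call only forms X^T target and multiplies by the stored inverse.
    Coefficients train (const std::vector<double>& target) const
    {
      assert(target.size() == x.size());
      double st = 0.0, sxt = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i) {
        st += target[i];
        sxt += x[i] * target[i];
      }
      return Coefficients{ inv00 * st + inv01 * sxt, inv01 * st + inv11 * sxt };
    }

  private:
    std::vector<double> x;
    double inv00, inv01, inv11;  // entries of the symmetric 2 x 2 inverse
};
```

With many boosting iterations, the inversion cost is paid once per base-learner instead of once per iteration, at the price of keeping the inverse in memory.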
Comparison of compboost and mboost, with sparse matrices disabled and enabled for mboost.
R packages for measuring memory usage have issues tracking memory allocations done on the heap by C++.
A C++ program is used to track the memory usage every second.
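The tracking program itself is not shown on the slides. One way such a tracker could look is sketched below, under the assumption of a Linux system where the resident memory of a process can be read from the VmRSS line of /proc/<pid>/status. The function names parseVmRssKb and trackMemory are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Parse the resident set size in kB from the contents of a Linux
// /proc/<pid>/status file. Returns -1 if no VmRSS value is found.
long parseVmRssKb (std::istream& status)
{
  std::string line;
  while (std::getline(status, line)) {
    if (line.compare(0, 6, "VmRSS:") == 0) {
      std::istringstream fields(line.substr(6));
      long kb = -1;
      if (fields >> kb) { return kb; }
      return -1;
    }
  }
  return -1;
}

// Sample the memory usage of process `pid` once per second, `samples` times.
// A sample is -1 if the status file cannot be read (assumption: Linux only).
std::vector<long> trackMemory (int pid, int samples)
{
  assert(samples >= 0);
  std::vector<long> usage;
  for (int i = 0; i < samples; ++i) {
    std::ifstream status("/proc/" + std::to_string(pid) + "/status");
    usage.push_back(parseVmRssKb(status));
    std::this_thread::sleep_for(std::chrono::seconds(1));
  }
  return usage;
}
```

Because the operating system reports the memory of the whole process, heap allocations made by C++ code are included, which is the part the R packages miss.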
R API
R cannot just be used to run the main algorithm but also to:
R users.
Use R to transform the data in parallel instead of sequentially.
Use C++ to find the best base-learner in parallel.
Depending on the structure of the data (dense or sparse), it is possible to increase performance:
dense.spline.train = "
arma::mat test1 (arma::mat Z, arma::mat X, arma::mat y) { return Z * X * y; }
"
sparse.spline.train = "
arma::mat test2 (arma::mat Z, arma::sp_mat X, arma::mat y)
{
  // Brackets are important due to CSC format:
  return Z * (X * y);
}
"
Rcpp::cppFunction(code = dense.spline.train, depends = "RcppArmadillo", rebuild = TRUE)
Rcpp::cppFunction(code = sparse.spline.train, depends = "RcppArmadillo", rebuild = TRUE)
n = 100000; p = 40;
Z = matrix(runif(p^2), p, p)
X = matrix(0, nrow = n, ncol = p)
X[1:n, sample(1:p, 5)] = rnorm(10000 * 5)
y = matrix(runif(n), nrow = n, ncol = 1)
betas = matrix(runif(p), nrow = p, ncol = 1)
X.sparse = as(X, "sparseMatrix")
# Equivalent to spline train:
microbenchmark::microbenchmark(
  "dense spline train" = test1(Z, t(X), y),
  "sparse spline train" = test2(Z, t(X.sparse), y)
)
## Unit: milliseconds
##                 expr      min       lq     mean   median       uq      max neval cld
##   dense spline train 26.72330 28.01085 44.14224 29.70544 32.75517 144.2635   100   b
##  sparse spline train 10.13785 10.55193 15.07657 10.93836 11.60216 130.8875   100  a
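The comment on the CSC format in the sparse version can be made concrete with a small sketch. Armadillo's sp_mat stores matrices in compressed sparse column (CSC) form; the struct CscMatrix and the function multiply below are hypothetical, standard-library-only stand-ins that show why a sparse matrix times a vector is cheap: only the stored nonzero entries are visited.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal compressed sparse column (CSC) matrix.
struct CscMatrix
{
  std::size_t n_rows;
  std::size_t n_cols;
  std::vector<double> values;        // nonzero values, column by column
  std::vector<std::size_t> row_idx;  // row index of each nonzero value
  std::vector<std::size_t> col_ptr;  // start of each column in values, size n_cols + 1
};

// y = A * x, touching only the stored nonzero entries of A.
std::vector<double> multiply (const CscMatrix& A, const std::vector<double>& x)
{
  assert(x.size() == A.n_cols);
  assert(A.col_ptr.size() == A.n_cols + 1);
  std::vector<double> y(A.n_rows, 0.0);
  for (std::size_t j = 0; j < A.n_cols; ++j) {
    for (std::size_t k = A.col_ptr[j]; k < A.col_ptr[j + 1]; ++k) {
      y[A.row_idx[k]] += A.values[k] * x[j];
    }
  }
  return y;
}
```

Evaluating X * y first reduces the sparse matrix to a short dense vector, so the remaining product with Z is a small dense matrix-vector multiplication. Without the brackets, Z * X would be formed first, which gives up most of the benefit of the sparse storage.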
compboost
Visualization with R plotting systems like base or ggplot2.
Visualization with HTML and JavaScript.