Functions
LightGBM.LGBMClassification
LightGBM.LGBMRegression
LightGBM.LGBM_BoosterUpdateOneIterCustom
LightGBM.cv
LightGBM.fit!
LightGBM.gain_importance
LightGBM.loadmodel!
LightGBM.predict
LightGBM.savemodel
LightGBM.search_cv
LightGBM.split_importance
LightGBM.LGBMClassification — Method

LGBMClassification(; [
objective = "multiclass",
boosting = "gbdt",
num_iterations = 10,
learning_rate = .1,
num_leaves = 127,
max_depth = -1,
tree_learner = "serial",
num_threads = Sys.CPU_THREADS,
histogram_pool_size = -1.,
min_data_in_leaf = 100,
min_sum_hessian_in_leaf = 1e-3,
max_delta_step = 0.,
lambda_l1 = 0.,
lambda_l2 = 0.,
min_gain_to_split = 0.,
feature_fraction = 1.,
feature_fraction_bynode = 1.,
feature_fraction_seed = 2,
bagging_fraction = 1.,
pos_bagging_fraction = 1.,
neg_bagging_fraction = 1.,
bagging_freq = 0,
bagging_seed = 3,
early_stopping_round = 0,
extra_trees = false,
extra_seed = 6,
max_bin = 255,
bin_construct_sample_cnt = 200000,
data_random_seed = 1,
init_score = "",
is_sparse = true,
save_binary = false,
categorical_feature = Int[],
use_missing = true,
is_unbalance = false,
boost_from_average = true,
scale_pos_weight = 1.0,
sigmoid = 1.0,
drop_rate = 0.1,
max_drop = 50,
skip_drop = 0.5,
xgboost_dart_mode = false,
uniform_drop = false,
drop_seed = 4,
top_rate = 0.2,
other_rate = 0.1,
min_data_per_group = 100,
max_cat_threshold = 32,
cat_l2 = 10.0,
cat_smooth = 10.0,
metric = ["multi_logloss"],
metric_freq = 1,
is_training_metric = false,
ndcg_at = Int[],
num_machines = 1,
local_listen_port = 12400,
time_out = 120,
machine_list_file = "",
num_class = 1,
device_type="cpu",
gpu_use_dp = false,
gpu_platform_id = -1,
gpu_device_id = -1,
num_gpu = 1,
force_col_wise = false,
force_row_wise = false,
])
Return a LGBMClassification estimator.
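As a minimal sketch, the constructor can be called with a handful of the keyword arguments above overridden. The stock objective is "multiclass", so a binary task swaps the objective and metric (all names below come from the signature above):

```julia
using LightGBM

# Binary classifier: override the "multiclass" default and pick a matching metric.
estimator = LightGBM.LGBMClassification(
    objective = "binary",
    num_iterations = 100,
    learning_rate = 0.1,
    num_leaves = 31,
    metric = ["binary_logloss"],
)
```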
LightGBM.LGBMRegression — Method

LGBMRegression(; [
objective = "regression",
boosting = "gbdt",
num_iterations = 10,
learning_rate = .1,
num_leaves = 127,
max_depth = -1,
tree_learner = "serial",
num_threads = Sys.CPU_THREADS,
histogram_pool_size = -1.,
min_data_in_leaf = 100,
min_sum_hessian_in_leaf = 1e-3,
max_delta_step = 0.,
lambda_l1 = 0.,
lambda_l2 = 0.,
min_gain_to_split = 0.,
feature_fraction = 1.,
feature_fraction_bynode = 1.,
feature_fraction_seed = 2,
bagging_fraction = 1.,
bagging_freq = 0,
bagging_seed = 3,
early_stopping_round = 0,
extra_trees = false,
extra_seed = 6,
max_bin = 255,
bin_construct_sample_cnt = 200000,
data_random_seed = 1,
init_score = "",
is_sparse = true,
save_binary = false,
categorical_feature = Int[],
use_missing = true,
feature_pre_filter = true,
is_unbalance = false,
boost_from_average = true,
alpha = 0.9,
drop_rate = 0.1,
max_drop = 50,
skip_drop = 0.5,
xgboost_dart_mode = false,
uniform_drop = false,
drop_seed = 4,
top_rate = 0.2,
other_rate = 0.1,
min_data_per_group = 100,
max_cat_threshold = 32,
cat_l2 = 10.0,
cat_smooth = 10.0,
metric = ["l2"],
metric_freq = 1,
is_training_metric = false,
ndcg_at = Int[],
num_machines = 1,
local_listen_port = 12400,
time_out = 120,
machine_list_file = "",
device_type="cpu",
gpu_use_dp = false,
gpu_platform_id = -1,
gpu_device_id = -1,
num_gpu = 1,
force_col_wise = false,
force_row_wise = false,
])
Return a LGBMRegression estimator.
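Likewise, a minimal regressor sketch using only parameters from the signature above, here with L1 regularisation and early stopping (early stopping only takes effect when a validation set is supplied to `fit!`):

```julia
using LightGBM

estimator = LightGBM.LGBMRegression(
    num_iterations = 500,
    learning_rate = 0.05,
    early_stopping_round = 10,   # needs a validation set at fit! time
    lambda_l1 = 1.0,
    metric = ["l2"],
)
```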
LightGBM.LGBM_BoosterUpdateOneIterCustom — Method

LGBM_BoosterUpdateOneIterCustom

Pass gradients and second derivatives corresponding to some custom loss function. The gradients and second derivatives must have the same cardinality as the training data multiplied by the number of models. Also, trying to run this on a booster without data will fail.
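A hedged sketch of a single custom-objective update for squared-error loss. The exact call shape used here (booster, gradients, Hessians as `Float32` vectors) is an assumption based on the description above and the underlying C API; verify it against the wrapper before relying on it:

```julia
using LightGBM

X, y = rand(1000, 10), rand(1000)
estimator = LightGBM.LGBMRegression(num_iterations = 1)
LightGBM.fit!(estimator, X, y)  # the booster must have training data attached

# Gradient and Hessian of squared-error loss: one entry per training row
# (times the number of models, which is 1 here).
raw = vec(LightGBM.predict(estimator, X; predict_type = 1))
grad = Float32.(raw .- y)
hess = ones(Float32, length(y))

# Assumed signature -- check the wrapper's actual definition.
LightGBM.LGBM_BoosterUpdateOneIterCustom(estimator.booster, grad, hess)
```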
LightGBM.cv — Method

cv(estimator, X, y, splits; [verbosity = 1])
Cross-validate the `estimator` with features data `X` and label `y`. The iterable `splits` provides vectors of indices for the training dataset. The remaining indices are used to create the validation dataset. Alternatively, `cv` can be called with an input `Dataset` class.

Return a dictionary with an entry for the validation dataset and, if the parameter `is_training_metric` is set in the `estimator`, an entry for the training dataset. Each entry of the dictionary is another dictionary with an entry for each validation metric in the `estimator`. Each of these entries is an array that holds the validation metric's value for each dataset, at the last valid iteration.
Arguments

- `estimator::LGBMEstimator`: the estimator to be fit.
- `X::Matrix{TX<:Real}`: the features data.
- `y::Vector{Ty<:Real}`: the labels.
- `dataset::Dataset`: prepared dataset (either (X, y) or a dataset needs to be specified as input).
- `splits`: the iterable providing arrays of indices for the training dataset.
- `verbosity::Integer`: keyword argument that controls LightGBM's verbosity. `< 0` for fatal logs only, `0` includes warning logs, `1` includes info logs, and `> 1` includes debug logs.
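For instance, a two-fold style run might look like the sketch below (random data, purely illustrative), where each element of `splits` holds the training indices of one fold and the remaining rows become that fold's validation set:

```julia
using LightGBM

X, y = rand(500, 10), rand(500)
estimator = LightGBM.LGBMRegression(num_iterations = 50, metric = ["l2"])

# Training indices per fold; rows not listed form the validation set.
splits = [collect(1:250), collect(251:500)]

results = LightGBM.cv(estimator, X, y, splits; verbosity = 0)
```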
LightGBM.fit! — Method

fit!(estimator, num_iterations, X, y[, test...]; [verbosity = 1, is_row_major = false])
fit!(estimator, X, y[, test...]; [verbosity = 1, is_row_major = false])
fit!(estimator, X, y, train_indices[, test_indices...]; [verbosity = 1, is_row_major = false])
fit!(estimator, train_dataset[, test_datasets...]; [verbosity = 1])
Fit the `estimator` with features data `X` and label `y`, using the X-y pairs in `test` as validation sets. Alternatively, fit the `estimator` with `train_dataset` and `test_datasets` in the form of `Dataset` class(es).

Return a dictionary with an entry for each validation set. Each entry of the dictionary is another dictionary with an entry for each validation metric in the `estimator`. Each of these entries is an array that holds the validation metric's value at each iteration.
Positional Arguments

- `estimator::LGBMEstimator`: the estimator to be fit.

and either

- `X::AbstractMatrix{TX<:Real}`: the features data. May be a `SparseArrays.SparseMatrixCSC`.
- `y::Vector{Ty<:Real}`: the labels.
- `test::Tuple{AbstractMatrix{TX},Vector{Ty}}...`: (optional) contains one or more tuples of X-y pairs of the same types as `X` and `y` that should be used as validation sets. Each may be a `SparseArrays.SparseMatrixCSC`, and sparse/dense can be mixed and matched among these test sets and the train set.

or

- `train_dataset::Dataset`: prepared train dataset.
- `test_datasets::Vector{Dataset}`: (optional) prepared test datasets.
Keyword Arguments

- `verbosity::Integer`: keyword argument that controls LightGBM's verbosity. `< 0` for fatal logs only, `0` includes warning logs, `1` includes info logs, and `> 1` includes debug logs.
- `is_row_major::Bool`: keyword argument that indicates whether or not `X` is row-major. `true` indicates that it is row-major, `false` indicates that it is column-major (Julia's default). Should be consistent across train/test. Does not apply to `SparseArrays.SparseMatrixCSC` or `Dataset` constructors.
- `weights::Vector{Tw<:Real}`: the training weights.
- `init_score::Vector{Ti<:Real}`: the init scores.
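A minimal dense-matrix sketch with one validation set (random data, purely illustrative):

```julia
using LightGBM

X_train, y_train = rand(1000, 10), rand(1000)
X_test, y_test = rand(200, 10), rand(200)

estimator = LightGBM.LGBMRegression(num_iterations = 100, metric = ["l2"])

# (X_test, y_test) is tracked as a validation set; the returned dictionary
# holds the l2 value at each iteration for it.
results = LightGBM.fit!(estimator, X_train, y_train, (X_test, y_test); verbosity = 0)
```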
LightGBM.gain_importance — Method

gain_importance(estimator, num_iteration)
gain_importance(estimator)
Returns the importance of a fitted booster in terms of information gain across all boostings, or up to `num_iteration` boostings.
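For example, on a fitted estimator (assuming the return holds one importance value per feature):

```julia
using LightGBM

X, y = rand(500, 5), rand(500)
estimator = LightGBM.LGBMRegression(num_iterations = 50)
LightGBM.fit!(estimator, X, y)

gains_all = LightGBM.gain_importance(estimator)      # across all boostings
gains_10  = LightGBM.gain_importance(estimator, 10)  # first 10 boostings only
```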
LightGBM.loadmodel! — Method

loadmodel!(estimator, filename)
Load the fitted model `filename` into `estimator`. Note that this only loads the fitted model, not the parameters or data of the estimator whose model was saved as `filename`.
Arguments

- `estimator::LGBMEstimator`: the estimator into which the model is loaded.
- `filename::String`: the name of the file that contains the model.
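For example, loading a model into a fresh estimator (the file path is hypothetical; remember that keyword parameters are not restored):

```julia
using LightGBM

estimator = LightGBM.LGBMRegression()        # parameters must be set by hand
LightGBM.loadmodel!(estimator, "model.txt")  # "model.txt": hypothetical saved model
```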
LightGBM.predict — Method

predict(estimator, X; [predict_type = 0, num_iterations = -1, verbosity = 1, is_row_major = false])
Return a MATRIX with the labels that the `estimator` predicts for features data `X`. Use `dropdims` if a vector is required.
Arguments

- `estimator::LGBMEstimator`: the estimator to use in the prediction.
- `X::Matrix{T<:Real}`: the features data.
- `predict_type::Integer`: keyword argument that controls the prediction type. `0` for normal scores with transform (if needed), `1` for raw scores, `2` for leaf indices, `3` for SHAP contributions.
- `num_iterations::Integer`: keyword argument that sets the number of iterations of the model to use in the prediction. `< 0` for all iterations.
- `verbosity::Integer`: keyword argument that controls LightGBM's verbosity. `< 0` for fatal logs only, `0` includes warning logs, `1` includes info logs, and `> 1` includes debug logs.
- `is_row_major::Bool`: keyword argument that indicates whether or not `X` is row-major. `true` indicates that it is row-major, `false` indicates that it is column-major (Julia's default).
One can obtain some form of feature importances by averaging SHAP contributions across predictions, i.e. `mean(LightGBM.predict(estimator, X; predict_type=3); dims=1)`.
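A short sketch of the common prediction paths, assuming the single-output case returns an n-by-1 matrix:

```julia
using LightGBM

X, y = rand(500, 5), rand(500)
estimator = LightGBM.LGBMRegression(num_iterations = 50)
LightGBM.fit!(estimator, X, y)

preds = LightGBM.predict(estimator, X)                    # n-by-1 Matrix
preds_vec = dropdims(preds; dims = 2)                     # collapse to a Vector
raw = LightGBM.predict(estimator, X; predict_type = 1)    # raw scores
shap = LightGBM.predict(estimator, X; predict_type = 3)   # SHAP contributions
```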
LightGBM.savemodel — Method

savemodel(estimator, filename; [num_iteration = -1])
Save the fitted model in `estimator` as `filename`.
Arguments

- `estimator::LGBMEstimator`: the estimator whose model should be saved.
- `filename::String`: the name of the file to save the model in.
- `num_iteration::Integer`: keyword argument that sets the number of iterations of the model that should be saved. `< 0` for all iterations.
- `start_iteration`: start index of the iteration that should be saved.
- `feature_importance_type`: type of feature importance; can be `C_API_FEATURE_IMPORTANCE_SPLIT` or `C_API_FEATURE_IMPORTANCE_GAIN`.
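For example, saving only the first 50 iterations of a fitted model (the file path is hypothetical):

```julia
using LightGBM

X, y = rand(500, 5), rand(500)
estimator = LightGBM.LGBMRegression(num_iterations = 100)
LightGBM.fit!(estimator, X, y)

# Persist only the first 50 iterations of the fitted model.
LightGBM.savemodel(estimator, "model.txt"; num_iteration = 50)
```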
LightGBM.search_cv — Method

search_cv(estimator, X, y, splits, params; [verbosity = 1])
Exhaustive search over the specified sets of parameter values for the `estimator` with features data `X` and label `y`. The iterable `splits` provides vectors of indices for the training dataset. The remaining indices are used to create the validation dataset. Alternatively, `search_cv` can be called with an input `Dataset` class.

Return an array with a tuple for each set of parameter values, where the first entry is a set of parameter values and the second entry is the cross-validation outcome of those values. This outcome is a dictionary with an entry for the validation dataset and, if the parameter `is_training_metric` is set in the `estimator`, an entry for the training dataset. Each entry of the dictionary is another dictionary with an entry for each validation metric in the `estimator`. Each of these entries is an array that holds the validation metric's value for each dataset, at the last valid iteration.
Arguments

- `estimator::LGBMEstimator`: the estimator to be fit.
- `X::Matrix{TX<:Real}`: the features data.
- `y::Vector{Ty<:Real}`: the labels.
- `dataset::Dataset`: prepared dataset (either (X, y) or a dataset needs to be specified as input).
- `splits`: the iterable providing arrays of indices for the training dataset.
- `params`: the iterable providing dictionaries of pairs of parameters (Symbols) and values to configure the `estimator` with.
- `verbosity::Integer`: keyword argument that controls LightGBM's verbosity. `< 0` for fatal logs only, `0` includes warning logs, `1` includes info logs, and `> 1` includes debug logs.
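A small grid over two parameters, reusing the fold layout from the cv example above (random data, purely illustrative):

```julia
using LightGBM

X, y = rand(500, 10), rand(500)
estimator = LightGBM.LGBMRegression(num_iterations = 50, metric = ["l2"])
splits = [collect(1:250), collect(251:500)]

# One Dict per candidate combination, keyed by parameter Symbols.
params = [
    Dict(:learning_rate => lr, :num_leaves => nl)
    for lr in (0.05, 0.1) for nl in (31, 127)
]

outcomes = LightGBM.search_cv(estimator, X, y, splits, params; verbosity = 0)
```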
LightGBM.split_importance — Method

split_importance(estimator, num_iteration)
split_importance(estimator)
Returns the importance of a fitted booster in terms of the number of times a feature was used in a split across all boostings, or up to `num_iteration` boostings.
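Usage mirrors `gain_importance` above:

```julia
using LightGBM

X, y = rand(500, 5), rand(500)
estimator = LightGBM.LGBMRegression(num_iterations = 50)
LightGBM.fit!(estimator, X, y)

splits_all = LightGBM.split_importance(estimator)      # across all boostings
splits_10  = LightGBM.split_importance(estimator, 10)  # first 10 boostings only
```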