Functions
- LightGBM.LGBMClassification
- LightGBM.LGBMRegression
- LightGBM.LGBM_BoosterUpdateOneIterCustom
- LightGBM.cv
- LightGBM.fit!
- LightGBM.gain_importance
- LightGBM.loadmodel!
- LightGBM.predict
- LightGBM.savemodel
- LightGBM.search_cv
- LightGBM.split_importance
LightGBM.LGBMClassification — Method

```julia
LGBMClassification(; [
objective = "multiclass",
boosting = "gbdt",
num_iterations = 10,
learning_rate = .1,
num_leaves = 127,
max_depth = -1,
tree_learner = "serial",
num_threads = Sys.CPU_THREADS,
histogram_pool_size = -1.,
min_data_in_leaf = 100,
min_sum_hessian_in_leaf = 1e-3,
max_delta_step = 0.,
lambda_l1 = 0.,
lambda_l2 = 0.,
min_gain_to_split = 0.,
feature_fraction = 1.,
feature_fraction_bynode = 1.,
feature_fraction_seed = 2,
bagging_fraction = 1.,
pos_bagging_fraction = 1.,
neg_bagging_fraction = 1.,
bagging_freq = 0,
bagging_seed = 3,
early_stopping_round = 0,
extra_trees = false,
extra_seed = 6,
max_bin = 255,
bin_construct_sample_cnt = 200000,
data_random_seed = 1,
init_score = "",
is_sparse = true,
save_binary = false,
categorical_feature = Int[],
use_missing = true,
is_unbalance = false,
boost_from_average = true,
scale_pos_weight = 1.0,
sigmoid = 1.0,
drop_rate = 0.1,
max_drop = 50,
skip_drop = 0.5,
xgboost_dart_mode = false,
uniform_drop = false,
drop_seed = 4,
top_rate = 0.2,
other_rate = 0.1,
min_data_per_group = 100,
max_cat_threshold = 32,
cat_l2 = 10.0,
cat_smooth = 10.0,
metric = ["multi_logloss"],
metric_freq = 1,
is_training_metric = false,
ndcg_at = Int[],
num_machines = 1,
local_listen_port = 12400,
time_out = 120,
machine_list_file = "",
num_class = 1,
device_type="cpu",
gpu_use_dp = false,
gpu_platform_id = -1,
gpu_device_id = -1,
num_gpu = 1,
force_col_wise = false,
force_row_wise = false,
])
```

Return a LGBMClassification estimator.
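For example, a binary classifier can be constructed by overriding only the defaults that matter for the task. A minimal sketch (the parameter values are illustrative, not recommendations):

```julia
using LightGBM

# Binary classification: override the multiclass defaults.
estimator = LightGBM.LGBMClassification(
    objective = "binary",
    num_class = 1,
    num_iterations = 100,
    learning_rate = 0.1,
    early_stopping_round = 5,
    metric = ["binary_logloss"],
)
```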
LightGBM.LGBMRegression — Method

```julia
LGBMRegression(; [
objective = "regression",
boosting = "gbdt",
num_iterations = 10,
learning_rate = .1,
num_leaves = 127,
max_depth = -1,
tree_learner = "serial",
num_threads = Sys.CPU_THREADS,
histogram_pool_size = -1.,
min_data_in_leaf = 100,
min_sum_hessian_in_leaf = 1e-3,
max_delta_step = 0.,
lambda_l1 = 0.,
lambda_l2 = 0.,
min_gain_to_split = 0.,
feature_fraction = 1.,
feature_fraction_bynode = 1.,
feature_fraction_seed = 2,
bagging_fraction = 1.,
bagging_freq = 0,
bagging_seed = 3,
early_stopping_round = 0,
extra_trees = false,
extra_seed = 6,
max_bin = 255,
bin_construct_sample_cnt = 200000,
data_random_seed = 1,
init_score = "",
is_sparse = true,
save_binary = false,
categorical_feature = Int[],
use_missing = true,
feature_pre_filter = true,
is_unbalance = false,
boost_from_average = true,
alpha = 0.9,
drop_rate = 0.1,
max_drop = 50,
skip_drop = 0.5,
xgboost_dart_mode = false,
uniform_drop = false,
drop_seed = 4,
top_rate = 0.2,
other_rate = 0.1,
min_data_per_group = 100,
max_cat_threshold = 32,
cat_l2 = 10.0,
cat_smooth = 10.0,
metric = ["l2"],
metric_freq = 1,
is_training_metric = false,
ndcg_at = Int[],
num_machines = 1,
local_listen_port = 12400,
time_out = 120,
machine_list_file = "",
device_type="cpu",
gpu_use_dp = false,
gpu_platform_id = -1,
gpu_device_id = -1,
num_gpu = 1,
force_col_wise = false,
force_row_wise = false,
])
```

Return a LGBMRegression estimator.
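Similarly, a minimal regression sketch (values are illustrative only):

```julia
using LightGBM

# L2 regression with mild regularisation.
estimator = LightGBM.LGBMRegression(
    objective = "regression",
    num_iterations = 50,
    learning_rate = 0.05,
    lambda_l2 = 1.0,
    metric = ["l2", "l1"],
)
```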
LightGBM.LGBM_BoosterUpdateOneIterCustom — Method

```julia
LGBM_BoosterUpdateOneIterCustom
```

Pass gradients and second derivatives (hessians) corresponding to a custom loss function. The gradient and hessian vectors must each have cardinality equal to the number of training data points multiplied by the number of models. Also, trying to run this on a booster without training data attached will fail.
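As a hedged sketch, one custom boosting round for squared-error loss might look as follows. This assumes the estimator exposes its underlying booster as estimator.booster and that gradients and hessians are passed as Float32 vectors; verify both against the wrapper source before relying on them.

```julia
using LightGBM

X = randn(1000, 10)
y = X[:, 1] .+ 0.1 .* randn(1000)

estimator = LightGBM.LGBMRegression(num_iterations = 1)
LightGBM.fit!(estimator, X, y; verbosity = -1)  # attaches training data to the booster

# Squared-error loss: gradient = prediction - target, hessian = 1.
preds = vec(LightGBM.predict(estimator, X; predict_type = 1))  # raw scores
grads = Float32.(preds .- y)
hess  = ones(Float32, length(y))

# Assumed field access (estimator.booster) and argument types.
LightGBM.LGBM_BoosterUpdateOneIterCustom(estimator.booster, grads, hess)
```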
LightGBM.cv — Method

```julia
cv(estimator, X, y, splits; [verbosity = 1])
```

Cross-validate the estimator with features data X and label y. The iterable splits provides vectors of indices for the training dataset; the remaining indices are used to create the validation dataset. Alternatively, cv can be called with an input Dataset class.
Return a dictionary with an entry for the validation dataset and, if the parameter is_training_metric is set in the estimator, an entry for the training dataset. Each entry of the dictionary is another dictionary with an entry for each validation metric in the estimator. Each of these entries is an array that holds the validation metric's value for each dataset, at the last valid iteration.
Arguments
- estimator::LGBMEstimator: the estimator to be fit.
- X::Matrix{TX<:Real}: the features data.
- y::Vector{Ty<:Real}: the labels.
- dataset::Dataset: prepared dataset (either (X, y) or dataset needs to be specified as input).
- splits: the iterable providing arrays of indices for the training dataset.
- verbosity::Integer: keyword argument that controls LightGBM's verbosity. < 0 for fatal logs only, 0 includes warning logs, 1 includes info logs, and > 1 includes debug logs.
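A minimal sketch of cross-validation over random index splits (the data and split scheme are illustrative):

```julia
using LightGBM, Random

X = randn(300, 5)
y = rand(0:1, 300)
estimator = LightGBM.LGBMClassification(
    objective = "binary", num_class = 1, metric = ["binary_logloss"],
)

# Three random training splits of roughly 2/3 of the rows each; the
# remaining rows of each split form the corresponding validation set.
splits = (randsubseq(1:300, 2 / 3) for _ in 1:3)
results = LightGBM.cv(estimator, X, y, splits; verbosity = -1)
```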
LightGBM.fit! — Methodfit!(estimator, num_iterations, X, y[, test...]; [verbosity = 1, is_row_major = false])
fit!(estimator, X, y[, test...]; [verbosity = 1, is_row_major = false])
fit!(estimator, X, y, train_indices[, test_indices...]; [verbosity = 1, is_row_major = false])
fit!(estimator, train_dataset[, test_datasets...]; [verbosity = 1])
```

Fit the estimator with features data X and label y, using the X-y pairs in test as validation sets. Alternatively, fit the estimator with train_dataset and test_datasets in the form of Dataset class(es).
Return a dictionary with an entry for each validation set. Each entry of the dictionary is another dictionary with an entry for each validation metric in the estimator. Each of these entries is an array that holds the validation metric's value at each iteration.
Positional Arguments
- estimator::LGBMEstimator: the estimator to be fit.

and either

- X::AbstractMatrix{TX<:Real}: the features data. May be a SparseArrays.SparseMatrixCSC.
- y::Vector{Ty<:Real}: the labels.
- test::Tuple{AbstractMatrix{TX},Vector{Ty}}...: (optional) contains one or more tuples of X-y pairs of the same types as X and y that should be used as validation sets. May be a SparseArrays.SparseMatrixCSC, and sparse and dense matrices can be mixed and matched among these test sets and the training data.

or

- train_dataset::Dataset: prepared train_dataset.
- test_datasets::Vector{Dataset}: (optional) prepared test_datasets.
Keyword Arguments
- verbosity::Integer: keyword argument that controls LightGBM's verbosity. < 0 for fatal logs only, 0 includes warning logs, 1 includes info logs, and > 1 includes debug logs.
- is_row_major::Bool: keyword argument that indicates whether or not X is row-major. true indicates that it is row-major, false indicates that it is column-major (Julia's default). Should be consistent across train/test. Does not apply to SparseArrays.SparseMatrixCSC or Dataset constructors.
- weights::Vector{Tw<:Real}: the training weights.
- init_score::Vector{Ti<:Real}: the init scores.
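A minimal sketch of fitting with one held-out validation set (synthetic data, illustrative parameters):

```julia
using LightGBM

X_train, y_train = randn(1000, 20), rand(0:1, 1000)
X_test,  y_test  = randn(200, 20),  rand(0:1, 200)

estimator = LightGBM.LGBMClassification(
    objective = "binary", num_class = 1,
    metric = ["binary_logloss"], early_stopping_round = 5,
)

# The (X_test, y_test) tuple is tracked as a validation set during training.
results = LightGBM.fit!(estimator, X_train, y_train, (X_test, y_test); verbosity = -1)
```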
LightGBM.gain_importance — Method

```julia
gain_importance(estimator, num_iteration)
gain_importance(estimator)
```

Returns the importance of a fitted booster in terms of information gain across all boostings, or up to `num_iteration` boostings.

LightGBM.loadmodel! — Method

```julia
loadmodel!(estimator, filename)
```

Load the fitted model filename into estimator. Note that this only loads the fitted model, not the parameters or data of the estimator whose model was saved as filename.
Arguments
- estimator::LGBMEstimator: the estimator to load the model into.
- filename::String: the name of the file that contains the model.
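A short sketch, assuming model.txt was previously written by savemodel; the estimator's parameters are not restored and must be set by hand:

```julia
using LightGBM

# Only the fitted booster is restored; parameters keep their defaults.
estimator = LightGBM.LGBMRegression()
LightGBM.loadmodel!(estimator, "model.txt")
```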
LightGBM.predict — Method

```julia
predict(estimator, X; [predict_type = 0, num_iterations = -1, verbosity = 1,
        is_row_major = false])
```

Return a MATRIX with the labels that the estimator predicts for features data X. Use dropdims if a vector is required.
Arguments
- estimator::LGBMEstimator: the estimator to use in the prediction.
- X::Matrix{T<:Real}: the features data.
- predict_type::Integer: keyword argument that controls the prediction type. 0 for normal scores with transform (if needed), 1 for raw scores, 2 for leaf indices, 3 for SHAP contributions.
- num_iterations::Integer: keyword argument that sets the number of iterations of the model to use in the prediction. < 0 for all iterations.
- verbosity::Integer: keyword argument that controls LightGBM's verbosity. < 0 for fatal logs only, 0 includes warning logs, 1 includes info logs, and > 1 includes debug logs.
- is_row_major::Bool: keyword argument that indicates whether or not X is row-major. true indicates that it is row-major, false indicates that it is column-major (Julia's default).
One can obtain some form of feature importances by averaging SHAP contributions across predictions, i.e. `mean(LightGBM.predict(estimator, X; predict_type=3); dims=1)`.
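A minimal end-to-end sketch of the prediction types (synthetic data; the binary setup mirrors the earlier examples):

```julia
using LightGBM

X, y = randn(200, 20), rand(0:1, 200)
estimator = LightGBM.LGBMClassification(
    objective = "binary", num_class = 1, metric = ["binary_logloss"],
)
LightGBM.fit!(estimator, X, y; verbosity = -1)

X_new = randn(50, 20)
probs = LightGBM.predict(estimator, X_new)                    # transformed scores, 50x1 matrix
raw   = LightGBM.predict(estimator, X_new; predict_type = 1)  # raw scores
yhat  = dropdims(probs; dims = 2)                             # matrix -> vector
```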
LightGBM.savemodel — Method

```julia
savemodel(estimator, filename; [num_iteration = -1])
```

Save the fitted model in estimator as filename.
Arguments
- estimator::LGBMEstimator: the estimator whose fitted model should be saved.
- filename::String: the name of the file to save the model in.
- num_iteration::Integer: keyword argument that sets the number of iterations of the model that should be saved. < 0 for all iterations.
- start_iteration: start index of the iteration that should be saved.
- feature_importance_type: type of feature importance; can be C_API_FEATURE_IMPORTANCE_SPLIT or C_API_FEATURE_IMPORTANCE_GAIN.
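A short sketch, assuming `estimator` has already been fit (for instance as in the predict example above):

```julia
using LightGBM

LightGBM.savemodel(estimator, "model.txt")                         # all iterations
LightGBM.savemodel(estimator, "model_10.txt"; num_iteration = 10)  # first 10 iterations only
```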
LightGBM.search_cv — Method

```julia
search_cv(estimator, X, y, splits, params; [verbosity = 1])
```

Exhaustive search over the specified sets of parameter values for the estimator with features data X and label y. The iterable splits provides vectors of indices for the training dataset; the remaining indices are used to create the validation dataset. Alternatively, search_cv can be called with an input Dataset class.
Return an array with a tuple for each set of parameter values, where the first entry is the set of parameter values and the second entry is the cross-validation outcome for those values. This outcome is a dictionary with an entry for the validation dataset and, if the parameter is_training_metric is set in the estimator, an entry for the training dataset. Each entry of the dictionary is another dictionary with an entry for each validation metric in the estimator. Each of these entries is an array that holds the validation metric's value for each dataset, at the last valid iteration.
Arguments
- estimator::LGBMEstimator: the estimator to be fit.
- X::Matrix{TX<:Real}: the features data.
- y::Vector{Ty<:Real}: the labels.
- dataset::Dataset: prepared dataset (either (X, y) or dataset needs to be specified as input).
- splits: the iterable providing arrays of indices for the training dataset.
- params: the iterable providing dictionaries of pairs of parameters (Symbols) and values to configure the estimator with.
- verbosity::Integer: keyword argument that controls LightGBM's verbosity. < 0 for fatal logs only, 0 includes warning logs, 1 includes info logs, and > 1 includes debug logs.
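A minimal sketch of a small grid search (the grid, data, and splits are illustrative):

```julia
using LightGBM, Random

X = randn(300, 5)
y = rand(0:1, 300)
estimator = LightGBM.LGBMClassification(
    objective = "binary", num_class = 1, metric = ["binary_logloss"],
)
splits = (randsubseq(1:300, 2 / 3) for _ in 1:3)

# Flat list of candidate parameter settings (a 2x2 grid).
params = [Dict(:learning_rate => lr, :num_leaves => nl)
          for lr in (0.05, 0.1) for nl in (31, 127)]

outcomes = LightGBM.search_cv(estimator, X, y, splits, params; verbosity = -1)
```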
LightGBM.split_importance — Method

```julia
split_importance(estimator, num_iteration)
split_importance(estimator)
```

Returns the importance of a fitted booster in terms of the number of times a feature was used in a split across all boostings, or up to `num_iteration` boostings.
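A minimal sketch of both importance queries on a fitted regressor (synthetic data):

```julia
using LightGBM

X, y = randn(500, 10), randn(500)
estimator = LightGBM.LGBMRegression(num_iterations = 20)
LightGBM.fit!(estimator, X, y; verbosity = -1)

splits_per_feature = LightGBM.split_importance(estimator)    # split counts per feature
gain_per_feature   = LightGBM.gain_importance(estimator)     # total gain per feature
early_gain         = LightGBM.gain_importance(estimator, 5)  # first 5 boostings only
```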