Functions

LightGBM.LGBMClassification - Method
LGBMClassification(; [
    objective = "multiclass",
    boosting = "gbdt",
    num_iterations = 10,
    learning_rate = .1,
    num_leaves = 127,
    max_depth = -1,
    tree_learner = "serial",
    num_threads = Sys.CPU_THREADS,
    histogram_pool_size = -1.,
    min_data_in_leaf = 100,
    min_sum_hessian_in_leaf = 1e-3,
    max_delta_step = 0.,
    lambda_l1 = 0.,
    lambda_l2 = 0.,
    min_gain_to_split = 0.,
    feature_fraction = 1.,
    feature_fraction_bynode = 1.,
    feature_fraction_seed = 2,
    bagging_fraction = 1.,
    pos_bagging_fraction = 1.,
    neg_bagging_fraction = 1.,
    bagging_freq = 0,
    bagging_seed = 3,
    early_stopping_round = 0,
    extra_trees = false,
    extra_seed = 6,
    max_bin = 255,
    bin_construct_sample_cnt = 200000,
    data_random_seed = 1,
    init_score = "",
    is_sparse = true,
    save_binary = false,
    categorical_feature = Int[],
    use_missing = true,
    is_unbalance = false,
    boost_from_average = true,
    scale_pos_weight = 1.0,
    sigmoid = 1.0,
    drop_rate = 0.1,
    max_drop = 50,
    skip_drop = 0.5,
    xgboost_dart_mode = false,
    uniform_drop = false,
    drop_seed = 4,
    top_rate = 0.2,
    other_rate = 0.1,
    min_data_per_group = 100,
    max_cat_threshold = 32,
    cat_l2 = 10.0,
    cat_smooth = 10.0,
    metric = ["multi_logloss"],
    metric_freq = 1,
    is_training_metric = false,
    ndcg_at = Int[],
    num_machines = 1,
    local_listen_port = 12400,
    time_out = 120,
    machine_list_file = "",
    num_class = 1,
    device_type="cpu",
    gpu_use_dp = false,
    gpu_platform_id = -1,
    gpu_device_id = -1,
    num_gpu = 1,
    force_col_wise = false,
    force_row_wise = false,
])

Return an LGBMClassification estimator.

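For example, a three-class classifier can be constructed by overriding only the defaults of interest; every parameter left out keeps the default listed above (the values here are illustrative, not recommendations):

using LightGBM

# Construction sketch with illustrative hyperparameters.
estimator = LGBMClassification(
    objective = "multiclass",
    num_class = 3,
    num_iterations = 100,
    learning_rate = 0.1,
    metric = ["multi_logloss"],
)
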
LightGBM.LGBMRegression - Method
LGBMRegression(; [
    objective = "regression",
    boosting = "gbdt",
    num_iterations = 10,
    learning_rate = .1,
    num_leaves = 127,
    max_depth = -1,
    tree_learner = "serial",
    num_threads = Sys.CPU_THREADS,
    histogram_pool_size = -1.,
    min_data_in_leaf = 100,
    min_sum_hessian_in_leaf = 1e-3,
    max_delta_step = 0.,
    lambda_l1 = 0.,
    lambda_l2 = 0.,
    min_gain_to_split = 0.,
    feature_fraction = 1.,
    feature_fraction_bynode = 1.,
    feature_fraction_seed = 2,
    bagging_fraction = 1.,
    bagging_freq = 0,
    bagging_seed = 3,
    early_stopping_round = 0,
    extra_trees = false,
    extra_seed = 6,
    max_bin = 255,
    bin_construct_sample_cnt = 200000,
    data_random_seed = 1,
    init_score = "",
    is_sparse = true,
    save_binary = false,
    categorical_feature = Int[],
    use_missing = true,
    feature_pre_filter = true,
    is_unbalance = false,
    boost_from_average = true,
    alpha = 0.9,
    drop_rate = 0.1,
    max_drop = 50,
    skip_drop = 0.5,
    xgboost_dart_mode = false,
    uniform_drop = false,
    drop_seed = 4,
    top_rate = 0.2,
    other_rate = 0.1,
    min_data_per_group = 100,
    max_cat_threshold = 32,
    cat_l2 = 10.0,
    cat_smooth = 10.0,
    metric = ["l2"],
    metric_freq = 1,
    is_training_metric = false,
    ndcg_at = Int[],
    num_machines = 1,
    local_listen_port = 12400,
    time_out = 120,
    machine_list_file = "",
    device_type="cpu",
    gpu_use_dp = false,
    gpu_platform_id = -1,
    gpu_device_id = -1,
    num_gpu = 1,
    force_col_wise = false,
    force_row_wise = false,
])

Return an LGBMRegression estimator.

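Analogously, an L2-objective regressor with some regularisation (values illustrative):

using LightGBM

estimator = LGBMRegression(
    num_iterations = 100,
    learning_rate = 0.05,
    lambda_l2 = 1.0,
    metric = ["l2"],
)
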
LightGBM.LGBM_BoosterUpdateOneIterCustom - Method

LGBM_BoosterUpdateOneIterCustom

Pass gradients and second derivatives corresponding to some custom loss function. The gradient and second-derivative vectors must each have cardinality equal to the number of training-data rows multiplied by the number of models. Trying to run this on a booster without training data will fail.

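As a sketch of one custom-loss boosting step, assuming the wrapper accepts the underlying booster plus Float32 gradient and hessian vectors (check the wrapper source for the exact signature): for squared error the gradient is preds .- y and the second derivative is constant.

using LightGBM

# Hypothetical sketch: `estimator` is assumed already fit on (X, y), so its booster
# holds training data; the `booster` field name is an assumption.
# grad and hess must each have length == number of rows * number of models.
raw = LightGBM.predict(estimator, X; predict_type = 1)  # raw scores, as a matrix
grad = Float32.(vec(raw) .- y)                          # d loss / d pred for squared error
hess = ones(Float32, length(grad))                      # second derivative is constant
LightGBM.LGBM_BoosterUpdateOneIterCustom(estimator.booster, grad, hess)
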
LightGBM.cv - Method
cv(estimator, X, y, splits; [verbosity = 1])

Cross-validate the estimator with features data X and label y. The iterable splits provides vectors of indices for the training dataset; the remaining indices are used to create the validation dataset. Alternatively, cv can be called with a prepared Dataset in place of (X, y).

Return a dictionary with an entry for the validation dataset and, if the parameter is_training_metric is set in the estimator, an entry for the training dataset. Each entry of the dictionary is another dictionary with an entry for each validation metric in the estimator. Each of these entries is an array that holds the validation metric's value for each dataset, at the last valid iteration.

Arguments

  • estimator::LGBMEstimator: the estimator to be fit.
  • X::Matrix{TX<:Real}: the features data.
  • y::Vector{Ty<:Real}: the labels.
  • dataset::Dataset: a prepared dataset (either (X, y) or dataset must be provided as input).
  • splits: the iterable providing arrays of indices for the training dataset.
  • verbosity::Integer: keyword argument that controls LightGBM's verbosity. < 0 for fatal logs only, 0 includes warning logs, 1 includes info logs, and > 1 includes debug logs.
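A sketch with hand-rolled folds (data and fold boundaries illustrative); each element of splits holds the training indices of one fold, and its complement becomes that fold's validation set:

using LightGBM

X = randn(300, 5)
y = rand(0:1, 300)
estimator = LGBMClassification(objective = "binary", metric = ["binary_logloss"])

splits = (collect(1:200), collect(51:250), collect(101:300))
results = LightGBM.cv(estimator, X, y, splits; verbosity = -1)
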
LightGBM.fit! - Method
fit!(estimator, num_iterations, X, y[, test...]; [verbosity = 1, is_row_major = false])
fit!(estimator, X, y[, test...]; [verbosity = 1, is_row_major = false])
fit!(estimator, X, y, train_indices[, test_indices...]; [verbosity = 1, is_row_major = false])
fit!(estimator, train_dataset[, test_datasets...]; [verbosity = 1])

Fit the estimator with features data X and label y, using the X-y pairs in test as validation sets. Alternatively, fit the estimator with train_dataset and test_datasets in the form of Dataset class(es).

Return a dictionary with an entry for each validation set. Each entry of the dictionary is another dictionary with an entry for each validation metric in the estimator. Each of these entries is an array that holds the validation metric's value at each iteration.

Positional Arguments

  • estimator::LGBMEstimator: the estimator to be fit.
  • and either
    • X::AbstractMatrix{TX<:Real}: the features data. May be a SparseArrays.SparseMatrixCSC
    • y::Vector{Ty<:Real}: the labels.
    • test::Tuple{AbstractMatrix{TX},Vector{Ty}}...: (optional) contains one or more tuples of X-y pairs of the same types as X and y that should be used as validation sets. May be a SparseArrays.SparseMatrixCSC and can mix-and-match sparse/dense among these test and the train.
  • or
    • train_dataset::Dataset: prepared train_dataset
    • test_datasets::Vector{Dataset}: (optional) prepared test_datasets

Keyword Arguments

  • verbosity::Integer: keyword argument that controls LightGBM's verbosity. < 0 for fatal logs only, 0 includes warning logs, 1 includes info logs, and > 1 includes debug logs.
  • is_row_major::Bool: keyword argument that indicates whether or not X is row-major. true indicates that it is row-major, false indicates that it is column-major (Julia's default). Should be consistent across train/test. Does not apply to SparseArrays.SparseMatrixCSC or Dataset constructors.
  • weights::Vector{Tw<:Real}: the training weights.
  • init_score::Vector{Ti<:Real}: the init scores.
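A typical call with one held-out validation pair (shapes illustrative):

using LightGBM

X_train, y_train = randn(1000, 10), randn(1000)
X_val, y_val = randn(200, 10), randn(200)

estimator = LGBMRegression(num_iterations = 100, metric = ["l2"])
# `results` maps each validation set to per-iteration metric values.
results = LightGBM.fit!(estimator, X_train, y_train, (X_val, y_val); verbosity = 0)
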
LightGBM.gain_importance - Method
gain_importance(estimator, num_iteration)
gain_importance(estimator)

Returns the importance of a fitted booster in terms of information gain across all boosting rounds, or up to `num_iteration` rounds.
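A sketch, assuming estimator has already been fit:

importances = LightGBM.gain_importance(estimator)   # across all boosting rounds
first10 = LightGBM.gain_importance(estimator, 10)   # first 10 rounds only
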
LightGBM.loadmodel! - Method
loadmodel!(estimator, filename)

Load the fitted model filename into estimator. Note that this only loads the fitted model—not the parameters or data of the estimator whose model was saved as filename.

Arguments

  • estimator::LGBMEstimator: the estimator into which the model should be loaded.
  • filename::String: the name of the file that contains the model.
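A sketch of restoring a saved model into a fresh estimator; the filename is hypothetical, and the estimator's own parameters must still be set separately:

using LightGBM

estimator = LGBMRegression()                 # parameters are not restored from the file
LightGBM.loadmodel!(estimator, "model.txt")  # hypothetical filename
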
LightGBM.predict - Method
predict(estimator, X; [predict_type = 0, num_iterations = -1, verbosity = 1, is_row_major = false])

Return a matrix with the labels that the estimator predicts for features data X. Use dropdims if a vector is required.

Arguments

  • estimator::LGBMEstimator: the estimator to use in the prediction.
  • X::Matrix{T<:Real}: the features data.
  • predict_type::Integer: keyword argument that controls the prediction type. 0 for normal scores with transform (if needed), 1 for raw scores, 2 for leaf indices, 3 for SHAP contributions.
  • num_iterations::Integer: keyword argument that sets the number of iterations of the model to use in the prediction. < 0 for all iterations.
  • verbosity::Integer: keyword argument that controls LightGBM's verbosity. < 0 for fatal logs only, 0 includes warning logs, 1 includes info logs, and > 1 includes debug logs.
  • is_row_major::Bool: keyword argument that indicates whether or not X is row-major. true indicates that it is row-major, false indicates that it is column-major (Julia's default).

One can obtain some form of feature importances by averaging SHAP contributions across predictions, i.e. mean(LightGBM.predict(estimator, X; predict_type=3); dims=1)

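A sketch of both uses, assuming a fitted single-output estimator:

using Statistics  # for mean

preds = LightGBM.predict(estimator, X)                   # (n, 1) matrix
yhat = dropdims(preds; dims = 2)                         # collapse to a vector
shap = LightGBM.predict(estimator, X; predict_type = 3)  # SHAP contributions
importances = mean(shap; dims = 1)                       # rough feature importances
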
LightGBM.savemodel - Method
savemodel(estimator, filename; [num_iteration = -1])

Save the fitted model in estimator as filename.

Arguments

  • estimator::LGBMEstimator: the estimator whose model should be saved.
  • filename::String: the name of the file to save the model in.
  • num_iteration::Integer: keyword argument that sets the number of iterations of the model that should be saved. < 0 for all iterations.
  • start_iteration: the start index of the iterations that should be saved.
  • feature_importance_type: the type of feature importance to save; can be C_API_FEATURE_IMPORTANCE_SPLIT or C_API_FEATURE_IMPORTANCE_GAIN.
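A sketch (filenames hypothetical):

using LightGBM

LightGBM.savemodel(estimator, "model.txt")                        # all iterations
LightGBM.savemodel(estimator, "model50.txt"; num_iteration = 50)  # first 50 only
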
LightGBM.search_cv - Method
search_cv(estimator, X, y, splits, params; [verbosity = 1])

Exhaustive search over the specified sets of parameter values for the estimator with features data X and label y. The iterable splits provides vectors of indices for the training dataset; the remaining indices are used to create the validation dataset. Alternatively, search_cv can be called with a prepared Dataset in place of (X, y).

Return an array with a tuple for each set of parameter values, where the first entry is the set of parameter values and the second entry is the cross-validation outcome for those values. This outcome is a dictionary with an entry for the validation dataset and, if the parameter is_training_metric is set in the estimator, an entry for the training dataset. Each entry of the dictionary is another dictionary with an entry for each validation metric in the estimator. Each of these entries is an array that holds the validation metric's value for each dataset, at the last valid iteration.

Arguments

  • estimator::LGBMEstimator: the estimator to be fit.
  • X::Matrix{TX<:Real}: the features data.
  • y::Vector{Ty<:Real}: the labels.
  • dataset::Dataset: a prepared dataset (either (X, y) or dataset must be provided as input).
  • splits: the iterable providing arrays of indices for the training dataset.
  • params: the iterable providing dictionaries of pairs of parameters (Symbols) and values to configure the estimator with.
  • verbosity::Integer: keyword argument that controls LightGBM's verbosity. < 0 for fatal logs only, 0 includes warning logs, 1 includes info logs, and > 1 includes debug logs.
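A sketch of a small grid, reusing X, y, and estimator from the cv example above; params may be any iterable of Dicts keyed by parameter Symbols (values illustrative):

using LightGBM

splits = (collect(1:200), collect(101:300))
params = [Dict(:learning_rate => lr, :num_leaves => nl)
          for lr in (0.05, 0.1), nl in (31, 127)]
results = LightGBM.search_cv(estimator, X, y, splits, params; verbosity = -1)
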
LightGBM.split_importance - Method
split_importance(estimator, num_iteration)
split_importance(estimator)

Returns the importance of a fitted booster in terms of the number of times a feature was used in a split, across all boosting rounds or up to `num_iteration` rounds.
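Usage mirrors gain_importance, again assuming a fitted estimator:

counts = LightGBM.split_importance(estimator)        # split counts across all boosting rounds
counts10 = LightGBM.split_importance(estimator, 10)  # first 10 rounds only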