What is LightGBM?

LightGBM is an open-source, distributed, high-performance gradient boosting (GBDT, GBRT, GBM, or MART) framework. This framework specializes in creating high-quality and GPU-enabled decision tree algorithms for ranking, classification, and many other machine learning tasks. LightGBM is part of Microsoft's DMTK project.

Advantages of LightGBM

  • Composability: LightGBM models can be incorporated into existing SparkML pipelines and used for batch, streaming, and serving workloads.
  • Performance: LightGBM on Spark is 10-30% faster than SparkML on the Higgs dataset and achieves a 15% increase in AUC. Parallel experiments have verified that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.
  • Functionality: LightGBM offers a wide array of tunable parameters, that one can use to customize their decision tree system. LightGBM on Spark also supports new types of problems such as quantile regression.
  • Cross platform: LightGBM on Spark is available on Spark, PySpark, and SparklyR.

LightGBM Usage

  • LightGBMClassifier: used for building classification models. For example, to predict whether a company bankrupts or not, we could build a binary classification model with LightGBMClassifier.
  • LightGBMRegressor: used for building regression models. For example, to predict housing price, we could build a regression model with LightGBMRegressor.
  • LightGBMRanker: used for building ranking models. For example, to predict the relevance of website search results, we could build a ranking model with LightGBMRanker.