  • Explainable Chronic Kidney Disease Prediction Using LightGBM with Shap and Fuzzy Rule-Based System

  • Department of Electronics Communication and Engineering, Sri Venkateshwara University College of Engineering, Tirupati

Abstract

Chronic Kidney Disease (CKD) is a progressive condition that often goes unnoticed until its late stages, so timely identification and staging are important for intervention. This paper presents an explainable machine learning system that predicts CKD stage by combining the Light Gradient Boosting Machine (LightGBM), the Synthetic Minority Oversampling Technique (SMOTE), SHAP-based explainability, and a fuzzy rule-based reasoning system. The data comprise biochemical, clinical, lifestyle, and urinalysis features related to CKD severity. A systematic preprocessing pipeline was built to handle missing data, encode nominal variables, normalise numerical variables, and correct the large class imbalance using SMOTE. LightGBM was chosen because it is efficient and can capture complex non-linear relationships in clinical data. Probabilities were calibrated with Platt scaling to improve clinical reliability. SHAP was included to provide global and local interpretability, making the basis of each prediction transparent. A fuzzy reasoning layer, which converts model outputs into intuitive linguistic rules, was used to improve clinical understanding. The experimental results show a weighted F1-score of 0.87-0.92, indicating high predictive ability. SHAP analysis identified GFR, creatinine, and BUN as important biomarkers. A graphical user interface was created to provide real-time predictions, SHAP visualisations, and personalised recommendations. The hybrid framework shows that a decision-support tool for CKD staging can be clinically viable, transparent, and accurate.

Keywords

CKD prediction, LightGBM, SHAP, Fuzzy logic, Explainable AI, SMOTE, Clinical decision support

Introduction

Chronic Kidney Disease (CKD) is one of the greatest global health challenges, affecting over 850 million people worldwide [1]. Clinically, CKD is characterised by a sustained decline in the kidneys' filtering capacity, commonly assessed through the estimated glomerular filtration rate (eGFR), serum creatinine, and proteinuria [1, 2]. Because CKD is usually silent in its early stages, most patients progress to an advanced stage before they receive adequate care, which exposes them to cardiovascular complications, hospitalisation, and even death [3]. The rise in diabetes, high blood pressure, and other lifestyle-related illnesses has played a major role in the CKD burden of developing nations [3].

Conventional diagnosis relies on manual analysis of biochemical indicators. This becomes difficult for complex datasets whose multivariate relationships cannot easily be established by human evaluation. Machine learning (ML) models have shown significant potential for CKD prediction because they can process complex clinical data and identify hidden patterns [4, 5]. Random Forest (RF), Logistic Regression (LR), Support Vector Machines (SVM), and boosting models such as XGBoost have demonstrated encouraging performance [4, 6]. Despite this progress, most research has addressed binary classification (CKD versus non-CKD), which limits its clinical relevance to treatment planning because the severity of CKD depends strongly on the stage of progression [6, 13].

Another important challenge is interpretability. ML models, especially ensemble and boosting models, are often regarded as black boxes whose inner workings are hard to understand. This limits their adoption in health systems, which demand transparent, auditable, and clinically interpretable decisions to guarantee patient safety and trust [9, 14]. SHAP (SHapley Additive exPlanations) addresses this problem by assigning each feature a mathematically consistent contribution to the final prediction [9]. Nevertheless, numerical SHAP values are not always intuitive for clinicians. Fuzzy logic, which mirrors human reasoning, is a natural way to translate model decisions into readable rules [10, 12].

The literature on CKD prediction has several research gaps. First, stage-wise classification of CKD is still rare, with most studies performing binary classification [13, 16]. Second, real-world CKD datasets show severe class imbalance, especially at the low stages, and many studies do not address it properly [8]. Third, many ML models used to predict CKD offer little or no explainability [14]. Fourth, hybrid systems that combine ML, SHAP, and fuzzy reasoning are uncommon. Finally, deployable tools such as GUI-based decision-support systems are not available, which restricts application in clinical and screening settings [13]. The proposed study bridges these gaps by presenting a complete explainable model for CKD stage prediction that combines LightGBM and SMOTE with SHAP, fuzzy reasoning, and a graphical interface. The main contributions of the research are the following:

  • A holistic CKD stage prediction tool that predicts all 6 stages (0-5).
  • Management of class imbalance with SMOTE to improve performance on the minority classes.
  • Global and local explainability through SHAP.
  • A clinical interpretability mechanism based on integrated fuzzy rules.
  • A GUI for real-time prediction, visualisation, and personalised suggestions.
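
As a rough illustration of the SMOTE step named in the second contribution, the sketch below generates synthetic minority samples by interpolating between a minority point and one of its nearest minority-class neighbours. It is a from-scratch simplification in plain Python, not the implementation used in this study. The function name `smote_samples`, the parameter `k`, and the two-feature toy records are assumptions made for the example.

```python
import math
import random


def smote_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority points.

    Simplified SMOTE-style sketch: pick a minority point, pick one of its
    k nearest minority neighbours, and sample a point on the segment between them.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority-class neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class records as [GFR, creatinine]; not data from the study.
minority = [[28.0, 3.1], [25.0, 3.6], [31.0, 2.9], [22.0, 4.0]]
new_points = smote_samples(minority, n_new=6)
for point in new_points:
    print([round(v, 2) for v in point])
```

Because every synthetic point lies on a segment between two real minority records, the new samples stay inside the region already occupied by the minority class. In practice, oversampling of this kind is normally applied to the training split only, so that evaluation data remain untouched.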

LITERATURE REVIEW

Machine learning has been widely studied for CKD prediction. Early models detected the presence of CKD from structured clinical data using Random Forest (RF), Support Vector Machines (SVM), and Logistic Regression (LR) [4, 5]. RF predicted well because it is robust to noise and can capture nonlinear relationships, whereas SVM handled high-dimensional medical data effectively. XGBoost later improved precision through gradient boosting, but remained difficult to interpret because of its complex internal structure [6].

One of the most significant weaknesses of CKD datasets is extreme class imbalance: most records are early-stage, and few belong to the advanced CKD stages (4-5). Such imbalance can bias model training and lead to poor generalisation on minority classes. The Synthetic Minority Oversampling Technique (SMOTE) has been shown to address this imbalance effectively by creating synthetic samples within the neighbourhoods of minority-class points, thus reducing classifier bias and improving recall [8]. Studies that included SMOTE have reported consistent gains in F1-score and sensitivity on underrepresented CKD groups.

For machine learning to be adopted in healthcare, it has to be interpretable. One of the most mathematically sound schemes for explaining model decisions is SHAP, introduced by Lundberg and Lee, which computes the marginal contribution of each feature to the output [9]. SHAP has proved promising for explaining predictive models in diabetes, cardiovascular disease, and oncology, yet it is rarely used to explain CKD staging. Arvind et al. noted that SHAP is well suited to the clinical setting, particularly because its explanations correspond to physician reasoning and support regulatory compliance [14].
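
To make the idea of a marginal contribution concrete, the following sketch computes exact Shapley values for a three-feature toy model by averaging each feature's marginal effect over all feature orderings. It is a brute-force illustration, not the SHAP library and not the model from this study. The scoring function `toy_risk`, its coefficients, and the baseline and patient values are all hypothetical.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings.

    Features that have not yet been 'revealed' are held at their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # reveal feature i
            now = predict(current)
            phi[i] += now - prev       # marginal contribution of feature i
            prev = now
    return [p / len(orderings) for p in phi]


def toy_risk(features):
    """Hypothetical risk score over [GFR, creatinine, BUN] with one interaction term."""
    gfr, creatinine, bun = features
    return (0.02 * (90 - gfr) + 0.5 * creatinine + 0.01 * bun
            + 0.1 * creatinine * (bun / 20.0))


baseline = [90.0, 1.0, 15.0]  # assumed reference patient
patient = [35.0, 2.4, 40.0]   # assumed patient of interest

phi = shapley_values(toy_risk, patient, baseline)
gap = toy_risk(patient) - toy_risk(baseline)
print("contributions:", [round(p, 4) for p in phi])
print("sum of contributions:", round(sum(phi), 4), "prediction gap:", round(gap, 4))
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes this kind of attribution consistent. Exact enumeration grows factorially with the number of features, so practical tools rely on faster algorithms for tree models.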
In addition to numerical interpretability, fuzzy logic supports human-like reasoning through linguistic terms such as low GFR, moderately high creatinine, or high BUN. Fuzzy logic, originally introduced by Zadeh [10] and extended by Kosko [11], is widely used in clinical diagnostic systems because it handles uncertainty flexibly and is easy to interpret. Son et al. showed that fuzzy rule-based reasoning increases clinician trust and improves the usability of decision-support systems [12]. Recent systematic reviews of CKD prediction models highlight several gaps in the current literature that remain unaddressed [13, 16]. These gaps are a scarcity of research on stage-by-stage classification, inadequate handling of imbalanced datasets, little use of explainability methods such as SHAP, and the absence of solutions that link ML with fuzzy reasoning and user interfaces. Moreover, most models remain at the stage of academic research and do not become real-world clinical solutions because deployable GUI-based tools are lacking [13]. Based on these observations, there is a clear need for a holistic CKD prediction framework that:

  • performs stage-by-stage prediction,
  • handles class imbalance,
  • provides transparent explanations,
  • incorporates fuzzy reasoning to achieve clinical interpretability, and
  • provides a GUI for real-time decision support.

The current study addresses all of these research gaps, which makes it an important contribution to the CKD prediction literature.
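
The linguistic terms used in fuzzy reasoning, such as low GFR or high creatinine, can be sketched with simple membership functions. The example below is a minimal illustration and not the rule base of this study: the breakpoints (30 and 60 for GFR, 1.2 and 2.0 for creatinine), the function names, and the single rule are assumptions chosen only to show the mechanism.

```python
INF = float("inf")


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d].

    Using -INF for a and b (or INF for c and d) gives an open 'shoulder' set.
    """
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def gfr_is_low(gfr):
    # Hypothetical breakpoints: fully 'low' up to 30, not 'low' from 60 upward.
    return trapezoid(gfr, -INF, -INF, 30.0, 60.0)


def creatinine_is_high(creatinine):
    # Hypothetical breakpoints: not 'high' up to 1.2, fully 'high' from 2.0 upward.
    return trapezoid(creatinine, 1.2, 2.0, INF, INF)


def rule_high_risk(gfr, creatinine):
    """IF GFR is low AND creatinine is high THEN risk is high (AND taken as min)."""
    return min(gfr_is_low(gfr), creatinine_is_high(creatinine))


strength = rule_high_risk(gfr=45.0, creatinine=1.6)
print("rule firing strength:", round(strength, 3))
```

A value between 0 and 1 expresses how strongly the rule applies to a given patient, which is what allows a borderline case to be described as partly low or partly high instead of being forced into one category.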

METHODOLOGY

The proposed CKD prediction system combines data preprocessing, class balancing, machine learning classification, probability calibration, SHAP-based explainability, fuzzy rule-based reasoning, and deployment in a GUI. This multistage pipeline is designed to deliver high predictive accuracy, transparency, and clinical usability.
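
The probability calibration stage can be illustrated with Platt scaling, which fits a sigmoid that maps raw classifier scores to probabilities. The sketch below is a simplified binary version trained by plain gradient descent on the log-loss. It is not the calibration code of this study, and a multi-stage classifier would need an extension such as one-vs-rest calibration. The scores, labels, learning rate, and epoch count are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log-loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log-loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical held-out raw scores and true binary labels (not data from the study).
scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(scores, labels)


def calibrate(score):
    """Map a raw score to a calibrated probability using the fitted sigmoid."""
    return sigmoid(a * score + b)


print("a =", round(a, 3), "b =", round(b, 3))
print("calibrated probability at score 1.0:", round(calibrate(1.0), 3))
```

The sigmoid parameters are usually fitted on data that were not used to train the classifier, so that the calibrated probabilities reflect performance on unseen cases.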

References

  1. Kidney Disease: Improving Global Outcomes (KDIGO). KDIGO 2012 Clinical Practice Guideline for the Evaluation and Management of Chronic Kidney Disease. Kidney International Supplements, 2013.
  2. Levey, A.S., et al. Definition and classification of chronic kidney disease: A position statement from Kidney Disease: Improving Global Outcomes (KDIGO). Kidney International, 67, pp. 2089–2100, 2005.
  3. Jha, V., Garcia-Garcia, G., Iseki, K., Li, Z., et al. Chronic kidney disease: Global dimension and perspectives. The Lancet, 382(9888), pp. 260–272, 2013.
  4. Kshirsagar, N.T., et al. Machine learning models for chronic kidney disease prediction: A comparative study. IEEE Access, 9, pp. 12338–12348, 2021.
  5. Breiman, L. Random forests. Machine Learning, 45(1), pp. 5–32, 2001.
  6. Chen, T., Guestrin, C. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD Conference, pp. 785–794, 2016.
  7. Ke, G., et al. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, pp. 3146–3154, 2017.
  8. Chawla, N.V., et al. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, pp. 321–357, 2002.
  9. Lundberg, S.M., Lee, S. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, pp. 4765–4774, 2017.
  10. Zadeh, L.A. Fuzzy sets. Information and Control, 8(3), pp. 338–353, 1965.
  11. Kosko, B. Fuzzy Engineering. Prentice Hall, New Jersey, 1997.
  12. Son, H., Seo, J., Kim, C. Data-driven fuzzy rule-based system for clinical decision-making. Expert Systems with Applications, 42(1), pp. 574–586, 2015.
  13. Gunarathne, S., Meegahapola, H., Wicramasinghe, A. Chronic kidney disease prediction using machine learning techniques. Procedia Computer Science, 232, pp. 802–811, 2024.
  14. Arvind, R., et al. Explainable AI models for healthcare: A review of SHAP and LIME. IEEE Reviews in Biomedical Engineering, 16, pp. 1–16, 2023.
  15. Razzak, M.I., Naz, S., Zaib, A. Deep learning for medical image processing. Neurocomputing, 300, pp. 48–64, 2018.
  16. Kuo, J.D., et al. Predicting CKD progression using machine learning – A systematic review. BMC Nephrology, 22(319), pp. 1–16, 2021.

Govardan Sai Palla
Corresponding author

Department of Electronics Communication and Engineering, Sri Venkateshwara University College of Engineering, Tirupati

Dr. I. Kullayamma
Co-author

Department of Electronics Communication and Engineering, Sri Venkateshwara University College of Engineering, Tirupati

Govardan Sai Palla*, Dr. I. Kullayamma, Explainable Chronic Kidney Disease Prediction Using LightGBM with Shap and Fuzzy Rule-Based System, Int. J. Sci. R. Tech., 2025, 2 (12), 174-184. https://doi.org/10.5281/zenodo.17918804
