Gene Selection for Cancer Classification Based on XGBoost

Md Raihanul Islam Tomal (1), Teo Voon Chuan (2), Kohbalan Moorthy (3), Chan Weng Howe (4)
(1) Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pekan, Pahang, Malaysia
(2) Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pekan, Pahang, Malaysia
(3) Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pekan, Pahang, Malaysia
(4) Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Johor, Malaysia
How to cite (COMIEN):
Tomal, M. R. I., Chuan, T. V., Moorthy, K., & Howe, C. W. (2025). Gene Selection for Cancer Classification Based on XGBoost. International Journal on Computational Engineering, 2(2). https://doi.org/10.62527/comien.2.2.52

Cancer remains a leading cause of death worldwide. The World Health Organization (WHO) reports nearly 10 million cancer-related deaths in recent years, and breast cancer alone affects over 2.1 million women globally each year, posing significant challenges for early detection and diagnosis. Gene selection from DNA microarray data is crucial for removing less informative genes and retaining those relevant to disease diagnosis. Cancer classification involves identifying the type of cancer and determining the extent of tumor growth and spread. This research focuses on improving gene selection for cancer classification using the XGBoost classifier, an efficient open-source implementation of the gradient boosted trees algorithm. The primary goal is to improve the performance of gene selection and thereby enable timely and appropriate treatment for cancer patients, since early detection is vital to a patient's chances of full recovery. The research also aims to reduce the time and cost of gene selection for cancer classification while increasing classification accuracy. The proposed method achieved an accuracy of approximately 93%, with precision, recall, and F1-score values of 93%, 87%, and 90%, respectively. These results highlight the potential of the XGBoost classifier for optimizing gene selection and improving diagnostic processes. Future work will focus on further improving the accuracy of gene selection for cancer classification and on removing more irrelevant genes before subsequent analysis steps. This approach holds promise for streamlining diagnosis, improving patient outcomes, and supporting the timely treatment of cancer.
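The abstract does not spell out the selection procedure, so the following is only a minimal sketch under stated assumptions: genes are ranked by per-feature importance scores (for example, the `feature_importances_` attribute that a fitted `XGBClassifier` from the `xgboost` scikit-learn wrapper exposes), and the top `k` genes are kept. The function names, gene names, scores, and `k` are hypothetical placeholders, not the authors' implementation. The second function checks that the reported F1-score is consistent with the reported precision and recall, since F1 is their harmonic mean.

```python
from typing import Sequence


def select_top_genes(gene_names: Sequence[str], importances: Sequence[float], k: int) -> list[str]:
    """Return the k gene names with the highest importance scores.

    Hypothetical helper: assumes importances come from a trained model
    (e.g. XGBoost feature importances). Ties keep their input order.
    """
    if len(gene_names) != len(importances):
        raise ValueError("gene_names and importances must have the same length")
    # Sort gene indices by descending importance; Python's sort is stable.
    ranked = sorted(range(len(gene_names)), key=lambda i: importances[i], reverse=True)
    return [gene_names[i] for i in ranked[:k]]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical importance scores for five placeholder genes.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
scores = [0.02, 0.31, 0.00, 0.45, 0.22]
print(select_top_genes(genes, scores, k=3))  # ['GENE_D', 'GENE_B', 'GENE_E']

# Consistency check on the reported metrics: precision 0.93, recall 0.87.
print(round(f1_score(0.93, 0.87), 2))  # 0.9
```

With the reported precision of 93% and recall of 87%, the harmonic mean is about 0.899, which matches the stated F1-score of 90% after rounding.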