Available online, doi: 10.1109/JAS.2025.125231
Abstract:
Advances in large-scale data acquisition and accumulation are fueling “the curse of dimensionality,” which may degrade the generalization performance of machine learning models. This dilemma has given rise to feature selection, a technique that excels on high-dimensional data. As a specific method based on rough set theory, Rough Feature Selection (RFS) has attracted wide attention and been fruitfully applied. In this survey, we provide a comprehensive review of the RFS algorithms that have proliferated in recent years. First, we briefly introduce some typical rough set models, especially the neighborhood rough set and the fuzzy rough set, as well as representative rough feature evaluation criteria. We then systematically discuss several emerging topics in RFS, including accelerated, ensemble, incremental, label-ambiguous, weakly supervised, and multi-granularity RFS. Additionally, we describe the standard performance validation scheme for RFS and conduct a number of experiments to present benchmarking results for state-of-the-art RFS algorithms. Finally, we summarize the pros and cons of existing research efforts and outline open challenges and opportunities for RFS concerning class imbalance, multi-modal scenarios, causal inference, and high-level representation. By providing in-depth knowledge of RFS, we anticipate that this survey will: 1) serve as a guidebook for newcomers intending to delve into RFS and a stepping-stone for researchers and practitioners solving domain-specific problems; 2) offer insights into state-of-the-art published findings, triggering a series of breakthroughs in RFS; 3) underscore some challenges ahead for RFS, directing future efforts toward advances beyond the questions currently pursued.
K. Liu, X. Yang, W. Ding, H. Ju, T. Li, J. Wang, and T. Yin, “A survey on rough feature selection: Recent advances and challenges,” IEEE/CAA J. Autom. Sinica, 2025. doi: 10.1109/JAS.2025.125231.