Implementing a Scikit-Learn Package for the eLCS RBML Algorithm


Engineering and Applied Sciences


Assistant Professor of Informatics in Biostatistics and Epidemiology

Project Summary

The goal of my project was to implement a scikit-learn package for eLCS, a supervised learning variant of the Learning Classifier System (LCS) algorithm. Learning Classifier Systems are a class of rule-based machine learning (RBML) algorithms that have been shown to perform well on problems involving high degrees of heterogeneity and epistasis. Well-designed LCSs are also highly interpretable by humans. LCS variants have been shown to handle supervised and reinforcement learning, classification and regression, and online and offline learning problems, as well as missing or unbalanced data. This versatility and interpretability give LCSs a wide range of potential applications, notably in epidemiology.


Learning Classifier Systems differ substantially from the more widely known deep learning algorithms, and LCS research has been conducted within a relatively small community over the past few decades. This project therefore gave me the opportunity to work with Dr. Urbanowicz, one of the leading researchers in the field, on bringing LCSs further into the machine learning mainstream through a scikit-learn implementation of eLCS, an easy-to-understand variant of the algorithm. Aside from introducing me to LCSs, the project allowed me to familiarize myself with the NumPy and scikit-learn Python libraries, both of which are widely used in computational research. It also taught me to better troubleshoot and correct algorithmic and software development problems, a key skill I need to master as a computer science major. My work primarily involved taking the knowledge of the algorithm that I gained from a textbook and various papers and mapping it to scikit-learn, with the help of an existing implementation written by Dr. Urbanowicz. The initial mapping introduced many errors into the code, so I needed to develop ways to identify, locate, and fix them. One such approach was writing methods that printed out each step of the learning process, which allowed me to discover unexpected behavior.
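The two ideas in this paragraph, mapping a learner onto the scikit-learn interface and printing out each learning step, can be illustrated with a minimal sketch. It is not the eLCS package: the class name, its parameters, and the one-instance-per-iteration loop are all hypothetical, and the "learning" is only a majority-class count. To stay dependency-free it follows scikit-learn's estimator conventions by hand; a real package would normally subclass `sklearn.base.BaseEstimator` and `sklearn.base.ClassifierMixin`.

```python
from collections import Counter


class TracingMajorityClassifier:
    """Hypothetical toy learner that follows scikit-learn's conventions.

    It only memorises the majority class; it stands in for a real
    rule-based learner so the interface and the tracing are easy to see.
    """

    def __init__(self, n_iterations=3, verbose=False):
        # Convention: __init__ only stores hyperparameters, no learning here.
        self.n_iterations = n_iterations
        self.verbose = verbose

    def _trace(self, message):
        # Record every step so unexpected behaviour can be inspected later.
        self.trace_.append(message)
        if self.verbose:
            print(message)

    def fit(self, X, y):
        if len(X) == 0 or len(X) != len(y):
            raise ValueError("X and y must be non-empty and the same length")
        if self.n_iterations < 1:
            raise ValueError("n_iterations must be at least 1")
        self.trace_ = []
        counts = Counter()
        # Assumed loop shape: one training instance per iteration,
        # cycling through the data set.
        for iteration in range(self.n_iterations):
            index = iteration % len(X)
            counts[y[index]] += 1
            self._trace(
                f"iter={iteration} instance={X[index]} "
                f"label={y[index]} counts={dict(counts)}"
            )
        # Convention: learned attributes end with an underscore.
        self.majority_class_ = counts.most_common(1)[0][0]
        return self  # Convention: fit returns the estimator itself.

    def predict(self, X):
        # Predict the memorised majority class for every row.
        return [self.majority_class_ for _ in X]
```

With `verbose=True`, each iteration is printed as it happens; in either mode the same messages are kept in `trace_`, so a run can be examined after the fact when a result looks wrong.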


Finally, as a student highly interested in the intersection of technology and medicine, I am fascinated by how data can be used in the diagnosis and treatment of patients. Today, many problems prevent physicians from properly treating patients with chronic conditions, such as the vast, hard-to-understand reams of EHR data sitting in hospital databases. This project connected many dots in my mind between problems in medicine and potential LCS-driven solutions. Over the next few years, I aim to continue working with Dr. Urbanowicz on building and evaluating more LCS algorithms to further advance the fields of machine learning and epidemiology.