The goal of my project, Provenance-Guided Synthesis of Datalog Programs, is to create a program that is able to generate new programs based on provided input and output data. Specifically, we employ a constraint solver in conjunction with Souffle, a Datalog compiler, to generate these programs. A Datalog program is a list of rules which can interact with each other. Running a Datalog program produces output given input, so to generate the desired program we need to modify the incorrect program until the output is correct. However, it is not obvious how to best modify the programs to get the correct output, since rules can interact in unpredictable ways.
Our research relies on the concept of provenance, a term which refers to assigning blame to certain rules for incorrect data points in the output. Using Souffle, we can identify why certain incorrect data points were produced and prohibit the combination of rules that leads to the production of these undesirable data points by adding a constraint to our constraint solver. The more challenging task is determining why a data point was not produced; all rules which are not currently listed could be to blame. We use Delta Debugging, an algorithm commonly employed for finding bugs in programs, to select the smallest possible set of rules that contain a subset of rules that produce the unproduced data point.
Through this research experience, I learned the concepts behind program synthesis and constraint solvers. I was lucky to have an advisor who was willing to both hear out my ideas and explain concepts that I did not understand - most of my learning occurred when we talked through ideas for the algorithm on the whiteboard. After we decided on a strategy, I would go back and write code based on our discussion, which meant I really got to grapple with the concepts first hand. I was also lucky enough to have the experience of writing a paper to be submitted to a conference. Participating in this research project has allowed me to better understand the process of conducting research as well as the concepts behind program synthesis, and has been an extremely valuable experience.
I would like to thank Professor Mayur Naik and Mukund Raghothaman for their mentorship and guidance throughout this project.