Defense Against Adversarial Attacks

The goal of my research project is to improve on current adversarial training methods so that machine learning models are better defended against adversarial examples. Most current adversarial training methods defend a model against only one specific level of perturbation. In the real world, however, the adversary often determines the level of perturbation, so a truly safe model should be robust across different levels. In my project, I looked for improvements that build on existing adversarial training methods.
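To make the point about perturbation levels concrete, here is a minimal sketch in plain Python. It is only an illustration, not the method used in the project: it assumes a toy linear classifier and an L-infinity attack, for which the worst-case perturbation of size eps has a closed form. The function names, the weights, and the four data points are all hypothetical.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def predict(x, w, b):
    """Linear classifier: label +1 if w.x + b >= 0, otherwise -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def worst_case_linf(x, y, w, eps):
    """Move every coordinate by eps in the direction that lowers the margin of label y.

    For a linear model this is the strongest L-infinity attack of size eps.
    """
    return [xi - y * eps * sign(wi) for xi, wi in zip(x, w)]


def robust_accuracy(data, w, b, eps):
    """Fraction of points still classified correctly after the worst-case attack."""
    correct = sum(
        predict(worst_case_linf(x, y, w, eps), w, b) == y for x, y in data
    )
    return correct / len(data)


# Hypothetical toy model and data (assumptions for illustration only).
W, B = [1.0, -1.0], 0.0
DATA = [
    ([2.0, 0.0], 1),
    ([0.5, 0.0], 1),
    ([0.0, 2.0], -1),
    ([0.0, 0.5], -1),
]

if __name__ == "__main__":
    # The adversary, not the defender, picks eps, so evaluate over a range.
    for eps in (0.0, 0.1, 0.5, 1.5):
        print(f"eps={eps}: robust accuracy = {robust_accuracy(DATA, W, B, eps)}")
```

In this toy setting the model classifies every point correctly under small perturbations, but its accuracy falls as eps grows, which is why evaluating at a single perturbation level can give a misleading picture of safety.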

This was my first in-depth research project, and it involved a lot of work. I had to read many papers to gain a better understanding of adversarial examples and to find ideas for possible improvements. Because I had not worked on a research project in such depth before, I was not very organized at the beginning. As the work accumulated, it became harder to find my earlier thoughts. I then followed the advice of my mentor, Professor Dobriban, and started taking more structured notes. I also had to document my computational experiments better so that I could find earlier results more easily.

I also learned a lesson from a setback. During the early stages of my project, I ran experiments mostly on smaller datasets. After I got some good results on them, I began to move the experiments to larger datasets. I launched one experiment on AWS without any pilot testing. As a result, the experiment ran for 72 hours, cost over $300, and, most frustratingly, did not produce any meaningful result because of a bad guess for one parameter. This setback was a bucket of cold water, and it stopped me from rushing into further experiments.