Automating Atomic Defect Identification in TEM Images of 2D Materials

Recent advancements in Scanning Transmission Electron Microscopy (STEM) technology have created an influx of new data in the form of hundreds of thousands of images. Previous “by-hand” analysis techniques are no longer a viable option for efficient characterization of atomic defects and structure, so in this project I focus on automating the identification of atomic defects in transition metal dichalcogenides (TMDs), one of the primary two-dimensional materials analyzed in the Drndic Lab. Successful identification and characterization of atomic defects could enable applications such as faster DNA sequencing using nanopores, as well as a better understanding of the relationship between atomic structure and nanomaterial properties.
To automate the identification of atomic defects, I use the open-source computer vision library OpenCV, along with a few functions I wrote myself, to separate atomic defects from the rest of the pristine sheet. Refining the parameters associated with each filter so that the defects were best isolated proved difficult. The conventional solution to a problem like this normally involves a neural network, but because I didn’t have the time (nor the patience) to create a substantial training set, I attempted a new technique. I ran my identification program over every combination of values for three of my parameters (for a total of 136,500 separate images), and wrote a script that could determine whether an image had been identified correctly. Running this script over all 136,500 images to check identification accuracy produced plots showing which combinations of parameters gave successful identification of the defects in a specific image. By comparing the plots for multiple images side by side, I could find where they overlap, and thus choose the best set of parameters for the specific nanomaterial I was analyzing. Because every step of this process is automated, the technique may be applied to any nanomaterial TEM image.
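The sweep-and-overlap idea described above can be illustrated with a minimal Python sketch. It is not the code used in this project: the parameter names (`blur`, `min_area`, `threshold`), the grid values, and the `toy_is_correct` check are all hypothetical stand-ins. In practice, the check would run the OpenCV filters with the given parameters and compare the result against the known defects.

```python
from itertools import product


def sweep(image, grids, is_correct):
    """Return every parameter combination that is_correct accepts for one image."""
    names = sorted(grids)  # fixed order so tuples are comparable across images
    passing = set()
    for values in product(*(grids[name] for name in names)):
        params = dict(zip(names, values))
        if is_correct(image, params):
            passing.add(values)
    return passing


def common_parameters(images, grids, is_correct):
    """Intersect the passing sets of all images (the overlap between the plots)."""
    shared = None
    for image in images:
        passing = sweep(image, grids, is_correct)
        shared = passing if shared is None else shared & passing
    return shared if shared is not None else set()


# Hypothetical grids for three parameters; the real ranges are not given here.
grids = {"blur": [3, 5, 7], "min_area": [4, 8], "threshold": [100, 120, 140, 160]}

# Stand-in "images": only a median brightness, instead of real pixel data.
images = [{"median": 120}, {"median": 140}]


def toy_is_correct(image, params):
    # Assumption for illustration only: identification "succeeds" when the blur
    # size is 5 and the threshold lies within 20 of the image's median brightness.
    return params["blur"] == 5 and abs(params["threshold"] - image["median"]) <= 20


shared = common_parameters(images, grids, toy_is_correct)
print(sorted(shared))
# → [(5, 4, 120), (5, 4, 140), (5, 8, 120), (5, 8, 140)]
```

Each tuple lists the values in the order (`blur`, `min_area`, `threshold`). Storing the successful combinations as sets makes the overlap between images a plain set intersection, and an empty result signals that no single parameter set works for every image in the batch.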
At the moment, I am running into issues maximizing the overlap between the plots, so the next steps are to (1) factor in additional parameters to vary and refine, (2) adjust for differing standard deviations of brightness (currently I account only for the median brightness when correcting for variation in image brightness), or (3) include a greater number of images to reduce the influence of outlier images. Over the past ten weeks I’ve learned a lot, not only about tools such as OpenCV, Dm3Reader and Matplotlib, but also about how to approach problems with a creative mindset. I was originally discouraged because I thought I’d need to learn how to create my own neural network, but by focusing on the tools available to me I was able to find a simpler solution.
None of this would have been possible without the support and guidance of the lab. Thank you to Priyanka for setting me up with this project, Paul for giving me access to his TEM images, and Rachael and Sarah for all of the lab training and guidance. Lastly, thank you to Dr. Marija Drndic for all of her encouraging words and support.