Training a Language-to-Vision Mapping With Contextualized BERT Embeddings to Construct Multimodal Embeddings

This summer I was introduced to Natural Language Processing research by working with Professor Chris Callison-Burch. There were multiple projects going on under Chris’s guidance, and I worked under the umbrella of investigating Multimodal Embeddings, which combine textual and visual information to represent language, an approach that is expected to outperform the traditional method of using textual information alone.

In the first half of the summer, I was simply learning: reading papers and tutorials, watching FastAI courses, and working through anything else suggested by Chris and the fellow student-researchers in his lab. First, with the help of my peers, I familiarized myself with what had already been done on our project by a former student of Chris’s who has since graduated. I started by reading and trying to understand the model that had already been written, and then proceeded to train it myself on different types of data and record the results. This process would not have been possible without the help of Sarah, the other student working with Chris this summer under PURM. Up until the end of Week 5, everything I had done was just learning and playing around with existing work.

After reaching a reasonable level of comfort and understanding, I began thinking about what I could do to build upon the existing research, preferably something with some novelty. About four weeks into PURM, I attended an NLP paper reading session recommended by a postdoc working for Chris. It was about BERT, a record-breaking, state-of-the-art NLP model recently released by Google. I was impressed by how quickly it had become popular in the NLP world and convinced by one of its underlying rationales: word vectors that take sentence context into account are better than word vectors that don’t. I decided that this was the path I would follow. After several meetings with Chris, I began to form a clear research plan, which I am still in the process of executing.

Working in research this summer taught me many lessons that I had not had to learn as a regular student. First, Chris always had his door open if I needed his guidance, but he never loomed over us, checking for progress or productivity. Because of this, I learned that a researcher must be self-motivated, and the only way to ensure that is by actually loving what you are doing. I observed such passion in the grad and PhD students working in the building around me. Second, I began to realize that, unlike any other work I’ve done, research is a long and slow process that requires patience and persistence. You can’t go from beginning to end in a week. I started to record my progress and every little thing I did, from the papers I read to the code I wrote, in a research journal. I found that this helped a lot with quickly finding work I had already done, for example when looking back at my Week 3 work in Week 9.

Chris said that his main intention, in addition to having us learn, was for us to get a feel for what being a Computer Science grad student is like. I definitely feel that this goal was achieved, and I am incredibly thankful to Chris and all of the other students working with him this summer for such a valuable learning experience. I look forward to continuing to work with Chris.