ChatEval: A Framework for Evaluating Neural Open-Domain Chatbots





Associate Professor

Project Summary

Learning Machine Learning

An Undergraduate Summer in NLP Research

This summer, I was excited to finally work with the poster child of computer science: machine learning. I was a summer research assistant in the Natural Language Processing (NLP) Group at the University of Pennsylvania, where I worked on ChatEval, a scientific framework for evaluating dialogue systems (in this case, deep learning models).

When I first joined the research group, I had little experience with machine learning. I knew Python and had followed some scikit-learn tutorials, but I didn't really understand what I was doing. This was mainly because, by bouncing aimlessly between tutorials, I wasn't building real domain knowledge about anything I was modelling. Trying to understand how deep learning works usually led me to intimidating math and notation.

However, thanks to my software engineering background, I was still able to contribute to the research group early on by helping with implementation. In doing so, I learned practical deep learning skills by focusing on one specific domain: natural language. I developed first an intuitive and then a mathematical understanding of Seq2Seq (sequence-to-sequence) models, the neural network architecture that has recently driven rapid progress in neural machine translation and dialogue systems.

My Work

I am working on ChatEval, an application to assist machine learning researchers. As the name implies, the software allows researchers to upload their models and compare them using standardized evaluations of different types (e.g., average sentence length). We even use deep learning within the project through Word2Vec word embeddings: Word2Vec is an unsupervised learning model that represents words as vectors capturing their "semantic" meaning.
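To make the idea of standardized metrics a little more concrete, here is a minimal sketch of two kinds of automatic measures a tool like ChatEval could report: the average length of a model's responses, and an embedding-based similarity between a response and a reference reply (the average of each sentence's word vectors, compared by cosine similarity). This is not ChatEval's actual code. The function names and the tiny 3-dimensional `EMBEDDINGS` table are hypothetical stand-ins; real Word2Vec vectors are learned from a corpus and typically have a few hundred dimensions.

```python
import math

# Hypothetical toy word vectors standing in for trained Word2Vec embeddings.
EMBEDDINGS = {
    "hello": [0.9, 0.1, 0.0],
    "hi": [0.85, 0.15, 0.05],
    "there": [0.2, 0.7, 0.1],
    "friend": [0.3, 0.6, 0.2],
    "goodbye": [-0.8, 0.1, 0.3],
}


def average_length(responses):
    """Mean number of whitespace-separated tokens per response."""
    return sum(len(r.split()) for r in responses) / len(responses)


def sentence_vector(sentence):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in sentence.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embedding_similarity(response, reference):
    """Cosine similarity of averaged word vectors (0.0 if either has no known words)."""
    r, g = sentence_vector(response), sentence_vector(reference)
    if r is None or g is None:
        return 0.0
    return cosine(r, g)


if __name__ == "__main__":
    print(average_length(["hi there", "hello there friend"]))  # 2.5
    print(embedding_similarity("hi there", "hello there"))     # close to 1
    print(embedding_similarity("goodbye", "hello there"))      # negative
```

Averaging word vectors is only one simple way to build a sentence representation; it ignores word order, which is part of why a framework that reports several metrics side by side is useful.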


Many friends with programming experience have asked me how to get involved with machine learning. My biggest piece of advice is the same as for any area of computer science: work your way down the abstraction ladder.

You can start by using machine learning libraries that abstract away most of the computation for you. An excellent example is scikit-learn, which you can use to get familiar with the train/test paradigm. Then, you can begin implementing shallow learning models such as k-nearest neighbours and support vector machines. Learning how to manipulate data in a language like Python or R is also an important skill: you can't train a model until you can convert your data into a form the model understands.
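As an example of implementing a shallow model yourself, here is a minimal k-nearest-neighbours classifier in plain Python, evaluated with a simple train/test split. The two-cluster toy dataset, the choice of `k=3`, and the helper names (including `split_train_test`, which is hand-written here and is not scikit-learn's `train_test_split`) are illustrative assumptions.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, point, k=3):
    """Label `point` by majority vote among its k nearest training examples."""
    distances = sorted(
        (euclidean(x, point), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def split_train_test(X, y, test_fraction=0.25, seed=0):
    """Hypothetical helper: shuffle indices and hold out a fraction for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    cut = int(len(X) * (1 - test_fraction))
    train, test = indices[:cut], indices[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of held-out examples that the classifier labels correctly."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == label
        for x, label in zip(test_X, test_y)
    )
    return correct / len(test_y)


if __name__ == "__main__":
    # Toy data: two well-separated clusters labelled "a" and "b".
    X = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [0.9, 1.1],
         [5.0, 5.2], [5.1, 4.9], [4.8, 5.0], [5.2, 5.1]]
    y = ["a", "a", "a", "a", "b", "b", "b", "b"]
    train_X, train_y, test_X, test_y = split_train_test(X, y)
    print(accuracy(train_X, train_y, test_X, test_y))  # 1.0 on this easy toy data
```

The key habit this illustrates is scoring the model only on examples it never saw during "training"; with real data, accuracy on the held-out set is what tells you whether the model generalizes.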

The jump to deep learning takes some thought: understanding why deep learning often outperforms traditional machine learning will make the learning process a bit easier. For example, if you can explain the difference between features engineered by humans and features the model learns on its own, that is a good sign you understand why deep learning matters. At this stage, it helps to treat neural networks as black boxes. For this, I am a huge fan of Keras (which abstracts away TensorFlow's computational graph) and its intuitive API for building neural networks. Then, you can start using TensorFlow or Theano directly and delve deeper into the theory of neural networks.

As for domain-specific experience, I found NLP to be an excellent first step into deep learning. This is mainly because words and sentences are discrete data, unlike “continuous” data such as pixel values and time series (e.g., stock prices). As a result, it is easy to understand and visualize words as the features of a neural network, to reason about the vector arithmetic on each word's embedding, and so forth.
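Because each word maps to its own vector, relationships between words can be explored with simple arithmetic. The sketch below illustrates the well-known "king - man + woman ≈ queen" analogy by computing the combined vector and returning the closest remaining word by cosine similarity. The 3-dimensional vectors are hand-made and purely hypothetical; trained embeddings such as Word2Vec's are learned from text and do not have human-readable dimensions.

```python
import math

# Hypothetical hand-made vectors; dimensions loosely mean (royalty, male, female).
VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(a, b, c, vectors=VECTORS):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("king", "man", "woman"))  # queen
```

Excluding the three input words is a common convention for this kind of query, since the nearest vector to the result is often one of the inputs themselves.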

I'm currently using Keras to build projects that I wrap in Django or Flask applications. As a software engineer at heart, I like building things that people will actually use (rather than a print() statement showing an accuracy that only I need to know).


My PhD supervisor on ChatEval told me that NLP is a "gateway drug" to every branch of computer science. I think that's a fantastic way to describe the effect of studying NLP this summer. I was able to branch out and make use of regular expressions (theory), word embeddings (deep learning), dialogue systems (software engineering), language corpora (data science), and more.
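As a small illustration of where regular expressions show up in everyday NLP work, here is a sketch of a simple tokenizer that splits a sentence into lowercase words and punctuation marks. The pattern is an illustrative assumption rather than anything from ChatEval; production tokenizers handle many more cases, such as URLs, numbers with decimal points, and Unicode punctuation.

```python
import re

# Assumed pattern: runs of word characters (optionally with one internal
# apostrophe, e.g. "how's"), or any single non-word, non-space character.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Lowercase the text and return its word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text.lower())


print(tokenize("Hello, how's the chatbot doing?"))
# ['hello', ',', "how's", 'the', 'chatbot', 'doing', '?']
```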

I’d definitely recommend looking into natural language processing if you’re interested in the intersection between computer science and human interaction!