This summer, I worked for Prof. Jonah Berger and Dr. Russell Richie to study the influence of textual differentiation on article reading depth. Overall, we examined more than 50,000 articles from 9 different websites. We used Latent Dirichlet Allocation to assign each article a topic distribution, represented as a vector whose entries sum to 1. We grouped the articles by website and calculated the mean topic distribution for each website. To measure how far each article was from the mean topic distribution of its website, we adopted four different distance measures and calculated a differentiation (distance) score for each article. We then used these scores as predictors in an ordinary linear regression, with the response variable being each article’s reading depth ratio: the maximum reading depth reached by readers divided by the article’s length. Our results indicated that greater textual differentiation leads to a higher reading depth ratio.
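The scoring and regression steps described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: it assumes the topic vectors have already been produced by an LDA model, and it shows only two distance measures (Euclidean and cosine) as stand-ins, since the four measures actually used are not named here. All function names, website labels, and numbers are hypothetical toy values.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length topic vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(p, q):
    """Euclidean distance between two topic vectors (assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine_distance(p, q):
    """One minus cosine similarity (assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)


def differentiation_scores(articles, distance):
    """Distance of each article from the mean topic vector of its website.

    `articles` is a list of (website, topic_vector) pairs.
    """
    by_site = defaultdict(list)
    for site, vec in articles:
        by_site[site].append(vec)
    site_means = {site: mean_vector(vecs) for site, vecs in by_site.items()}
    return [distance(vec, site_means[site]) for site, vec in articles]


def ols_fit(x, y):
    """Slope and intercept of a one-predictor least-squares line."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy input: made-up topic vectors (each sums to 1), not study data.
articles = [
    ("site_a", [0.7, 0.2, 0.1]),
    ("site_a", [0.5, 0.3, 0.2]),
    ("site_b", [0.1, 0.1, 0.8]),
    ("site_b", [0.3, 0.3, 0.4]),
]
# Made-up reading depth ratios (max reading depth / article length).
depth_ratio = [0.40, 0.45, 0.60, 0.55]

scores = differentiation_scores(articles, euclidean)
slope, intercept = ols_fit(scores, depth_ratio)
print(scores, slope, intercept)
```

Swapping `euclidean` for `cosine_distance` (or any other function of two vectors) changes the measure without touching the rest of the sketch, which is one simple way to compare several differentiation scores against the same response variable.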
During the research process, Prof. Berger and Dr. Richie were very helpful and mentored me as I learned Python and many natural language processing packages. I substantially developed my programming and data analysis skills. This research experience helped me become more familiar with the skills I had acquired in previous classes and deepened my understanding of machine learning.