This summer I had the opportunity to join Professor Dou’s research team and work with other undergraduate and PhD students on a project investigating resource misallocation in innovation. The project provides important economic insights into the efficiency of R&D investment across industries. We analyzed patent citation data, which reveal the level of technological dependence between industries, and constructed network graphs to visualize the flow of innovation across industries in the US.
At the beginning of the project, we familiarized ourselves with data analytics tools such as R and Excel while cleaning and processing the raw patent citation data for each industry category. We also gained hands-on experience with the specialized Morningstar terminal during the data collection process. One of the most interesting tasks we tackled was visualizing the relative connectivity of different industries’ technological innovation with the most suitable algorithm.
We first tried graphing with some of the built-in algorithms in R but could not derive reasonable interpretations from the resulting network graphs, because the visual presentation of industries’ relative connectivity did not match their actual correlations. We realized that the algorithms designed for graphing purposes similar to ours required coding modifications that were beyond our knowledge. This led us to discover Gephi, a graphing software that was new to us. With it, we were able to adjust multiple parameters of the algorithm specifically for our simulation and generate relatively accurate visualizations of the patent data.
Performing complex and sometimes time-sensitive data analytics tasks together with my teammate honed my communication and coordination skills, and I became more attentive to detail. Weekly discussions with our project team helped me develop a deeper understanding of the complexity of efficient investment in R&D, as well as the broader impact of the finance industry on the overall economy.