COMPUTATIONAL SOCIAL SCIENCE RESEARCH GROUP
I lead the Computational Social Science Research Group at the University of Wisconsin-Madison, an interdisciplinary group that uses computational methods to answer social science questions. About fifteen graduate students from the social sciences, computer science, and statistics currently collaborate on multiple research projects.
Yang, J., Sangari, A., Duncan, M., Zhang, Cao, D., Lukito, J., Bialik, K., Kim, S., Kornfield, R., Wu, Y., & Zhang, W. (2016). Obamacare and Political Polarization on Twitter: An Application of Machine Learning and Social Network Analysis. Paper presented at Communication Crossroads 2016. Madison, WI, USA. [Presentation Slide]
This study investigates political polarization in the Twitter conversation about the Affordable Care Act. Using the Twitter Gardenhose API, we collected more than 300,000 tweets from three periods in 2012 when “Obamacare” received national news coverage. The sample spans the day the Supreme Court announced it would hear the Obamacare case through the 2012 U.S. presidential election. Using supervised machine learning methods, we classified Twitter users’ political orientation based on text features of their tweets and profile descriptions. In addition to political orientation, we distinguished members of the public from members of the elite based on Twitter’s verified status and follower count. Analysis of retweet networks revealed highly polarized clusters of liberals and conservatives. However, levels of polarization were not uniform across time or across Twitter users. In the earlier periods, conversations between the two groups showed substantial cross-cutting, but these cross-cutting links disappeared and the network became more polarized near Election Day, showing signs of party sorting. Furthermore, elites and the public interacted differently within the network. Our findings suggest that the grassroots conservative movement on Twitter played a role in promoting an anti-Obamacare agenda, and they highlight the role of mainstream media and journalists in bridging the divide between the two ideological groups. The implications of machine learning and social network analysis for political communication research are discussed.
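The abstract does not spell out the exact polarization measures, so the following Python sketch is only an illustration under stated assumptions. It shows one simple way to operationalize two of the steps: flagging elite accounts by verified status or follower count (the 10,000-follower cutoff and the either/or rule are hypothetical), and measuring polarization per time period as the share of retweet edges that connect users of opposite leanings, optionally restricted to retweets of elite authors. All users, leanings, and edges are invented toy values.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the abstract says follower count was used but gives no number.
FOLLOWER_THRESHOLD = 10_000


@dataclass
class User:
    handle: str
    leaning: str      # "liberal" or "conservative", e.g. the output of a trained classifier
    verified: bool
    followers: int


def is_elite(user: User, threshold: int = FOLLOWER_THRESHOLD) -> bool:
    """Assumed rule: a verified account OR one at/above the follower cutoff counts as elite."""
    return user.verified or user.followers >= threshold


def cross_cutting_share(retweets, users, author_filter=None):
    """Share of retweet edges (retweeter, original author) linking users of different leanings.

    If author_filter is given, only edges whose original author passes the filter are counted.
    A lower share means a more polarized (less cross-cutting) network.
    """
    edges = [
        (src, dst)
        for src, dst in retweets
        if src in users and dst in users
        and (author_filter is None or author_filter(users[dst]))
    ]
    if not edges:
        return 0.0
    crossing = sum(1 for src, dst in edges if users[src].leaning != users[dst].leaning)
    return crossing / len(edges)


# Toy data (invented): four users and retweet edges from two hypothetical periods.
users = {
    "a": User("a", "liberal", verified=False, followers=120),
    "b": User("b", "conservative", verified=True, followers=50),
    "c": User("c", "liberal", verified=False, followers=25_000),
    "d": User("d", "conservative", verified=False, followers=300),
}
early_period = [("a", "b"), ("c", "d"), ("a", "c")]
late_period = [("a", "c"), ("b", "d"), ("d", "b"), ("c", "a")]

print(cross_cutting_share(early_period, users))            # 2 of 3 edges cross ideological lines
print(cross_cutting_share(late_period, users))             # no edge crosses ideological lines
print(cross_cutting_share(early_period, users, is_elite))  # only retweets of elite authors
```

Comparing the share across periods mirrors, in miniature, the shift from cross-cutting to sorted networks that the abstract describes; a real analysis would more likely use a community-detection or modularity measure on the full retweet graph.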
Yang, J., Sangari, A., Zhang, W., & Shah, D. V. (2016). Applying Supervised Machine Learning to Compute Political Ideology Among Twitter Users. Paper submitted to the 2016 International Conference on Computational Social Science. Evanston, IL, USA.
Social media have become a new political arena where politicians, journalists, media outlets, activists, and individuals discuss issues and share information. As social media data become more accessible and are increasingly read for traces of public opinion, growing scholarly attention has been paid to identifying the political ideology of users. Relying on Twitter data, we use supervised machine learning methods to identify the political leaning of Twitter users from the linguistic features of their tweets, hashtag use, patterns of retweets and @mentions, and self-described user profiles. In particular, this paper discusses several procedures for training machine classifiers and examines how different methodological choices affect the overall precision of the estimation. First, we suggest a strategy of sorting out highly influential users to reduce errors in the human coding procedure. Because human coding of a highly active Twitter account labels the large volume of tweets generated by that account, this additional step helps boost the size of the training data and enhance the overall accuracy of the estimation. Second, we propose cross-validation of the training data as an important step of supervised machine learning. Third, we apply N-grams to capture phrases and multi-word expressions and to take word dependencies into account. Fourth, we compare the outcome of two-category classification (conservative vs. liberal) with that of three-category classification (conservative vs. liberal vs. neutral) and discuss how these methodological choices affect the accuracy of automated classification of Twitter users’ political ideology. We present applications of this method to two events as case studies: an issue-specific conversation around the Affordable Care Act in 2012 and a general political conversation during the first U.S. presidential debate in 2012. Further applications of the method and their implications for social science research are discussed.
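The methodological steps in this abstract can be illustrated with a small, dependency-free Python sketch. It is not the paper's pipeline: the abstract names no classifier, so a multinomial naive Bayes with add-one smoothing stands in, and the helper names (`expand_account_labels`, `ngrams`, `cross_validate`), account names, and toy tweets are hypothetical. The sketch propagates one human label per active account to all of that account's tweets, extracts unigram and bigram features, estimates accuracy with k-fold cross-validation, and runs the same procedure on a three-category and a two-category version of the labels.

```python
import math
import random
from collections import Counter, defaultdict


def expand_account_labels(tweets_by_account, account_labels):
    """Hypothetical helper: one human label per account becomes a label for each of its tweets."""
    docs, labels = [], []
    for account, label in account_labels.items():
        for tweet in tweets_by_account.get(account, []):
            docs.append(tweet)
            labels.append(label)
    return docs, labels


def ngrams(text, n_max=2):
    """Unigrams plus contiguous n-grams up to n_max, so multi-word phrases become features."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in; the abstract names no classifier)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.feat_counts[label].update(ngrams(doc))
        self.vocab = {f for counts in self.feat_counts.values() for f in counts}
        self.totals = {y: sum(c.values()) for y, c in self.feat_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)

        def log_score(y):
            # log prior + sum of smoothed log likelihoods of each n-gram feature
            s = math.log(self.class_counts[y] / n_docs)
            for f in ngrams(doc):
                s += math.log((self.feat_counts[y][f] + 1) / (self.totals[y] + v))
            return s

        return max(self.class_counts, key=log_score)


def cross_validate(docs, labels, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out once while the rest trains the model."""
    if len(docs) < k:
        raise ValueError("need at least k documents")
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = NaiveBayes().fit([docs[i] for i in train], [labels[i] for i in train])
        correct = sum(model.predict(docs[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k


# Toy data (invented): three hand-coded accounts and their tweets.
tweets_by_account = {
    "con_account": ["repeal obamacare now", "defund obamacare", "obamacare is a disaster repeal it"],
    "lib_account": ["protect the affordable care act", "aca saves lives", "health care is a right protect it"],
    "news_account": ["supreme court hears aca case today", "news coverage of the ruling"],
}
account_labels = {"con_account": "conservative", "lib_account": "liberal", "news_account": "neutral"}

docs, labels = expand_account_labels(tweets_by_account, account_labels)
model = NaiveBayes().fit(docs, labels)
print(model.predict("repeal obamacare"))

# Three categories vs. two categories (neutral tweets dropped).
three_cat = cross_validate(docs, labels, k=2)
two_docs = [d for d, y in zip(docs, labels) if y != "neutral"]
two_labels = [y for y in labels if y != "neutral"]
two_cat = cross_validate(two_docs, two_labels, k=2)
print(f"3-category CV accuracy: {three_cat:.2f}, 2-category: {two_cat:.2f}")
```

With real data, scikit-learn's `CountVectorizer(ngram_range=(1, 2))` and `cross_val_score` would typically replace these hand-written pieces; the toy accuracies here only show that the procedure runs and carry no substantive meaning.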
Kornfield, R., Yang, J., Zhang, Y., Lukito, J., & Wu, Y. How People Talk about Obamacare?: Differences in Linguistic Patterns of the Conservatives and the Liberals.