There is an increasing amount of human-generated data available on the internet, including online reviews, user search histories, datasets labeled using crowdsourcing, and beyond. This has created an unprecedented opportunity for researchers in machine learning and data science to address a wide range of problems. However, human-generated data also creates unique challenges. Humans might be strategic or careless, possess diverse skills, or have behavioral biases. What is the right way to understand and utilize human-generated data? Furthermore, can we better design human-in-the-loop systems so that they generate more useful data in the first place?
In this talk, I will present my research, which addresses the challenges of utilizing and eliciting data from humans. In particular, I will introduce the problem of actively purchasing data from humans for solving machine learning tasks, and demonstrate how to convert a large class of machine learning algorithms into pricing and learning mechanisms. I will also discuss how to obtain high-quality data from humans using financial incentives and present our findings from a comprehensive set of behavioral experiments conducted on Amazon Mechanical Turk.