Frazier, Kempe and Kleinberg² win best paper award at ACM conference
Many consumers choose products based at least in part on reviews on a retailer’s website. But as more and more customers rely on reviews, the reviews may tend to become concentrated on products with many favorable reviews, leaving other products relatively unexplored. So the retailer may decide to offer an incentive, such as a discount, to encourage consumers to explore less frequently reviewed products by buying and reviewing them. If incentives are properly set within the available budget, both the retailer and the customer may benefit from their use, improving overall social welfare.
Designers of systems in other domains, such as crowdsourced information discovery (e.g., ratings of stories for social news sites), crowdsourced work (e.g., Amazon Mechanical Turk), citizen science (e.g., Galaxy Zoo or eBird) and even government funding of research efforts, have similar opportunities to improve exploration by offering incentives to ‘agents’ to explore more widely.
ORIE’s Peter Frazier, OR Field members Bobby and Jon Kleinberg, and Jon’s former student David Kempe, now a professor at the University of Southern California, won the best paper award at the 2014 Association for Computing Machinery (ACM) Conference on Economics and Computation for their work on incentivizing exploration. The paper was one of more than eighty accepted for the conference and subsequent publication.
In the paper, the authors examine the tradeoff between the expected size of incentive payments made to agents and the expected reward to the principal who organizes the exploration. The more that is spent on incentives, the less accrues to the principal, but the total reward may or may not decrease with the payout. The authors’ approach considers both payments and rewards to be uncertain quantities, evolving over time, given that the actual value of each product, web story, distributed work effort, area of the astronomical sky, bird habitat or research activity may only be known up to a probability distribution on the outcome of the exploration.
According to Frazier, the original motivation for the work came from considering research funding by agencies, such as the National Science Foundation and the National Institutes of Health, that may want to offer incentives for research that has high risk, but also has a high potential long-range payoff, without overly reducing funding for small advances in popular areas that have less risky short-term returns. “You have a budget that you can afford, you want to use this budget to maximize the result,” Frazier told the Cornell Chronicle. “We give a formula for what is achievable.”
The authors lay the groundwork for a theory of such situations by considering a model devised more than sixty years ago by statistician Herbert Robbins, with the unusual name of the multi-armed bandit problem. The name comes from the characterization of a slot machine as a “one-armed bandit.” Robbins considered the case where the payoff odds from different slot machines are different and the gambler chooses from among an array (originally two) of machines, possibly changing machines as the experience of payoffs accumulates. In the prize-winning paper, each slot machine (characterized as an ‘arm’) corresponds to a product or story to be reviewed, a task to be paid for through crowd-sourcing, an area of the night sky or a natural bird habitat to be explored, or a research project to be funded.
The authors offer a complete characterization of the tradeoff between incentive payments and the total achievable reward. In particular, they show whether an incentive policy with at most a given level of expected payout to agents can achieve at least a desired level of expected total reward. They do this for all possible combinations of agent payouts and total reward.
Applying exploration incentives
While the objective of the research was to understand what can be achieved within this tradeoff, the results also suggest a way to set incentive amounts that, according to Frazier, “works well in a setting where there are ‘diamonds in the rough’ – bandit arms that have a really high value to the person who pulls the arm.” The amounts are set using a randomized strategy, comparable to flipping a weighted coin during each stage of the evolving process to determine whether to pay no incentive (so the agents are left to pick arms that selfishly maximize their individual returns) or to pay an incentive large enough to induce agents to follow a socially optimal policy that explores more of the arms.
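The weighted-coin idea can be illustrated with a small simulation. The sketch below is not the authors' algorithm: it assumes arms that pay 0 or 1 with unknown probabilities, agents who share a uniform Beta prior on each arm, and no time discounting. Thompson sampling stands in for the socially optimal policy, which the paper defines differently. The function names, the parameter `incentive_prob` and the rule that the payment equals the gap in posterior mean reward between the agent's preferred arm and the recommended arm are all illustrative assumptions.

```python
import random


def posterior_mean(wins, losses):
    """Mean of a Beta(1 + wins, 1 + losses) posterior (uniform prior assumed)."""
    return (1 + wins) / (2 + wins + losses)


def simulate(true_probs, rounds, incentive_prob, seed=0):
    """Return (total_reward, total_payment) for a coin-flip incentive policy.

    Hypothetical illustration: each round a weighted coin decides whether the
    principal pays an incentive. Thompson sampling is a stand-in for the
    exploring policy; it is NOT the policy analyzed in the paper.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    wins = [0] * k
    losses = [0] * k
    total_reward = 0
    total_payment = 0.0

    for _ in range(rounds):
        means = [posterior_mean(wins[i], losses[i]) for i in range(k)]
        # A selfish agent picks the arm with the highest expected reward now.
        myopic_arm = max(range(k), key=lambda i: means[i])
        arm = myopic_arm

        if rng.random() < incentive_prob:
            # Coin came up "incentivize": recommend an exploring choice.
            samples = [rng.betavariate(1 + wins[i], 1 + losses[i])
                       for i in range(k)]
            target = max(range(k), key=lambda i: samples[i])
            # Pay just enough to offset what the agent expects to give up.
            total_payment += means[myopic_arm] - means[target]
            arm = target

        # Pull the chosen arm and update the shared record of outcomes.
        if rng.random() < true_probs[arm]:
            wins[arm] += 1
            total_reward += 1
        else:
            losses[arm] += 1

    return total_reward, total_payment


if __name__ == "__main__":
    arms = [0.3, 0.5, 0.9]  # made-up payoff probabilities
    for p in (0.0, 0.25, 1.0):
        reward, paid = simulate(arms, rounds=1000, incentive_prob=p)
        print(f"incentive_prob={p}: reward={reward}, payments={paid:.2f}")
```

Setting `incentive_prob` to 0 leaves the agents entirely to their own devices and costs nothing, while raising it buys more exploration at a higher expected payout, which is the tradeoff the paper characterizes.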
As Frazier points out, there are many potential applications for the use of exploration incentives. One intriguing example is associated with a pilot Cornell project to help rural herders – pastoralists – in Kenya find forage for their animals. The overall project, being carried out by Cornell’s Dyson School under the auspices of the Cornell Institute for Computational Sustainability, focuses on improving information about available forage opportunities.
The project provides the pastoralists (who may not be able to read) with icon-driven smartphones that enable them to report on what they find in the field. As with product reviews and the other examples described above, incentives (in this case, cellphone call minutes) are being used to promote the exploration of areas that are farther afield. ORIE graduate student Pu Yang is working with Frazier and ORIE Professor Kris Iyer to develop a theoretical understanding of how additional information, such as data on water resources and demographics, might be incorporated into the system as further incentives to exploration.