Multi-Armed bandits are an elegant model of learning in an unknown and uncertain environment. Such models are relevant in many scenarios, and of late have received increased attention recently due to various problems of distributed control that have arisen in wireless networks, pricing models on the internet, etc. We consider a non-Bayesian multi-armed bandit setting proposed by Lai & Robbins in mid-80s. There are multiple arms each of which generates an i.i.d. reward from an unknown distribution. There are multiple players, each of whom is choosing which arm to play. If two or more players choose the same arm, they all get zero rewards. The problem is to design a learning algorithm to be used by the players that results in an orthogonal matching of players to arms (e.g., users to channels in wireless networks), and moreover minimizes the expected regret.
>>
>> We first consider this as an online bipartite matching problem. We model this combinatorial problem as a classical multi-armed bandit problem but with dependent arms, and propose an index-based learning algorithm that achieves logarithmic regret. From prior results, it is known that this is order-optimal. We then consider the distributed problem where players do not communicate or coordinate in any way. We propose a index-based algorithm that uses Bertsekas' auction mechanism to determine the bipartite matching. We show that the algorithm has expected regret at most near-log-squared. This is the first distributed multi-armed bandit learning algorithm in a general setting.
Rahul Jain is the K. C. Dahlberg Early Career Chair and an Assistant Professor in the EE & ISE Departments at the University of Southern California, Los Angeles, CA. He received his B.Tech from IIT Kanpur, and an MA in Statistics and a PhD in EECS from the University of California, Berkeley. Prior to joining USC in Fall 2008, he was at the IBM T J Watson Research Center, Yorktown Heights, NY. He is a recipient of the NSF CAREER award in 2010, the ONR Young Investigator award in 2012, an IBM Faculty award in 2010, the James H. Zumberge Faculty Research and Innovation Award in 2009 and a Best paper award at a couple of conferences. His interests span game theory and network economics, queueing theory, stochastic models, optimization and learning with applications to communication networks and power systems.