3 years ago

Analysis of Thompson Sampling for Graphical Bandits Without the Graphs.

Ness Shroff, Zizhan Zheng, Fang Liu

We study multi-armed bandit problems with graph feedback, in which the decision maker is allowed to observe the neighboring actions of the chosen action, in a setting where the graph may vary over time and is never fully revealed to the decision maker. We show that when the feedback graphs are undirected, the original Thompson Sampling achieves the optimal (within logarithmic factors) regret $\tilde{O}\left(\sqrt{\beta_0(G)T}\right)$ over time horizon $T$, where $\beta_0(G)$ is the average independence number of the latent graphs. To the best of our knowledge, this is the first result showing that the original Thompson Sampling is optimal for graphical bandits in the undirected setting. A slightly weaker regret bound of Thompson Sampling in the directed setting is also presented. To fill this gap, we propose a variant of Thompson Sampling, that attains the optimal regret in the directed setting within a logarithmic factor. Both algorithms can be implemented efficiently and do not require the knowledge of the feedback graphs at any time.

Publisher URL: http://arxiv.org/abs/1805.08930

DOI: arXiv:1805.08930v1

You might also like
Never Miss Important Research

Researcher is an app designed by academics, for academics. Create a personalised feed in two minutes.
Choose from over 15,000 academics journals covering ten research areas then let Researcher deliver you papers tailored to your interests each day.

Researcher displays publicly available abstracts and doesn’t host any full article content. If the content is open access, we will direct clicks from the abstracts to the publisher website and display the PDF copy on our platform. Clicks to view the full text will be directed to the publisher website, where only users with subscriptions or access through their institution are able to view the full article.