We design new differentially private algorithms for the problems of adversarial bandits and bandits with expert advice. For adversarial bandits, we give a simple and efficient conversion of any non-private bandit algorithms to private bandit algorithms. Instantiating our conversion with existing non-private bandit algorithms gives a regret upper bound of O(KTε)O\left(\frac{\sqrt{KT}}{\sqrt{\varepsilon}}\right)O(εKT), improving upon the existing upper bound O(KTlog(KT)ε)O\left(\frac{\sqrt{KT \log(KT)}}{\varepsilon}\right)O(εKTlog(KT)) in all privacy regimes. In particular, our algorithms…
Source link