Episode

Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back

with Vitaly Feldman

In stochastic convex optimization the goal is to minimize a convex function $F(x) \doteq \E_{f\sim D}[f(x)]$ over a convex set $\K \subset \R^d$, where $D$ is some unknown distribution and each $f(\cdot)$ in the support of $D$ is convex over $\K$. The optimization is based on i.i.d. samples $f^1, f^2, \ldots, f^n$ from $D$. A common approach to such problems is empirical risk minimization (ERM), which optimizes $F_S(x) \doteq \frac{1}{n}\sum_{i\leq n} f^i(x)$. Here we consider the question of how many samples are necessary for ERM to succeed and the closely related question of uniform convergence of $F_S$ to $F$ over $\K$. We demonstrate that in the standard $\ell_p/\ell_q$ setting of Lipschitz-bounded functions over a $\K$ of bounded radius, ERM requires a sample size that scales linearly with the dimension $d$. This nearly matches standard upper bounds and improves on the $\Omega(\log d)$ dependence proved for the $\ell_2/\ell_2$ setting in (Shalev-Shwartz et al. 2009). In stark contrast, these problems can be solved using a dimension-independent number of samples for the $\ell_2/\ell_2$ setting and with $\log d$ dependence for the $\ell_1/\ell_\infty$ setting using other approaches. We also demonstrate that for a more general class of range-bounded (but not Lipschitz-bounded) stochastic convex programs an even stronger gap appears already in dimension 2.
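To make the ERM setup concrete, here is a minimal sketch in Python of one illustrative instance of the $\ell_2/\ell_2$ setting: 1-Lipschitz losses $f^i(x) = |\langle a_i, x\rangle - b_i|$ with $\|a_i\|_2 = 1$ and $\K$ the unit Euclidean ball, minimized by projected subgradient descent. The data, step size, and helper names are hypothetical choices for illustration, not taken from the paper.

```python
# Minimal sketch of empirical risk minimization (ERM) in the setting above.
# Assumed instance (not from the paper): losses f^i(x) = |<a_i, x> - b_i|
# with unit-norm a_i, and K the unit Euclidean ball.
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Project x onto the Euclidean ball of the given radius (the set K)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def erm_subgradient(A, b, steps=2000, lr=0.05):
    """Minimize F_S(x) = (1/n) sum_i |<a_i, x> - b_i| over K by projected subgradient."""
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(steps):
        residual = A @ x - b
        # Subgradient of the empirical risk F_S at the current point x.
        g = (A.T @ np.sign(residual)) / n
        x = project_l2_ball(x - lr * g)
    return x

# Hypothetical data: n i.i.d. samples in dimension d. The paper's lower bound
# says that for the ERM solution to generalize in the worst case, n must grow
# linearly with d; this script only computes the empirical minimizer itself.
rng = np.random.default_rng(0)
n, d = 200, 50
A = rng.normal(size=(n, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)  # each ||a_i||_2 = 1, so f^i is 1-Lipschitz
b = rng.normal(size=n)
x_hat = erm_subgradient(A, b)
print("empirical risk F_S(x_hat):", np.mean(np.abs(A @ x_hat - b)))
```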