强化学习几个不等式推导
学习强化学习multi-armed bandit的时候,几个概率不等式没有推导过程课上直接使用了,这里记一下推导过程。
相关不等式:
Markov, Chebyshev, Chernoff Bonding, Hoeffding
Markov’s Inequality
$$\mathbb{P}(X\ge a)\le\frac{\mathbb{E}[X]}{a},a>0$$
证明1
Let $Y_a=\begin{cases}0, X<a\\a, X\ge a\end{cases}$, then Ya ≤ X which implies 𝔼[Ya] ≤ 𝔼[X], combining with the fact that 𝔼[Ya] = ℙ(X ≥ a) × a, we obtain $\mathbb{P}(X\ge a)\le\frac{\mathbb{E}[X]}{a}, a>0$, as needed.
证明2
𝔼[X] = ∫0∞xp(x)dx ≥ ∫a∞xp(x)dx ≥ ∫a∞ap(x)dx = ℙ(X ≥ a) × a
证明3
Let $Y=\frac{U}{a}$ which gives $\mathbb{E}[Y]=\frac{\mathbb{E}[X]}{a}$, then ℙ(X ≥ a) = ℙ(Y ≥ 1). What we want to show is ℙ(Y ≥ 1) ≤ 𝔼[Y], i.e. 𝕀{Y ≥ 1} ≤ Y (then take expectation on both sides give the desired result).
Case 1: Y ≥ 1. Then the inequality holds.
Case 2: Y < 1. Then the indicator is 0 showing the inequality holds,
Hence we conclude the Markov’s Inequality holds.
Chebyshev’s Inequality
$$\mathbb{P}(|X-\mu|\ge c)\le\frac{\sigma^2}{c^2}, c>0$$
$$\mathbb{P}(|X-\mu|\ge k\sigma)\le\frac{1}{k^2}, k>0$$
Where μ is the mean, σ is the standard deviation
证明
Using Markov’s Inequality, By letting X := (X − μ)2, a := c2, we see $\mathbb{P}((X-\mu)^2\ge c^2)\le \frac{\mathbb{E}[(X-\mu)^2]}{c^2}=\frac{\sigma^2}{c^2}$ which gives $$\mathbb{P}(|X-\mu|\ge c)\le\frac{\sigma^2}{c^2}$$
By letting c := kσ, we see $$\mathbb{P}(|X-\mu|\ge k\sigma)\le\frac{\sigma^2}{k^2\sigma^2}=\frac{1}{k^2}$$
Chernoff Bonding Method
Let MZ(s) := 𝔼[esZ] be the moment generating function.
ℙ(Z ≥ t) ≤ infs > 0e−stMZ(s)
证明
Apply Markov’s Inequality, ℙ(Z ≥ t) = ℙ(esZ ≥ est) ≤ e−st𝔼[esZ] = e−stMZ(s), take inf of all s > 0 gives the result.
Hoeffding’s Inequality
{Zi}i ∈ Λ ⊂ ℝ be independent random variables. If ai ≤ Zi ≤ bi for all i ∈ Λ, for some ai, bi, then let Sn = ∑iZi, we have
$$\mathbb{P}(|S_n-\mathbb{E}[S_n]|\ge t)\le 2e^{-\frac{2t^2}{\sum_i (b_i-a_i)^2e}}$$
证明
Using Chernoff Bonding Method, and replacing infimum to minimum since it is achievable, we have ℙ(Sn − 𝔼[Sn] ≥ t) ≤ mins > 0e−st𝔼[es(Sn − 𝔼[Sn])]