强化学习几个不等式推导

学习强化学习multi-armed bandit的时候,几个概率不等式没有推导过程课上直接使用了,这里记一下推导过程。

相关不等式:

Markov, Chebyshev, Chernoff Bonding, Hoeffding

Markov’s Inequality

$$\mathbb{P}(X\ge a)\le\frac{\mathbb{E}[X]}{a},a>0$$

证明1

Let $Y_a=\begin{cases}0, X<a\\a, X\ge a\end{cases}$, then Ya ≤ X which implies 𝔼[Ya] ≤ 𝔼[X], combining with the fact that 𝔼[Ya] = ℙ(X ≥ a) × a, we obtain $\mathbb{P}(X\ge a)\le\frac{\mathbb{E}[X]}{a}, a>0$, as needed.

证明2

𝔼[X] = ∫0xp(x)dx ≥ ∫axp(x)dx ≥ ∫aap(x)dx = ℙ(X ≥ a) × a

证明3

Let $Y=\frac{U}{a}$ which gives $\mathbb{E}[Y]=\frac{\mathbb{E}[X]}{a}$, then ℙ(X ≥ a) = ℙ(Y ≥ 1). What we want to show is ℙ(Y ≥ 1) ≤ 𝔼[Y], i.e. 𝕀{Y ≥ 1} ≤ Y (then take expectation on both sides give the desired result).

Case 1: Y ≥ 1. Then the inequality holds.

Case 2: Y < 1. Then the indicator is 0 showing the inequality holds,

Hence we conclude the Markov’s Inequality holds.

Chebyshev’s Inequality

$$\mathbb{P}(|X-\mu|\ge c)\le\frac{\sigma^2}{c^2}, c>0$$

$$\mathbb{P}(|X-\mu|\ge k\sigma)\le\frac{1}{k^2}, k>0$$

Where μ is the mean, σ is the standard deviation

证明

Using Markov’s Inequality, By letting X := (X − μ)2, a := c2, we see $\mathbb{P}((X-\mu)^2\ge c^2)\le \frac{\mathbb{E}[(X-\mu)^2]}{c^2}=\frac{\sigma^2}{c^2}$ which gives $$\mathbb{P}(|X-\mu|\ge c)\le\frac{\sigma^2}{c^2}$$

By letting c := kσ, we see $$\mathbb{P}(|X-\mu|\ge k\sigma)\le\frac{\sigma^2}{k^2\sigma^2}=\frac{1}{k^2}$$

Chernoff Bonding Method

Let MZ(s) := 𝔼[esZ] be the moment generating function.

ℙ(Z ≥ t) ≤ infs > 0estMZ(s)

证明

Apply Markov’s Inequality, ℙ(Z ≥ t) = ℙ(esZ ≥ est) ≤ est𝔼[esZ] = estMZ(s), take inf of all s > 0 gives the result.

Hoeffding’s Inequality

{Zi}i ∈ Λ ⊂ ℝ be independent random variables. If ai ≤ Zi ≤ bi for all i ∈ Λ, for some ai, bi, then let Sn = ∑iZi, we have

$$\mathbb{P}(|S_n-\mathbb{E}[S_n]|\ge t)\le 2e^{-\frac{2t^2}{\sum_i (b_i-a_i)^2e}}$$

证明

Using Chernoff Bonding Method, and replacing infimum to minimum since it is achievable, we have ℙ(Sn − 𝔼[Sn] ≥ t) ≤ mins > 0est𝔼[es(Sn − 𝔼[Sn])]

省略

Comments

:D 一言句子获取中...