2025-10-11 6e70803a3228be4837514c58769d5fb3 0 3 分钟 0.5 k0次访问

强化学习几个不等式推导

学习强化学习multi-armed bandit的时候，几个概率不等式没有推导过程课上直接使用了，这里记一下推导过程。

相关不等式：

Markov, Chebyshev, Chernoff Bonding, Hoeffding

Markov’s Inequality

\[ \begin{equation} \mathbb{P}(X\ge a)\le\frac{\mathbb{E}[X]}{a},a>0 \end{equation} \]

证明1

Let \[Y_a=\begin{cases}0, X<a\\a, X\ge a\end{cases}\] then \(Y_a\le X\) which implies \(\mathbb{E}[Y_a]\le \mathbb{E}[X]\), combining with the fact that \(\mathbb{E}[Y_a]=\mathbb{P}(X\ge a)\times a\), we obtain \[ \begin{gather} \mathbb{P}(X\ge a)\le\frac{\mathbb{E}[X]}{a}, a>0 \end{gather} \]

as needed.

证明2

\[ \begin{gather} \mathbb{E}[X]=\int_0^\infty xp(x)dx\ge\int_a^\infty xp(x)dx\ge\int_a^\infty a p(x)dx=\mathbb{P}(X\ge a)\times a \end{gather} \]

证明3

Let \(Y=\frac{U}{a}\) which gives \(\mathbb{E}[Y]=\frac{\mathbb{E}[X]}{a}\), then \(\mathbb{P}(X\ge a)=\mathbb{P}(Y\ge 1)\). What we want to show is \(\mathbb{P}(Y\ge 1)\le \mathbb{E}[Y]\), i.e. \(\mathbb{I}\{Y\ge1\}\le Y\) (then take expectation on both sides give the desired result).

Case 1: \(Y\ge1\). Then the inequality holds.

Case 2: \(Y<1\). Then the indicator is 0 showing the inequality holds,

Hence we conclude the Markov’s Inequality holds.

Chebyshev’s Inequality

\[\mathbb{P}(|X-\mu|\ge c)\le\frac{\sigma^2}{c^2}, c>0\]

\[\mathbb{P}(|X-\mu|\ge k\sigma)\le\frac{1}{k^2}, k>0\]

Where \(\mu\) is the mean, \(\sigma\) is the standard deviation

证明

Using Markov’s Inequality, By letting \(X:=(X-\mu)^2, a:=c^2\), we see \(\mathbb{P}((X-\mu)^2\ge c^2)\le \frac{\mathbb{E}[(X-\mu)^2]}{c^2}=\frac{\sigma^2}{c^2}\) which gives \[\mathbb{P}(|X-\mu|\ge c)\le\frac{\sigma^2}{c^2}\]

By letting \(c:=k\sigma\), we see \[\mathbb{P}(|X-\mu|\ge k\sigma)\le\frac{\sigma^2}{k^2\sigma^2}=\frac{1}{k^2}\]

Chernoff Bonding Method

Let \(M_Z(s):=\mathbb{E}[e^{sZ}]\) be the moment generating function.

\[\mathbb{P}(Z\ge t)\le \inf_{s>0}e^{-st}M_Z(s)\]

证明

Apply Markov’s Inequality, \(\mathbb{P}(Z\ge t)=\mathbb{P}(e^{sZ}\ge e^{st})\le e^{-st}\mathbb{E}[e^{sZ}]=e^{-st}M_Z(s)\), take inf of all \(s>0\) gives the result.

Hoeffding’s Inequality

\(\{Z_i\}_{i\in\Lambda}\subset\mathbb{R}\) be independent random variables. If \(a_i\le Z_i\le b_i\) for all \(i\in\Lambda\), for some \(a_i, b_i\), then let \(S_n=\sum_i Z_i\), we have

\[\mathbb{P}(|S_n-\mathbb{E}[S_n]|\ge t)\le 2e^{-\frac{2t^2}{\sum_i (b_i-a_i)^2e}}\]

证明

Using Chernoff Bonding Method, and replacing infimum to minimum since it is achievable, we have \(\mathbb{P}(S_n-\mathbb{E}[S_n]\ge t)\le \min_{s>0}e^{-st}\mathbb{E}[e^{s(S_n-\mathbb{E}[S_n])}]\)

省略

强化学习几个不等式推导

Markov’s Inequality

证明1

证明2

证明3

Chebyshev’s Inequality

证明

Chernoff Bonding Method

证明

Hoeffding’s Inequality

证明

目录

最新评论