<!-- The \,\! are to keep the formulas rendered as PNG instead of HTML. Please don't remove them. -->
In statistical [[decision theory]], where we are faced with the problem of estimating a deterministic parameter (vector) <math>\theta \in \Theta</math> from observations <math>x \in \mathcal{X},</math> an [[estimator]] (estimation rule) <math>\delta^M \,\!</math> is called '''[[minimax]]''' if its maximal [[Risk function|risk]] is minimal among all estimators of <math>\theta \,\!</math>. In a sense this means that <math>\delta^M \,\!</math> is an estimator which performs best in the worst possible case allowed in the problem.

==Problem setup==
Consider the problem of estimating a deterministic (not [[Bayes estimator|Bayesian]]) parameter <math>\theta \in \Theta</math> from noisy or corrupted data <math>x \in \mathcal{X}</math> related through the [[conditional probability distribution]] <math>P(x|\theta)\,\!</math>. Our goal is to find a "good" estimator <math>\delta(x) \,\!</math> for estimating the parameter <math>\theta \,\!</math>, one which minimizes some given [[risk function]] <math>R(\theta,\delta) \,\!</math>. Here the risk function is the [[expected value|expectation]] of some [[loss function]] <math>L(\theta,\delta) \,\!</math> with respect to <math>P(x|\theta)\,\!</math>. A popular example of a loss function is the squared error loss <math>L(\theta,\delta)= \|\theta-\delta\|^2 \,\!</math>, for which the risk function is the [[mean squared error]] (MSE).
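As a concrete illustration, the risk of an estimator can be approximated by Monte Carlo simulation. The sketch below is a minimal illustration, not part of the source article: the binomial observation model, the function names, and the parameter values are all chosen for the example. It estimates the MSE risk of the sample-mean rule and compares it with the known closed form <math>\theta(1-\theta)/n</math>.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def mse_risk(estimator, theta, n=20, trials=200_000):
    """Monte Carlo approximation of R(theta, delta) = E[(delta(x) - theta)^2]
    for x ~ Binomial(n, theta), i.e. the risk under squared error loss."""
    x = rng.binomial(n, theta, size=trials)
    return np.mean((estimator(x, n) - theta) ** 2)

sample_mean = lambda x, n: x / n  # delta(x) = x/n, exact risk theta*(1-theta)/n

for theta in (0.1, 0.5, 0.9):
    print(theta, mse_risk(sample_mean, theta), theta * (1 - theta) / 20)
</syntaxhighlight>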

Unfortunately, in general the risk cannot be minimized, since it depends on the unknown parameter <math>\theta \,\!</math> itself (if we knew the actual value of <math>\theta \,\!</math>, we would not need to estimate it). Therefore additional criteria for finding an optimal estimator in some sense are required. One such criterion is the minimax criterion.

==Definition==
'''Definition:''' An estimator <math>\delta^M:\mathcal{X} \rightarrow \Theta \,\!</math> is called '''minimax''' with respect to a risk function <math>R(\theta,\delta) \,\!</math> if it achieves the smallest maximum risk among all estimators, meaning it satisfies

: <math>\sup_{\theta \in \Theta} R(\theta,\delta^M) = \inf_\delta \sup_{\theta \in \Theta} R(\theta,\delta). \, </math>
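To see the definition in action, the sketch below compares the worst-case (supremum over <math>\theta</math>) risk of the maximum-likelihood rule <math>x/n</math> with that of the constant-risk rule derived in Example 1 below, for a binomial observation. The closed-form risk expression for linear rules is elementary (variance plus squared bias); the variable names and grid are illustrative choices.

<syntaxhighlight lang="python">
import numpy as np

n = 20
thetas = np.linspace(0.0, 1.0, 201)

def risk_linear(a, b, theta):
    # Exact risk of a linear rule delta(x) = a*x + b for x ~ Binomial(n, theta):
    # E[(a*x + b - theta)^2] = a^2 * Var(x) + (a*n*theta + b - theta)^2
    return a**2 * n * theta * (1 - theta) + (a * n * theta + b - theta) ** 2

rules = {
    "MLE x/n": (1 / n, 0.0),
    "Example 1 rule": (1 / (n + np.sqrt(n)), 0.5 * np.sqrt(n) / (n + np.sqrt(n))),
}
for name, (a, b) in rules.items():
    print(name, "worst-case risk:", risk_linear(a, b, thetas).max())
</syntaxhighlight>

The maximum-likelihood rule has worst-case risk <math>1/(4n)</math> (attained at <math>\theta = 1/2</math>), while the Example 1 rule achieves the smaller value <math>1/(4(1+\sqrt{n})^2)</math> uniformly.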

==Least favorable distribution==
Logically, an estimator is minimax when it is the best in the worst case. Continuing this logic, a minimax estimator should be a [[Bayes estimator]] with respect to a least favorable prior distribution of <math>\theta \,\!</math>. To demonstrate this notion, denote the average risk of the Bayes estimator <math>\delta_{\pi} \,\!</math> with respect to a prior distribution <math>\pi \,\!</math> as

: <math>r_\pi = \int R(\theta,\delta_{\pi}) \, d\pi(\theta). \, </math>

'''Definition:''' A prior distribution <math>\pi \,\!</math> is called least favorable if for any other distribution <math>\pi ' \,\!</math> the average risk satisfies <math>r_\pi \geq r_{\pi '}. \, </math>

'''Theorem 1:''' If <math>r_\pi = \sup_\theta R(\theta,\delta_\pi), \, </math> then:

#<math>\delta_{\pi}\,\!</math> is minimax.
#If <math>\delta_{\pi}\,\!</math> is a unique Bayes estimator, it is also the unique minimax estimator.
#<math>\pi\,\!</math> is least favorable.

'''Corollary:''' If a Bayes estimator has constant risk, it is minimax. Note that this is not a necessary condition.

'''Example 1, Unfair coin:''' Consider the problem of estimating the "success" rate of a [[Binomial distribution|Binomial]] variable, <math>x \sim B(n,\theta)\,\!</math>. This may be viewed as estimating the rate at which an [[Fair coin|unfair coin]] falls on "heads" or "tails". In this case the Bayes estimator with respect to a [[Beta distribution|Beta]]-distributed prior, <math>\theta \sim \text{Beta}(\sqrt{n}/2,\sqrt{n}/2), \, </math> is

:<math>\delta^M=\frac{x+0.5\sqrt{n}}{n+\sqrt{n}}, \, </math>

with constant Bayes risk

:<math>r=\frac{1}{4(1+\sqrt{n})^2}  \, </math>

and, according to the Corollary, is minimax.
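The form of this Bayes estimator follows from Beta-binomial conjugacy: the posterior mean under a <math>\text{Beta}(a,b)</math> prior is <math>(x+a)/(n+a+b)</math>, which for <math>a=b=\sqrt{n}/2</math> reduces to the displayed rule. The sketch below checks the constant-risk claim numerically; the choice of <math>n</math> and the sample sizes are arbitrary.

<syntaxhighlight lang="python">
import numpy as np

n = 20
a = b = np.sqrt(n) / 2  # Beta(sqrt(n)/2, sqrt(n)/2) prior parameters

# Beta-binomial conjugacy: the posterior mean under a Beta(a, b) prior is
# (x + a)/(n + a + b), which here reduces to (x + 0.5*sqrt(n))/(n + sqrt(n)).
delta = lambda x: (x + a) / (n + a + b)

rng = np.random.default_rng(1)
for theta in (0.05, 0.3, 0.5, 0.8):
    x = rng.binomial(n, theta, size=200_000)
    print(theta, np.mean((delta(x) - theta) ** 2))  # ~ constant across theta
print("predicted constant risk:", 1 / (4 * (1 + np.sqrt(n)) ** 2))
</syntaxhighlight>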

'''Definition:''' A sequence of prior distributions <math> {\pi}_n\,\!</math> is called least favorable if for any other distribution <math>\pi '\,\!</math>,
:<math>\lim_{n \rightarrow \infty} r_{\pi_n} \geq r_{\pi '}. \, </math>

'''Theorem 2:''' If there is a sequence of priors <math> \pi_n\,\!</math> and an estimator <math>\delta\,\!</math> such that <math>\sup_{\theta} R(\theta,\delta)=\lim_{n \rightarrow \infty} r_{\pi_n} \,\!</math>, then:

#<math>\delta\,\!</math> is minimax.
#The sequence <math>{\pi}_n\,\!</math> is least favorable.

Notice that no uniqueness is guaranteed here. For example, the ML estimator of Example 2 below may be attained as the limit of Bayes estimators with respect to a [[Uniform distribution (continuous)|uniform]] prior <math>\pi_n \sim U[-n,n]\,\!</math> with increasing support, and also with respect to a zero-mean normal prior <math>\pi_n \sim N(0,n \sigma^2) \,\!</math> with increasing variance. So neither the resulting minimax ML estimator nor the least favorable prior is unique.
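The normal-prior case is easy to verify directly: for a single observation <math>x \sim N(\theta,\sigma^2)</math>, the Bayes estimator under the conjugate prior <math>\theta \sim N(0,n\sigma^2)</math> is the posterior mean, a shrinkage of <math>x</math> that tends to <math>x</math> itself as <math>n \rightarrow \infty</math>. A minimal numerical sketch (the observed value is arbitrary):

<syntaxhighlight lang="python">
sigma2 = 1.0
x = 2.5  # an arbitrary observed value, x ~ N(theta, sigma^2)

# For the conjugate prior theta ~ N(0, n*sigma^2), the Bayes estimator is the
# posterior mean (n*sigma^2/(n*sigma^2 + sigma^2)) * x = (n/(n + 1)) * x,
# which tends to the ML estimator delta_ML(x) = x as n -> infinity.
for n in (1, 10, 100, 1000):
    tau2 = n * sigma2
    print(n, tau2 / (tau2 + sigma2) * x)
</syntaxhighlight>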

'''Example 2:''' Consider the problem of estimating the mean of a <math>p\,\!</math>-dimensional [[Normal distribution|Gaussian]] random vector, <math>x \sim N(\theta,I_p \sigma^2)\,\!</math>. The [[maximum likelihood]] (ML) estimator for <math>\theta\,\!</math> in this case is simply <math>\delta_{ML}=x\,\!</math>, and its risk is

: <math>R(\theta,\delta_{ML})=E\left[\|\delta_{ML}-\theta\|^2\right]=\sum_{i=1}^p E\left[(x_i-\theta_i)^2\right]=p \sigma^2. \, </math>

[[Image:MSE of ML vs JS.png|thumb|right|350px|MSE of maximum likelihood estimator versus James–Stein estimator]]

The risk is constant, but the ML estimator is actually not a Bayes estimator, so the Corollary of Theorem 1 does not apply. However, the ML estimator is the limit of the Bayes estimators with respect to the prior sequence <math>\pi_n \sim N(0,n \sigma^2) \,\!</math>, and hence is indeed minimax according to Theorem 2. Nonetheless, minimaxity does not always imply [[Admissible decision rule|admissibility]]. In fact, in this example the ML estimator is known to be inadmissible whenever <math>p > 2\,\!</math>. The famous [[James–Stein estimator]] dominates the ML estimator whenever <math>p > 2\,\!</math>. Though both estimators have the same risk <math>p \sigma^2\,\!</math> when <math>\|\theta\| \rightarrow \infty\,\!</math>, and they are both minimax, the James–Stein estimator has smaller risk for any finite <math>\|\theta\|\,\!</math>. This fact is illustrated in the figure on the right.
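This dominance is easy to reproduce by simulation. The sketch below estimates the risk of the ML estimator and of the basic (non-positive-part) James–Stein estimator by Monte Carlo for several values of <math>\|\theta\|</math>; the dimension, seed, and sample counts are arbitrary choices.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
p, sigma2, trials = 10, 1.0, 50_000

def james_stein(x):
    # Basic (non-positive-part) James-Stein shrinkage toward the origin.
    return (1 - (p - 2) * sigma2 / np.sum(x**2, axis=1, keepdims=True)) * x

for norm in (0.0, 2.0, 5.0, 20.0):
    theta = np.zeros(p)
    theta[0] = norm
    x = theta + np.sqrt(sigma2) * rng.standard_normal((trials, p))
    mse_ml = np.mean(np.sum((x - theta) ** 2, axis=1))              # ~ p*sigma^2
    mse_js = np.mean(np.sum((james_stein(x) - theta) ** 2, axis=1))  # strictly smaller
    print(norm, mse_ml, mse_js)
</syntaxhighlight>

The ML risk stays near <math>p\sigma^2</math> for every <math>\|\theta\|</math>, while the James–Stein risk is smaller throughout and approaches <math>p\sigma^2</math> only as <math>\|\theta\|</math> grows.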

==Some examples==
In general it is difficult, often even impossible, to determine the minimax estimator. Nonetheless, in many cases a minimax estimator has been determined.

'''Example 3, Bounded normal mean:''' Consider the problem of estimating the mean of a normal vector <math>x \sim N(\theta,I_n \sigma^2)\,\!</math> when it is known that <math>\|\theta\|^2 \leq M\,\!</math>. The Bayes estimator with respect to a prior which is uniformly distributed on the boundary of the bounding [[sphere]] is known to be minimax whenever <math>M \leq n\,\!</math>. The analytical expression for this estimator is

:<math>\delta^M=\frac{nJ_{n+1}(n\|x\|)}{\|x\|J_{n}(n\|x\|)}, \, </math>

where <math>J_{n}(t)\,\!</math> is the modified [[Bessel function]] of the first kind of order ''n''.
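The displayed expression can be evaluated numerically. The sketch below is a literal transcription of the formula, using SciPy's modified Bessel function of the first kind <code>iv</code> for the article's <math>J_n</math>. Note that the expression as printed is a scalar; interpreting it, for instance, as the magnitude of an estimate along <math>x/\|x\|</math> is an assumption of this illustration, not a claim from the source.

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind, I_v

def delta_M(x, n):
    """Literal evaluation of n*J_{n+1}(n*||x||) / (||x||*J_n(n*||x||)),
    reading J as the modified Bessel function of the first kind (SciPy's iv)."""
    r = np.linalg.norm(x)
    return n * iv(n + 1, n * r) / (r * iv(n, n * r))

print(delta_M(np.array([0.3, -0.2, 0.1]), n=3))
</syntaxhighlight>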

==Asymptotic minimax estimator==

The difficulty of determining the exact minimax estimator has motivated the study of asymptotically minimax estimators: an estimator <math>\delta'</math> is called <math>c</math>-asymptotically (or approximately) minimax if

:<math>\sup_{\theta\in\Theta} R(\theta,\delta')\leq c \inf_\delta \sup_{\theta \in \Theta} R(\theta,\delta).</math>

For many estimation problems, especially in the non-parametric estimation setting, various approximate minimax estimators have been established. The design of an approximate minimax estimator is intimately related to the geometry of <math>\Theta</math>, such as its [[metric entropy number]].

==Relationship to robust optimization==
[[Robust optimization]] is an approach to solving optimization problems under uncertainty in the knowledge of the underlying parameters.<ref name=kassam/><ref name=ben_tal/> For instance, the [[Minimum mean square error|MMSE Bayesian estimation]] of a parameter requires knowledge of the parameter correlation function. If this correlation function is not perfectly known, a popular minimax robust optimization approach<ref name=verdu/> is to define a set characterizing the uncertainty about the correlation function, and then to pursue a minimax optimization over the uncertainty set and the estimator, respectively. Similar minimax optimizations can be pursued to make estimators robust to certain imprecisely known parameters. For instance, a study dealing with such techniques in the area of signal processing can be found in the book by Nisar.<ref name=nisar_book/>
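A toy version of this minimax robust approach follows; it is an illustration under simplified assumptions, not the method of the cited works. Consider a scalar signal <math>x = \theta + v</math> with <math>\theta \sim N(0,s)</math>, noise <math>v \sim N(0,\sigma^2)</math>, and the prior variance <math>s</math> known only to lie in an interval (the "uncertainty set"). A grid search then finds the linear estimator gain minimizing the worst-case MSE over that set.

<syntaxhighlight lang="python">
import numpy as np

sigma2 = 1.0
s_lo, s_hi = 0.5, 2.0  # uncertainty set for the prior variance of theta

def mse(a, s):
    # x = theta + v, theta ~ N(0, s), v ~ N(0, sigma2), linear rule delta(x) = a*x:
    # E[(a*x - theta)^2] = (a - 1)^2 * s + a^2 * sigma2
    return (a - 1) ** 2 * s + a**2 * sigma2

a_grid = np.linspace(0.0, 1.0, 1001)
s_grid = np.linspace(s_lo, s_hi, 101)
worst = np.array([mse(a, s_grid).max() for a in a_grid])
a_star = a_grid[worst.argmin()]
print("minimax gain:", a_star, " MMSE gain for worst case:", s_hi / (s_hi + sigma2))
</syntaxhighlight>

In this toy model the worst-case MSE is attained at the largest prior variance, so the minimax gain coincides with the MMSE gain designed for the worst case.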

R. Fandom Noubiap and W. Seidel (2001) developed an algorithm for calculating a Gamma-minimax decision rule when Gamma is given by a finite number of generalized moment conditions. Such a decision rule minimizes the maximum of the integrals of the risk function with respect to all distributions in Gamma. Gamma-minimax decision rules are of interest in robustness studies in Bayesian statistics.
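A minimal sketch of the Gamma-minimax idea follows, with two simplifications that are this illustration's and not the paper's: Gamma is reduced to a finite set of candidate priors (rather than a moment-constrained set), and the decision rules are restricted to a one-parameter family of linear rules for a binomial observation.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
n = 20

def risk(c, theta):
    # Exact risk of the linear rule delta(x) = (x + c)/(n + 2c) under squared error.
    a, b = 1 / (n + 2 * c), c / (n + 2 * c)
    return a**2 * n * theta * (1 - theta) + (a * n * theta + b - theta) ** 2

# Gamma, simplified here to a finite set of candidate priors (samples of theta).
gamma = [rng.beta(1, 1, 50_000), rng.beta(2, 2, 50_000)]

c_grid = np.linspace(0.0, 5.0, 201)
worst = np.array([max(risk(c, th).mean() for th in gamma) for c in c_grid])
print("Gamma-minimax c:", c_grid[worst.argmin()])
</syntaxhighlight>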

==References==
*E. L. Lehmann and G. Casella (1998), ''Theory of Point Estimation'', 2nd ed. New York: Springer-Verlag.
*F. Perron and E. Marchand (2002), "On the minimax estimator of a bounded normal mean," ''Statistics and Probability Letters'' '''58''': 327–333.
*J. O. Berger (1985), ''Statistical Decision Theory and Bayesian Analysis'', 2nd ed. New York: Springer-Verlag. ISBN 0-387-96098-8.
*R. Fandom Noubiap and W. Seidel (2001), "An Algorithm for Calculating Gamma-Minimax Decision Rules under Generalized Moment Conditions," ''Annals of Statistics'' '''29''' (4): 1094–1116.
* {{Cite journal
 |first=C. |last=Stein |authorlink=Charles Stein (statistician)
 |year=1981
 |title=Estimation of the mean of a multivariate normal distribution
 |journal=[[Annals of Statistics]]
 |volume=9 |issue=6 |pages=1135–1151
 |doi=10.1214/aos/1176345632 |mr=630098 |zbl=0476.62035
}}
{{Reflist|refs=
<ref name=verdu>S. Verdu and H. V. Poor (1984), "On Minimax Robustness: A general approach and applications," ''IEEE Transactions on Information Theory'' '''30''': 328–340.</ref>
<ref name=kassam>S. A. Kassam and H. V. Poor (1985), "Robust Techniques for Signal Processing: A Survey," ''Proceedings of the IEEE'' '''73''': 433–481.</ref>
<ref name=ben_tal>A. Ben-Tal, L. El Ghaoui, and A. Nemirovski (2009), ''Robust Optimization''. Princeton: Princeton University Press.</ref>
<ref name=nisar_book>M. Danish Nisar (2011), [http://www.shaker.eu/shop/978-3-8440-0332-1 ''Minimax Robustness in Signal Processing for Communications'']. Shaker Verlag. ISBN 978-3-8440-0332-1.</ref>
}}

{{DEFAULTSORT:Minimax Estimator}}
[[Category:Decision theory]]
[[Category:Estimation theory]]