German tank problem

In the statistical theory of estimation, the German tank problem consists of estimating the maximum of a discrete uniform distribution from sampling without replacement. In simple terms, suppose there exists an unknown number of items which are sequentially numbered from 1 to N. A random sample of these items is taken and their sequence numbers observed; the problem is to estimate N from these observed numbers.

The problem can be approached using either frequentist inference or Bayesian inference, leading to different results. Estimating the population maximum based on a single sample yields divergent results, whereas estimation based on multiple samples is a practical estimation question whose answer is simple (especially in the frequentist setting) but not obvious (especially in the Bayesian setting).

The problem is named after its historical application by Allied forces in World War II to the estimation of the monthly rate of German tank production from very limited data. This exploited the manufacturing practice of assigning and attaching ascending sequences of serial numbers to tank components (chassis, gearbox, engine, wheels), with some of the tanks eventually being captured in battle by Allied forces.

Suppositions

The adversary is presumed to have manufactured a series of tanks marked with consecutive whole numbers, beginning with serial number 1. Additionally, regardless of a tank's date of manufacture, history of service, or serial number, every serial number issued up to the time of the analysis is assumed equally likely to be observed.
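A minimal sketch of this sampling model in Python may make the suppositions concrete. The function name `capture_serials` and the population size of 250 are hypothetical, chosen only for illustration; the one modelling assumption is that every tank is equally likely to be captured and none is captured twice.

```python
import random

def capture_serials(total_tanks: int, captured: int, rng: random.Random) -> list[int]:
    """Draw `captured` distinct serial numbers uniformly from 1..total_tanks."""
    # Random.sample draws without replacement, matching the supposition
    # that each tank can be observed at most once.
    return sorted(rng.sample(range(1, total_tanks + 1), captured))

rng = random.Random(0)  # fixed seed so the run is repeatable
serials = capture_serials(total_tanks=250, captured=4, rng=rng)
print(serials, "largest observed:", max(serials))
```

The analyst sees only the printed serial numbers, never `total_tanks`; the estimation problem is to recover that hidden value from them.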

Example

Assuming tanks are assigned sequential serial numbers starting with 1, suppose that four tanks are captured and that they have the serial numbers: 19, 40, 42 and 60.

A frequentist approach (using the minimum-variance unbiased estimator) predicts the total number of tanks produced will be:

N\approx 74

A Bayesian approach (using a uniform prior over the integers up to some suitably large bound $\Omega$) predicts that the median number of tanks produced will be very similar to the frequentist prediction:

N_{med}\approx 74.5

whereas the Bayesian mean predicts that the number of tanks produced would be:

N_{av}\approx 89

Let $N$ equal the total number of tanks predicted to have been produced, $m$ equal the highest serial number observed and $k$ equal the number of tanks captured.

The frequentist prediction is calculated as:

N\approx m+{\frac {m}{k}}-1=74
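The calculation can be sketched in a few lines of Python. The function names are illustrative, and exact rational arithmetic is used only so that the results can be compared without rounding. The second function checks unbiasedness by brute force on a small hypothetical population of 12 tanks, averaging the estimate over every possible sample of four.

```python
from fractions import Fraction
from itertools import combinations

def frequentist_estimate(serials):
    """Return m + m/k - 1 for the observed serial numbers."""
    m = max(serials)   # highest serial number observed
    k = len(serials)   # number of tanks captured
    return Fraction(m) + Fraction(m, k) - 1

# The example above: four captured tanks.
print(frequentist_estimate([19, 40, 42, 60]))  # 74

def mean_estimate_over_all_samples(total_tanks, k):
    """Average the estimate over every possible sample of size k."""
    samples = list(combinations(range(1, total_tanks + 1), k))
    return sum(frequentist_estimate(s) for s in samples) / len(samples)

# Exact check of unbiasedness for a small hypothetical population.
print(mean_estimate_over_all_samples(12, 4))  # 12
```

Only the largest serial number and the sample size enter the estimate; the other three observed values (19, 40 and 42) do not affect it.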

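The Bayesian median is approximated by:

N_{med}\approx m+{\frac {m\ln(2)}{k-1}}\approx 73.9

This closed-form approximation falls slightly below the quoted 74.5; under the exact posterior the cumulative probability passes one half between $n=74$ and $n=75$.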

The Bayesian mean is calculated as:

N_{av}\approx (m-1){\frac {k-1}{k-2}}=88.5\approx 89

These Bayesian quantities are derived from the Bayesian posterior distribution:

\Pr(N=n)={\begin{cases}0&{\text{if }}n<m\\{\frac {k-1}{k}}{\frac {\binom {m-1}{k-1}}{\binom {n}{k}}}&{\text{if }}n\geq m,\end{cases}}
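As a sketch, the mass function can be evaluated exactly with Python's standard library. The names `posterior_pmf` and `posterior_cdf` are illustrative and not part of any library. The code assumes $k\geq 2$ and sums the cumulative probability term by term rather than relying on a closed form.

```python
from fractions import Fraction
from math import comb

def posterior_pmf(n, m, k):
    """Pr(N = n) given largest serial m among k captured tanks (k >= 2)."""
    if n < m:
        return Fraction(0)
    return Fraction(k - 1, k) * Fraction(comb(m - 1, k - 1), comb(n, k))

def posterior_cdf(n, m, k):
    """Pr(N <= n), summed term by term from the mass function."""
    return sum((posterior_pmf(j, m, k) for j in range(m, n + 1)), Fraction(0))

m, k = 60, 4
print(float(posterior_pmf(60, m, k)))   # 0.05, the mass at the smallest possible N
print(float(posterior_cdf(74, m, k)))   # just below one half
print(float(posterior_cdf(75, m, k)))   # just above one half
```

For the example data the cumulative probability passes one half between $n=74$ and $n=75$, which is consistent with the median of 74.5 quoted above.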

This probability mass function has positive skewness, since $N$ cannot be smaller than 60 but may be considerably larger. Because of this skewness, the mean may not be the most meaningful estimate. The median in this example is 74.5, in close agreement with the frequentist formula. Using Stirling's approximation, the posterior may be approximated by a function of n that decays as a power law,

\Pr(N=n)\approx {\begin{cases}0&{\text{if }}n<m\\(k-1)m^{k-1}n^{-k}&{\text{if }}n\geq m,\end{cases}}

which results in the following approximation for the median:

N_{med}\approx m+{\frac {m\ln(2)}{k-1}}

and the following approximations for the mean and standard deviation:

{\begin{aligned}N&\approx \mu \pm \sigma =89\pm 50,\\\mu &=(m-1){\frac {k-1}{k-2}},\\\sigma &={\sqrt {\frac {(k-1)(m-1)(m-k+1)}{(k-3)(k-2)^{2}}}}.\end{aligned}}
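The three closed-form expressions can be evaluated directly for the example, as in the following sketch. The function name `posterior_summary` is illustrative. The formulas are only meaningful when their denominators are positive, so the mean requires $k>2$ and the standard deviation requires $k>3$.

```python
from math import log, sqrt

def posterior_summary(m, k):
    """Mean, standard deviation and approximate median from the formulas above.

    The mean needs k > 2 and the standard deviation needs k > 3.
    """
    mean = (m - 1) * (k - 1) / (k - 2)
    std = sqrt((k - 1) * (m - 1) * (m - k + 1) / ((k - 3) * (k - 2) ** 2))
    approx_median = m + m * log(2) / (k - 1)
    return mean, std, approx_median

mean, std, approx_median = posterior_summary(m=60, k=4)
print(f"mean = {mean:.1f}, std = {std:.1f}, approx. median = {approx_median:.1f}")
# mean = 88.5, std = 50.2, approx. median = 73.9
```

The standard deviation is more than half the size of the mean, which reflects how little four captured tanks constrain the upper tail of the posterior.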
