Courtesy of Sixthform.info Maths, a rather unusual sports article. It seems that some gambling company is predicting the outcome of World Cup matches in a manner which involves a Poisson distribution. At least, that's how it started out; by the time the newspaper got at it, we have:
"For those with a degree in statistics: in the equation, 'n' is the number of goals scored, 'lambda' is the expected number of goals, 'e' is a natural logarithm and the exclamation mark is 'factorial', a function of 'n'. P is the probability distribution of goals scored. Well, we said you needed a degree."
I'm pretty sure that I studied this in high school, sans statistics degree, and I'm pretty sure that at the time 'e' did not represent the natural logarithm; quite the opposite, in fact. But I'm getting ahead of myself. Let's talk about some probability distributions.
The simplest example of probability would be a fair coin flip, in which there's a 50-50 chance that a coin is going to come up heads or tails. I'd tell you to ignore the possibility of the coin landing on its side, but the last time someone said that the coin actually did land on its side. (It's true, I have a buddy that saw it with his own eyes.) Anyway, if we agree to call heads "state 1" and tails "state 2," we can write P(1) = 1/2, meaning that the probability is 1/2 that the coin will end up in state 1, and P(2) = 1/2, meaning that the probability is 1/2 that the coin will end up in state 2. A more complicated example would be the sum of two dice. There are 6 * 6 = 36 possible outcomes, of which only one (both dice showing 1) gives a total of 2. Thus, for the dice, P(2) = 1/36. On the other hand, there are all sorts of ways to make 7: 1 + 6, 2 + 5, 3 + 4, 4 + 3, 5 + 2, and 6 + 1, so we write P(7) = 6/36 = 1/6.
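If you'd rather not count the dice outcomes by hand, here is a minimal Python sketch that enumerates all 36 of them. The function name two_dice_distribution is just something made up for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """Return {total: probability} for the sum of two fair six-sided dice."""
    # Every ordered pair (a, b) is one of the 36 equally likely outcomes.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(ways, 36) for total, ways in sorted(counts.items())}


dist = two_dice_distribution()
print(dist[2])  # 1/36
print(dist[7])  # 1/6
```

Exact fractions are used instead of floats so that the results can be compared directly with the hand count.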
But what if you want to know the probability of something more complicated? For instance, what's the probability that you flip a coin twelve times and it comes up heads exactly eight of them? Sure, you could try to do something like we did with the dice above, but without a healthy framework, you could drive yourself crazy. If you look a little harder, though, there's a nice mathematical way to handle this: break the problem in half.
First, what's the probability that the outcome is going to be HTHHHTHHTHHT? Well, the probability of the coin coming up heads each time is 1/2, as is the probability of coming up tails; so the overall probability is (1/2) * (1/2) * ... (1/2) = (1/2)^12 = 1/4096. Of course, this is the same probability as for, say, HTHHHTTHTHHH or HHTHHTHHTHHT or any other string of twelve outcomes of coin flips. This means that if we can figure out how many different ways there are for 12 coin flips to come up heads exactly 8 times, then we're in good shape.
If you haven't seen it before, the answer is called the choose function or binomial coefficient (so named because if you expand a binomial raised to a power, like (x+y)^n, they appear in the coefficients of the polynomial). We write C(12,8) = 12!/(8! (12-8)!); if you stare at this expression for a minute, you should be able to see what it's doing and why it works. (Hint: what does 12!/(12-8)! alone represent?) So, overall, we have P(m) = C(12, m) * (1/4096) for 0 <= m <= 12.
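As a sanity check on the twelve-flip case, here is a short Python sketch that computes C(12, 8) with the standard library and then confirms it the slow way, by listing all 2^12 = 4096 strings of H and T. The helper names count_by_enumeration and prob_heads are invented for this example.

```python
from fractions import Fraction
from itertools import product
from math import comb

FLIPS = 12


def count_by_enumeration(heads):
    """Count the H/T strings of length FLIPS with exactly `heads` heads."""
    return sum(1 for seq in product("HT", repeat=FLIPS) if seq.count("H") == heads)


def prob_heads(heads):
    """P(m) = C(12, m) / 2^12 for a fair coin."""
    return Fraction(comb(FLIPS, heads), 2 ** FLIPS)


print(comb(12, 8))              # 495
print(count_by_enumeration(8))  # 495
print(prob_heads(8))            # 495/4096
```

So eight heads out of twelve happens about 12% of the time.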
Generalizing this to n tries with probability p, we have P(m) = C(n, m) p^m (1-p)^(n-m). (If p is the probability of "heads," then 1-p is the probability of "tails," whatever "heads" and "tails" mean in our situation.) And now it's time to bring in a little calculus.
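The general formula translates almost word for word into Python. This is only a sketch; binomial_pmf is a name chosen for this example, and it uses floating point, so expect the usual rounding.

```python
from math import comb


def binomial_pmf(m, n, p):
    """Probability of exactly m successes in n independent tries,
    each succeeding with probability p: C(n, m) p^m (1-p)^(n-m)."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)


# Fair coin, twelve flips, eight heads: same as C(12, 8) / 2^12.
print(binomial_pmf(8, 12, 0.5))

# A lopsided "coin": exactly two sixes in ten rolls of a fair die.
print(binomial_pmf(2, 10, 1 / 6))
```

Summing binomial_pmf(m, n, p) over m = 0, ..., n should give 1 (up to rounding), which is a quick way to catch a mistyped exponent.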
Let's forget what we were talking about before for a minute and suppose that we have some event which occurs, on average, n times each minute, and we want to know the probability that it will occur m times in some given minute. We can use our model from above to approximate this as follows: choose some number of pieces S to divide the minute into; then we expect the event to occur n/S times during each piece, and if S > n we can even suppose that n/S is the probability that the event will occur during that interval. Plugging this into our model above, P(m) = C(S, m) (n/S)^m (1-n/S)^(S-m). The smaller we make the divisions, the better model we have, so let's let S go to infinity.

$$P(m) = \lim_{S \to \infty} \binom{S}{m} \left(\frac{n}{S}\right)^m \left(1 - \frac{n}{S}\right)^{S} \left(1 - \frac{n}{S}\right)^{-m}$$
The notation that looks like a fraction without a bar is simply another notation for the choose function C(S, m). Looking at the terms on the right hand side, it's clear that the one on the far right, (1 - n/S)^(-m), will simply go to 1 as S increases; and the term next to it on the left, (1 - n/S)^S, is the very definition of e^(-n). This leaves us with

$$P(m) = e^{-n} \, n^m \lim_{S \to \infty} \left[ \binom{S}{m} \frac{1}{S^m} \right]$$
Now let's think about that term on the inside there. We have C(S,m) = S(S-1)(S-2)...(S - m + 1) / m!, so as S -> oo the numbers subtracted in each of the m factors of the numerator are going to be negligible next to S. That is to say, C(S,m) behaves approximately like S^m/m! in this limit, so dividing out the S^m will give us a value of 1/m! for this limit. All in all, we get

$$P(m) = \frac{e^{-n} \, n^m}{m!}$$

for our distribution. So if England has been averaging 3 goals a game, and we assume that their opponent hasn't got anything to do with how many goals they score (it seems like a strange assumption, but apparently one that works out for the people in the article -- you remember the article, right?) then the chance that they score only 2 goals this game will just be e^(-3) 3^2 / 2! = 22.4% by this model.
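To watch the limit happen numerically, here is a rough Python sketch that chops the game into S slices, applies the binomial model with probability n/S per slice, and compares the result with the Poisson formula. The names poisson_pmf and sliced_binomial are invented for this illustration, and the slice counts are arbitrary.

```python
from math import comb, exp, factorial


def poisson_pmf(m, n):
    """P(m) = e^(-n) n^m / m!, where n is the average number of events."""
    return exp(-n) * n**m / factorial(m)


def sliced_binomial(m, n, S):
    """Binomial model with S slices, each holding an event with probability n/S."""
    p = n / S
    return comb(S, m) * p**m * (1 - p) ** (S - m)


# England averaging 3 goals a game: chance of exactly 2 goals.
for S in (10, 100, 10_000):
    print(S, sliced_binomial(2, 3, S))
print("Poisson", poisson_pmf(2, 3))  # about 0.224
```

The binomial values should creep toward the Poisson value as S grows, which is the whole point of the derivation.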

1 comment:
On a somewhat related note (but not really), there's an article in the Feb. 2000 American Mathematical Monthly about seeding strategies for knockout tournaments to maximize fairness and public interest in the gaming schedule: "What is the Correct Way to Seed a Knockout Tournament".