gauss.m
(Sum of uniform distributions)


Sum:    1   2   3   4   5   6
Ways:   1   1   1   1   1   1

Sum:    2   3   4   5   6   7   8   9  10  11  12
Ways:   1   2   3   4   5   6   5   4   3   2   1

Sum:    3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
Ways:   1   3   6  10  15  21  25  27  27  25  21  15  10   6   3   1

Suppose we roll two dice and consider the sum that appears on their faces. The smallest such sum is two and the largest is twelve. Now consider how many ways we can achieve each of these sums. For example, for a sum of six there are five ways: (1 5) (5 1) (2 4) (4 2) (3 3). The second table above shows the results when we count these ways for each possible sum. Note that if we add up all the counts in this table we get 36, which makes sense because there are 6² = 36 possible outcomes from rolling two dice. So if we divide the counts by 36 we get the probability of each sum occurring on a roll of the two dice. We don't even have to plot this one to see that the shape of this function is two sides of a triangle (i.e. a line with positive slope followed by a line with negative slope).

The first table above is for a single die and hardly needs mentioning, since there is only one way of achieving each of the sums. The probability distribution we get when we divide by six is known as a uniform distribution (i.e. all the probabilities are the same).

The third table shows the results when we count the ways to achieve each possible sum when three dice are rolled. For example, for a sum of six there are ten ways: (2 2 2) (1 1 4) (1 4 1) (4 1 1) (1 2 3) (1 3 2) (2 1 3) (2 3 1) (3 1 2) (3 2 1). Note that if we add up the counts in this table we get 216 (6³), as we expect.
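
If you would rather not enumerate the dice by hand, the same counts can be produced with a convolution (the same operation gauss.m uses, as described below). Here is a minimal MATLAB sketch; it is not code taken from gauss.m:

    die   = ones(1,6);         % one way to roll each face (1 thru 6)
    two   = conv(die, die);    % ways to roll each sum 2..12:  1 2 3 4 5 6 5 4 3 2 1
    three = conv(two, die);    % ways to roll each sum 3..18:  1 3 6 10 15 21 25 27 27 25 21 15 10 6 3 1
    sum(two)                   % 36  (6^2 outcomes for two dice)
    sum(three)                 % 216 (6^3 outcomes for three dice)

Dividing each row of counts by its total converts it into the probability of each sum.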


This is the result when we plot the counts from this last table on the y axis vs. the sums on the x axis. The curve can be thought of as the sum of three uniform probability distributions.

If you have seen plots of Gaussian distributions (also called normal distributions) this curve will look familiar to you - with its concave downwards shape near the peak and its concave upwards shape near the tails. It's not precisely Gaussian, but it is close enough to fool the eye. If we were to plot the results for four dice, the shape would be even closer to Gaussian.

So why is this happening? The result can be expected once we understand the Central Limit Theorem, a remarkable result that can't easily be attributed to a single mathematician. (Contributions came from de Moivre, Laplace, Lyapunov, Polya, Cauchy, Bessel, Poisson and others.) The theorem tells us that when you add up more and more samples taken from the same uniform distribution, the distribution of the sum (suitably scaled) approaches the bell-shaped curve e^(-x²) (known as the normal distribution). Actually, the theorem is more general than that: as long as the repeated samples are taken from the same distribution, it doesn't matter what that distribution is (as long as it has a finite variance), the result will still be normal. (This generalization to any distribution is perhaps the most remarkable aspect of the theorem.)
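
You can watch the theorem at work with a few lines of MATLAB. The following is only an illustrative sketch (it is not part of gauss.m): it adds up groups of uniform samples and compares the histogram of those sums with a Gaussian having the same mean and variance.

    n = 10;  trials = 100000;
    s = sum(rand(n,trials));                % each column is the sum of n uniform samples
    [h,x] = hist(s,60);                     % histogram of the 100000 sums
    h = h/(trials*(x(2)-x(1)));             % scale the counts to a probability density
    m = n/2;  v = n/12;                     % mean and variance of the sum of n uniforms
    g = exp(-(x-m).^2/(2*v))/sqrt(2*pi*v);  % Gaussian with the same mean and variance
    plot(x,h,'.',x,g);                      % the histogram points hug the Gaussian curve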


When you start the gauss.m application this is what you will see (without the yellow zoom box, which I created after the application started). gauss.m does essentially the same thing as the dice experiment above, except that instead of starting with a length-six uniform distribution, here we use a length-100 uniform distribution. And instead of combining just 3 of those distributions, we combine 10 of them. However, when this application starts, only the first four traces are enabled. The first trace is the true normal distribution calculated from its exponential formula (e^(-x²)) and is shown as a white dotted line. The next 3 traces are what you get from combining 2, 3, or 4 of these uniform distributions using a convolution (an easier way of solving the counting problem mentioned above, i.e. counting the number of ways each outcome can be achieved). By the time we get up to combining 4 of the distributions (the red line of trace 4), the result is so close to the dotted white line that it is hard to see the difference. So at least for the default zoomed-out overall view, we don't need to enable the remaining traces.
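
The following sketch shows the idea behind those traces in plain MATLAB. The actual code inside gauss.m differs in its details, and the grid sizes below are merely assumptions: a sampled uniform density is convolved with itself a few times, the result is rescaled to zero mean and unit variance, and the rescaled curve is compared with the Gaussian.

    dx = 0.01;  u = ones(1,100);    % uniform density on [0,1), sampled at 100 points
    k = 4;  p = u;
    for j = 2:k
      p = conv(p,u)*dx;             % density of the sum of j uniform variables
    end
    x = (0:length(p)-1)*dx;         % the sum of k uniforms ranges over [0,k)
    m = k/2;  sd = sqrt(k/12);      % mean and standard deviation of that sum
    z = (x-m)/sd;  pz = p*sd;       % rescale to zero mean and unit variance
    g = exp(-z.^2/2)/sqrt(2*pi);    % Gaussian with zero mean and unit variance
    plot(z,pz,z,g,':');             % already hard to tell apart at k = 4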

Since the traces are so close together, you will probably want to zoom in on a portion of the display so that the differences can be seen clearly. Here I created the yellow zoom box around one portion of the plot. This is typically done with a double click and a drag of the mouse. There are other ways to create the zoom box and to move it around once it's created, and you can find out about these in the Zooming and panning section. As you will learn in that section, normally after creating the zoom box you click inside or somewhere near it (but not on the zoom box itself) and the display limits will be updated so that the entire plot area is filled with the data that was inside the zoom box. From that section you will also learn about the magnifying lens mode, where an additional figure is created to show the data inside the expansion box.

If you haven't yet experimented with the lens mode, now would be a good time. BEFORE you create the expansion box, RIGHT click on the button labeled with an "o" in the 4 button group in the lower left corner. Then create an expansion box and observe that a new magnifying lens type of figure is created.

Now turn off the lens mode by right clicking on the "o" button again and close the lens figure. Then create another expansion box such as the one in this figure.



Usually creating an expansion box with the lens mode turned off will not open a magnifying lens figure, but in this application it does. That's because inside gauss.m we have included a function called lens and have enabled it by including the 'MotionZoom',@lens parameter in the plt argument list. The lens function inside gauss.m is enabled whenever plt's lens mode is turned off. In this example the internal gauss lens function is quite similar to the plt lens function, but not exactly the same. For example, notice that below the TraceID box there is an additional box containing 10 numbers (one number for each trace). Each number is the area under the corresponding curve between the expanded x limits (.433549 & .746673 in this example). The first number (.10246 in white) represents the probability that a number chosen from a Gaussian distribution of zero mean and unit variance will fall between .433549 and .746673. The other nine areas, associated with the other nine traces (Sum2 thru Sum10), only approximate this probability, and the approximation is noticeably inaccurate for the second trace (purple, Sum2) since that is the convolution of just two of the uniform distributions. Note that as we drag the expansion box left or right in the main window, the area-under-the-curve numbers are continually updated just like the traces are. (The area numbers don't change if we drag the expansion box vertically, since the areas are always measured down to the x axis.)
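
The area numbers themselves are just numerical integrals of the traces between the two x limits. Here is a hedged sketch of such a computation using trapz (the grid and Gaussian formula below are assumptions chosen for illustration; this is not the code inside gauss.m):

    x  = linspace(-5,5,1001);        % an assumed x grid for the traces
    y  = exp(-x.^2/2)/sqrt(2*pi);    % trace 1: Gaussian with zero mean and unit variance
    x1 = .433549;  x2 = .746673;     % the expanded x limits from the example above
    k  = x>=x1 & x<=x2;              % the samples that fall inside those limits
    trapz(x(k),y(k))                 % area under the curve down to the x axis

Applying the same integral to each of the ten traces yields the ten numbers shown below the TraceID box.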

Now that we have zoomed in so far with the lens figure, we can easily see the separation between the four traces, so you may want to enable all the traces in this view. Simply click on a trace name in the TraceID box and that particular trace will be enabled. Or double click on any trace name in the TraceID box and all 10 traces will be enabled.

This unique lens figure was included partly so that we could display this extra information (the areas), but mostly to show how easy it is to create an application where the lens mode does anything we want.



To remind you that you can see this zoomed-in view by opening an expansion box, this help text appears in the upper left corner of the plot. To reduce clutter, the help text disappears as soon as you click anywhere inside the plot, although you can re-enable it by right clicking on the Help tag in the menu box.



Another way to see how the 10 traces differ is to plot the error functions (i.e. how each trace differs from the true Gaussian function). Since trace 1 is the Gaussian function itself, its error function is not interesting (i.e. zero) so it isn't plotted. To display the error functions simply check the "Plot errors only" checkbox to the left of the plot which will produce the plot shown here. (All 10 traces have been enabled to produce this figure, which again is most easily done by double clicking on any of the TraceID items.) Unchecking the checkbox will return the display to its original form.

Note that there is a button just below the checkbox called "Cum" which stands for "Cumulative". When you click it, a new figure window appears which looks like the one shown below.



Here we have again enabled only the first four traces, but of course you can enable all the traces or any subset of them from the TraceID box. The first trace (dashed white line) is the integral of the normal distribution, so it represents the probability that a randomly chosen sample from the normal distribution is less than the value on the x axis. For example, at x = 0 the curve has a value of 0.5, which means that there is a 50% chance that the random sample will be less than 0. This makes sense because we have been considering a distribution with a mean of zero. As x approaches infinity, the dashed curve approaches one. This also makes sense because there is a 100% chance that a random sample chosen from the distribution is less than infinity.

The remaining curves (Sum2 thru Sum10) are approximations of this cumulative distribution. This approximation becomes more accurate as we add in more uniform distributions, so if you enable just the first and last traces you will see that they are quite similar.
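
The cumulative curves can be thought of as running integrals of the curves in the main figure. A small sketch of that idea (again with an assumed grid and formula, not taken from gauss.m):

    x = linspace(-5,5,1001);         % an assumed x grid
    g = exp(-x.^2/2)/sqrt(2*pi);     % the normal density (zero mean, unit variance)
    C = cumtrapz(x,g);               % running integral = cumulative distribution
    interp1(x,C,0)                   % about 0.5: half the probability lies below x = 0
    C(end)                           % about 1: essentially all of it lies below x = 5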



Copyright © 2024
Paul Mennen