**The Law of Large Numbers and the Central Limit Theorem: A Polling
Simulation**

**Nov. 24, 2008**

** **

**This is for those who say:
"Math was my worst subject in high school". If you've ever placed a
bet at the casino, at the track or played the lottery, you already know the
basics. It's about probability. It's about common sense. It's not all that
complicated.**

** **

**It's for Excel spreadsheet
users who enjoy creating math models. The Excel model can be downloaded:**

**Monte Carlo Polling Simulation Excel Model**

** **

**It's for reporters, blogs and
politicians who seek the truth: Robert Koehler, Brad Friedman, John Conyers,
Barbara Boxer, Mark Miller, Fitrakis, Wasserman, Kathy Dopp,
Steve Freeman, Ron Baiman, Jonathan Simon, Alistair
Thompson, Paul Krugman, Keith Olbermann,
Mike Malloy, Randi Rhodes, Thom Hartman, Stephanie Miller, Joseph Cannon, Sam
Seder, Janeane Garofalo,
etc.**

** **

**It's for those who have taken
algebra, probability or statistics and want to see how the math is applied to
election polling.**

** **

**It's for graduates with degrees
in mathematics, political science, an MBA, etc. who may or may not be familiar
with simulation concepts. Simulation is a powerful tool for analyzing
uncertainty in simple and complex models, such as coin
flipping and election polling.**

** **

**It's for browsers who frequent
Discussion Forums.**

** **

**It's for those Corporate Media
reporters who are still waiting for editor approval to discuss the documented
evidence of election fraud, both statistical and anecdotal, in all elections since
2000.**

** **

**In Selection 2000, Gore won the
popular vote by 540,000. But Bush won the election by a single vote. SCOTUS voted along party lines: Bush 5, Gore 4. That stopped
the **

** **

**It's for the exit poll
naysayers who promote faith-based hypothetical arguments in their unrelenting
attempts to debunk the accuracy of the pre-election and exit polls.**

**________________________________________________________________________**

** **

**FALSE RECALL, RELUCTANT
RESPONDERS, HOW THEY VOTED IN 2000: IMPLAUSIBLE, CONTRADICTORY AND MATHEMATICALLY
IMPOSSIBLE**

** **

**Naysayers have a problem with
the 2004 pre-election and exit polls. Regardless of how many were taken or how
large the samples, the results are never good enough for them. They prefer to
cite two implausible hypotheticals: reluctant Bush responders
(rBr) and Gore voter memory lapse ("false recall").**

** **

**How do pollsters handle
non-responders? They just increase the sample-size! Furthermore, statistical
studies show that there is no discernible correlation between non-response
rates and survey results.**

** **

**How do pollsters handle
"false recall"? They know that in a large sample, forgetfulness on
the part of Gore and Bush voters will cancel out! There is no evidence that
Gore voters forget any more than Bush voters.**

** **

**On the contrary, if someone you
knew robbed you in broad daylight, would you forget who it was four years
later? In 2000, Gore and the voters were robbed in broad daylight.**

** **

**Naysayers claim that bias
favored Kerry in the pre-election and exit polls. Yet they offer no evidence to
back it up. They claim that Gore voters forgot and told the exit pollsters they
voted for Bush in 2000. It's their famous "false recall"
hypothetical. They were forced to use it when they could not come up with a
plausible explanation for the impossible weightings of Bush and Gore voter
turnout in the Final National Exit poll. **

** **

**According to the final 2004
NEP, which Bush won by 51-48%, 43% of the 13660 respondents voted for Bush in
2000 while only 37% voted for Gore. This contradicts
the reluctant Bush responder (rBr) hypothesis.
Furthermore, 43% of the 122.3 million who voted in 2004 is 52.57 million, yet Bush
only got 50.45 million votes in 2000. The 43/37% split is a mathematical
impossibility.**

** **

**In addition, approximately 1.75
million Bush 2000 voters died prior to the 2004 election. Therefore, no more than
48.7 million Bush 2000 voters could have turned out to vote in 2004. The Bush
2000 voter share was 48.7/122.3 (or 39.8%), assuming that all of the Bush 2000
voters still living came to the polls. These mathematical facts are beyond
dispute. Kerry won the final 1:25pm exit poll by 50.93-48.66%, assuming equal
39.8% weights.**

** **

**For the same reason, Kerry must
have done even better than his 51.4-47.6% winning margin at the 12:22am
timeline (13047 respondents). Here the Bush/Gore mix was 41/39%. But we have
just shown that 39.8% was the absolute maximum Bush share. If we apply equal weightings to the
12:22am results, then Kerry won by 52.25-46.77%, a 6.7 million vote margin
(63.8-57.1 million).**

** **

**First-time voters and those who
sat out the 2000 election, as well as Nader and Gore 2000 voters, were
overwhelmingly Kerry voters. The recorded Bush 2004 vote was 62 million. Where
did he get the 13 million new voters from 2000? How do the naysayers explain
it? Only by ignoring the mathematical facts and raising new implausible
theories.**

** **

**It’s time to put on the
defoggers. We’ve had enough disinformation, obfuscation and misrepresentation.
Let the sunshine in. Let's review the basics.**

** **

**________________________________________________________________________**

** **

**A COIN-FLIP EXPERIMENT**

** **

**Consider this experiment. Flip
a fair coin 10 times. Calculate the percentage of heads. Write it down.
Increase to 20 flips. Calculate the new total percentage. Write it down.**

** **

**Keep flipping. Write down the
percentage after every ten flips. Stop at 100. That's our final coin flip
sample-size.**

** **

**When you're all done, check the
percentages. Is the sequence converging to 50%? That’s the true population mean
(average). That's the Law of Large Numbers.**
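
**The experiment above can also be sketched in a few lines of Python instead of Excel. This is only an illustration: the function name and the seed value are assumptions made for this sketch and are not part of the Excel model.**

```python
import random

def running_heads_percentages(total_flips=100, step=10, seed=None):
    """Flip a fair coin and record the cumulative percentage of heads
    after every `step` flips (illustrative helper, not from the Excel model)."""
    rng = random.Random(seed)
    heads = 0
    checkpoints = []
    for flip in range(1, total_flips + 1):
        # A random number below 0.5 counts as HEADS.
        if rng.random() < 0.5:
            heads += 1
        if flip % step == 0:
            checkpoints.append((flip, 100.0 * heads / flip))
    return checkpoints

if __name__ == "__main__":
    # The seed is arbitrary; change it to see a different sequence.
    for flips, pct in running_heads_percentages(seed=2004):
        print(f"{flips:4d} flips: {pct:5.1f}% heads")
```

**Raising total_flips well beyond 100 should show the percentage settling ever closer to 50%.**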

** **

**The coin-flip is easily
simulated in Excel. Likewise, in the polling simulations which follow, we will
analyze the result of polling experiments over a range of trials (sample size).**

** **

**_____________________________________________________**

** **

**THE MATHEMATICAL FOUNDATION**

** **

**This model demonstrates the Law
of Large Numbers (LLN). LLN is the foundation and bedrock of statistical
analysis. LLN is illustrated through simulations of polling samples. In a
statistical context, LLN states that the mean (average) of a random sample
taken from a large population is likely to be very close to the (true) mean of
the population.**

** **

**Start of math jargon alert...**

**In probability theory, several
laws of large numbers say that the mean (average) of a sequence of random
variables with a common distribution converges to their common mean as the size
of the sequence approaches infinity.**

** **

**The Central Limit Theorem (CLT)
is another famous result. The sample means (averages) of an independent series
of random samples (i.e. polls) taken from the same population will tend to be
normally distributed (the bell curve) as the size of each sample increases. This
holds for ALL practical statistical distributions.**

**End of math jargon alert....**
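
**For readers who prefer code to jargon, here is a rough Python sketch of the CLT at work. It simulates many independent polls of the same size and checks what fraction of the poll averages fall within 1.96 standard errors of the true share; for a bell curve that fraction should be near 95%. The 52.6% share, the 1000-respondent poll size, the 2000 repetitions and the function names are all assumptions chosen for illustration.**

```python
import random
from statistics import mean

def simulate_poll_shares(true_share=0.526, sample_size=1000, num_polls=2000, seed=None):
    """Return the simulated vote share from each of `num_polls` independent polls."""
    rng = random.Random(seed)
    shares = []
    for _ in range(num_polls):
        votes = sum(1 for _ in range(sample_size) if rng.random() < true_share)
        shares.append(votes / sample_size)
    return shares

def fraction_within(shares, center, half_width):
    """Fraction of poll shares lying within `half_width` of `center`."""
    return sum(1 for s in shares if abs(s - center) <= half_width) / len(shares)

if __name__ == "__main__":
    p, n = 0.526, 1000
    shares = simulate_poll_shares(p, n, seed=2004)
    std_error = (p * (1 - p) / n) ** 0.5
    print(f"mean of poll shares: {mean(shares):.4f}")
    print(f"fraction within 1.96 standard errors: "
          f"{fraction_within(shares, p, 1.96 * std_error):.3f}")
```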

** **

**It's really not all that
complicated. Naysayers never consider LLN or CLT. They maintain that polls are
not random-samples. They would have us believe that professional pollsters are
incapable of creating accurate surveys (i.e. effectively random samples)
through systematic, clustered or stratified sampling, especially when Bush is
running.**

** **

**LLN and CLT say nothing about
bias.**

** **

**________________________________________________________________**

** **

**POLLING SAMPLE-SIZE**

** **

**Just like in the above
coin-flipping example, the Law of Large Numbers takes effect as poll
sample-size increases. That's why the National Exit Poll was designed to survey
at least 13000 respondents.**

** **

**Note the increasing sequence of
polling sample size as we go from the pre-election state (600) and national
(1000) polls to the state and national exit polls: Ohio (1963), Florida (2846)
and the National (13047).**

** **

**Here is the National Exit Poll
Timeline:**

**Updated; respondents; vote
share**

**3:59pm: 8349; Kerry led 51-48**

**7:33pm: 11027; Kerry led 51-48**

**12:22am: 13047; Kerry led 51-48**

**1:25pm: 13660; Bush led 51-48**

**The final was matched to the
vote.**

** **

**So much for letting LLN and CLT
do their magic.**

** **

**________________________________________________________________**

**USING RANDOM NUMBERS TO
SIMULATE A SEQUENCE OF POLLS**

** **

**Random number simulation is the
best way to illustrate LLN:**

**1) Assume a true 2-party vote
percentage for Kerry (i.e. 52.6%).**

**2) Simulate a series of 8 polls
of varying sample size.**

**3) Calculate the sample mean
vote share and win probability for each poll.**

**4) Confirm LLN by noting that
as the poll sample size increases,
the sample mean (average) converges to the population mean
("true" vote).**

** **

**It's just like flipping a coin.**

**Assume there is a p = 52.6%
probability that a random poll respondent voted for Kerry (HEADS).**

**This represents Kerry's TRUE
vote (his population mean)**

**Bush is TAILS with a 47.4%
(1-p) probability.**

** **

**A random number (RN) between
zero and one is generated for each respondent.**

**If RN is LESS than Kerry's TRUE
share, the vote goes to Kerry.**

**If RN is GREATER than Kerry's
TRUE share, the vote goes to Bush.**

** **

**For example, assume Kerry's
TRUE 52.6% vote share (.526).**

**If RN is less than .526, Kerry's poll
count is increased by one.**

**If RN is greater than .526,
Bush's poll count is increased by one.**

** **

**The sum of Kerry's votes is
divided by the poll sample (i.e. 13047). This is Kerry's simulated 2-party vote
share. It approaches his TRUE 52.6% vote share as poll samples increase.**

** **

**The LLN works in polling the
same way as in the coin flip experiment.**
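
**The random-number procedure described above can be sketched in Python as follows. The eight sample sizes are simply the poll sizes quoted in this article (600, 1000, 1963, 2846, 8349, 11027, 13047, 13660); whether the Excel model uses exactly these eight is an assumption, as are the function names.**

```python
import random

# Poll sizes mentioned in the article; their use as the 8 simulated polls is assumed.
SAMPLE_SIZES = [600, 1000, 1963, 2846, 8349, 11027, 13047, 13660]

def simulate_poll(true_share, sample_size, rng):
    """Draw one random number per respondent; below true_share is a Kerry vote."""
    kerry_votes = 0
    for _ in range(sample_size):
        if rng.random() < true_share:
            kerry_votes += 1          # HEADS: Kerry
        # otherwise TAILS: the vote goes to Bush
    return kerry_votes / sample_size  # Kerry's simulated 2-party share

def simulate_poll_series(true_share=0.526, sample_sizes=SAMPLE_SIZES, seed=None):
    """Simulate one poll for each sample size and return (size, share) pairs."""
    rng = random.Random(seed)
    return [(n, simulate_poll(true_share, n, rng)) for n in sample_sizes]

if __name__ == "__main__":
    for n, share in simulate_poll_series(seed=2004):
        print(f"n = {n:6d}: Kerry {100 * share:5.2f}%  (deviation {100 * (share - 0.526):+.2f})")
```

**In a typical run the deviations from 52.6% shrink as n grows, although any single small poll can still land close by chance.**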

** **

**________________________________________________________________**

** **

**THE STATE ELECTORAL VOTE
SIMULATION**

** **

**In addition to simulating
Kerry's popular 2-party vote, the model also includes a State Electoral Vote
(EV) Simulator. The method is similar to the previous National polling samples,
with this exception:**

**Each simulation consists of 100
election trials.**

** **

**When the F9 key is pressed, one
hundred Monte Carlo simulation election trials
are executed for each of the 50 states and DC. In each trial, a random number
(RN) is generated for each state.**

** **

**The RN is compared to the
probability of Kerry winning the state. If RN is less than the probability, the
state EV is added to his total. If RN is greater, Bush wins the state.**

** **

**If Kerry's total EV exceeds
269, he wins the election trial.**

** **

**For example:**

**1) Assume that Kerry and Bush
were tied in the FL exit poll.**

**Therefore, the probability that
Kerry would win FL is 50%.**

**If RN is less than 0.50, Kerry
wins FL's 27 electoral votes.**

** **

**2) Assume that Kerry won the CA
exit poll by 55-45%.**

**The probability of winning the
state was 99.9%.**

**If RN is less than .999, Kerry
wins CA's 55 electoral votes.**

** **

**The number of election trials Kerry wins,
divided by the 100 trials, is his expected electoral vote win
probability. In addition to Kerry's mean (average) EV, his median
(middle), maximum and minimum electoral votes are calculated for the 100 trials.**
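
**Here is a minimal Python sketch of the electoral vote trials just described. The FL and CA entries reuse the two examples above. The two SAFE blocs are made-up placeholders that stand in for the remaining states so that the totals add to 538; they are not real state data, and the function name is an assumption.**

```python
import random
from statistics import mean, median

def simulate_elections(states, trials=100, seed=None):
    """Run Monte Carlo election trials.

    `states` maps a name to (electoral_votes, kerry_win_probability).
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        kerry_ev = 0
        for electoral_votes, win_prob in states.values():
            # One random number per state per trial, compared to the win probability.
            if rng.random() < win_prob:
                kerry_ev += electoral_votes
        totals.append(kerry_ev)
    wins = sum(1 for ev in totals if ev > 269)  # more than 269 EV wins the trial
    return {
        "win_probability": wins / trials,
        "mean_ev": mean(totals),
        "median_ev": median(totals),
        "max_ev": max(totals),
        "min_ev": min(totals),
    }

if __name__ == "__main__":
    toy_states = {
        "FL": (27, 0.50),           # tied exit poll, from the example above
        "CA": (55, 0.999),          # 55-45 exit poll, from the example above
        "SAFE_KERRY": (200, 1.0),   # hypothetical placeholder bloc
        "SAFE_BUSH": (256, 0.0),    # hypothetical placeholder bloc
    }
    print(simulate_elections(toy_states, trials=100, seed=2004))
```

**In this toy setup the outcome hinges almost entirely on FL, so the win probability should hover around 50%.**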

** **

**Kerry's state win probability
is calculated using the Excel Normal Distribution Function. Inputs to the NDF:**

**1) Kerry's 2-party share of the
state exit poll**

**2) the standard deviation: Stdev = MoE/1.96**

**MoE is the poll Margin of Error.**

** **

**__________________________________________________________________**

**THE MARGIN OF ERROR**

** **

**The MoE
(at the 95% confidence level) is the half-width of the interval around the sample mean which
has a 95% probability of containing the TRUE population mean.**

** **

**For example, assume a 2% MoE for a state exit poll won by Kerry: 52-48%. The
probability is 95% that Kerry's TRUE vote is in the interval from 50% to 54%.
The (one tail) probability is 97.5% that Kerry's vote will exceed the interval
lower limit of 50%.**

** **

**This is the standard formula
used to calculate the MoE:**

** **

**MoE = 1.96 * sqrt (p*(1-p)/n) * DE**

**n is the sample size,**

**p and 1-p are the 2-party vote shares.**

** **

**DE is the exit poll
"design effect": the ratio of the total number of respondents required using cluster random sampling to the
number required using simple random sampling. A cluster randomized trial which has a large design
effect will require many more samples. As
the number of respondents increases so does the design effect. We
can only estimate the impact of the DE on the MoE. But DE is only a factor in exit polls.
There is no equivalent adjustment made to the MoE in
pre-election or approval polls.**

** **

**The MoE
decreases as the sample-size (n) increases, while the sample poll mean
approaches the population mean. It's the Law of Large
Numbers. For a given sample size n,
the MoE is at its
maximum value when p = 0.50. As
p moves away from 0.50 in either direction, the MoE declines. In the p = 0.50 case (ignoring DE),
the formula can be simplified to: MoE = 1.96 * .5 / sqrt (n) = .98 / sqrt (n)**

** **

**Let's calculate the MoE for the 12:22am National Exit Poll:**

**n = 13047 sampled respondents**

**p = Kerry's true 2-party vote
share = .526**

**1-p = Bush's vote share = .474**

** **

**MoE = 1.96 * sqrt (.526*.474/13047) = .0086 = 0.86%**

**Adjusting for an assumed 30%
exit poll cluster design effect,**

**MoE = 1.30*0.86% = 1.12%**
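
**The same calculation can be written as a small Python function. This is a sketch of the formula above, not the Excel model itself. The function name is an assumption, and the cluster effect is passed as a fraction (0.30 for 30%), so the multiplier DE is 1 plus that fraction.**

```python
from math import sqrt

def margin_of_error(p, n, cluster_effect=0.0, z=1.96):
    """MoE = z * sqrt(p*(1-p)/n) * DE, with DE = 1 + cluster_effect."""
    design_effect = 1.0 + cluster_effect
    return z * sqrt(p * (1.0 - p) / n) * design_effect

if __name__ == "__main__":
    base = margin_of_error(0.526, 13047)
    adjusted = margin_of_error(0.526, 13047, cluster_effect=0.30)
    print(f"simple random sample MoE: {100 * base:.2f}%")
    print(f"with 30% cluster effect:  {100 * adjusted:.2f}%")
```

**Because the code does not round the 0.86% before multiplying by 1.30, its adjusted figure can differ from the hand calculation in the last decimal place.**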

** **

**Pollsters use proven
methodologies, such as cluster sampling, stratified sampling, etc. to attain a
near-perfect random sample. Why would a polling firm include the MoE for a poll that was not an effective random sample?**

** **

**________________________________________________________________**

**CALCULATING PROBABILITIES**

** **

**Kerry win probabilities are the
main focus of the simulation. They closely match the theoretical probabilities
obtained from the Excel Normal Distribution function.**

** **

**The probabilities are
calculated using two methods:**

**1) running the simulation and counting the votes**

**2) calculating the Excel Normal Distribution function**

** **

**Prob = NORMDIST (P, V, Stdev, true)**

**P = .526 is the mean Kerry poll
vote share**

**V = 0.50 is the majority vote
threshold.**

**Stdev = MoE/1.96. The standard deviation is a measure of
dispersion around the mean.**
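
**For readers without Excel, the NORMDIST call above can be approximated in Python with the standard library's NormalDist class. This is a sketch under the same definitions (P, V and Stdev = MoE/1.96); the function name and the sample inputs are assumptions for illustration.**

```python
from statistics import NormalDist

def win_probability(poll_share, moe, threshold=0.50):
    """Equivalent of Excel's NORMDIST(P, V, Stdev, TRUE).

    poll_share: P, the mean poll vote share
    threshold:  V, the majority vote threshold
    moe:        the 95% margin of error, so Stdev = moe / 1.96
    """
    stdev = moe / 1.96
    # Cumulative normal probability at P for a curve centered on V.
    return NormalDist(mu=threshold, sigma=stdev).cdf(poll_share)

if __name__ == "__main__":
    # 52.6% share with a 1.12% MoE (30% cluster effect assumed).
    print(f"{win_probability(0.526, 0.0112):.6f}")
    # A poll share exactly equal to its MoE above 50% gives 97.5%.
    print(f"{win_probability(0.52, 0.02):.3f}")
```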

** **

**Given that Kerry led by 3% in
the 2-party vote (12:22am National Exit Poll), his popular vote win probability
was close to 100%. And that assumes a 30% cluster effect!**

** **

**For a 2% lead (51-49), the win
probability is 97.5% (still very high).**

**For a 1% lead (50.5-49.5), it's
81% (4 out of 5).**

**For a 50/50 tie, it’s 50%.**

** **

**The following probabilities are
calculated in the model:**

**1) The confidence level for
Kerry's minimum vote share (MVS).**

**There is a 97.5% probability
that Kerry's true vote exceeds MVS.**

**The MVS increases as the
polling sample size grows.**

** **

**2) The probability of Bush
obtaining his recorded two-party vote (51.24%).**

**The probability is virtually
zero that Bush's recorded vote would be almost 4% higher than his 47.4%
two-party share.**

** **

**3) The probability of the state
exit poll discrepancy from the recorded vote is a function of the magnitude of
the deviation, the MoE and cluster effect. The normal
distribution is used to calculate the probability.**

** **

**4) The probability that the MoE is exceeded in any given state is 1 in 40 (the 2.5% in one tail of the 95% interval). The
probability that the MoE is exceeded in at least N
states is calculated using the binomial distribution function. The cluster
effect makes a big difference in the probability calculation. As the cluster
effect increases, so does the MoE, which is therefore
less likely to be exceeded.**
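
**The binomial calculation can be sketched in Python as follows. The 1 in 40 (0.025) per-state probability comes from the text; the use of 50 states as the number of trials and the function name are assumptions made for this sketch.**

```python
from math import comb

def prob_at_least(k, n=50, p=0.025):
    """Binomial probability that the MoE is exceeded in at least k of n states,
    when each state independently has probability p of exceeding it."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

if __name__ == "__main__":
    for k in (10, 13, 16):
        prob = prob_at_least(k)
        print(f"at least {k} states: 1 in {1 / prob:,.0f}")
```

**With these assumptions, the result for 10 states should land close to the 1 in 2.5 million figure quoted in this section.**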

** **

**Assuming a 30% cluster effect,
the vote discrepancy exceeded the exit poll MoE for
Bush in 10 states. The probability of this occurrence is 1 in 2.5 MILLION.**

** **

**Assuming a 20% cluster effect,
the MoE was exceeded in 13 states, a 1 in 4.5 BILLION
probability.**

** **

**For a cluster effect of 12% or
less, the MoE was exceeded in 16 states, a 1 in 19
TRILLION probability!**

**_______________________________________________________________**

**SIMULATION GRAPHICS **

** **

**http://www.richardcharnin.com/MonteCarloPollingSimulation_26397_image001.gif**

**http://www.richardcharnin.com/MonteCarloPollingSimulation_22578_image001.gif**

**http://www.richardcharnin.com/MonteCarloPollingSimulation_21396_image001.gif**

**________________________________________________________________**

** **

**DOWNLOADING THE EXCEL MODEL AND
RUNNING THE SIMULATION**

** **

**http://richardcharnin.com/MonteCarloPollingSimulation.zip**

** **

**Two inputs drive the state and
national vote simulations:**

**1) Kerry's 2-party true vote share
(52.6%)**

**2) exit poll cluster effect (set to 30%).**

** **

**Press F9 to run the simulation.**

**The graphs illustrate polling
simulation output based on the inputs:**

**1- Kerry's 2-party vote (true
population mean): 52.60%**

**2- The Exit Poll Cluster effect
(zero for pre-election): 30%**

** **

**Play "what-if" to see
the effect of changing assumptions:**

**Lower Kerry's 2-party vote
share.**

**Press F9 to run the simulation.**

** **

**Note how a 1% reduction in
Kerry's "true vote" results in a decline of his polling popular and
electoral vote shares, corresponding win
probabilities and minimum vote at the 97.5% confidence level.**

** **

**________________________________________________________________**

** **

**Introduction to Statistics and Probability**

**List of statistical topics**
**List of probability topics**
**Opinion polls**
**Margin of error**
**Random sampling**
**Standard deviation**
**Standard score**

**Normal distribution**
**Central limit theorem**
**Correlation**
**Illustration of the central limit theorem**
**Independent identically distributed random variables**

**Statistical hypothesis testing**
**Law of large numbers**
**Least squares**

**Probability theory**
**Odds**
**Random data**
**Statistical power**
**Testing hypotheses**
**Monte Carlo Simulation and numerical analysis**
