The 48th Mersenne Prime

The 48th Mersenne prime has been discovered through the Great Internet Mersenne Prime Search (GIMPS)!  The full story can be found here:

Also, here is some helpful background reading for the mathematically inclined:

Over lunch yesterday, a coworker remarked, “I wonder if the digits are uniformly distributed?”  We had no reason to believe they wouldn’t be, but I decided to investigate.

First, I downloaded and cleaned the data from here.  The 17,425,170 digits of the 48th Mersenne prime take up about 35MB on my hard drive in comma-separated form.  Next, I imported the data in R and used a humble hist() command to generate a histogram.  The graph itself isn’t terribly informative, but below are the frequencies which resulted.  The whole process consumed about 6 real-time seconds on an AMD Phenom II 3 GHz quad-core.

0       1       2       3       4       5       6       7       8       9
1739652 1743497 1739844 1745602 1743528 1739641 1742677 1743436 1743298 1743995

At a glance, the digits do appear to be uniformly distributed.  After all, we’d expect to see 1,742,517 of each digit.  Satisfied?  Hardly.  There are all sorts of statistical tests available for assessing a frequency distribution’s similarity to a known distribution.  For this task, I chose the χ² goodness-of-fit test.  The null hypothesis supposes that observed distribution of digits is consistent with a uniform distribution.

\chi^2 = \sum_{i=1}^{10} \frac{(O_i - E_i)^2}{E_i}

Where O_i = observed number of digit i.  E_i = expected number of digit i.

The result is χ² = 22.2603 and a p-value = 0.008, illustrated below.


In conclusion, the goodness-of-fit test causes us to reject the null hypothesis for “typical” values of α (0.01, 0.05).  Against intuition, the test rejects the notion that the digits are uniformly distributed.

Here’s the R code.  Feel free to modify and explore!


## Author: Nate De Jong
## Date: 2/12/13
## Description: a new Mersenne prime was discovered in Feb. 2013.
## a coworker wondered aloud at lunch "I wonder if the digits are 
   uniformly distributed?"
## this is the result

# begin timing
ptm = proc.time()

digits = scan("M57885161_digits.csv", sep=",")

#plot histogram
bins=c(-0.5, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5)
digit_hist = hist(digits, breaks=bins)

#display freq. counts
obs = digit_hist$counts

# chi-square test for uniformity
k = 10
N = sum(obs)
expected = rep(N/k, 10)
test_stat = sum((obs - expected)^2 / expected)
p_val = pchisq(test_stat, df=k-1, lower.tail=FALSE)

# end timing
proc.time() - ptm