The 48th Mersenne prime has been discovered through the Great Internet Mersenne Prime Search (GIMPS)! The full story can be found here:

http://www.mersenne.org/various/57885161.htm

Also, here is some helpful background reading for the mathematically inclined:

http://mathworld.wolfram.com/MersennePrime.html

Over lunch yesterday, a coworker remarked, “I wonder if the digits are uniformly distributed?” We had no reason to believe they wouldn’t be, but I decided to investigate.

First, I downloaded and cleaned the data from here. The 17,425,170 digits of the 48th Mersenne prime take up about 35MB on my hard drive in comma-separated form. Next, I imported the data in R and used a humble hist() command to generate a histogram. The graph itself isn’t terribly informative, but below are the frequencies which resulted. The whole process consumed about 6 real-time seconds on an AMD Phenom II 3 GHz quad-core.

0 1 2 3 4 5 6 7 8 9 1739652 1743497 1739844 1745602 1743528 1739641 1742677 1743436 1743298 1743995

At a glance, the digits do appear to be uniformly distributed. After all, we’d expect to see 1,742,517 of each digit. Satisfied? Hardly. There are all sorts of statistical tests available for assessing a frequency distribution’s similarity to a known distribution. For this task, I chose the χ² goodness-of-fit test. The null hypothesis supposes that observed distribution of digits is consistent with a uniform distribution.

Where O_i = observed number of digit *i*. E_i = expected number of digit *i.*

The result is χ² = 22.2603 and a p-value = 0.008, illustrated below.

In conclusion, the goodness-of-fit test causes us to reject the null hypothesis for “typical” values of α (0.01, 0.05). Against intuition, the test rejects the notion that the digits are uniformly distributed.

Here’s the R code. Feel free to modify and explore!

-Nate

## Author: Nate De Jong ## Date: 2/12/13 ## Description: a new Mersenne prime was discovered in Feb. 2013. ## a coworker wondered aloud at lunch "I wonder if the digits are uniformly distributed?" ## this is the result # begin timing ptm = proc.time() digits = scan("M57885161_digits.csv", sep=",") #plot histogram bins=c(-0.5, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5) digit_hist = hist(digits, breaks=bins) #display freq. counts obs = digit_hist$counts # chi-square test for uniformity k = 10 N = sum(obs) expected = rep(N/k, 10) test_stat = sum((obs - expected)^2 / expected) p_val = pchisq(test_stat, df=k-1, lower.tail=FALSE) # end timing proc.time() - ptm