Statistics [02]: Shakespeare’s New Poem


In 1985, Shakespearean scholar Gary Taylor discovered, in a bound folio volume, a nine-stanza poem attributed to Shakespeare (called the Taylor poem). The poem is short relative to Shakespeare’s total output: only 429 words. Can statistics tell us whether the poem was actually written by Shakespeare?

Here is the analysis given in “Did Shakespeare write a newly-discovered poem?” by Ronald Thisted and Bradley Efron (Biometrika, 1987).


Observation

Of the 429 words in the newly discovered poem, 258 are distinct. The analysis therefore begins by classifying each of the 258 distinct words in the Taylor poem according to how rarely it is used in the Shakespearean canon. The results are shown in the following table, where $m_x$ denotes the number of distinct words in the Taylor poem that occur exactly $x$ times in the 884,647 total words of the Shakespearean canon (the row label plus the column label gives $x$; for example, row 10+, column 3 is $x = 13$). For example, 9 distinct words in the poem appear zero times in the canon, 7 distinct words appear once, and so on. Words used 100 or more times in the canon fall outside the table.

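|     | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | total |
|-----|---|---|---|---|---|---|---|---|---|---|-------|
| 0+  | 9 | 7 | 5 | 4 | 4 | 2 | 4 | 0 | 2 | 3 | 40 |
| 10+ | 1 | 0 | 3 | 0 | 1 | 1 | 1 | 2 | 1 | 0 | 10 |
| 20+ | 2 | 2 | 1 | 5 | 3 | 1 | 0 | 2 | 2 | 3 | 21 |
| 30+ | 3 | 1 | 1 | 1 | 2 | 1 | 0 | 0 | 3 | 3 | 16 |
| 40+ | 1 | 2 | 0 | 0 | 2 | 1 | 1 | 2 | 1 | 1 | 11 |
| 50+ | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 2 | 7 |
| 60+ | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 4 |
| 70+ | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 4 |
| 80+ | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| 90+ | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 3 |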
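The tally behind this table is simple to sketch. The Python below is a minimal illustration, not the authors' procedure: it assumes the canon is available as a word-to-count mapping (`canon_counts`, a hypothetical input) and that the poem and the canon are tokenized and spelled the same way, which in practice is the hard part. The toy counts are invented.

```python
from collections import Counter


def frequency_profile(poem_words, canon_counts, max_x=99):
    """Return m, where m[x] is the number of distinct poem words that occur
    exactly x times in the canon, for x = 0, 1, ..., max_x."""
    m = [0] * (max_x + 1)
    for word in set(poem_words):        # count each distinct word once
        x = canon_counts.get(word, 0)   # words absent from the canon have x = 0
        if x <= max_x:                  # very common words fall outside the table
            m[x] += 1
    return m


# Toy data: these counts are invented for illustration only.
canon_counts = Counter({"i": 500, "shall": 40, "die": 3, "fly": 1})
poem_words = "shall i die shall i fly ere".split()
profile = frequency_profile(poem_words, canon_counts, max_x=9)
print(profile)  # [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
```

Here "ere" is unseen (x = 0), "fly" is used once and "die" three times, while "i" and "shall" are too common to be tabulated with `max_x=9`.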

Estimation

Now that we have the observed values $m_x$, the next step is to estimate the expected value of $m_x$ under the assumption of Shakespearean authorship, denoted $\nu_x$. Assuming that word usage follows a Poisson process, the empirical Bayes estimates are shown in the following table (rows and columns are read as before).

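|     | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|-----|---|---|---|---|---|---|---|---|---|---|
| 0+  | 6.97 | 4.21 | 3.33 | 2.84 | 2.53 | 2.43 | 2.16 | 2.01 | 1.87 | 1.76 |
| 10+ | 1.62 | 1.50 | 1.52 | 1.51 | 1.36 | 1.38 | 1.33 | 1.28 | 1.25 | 1.22 |
| 20+ | 1.18 | 1.16 | 1.13 | 1.11 | 1.09 | 1.06 | 1.04 | 1.02 | 1.00 | 0.98 |
| 30+ | 0.96 | 0.94 | 0.93 | 0.91 | 0.90 | 0.88 | 0.86 | 0.85 | 0.83 | 0.82 |
| 40+ | 0.80 | 0.79 | 0.77 | 0.76 | 0.75 | 0.74 | 0.73 | 0.72 | 0.70 | 0.69 |
| 50+ | 0.68 | 0.67 | 0.66 | 0.65 | 0.64 | 0.63 | 0.62 | 0.61 | 0.60 | 0.59 |
| 60+ | 0.58 | 0.57 | 0.56 | 0.55 | 0.54 | 0.53 | 0.52 | 0.51 | 0.50 | 0.50 |
| 70+ | 0.49 | 0.48 | 0.48 | 0.47 | 0.47 | 0.46 | 0.45 | 0.45 | 0.44 | 0.44 |
| 80+ | 0.43 | 0.42 | 0.42 | 0.41 | 0.41 | 0.40 | 0.39 | 0.39 | 0.38 | 0.38 |
| 90+ | 0.37 | 0.36 | 0.36 | 0.35 | 0.35 | 0.34 | 0.34 | 0.33 | 0.32 | 0.32 |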
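One standard way to compute such an empirical Bayes estimate treats each word's canon count as Poisson with its own rate and expands $1 - e^{-\lambda t}$ as a power series, where $t = 429/884647$ is the size of the poem relative to the canon. This gives $\hat\nu_x = \sum_{k \ge 1} (-1)^{k+1} \binom{x+k}{k} t^k n_{x+k}$, where $n_y$ is the number of distinct canon words used exactly $y$ times. The sketch below implements that series as an illustration of the idea, not as the paper's code. The canon counts it uses ($n_1 = 14376$, $n_2 = 4343$, $n_3 = 2292$) are the figures reported in Efron and Thisted's 1976 study of Shakespeare's vocabulary; treat them as an assumed input.

```python
from math import comb


def expected_profile(n, t, max_x):
    """Empirical Bayes estimate of nu_x for x = 0..max_x.

    n[y] = number of distinct canon words used exactly y times (n[0] unused).
    t    = poem length / canon length.
    The alternating series is truncated where n runs out; because t is tiny,
    its terms shrink very quickly.
    """
    nu = []
    for x in range(max_x + 1):
        total = 0.0
        for k in range(1, len(n) - x):
            total += (-1) ** (k + 1) * comb(x + k, k) * t ** k * n[x + k]
        nu.append(total)
    return nu


t = 429 / 884647
n = [0, 14376, 4343, 2292]  # n[1..3] assumed from Efron and Thisted (1976)
nu = expected_profile(n, t, max_x=1)
print([round(v, 2) for v in nu])  # [6.97, 4.21]
```

Because $t \approx 0.0005$, the series is dominated by its first term, $\hat\nu_x \approx (x+1)\,t\,n_{x+1}$; for example $\hat\nu_0 \approx 14376 \times 429/884647 \approx 6.97$, which matches the first entry of the table.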

To broaden the empirical base of the results, seven more Elizabethan poems are analyzed using the same method: three attributed to Ben Jonson, Christopher Marlowe and John Donne respectively, and four attributed to Shakespeare. Their $\nu_x$ tables are omitted here.


Modeling

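The objective is to test whether the observed counts $m_x$ fit the values $\nu_x$ predicted under the assumption of Shakespearean authorship. The tests rely on the following regression model:

$$\log\frac{\mu_x}{\nu_x} = \beta_0 + \beta_1 \log(x+1),$$

where, for $x = 0, 1, \ldots, 99$, the counts $m_x$ have independent Poisson distributions with means $\mu_x$.

The null hypothesis

$$H_0: \beta_0 = \beta_1 = 0$$

corresponds to $\mu_x = \nu_x$ for all $x$.

The model can also be written in the form

$$\mu_x = \nu_x \, e^{\beta_0} (x+1)^{\beta_1},$$

where we can see that if $\beta_1 > 0$, $\mu_x/\nu_x$ increases as $x$ increases; if $\beta_1 < 0$, $\mu_x/\nu_x$ decreases as $x$ increases, meaning the poem leans more heavily on words that are rare in the canon than Shakespeare would. Therefore, $\beta_1$ (the slope) is of particular interest.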
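A minimal sketch of the mean structure, assuming the model takes the form $\log(\mu_x/\nu_x) = \beta_0 + \beta_1 \log(x+1)$. It checks two properties: $\beta_0 = \beta_1 = 0$ reproduces $\nu_x$ exactly, and the sign of $\beta_1$ decides whether $\mu_x/\nu_x$ rises or falls with $x$. The function name `model_means` is hypothetical.

```python
import math


def model_means(nu, beta0, beta1):
    """mu_x = nu_x * exp(beta0) * (x + 1) ** beta1 for x = 0, 1, ..., len(nu) - 1."""
    return [v * math.exp(beta0) * (x + 1) ** beta1 for x, v in enumerate(nu)]


nu = [6.97, 4.21, 3.33, 2.84]            # first four nu_x values from the table
print(model_means(nu, 0.0, 0.0) == nu)   # True: the null hypothesis gives mu_x = nu_x

# With a negative slope the ratio mu_x / nu_x falls as x grows,
# i.e. rare-in-Shakespeare words are relatively over-represented.
ratios = [mu / v for mu, v in zip(model_means(nu, 0.0, -0.3), nu)]
print(ratios)
```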

Tests

Three different tests are performed: Test 1, total count; Test 2, new words; Test 3, slope.

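Test 1: Let $m_+ = \sum_{x=0}^{99} m_x$ be the total count over categories $x = 0, 1, \ldots, 99$; then $m_+ = 118$ for the Taylor poem. Similarly, let $\mu_+ = \sum_{x=0}^{99} \mu_x$ and $\nu_+ = \sum_{x=0}^{99} \nu_x$ (summing the estimation table gives $\nu_+ \approx 95.0$ for the Taylor poem), so that $m_+$ has a Poisson distribution with mean $\mu_+$. Test 1 is the usual test of the simple null hypothesis $\mu_+ = \nu_+$.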
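Test 1 can be sketched with a simple normal approximation to the Poisson, $Z = (m_+ - \nu_+)/\sqrt{\nu_+}$, together with an exact one-sided Poisson tail probability. This standardization is an assumption on our part: the $Z$ values in the results tables may come from a slightly different statistic, so the numbers need not agree to the second decimal. The inputs are the Taylor poem's $m_+ = 118$ (the sum of the observed table) and $\nu_+ \approx 95.0$ (the sum of the estimation table); `poisson_total_test` is a hypothetical name.

```python
import math


def poisson_total_test(m_plus, nu_plus):
    """Return (z, upper_tail_p) for H0: the total count is Poisson(nu_plus)."""
    z = (m_plus - nu_plus) / math.sqrt(nu_plus)
    # Exact P(M >= m_plus) = 1 - P(M <= m_plus - 1); the Poisson pmf is
    # evaluated in log space so the factorials cannot overflow.
    cdf = sum(
        math.exp(k * math.log(nu_plus) - nu_plus - math.lgamma(k + 1))
        for k in range(m_plus)
    )
    return z, 1.0 - cdf


z, p = poisson_total_test(118, 95.0)
print(round(z, 2), p)
```

With these inputs $z \approx 2.36$ and the one-sided p-value is small, so the poem's total count of rare words sits clearly above its Shakespearean expectation.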

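Test 2: The zero count $m_0$ (the number of new words, never used in the canon), considered conditional on the total count $m_+$, has a binomial distribution with index $m_+$ and parameter $\mu_0/\mu_+$. Test 2 is the usual test of the simple null hypothesis $\mu_0/\mu_+ = \nu_0/\nu_+$, under which the expected number of new words is $m_+\nu_0/\nu_+$ (for the Taylor poem, $118 \times 6.97/95.0 \approx 8.66$).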
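A minimal sketch of Test 2 using the normal approximation to the binomial. The approximation is our assumption (an exact binomial tail would also work), and the function name `new_word_test` is hypothetical. The inputs are the Taylor poem's $m_0 = 9$ and $\nu_0 = 6.97$ from the two tables, with $m_+ = 118$ and $\nu_+ \approx 95.0$.

```python
import math


def new_word_test(m0, m_plus, nu0, nu_plus):
    """Test H0: m0 given m_plus is Binomial(m_plus, nu0 / nu_plus).

    Returns the null expectation of m0 and a normal-approximation z score.
    """
    p = nu0 / nu_plus
    expected = m_plus * p
    z = (m0 - expected) / math.sqrt(m_plus * p * (1 - p))
    return expected, z


expected, z = new_word_test(m0=9, m_plus=118, nu0=6.97, nu_plus=95.0)
print(round(expected, 2))  # 8.66
```

The null expectation agrees with the value 8.66 in the Test 2 results table; the z score here (about 0.12) is a plain normal approximation and may differ slightly from the reported one.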

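Test 3: Test 3 is the usual test, using large-sample maximum likelihood approximations, of the null hypothesis $\beta_1 = 0$ based on the data $m_0, m_1, \ldots, m_{99}$. This is equivalent to testing $\beta_1 = 0$ conditional on $m_+$, in which case $(m_0, m_1, \ldots, m_{99})$ has a multinomial distribution that depends only on the slope parameter $\beta_1$: the intercept $\beta_0$ scales every mean by the same factor and cancels when the means are normalized to probabilities.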
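Conditional on $m_+$, and assuming the model form $\mu_x = \nu_x e^{\beta_0}(x+1)^{\beta_1}$, the multinomial cell probabilities are $\pi_x(\beta_1) = \nu_x (x+1)^{\beta_1} / \sum_y \nu_y (y+1)^{\beta_1}$. The sketch below finds $\hat\beta_1$ by Newton's method on this conditional log-likelihood, takes the standard error from the Fisher information, and returns the Wald statistic $Z = \hat\beta_1/\widehat{\mathrm{se}}$. It illustrates the large-sample maximum likelihood approach under that assumed model; it is not the authors' code, and `slope_test` is a hypothetical name.

```python
import math


def slope_test(m, nu, tol=1e-10, max_iter=50):
    """Conditional MLE of beta1 with pi_x proportional to nu_x * (x + 1) ** beta1.

    Returns (beta1_hat, standard_error, z).
    """
    s = [math.log(x + 1) for x in range(len(m))]   # covariate log(x + 1)
    m_plus = sum(m)
    beta = 0.0
    for _ in range(max_iter):
        w = [v * math.exp(beta * sx) for v, sx in zip(nu, s)]
        total = sum(w)
        pi = [wx / total for wx in w]
        mean_s = sum(p * sx for p, sx in zip(pi, s))
        var_s = sum(p * sx * sx for p, sx in zip(pi, s)) - mean_s ** 2
        score = sum(mx * sx for mx, sx in zip(m, s)) - m_plus * mean_s
        info = m_plus * var_s                  # Fisher information for beta1
        step = score / info                    # Newton-Raphson update
        beta += step
        if abs(step) < tol:                    # info is then evaluated at ~beta_hat
            break
    se = 1.0 / math.sqrt(info)
    return beta, se, beta / se


# Synthetic (non-integer) counts built with a known slope of 0.5,
# used only to check that the estimator recovers it.
nu = [6.97, 4.21, 3.33, 2.84, 2.53]
synthetic = [100 * v * (x + 1) ** 0.5 for x, v in enumerate(nu)]
beta_hat, se, z = slope_test(synthetic, nu)
print(round(beta_hat, 3))  # 0.5
```

Feeding in a poem's observed counts $m_0, \ldots, m_{99}$ together with its $\nu_x$ table would give an estimate of the kind reported in the Test 3 results; exact agreement depends on the model form assumed here.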

Results

Test 1

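| Poem | Total count $m_+$ | Expectation $\nu_+$ | $Z$ |
|------|------|------|------|
| 1. JON | 95 | 88.8 | 0.67 |
| 2. MAR | 134 | 106.5 | 2.57 |
| 3. DON | 107 | 105.1 | 0.20 |
| 4. CYM | 95 | 69.9 | 2.86 |
| 5. PUC | 53 | 50.5 | 0.37 |
| 6. PHO | 105 | 76.1 | 3.13 |
| 7. JON | 109 | 96.7 | 1.24 |
| 8. TAY | 118 | 95.0 | 2.29 |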

Under the null hypothesis $Z$ is approximately standard normal, so $|Z| > 1.96$ indicates a deviation from the null hypothesis at the 5% level.

Test 2

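| Poem | New words $m_0$ | Expectation | $Z$ |
|------|------|------|------|
| 1. JON | 8 | 7.14 | 0.37 |
| 2. MAR | 10 | 10.12 | 0.01 |
| 3. DON | 17 | 8.06 | 2.90 |
| 4. CYM | 7 | 7.13 | 0.00 |
| 5. PUC | 1 | 3.98 | -1.64 |
| 6. PHO | 14 | 7.89 | 2.08 |
| 7. JON | 7 | 8.21 | -0.39 |
| 8. TAY | 9 | 8.66 | 0.16 |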

Test 3

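| Poem | Estimated slope $\hat\beta_1$ | Estimated standard error | $Z$ |
|------|------|------|------|
| 1. JON | 0.229 | 0.11 | 2.08 |
| 2. MAR | -0.323 | 0.08 | -4.04 |
| 3. DON | -0.138 | 0.09 | -1.53 |
| 4. CYM | -0.047 | 0.10 | -0.47 |
| 5. PUC | -0.050 | 0.12 | -0.42 |
| 6. PHO | -0.127 | 0.09 | -1.41 |
| 7. JON | -0.034 | 0.09 | -0.38 |
| 8. TAY | -0.075 | 0.09 | -0.83 |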

Conclusion

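Test 1 is the least reliable for discriminating between Shakespearean and non-Shakespearean authorship: taking $|Z| > 1.96$ as a deviation, it rejects two of the four Shakespearean poems (CYM and PHO) while passing the Jonson and Donne poems. Test 2 seems only moderately useful for discerning Shakespearean authorship, since it flags the Donne poem but also PHO. Test 3 seems the most promising discriminator: it flags the Jonson and Marlowe poems (rows 1 and 2) and none of the Shakespearean ones.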
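This comparison can be checked mechanically by listing, for each test, which rows of the results tables exceed the cutoff. The Z values below are copied from the three tables, keyed by row number; the two-sided 5% cutoff of 1.96 is our assumption about what counts as a deviation.

```python
# Z values copied from the three results tables, rows 1-8 in order.
z_scores = {
    "Test 1 (total count)": [0.67, 2.57, 0.20, 2.86, 0.37, 3.13, 1.24, 2.29],
    "Test 2 (new words)": [0.37, 0.01, 2.90, 0.00, -1.64, 2.08, -0.39, 0.16],
    "Test 3 (slope)": [2.08, -4.04, -1.53, -0.47, -0.42, -1.41, -0.38, -0.83],
}

CRITICAL = 1.96  # assumed two-sided 5% cutoff for an approximately N(0, 1) statistic

flagged_rows = {
    test: [row for row, z in enumerate(zs, start=1) if abs(z) > CRITICAL]
    for test, zs in z_scores.items()
}
for test, rows in flagged_rows.items():
    print(f"{test}: rows {rows}")
```

Test 3 flags only rows 1 and 2, while Test 1 flags rows 4 and 6 (CYM and PHO) among others, which is why the slope test is preferred as a discriminator.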

On the basis of the results, the Taylor poem appears consistent with the hypothesis of Shakespearean authorship. In particular it passes the slope test, which is the best discriminator among the three. It fails the total count test, but less dramatically than do two of the four Shakespearean poems. Overall it seems fair to say that the Taylor poem fits Shakespearean usage about as well as do the four Shakespeare poems.


This example gives an intuitive feel for what statistics is really about: collecting and analyzing data. Handled with appropriate methods, even a 429-word poem can yield interesting and reliable results.

