Monday, March 15, 2010

Notes on Novels 100-21

On the eve of entering the top 20 I will recap and update some of the data previously analyzed in earlier posts. You can of course see all of the titles and read all of Fuse#8's thoughts plus lots of great commentary on each title at her blog here: #100 - 91, #90-86, #85-81, #80-76, #75-71, #70-66, #65-61, #60-56, #55-51, #50-46, #45-41, #40-36, #35-31, #30-26, #25-21.

Since the last full stat post, thirty more titles have been added to the list. In some areas early trends have remained pretty much constant, while in others we are beginning to see shifts as we get closer to the more universally acclaimed titles. This post will have decidedly less predicting than the previous breakdowns, as at this point there are probably very few surprises remaining. It is just a matter of order, and that is a puzzle I don't think I have any chance of cracking. So instead I'll just show the data.

One area which has remained pretty much steady is the country of publication. The percentage of titles first published in the United States remains almost the same as it was in the last post. With 80 titles accounted for, 59 were first published in the United States, 19 in the United Kingdom, 1 in Sweden and 1 in Germany.

Will we see a non-USA/UK title in the top 20? If the top 20 follows the existing trend, about 15 of the remaining titles come from the United States. That only leaves 5 spots for Lion, the Witch and the Wardrobe, Sorcerer's Stone, Prisoner of Azkaban, Hobbit, Secret Garden, Charlie and the Chocolate Factory, and Anne of Green Gables.
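For anyone who wants to check the "about 15" figure, here is a minimal sketch of that projection in Python. It simply assumes the top 20 will keep the same country mix as titles 100-21, which is a guess and not a prediction; the variable names are my own.

```python
# Country-of-first-publication counts reported above for titles 100-21.
counts = {"United States": 59, "United Kingdom": 19, "Sweden": 1, "Germany": 1}
total = sum(counts.values())  # 80 titles so far
remaining = 20                # titles still to be revealed

# Assumption: the remaining titles follow the same US share as the first 80.
us_share = counts["United States"] / total
projected_us = round(us_share * remaining)
projected_other = remaining - projected_us

print(f"US share so far: {us_share:.1%}")
print(f"Projected US titles in the top 20: {projected_us}")
print(f"Spots left for everything else: {projected_other}")
```

Changing `counts` (for example, moving a title from the UK column to the US column) shows how sensitive the projection is to how a single book is classified.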

The percentage of books belonging to a series is still at the same spot, with 64% of the books on the list belonging to a series. However, if you group the titles in sets of ten, you see that the middle rankings (60-30) have significantly fewer series titles than the bottom and top groupings (see graph below). I expect the top twenty will have fewer than the 13 series titles the trends would indicate.

Older titles have begun to appear more frequently on the list, with two of the expected pre-1900 titles showing up in the top 30. Below is the distribution of titles by decade. The 2000s have now taken over the lead with 16 titles, helped by three Potter titles and a certain 2009 instant classic (can we all just agree to ignore title #21?). With two Potter titles, Holes, Maniac Magee and The Giver still to come, it looks like the 1990s will indeed prevail as the top decade.

Age of author has also shown little change, with the forty-year-olds still dominating the competition with 45% of the titles.

Gender, on the other hand, is beginning to trend slightly less female: 68% of the authors of titles 100-51 are female, but for titles 50-21 only 53% are. So as we approach the completion of the countdown, male authors are appearing with more frequency. Below is a breakdown of female authorship by ranking group.

It would take all of the remaining 20 titles being male authored to create a 50-50 split, but the percentages may drop slightly more with the completion of the list.
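The 50-50 claim can be checked with a rough sketch like the one below. Note the assumption: the female counts are reconstructed from the rounded percentages quoted above (68% of 50 titles and 53% of 30 titles), so they are approximate and not taken from the underlying spreadsheet. The function name is hypothetical.

```python
# Reconstruct approximate female-authored counts from the rounded percentages.
female_100_51 = round(0.68 * 50)  # titles 100-51
female_50_21 = round(0.53 * 30)   # titles 50-21
female_so_far = female_100_51 + female_50_21

FINAL_LIST_SIZE = 100


def final_female_share(female_in_top20: int) -> float:
    """Female share of the full list for a given count in the top 20."""
    return (female_so_far + female_in_top20) / FINAL_LIST_SIZE


# Best case for an even split: no female-authored titles in the top 20.
print(f"Female-authored so far: {female_so_far} of 80")
print(f"Final share if the top 20 is all male: {final_female_share(0):.0%}")
print(f"Final share if the top 20 is half female: {final_female_share(10):.0%}")
```

Under these reconstructed counts, any female-authored title in the top 20 keeps the final list above an even split.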

It probably does not come as a surprise when I tell you that J.K. Rowling is now the leading author in points and votes (she is currently tied with Beverly Cleary for number of titles, as each has 4 on the list, though she'll certainly pass Cleary very soon). The Potter scribe has amassed 405 points from 60 votes, including a whopping 11 first place votes! Dahl and Enright each have 6 first place votes, and while Dahl will likely get some more votes when Charlie appears, he won't be passing Ms. Rowling.

I was surprised to see that more first place votes were accounted for in titles 40-31 than in titles 30-21. The chart below shows the distribution of both votes and first place votes by ranking groups. The total votes bars (in green) are still steadily rising, though I expect a sharper increase with the top ten. The first place votes bar, however, has been very unpredictable. Surprisingly, books are still making the list with one, two or even zero first place votes. Can a book be a top twenty or even top ten book without being anyone's top pick? We shall see... (in case you were wondering, Millions of Cats made the top ten last year without a single first place vote).

In terms of total points, I believe the model below is still undershooting the total points the number one book will earn. When we looked at the same graph for just titles 100-51, we saw the trend line predicting approximately 270 points for the top title. As you can see below, the model now shows about 360 points for the winner. I keep expecting to see the big jump in votes we saw with the picture book poll, but so far, other than a small jump at title 41 (Witch of Blackbird Pond), the points have been rising very slowly. Maybe tomorrow. My current guess is that the winner will have somewhere between 430 and 475 points and at least 15 first place votes.

Distribution of votes is also beginning to even out. Fourth place votes went from almost the lowest represented vote in titles 100-51 to the most represented vote in titles 100-21. See below.

So far 6008 points have been awarded from 1084 votes (97 of which were first place votes). Not knowing how many voters participated, it's hard to make sense of these numbers, but I believe the number of voters is well over 250, so there are still lots of votes out there (of course many votes won't be accounted for at all). Using 275 as the number of voters we get 15125 total points, which means only about 40% of the points have been accounted for so far. By comparison, ~60% of the points in last year's picture book poll were accounted for in the top 100, though only 27% of the points were accounted for in titles 100-21. In that poll, the number one title accounted for an amazing 10% of all the available points. The top title in this year's poll would need around 600 points to match Where the Wild Things Are's feat, which is doubtful to say the least.
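Here is a small sketch of where the 15125 figure comes from. It assumes each ballot ranks ten titles and awards 10 points for first place down to 1 point for tenth, or 55 points per voter; that scoring is an assumption on my part, though it does reproduce the total quoted above. The 275 voters is the same guess used in the paragraph, not a known count.

```python
# Assumed scoring: 10 points for a first place vote down to 1 for tenth place.
POINTS_PER_BALLOT = sum(range(1, 11))  # 55 points per voter

assumed_voters = 275                   # a guess, as noted above
total_points = assumed_voters * POINTS_PER_BALLOT

points_so_far = 6008                   # points awarded to titles 100-21
share_accounted = points_so_far / total_points

print(f"Total points available: {total_points}")
print(f"Share accounted for so far: {share_accounted:.1%}")
```

Plugging in a different voter count shows how much the "about 40%" figure depends on that guess: with 250 voters the share rises, and with 300 it falls.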

That's all for now. Tomorrow I will post the first Battle of the Books leaderboard as well as some thoughts on matches 3 and 4.


Madigan McGillicuddy said...

Top notch work! I'm a sucker for charts and graphs.

My Boaz's Ruth said...

According to Wikipedia, The Secret Garden was originally serialized in The American Magazine (published in OH) and then published the same year as a book out of both New York and London. I think this can count as a United States book.

L.C.Page, which published Anne of Green Gables, is a Boston company.

My Boaz's Ruth said...

Oh. And according to Wikipedia, Charlie and the Chocolate Factory is also an American book (though obviously not making it to the top 10), published in the US 3 years before it made it to London.