Massive Randomization Test for Three Doublesix Dice

I hired Timothy Weber to roll each of 3 Longbright final production samples 6,000 times, using this machine:

[Image: Tim's dice-rolling machine]

Note: For a limited time, Tim will run this service for the public ($40/die, plus $10 S&H if you want it mailed back).  If you are interested, contact him via his website at http://timothyweber.org/.

SUMMARY:

The measurements of each sample die I mailed him are below, in mm, starting with the 1-1 face, then the 2-2 face, and so on.  Note: the first measurement (of the 1-1 face) is always the longest (i.e. most problematic) on the production samples.  These are the faces that Longbright has hand-polished down on all 180,000 dice, including the dice below.

The second line shows the range of percentages at which the numbers (1-6) came up, where 16.67% is ideal, followed by the max bias (the largest deviation from 16.67%).

Die #1 (17.75 – 17.40 – 17.51 – 17.39 – 17.46 – 17.43)
14.8% – 18.1%, max bias = 1.9%

Die #2 (17.85 – 17.55 – 17.51 – 17.51 – 17.37 – 17.67)
14.2% – 18.1%, max bias = 2.4%

Die #3 (18.00 – 17.42 – 17.50 – 17.54 – 17.40 – 17.44)
12.6% – 17.6%, max bias = 4.1%
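
For anyone who wants to check the arithmetic, here is a minimal sketch of how a max bias can be derived from the lowest and highest percentages. The function name `max_bias` is my own, not Tim's, and because it starts from the rounded percentages listed above, its result can differ from his reported figure by about a tenth of a percent.

```python
IDEAL = 100 / 6  # a fair d6 shows each number about 16.67% of the time


def max_bias(low_pct, high_pct, ideal=IDEAL):
    """Largest deviation (in percentage points) of the observed range from ideal."""
    return max(ideal - low_pct, high_pct - ideal)


# (lowest %, highest %) for each die, copied from the summary above
observed = {1: (14.8, 18.1), 2: (14.2, 18.1), 3: (12.6, 17.6)}

for die, (low, high) in observed.items():
    print(f"Die #{die}: max bias = {max_bias(low, high):.1f}%")
```

In all three cases the largest deviation comes from the low end of the range, i.e. from the number that came up least often.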

Tim writes:

“The precision dice – casino and GameScience – that I’ve tested have biases around 0.85 – 1.1%.

Generic dice – from Chessex and board games – have biases ranging from 0.63% – 1.4%.”

Longbright ran a quality control inspection on 238 pieces of the final dice to check the error, and reported these numbers:
17.70-17.85 mm: 85%
17.85-17.90 mm: 12%
17.90-17.95 mm: 3%

I had previously requested that all the dice be under 17.85 mm, but I am not going to make them hand-polish all the dice again. According to this Longbright QC inspection, a die like #3 should be very rare, since its 1-1 axis (18.00 mm) is longer than anything in their report. But if it is so rare, one has to wonder how I happened to get it.  I imagine these Doublesix dice will have an average bias of about 2%, which is slightly worse than the 0.63% to 1.4% Tim measured for Chessex and other generic dice.

DETAILS:

This is the first (full and unedited) report from Timothy (with images inserted) regarding the first two dice:

“Wellp… there’s good news and bad news.

The bad news: these dice aren’t performing well.  They’re noticeably worse than generic dice, they fail the two statistical tests, and the histograms look pretty biased just by eye.

The good news: I don’t think it has to do with your design, it’s all about the production.  OK, maybe that’s good news and maybe it isn’t. But at least it might be fixable?

Once I had the dice rolling and in the process of recognizing (had a glitch over the weekend that lost a bunch of data, so I had to re-recognize yesterday), I paid more attention to the measurements. First, I replicated your measurements of these three dice – we agreed within a few hundredths of a millimeter.  Then I calculated the “flatness” of the dice, as defined by one of the original papers on testing d6s: flatness = 1 – (short axis) / (long axis).  It produces a nice dimensionless number that increases as your die tends toward a “flat,” which is a gambling term for a die that’s out of square so as to favor one axis.

The flatness of these three dice are 2.2%, 2.7%, and 3.3%.  For comparison, the Chessex dice I’ve checked are generally under 1%, and casino dice and GameScience dice (and some Chessex) are under 0.3%.

In that paper (I can find the reference if you like), experimentally-observed bias is proportional to flatness, with a slope of 1.91 for d6s.  So, for d6s, flatness of 2.2-3.3% corresponds with a bias of +/- 4.2-6.4%, which means instead of getting a given face 16.6667% of the time, you could get it 10% or 23%.

I don’t know if that holds for the d12 shape, but what I do see in my experimental results is that after 6,000 rolls, the histograms (attached) correspond pretty neatly with the chart of axis measurements – the low bars on the histogram are where the high axis measurements lie, and vice versa.

The low and high percentages for the first two dice are 15%-18% and 14%-18%, which is less than the bias predicted by that paper, but still outside the 95% confidence intervals for a fair die.

[Images: analysis plots for dice #1 and #2]

The chi-squared and Kolmogorov-Smirnov test then basically get upset too.

I think your arrangement that puts the same number on opposite faces is excellent for canceling out any bias due to the loss of material for pips.  But it does exaggerate the effects of the manufacturing tolerance – the traditional “sum to 7” approach has the benefit of improving the Kolmogorov-Smirnov test (and performance for RPGers), because if you get more 1s you also get more 6s, etc., so the numeric values at least balance out over time.

So – I’ll finish up this third one that’s running now if you want, but I’m expecting it’ll show an even heavier bias given that its flatness is more severe.  Let me know what you want to do from there.

Oh, and I held off on making a video because I wasn’t sure how you’d want to communicate this to your backers… but I can do a quick one showing dice rolling and being counted if you like, without commenting on the results.

Sorry it’s not better news!  :(”
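
To make the flatness formula from Tim's email concrete, here is a rough sketch that applies it to the axis measurements listed in the summary. Two assumptions to keep in mind: Tim worked from his own measurements, so these results may differ a little from the flatness values he quotes, and the 1.91 slope is the d6 figure from the paper he mentions, which (as he says) may not hold for this d12 shape. The names `flatness` and `SLOPE_D6` are mine.

```python
SLOPE_D6 = 1.91  # slope quoted in the email for d6s; an assumption for this d12 shape


def flatness(axes_mm):
    """flatness = 1 - (short axis) / (long axis), a dimensionless number."""
    return 1 - min(axes_mm) / max(axes_mm)


# Axis measurements in mm from the summary (1-1 face first).
dice = {
    1: [17.75, 17.40, 17.51, 17.39, 17.46, 17.43],
    2: [17.85, 17.55, 17.51, 17.51, 17.37, 17.67],
    3: [18.00, 17.42, 17.50, 17.54, 17.40, 17.44],
}

for die, axes in dice.items():
    f = flatness(axes)
    predicted = SLOPE_D6 * f  # bias the d6 relationship would predict
    print(f"Die #{die}: flatness = {f:.1%}, d6-predicted bias = +/- {predicted:.1%}")
```

The measured biases (1.9% to 4.1%) come out smaller than what the d6 relationship would predict, which matches Tim's observation.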

And here is part of the next email, including info on the third die:

“Since a low percentage on one face has to come out of one or more other faces (they have to add up to 100%!), it’s conventional to talk about ‘bias,’ which is the percentage that the die deviates from the ideal percentage, plus or minus.

So, for your three dice (the third one’s done – I’ve also attached its plots and result text), we get these percentage ranges and biases:

#1: 14.8% – 18.1%, max bias = 1.9%
#2: 14.2% – 18.1%, max bias = 2.4%
#3: 12.6% – 17.6%, max bias = 4.1%

(The histogram for #3 looks really awesome except for the one face whose axis measurement is way out of line with the others – so if you can get the factory to mill them down to match or adjust their mold, I think these will perform really well!)

[Image: analysis plots for die #3]

The precision dice – casino and GameScience – that I’ve tested have biases around 0.85 – 1.1%.

Generic dice – from Chessex and board games – have biases ranging from 0.63% – 1.4%.

A die I drilled out so it would be weighted and intentionally unfair had a bias of 2.7%.  It had face percentages ranging from 14.4% to 19.4%.

Another comparison is that for a d5, your expected face probability is 20%, and for a d7, it’s 14.3%.  Those would be biases of 3.3 and 2.4, respectively, vs. the expected for a d6.

In general, the chi-squared and K-S tests are a better way to tell fair from unfair dice than comparing bias values for a particular run, because they incorporate more information about the distribution and about what kind of deviations from the norm are significant vs. insignificant.  So I’m giving you the biases under advisement.  ;)”
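
For readers curious about what these checks look like in practice, here is a minimal sketch of two of them: the approximate 95% range a fair die should stay within over 6,000 rolls, and a Pearson chi-squared statistic. This is not Tim's code, and the counts in it are made up purely for illustration (I do not have his raw counts). The critical value 11.070 is the standard 95% cutoff for 5 degrees of freedom.

```python
import math

N_ROLLS = 6000
P_FAIR = 1 / 6
CHI2_CRIT_DF5 = 11.070  # 95th percentile of chi-squared with 5 degrees of freedom


def fair_interval(n=N_ROLLS, p=P_FAIR, z=1.96):
    """Approximate 95% range for one number's observed share on a fair die."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def chi_squared(counts):
    """Pearson chi-squared statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


low, high = fair_interval()
print(f"Fair-die 95% range per number: {low:.1%} to {high:.1%}")

# Hypothetical counts for illustration only (NOT Tim's data); they sum to 6,000.
example_counts = [890, 1080, 1010, 1000, 1020, 1000]
stat = chi_squared(example_counts)
print(f"chi-squared = {stat:.1f}; exceeds {CHI2_CRIT_DF5}? {stat > CHI2_CRIT_DF5}")
```

The fair-die range works out to roughly 15.7% to 17.6% per number, so lows of 14.8%, 14.2%, and 12.6% all fall outside it. Note that this range is for a single number taken on its own; the chi-squared test is the better tool because it looks at all six numbers at once.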

Thanks, Tim!  What an amazing service you provided.

Focus Group

Dear Reader,

I need some help.  What do you guys want me to ask the Focus Group (23 mostly-bulk backers who recently got a single Doublesix die)?

I was thinking about asking them two things on the form I send out (everyone will have instant access to the responses):

1) Would you be satisfied with your pledge if all the dice were similar to the sample you just received?

2) Would you please share your comments here (which will be helpful to the rest of the backers)?

Soon, after presenting the Focus Group answers, I will poll the rest of the Team to see how many of them will support moving forward with these specific Longbright dice, and how many would rather pursue a Plan C, such as trying again with Longbright or finding a different manufacturer.

So my question for you, Reader: Are there other/more answers the Focus Group could provide that may help the rest of the Team decide what their preference is (go with these, or seek Plan C)?

Please post your comments HERE (the KS comments section, if you can).

Thanks in advance,

Matt

PS.  For you non-backers that don’t get the Kickstarter updates, please go HERE to see the latest news (there have been a couple of posts recently).

PS2. Meanwhile, Timothy Weber is running a test for randomization, rolling 3 separate Doublesix dice about 6,000 times each and recording the results.  Here is a video of the type of machine he will be using:

PS3: Here are some videos of me and my daughter Quinn talking and packing and shipping the dice for the Focus Group (sounds boring, but I think it is actually pretty cute):