09 March 2017

Shuffling without side effects

Randomization can be separated from shuffling to produce repeatable results.

A while back, I put some code on GitHub to do shuffling. It uses an unbiased shuffle algorithm and has the capability to shuffle with System.Random or RNGCryptoServiceProvider for strong randomization. I have been using it to shuffle exam questions and answers to generate CBT tests. However, randomization presents a significant problem for automated tests (and functional programming) because by definition it depends on unpredictable side effects. I didn't want to use the seeded randomization from System.Random, because it has very weak randomization. Eventually I realized I could use a cryptographically secure RNG, but still shuffle in a repeatable way.

The first thing to realize is the purpose of the RNG for shuffling is to pick a value in the list. One easy way to represent a pick is as a percentage (a value between 0 and 1). Whether there are a thousand things in a list or two, a percentage can be mathematically converted into a position on the list. For a shuffle, you need as many picks (percentages) as you have items in the list, minus 1. Depending on the algorithm used, either the first or last pick always goes into the same position.

Edit: This is similar to how Consistent Hashing works.

So then, it becomes pretty easy to shuffle using an array of random percentages which were generated ahead of time. Here's an "inside-out" shuffle algorithm which returns a shuffled array. The algorithm uses an imperative style and this code reflects that, but this shuffle function is still pure.

/// Shuffle an array using a fair shuffle algorithm with the provided ratios.
/// ratios is an array of values between 0.0 and 1.0, having at least the same length as a, minus 1.
/// a is the array to be shuffled.
/// Returns a shuffled copy of the input array.
let shuffle ( ratios : float array ) ( inArray : 'a array ) =
    let n = Array.length inArray
    if n = 0 then
        Array.empty
    else
        let copy = Array.zeroCreate<'a> n
        copy.[0<- inArray.[0// copy first element over
        for endIndex = 1 to n - 1 do
            let randIndex =
                ratios.[endIndex - 1]
                |> Ratio.scaleBetween 0 endIndex
            if randIndex <> endIndex then
                copy.[endIndex<- copy.[randIndex// move element at randIndex to end
            copy.[randIndex<- inArray.[endIndex// move next item to randIndex
        copy


Converting the percentage (aka ratio) to an index has a deceptive amount of edge cases, but you can find my version of `Ratio.scaleBetween` here with inline explanation. The original shuffle algorithm looped from 0, but a percentage was wasted to get a number between 0 and 0 (duh). So I manually code the first iteration.

For testing, you can use a static array of percentages. And every time you run the shuffle with the same percentages, you will get the same result. In this particular algorithm, an array of all 1.0 values will produce the same ordering as the original list.

For generating the percentages at run-time, you can use the RNG of your choice, even a cryptographically strong one, and optionally store them for auditing. Here's some code I use (at the edge of my system where I allow side effects) to get randoms for shuffling.

let private fx_generateRandoms count =
    use rngCsp = new RNGCryptoServiceProvider()
    let buffer = new RandomBuffer( rngCsp.GetBytescount )
    Array.init count ( fun _ -> buffer.GetRandomRatio() )


Pulling percentages from `RNGCryptoServiceProvider` also has some edge cases and optimizations, so I made the helper class `RandomBuffer`, conveniently on the same GitHub repo.

No comments: