23 June 2017

VS 2017 runaway usage

My computer started to feel sluggish yesterday. Little things like the mouse taking a second to respond to my movements (in any app). I finally got around to investigating, and discovered this:

[Screenshot: Task Manager showing the Visual Studio 2017 Bootstrapper consuming 19% CPU]
On this computer, 19% CPU is ~1.5 cores of compute being used. Even after I closed Visual Studio the Bootstrapper continued to run and eat CPU/Memory.

Based on my other experiences in VS 2017 so far, especially the new inbuilt F# tools, I would say it still needs to bake for a few more updates. Maybe it will stabilize by the time VS 2019 comes out. :-p

But seriously, I do use it now for production work. It just has a few rough edges yet.

03 April 2017

What is the benefit of Functional Programming?

I've seen some posts lately trying to come to grips with or warn against functional programming (FP) patterns in F#. So, I thought I would take a step back and explain how to derive tangible benefits from FP, what those benefits are, and how functional patterns relate to that.

By functional patterns, I mean things like partial application, currying, immutability, union types (e.g. Option, Result), record types, etc.

The most important principle to follow to reap benefits from FP is to write pure functions. "Wait," I hear you say, "you can't make a useful app with only pure functions." And right you are. At some point, you have to perform side effects like asking for the current time or saving something to a database. Part of the learning process for FP (in F#) is getting the sense for where to do side effects. Then the continuing journey from there is how to move side effects as far out to the edges of your program as possible to maximize the amount of pure functions on the inside. The goal being that every significant decision is made by pure functions.
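As a small sketch of what "side effects at the edges" can look like (the names here are mine, purely for illustration), the decision logic stays pure while the clock read happens at the boundary:

```fsharp
open System

// Pure: "now" is an input, so the decision is deterministic and testable.
let isExpired (now : DateTime) (expiresAt : DateTime) =
    now > expiresAt

// Impure edge: the side effect (reading the clock) happens out here,
// then the pure function makes the actual decision.
let checkExpiry (expiresAt : DateTime) =
    isExpired DateTime.UtcNow expiresAt
```

Tests exercise `isExpired` directly with fixed dates; only the thin `checkExpiry` wrapper ever touches the real clock.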

"What's so great about pure functions?" A commonly touted benefit is testability, which comes almost by definition. But in my experience it goes much further than that. Programs which are largely pure functions tend to require less time to maintain. To put it another way, pure functions allow me to fearlessly refactor them. Because of their deterministic nature combined with type safety, the compiler can do most of the hard work in making sure I haven't made a gross error. Pure functions certainly don't prevent logic errors (common one for me: incorrectly negating a boolean). But they do prevent a lot of the hard-to-track-down and tedious problems I've dealt with in the past.

"I hear you, but I am skeptical." Yes, and you should be. First, there's a trade-off. And second, I wouldn't expect you to take my word for it. Probably the largest trade-off is with immutable data structures. Instead of mutating data (e.g. updating a value in an input array, which introduces a side effect to the caller), you return a new copy of the data with different values. As you can imagine, copying data costs more (cpu/memory) than updating in place. Generally, I find there's not a significant enough difference to worry about it. But in certain cases performance can dominate other concerns. One benefit of F# is that you can easily switch to doing mutation if you need to.
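To illustrate both sides of that trade-off (names here are my own), records give you cheap immutable updates by default, while arrays let you opt back into in-place mutation when performance demands it:

```fsharp
type Point = { X : int; Y : int }

let p1 = { X = 1; Y = 2 }
// Immutable update: p2 is a new record; p1 is untouched.
let p2 = { p1 with Y = 5 }

// Opt-in mutation for a hot path: arrays are updated in place.
let doubleInPlace (a : int array) =
    for i = 0 to a.Length - 1 do
        a.[i] <- a.[i] * 2
    a
```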

As for not taking my word for the maintenance/refactoring benefits, I can tell you how I discovered this and suggest you try it. I have to admit, I did NOT discover these benefits in depth in F#. Since F# is multi-paradigm, including OO, it's easy to skip or only go part-way toward learning how to model something with pure functions. It's also easy to get confused about how to do it with so many avenues available. My path of discovery must be credited to Elm -- a relatively young compile-to-JavaScript language/platform and the inspiration for Redux. Elm is a functional language in which you can only use immutable types and pure functions (although you can call out to JavaScript in a controlled way). At first this seems restrictive and a bit mind-bending.

Because Elm's restrictions were unfamiliar, the first code I wrote in Elm was not the greatest, even after coding in F# for several years. (To be fair most of my F# code is for back-end, including performing side-effects.) My Elm code worked, but it just didn't fit quite right or it seemed fiddly. But as I discovered different ways to model things and tried them, I found Elm was actually very amenable to refactors, even sweeping epic ones. As I learned new ways to model in an environment of pure functions, the constraints placed on my code by Elm actually allowed my code to easily grow with me. Large refactors still take a bit of work, but my experience is that it's no longer an impossible-to-plan-for amount of "risky" work. It's just normal work. And I'm not the only one with that experience.

Eventually, I came to understand that this was not because of some white magic that Evan Czaplicki wove into Elm, but because of the insightful design choice that all user code consists of pure functions. A pure function is a contract which a compiler can verify. The contract doesn't specify everything about the function, but it can help you to resolve the majority of the really boring problems at compile time.

So here is where functional patterns come into the picture. They are tactics which help ensure that the compiler can verify the contract on a pure function. I'll run through some examples.

  • On .NET at least, null has no type and so the compiler cannot verify the contract is fulfilled if you provide a null value. So there's F#'s Option (Maybe in Elm), which not only preserves type, but also lets you know you should handle the "no value" case. Now the contract can be verified again.
  • Throwing exceptions invalidates a pure function's contract, because exceptions are hidden return paths that are not declared. Enter Result***, which provides a way to return either a success value or an error value in a declarative way, and thus the contract can be verified again.
  • Mutation (modifying data in-place) cannot be observed or accounted-for in a function declaration. It's also the source of some really hard-to-find bugs. So enter records which are immutable by default. To "change" data, you have to make a copy of the record with different values. Now the contract can be verified again.
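A minimal sketch of the Option and Result points above (the function names are mine, not from any particular library):

```fsharp
// Option: the "no value" case is visible in the type,
// so the caller is forced to handle it.
let tryDivide (x : int) (y : int) : int option =
    if y = 0 then None else Some (x / y)

// Result: the error path is declared in the signature instead of thrown.
let parseAge (s : string) : Result<int, string> =
    match System.Int32.TryParse s with
    | true, n when n >= 0 -> Ok n
    | true, _ -> Error "age cannot be negative"
    | false, _ -> Error "not a number"
```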

Since the basic primitive to provide a "contract" is a function, partial application and currying follow naturally as conveniences. Types like Option and Result are possible (in a non-awkward way) because of union types, aka algebraic data types. And you can make your own union types. They are great for modeling mutually exclusive cases with their own sets of data. For example, a PaymentType is CreditCard with card data OR Check with a check number. If you model this with a normal class, you'd likely have nullable properties for both cases. That gives the potential for one, both, or none to be present, which requires a little more work on your part to ensure the data is valid. With union types, the compiler can do some of this work for you.
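Here's roughly what that PaymentType looks like as a union type (the field names are my own invention):

```fsharp
// Each case carries only the data that applies to it --
// a check can never accidentally have card data, and vice versa.
type PaymentType =
    | CreditCard of cardNumber : string * expiry : string
    | Check of checkNumber : int

// The compiler warns if a case is left unhandled here.
let describe payment =
    match payment with
    | CreditCard (num, exp) -> sprintf "card %s, expires %s" num exp
    | Check num -> sprintf "check #%d" num
```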


I hope you can see that functional patterns primarily exist to support writing pure functions. You can't write a useful program entirely out of pure functions, so the goal should be to maximize their usage in order to maximize refactorability. Learning how to do this is a process. Don't fret if you need to use an impure solution to get by until you discover another way. Also realize that writing pure functions in F# requires a little discipline, because the language doesn't provide a way to enforce that. The basic guidelines I follow to produce pure functions are 1) do not mutate input values and 2) result is deterministic... that is: given the same inputs, the output is always the same. I also tend to establish an outer boundary of my programs where side effects (database ops, HTTP calls, current time, etc.) are allowed. From this boundary I can call side-effect functions and pure functions. But pure functions should not call side effect functions. So far this has worked out really well in my F# code. But I'm sure I have more to learn. :)

*** Use Result judiciously. Since exceptions are foundational to .NET, practically all common external libraries throw exceptions. You'll create a lot of work for yourself trying to convert every interaction with an external library into a Result. My rule of thumb is to just let my code throw when I have to execute multiple operations against an external library which will throw. Higher up in the program, I'll convert an exception from my code to a Result error case. I also use Result for business logic failures like validation errors.
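That "convert once, higher up" step can be as simple as a small wrapper like this (a sketch; the name `tryRun` is mine):

```fsharp
// Let library calls throw internally; convert to Result at one place
// higher up, rather than wrapping every single call.
let tryRun (operation : unit -> 'a) : Result<'a, string> =
    try
        Ok (operation ())
    with ex ->
        Error ex.Message
```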

09 March 2017

Shuffling without side effects

Randomization can be separated from shuffling to produce repeatable results.

A while back, I put some code on GitHub to do shuffling. It uses an unbiased shuffle algorithm and has the capability to shuffle with System.Random or RNGCryptoServiceProvider for strong randomization. I have been using it to shuffle exam questions and answers to generate CBT tests. However, randomization presents a significant problem for automated tests (and functional programming) because by definition it depends on unpredictable side effects. I didn't want to use the seeded randomization from System.Random, because it has very weak randomization. Eventually I realized I could use a cryptographically secure RNG, but still shuffle in a repeatable way.

The first thing to realize is the purpose of the RNG for shuffling is to pick a value in the list. One easy way to represent a pick is as a percentage (a value between 0 and 1). Whether there are a thousand things in a list or two, a percentage can be mathematically converted into a position on the list. For a shuffle, you need as many picks (percentages) as you have items in the list, minus 1. Depending on the algorithm used, either the first or last pick always goes into the same position.
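A naive sketch of that percentage-to-position conversion (this is NOT the repo's `Ratio.scaleBetween`, which handles more edge cases):

```fsharp
// Map a ratio in [0.0, 1.0] to an integer index in [lo, hi], inclusive.
// Naive version for illustration; rounding edge cases need more care.
let scaleBetween (lo : int) (hi : int) (ratio : float) =
    let idx = lo + int (ratio * float (hi - lo + 1))
    min idx hi // clamp so a ratio of exactly 1.0 maps to hi, not hi + 1
```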

Edit: This is similar to how Consistent Hashing works.

So then, it becomes pretty easy to shuffle using an array of random percentages which were generated ahead of time. Here's an "inside-out" shuffle algorithm which returns a shuffled array. The algorithm uses an imperative style and this code reflects that, but this shuffle function is still pure.

/// Shuffle an array using a fair shuffle algorithm with the provided ratios.
/// ratios is an array of values between 0.0 and 1.0, with length at least
/// one less than the length of inArray.
/// inArray is the array to be shuffled.
/// Returns a shuffled copy of the input array.
let shuffle ( ratios : float array ) ( inArray : 'a array ) =
    let n = Array.length inArray
    if n = 0 then
        Array.empty
    else
        let copy = Array.zeroCreate<'a> n
        copy.[0] <- inArray.[0] // copy first element over
        for endIndex = 1 to n - 1 do
            let randIndex =
                ratios.[endIndex - 1]
                |> Ratio.scaleBetween 0 endIndex
            if randIndex <> endIndex then
                copy.[endIndex] <- copy.[randIndex] // move element at randIndex to end
            copy.[randIndex] <- inArray.[endIndex] // move next item to randIndex
        copy


Converting the percentage (aka ratio) to an index has a deceptive number of edge cases, but you can find my version of `Ratio.scaleBetween` here with inline explanation. The original shuffle algorithm looped from 0, but that wasted a percentage picking a number between 0 and 0 (duh). So I hand-code the first iteration instead.

For testing, you can use a static array of percentages. And every time you run the shuffle with the same percentages, you will get the same result. In this particular algorithm, an array of all 1.0 values will produce the same ordering as the original list.
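To make the repeatability concrete, here is a self-contained version of the test setup: the shuffle from above plus a simplified stand-in for `Ratio.scaleBetween` (the real one lives in the repo). The same ratio array always yields the same ordering, and all 1.0s reproduce the input order:

```fsharp
module Ratio =
    // Stand-in for the repo's Ratio.scaleBetween (simplified):
    // maps a ratio in [0.0, 1.0] to an index in [lo, hi], clamped at hi.
    let scaleBetween (lo : int) (hi : int) (ratio : float) =
        min (lo + int (ratio * float (hi - lo + 1))) hi

// Inside-out shuffle, as in the post above.
let shuffle (ratios : float array) (inArray : 'a array) =
    let n = Array.length inArray
    if n = 0 then Array.empty
    else
        let copy = Array.zeroCreate<'a> n
        copy.[0] <- inArray.[0]
        for endIndex = 1 to n - 1 do
            let randIndex = ratios.[endIndex - 1] |> Ratio.scaleBetween 0 endIndex
            if randIndex <> endIndex then
                copy.[endIndex] <- copy.[randIndex]
            copy.[randIndex] <- inArray.[endIndex]
        copy

// All 1.0s: every pick lands on the end position, so order is unchanged.
let unchanged = shuffle [| 1.0; 1.0; 1.0 |] [| "a"; "b"; "c"; "d" |]
// A fixed ratio array: same input, same output, every single run.
let repeatable = shuffle [| 0.0; 0.5; 0.0 |] [| "a"; "b"; "c"; "d" |]
```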

For generating the percentages at run-time, you can use the RNG of your choice, even a cryptographically strong one, and optionally store them for auditing. Here's some code I use (at the edge of my system where I allow side effects) to get randoms for shuffling.

let private fx_generateRandoms count =
    use rngCsp = new RNGCryptoServiceProvider()
    let buffer = new RandomBuffer( rngCsp.GetBytes, count )
    Array.init count ( fun _ -> buffer.GetRandomRatio() )


Pulling percentages from `RNGCryptoServiceProvider` also has some edge cases and optimizations, so I made the helper class `RandomBuffer`, conveniently on the same GitHub repo.