The MorkiteMine Programming Challenge
Monday, 25 May 2020
tl;dr: I tried to design a programming challenge to be the perfect software developer hiring tool. It went okay.
I feel the need to preface this article with a few things:
I’ve written this article twice before already, with each previous version taking weeks to write and longer to read,
I had planned to write even more articles about MorkiteMine since I couldn’t cover everything I wanted to in those previous articles, despite their length, and
I have never, ever, before in my entire life, seen a group of people so utterly fucking bored as when I tried to explain this project to my former coworkers.
In an attempt to be less verbose and boring, and also to give myself some closure with this project, I am writing this article again.
Unfortunately, it will almost certainly still be very long and boring. Sorry.
What is this, ice?
This whole project happened because I read this blog post by Andrew Lee.
Andrew Lee is one of the founders of Firebase, which is a thing that eventually got bought by Google for well over a hundred dollars. And that article in particular is about how Firebase did hiring to grow from a wee babe into the multi-hundred-dollar business it is today.
While there are plenty of reasons to be interested in their hiring process, I became obsessed with one particular component of it: the GoldMine programming challenge.
As Andrew explains it, the GoldMine programming challenge is a problem developed at Firebase and given to candidates as a take-home test of sorts. He never actually says what GoldMine is concretely, but he does outline what makes it a superb tool for hiring:
the GoldMine challenge doesn’t require any specialized domain knowledge or technical proficiencies,
solutions to the challenge can be objectively scored, and
it is impossible for solutions to achieve an “optimal” score (the score is unbounded).
So, the GoldMine challenge actually functions very much like a competition, where candidates submit programs and those programs are objectively ranked relative to each other.
The challenge is fascinating to me, not just for its ostensibly immense practical value, but for the sheer technical difficulty of fully realizing the concept as a practical, usable hiring framework.
I wanted to know what something like that would actually look like. So, I made MorkiteMine.
I made MorkiteMine
I think this is the most boring part of the entire project, and also the part I could talk the most about, so let me just loosely summarize countless hours of work for you:
I invented MorkiteMine, a graph-based optimization problem.
It is a homebrewed mutation of the travelling salesman problem.
It is objectively scoreable by nature of being an optimization problem.
It realizes the “unbounded” requirement of GoldMine by nature of being NP-hard.1
- 1
Proved easily with a reduction from TSP.
It realizes the “approachability” requirement of GoldMine by being easily phrased in natural language.
So, that’s what MorkiteMine is, but what is it? The tl;dr is that you are given a graph, with “values” on the nodes and “costs” on the edges, and you want to find a route through the graph such that the amount of value visited is maximized while keeping the costs low. If you want to see the full, technical problem statement, you can find the challenge instructions here.2
- 2
Extremely boring; please don’t read.
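To make the setup concrete, here is a toy sketch in Python of what a mine and a naive solver might look like. The data layout, scoring rule, and names here are my own illustration of the description above, not the real challenge format (that lives in the challenge instructions):

```python
# Toy sketch of a mine and a naive solver. Illustrative only.

def neighbours(costs, node):
    """Adjacency lookup over a symmetric edge-cost dict."""
    for (a, b), c in costs.items():
        if a == node:
            yield b, c
        elif b == node:
            yield a, c

def score(values, costs, route):
    """Value of distinct nodes visited minus cost of edges walked."""
    visited = sum(values[n] for n in set(route))
    walked = sum(
        next(c for v, c in neighbours(costs, a) if v == b)
        for a, b in zip(route, route[1:])
    )
    return visited - walked

def greedy_route(values, costs, start):
    """Naive baseline: hop to the unvisited neighbour with the best
    value-to-cost ratio until no unvisited neighbour remains."""
    route, seen = [start], {start}
    while True:
        options = [(v, c) for v, c in neighbours(costs, route[-1]) if v not in seen]
        if not options:
            return route
        nxt, _ = max(options, key=lambda vc: values[vc[0]] / vc[1])
        route.append(nxt)
        seen.add(nxt)

values = {"a": 5, "b": 3, "c": 8}
costs = {("a", "b"): 2, ("b", "c"): 1, ("a", "c"): 4}
route = greedy_route(values, costs, "a")
print(route, score(values, costs, route))  # ['a', 'c', 'b'] 11
```

A greedy hop-by-hop route like this is exactly the kind of baseline a stronger submission should beat.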
Okay, so that’s what MorkiteMine is, but how do you actually use it to evaluate candidates?
I created the MorkiteMine orchestrator, a Django project for managing the challenge.
It has commands for creating problem instances (mines) according to various configurable parameters; a made-up sketch of what such a command might look like follows this list.
It can save solution binaries and compute their results for specific problem instances.
It supports some other nice things, like drawing colour-coded mines or visualizing solution scores.
I actually did an okay job of documenting it; see the MorkiteMine project README.
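Purely to give a flavour of the orchestrator, here is that sketch. Everything in it (the path, flags, and the absence of a persistence layer) is invented for illustration; the real commands are documented in the README:

```python
# morkitemine/management/commands/create_mine.py (hypothetical path)
import random

from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Generate a random mine (problem instance)."

    def add_arguments(self, parser):
        # Illustrative parameters; the real project defines its own.
        parser.add_argument("--nodes", type=int, default=50)
        parser.add_argument("--density", type=float, default=0.2)

    def handle(self, *args, **options):
        n, density = options["nodes"], options["density"]
        values = {i: random.randint(1, 100) for i in range(n)}
        edges = {
            (i, j): random.randint(1, 10)
            for i in range(n)
            for j in range(i + 1, n)
            if random.random() < density
        }
        # A real implementation would persist this via a model;
        # here we just report what was generated.
        self.stdout.write(f"mine: {n} nodes, {len(edges)} edges")
```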
There are still some big items on my wishlist for that project, like parallelizing results computation or animating solutions through mines. I got a bit sick of working on it though, since nobody will ever actually use it, so those things will probably just stay as open issues forever.
Overall, most of this stuff worked out great, and I could talk a lot about it. I think the most interesting things are the parts that didn’t work out, though.
Let’s consider them briefishly.
A Brief Hassle of Time
So, the MorkiteMine challenge is to find the best route you can through a cost-value graph. Moreover, it’s purposefully designed such that you can’t ever do it optimally,3 which means you are really just trying to find better routes as efficiently as you can with respect to run time. There are two major problematic consequences of this:
- 3
Technically it’s just NP-hard, which means either you can’t do it optimally4, or you’re cracking a lot of crypto.
- 4
Okay, actually you could happen to do it optimally by accident if you use randomness and get super-duper lucky, but you can’t do it deterministically and optimally.
The duration that a solution runs for is tightly coupled with how well it scores, and
languages that compile to better-optimized machine code have a substantial advantage over others.
The first consequence is less problematic but still not a trivial matter. After seriously (over)thinking this issue, I came to the conclusion that there are really only two options for us:
require submissions to return within some fixed time limit, disqualifying any that do not, or
require submissions to print solutions iteratively, taking the last output printed before the time limit as the one to grade.
The first option pushes all of the burden of tracking time onto the candidate – we simply time out their program if it runs too long. With this scheme, to really be competitive, a candidate must actually keep track of time in their program and return as late as possible.
The second option removes any burden of tracking time from the candidate. Instead, it is up to us to timestamp and manage all program outputs, and we can just SIGKILL the program when we are done with it (a sketch of such a harness follows below).
I settled on the second option because I want candidates to be able to focus on the task at hand (i.e. routing through graphs) and not on tracking time.5
- 5
It also has the added benefit of allowing us to better profile a program’s overall performance by sampling its output at various different times. Which is neat.
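Here is that sketch of grading under the second option: spawn the submission, timestamp every route it prints, and SIGKILL it when the budget is up. The command and budget below are placeholders, not the real orchestrator’s interface:

```python
import queue
import subprocess
import threading
import time


def grade_submission(cmd, budget_seconds):
    """Run a solution, timestamping each printed route, and kill it
    when the time budget expires. The last route printed before the
    deadline is the one that gets graded."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    routes = queue.Queue()

    def pump():
        # Timestamp every line the moment it arrives.
        for line in proc.stdout:
            routes.put((time.monotonic(), line.rstrip()))

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()
    time.sleep(budget_seconds)
    proc.kill()  # SIGKILL on POSIX; no cooperation needed from the solution
    reader.join()  # drain whatever was already buffered

    timestamped = []
    while not routes.empty():
        timestamped.append(routes.get())
    return timestamped  # grade the last in-budget entry; sample the rest
```

Because every route carries a timestamp, the same record supports the profiling mentioned in the footnote above.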
So, that addresses the first problematic consequence, but the second consequence is much more difficult to deal with. Recall that one of the top-level requirements for the MorkiteMine challenge is to not favour any particular technical proficiencies, which includes particular languages.
Let me just skip to the end: I was mostly stumped by this problem.
I resorted to classifying solutions into different “weight classes” according to the ostensible speed of the language used. At the time of writing, I have only two weight classes: “compiled” and “interpreted”. This works for now since I only have solutions in Python, C, C++ and C#, but I’m curious how other languages would fit in.6
- 6
If you want to help grow the challenge, please send me a solution! Just send me your source code along with any instructions for compiling. No hax plz.
Even though that works for now, it is a serious failure to realize the gold standard set by GoldMine.
The Gold Standard
Because I am a huge pain in the ass, I reached out to Andrew Lee in a professional manner7 and asked him how Firebase handled the two things I found most problematic.
- 7
I created a twitter account and DM’d him.
To my complete surprise, he actually answered me.
I will paraphrase what he said here for clarity and formatting reasons, but you can see a screenshot of his actual answers here.8
- 8
I think this is okay to post because Firebase has purportedly not used GoldMine since 2014.
On managing program run times:
Firebase picked an arbitrary run time limit (10 minutes, probably). They told candidates what this time limit was, and the really good solutions kept track of time and used all the time they could.
So, Firebase did actually expect candidates to keep track of time in their program. I still prefer the alternate method I ended up settling on for this, but it’s a really nice sanity check to see that Firebase did actually have to address this specifically, and that I wasn’t just over-thinking something simple.
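For contrast, here is a sketch of what candidate-side timekeeping looks like under Firebase’s scheme: an anytime loop that keeps improving its best answer and returns just under the announced limit. The helpers passed in are stand-ins for whatever search a candidate actually writes:

```python
import random
import time


def anytime_solve(initial, improve, score, budget_seconds=600, margin=5.0):
    """Keep improving the best-known solution, returning just before
    the announced time limit. This is the bookkeeping Firebase
    apparently expected good candidates to write themselves."""
    deadline = time.monotonic() + budget_seconds - margin
    best = initial()
    while time.monotonic() < deadline:
        candidate = improve(best)
        if score(candidate) > score(best):
            best = candidate
    return best


# Toy usage: random local search nudging a number toward 100.
best = anytime_solve(
    initial=lambda: 0.0,
    improve=lambda x: x + random.uniform(-1.0, 2.0),
    score=lambda x: -abs(100.0 - x),
    budget_seconds=2.0,
    margin=0.5,
)
print(best)
```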
On comparing programs written in different languages:
The key is to pick a problem where the good solutions are algorithmically superior to naive solutions. If a good solution is O(n) and a naive solution is O(n²), it shouldn’t matter what language you choose or what run time limit you set, because the better algorithm should dominate. Part of the “GoldMine challenge” is picking the right tool for the job in the first place, and most of the great GoldMine solutions are in higher-level languages.9 If you find that choosing C over Python is a big advantage, your problem is likely too simple.10
- 9
I think the implication here is that, generally, the right tool for the GoldMine programming challenge is a language sufficiently ergonomic to implement a more sophisticated, more efficient algorithm (in a reasonable amount of time).
- 10
Well, MorkiteMine is not simple, but it also does not directly reward algorithmic efficiency, which I think is what Andrew really means here.
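To see Andrew’s point in code, here is a toy benchmark (deliberately not MorkiteMine) where the naive approach is O(n²) and the good one is O(n). Past a modest n, no compiled-language constant factor rescues the naive version:

```python
import time


def naive_two_sum(nums, target):
    """O(n^2): check every pair."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return i, j
    return None


def linear_two_sum(nums, target):
    """O(n): one pass with a hash map."""
    seen = {}
    for i, x in enumerate(nums):
        if target - x in seen:
            return seen[target - x], i
        seen[x] = i
    return None


nums = list(range(5_000))
target = nums[-1] + nums[-2]  # worst case: the answer is found last

t0 = time.perf_counter()
naive_two_sum(nums, target)
t1 = time.perf_counter()
linear_two_sum(nums, target)
t2 = time.perf_counter()
print(f"naive: {t1 - t0:.3f}s  linear: {t2 - t1:.3f}s")
```

Even a generous 50× constant-factor speedup from rewriting the naive version in C gets wiped out by making n ten times bigger; the O(n) algorithm in Python keeps winning.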
So that’s the secret sauce: if your problem primarily rewards algorithmic efficiency, then you can be truly language agnostic!
MorkiteMine does not do that.
Last Words
Overall, MorkiteMine mostly realizes the strengths of GoldMine, but it still has three major problems:
it does not reward algorithmic efficiency sufficiently to be truly language agnostic,
it has so many rules that nobody ever wants to do it, and
I keep talking about it.
It’s not clear to me what kind of problem does reward algorithmic efficiency enough to be truly language agnostic, so that is the main thing I will need to figure out if I want to really improve on this concept.
Thanks for reading, and thank you to Andrew for taking the time to answer my questions.