[261] GenStage By Way of Pairing

In which Theron Boerner and I paired on prototyping a Continuous Integration server workflow using GenStage to produce jobs for specific machines and consume them appropriately.

GenStage By Way of Pairing [08.11.2016]

I've been meaning to cover GenStage for quite a while and finally managed to. The format for this one is pretty different though. Read on.

Project

This is a session where Theron Boerner agreed to pair with me to prototype switching the rabbit-ci build queue system from RabbitMQ to GenStage. There were a few reasons that he wanted to do this:

  • He can eventually get rid of the RabbitMQ dependency.
  • Right now, once a job is enqueued, it can't be changed in any way.
  • Making the current RabbitMQ-based queueing system support specific machine types for specific build machines/jobs would be complicated.

Given this, he decided to try implementing it in GenStage, using Postgres to store the build queues. It didn't hurt that José had recently given a talk at the Elixir London meetup in which he walked through building a background job queue system with GenStage. The talk takes advantage of a new feature in PostgreSQL 9.5: the SKIP LOCKED option for row-locking queries, which lets a query skip over rows that another transaction has already locked instead of waiting for them.
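
As a rough illustration of the locking idea, here is a minimal sketch of a "claim some queued builds" query built around FOR UPDATE SKIP LOCKED. The module name, the table name (builds), and the columns (id, status) are all hypothetical and not taken from the talk or from rabbit-ci. The module only returns the SQL text, so how it gets executed is left to whatever database client you use.

```elixir
defmodule QueueSketch do
  # Hypothetical schema: a `builds` table with `id` and `status` columns.
  # $1 is the number of builds to claim (for example, the current demand).
  def claim_sql do
    """
    UPDATE builds
    SET status = 'running'
    WHERE id IN (
      SELECT id
      FROM builds
      WHERE status = 'queued'
      ORDER BY id
      LIMIT $1
      FOR UPDATE SKIP LOCKED
    )
    RETURNING id
    """
  end
end

IO.puts(QueueSketch.claim_sql())
```

Because locked rows are skipped rather than waited on, two workers running this query at the same time should each end up with a different set of builds.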

The code we ended up with is available on the hunterboerner/gen_stage_playground repo on GitHub.

It's a very lengthy pairing session - 2 hours and 20 minutes. With that in mind, here's a rough outline of various timings in the video:

  • 00:00:24 - We discuss the GitHub Issue where he wrote up a couple of candidates for us to work on in our pairing session. He discusses how the system currently works and what he hopes to gain from the GenStage implementation.
  • 00:02:30 - We start to look through GenStage's documentation to talk about what our routing should end up looking like. Ultimately we implemented something comically simple: just a few producers and a few consumers that subscribe to them directly.
  • 00:04:50 - Theron shows me where José's London talk covers the use of FOR UPDATE SKIP LOCKED in PostgreSQL.
  • 00:05:37 - We kick off the new project and install GenStage as a dependency.
  • 00:13:43 - I realize once again that I want to play with spacemacs.
  • 00:17:15 - We introduce Ecto.
  • 00:27:12 - We write our first test.
  • 00:46:20 - We start making our Producer produce builds rather than integers.
  • 00:54:10 - José sends Theron the code for his talk which makes our cribbing from it substantially simpler.
  • 01:01:00 - We realize that we have to upgrade our version of PostgreSQL on Theron's machine in order to move forward. This consumes the next 10 minutes.
  • 01:11:00 - We can finally run the tests with our code that should start marking jobs as running when the producer produces them.
  • 01:15:00 - I completely derail work briefly as my swiping to another desktop screws with his emacs setup. Wooooops.
  • 01:20:00 - Now that our test shows the producer marks builds as running after producing them, we start implementing the actual BuildConsumer that will run the builds and consequently mark them as complete. This leads to us talking through some GenStage design realizations, reading the docs, and in general trying to figure out how to tweak our demand so we don't consume a ton of builds at once on a single worker. We also gradually figure out how max_demand and min_demand work by watching the logs and being confused for a long time until we are no longer confused.
  • 01:49:00 - 29 minutes later, we finally start to have a decent internal model for how GenStage and demand work, and can finally begin to predict how a given tweak to the max_demand and min_demand options will affect the actual results. Once we get past the rest of this eureka moment, we start talking through design considerations for the BuildConsumer.
  • 01:54:00 - We notice that our returning statement isn't working exactly like we expected in our update. This ultimately is due to the default value in our schema, but it takes us a few minutes to work that out.
  • 01:59:30 - We start actually prototyping the 'work' that the BuildConsumer does, updating the status of the builds appropriately. This leads to a brief discussion of crash semantics inside our consumer.
  • 02:01:00 - Our BuildConsumer is doing what we want! Now we start adding more tests to simulate increasingly appropriate examples of an actual CI queue.
  • 02:10:00 - We model a producer for each build type (think Mac builds, Linux builds, etc.) and consumers that can handle those build types, and verify that our system can model this successfully.
  • 02:12:00 - We introduce Tasks with async/await to run builds concurrently.
  • 02:16:40 - Wrapping up
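
To make the demand discussion in the outline a bit more concrete, here is a minimal sketch of one producer per build type with a consumer subscribed directly to each. It is not the code from the session: the module bodies, the in-memory list standing in for the Postgres-backed queue, and the {:mac, 1} style build tuples are assumptions for illustration. It also assumes the gen_stage package can be fetched from Hex with Mix.install (Elixir 1.12 or later). The part worth noticing is max_demand: 1 with min_demand: 0, which keeps a consumer from taking more than one build at a time.

```elixir
# Assumption: network access to Hex so gen_stage can be installed.
Mix.install([{:gen_stage, "~> 1.0"}])

defmodule BuildProducer do
  use GenStage

  # `builds` is a fixed in-memory list standing in for a real queue.
  def start_link(builds), do: GenStage.start_link(__MODULE__, builds)

  @impl true
  def init(builds), do: {:producer, builds}

  # Emit at most `demand` builds. Demand that cannot be met is dropped,
  # which is fine here only because the list never grows.
  @impl true
  def handle_demand(demand, builds) when demand > 0 do
    {events, rest} = Enum.split(builds, demand)
    {:noreply, events, rest}
  end
end

defmodule BuildConsumer do
  use GenStage

  def start_link(args), do: GenStage.start_link(__MODULE__, args)

  @impl true
  def init({producer, report_to}) do
    # Ask for one build at a time, and only ask again once it is handled.
    {:consumer, report_to, subscribe_to: [{producer, max_demand: 1, min_demand: 0}]}
  end

  @impl true
  def handle_events(builds, _from, report_to) do
    # "Running" a build is just reporting the batch back to the caller.
    send(report_to, {:ran, builds})
    {:noreply, [], report_to}
  end
end

# One producer per build type, each with its own consumer.
{:ok, mac} = BuildProducer.start_link([{:mac, 1}, {:mac, 2}])
{:ok, linux} = BuildProducer.start_link([{:linux, 1}])
{:ok, _} = BuildConsumer.start_link({mac, self()})
{:ok, _} = BuildConsumer.start_link({linux, self()})

batches =
  for _ <- 1..3 do
    receive do
      {:ran, builds} -> builds
    after
      1_000 -> raise "no batch received"
    end
  end

IO.inspect(batches, label: "batches")
```

Raising max_demand in the subscription and watching how the batch sizes change is a quick way to build the same intuition we spent half an hour fumbling toward in the video.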

Summary

So yeah... This is different from anything else I've released, but I think it's useful. Please email me and let me know if you liked this episode or if you hated it. I know it's a massive departure from the normal format, and I don't intend to switch to doing two of these a week or anything, but if there's value and people like them I'm completely willing to do them occasionally because it's a fantastic experience for me. Also, I think there needed to be some public discussion of building something slightly non-trivial with GenStage, and there hadn't really been enough of that yet, so it seemed like a useful thing to get out there.

A hybrid approach might be doing this sort of thing and then releasing the pairing session as one episode in a week, and a succinct version of the important meat we learned from it as the second episode. I'm all ears: send me an email or leave a comment and let me know what you want to see!

See you soon!