[002.2] Signal Handling

Interacting with your Sidekiq worker processes via UNIX signals.


Signal Handling [05.13.2016]

Signals are a POSIX mechanism for inter-process communication, and Sidekiq responds to several of them. They're all outlined in a link in the resources section, but we'll look at a few of them and how they can best be used in deployment.

Project

TTIN

Let's introduce a bug into our worker. We'll modify the easy job so that it sleeps forever, never to complete. This simulates the sort of thing you might want to look into if you write buggy code in a worker:

  def perform(complexity)
    case complexity
    when "super_hard"
      sleep 60
      puts "Really took quite a bit of effort"
    when "hard"
      sleep 10
      puts "That was a bit of work"
    else
      # Deliberate bug: this loop never exits, so the job never completes
      while true do
        sleep 1
        puts 'zomg bug'
      end
      puts "That wasn't a lot of effort" # never reached
    end
  end

OK, so now when we start a new easy job it will run indefinitely. In one tab:

bundle exec sidekiq -r ./worker.rb

In another:

bundle exec irb -r ./worker.rb

And we'll enqueue an easy job:

OurWorker.perform_async("easy")

Now let's kill the irb session and check out the workers. We can see the server started a job. We'll fire up the web admin to see what's happening in our Sidekiq installation:

rackup

We'll visit it at http://localhost:9292

OK, so we can see there's a busy worker, and it's been going for quite some time. We can also see the thread ID for this worker. It would be really nice if we could figure out precisely what Ruby code this worker is executing that's got it stuck.

Of course Sidekiq is awesome, so you can. Send a TTIN signal to it and it will spit out every worker thread's backtrace:

ps ax|grep sidekiq # find the server
kill -TTIN <server_pid> # send it a `TTIN` signal

Now the server log contains a backtrace of every worker thread. We can look through it for the thread ID that we identified as hung in the admin, and see what it's busy doing. Most of the threads will just show they're in select, which means they're waiting for work and there are no problems. Our busy thread will be different, though.


And just like we expect, it's on the line in worker.rb that's got the while loop in it. This is a pretty silly example, but you'll almost certainly eventually run into some stuck worker that you need to debug, and this is the proper way to go about figuring out what the problem is.
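
To make the mechanism a little less magical, here is a minimal sketch of how a Ruby process can react to TTIN by dumping a backtrace for each of its threads. It is a toy that illustrates the idea, not Sidekiq's actual code: the method name `dump_threads_on_ttin`, the self-pipe arrangement, and the thread ID format are all assumptions made up for this demo.

```ruby
# Toy demo of the idea behind TTIN: trap the signal, then collect a
# backtrace for every thread. Not Sidekiq's real implementation.
# `dump_threads_on_ttin` is a hypothetical name used only in this sketch.
def dump_threads_on_ttin
  # Self-pipe: the trap handler only writes a byte; the dump itself
  # happens afterwards, outside of trap context.
  reader, writer = IO.pipe
  previous = Signal.trap("TTIN") { writer.write_nonblock("T") }

  # A deliberately stuck "worker", like the buggy easy job.
  stuck = Thread.new { loop { sleep 1 } }
  sleep 0.05 until stuck.status == "sleep"

  # Same effect as running `kill -TTIN <pid>` from a shell.
  Process.kill("TTIN", Process.pid)
  IO.select([reader], nil, nil, 5) or raise "TTIN handler never fired"

  # Key each backtrace by a printable thread id (format is an assumption).
  dump = Thread.list.each_with_object({}) do |thread, result|
    result[thread.object_id.to_s(36)] = thread.backtrace || []
  end
  [dump, stuck.object_id.to_s(36)]
ensure
  # Clean up so the demo leaves no thread, handler, or pipe behind.
  stuck&.kill
  Signal.trap("TTIN", previous || "DEFAULT")
  reader&.close
  writer&.close
end

dump, stuck_id = dump_threads_on_ttin
dump.each do |tid, frames|
  marker = tid == stuck_id ? " (the stuck one)" : ""
  puts "Thread TID-#{tid}#{marker}"
  puts frames.join("\n")
end
```

The stuck thread's frames point at the `sleep` inside the loop, which is the same kind of clue we just used to find the hung job in worker.rb.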

USR1

You can also signal the server to stop accepting new jobs. This is useful if you're about to terminate the server and you'd like it to be done working when you terminate it. The most common case for this is deployment: you send a USR1 signal at the beginning of the deployment, and a TERM signal when you want the server to terminate.

Let's signal our server to stop doing work:

kill -USR1 <server_pid>

If we watch the log, we can see that it has stopped accepting new work. We can issue a new job and it won't be worked:

OurWorker.perform_async("hard")

USR2

If you send a USR2 signal, the server will reopen any logfiles that have been rotated using logrotate or something similar. If that matters to you, odds are pretty good you don't need to see it demonstrated! It's just useful to know.

TERM

When you send the server a TERM signal, Sidekiq will stop processing new jobs and terminate after a set amount of time, which is configured with the -t flag at startup. It defaults to 8 seconds. Let's terminate our server now:

kill -TERM <server_pid>

Pidfile

Sidekiq will output a pidfile at a location of your choosing if you provide a -P flag to it.

bundle exec sidekiq -r ./worker.rb -P ~/tmp/sidekiq.pid

Then we can just look at that file to see the Sidekiq pid:

cat ~/tmp/sidekiq.pid
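
If you'd rather script this from Ruby than from the shell, something like the following could work. It's only a sketch: `signal_from_pidfile` is a hypothetical helper, not something Sidekiq provides, and it assumes the pidfile contains nothing but the pid.

```ruby
# Hypothetical helper (not part of Sidekiq): read a pid from a pidfile
# and send that process a signal. Assumes the file holds only the pid.
def signal_from_pidfile(pidfile, signal)
  pid = Integer(File.read(File.expand_path(pidfile)).strip)
  Process.kill(signal, pid) # raises Errno::ESRCH if no such process exists
  pid
end

# Example deploy-style usage, matching the signals covered above:
#   signal_from_pidfile("~/tmp/sidekiq.pid", "USR1") # stop accepting new jobs
#   signal_from_pidfile("~/tmp/sidekiq.pid", "TERM") # begin shutdown
```

The helper returns the pid it signalled, which is handy for logging during a deploy.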

sidekiqctl

Sidekiq ships with a control executable called sidekiqctl for shutting it down.

$ sidekiqctl --help
sidekiqctl - stop a Sidekiq process from the command line.

Usage: sidekiqctl <command> <pidfile> <kill_timeout>
 where <command> is either 'quiet' or 'stop'
       <pidfile> is path to a pidfile
       <kill_timeout> is number of seconds to wait until Sidekiq exits
       (default: 10), after which Sidekiq will be KILL'd

Be sure to set the kill_timeout LONGER than Sidekiq's -t timeout.  If you want
to wait 60 seconds for jobs to finish, use `sidekiq -t 60` and `sidekiqctl stop
 path_to_pidfile 61`

This executable is more zealous than just sending a TERM signal. It will send the TERM signal, wait up to the specified timeout for jobs to finish, and then send a KILL signal (kill -9) to ensure the server is killed.

sidekiqctl stop ~/tmp/sidekiq.pid
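
The "TERM, wait, then KILL" sequence described above can be sketched in a few lines of Ruby. This is an illustration of the pattern under stated assumptions, not sidekiqctl's actual source: `stop_with_timeout` and `poll_interval` are hypothetical names, and polling with signal 0 is a simplification.

```ruby
# Sketch of a "TERM, wait, then KILL" shutdown, in the spirit of what
# `sidekiqctl stop` is described as doing. Not sidekiqctl's real code;
# `stop_with_timeout` and `poll_interval` are hypothetical names.
def stop_with_timeout(pid, kill_timeout = 10, poll_interval = 0.1)
  Process.kill("TERM", pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + kill_timeout

  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    begin
      Process.kill(0, pid) # signal 0 only checks that the pid still exists
    rescue Errno::ESRCH
      return :terminated   # exited on its own within the timeout
    end
    sleep poll_interval
  end

  Process.kill("KILL", pid) # equivalent to kill -9
  :killed
rescue Errno::ESRCH
  :terminated               # process was already gone
end
```

As with sidekiqctl itself, the timeout passed here should be longer than the server's own -t value, otherwise jobs that would have finished cleanly get killed.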

Summary

Today we had a brief overview of how Sidekiq handles various UNIX signals. They're vital to appropriate debugging and management of your Sidekiq server. See you soon!

Resources