Wednesday, December 12, 2007

What is a futex and why does Phi hate it?

A futex (short for "fast userspace mutex") is a low-level Linux mechanism for dealing with resource contention and synchronisation between (semi-)independent threads or processes. It is used to build locks and various other tools for managing threads.

What? Didn't understand a word of that? No big deal. It's just a coding concept that means hundreds of different user requests can all access the database at the same time without getting in each other's way.
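For the curious, here is roughly what that looks like in code. This is an illustrative Python sketch, not Entrecard's code: `handle_request` and `counter` are made-up names, and Python's `threading.Lock` stands in for the futex (on Linux, the native locks underneath it are typically built on futexes, so you never touch one directly). Eight threads each bump a shared counter 1,000 times, and the lock makes sure only one of them updates it at a time, so no increments get lost.

```python
import threading

# Illustrative only: `counter` and `handle_request` are hypothetical names.
counter = 0
counter_lock = threading.Lock()  # on Linux this typically rests on futexes underneath

def handle_request(n: int) -> None:
    """Pretend request handler that updates shared state n times."""
    global counter
    for _ in range(n):
        with counter_lock:  # only one thread at a time may run this block
            counter += 1

threads = [threading.Thread(target=handle_request, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 8000
```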

I hate it because, right now, I'm having real trouble with one on Entrecard. The problem is a situation called a "deadlock". Imagine the following (simplified) scenario:

Piece of code A uses Lock 1, then Lock 2
Piece of code B uses Lock 2, then Lock 1

This works fine almost all of the time. It only breaks when A and B run at nearly the same moment, so that A grabs Lock 1 and B grabs Lock 2 before either has grabbed its second lock. Then code A holds Lock 1 but can't get Lock 2, and code B holds Lock 2 but can't get Lock 1. Both sit there waiting for the other, and neither will release its lock. Immediate result: horrible nasty application hang.
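Here's a minimal Python sketch of exactly that scenario, assuming plain `threading.Lock` objects (this is not Entrecard's code; "code A" and "code B" are hypothetical stand-ins). Real bad timing is rare, so `deadlock_demo` uses two `threading.Barrier`s to force it on every run, and acquires the second lock with a timeout so it can report the deadlock instead of hanging forever. `ordered_demo` shows the usual cure: everyone takes the locks in the same order.

```python
import threading

# Illustrative sketch only: "code A" and "code B" are hypothetical stand-ins.

def deadlock_demo(timeout: float = 0.5) -> dict:
    """Code A takes Lock 1 then Lock 2; code B takes Lock 2 then Lock 1.

    Two barriers force the unlucky timing on every run. Returns a dict
    mapping "A"/"B" to whether that thread managed to get its second lock.
    """
    lock1, lock2 = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    results = {}

    def code(name, first, second):
        with first:
            both_hold_first.wait()  # each thread now holds the lock the other needs
            # A plain second.acquire() would block forever: that is the hang.
            # The timeout lets the demo report the deadlock instead.
            got = second.acquire(timeout=timeout)
            if got:
                second.release()
            results[name] = got
            both_tried_second.wait()  # keep holding `first` until both have tried

    threads = [
        threading.Thread(target=code, args=("A", lock1, lock2)),
        threading.Thread(target=code, args=("B", lock2, lock1)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def ordered_demo() -> dict:
    """The usual fix: both pieces of code take Lock 1 before Lock 2."""
    lock1, lock2 = threading.Lock(), threading.Lock()
    results = {}

    def code(name):
        with lock1:      # everyone takes Lock 1 first...
            with lock2:  # ...so nobody can hold Lock 2 while waiting for Lock 1
                results[name] = True

    threads = [threading.Thread(target=code, args=(n,)) for n in ("A", "B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(sorted(deadlock_demo().items()))  # [('A', False), ('B', False)] after ~0.5 s
print(sorted(ordered_demo().items()))   # [('A', True), ('B', True)]
```

Take the barriers out of `deadlock_demo` and it will usually succeed anyway, which is exactly why this kind of bug hides during development. Agreeing on one lock order is the standard fix; the hard part is finding the code that breaks the rule.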

The evil bit is that you'll never notice this scenario when you're developing your application, because a single user almost never triggers this kind of event. Nor are you likely to notice it when your application has a low volume of users, for the same reason. It appears out of nowhere just when you have a bunch of users, and just when you really really don't want things like horrible nasty application hangs.

I have one of these somewhere. The problem is not fixing it, the problem is finding it. Entrecard is a lot of code, with a very high level of abstraction, and the nature of a deadlock makes it very difficult to debug after it has happened in a complex environment. Worse, it happens completely at random, so I'm unable to narrow down where in the application it is happening. I'm confident I'll find it, but HOLY CRAP is it frustrating, especially when I have to drag myself out of bed at 3am because it has hung the app server and I need to restart it.
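Since the timing almost never lines up, one way to hunt for a bug like this is to stop waiting for the hang and check lock *order* instead: record which locks a thread already holds whenever it takes another, and complain the first time any pair shows up in both orders. That's the idea behind lock-order validators such as the Linux kernel's lockdep. The Python sketch below is a hypothetical, stripped-down version (`OrderChecker`, `OrderedLock` and `LockOrderViolation` are made-up names, not a library API), not anything Entrecard actually uses.

```python
import threading

# Hypothetical helper for illustration; not a real library API.

class LockOrderViolation(Exception):
    """Raised when two locks are taken in both orders somewhere in the program."""


class OrderChecker:
    """Remembers every 'held X, then took Y' pair and flags reversed pairs."""

    def __init__(self):
        self._seen = set()              # (earlier, later) lock-name pairs
        self._guard = threading.Lock()  # protects _seen across threads
        self._local = threading.local()

    def held(self):
        # Each thread keeps its own stack of currently held lock names.
        if not hasattr(self._local, "stack"):
            self._local.stack = []
        return self._local.stack

    def check(self, name):
        with self._guard:
            for h in self.held():
                if (name, h) in self._seen:
                    raise LockOrderViolation(
                        f"{h} -> {name} conflicts with earlier {name} -> {h}")
                self._seen.add((h, name))


class OrderedLock:
    """A threading.Lock wrapper that reports its acquisitions to a checker."""

    def __init__(self, name, checker):
        self.name = name
        self._lock = threading.Lock()
        self._checker = checker

    def __enter__(self):
        self._checker.check(self.name)  # complain before we can ever block
        self._lock.acquire()
        self._checker.held().append(self.name)
        return self

    def __exit__(self, *exc):
        self._checker.held().remove(self.name)
        self._lock.release()
        return False


checker = OrderChecker()
lock1 = OrderedLock("Lock 1", checker)
lock2 = OrderedLock("Lock 2", checker)

def code_a():
    with lock1:
        with lock2:
            pass

def code_b():
    with lock2:
        with lock1:
            pass

code_a()  # records "Lock 1 -> Lock 2"
try:
    code_b()  # reversed order: caught on a single thread, no unlucky timing needed
except LockOrderViolation as err:
    print(err)  # Lock 2 -> Lock 1 conflicts with earlier Lock 1 -> Lock 2
```

This only catches the simple two-lock case; real validators also look for longer cycles (A before B, B before C, C before A).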
