Friday, July 13, 2012

My Mysterious Double Cron

One of the websites I manage at work has a cron associated with it that ran every Monday morning at 1:00 AM to process outstanding invoices.
0 1 * * 1 php cron_script.php >> cron_script.log 2>@1
The script worked beautifully until June 18 when everyone who was charged was billed twice. I first thought my script must have a bug, but further inspection proved that wasn't the case[1].

The output log only had one entry for June 18, so I was inclined to blame it on network errors[2].

But most signs showed that the cron had run twice, at almost the exact same time. I was reluctant to accept this because cron is a stable library, but the evidence was overwhelming. The order of the double-charge pairs was perfectly sequential.
11223344
If both executions weren't at almost the exact same moment you would see something more mixed.
12132434
We could see that from the responses we get back from our credit card processing company and store in the database, information stored in the eCommerce software to show customers when they were charged, and history that our credit card processing company keeps.

So the cron ran twice but logged its activity once. After much debate, the only plausible solution we could divine was that the NTP update script also ran at 1:00 AM and caused the clock to drift backwards a few hundred milliseconds and repeat 1:00 AM. We could explain the log file having on entry by assuming the first execution obtained an exclusive lock on the file.

But then I did some research and found out that NTP almost never drifts the clock backwards (especially not for such small intervals as we were expecting) and redirecting output in the shell to a file does not lock the file.

Our theory was sufficiently busted, and now we have no possible explanations for both anomalies:
  • How did the cron run twice at the same time?
  • Why did the output only get written to the log file once?
Surely it must have been cosmic rays.

[1] Technically my script did have a bug in that it allowed people to be billed twice if the script ran twice at the same moment. That has since been fixed by applying an exclusive lock to the logfile. But I realize now that the fix is insufficient. I have a better idea.

[2] We have had other problems related to network errors between us and our credit card processing company. My inclination to blame the network wasn't random.

No comments: