A few years ago I designed a way to detect bit-flips in Firefox crash reports, and last year we deployed an actual memory tester that runs on user machines after the browser crashes. Today I was looking at the data that comes out of these tests, and now I'm 100% positive that the heuristic is sound and that a lot of the crashes we see come from users with bad memory or similarly flaky hardware. Here are a few numbers to give you an idea of how large the problem is. 🧵 1/5
Say there’s a jump instruction whose target address is read from RAM. A bit flips in that address, so a condition written as “if friend greet else kill” executes as “if friend rape else kill”. Absolutely anything can happen, and none of it is determined by flaws or errors in the program's design. A digital computer is a deterministic system (sometimes with intentional non-deterministic elements like analog-based RNGs); a bit flip is a non-deterministic, random change of its state.
In concrete terms: things break without reason. A perfect program with no bugs, if such a thing exists, will do random wrong things if bit flips occur. Clear enough?
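If it helps to see it, here's a tiny simulation of the same idea. It's only a sketch: real hardware flips a bit in a stored address or opcode, while this toy flips a bit in an index into a table of handlers. Every name in it (`HANDLERS`, `flip_bit`, `dispatch`) is made up for illustration and has nothing to do with Firefox's code or the memory tester.

```python
# Toy model of a corrupted jump target. All names are hypothetical.

def greet() -> str:
    return "greet"

def warn() -> str:
    return "warn"

def ignore() -> str:
    return "ignore"

def shut_down() -> str:
    return "shut down"

# Stand-in for a jump table: the stored index selects which code runs.
HANDLERS = [greet, warn, ignore, shut_down]

def flip_bit(value: int, bit: int) -> int:
    """Return value with a single bit inverted, as a faulty RAM cell might."""
    return value ^ (1 << bit)

def dispatch(index: int) -> str:
    """Call whatever handler the (possibly corrupted) index points at."""
    return HANDLERS[index]()

stored_index = 0                        # the program means to call greet()
corrupted = flip_bit(stored_index, 1)   # 0b00 becomes 0b10

print(dispatch(stored_index))  # greet
print(dispatch(corrupted))     # ignore
```

Nothing in `dispatch` is buggy. The wrong handler runs only because the data it trusted changed underneath it. With a bigger flip (say bit 5) the index would fall outside the table entirely and the program would simply crash, which is the kind of failure that ends up in a crash report.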