Not so random: the perils of seeding cryptographic operations

It’s worth looking at how random numbers are generated when assessing device and systems security.
18 July 2022

Poorly generated random numbers can undermine security algorithms and protocols. Image credit: Shutterstock.

If you’re asked to think of a number, chances are that you will be drawn to certain digits – your house number, your birthday, the car number of your favorite Formula One driver; the list goes on. Human beings make poor random number generators for this very reason – our selection is weighted according to our preferences and the output will be statistically skewed. Poorly selected random numbers present a problem when seeding cryptographic operations – in other words, supplying the starting values that help to turn plaintext into an encoded (and ideally unreadable) form known as ciphertext. If a bad actor can guess the seed, or narrow down the number of attempts needed to find it, then they’re in a position to reverse engineer the cipher and peek at the original message. What you need in those circumstances is a secure enclave.

On the web, encryption schemes help to protect data in transit. They also help to generate so-called digital certificates that prove that the website, device or software that you are interacting with is trustworthy. By spoofing this information, a third party could make themselves appear authentic and connect to services that should be out of bounds. To make this difficult for the bad guys, device makers strive to keep their users safe by making use of electronic random number generators. These can be standalone or, as is becoming more common, part of a dedicated security subsystem. That dedicated subsystem is a secure enclave – and it’s a strategy being used by Apple and other major product makers.

Reluctantly random

Handing the job to hardware neatly sidesteps the bias that we discussed at the top of the article – or at least it should. But there are complications to consider. Silicon chips, the building blocks of modern electronic devices, are generally designed to be predictable. Customers want them to perform the same logical operations, reliably, time after time without displaying any random behavior. And, as Mads Haahr – a professor at Trinity College Dublin’s School of Computer Science and Statistics, and creator of random.org – notes, getting a computer to do something by chance is difficult. The solutions employed fall into two camps – pseudo-random number generators (PRNGs) and true random number generators (TRNGs).

PRNGs rely on mathematical formulae, which have some exciting-sounding names such as the Mersenne Twister (developed in the late 1990s by researchers in Japan). Processors either make use of the algorithm directly or have access to tables of pre-calculated values. But there is a catch. To maximize security, we want the probability of selecting our cryptographic seed to be equal across all possible values. PRNGs can get close, and form the basis of random number generation in many programming languages, but ultimately, they are an approximation: their output is fully determined by the seed, so the same starting value always produces the same sequence. That can open the door to an attack.
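The determinism of a PRNG is easy to see in practice. The short Python sketch below relies on the fact that CPython's `random` module is built on the Mersenne Twister; the helper name `mt_stream` and the seed value 1234 are made up for illustration. It shows that two parties using the same seed obtain identical "random" values, and contrasts this with the standard library's `secrets` module, which draws from the operating system's cryptographically secure generator.

```python
import random
import secrets
from typing import List


def mt_stream(seed: int, count: int = 5) -> List[int]:
    """Return `count` 32-bit outputs from a Mersenne Twister seeded with `seed`."""
    rng = random.Random(seed)  # CPython's random module uses the Mersenne Twister
    return [rng.getrandbits(32) for _ in range(count)]


# Same seed, same sequence: anyone who learns or guesses the seed
# can replay every value the victim generated.
victim = mt_stream(seed=1234)
attacker = mt_stream(seed=1234)
print(victim == attacker)  # True

# For keys and tokens, `secrets` is the intended tool; it is not
# seeded by the caller and cannot be replayed this way.
token = secrets.token_hex(16)  # 16 random bytes, hex-encoded
print(len(token))  # 32
```

The point is not that the Mersenne Twister is a bad algorithm for simulations or games, only that a generator whose whole output follows from a guessable seed is the wrong tool for secrets.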

TRNGs, on the other hand, as the name gives away, aim to provide true random numbers that defy recovery – for example, through a pre-computed list of starting points. To do this, the hardware will generate its entropy (or source of randomness) by probing some kind of physical phenomenon. Haahr’s random.org site uses atmospheric noise. Other candidates are unpredictable thermal or electromagnetic sources. Building these elements into the cryptographic workflow scrambles undesired attempts to find patterns, or statistical clues, in the data.

Speed bump

At first sight, given the upgrade that TRNGs provide, it seems like a no-brainer to use them for all operations, but their output – harvested from physical processes taking place around them – can take time to produce. Also, querying the hardware itself may produce a bottleneck and stymie multi-threaded operations if requests are forced into a queue to receive their results. So, in practice, a pool of true random bits will likely be combined with a pseudo-random algorithm to generate a much larger supply of values, now bolstered with cryptographic resilience.
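The idea of stretching a small pool of true random bits into a larger supply can be sketched as follows. This is an illustration only, not a vetted design: the function `expand` is hypothetical, it uses HMAC-SHA256 in a simple counter mode, and `os.urandom` stands in for bits harvested from hardware. Standardized constructions (such as those in NIST SP 800-90A) add reseeding and state-update steps that are omitted here, and real applications should rely on the operating system's generator instead of rolling their own.

```python
import hashlib
import hmac
import os


def expand(seed: bytes, n_bytes: int) -> bytes:
    """Stretch `seed` into `n_bytes` of output using HMAC-SHA256 in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < n_bytes:
        # Each block is keyed by the secret seed and a distinct counter value.
        block = hmac.new(seed, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:n_bytes])


seed = os.urandom(32)        # stand-in for a small pool of true random bits
stream = expand(seed, 1024)  # a much larger supply derived from that pool
print(len(stream))  # 1024
```

The expensive, slow step (gathering 32 bytes of entropy) happens once, while the cheap, fast step (hashing) can be repeated as often as needed. The output is only as unpredictable as the seed, which is why the quality of the initial pool matters so much.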

And it’s important to add that the story isn’t over once you’ve perfected the seed generation. Implementation is a massively important part of the security process too. Common pitfalls here include requesting a random number before the entropy pool has had the chance to become sufficiently large – for example, by calling the process very soon after the device has booted.

For accounts created just after a server reset, this could manifest as user IDs that are much easier to guess than developers intended. More dangerous still is involving times and dates in the seeding process, which may sound like a reasonable thing to do on paper, but can turn out to be a security failure in practice.

Guessable passwords

A good example can be found in Charlie Miller and Chris Valasek’s write-up of their 2015 Jeep Cherokee hack [PDF], where the duo noticed that the vehicle’s WiFi password was autogenerated based on the epoch time. The security researchers already knew the year of the vehicle from the registration details and could make some educated guesses as to the month and time of day when it had been first switched on (setting the WiFi password). But things became even simpler for Miller and Valasek as they realized what the manufacturer may have overlooked.

“When the head unit starts up the very first time, it doesn’t know what time it is,” reported the duo. “It has yet to get any signals from GPS or cellular connections.” Under these circumstances the unit defaulted to 00:00:00 Jan 1 2013 GMT, dramatically shrinking the number of possibilities for the autogenerated password and rendering it insecure.
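To see why a time-derived password is so fragile, consider the sketch below. It does not reproduce the algorithm used in the vehicle; `password_from_time` is a hypothetical generator that hashes the epoch time into twelve lowercase letters, and the 32-second offset and 60-second search window are arbitrary assumptions. What it demonstrates is that, once the attacker knows roughly when the password was created, the search shrinks from an astronomically large space to a handful of candidate timestamps.

```python
import hashlib
import string
from datetime import datetime, timezone
from typing import Optional

ALPHABET = string.ascii_lowercase


def password_from_time(epoch_seconds: int, length: int = 12) -> str:
    """Hypothetical generator: derives a password purely from a timestamp."""
    digest = hashlib.sha256(str(epoch_seconds).encode()).digest()
    return "".join(ALPHABET[b % 26] for b in digest[:length])


def recover_time(password: str, start: int, window: int) -> Optional[int]:
    """Try every second in [start, start + window) until the password matches."""
    for t in range(start, start + window):
        if password_from_time(t) == password:
            return t
    return None


# The default clock value reported for a head unit that has never synced.
default_clock = int(datetime(2013, 1, 1, tzinfo=timezone.utc).timestamp())

# Assume the password was set a few seconds after first boot.
target = password_from_time(default_clock + 32)

# The attacker only needs to test the first minute after the default time.
found = recover_time(target, default_clock, window=60)
print(found - default_clock)  # 32
```

A twelve-letter lowercase password looks as if it offers 26^12 possibilities, but here no more than 60 guesses are needed. Even without the default-clock shortcut, a full year contains only about 31.5 million seconds, which is a trivial search for a modern machine.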