On his Security Now podcast, Steve Gibson recently disagreed (transcript; search for “xkcd”) with the way Randall Munroe calculated the number of bits of entropy of passwords in this excellent XKCD comic:

The disagreement boils down to Randall giving a *lower* bound on the entropy, and Steve giving an *upper* bound (and not realizing—or neglecting to mention—the difference). Randall is right, and Steve is wrong. Let me explain:

# Lower bound

In both passwords featuring in the comic, Randall assumes a particular password template—in effect, an automatic procedure where you input some number of random bits and get a password. An example, similar to (but simpler than) the first of the comic’s password templates: Pick a six-letter dictionary word (in lowercase), capitalize one of the letters, and stick a number at the end. The number of possible passwords generated by this template is the number of words in the dictionary times six times ten; the number of bits of entropy is the base-2 logarithm of this number.

This is a **lower bound** of the entropy in the passwords generated by this template, because we assume that the adversary knows everything—in particular, knows which template the password was made from—*except* the random numbers we used to pick the particular dictionary word, the letter to capitalize, and the last character. Any particular adversary may know less, but no attacker will know more. The lower bound on the entropy describes how many guesses even a maximally well-informed adversary must make.

Of course, there are caveats: In particular, where the template asks for randomness (such as in “pick a dictionary word”), you are expected to use high-quality randomness such as produced by coin flips, dice, or good computer programs. Just trying to *think* of a random word won’t cut it; you’ll be more likely to think of some words than others, and thus get less entropy than you thought.

# Upper bound

An **upper bound** on the entropy of a given password can be obtained by finding a simple template compatible with the password (i.e., such that the password could have been generated from that template) and then counting how many bits of randomness the template uses to produce the password. For example, given the password “zoMbie8”, one candidate template is “pick seven characters that are either lower- or uppercase letters or numbers”; this gives an upper bound of (26+26+10)^7 = 3,521,614,606,208 possible passwords, or about 42 bits.

Another candidate template is “pick a six-letter dictionary word (in lowercase), capitalize any one letter, and add a digit at the end”. The dictionary on my computer has 9300 six-letter words, times six possible positions for the capitalized letter, times ten possible digits at the end, gives 558,000 possible passwords, or about 19 bits.

Now, we can all agree that 19 bits is quite a bit less than 42—meaning that the former is a tighter bound. But how low can we go? Given a specific enough template, the entropy becomes arbitrarily small; consider e.g. the template “take the word ‘zombie8’ and choose at random whether to capitalize the third letter”, which has just two possible passwords, for a single bit of entropy. How can we meaningfully speak of an “upper bound”, if we can make it as small as we want? (Specifically, consider the case where the password was in fact generated by the second template, and the *lower* bound is 19 bits. If our upper bound can be lower than the lower bound, we have a problem.)

The catch is the provision that the template be “simple”, as I sneakily specified in the first sentence describing the upper bound. The precise meaning of “simple” is “describable with few bits”, because strictly speaking, the upper bound is the sum of the number of bits needed to describe the template and the number of random bits used by the template. (This what the parenthesis in the first panel of the comic is talking about.) As long as the latter is much larger than the former we don’t make much of an error by omitting it, but as we make the template more and more specific, the error grows.

A useful rule of thumb for evaluating the simpleness of a template is to consider how many other templates just like it there are. The first template, with seven random characters, has very few variants (you could choose to include other characters than letters and digits or change the length, but that’s about it). The second template has more, but still not many (you could e.g. change the number of capitalized characters, and change the position of the digit). The third template obviously has a great many very similar variants—just choose another word or another digit!

# So, which one of them is the password strength?

Which number do you go by when choosing a password? A lower bound, which describes how strong your password is guaranteed to be, or an upper bound, which describes how strong it isn’t?

Which number is more useful printed on elevators? The number of people they’re guaranteed to hold, or the number of people guaranteed to be enough to make the cable snap?

# Password strength meters

So, the lower bound on the entropy is the useful number when creating a pasword. But which number do you get when testing your password in a password strength meter (such as this one)? An *upper* bound! There’s a simple reason for this: when calculating the lower bound, you need to know what template was used; just seeing one particular password produced by the template isn’t enough.

The password strength meter tries a number of templates. If one of them is close enough to what was actually used, its entropy estimate can be quite good; but if not, it can be way, way off (and it’s always an overestimate, never an underestimate).

Password strength meters are useful for identifying weak passwords, but they can’t guarantee that a given password isn’t weak. That can only be done by generating the password from a template that uses enough random bits.

And that’s why Randall is right and Steve is wrong—Randall gets his numbers from the amount of random bits poured into the password template to generate the password (lower bounds), while Steve gets his numbers by feeding Randall’s passwords to a password strength meter (upper bounds). It’s not that Steve isn’t doing his math right, it’s that he isn’t doing the right math.

# How hard is it to crack?

The whole point of a password is that it shouldn’t be guessable, so in a very real sense, the strength of a password is defined by how hard it is to crack.

Password cracking programs work essentially the same way as password strength meters, but backwards: using a succession of templates, they try every combination of random bits that can be fed to that template and see if the generated password is the right one. They start by using slightly complex templates (such as ones based on a random dictionary word), and if those fail, fall back to simpler templates (based on trying every possible password of some bounded length). If the template actually used to generate the password is close to one of the templates the cracker tries, the number of tries needed won’t be all that much more than what’s guaranteed by the lower bound on the password’s entropy. But if the actual template isn’t close to any of the ones the cracker tries, it needs many more tries.

Consider for example the password “zoMbie8”; if the cracker tries templates based on a single dictionary word with a few simple tweaks, it should be able to find it in only millions of tries, but if it has to resort to trying raw combinations of letters and digits, the tries will number in the millions of millions.

This means that there are two ways to make a secure password: use a template the password crackers don’t know about (or don’t bother to try, because so few people use it for their passwords), or use any old template and feed it with enough random bits. The former strategy relies on outwitting smart people who spend much of their time coming up with better ways to crack passwords; the latter just takes more coin flips. It’s security by obscurity vs. real security.

(Excercise for the reader: What’s the problem with Steve’s Password Haystack scheme?)

# Postscript: The *actual* entropy

I’ve talked a lot about lower and upper bounds on a password’s entropy, so you might be wondering the obvious: What’s the *actual* entropy, and why aren’t I talking about it?

The entropy of a string of characters (which is just what a password is) is closely related to its Kolmogorov complexity—in a sentence, the entropy of your password is equal to the length of the shortest computer program that generates it. However, this definition is useless to us, since it can be shown that Kolmogorov complexity isn’t computable.

A password template, being essentially just a string, has its entropy computed the same way, which is why I was just waving my hands when discussing it earlier. Approximating it by zero as we do gives lower bounds that are lower than strictly necessary, and upper bounds that are lower than they should be and thus run the risk of not being correct (but nothing much in the analysis depends on them being correct).

You say that the lower bound is when the attacker knows everything there is to know about the method you used to generate your password

Say the system is that you can pick a 6 letter word from a dictionary of 100, and append a single diget to the end to create a simpe 7 digit password.

The lower bound for the entropy of the system by your count is 100*10, where the attacker knows everything there is to know about the creation of the password.

I would contend that if the attacker knew everything about the creation of your password, the entropy would in fact be 1. (100*10 / the fact that he knows your password becasue he knows EVERYTHING about the creation of it.)

Your argument is surly completely invalid, since you are assuming that the most the attacker can know about a password is the system used to create it, when it is in fact the password itself.

As stated in the post, for the lower bound I’m assuming that the password creator uses (1) a password template, for example “pick a 6 letter word from a dictionary of 100, and append a single digit to the end”; and (2) as many random bits as necessary to make the template produce an actual password (in your example, we’d need log(100*10)/log(2)=9.97 bits).

The (realistic) assumption I make is that the attacker might correctly guess the template with just a small number of guesses, since a small number of templates are used by a large number of people; but cannot know what random bits you used, since if she does, e.g. by having installed a keyboard logger on your machine, password strength is irrelevant.

Your point—that the worst case is equivalent to the attacker already knowing your password—is of course valid, but hardly useful when trying to assess how much work it would be for a realistic attacker to crack a password.

Note the similarity with crypto, where we assume that the attacker has complete knowledge of which encryption algorithm etc. was used, but no knowledge about the 128 (or however many) bits we’ve used as key. An actual attacker might not know what encryption algorithm was used, but trying them all just adds a factor of perhaps 100 to the difficulty—about the same as you get by increasing the key length by 7 bits. So we might as well assume that the algorithm isn’t secret. Whereas if we can’t assume that the key is secret, the attacker has already won.

Ok, i don’t disagree with your maths, simply your definition of lower bound, and the fact that a lower bound of 1 is not useful information.

If we can agree in abstract terms that the more the attacker knows about the password the less combinations of bits he has to try to find it (do we agree here?), then i would say that it follows that the lower bound is 1, and the upper bound is defined by maximum password length and avalible symbols.

If the attacker knows the whole password bar 1 bit, then two guesses are required, 2 bits 4 and so on.

The worst case is that the attacker knows the whole password, but there are a lot of ‘less worse’ cases between that case and the one where all the attacker knows is possibly the template used.

I think you are suggesting that this (knowing the template used) is the most common level of knowlage an attacker has about a password. If this is true (in some context, say googlemail passwords) then i don’t disagree that this is a useful assumption, but that doesn’t make it the lower bound.

As for wether knowing the lower band is 1 being useful or not, i would say that it is somewhat useful. After all if i could patent a password system where even when the attacker knows my password the lower bound was > 1, i would very quickly become very rich … if i could implement it … unlikely :-)

i tried to read the article and got completely lost, i dont know much about password cracking, doesnt seem like this is a good article to start.

but one thing did strike my interest.

where are these “password” templates?

it seems most sites dont require you to use any sort of template other than 8<16 num+sym+letters.

The password template is essentially the parameters you set when creating the password. For example, if you always make passwords that are a dictionary word followed by two numbers, like apple12, blue22, or halo97, then that is what the template would be. That’s a common template and as such would probably be cracked fairly quickly by programs. Hope this helps answer your question.

I enjoyed this article, and I think that it is a fairly good analysis of the subject, except for your postscript. Entropy is inextricably linked to probability, and Kolmogorov complexity is not really relevant to the current discussion.

In order to compute the actual password entropy, you would need to know the underlying probability distribution from which the passwords are chosen. The actual entropy is equal to the negative log (base 2, if we are measuring entropy in bits) of the probability mass function at that particular password (since the probability distribution of passwords is discrete).

Of course, we don’t know (and can never know) the exact probability distribution from which passwords are chosen, but we can make a good approximation based on analysis of password leaks and common passwords. It is this analysis which gives the various common password templates people choose from, and since there are so few of these, we can assume that they only provide a few bits of entropy. This can be added to the entropy that the template itself takes as input, which, again, can only be estimated if they are chosen by a human. You address this pretty well in your article.

An attacker will be most efficient if they guess the passwords with the least entropy first. This follows directly from the definition of entropy: these passwords have the highest probability of being chosen by users from the underlying probability distribution. The answer to this, of course, is as you stated in your article: choose a template that allows you to insert a lot of entropy, such as the xkcd-style template (though, personally, I would use a larger dictionary and more words).

I’m not convinced that Shannon style entropy is appropriate for strings as short as passwords/phrases and even less convinced that 2^entropy equates to inverse guessability.

A systems approach should be taken for password guessability. Countermeasures such as bad try lockouts will foil a human on words as simple as zombie8.

So we must be concerned with computer cracking against stolen lists of passwords.

Given recent advances in GPU cracking tools whereby each word can be treated as a single token allows cracking to attempt a large number of tries against a large number of templates in a clock time that was unthinkable just a few years ago.

Word tokens provides a much larger search space than single character tokens, but not so big to be impossible.

Storing such lists properly and using high quality hashing and salting is absolutely necessary. Statistics I’ve seen recently on cracking is almost always in reference to poorly hashed and poorly salted lists.

I still believe in passphrases though, so long as it has enough (?) words along with special characters as delimiters.

[…] is why the oft-cited XKCD scheme for generating passwords — string together individual words like […]