How accurate is this XKCD 936 comic?
How accurate is this XKCD comic from August 10, 2011?
I've always been an advocate of long rather than complex passwords, but most security people (at least the ones that I've talked to) are against me on that one. However, XKCD's analysis seems spot on to me. Am I missing something or is this armchair analysis sound?
I love xkcd 936 and agree with his basic point -- passphrases are great for adding entropy, but think he low balled the entropy on the first password.
Let's go through it: Random dictionary word. xkcd: 16 bits, Me: 16 bits. A random word from a dictionary with ~65000 words is lg(65000) ~ 16. Very reasonable Adding in capitalization. xkcd: 1 bit, Me: 0 bits (deal with in common subsitutions). 1 bits means its there or not there -- which seems very low for the complexity added by adding capitalization to a password -- generally when capitalization to a password its in a random place or I can think of other many possible capitalization schemes (capitalize everything, capitalize the last letter). I'm going to group this with common substitutions.
Common substitutions. xkcd: 3 bits, Me: 13 bits. Only 8 choices for leet speak substitutions, even when only sometimes used? I can think of on average ~2 ways to leet alter each letter (like a to @,4; b to 8,6; c to (,[, <) and add in the original or capitalizing the letter, then for a 9 letter password, I've increased the entropy by 18 (lg(4**9)=18), if I randomly choose each letter to leet alter or not. This is too high for this password, however. A more reasonable approach would be say I randomly chose say 4 letters to "common substitutions" to (one of two leet options, plus capitalization for three options for each letter being substituted). Then it gets to lg( nCr(9,4) 3*4) ~ 13 bits.
Adding two random characters '&3' to the end. xkcd: 8 bits, Me: 15 bits. I'm considering it as two characters to one of ~4 places (say before the word, one before and one after, both after, or both smack in the middle of the word). I'll also let non-special characters be in these two added letters. So assuming an 88 character dictionary (56 characters+10 digits, plus 32 symbols) you add lg(4 * 80**2) ~ 15 bits of entropy.
So I have the calculation as not being 16+2+2+8=28 bits of entropy, but being 16+13+15=44 bits very similar to his passphrase. I also don't think 3 days at 1000 guesses/sec is by any means "easy" to guess or the plausible attack mechanism. A remote web service should detect and slow down a user trying more than 10000 guesses in a day period for any specific user. Much more likely are other attacks (e.g., key loggers on public computers, a malicious admin at a service logging passwords and reusing them, eavesdropping if ssl not used, get access to the server somehow (e.g., SQL injection; break into server room; etc)).
I use a passphrase when its necessary -- e.g., strong encryption (and not 44-bits more like 80-bits -- typically 8 word diceware passphrase plus two or three modifications -- e.g., misspell a word or substitute a word for a non diceware word starting with the same two letters; E.g., if you had "yogurt" come up maybe substitute it for "yomama"). For websites (no money involved or security permissions), I don't care about trivial passwords are typically used. I do notice that for often used passwords, I'm much much better at typing passwords then I am at typing my passphrases (which get annoying when you have to re-key in a ~50 character sentence a few times). Also for passwords, I often prefer finding a random sentence (like a random song lyric -- to a song no one would associate with you that's not particularly meaningful) and come up with a password based on the words (like sometimes use first letter; last letter; or substitute a word for a symbol; etc). E.g. L^#g&B9y3r from "Load up on guns and bring your" from Smells Like Teen Spirit. TL;DR: Randall is right, if you (a) assume you can check passwords at 1000/s for three days without getting slowed down, and (b) can make many assumptions about the password: constructed based on a rare dictionary word, that is capitalized, has some leet substitutions for some commonly substituted letters, and has a symbol and number added at the end. If you only slightly generalize the allowed substitutions (like I did) and characters added at the end, you get a similar entropy to the passphrase.
In summary, both are probably secure for most purposes with a low threat level. You are much more vulnerable by other exploits, something Randall would likely agree with [538] [792]. In general having password requirements like having a upper/lower/symbol/number is good, as long as long high-entropy passphrases are also allowed. Force additional complexity for shorter passwords, but allow over ~20 characters to be all lower case. Some users will choose poor passphrases just as they choose poor passwords (like "this is fun" which is idiotically claimed to be ridiculously secure here or using their child's name or their favorite sports team). Requiring special characters may make it non-trivial to easily guess (say by a factor of 100-1000 -- changing a password from being 10 likely guesses to 10000 is very significant). Sure it won't prevent any bot on a weak web service that allows thousands of bad login attempts per second, but it forces an addition of a modest level of security which would hinder efforts at sites that limit bad logins or from the unsophisticated manually guessing the password. Sort of like how a standard 5-pin house lock is fundamentally insecure as anyone can learn to pick it in minutes (or break the glass window); however in practice locking your door is good as it provides some safety against the unsophisticated who don't have tools handy (and breaking windows comes with its own dangers of alerting others).