A good friend of mine, Ian, recently forwarded me an internet joke. The headline was something like: “All credit card PIN numbers in the World leaked” The body of the message simply said 0000 0001 0002 0003 0004 … |
Ian’s messages made me chuckle. Then, later the same day, I read this XKCD cartoon. The merging of these two humorous topics created the seed for this article.
|
||
I love Randall’s work. My favorite, to date, is this one. I have a signed copy of it on my office wall. Like many of his creations, this cartoon is excellent at bifurcating readers; people read it, then either smile and chuckle, or stare blankly at it followed by a “Huh? I don’t get it!” comment. Then you explain it, and get a reply “Yeeaaaaaa…no, I still don’t get it!” Esoteric humor in action. You can be cool and buy his signed artwork too. |
||
|
There are 10,000 possible combinations that the digits 0-9 can be arranged to form a 4-digit pin code. Out of these ten thousand codes, which is the least commonly used? Which of these pin codes is the least predictable? Which of these pin codes is the most predictable? If you were given the task of trying to crack a random credit card by repeatedly trying PIN codes, what order should you try guessing to maximize your chances of selecting the correct number in the shortest time? |
If you had to make predication about what the least commonly used 4-digit PIN is, what would be your guess?
This tangentially relates to the XKCD cartoon. In Randall’s cartoon, the perpetrator’s plan backfired because his selected license plate was so unique that it was very memorable. What is the least memorable license plate? Ask any spy you know (snigger) what the best way to blend into a crowd is. Their answer will be not stand out, to appear “normal”, and not be notable in any way.
People are notoriously bad at generating random passwords. I hope this article will scare you into being a little more careful in how you select your next PIN number. Are you curious about what the least commonly used PIN number might be? How about the most popular? Read on … |
This article is not intended to be a hacker bible, or to be used as a utility, resource, or tool to help would-be thieves perform nefarious actions. I will only disclose data sufficient to make my points, and will try to avoid giving specific data outside of the obvious examples. I do not want to be an enabler for script-kiddies. Please do not email me asking for the database I used; if you do, you will be wasting your time as I’m not going to respond. I’m not going to sell, donate or release the source data – don’t ask!
Obviously, I don’t have access to a credit card PIN number database. Instead I’m going to use a proxy. I’m going to use data condensed from released/exposed/discovered password tables and security breaches.
Over the years, there have been numerous password table security breaches: Some very high profile, some low profile, but all embarrassing (and many exceedingly expensive; both in direct fines and indirect loss of business through erosion of trust and reputation). Fool me once, well, no, even that’s not really acceptable, but fool me twice … I’ll go even further: Any developer who stores the password table of their database in clear text should be so mortified by this lack of security that they should not be sleeping at night until they fix it. Ignoring the fact that you should never have ever coded it this way, you have an obligation to learn from these past breaches. If you work for a company and are knowledgeable that your customer database is “protected” by such lightweight security then run, don’t walk, to your CEO/Presidents office, pound on the door and insist (s)he puts out a mandate to fix the matter with extreme prejudice. Don’t leave until you get an affirmative response. Badger, badger then badger them again. Make yourself a proverbial thorn in their side. |
I’m not trying to sell my services as a consultant here (though if you are interested, my rates are very reasonable compared to the cost of legal defense, potential FTC sanctions, class action suits, shareholder backlash, fines, loss of reputation and business …) There are plenty of security experts in the industry who can help you (if you need help filtering them and don’t have referrals, someone who has CISSP qualifications is a good place to start).
Bottom line Security strengthens with layers, and the simple application of encryption on your database table can help protect your customer’s data if this table is exposed. It does not defend against all possible attacks, but it does nothing but good things. What possible reason is there store things in clear-text?
By combining the exposed password databases I’ve encountered, and filtering the results to just those rows that are exactly four digits long [0-9] the output is a database of all the four digit character combinations that people have used as their account passwords. |
Given that users have a free choice for their password, if users select a four digit password to their online account, it’s not a stretch to use this as a proxy for four digit PIN codes.
I was able to find almost 3.4 million four digit passwords. Every single one of the of the 10,000 combinations of digits from 0000 through to 9999 were represented in the dataset.
The most popular password is 1234 … |
… it’s staggering how popular this password appears to be. Utterly staggering at the lack of imagination … |
… nearly 11% of the 3.4 million passwords are 1234 !!! |
The next most popular 4-digit PIN in use is 1111 with over 6% of passwords being this. In third place is 0000 with almost 2%. A table of the top 20 found passwords in shown at the right. A staggering 26.83% of all passwords could be guessed by attempting these 20 combinations! (Statistically, with 10,000 possible combination, if passwords were uniformly randomly distributed, we would expect the these twenty passwords to account for just 0.2% of the total, not the 26.83% encountered) Looking more closely at the top few records, all the usual suspects are present 1111 2222 3333 … 9999 as well as 1212 and (snigger) 6969 . It’s not a surprise to see patterns like 1122 and 1313 occurring high up in the list, nor 4321 or 1010 . 2001 makes an appearance at #19. 1984 follows not far behind in position #26, and James Bond fans may be interested to know 0007 is found between the two of them in position #23 (another variant 0070 follows not much further behind at #28). |
|
The first “puzzling” password I encountered was 2580 in position #22. What is the significance of these digits? Why should so many people select this code to make it appear so high up the list?
Then I realized that 2580 is a straight down the middle of a telephone keypad! | ||
(Interestingly, this is very compelling evidence confirming the hypothesis that a 4-digit password list is a great proxy for a PIN number database. If you look at the numeric keypad on a PC-keyboard you’ll see that 2580 is slightly more awkward to type on the PC than a phone because the order of keys on a keyboard is the inverted. Cash machines and other terminals that take credit cards use a phone style numeric pads. It appears that many people have an easy to type/remember PIN number for their credit card and are re-using the same four digits for their online passwords, where the "straight down the middle" mnemonic no longer applies). |
(Another fascinating piece of trivia is that people seem to prefer even numbers over odd, and codes like 2468 occur higher than a odd number equivalent, such as 1357 ).
As noted above, the more popular password selections dominate the frequency tables. The most popular PIN code of 1234 is more popular than the lowest 4,200 codes combined!
That's right, you might be able to crack over 10% of all codes with one guess! Expanding this, you could get 20% by using just five numbers!
Below is a cumulative frequency graph:
Statistically, one third of all codes can be guessed by trying just 61 distinct combinations!
The 50% cumulative chance threshold is passed at just 426 codes (far less than the 5,000 that a random uniformly distribution would predict). Paranoid yet?
OK, we've investigated most frequently used PINS and found they tend to be predictable and easy to remember, let's turn for a second to the bottom of the pile. What are the least "interesting" (least used) PINS? In my dataset the answer is 8068 with just 25 occurrences in 3.4 million (this equates to 0.000744%, far, far fewer than random distribution would predict, and five orders of magnitude behind the most popular choice). To the right are the twenty least popular 4-digit passwords encountered.
|
|
Many of the high frequency PIN numbers can be interpreted as years, e.g. 1967 1956 1937 … It appears that many people use a year of birth (or possibly an anniversary) as their PIN. This will certainly help them remember their code, but it greatly increases its predictability.
Just look at the stats: Every single 19?? combination can be found in the top fifth of the dataset!
Below is a plot of this in graphical format. In this chart, each yellow line represents a PIN number that starts 19??
If all the passwords were uniformly distributed, there should be no significant difference between the frequency of occurrence of, for instance, 1972 and any other PIN ending in seventy two ??72 . However, as we shall see, this is not the case at all.
1972 occurs in ordinal position #76 (with a frequency 0.099363%). Here’s a histogram for the occurrences of all ??72 probabilities.
You can clearly see the spike at 1972 (with smaller spikes at 7272 and 1472 )
If you calculate the ratio of the peak of 1972 to the average of all the other ??72 PINS you get the ratio of 22:1
PINS starting with 19?? are much more likley to occur. Of course, it’s not just 1972. Here is plot of the ratio of 19 to non-19 for all hundred combinations. Along the x-axis are all the combinations of last two digits 帽X, and for each of these the ratio of the 19XX to average of all the other ??XX occurrences has been calculated. Here’s the chart:
It's a pretty good approximation for a demographic chart! (suggested by the red-dashed trend line) which would probably allow a fair estimation of the ages (years of birth) of the people using the various websites. (Of course, hackers invert this strategy and use the age of a target to try and give information to guess a user's PIN. Looking at this graph, this might give them up to a 40x advantage!)
Just about all the ratios are above 1.0. The noteable exceptions are ??34 and ??00 (which are easy to explain, since the massive popularity of 1234 and 0000 dwarf 1934 and 1900 respectively). Simiarly 33 44 55 66 … are lower than expected as the quad codes like 3333 mask out even the 1933 boost.
There are also spikes in the graph corresponding to the popular PINS of 1919 1984 and 1999