Defending Your Network from RockYou2021 

(Last updated on August 16, 2021)

In June 2021, a large data dump was posted to a popular internet hacking forum. This dataset was termed “rockyou2021,” named after the popular password brute-force wordlist known as Rockyou.txt.  

Media and Twitter alike were abuzz with what to do about RockYou2021. You would not be alone if you were wondering if or how you should protect your network from RockYou2021. We asked our research team to do a deep dive on the dataset, the results of which we’re sharing in this post. And while some on Twitter were advising this dataset was full of junk data that didn’t need any action, our team’s verdict wasn’t quite the same. 

What’s in RockYou2021? 

This dataset was intended to assist brute-force attacks on password hashes, with the goal of finding a password in the wordlist that logs into the service or system the hash protects. It was described as a combination of “COMB” (Compilation of Many Breaches), wordlists generated from Wikipedia, and other sources.  

Since this dataset is a wordlist, rather than a dump of existing credentials from existing sources (COMB records aside), no usernames are paired with these records. The dataset is simply a list of candidate passwords for brute-force or cracking attempts.

Our team’s analysis of RockYou2021 

Our team analyzed the rockyou2021 wordlist using standard text-manipulation tools. The records were randomly shuffled, split into subsets, and then processed to calculate the statistics below. 

For the purposes of the analysis (generating password statistics around the records in question), a subset of ~200m records was taken from the complete dataset of ~8.5b records, giving a sample of ~2.4%. 
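For readers who want to draw a comparable sample themselves, here is a minimal sketch of one way to do it. It uses reservoir sampling, which is our substitution for illustration and not necessarily what the research team did (they describe shuffling and splitting with standard text tools). The function name, the `seed` parameter, and the constant are hypothetical.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(records: Iterable[str], k: int,
                     seed: Optional[int] = None) -> List[str]:
    """Return k records chosen uniformly at random in a single pass.

    Hypothetical helper: it never holds more than k records in memory,
    which matters for a wordlist with billions of lines.
    """
    rng = random.Random(seed)
    sample: List[str] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = record
    return sample


# Sample fraction quoted in the post: ~200m out of ~8.5b records.
SAMPLE_FRACTION = 200_000_000 / 8_500_000_000
```

In practice the `records` argument could be an open file handle, so the full wordlist never has to fit in memory.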

For demonstration purposes, the following are a sample of records taken from the complete dataset: 


The distribution of popular common passwords as ‘base-words’ (words used in combination with letters/numbers/punctuation, or modified via casing or ‘leet-speak’) appears as follows: 

Note that the large count of passwords based on ‘123456’ is due to the many variations of number-based passwords. A lot of common passwords are based on number patterns, and those would be considered a superset of this pattern; for example, 123456789 would have ‘123456’ as a ‘base-word’ of the password. Because of that, these groupings can be partially misleading when considering passwords consisting of only numbers. One should strive, when generating a password policy, to prevent passwords and passphrases consisting of only numbers, or only letters, encouraging sufficient entropy via other characters and randomness. A similar pattern is seen with the reliance on ‘qwerty’ as a base-word in the dataset; weak passwords trend towards “keyboard-walking” patterns, since many users find them easy to remember, increasing their frequency in leaks and similar datasets.
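To make the idea of a ‘base-word’ concrete, here is a rough sketch of how records might be grouped by base-word. The leet-speak mapping, the short base-word list, and the function names are all assumptions made for illustration; the team’s actual tooling is not described in this post.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence

# Hypothetical leet-speak substitutions; real tooling may use a larger map.
LEET = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a",
                      "5": "s", "7": "t", "@": "a", "$": "s", "!": "i"})

# Illustrative base-words only, not the list used in the analysis.
BASE_WORDS = ("password", "123456", "qwerty")


def find_base_word(record: str,
                   base_words: Sequence[str] = BASE_WORDS) -> Optional[str]:
    """Return the first base-word found inside the record, if any."""
    lowered = record.lower()
    deleeted = lowered.translate(LEET)
    for word in base_words:
        # Check the raw form too, so numeric base-words such as
        # '123456' are not destroyed by the leet-speak translation.
        if word in lowered or word in deleeted:
            return word
    return None


def base_word_counts(records: Iterable[str]) -> Counter:
    """Count how many records are built on each base-word."""
    matches = (find_base_word(r) for r in records)
    return Counter(w for w in matches if w is not None)
```

Note how `123456789` is counted under ‘123456’ here, which is exactly the superset effect described above.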

An analysis of the lengths of records displays a similar pattern: permutations on common passwords (a part of generating a good wordlist) result in passwords that trend towards higher length. A password permuted from a keyboard walk such as ‘qwerty’ will be longer than the base-word itself.

The dataset trends towards longer passwords, necessitating the enforcement of either longer, harder-to-remember passwords to avoid collisions with the wordlist or, optimally, the use of passphrases.  
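A length breakdown like this is simple to reproduce on any wordlist sample. The sketch below is a minimal illustration; the function names and the `limit` parameter are our own and not taken from the team’s tooling.

```python
from collections import Counter
from typing import Iterable, List


def length_histogram(records: Iterable[str]) -> Counter:
    """Map each record length to the number of records with that length."""
    return Counter(len(record) for record in records)


def share_shorter_than(records: List[str], limit: int) -> float:
    """Fraction of records with fewer than `limit` characters."""
    if not records:
        return 0.0
    shorter = sum(1 for record in records if len(record) < limit)
    return shorter / len(records)
```

Running `share_shorter_than` with different limits shows how much of a wordlist a given minimum length would exclude on its own.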

Our team also took a look at the complexity of the RockYou2021 records. Below you can find the breakdown of how many records fall into different complexity types as well as some examples. 

Complexity Type: Lowercase letters and numbers (loweralphanum) 
RockYou2021 record count: 34,296,199 (34.06%) 
Examples: sta8342, residerais6 

Complexity Type: Lower and uppercase letters with numbers (mixedalphanum) 
RockYou2021 record count: 20,526,308 (20.38%) 
Examples: BEllow2588, peDiortho95 

Complexity Type: Lowercase letters only (loweralpha) 
RockYou2021 record count: 15,398,980 (15.29%) 
Examples: nadajuez, namchaithailand 

Complexity Type: Lowercase letters and special characters (loweralphaspecial) 
RockYou2021 record count: 5,394,563 (5.36%) 
Examples: pimbava-os, @mb@|it 

Complexity Type: Lower and uppercase letters and special characters (mixedalphaspecial) 
RockYou2021 record count: 2,432,456 (2.42%) 
Examples: All’Arrabbiatela, Baker_tentb 

Complexity Type: Lower and uppercase letters (mixedalpha) 
RockYou2021 record count: 6,737,899 (6.69%) 
Examples: DenisedeRidder, BlackMightyWax 

Complexity Type: Uppercase letters and numbers (upperalphanum) 
RockYou2021 record count: 5,044,179 (5.01%) 
Examples: CIZAWOVY1, EDUARDO6592 

Complexity Type: Uppercase letters and special characters (upperalphaspecial) 
RockYou2021 record count: 284,279 (0.28%) 

Complexity Type: Lowercase letters, special characters, and numbers (loweralphaspecialnum) 
RockYou2021 record count: 3,811,000 (3.78%) 
Examples: rhs;ysq52, promu|gat3 

Complexity Type: Numbers only (numeric) 
RockYou2021 record count: 3,303,380 (3.28%) 
Examples: 66748719, 87925501 

Complexity Type: Lower and uppercase letters, special characters, and numbers (mixedalphaspecialnum) 
RockYou2021 record count: 1,582,514 (1.57%) 
Examples: D3PR3Da7!0NS, 75Henri- 

Complexity Type: Uppercase letters only (upperalpha) 
RockYou2021 record count: 1,154,030 (1.15%) 

Complexity Type: Uppercase letters, special characters, and numbers (upperalphaspecialnum) 
RockYou2021 record count: 462,041 (0.46%) 
Examples: 9753(OL>@$^*, <MNBGJL”_098 

Complexity Type: Special characters and numbers (specialnum) 
RockYou2021 record count: 274,758 (0.27%) 
Examples: @12345678910111213@, 8#####@*_*_*_(0-0) 
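The complexity types above can be derived mechanically from the characters in each record. Here is a minimal sketch of such a classifier. It assumes ASCII letters and digits and treats every other character as “special”, which may differ from the rules the team actually applied. The function name and the `special` and `empty` labels are hypothetical.

```python
import string

LOWER = set(string.ascii_lowercase)
UPPER = set(string.ascii_uppercase)
DIGITS = set(string.digits)


def complexity_type(record: str) -> str:
    """Label a record with a complexity type such as 'loweralphanum'."""
    chars = set(record)
    has_lower = bool(chars & LOWER)
    has_upper = bool(chars & UPPER)
    has_digit = bool(chars & DIGITS)
    # Assumption: anything that is not an ASCII letter or digit is special.
    has_special = bool(chars - LOWER - UPPER - DIGITS)

    if has_lower and has_upper:
        name = "mixedalpha"
    elif has_lower:
        name = "loweralpha"
    elif has_upper:
        name = "upperalpha"
    else:
        name = ""
    if has_special:
        name += "special"
    if has_digit:
        name += "num"
    if name == "num":
        name = "numeric"  # digits only, matching the label used above
    return name or "empty"
```

Fed the examples from the list, it returns `loweralphanum` for `sta8342` and `specialnum` for `@12345678910111213@`.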

The above breakdown indicates that adding most of RockYou2021 to a breached password protection list is not required, as sufficient complexity rules could protect against over 95% of all records in RockYou2021. By simply requiring upper, lower, numbers, and special characters, one would rule out a valid password being contained in most of the categories above (comprising 96.5% of our sample). 
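If you want to check how a rule like this fares against your own wordlist sample, a sketch along the following lines may help. It is an illustration only, not Specops Password Policy configuration: the function names and the `min_length` parameter are hypothetical, and “special” is assumed to mean any character that is not an ASCII letter or digit.

```python
import string
from typing import List

LOWER = set(string.ascii_lowercase)
UPPER = set(string.ascii_uppercase)
DIGITS = set(string.digits)


def meets_policy(password: str, min_length: int = 0) -> bool:
    """True if the password has upper, lower, digit, and special characters
    and is at least `min_length` characters long."""
    chars = set(password)
    return (
        len(password) >= min_length
        and bool(chars & LOWER)
        and bool(chars & UPPER)
        and bool(chars & DIGITS)
        and bool(chars - LOWER - UPPER - DIGITS)
    )


def fraction_ruled_out(wordlist: List[str], min_length: int = 0) -> float:
    """Share of wordlist records that could NOT be a valid password."""
    if not wordlist:
        return 0.0
    rejected = sum(1 for w in wordlist if not meets_policy(w, min_length))
    return rejected / len(wordlist)
```

Raising `min_length` shows how a length requirement and a complexity requirement combine to exclude more of the list.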

Password Policy Recommendations for RockYou2021 

There is no one-size-fits-all password policy recommendation for organizations looking to prevent attacks making use of the RockYou2021 list. Each organization will have different compliance needs and security concerns. 

However, the strongest defense against an attempt to brute force or crack hashes using this wordlist would be to use sufficiently long passphrases, or sufficiently long complex strings. As recommended in the NIST Special Publication 800-63B, section, 

“Verifiers SHALL require subscriber-chosen memorized secrets to be at least 8 characters in length. Verifiers SHOULD permit subscriber-chosen memorized secrets at least 64 characters in length.” 

Organizations looking to use password length alone as a defense could simply require long passwords or passphrases. The majority of RockYou2021 records had fewer than 22 characters, and most records on the longer end of the range were not human readable. Organizations could encourage the use of passphrases by requiring a minimum of that many characters, or they could set a lower minimum but incentivize longer passwords with length-based password aging in Specops Password Policy. 

Incentivizing 22+ character passwords with Specops Password Policy’s length-based password aging

Another approach is to combine a length requirement with character requirements. After reviewing the password length analysis and the complexity of records in RockYou2021, our team found that a strong password policy requiring 16 characters or more, and encouraging higher entropy in the passphrase (such as some capital letters or other complex characters), would rule out more than 95% of records on the wordlist. This is not only a defense against cracking hashes with wordlists such as RockYou2021, but also a defense against exhaustive brute-forcing of the secrets themselves; the longer the secret, and the higher the entropy, the more costly (and therefore less likely to succeed) the brute-force attack.
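The relationship between length, character set, and brute-force cost can be sketched with some simple arithmetic. The figures this produces are upper bounds that only hold for randomly chosen secrets; human-chosen passwords are far more predictable. The function names are hypothetical, and any guess rate passed in is an assumption for illustration, not a measured figure.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def keyspace_bits(length: int, charset_size: int) -> float:
    """Bits of entropy for a random secret: length * log2(charset size)."""
    return length * math.log2(charset_size)


def years_to_exhaust(length: int, charset_size: int,
                     guesses_per_second: float) -> float:
    """Years needed to try every possible secret at a given guess rate."""
    total_guesses = charset_size ** length
    return total_guesses / guesses_per_second / SECONDS_PER_YEAR


# 8 lowercase letters versus 16 characters from the 95 printable ASCII set.
short_bits = keyspace_bits(8, 26)
long_bits = keyspace_bits(16, 95)
```

Each additional character multiplies the search space by the size of the character set, which is why length tends to pay off faster than adding a single character class.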

A password rule requiring mixed alpha, numbers, and special characters (preferably a complex passphrase) could simply rule out the dataset. By setting a minimum password length, and using a complexity rule building on the required complexity classes, one is able to prevent users from creating passwords that are vulnerable to this kind of brute-force wordlist generation. Using Specops Password Policy to configure such a policy:

At the end of the day, RockYou2021 was not a large dump of breached passwords (though it did contain some). However, it is still a wordlist attackers may choose to use in their attacks against your network.

The use of either Specops Password Policy, or an equivalent password filter to enforce sound password policies, is the best defense against attacks with these types of datasets. Combined with a sound leaked password breach protection service, such as Specops Breached Password Protection, organizations can raise the level of effort required for attackers to breach their networks via a password attack.
