Rockyou2024 analysis: Mega password list or just noise? 

Back in June 2021, a large data dump called ‘Rockyou2021’ was posted on a popular hacking forum. It was named after ‘Rockyou.txt’, the popular password list used in brute-force attacks, and it was a pretty big story at the time. You can see our team’s analysis on it here.

Fast forward to 2024 and you may have seen a new compilation doing the rounds: ‘Rockyou2024.’ The original forum post claimed the password list contained ‘over 9.9 billion passwords.’ It’s a serious claim – so our team have dug into the details to analyze whether this is something organizations should be concerned about.   

Original forum post for Rockyou2024 password list

What’s really in Rockyou2024? 

As with ‘Rockyou2021’, there’s been a fair bit of news around ‘Rockyou2024.’ Some sources have repeated the claim that this is the largest password database leak in history, with over 10 billion records. However, our team’s analysis indicates this is far from true. The dataset is neither useful as a wordlist nor, as alleged, a list of passwords that can be used to attack potential targets. In all honesty, it’s mostly garbage data, and we wouldn’t recommend focusing energy or effort on it.

Specops data analysis  

The delta between ‘Rockyou2024’ and ‘Rockyou2021’ is 54GB out of 146GB, which works out to approximately 1.5 billion (1,489,515,500) new records. Since these kinds of compilations usually contain a lot of hashes, we first processed out a number of standard hash types:

  • 138 261 666 bcrypt.txt
  • 480 116 331 md5.txt
  • 202 577 427 sha1.txt
  • 45 676 538 sha256.txt
  • 0 sha512crypt.txt
  • 6 743 064 sha512.txt
  • 873 375 026 total
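As a quick consistency check, the per-type counts above add up to the stated total (which is also how the sha256 figure reads as 45 676 538), and subtracting that total from the new-record count gives the number of records left once these hashes are removed. This is only arithmetic on the published counts:

```latex
\begin{aligned}
138\,261\,666 + 480\,116\,331 + 202\,577\,427 + 45\,676\,538 + 0 + 6\,743\,064 &= 873\,375\,026 \\
1\,489\,515\,500 - 873\,375\,026 &= 616\,140\,474
\end{aligned}
```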

Removing these hashes leaves 616,140,474 new records (~20GB). Note that this doesn’t cover all hashes or truncated hashes; it’s simply a blunt, wholesale stripping of some common hash types to cut the data down.
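A minimal sketch of this kind of blunt hash stripping, assuming one record per line and matching on the shape of each line alone. The pattern table `HASH_PATTERNS` and the helpers `classify_hash` and `strip_hashes` are hypothetical names for illustration, not the tooling our team used. Because it matches by shape, any 32-character hex string is counted as MD5 even if it is really another 128-bit hash (NTLM, for example), which is the kind of imprecision a quick cut like this accepts.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical shape-based patterns for the hash types listed above.
# Hex digests are identified purely by length, so formats that share a
# length (e.g. MD5 and NTLM) are not told apart.
HASH_PATTERNS = {
    "bcrypt": re.compile(r"\$2[abxy]?\$\d{2}\$[./A-Za-z0-9]{53}"),
    "sha512crypt": re.compile(r"\$6\$(?:rounds=\d+\$)?[./A-Za-z0-9]{1,16}\$[./A-Za-z0-9]{86}"),
    "md5": re.compile(r"[0-9a-fA-F]{32}"),
    "sha1": re.compile(r"[0-9a-fA-F]{40}"),
    "sha256": re.compile(r"[0-9a-fA-F]{64}"),
    "sha512": re.compile(r"[0-9a-fA-F]{128}"),
}


def classify_hash(line: str) -> Optional[str]:
    """Return the name of the first pattern matching the whole line, else None."""
    candidate = line.strip()
    for name, pattern in HASH_PATTERNS.items():
        if pattern.fullmatch(candidate):
            return name
    return None


def strip_hashes(lines: Iterable[str]) -> Tuple[List[str], Dict[str, int]]:
    """Drop hash-like lines, returning the kept lines and a count per hash type."""
    counts = {name: 0 for name in HASH_PATTERNS}
    kept = []
    for line in lines:
        kind = classify_hash(line)
        if kind is None:
            kept.append(line)
        else:
            counts[kind] += 1
    return kept, counts


if __name__ == "__main__":
    sample = [
        "password123",
        "5f4dcc3b5aa765d61d8327deb882cf99",          # MD5 of "password"
        "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",  # SHA-1 of "password"
        "$2b$12$" + "a" * 53,                        # synthetic bcrypt-shaped string
    ]
    kept, counts = strip_hashes(sample)
    print(kept)  # ['password123']
    print(counts)
```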

Splitting the remaining records by length gives the following length groups (length, then total file size), in descending order of size:

  • 09  2.1G  
  • 34  833M  
  • 38  686M  
  • 24  650M  
  • 10  399M  
  • 12  264M  
  • 11  245M  
  • 13  199M  
  • 08  179M  
  • 14  176M  
  • 63  128M  
  • 20  127M  
  • 51  114M  
  • 41  113M  
  • 15  111M  
  • 32   85M  

The corresponding record counts for each length group are:

  • 09  223 055 687  
  • 34   24 940 213  
  • 38   18 423 609   
  • 24   27 229 401  
  • 10   37 978 951  
  • 12   21 259 190  
  • 11   21 384 664  
  • 13   14 833 594  
  • 08   20 780 673  
  • 14   12 288 359  
  • 63    2 085 716  
  • 20    6 322 637  
  • 51    2 289 684  
  • 41    2 814 447  
  • 15    7 248 179  
  • 32    2 689 279  
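The length split above can be sketched as a single pass that tallies records and bytes per length. This is a sketch under assumptions, not the exact tooling used: `length_profile` and `top_lengths_by_size` are hypothetical helpers, and we assume length is measured in bytes, excluding the newline. The published sizes appear consistent with that assumption; for example, 24,940,213 records of 34 bytes plus a newline come to about 832.5 MiB, in line with the 833M shown for length 34.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def length_profile(lines: Iterable[bytes]) -> Tuple[Counter, Counter]:
    """Tally records and on-disk bytes per record length.

    Length is the byte length of each line without its line ending, so
    multi-byte UTF-8 text (e.g. Cyrillic) counts two bytes per letter.
    The byte tally adds one newline per record.
    """
    records: Counter = Counter()
    size: Counter = Counter()
    for raw in lines:
        record = raw.rstrip(b"\r\n")
        records[len(record)] += 1
        size[len(record)] += len(record) + 1  # +1 for the newline separator
    return records, size


def top_lengths_by_size(size: Counter, n: int) -> List[Tuple[int, int]]:
    """The n length groups with the largest total size, largest first."""
    return size.most_common(n)


if __name__ == "__main__":
    # A real run would stream a file opened in binary mode (path is hypothetical).
    sample = [b"123456789\n", b"987654321\n", b"qwerty12\n", "привет\n".encode("utf-8")]
    records, size = length_profile(sample)
    print(sorted(records.items()))  # [(8, 1), (9, 2), (12, 1)]
```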

Together, these length groups account for 445,624,283 records, or roughly 72% of the 616,140,474 records left after hash removal. If we now break these lengths down further:

09 

A mix of 9-digit numbers and various complete and partial strings. As usual, this length sits in the middle of the typical password length range, so there may be valid passwords in here, but they’re unlikely to differ from those in other wordlists. It’s unlikely much of the data is good.

34 

A lot of Russian strings, poorly parsed strings, various hashes and truncated hashes. 

38 

Similar to 34: Russian-language strings and various hashes and truncated hashes, such as truncated bcrypt hashes.

24 

Largely base64-encoded strings, but they don’t decode to English text: they’re either Unicode or have another layer of encoding beneath the base64. These aren’t worth investigating right now. As with the other longer classes, there are also foreign-language strings, which would be better served by other wordlists and rules or masks.
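To triage a length class like this, one rough approach is to attempt a strict base64 decode of each line and check whether the result is printable ASCII. A minimal sketch, assuming the standard base64 alphabet with padding and using printable ASCII as a crude stand-in for English text; `decode_base64_candidate`, `looks_like_ascii_text` and `triage` are hypothetical helpers. Note that any 24-character alphanumeric string is also valid base64 by shape, so a successful decode alone proves little.

```python
import base64
from typing import Optional


def decode_base64_candidate(s: str) -> Optional[bytes]:
    """Strictly decode s as standard, padded base64; None if it isn't valid."""
    if not s or len(s) % 4 != 0:
        return None
    try:
        return base64.b64decode(s, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None


def looks_like_ascii_text(data: bytes) -> bool:
    """Crude proxy for readable English: every byte is printable ASCII."""
    return all(0x20 <= b <= 0x7E for b in data)


def triage(line: str) -> str:
    """Classify a line as non-base64, base64 of ASCII text, or base64 of something else."""
    decoded = decode_base64_candidate(line.strip())
    if decoded is None:
        return "not base64"
    if looks_like_ascii_text(decoded):
        return "base64 -> ascii text"
    return "base64 -> binary or another encoding"


if __name__ == "__main__":
    print(triage("SGVsbG8gV29ybGQhIEhlbGxv"))  # base64 -> ascii text
```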

10 

Similar to 9: a large collection of 10-digit numbers and 10-character strings.

  • numeric: 18023570 (47.46%) 
  • loweralphanum: 11489073 (30.25%) 
  • mixedalphanum: 2423042 (6.38%) 
  • loweralpha: 2063234 (5.43%) 
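The category labels in these breakdowns resemble the charset classes reported by password-analysis tools such as PACK’s statsgen, and each percentage is a share of its length group (for example, 18,023,570 of the 37,978,951 ten-character records is 47.46%). A minimal sketch of such a classifier, assuming anything other than an ASCII letter or digit counts as "special"; `charset_category` and `charset_breakdown` are hypothetical names, and the exact rules of whatever tool produced these figures may differ.

```python
import string
from collections import Counter
from typing import Iterable, List, Tuple


def charset_category(password: str) -> str:
    """Label a password by which character classes it contains."""
    has_digit = any(c in string.digits for c in password)
    has_lower = any(c in string.ascii_lowercase for c in password)
    has_upper = any(c in string.ascii_uppercase for c in password)
    # Anything that is not an ASCII letter or digit (punctuation, spaces,
    # Cyrillic letters, emoji...) is treated as "special".
    has_special = any(not (c.isascii() and c.isalnum()) for c in password)

    if has_lower and has_upper:
        alpha = "mixedalpha"
    elif has_lower:
        alpha = "loweralpha"
    elif has_upper:
        alpha = "upperalpha"
    else:
        alpha = ""

    if not alpha:
        if has_special and has_digit:
            return "specialnum"
        if has_special:
            return "special"
        return "numeric" if has_digit else "empty"
    return alpha + ("special" if has_special else "") + ("num" if has_digit else "")


def charset_breakdown(passwords: Iterable[str]) -> List[Tuple[str, int, float]]:
    """(category, count, percent of total), largest count first."""
    counts = Counter(charset_category(p) for p in passwords)
    total = sum(counts.values())
    return [(name, n, round(100 * n / total, 2)) for name, n in counts.most_common()]


if __name__ == "__main__":
    for pw in ("1234567890", "password12", "P@ssword12"):
        print(pw, charset_category(pw))
```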

12 

Collection of 12-character strings, numerics, and a lot of IPs. 

  • loweralphanum: 6384403 (30.03%) 
  • mixedalphaspecialnum: 4633689 (21.8%) 
  • specialnum: 3040090 (14.3%) 
  • mixedalphanum: 1546756 (7.28%) 
  • loweralpha: 1407371 (6.62%) 
  • mixedalphaspecial: 922710 (4.34%) 
  • mixedalpha: 768977 (3.62%) 
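Dotted-quad IP addresses such as 192.168.1.10 contain only digits and dots, so they fall into the specialnum class, which may partly explain its 14.3% share in this group. A minimal sketch for measuring how much of a length group consists of bare IPv4 addresses, using Python’s standard `ipaddress` module; `is_ipv4` and `ipv4_share` are hypothetical helpers, and addresses with ports or surrounding text are deliberately not matched.

```python
import ipaddress
from typing import Iterable


def is_ipv4(line: str) -> bool:
    """True if the whole line parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(line.strip())
    except ValueError:  # AddressValueError subclasses ValueError
        return False
    return True


def ipv4_share(lines: Iterable[str]) -> float:
    """Fraction of lines that are bare IPv4 addresses (0.0 for no input)."""
    total = hits = 0
    for line in lines:
        total += 1
        if is_ipv4(line):
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    group = ["192.168.1.10", "password1234", "10.10.10.100", "abc"]
    print(ipv4_share(group))
```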

11 

Collection of 11-character strings, numerics, and some IPs. 

  • loweralphanum: 8871162 (41.48%) 
  • numeric: 4863557 (22.74%) 
  • mixedalphanum: 1839704 (8.6%) 
  • loweralpha: 1614811 (7.55%) 
  • specialnum: 1070368 (5.01%) 
  • mixedalpha: 822882 (3.85%) 

13 

Collection of 13-character strings, numerics, and a lot of IPs.

08 

Roughly what we would expect to see from 8-character passwords.

  • loweralphanum: 10880391 (52.36%) 
  • loweralpha: 3873181 (18.64%) 
  • mixedalphanum: 2651258 (12.76%) 
  • mixedalpha: 906904 (4.36%) 
  • upperalphanum: 627632 (3.02%) 

14 

A lot of Unicode junk and text clipped from what looks like ‘ncurses’ output.

63 

Generally just junk, such as poorly processed email addresses and strings from Telegram scraping.

20 

A lot of Russian-language strings and junk. We see this trend across the other long length classes: people tend not to use passwords that long, and the way these compilations are collected leads to a lot of hashes or just plain garbage. Leakers will often just ram a bunch of data breaches together, without any processing, to get a large file size and media clout (the screenshot of their name on the forum post, etc.).

Past this point, things go further off the rails, with longer and longer strings of poorly processed junk that isn’t usable as a wordlist, whether to attack hashes or as passwords in a spray attack.

So, what’s the key takeaway? 

What this really comes down to is that the person in question has taken ‘Rockyou2021’ (which caused so much uproar over its record count) and added more data collected from other, seemingly low-quality sources. They’ve then posted it, claiming it’s a huge new list, in order to get clout and credit.

This dataset should pose minimal to no risk to existing Specops customers, and the value of this dataset as a wordlist in cracking or other attacks is extremely nebulous to nil. The dataset is too large to be of any realistic use as part of any effort to crack a given hash and there’s simply too much low-quality data to successfully use in attacks. The value of the data is negligible compared to good, prepared wordlists and rulesets in the hands of a capable actor. 

This list does not in any way impact the threat model of any of our customers and should generally just be ignored as another clickbait compilation. 

Security recommendations to deal with Rockyou2024

At the end of the day, Rockyou2024 was not the large dump of breached passwords it was claimed to be (though it did contain some). However, some of its data may still overlap with other wordlists or with candidates generated by other attack types. There’s no one-size-fits-all password policy recommendation for organizations looking to prevent attacks that make use of the Rockyou2021 and Rockyou2024 lists. Each organization will have different compliance needs and security concerns.

Using Specops Password Policy, or an equivalent password filter, to enforce sound password policies is the best defense against attacks with these types of datasets. Specops Password Policy’s Breached Password Protection feature continuously scans your Active Directory for breached and compromised passwords, notifying end users that they need to change their password immediately.
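For illustration only, here is a minimal sketch of the kind of check a password filter can run when a user sets a new password: reject it if it is too short or appears in a list of known-breached passwords. This is not Specops Password Policy’s implementation; `build_breached_set`, `check_password`, and the 15-character default minimum are hypothetical.

```python
from typing import Iterable, List, Set


def build_breached_set(lines: Iterable[str]) -> Set[str]:
    """Load a breached-password wordlist (one per line) into a set for fast lookups."""
    return {line.rstrip("\r\n") for line in lines if line.strip()}


def check_password(candidate: str, breached: Set[str], min_length: int = 15) -> List[str]:
    """Return the policy problems with candidate; an empty list means it passes.

    min_length=15 is an illustrative default, not a product value.
    """
    problems = []
    if len(candidate) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if candidate in breached:
        problems.append("found in a breached-password list")
    return problems


if __name__ == "__main__":
    breached = build_breached_set(["123456\n", "password\n", "qwerty123\n"])
    print(check_password("qwerty123", breached))
    # ['shorter than 15 characters', 'found in a breached-password list']
```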

Organizations that want to use password length as a defense can simply require long passwords or passphrases; you can follow a best practice guide for helping end users create long passphrases here. Organizations can also choose to incentivize longer passwords with length-based password aging in Specops Password Policy.
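The idea behind length-based password aging is that the longer a password is, the longer it stays valid, so users who choose long passphrases are asked to change them less often. A minimal sketch of that idea, assuming hypothetical tiers (12+ characters: 90 days, 15+: 180 days, 20+: 365 days) that are illustrative only and not Specops Password Policy defaults; `AGING_TIERS` and `max_password_age` are hypothetical names.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical tiers: (minimum length, days until the password expires).
# Illustrative values only, not product defaults or a recommendation.
AGING_TIERS: Sequence[Tuple[int, int]] = ((12, 90), (15, 180), (20, 365))


def max_password_age(length: int, tiers: Sequence[Tuple[int, int]] = AGING_TIERS) -> Optional[int]:
    """Days a password of the given length may be kept before it must change.

    Tiers are checked from the longest minimum length down, so a 22-character
    passphrase gets the 20+ tier. Returns None when the length is below every
    tier, meaning the password should be rejected outright.
    """
    for min_len, days in sorted(tiers, reverse=True):
        if length >= min_len:
            return days
    return None


if __name__ == "__main__":
    for n in (10, 14, 17, 28):
        print(n, max_password_age(n))
```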

Interested to know how Specops Password Policy could fit in with your organization? Get in touch and speak to an expert today.

 

(Last updated on July 10, 2024)


Written by

Marcus White

Marcus is a Specops cybersecurity specialist based in the UK. He’s been in the B2B technology sector for 8+ years and has worked closely with products in email security, data loss prevention, endpoint security, and identity and access management.
