ALIEN TXTBASE data-dump analysis: Dangerous or junk?
Specops researchers have been digging into the ALIEN TXTBASE data-dump, which was recently merged into the HaveIBeenPwned (HIBP) dataset by Troy Hunt. After analyzing the over 200 million passwords in this dataset, we estimate about 20 million are new to the Specops Breached Password Protection database – so we’ve added those in to keep customers protected. The majority were already covered by our threat intelligence sources and honeypot systems, or were removed as duplicates.
How did we get hold of ALIEN TXTBASE?
Back in January 2025, a government agency reached out to Troy Hunt (HIBP) regarding some large (4 GB) files being offered for sale on Telegram. This data was provided to Mr. Hunt and he started work on processing it for inclusion into HIBP (apparently taking about a month). During this time, a third party began offering the dataset through various third-party sharing sites, advertised in posts on Breached Forums. This is where we (Specops) acquired the ALIEN TXTBASE dataset, and we proceeded to analyze it and process it for inclusion in our own breached password database.
Analysis of the ALIEN TXTBASE data dump
As with the Rockyou2024 data dump, our researchers found it wasn’t quite the mega-leak it was initially hyped as. The dump contained a pretty standard distribution of base words, passwords, and lengths – essentially a lot of people’s local password stores. There was a non-zero amount of junk, Telegram URLs, and other stuff mashed in there too. It’s clear this is someone collecting and processing a lot of stealer logs into one.
Due to the size of the dataset, it wasn’t possible to analyze the entire dump as a whole. As such, a random sample of 927,790,080 records was taken (approximately 30% of the total), and the analysis was performed against this subset.
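The article does not say how the sample was drawn. As a minimal sketch, assuming the dump can be streamed line by line, an independent per-line draw keeps roughly the chosen fraction without loading the file into memory. The function name `sample_lines` and the `seed` parameter are hypothetical, introduced only for illustration.

```python
import random
from typing import Iterable, Iterator, Optional


def sample_lines(lines: Iterable[str], rate: float,
                 seed: Optional[int] = None) -> Iterator[str]:
    """Yield each line with probability `rate` (per-line Bernoulli sampling).

    Assumption: this is one common way to sample a file too large to hold
    in memory; it is not necessarily the method used for this analysis.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    for line in lines:
        if rng.random() < rate:
            yield line
```

Passing an open file object as `lines` with `rate=0.3` would keep about 30% of the records. The exact sample size varies from run to run unless the seed is fixed.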
The user sharing this dataset had also made an initial first pass over it, reducing it from a standard stealer log format (JSON of the URL, username, password, and other data) into one of the following two formats:
- url:username:password
- url|username|password
Some records had broken formats or extra data appended. Not all records were complete or clean sets of credentials.
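To illustrate why records like these need careful handling, here is a minimal parsing sketch for the two layouts listed above. It is not the tooling used for this analysis; the names `Credential` and `parse_record` are hypothetical. The key detail is that URLs contain colons themselves, so the colon-delimited form is split from the right.

```python
from typing import NamedTuple, Optional


class Credential(NamedTuple):
    url: str
    username: str
    password: str


def parse_record(line: str) -> Optional[Credential]:
    """Split one dump line into (url, username, password), or return None."""
    line = line.strip()
    if not line:
        return None
    # Assumption: a line with at least two pipes uses the pipe layout.
    if line.count("|") >= 2:
        url, username, password = line.split("|", 2)
    else:
        # URLs contain colons ("https://", ports), so split from the right:
        # the last two colons are taken as the field separators.
        parts = line.rsplit(":", 2)
        if len(parts) != 3:
            return None  # broken or incomplete record
        url, username, password = parts
    if not url or not username:
        return None
    return Credential(url, username, password)
```

A right-hand split still misreads any record whose password contains a colon or that has extra data appended after the password, which is one reason a dump like this yields junk values alongside real credentials.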
Domain occurrences
The dataset is varied due to the nature of stealer logs (malware pulling records stored in local password stores, such as in a Chrome browser). We can see it does seem to trend towards social media and consumer mail accounts, rather than internal corporate accounts. However, this doesn’t mean corporate accounts aren’t represented at all. You can see a few selected domains below that highlight this difference.
Domain | Occurrence count |
---|---|
x.com (formerly Twitter) | 15,727,771 |
microsoft.com | 993,554 |
irs.gov | 108,944 |
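
A count like the one in this table can be sketched as follows, assuming the URL field has already been separated from each record. The helper names `hostname_of` and `count_watched_domains` are hypothetical, and matching a hostname against a fixed list of suffixes is a simplification chosen for the example rather than the method behind the figures above.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit


def hostname_of(url: str) -> str:
    """Return the lowercased hostname of a URL, or "" if it cannot be parsed."""
    if "://" not in url:
        url = "//" + url  # lets urlsplit treat a bare "host/path" as a netloc
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""  # malformed URL, e.g. unbalanced brackets


def count_watched_domains(urls: Iterable[str], watched: Iterable[str]) -> Counter:
    """Count URLs whose hostname equals or is a subdomain of a watched domain."""
    watched = [d.lower() for d in watched]
    counts: Counter = Counter()
    for url in urls:
        host = hostname_of(url)
        for domain in watched:
            if host == domain or host.endswith("." + domain):
                counts[domain] += 1
    return counts
```

Checking for `"." + domain` rather than a plain suffix avoids counting an unrelated host such as `notx.com` under `x.com`.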
Password occurrences
Backing up similar findings in our 2025 Breached Password Report on malware-stolen credentials, the dataset contains millions of occurrences of common weak passwords such as 123456, admin, and password. Outside of the junk data (highlighted in red in the table below), it’s clear this dataset is stealer logs cleaned up from their raw JSON into url:username:password with no concern for whether the data is good or not. You’ll also note the Telegram chat URL being appended. This is common with a lot of stealer logs; the ratio of quality clean data that can be immediately used in an attack is often low.
It typically requires some amount of careful parsing for an attacker to get down to data that can be used in an attack. If they were searching for a specific domain for a specific attack, it could be more beneficial. But if an attacker wanted to just try a large number of domains, there’s some work involved to process the dump into a clean set of data they could use for an attack.
The prevalence of ‘Spy Hunter’ is interesting and possibly related to the anti-malware tool of the same name – although it’s hard to say for certain.
Password | Occurrence count |
---|---|
123456 | 1,830,528 |
[UNKNOWN or V70] | 1,657,685 |
admin | 1,072,449 |
12345678 | 796,449 |
password | 602,910 |
123456789 | 602,315 |
//t.me/+hfTW5AYawTo4NG4Ji | 498,183 |
1234 | 428,201 |
Spy_Hunter4 | 358,011 |
[UNKNOWNorV70] | 347,099 |
Base words
We can reach a similar conclusion to the above when looking at the base words. The dataset contains a lot of junk and also a pretty standard set of human-generated passwords – mostly weak and commonly used ones. This suggests it’s mostly an amalgamation of other data dumps, rather than a treasure trove of breached credentials for a hacker to get stuck into.
The Telegram link leads to a Telegram channel that sells stealer logs, so it’s highly likely this is one of the original sources of the data.
Base term | Occurrence count |
---|---|
password | 1,932,026 |
admin | 1,750,728 |
unknown or v | 1,673,231 |
qwerty | 1,111,319 |
spy_hunter | 652,888 |
daniel | 582,727 |
asdf | 507,033 |
t.me/+hftw5ayawto4ngji | 499,802 |
welcome | 469,636 |
gabriel | 465,572 |
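
The article does not define how a base term is derived from a password. As a hedged sketch, assume it is the lowercased password with leading and trailing non-letter characters removed. That rule turns `[UNKNOWN or V70]` into `unknown or v` and `Spy_Hunter4` into `spy_hunter`, as seen in the tables, but it is an assumption and may not match the actual method in every case. The names `base_term` and `count_base_terms` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: strip any run of non-letters at the start or end of the
# lowercased password; characters in the middle are left untouched.
_EDGE_NON_LETTERS = re.compile(r"^[^a-z]+|[^a-z]+$")


def base_term(password: str) -> str:
    """Reduce a password to an approximate base term (may be empty)."""
    return _EDGE_NON_LETTERS.sub("", password.lower())


def count_base_terms(passwords: Iterable[str]) -> Counter:
    """Count base terms, skipping passwords that contain no letters at all."""
    counts: Counter = Counter()
    for password in passwords:
        term = base_term(password)
        if term:
            counts[term] += 1
    return counts
```

Grouping this way is what lets variants such as `Password1` and `password!` roll up under a single term, which is why the base-term counts are higher than the counts for the exact passwords.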
Password lengths
While there are some longer records in play, it’s interesting to note that a password policy enforcing a length of over 15 characters would offer protection against >97% of the passwords in this dataset. Encouraging the use of long, easy-to-remember passphrases is a valuable measure for organizations to take.
Password length | Occurrence count and % |
---|---|
0 (username with no password) | 176,216,952 (18.99%) |
8 | 129,699,005 (13.98%) |
10 | 116,523,698 (12.56%) |
9 | 109,960,521 (11.85%) |
11 | 83,329,641 (8.98%) |
12 | 68,409,023 (7.37%) |
15 | 51,628,997 (5.56%) |
13 | 44,164,784 (4.76%) |
14 | 31,930,748 (3.44%) |
6 | 26,910,038 (2.9%) |
7 | 18,434,911 (1.99%) |
16 | 16,932,733 (1.83%) |
17 | 7,950,070 (0.86%) |
4 | 6,504,890 (0.7%) |
5 | 6,050,192 (0.65%) |
18 | 6,000,837 (0.65%) |
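
A length breakdown like this, and the share of passwords that a minimum-length policy would exclude, can be computed with a simple tally. The sketch below assumes the password field has already been extracted from each record; `length_counts` and `percent_at_or_below` are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import Iterable


def length_counts(passwords: Iterable[str]) -> Counter:
    """Tally passwords by length; an empty password counts as length 0."""
    return Counter(len(p) for p in passwords)


def percent_at_or_below(counts: Counter, max_length: int) -> float:
    """Percentage of tallied passwords whose length is <= max_length."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for length, n in counts.items() if length <= max_length)
    return 100.0 * covered / total
```

Calling `percent_at_or_below(counts, 15)` gives the share of observed passwords that a policy requiring more than 15 characters would rule out. Note that `len` counts Unicode code points in Python, not bytes, so results can differ from a byte-based tally for non-ASCII passwords.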
What’s the takeaway with ALIEN TXTBASE?
It’s clear from our analysis that there isn’t much novel data in this dataset. It’s most likely a compilation of previous stealer logs, which have been collected and cleaned into a url:username:password format. One thing of interest we can note is the shift from dark web forums to Telegram for selling these kinds of data dumps, thanks to its ease of access and anonymity.
Overall, this data dump is large but not dissimilar to others that our threat intelligence team turns up. For example, the ALIEN TXTBASE compilation has been compared to others such as COMB and C1-5. So why was it ‘marketed’ as a serious data leak? Remember, cybercriminals want to either gain notoriety or profit from selling this stuff – so they have a vested interest in making out they hold an enormous leak, even if in reality a lot of it is from old breaches.
For organizations, it serves as a reminder that hackers continue to recycle old compromised data. This highlights the risk of allowing end users to choose the weak or common passwords that always show up in these ‘leaks’. There’s a critical need for educating end users on the risks of password reuse and adding multi-factor authentication. Organizations should also enhance their threat intelligence capabilities to track emerging risks from alternative platforms like Telegram.
Protect your organization from password attacks
The use of either Specops Password Policy, or an equivalent password filter to enforce sound password policies, is the best defense against attacks with these types of datasets. Our Breached Password Protection feature continuously scans your Active Directory for breached and compromised passwords, notifying end users that they need to change their password immediately.
Specops works closely with the KrakenLabs Threat Intelligence team to harvest datasets such as ALIEN TXTBASE from Telegram and the dark web, then add them to our database of over four billion unique compromised credentials. On top of that, Specops Password Policy continuously scans your Active Directory and alerts end users if they’re found to be using a breached password that’s been recently added to the database. Interested to see how it works? Try Specops Password Policy for free.
(Last updated on April 24, 2025)