ALIEN TXTBASE data-dump analysis: Dangerous or junk?

Marcus White

Last updated on May 14, 2025

Specops researchers have been digging into the ALIEN TXTBASE data-dump, which was recently merged into the HaveIBeenPwned (HIBP) dataset by Troy Hunt. After analyzing the more than 200 million passwords in this dataset, we estimate about 20 million are new to the Specops Breached Password Protection database – so we’ve added those to keep customers protected. The majority appear to have already been covered by our threat intelligence sources and honeypot systems, or dropped out once duplicates were removed.

How did we get hold of ALIEN TXTBASE?

Back in January 2025, a government agency reached out to Troy Hunt (HIBP) regarding some large (4 GB) files being offered for sale on Telegram. The data was provided to Mr. Hunt, who started processing it for inclusion in HIBP (apparently taking about a month). During this time, a third party began offering the dataset on various third-party sharing sites via posts on Breached Forums. This is where we (Specops) acquired the ALIEN TXTBASE dataset, which we then analyzed and processed for inclusion in our own breached password database.

*Breached forums post advertising ALIEN TXTBASE*

*Telegram channel referred to in the Breached Forum post above*

Analysis of the ALIEN TXTBASE data dump

As with the Rockyou2024 data dump, our researchers found it wasn’t quite the mega-leak it was initially hyped as. The dump contained a pretty standard distribution of base words, passwords, and lengths – essentially a lot of people’s local password stores. There was a non-zero amount of junk, Telegram URLs, and other stuff mashed in there too. It’s clear this is someone collecting and processing a lot of stealer logs into one dump.

Due to the size of the dataset, it wasn’t possible to analyze the entire dump as a whole. As such, a random sample of 927,790,080 records was taken (approximately 30% of the total), and the analysis was performed against this subset.

The user sharing the dataset had also made a first pass over it, reducing it from a standard stealer log format (JSON containing the URL, username, password, and other data) to one of the following two formats:

url:username:password
url|username|password

Some records had broken formats or extra data appended, and not all records were complete, clean sets of credentials.
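As an illustration of what handling these two formats involves, here is a minimal Python sketch of a line parser. It is not the tooling used for this analysis: the names `Credential` and `parse_record` are hypothetical, and we assume that a line with exactly two pipes uses the pipe format and that everything else is split on its last two colons (because URLs themselves contain colons).

```python
from typing import NamedTuple, Optional


class Credential(NamedTuple):
    url: str
    username: str
    password: str


def parse_record(line: str) -> Optional[Credential]:
    """Split one dump line into (url, username, password), or return None."""
    line = line.strip()
    if not line:
        return None
    if line.count("|") == 2:
        # Assumed pipe format: url|username|password
        url, username, password = line.split("|")
    else:
        # Assumed colon format: url:username:password.
        # URLs contain colons ("https://..."), so split from the right.
        parts = line.rsplit(":", 2)
        if len(parts) != 3:
            return None
        url, username, password = parts
    if not url or not username:
        return None  # incomplete record
    # An empty password is kept: some records are a username with no password.
    return Credential(url, username, password)
```

A heuristic like this will still mis-split some lines, for example a password that contains a colon or a bare URL with a port number, which is one reason a dump in this state needs careful cleaning before it yields usable records.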

Domain occurrences

The dataset is varied due to the nature of stealer logs (malware pulling records stored in local password stores, such as in a Chrome browser). We can see it does seem to trend towards social media and consumer mail accounts, rather than internal corporate accounts. However, this doesn’t mean corporate accounts aren’t represented at all. You can see a few selected domains below that highlight this difference.

Domain	Occurrence count
x.com (formerly Twitter)	15,727,771
microsoft.com	993,554
irs.gov	108,944
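For readers who want to run a similar tally on their own data, the following Python sketch counts records per domain. It is an illustration under our own assumptions, not the method used to produce the table above: the helper names `host_of` and `count_domains` are hypothetical, and a record is counted toward a domain when its hostname equals that domain or is a subdomain of it.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL, or "" if none can be found."""
    # urlsplit only recognises a host after "//", so add it when it is missing.
    if "//" not in url:
        url = "//" + url
    try:
        host = urlsplit(url).hostname or ""
    except ValueError:
        return ""  # malformed URL
    return host[4:] if host.startswith("www.") else host


def count_domains(urls, domains):
    """Count how many URLs belong to each domain of interest."""
    counts = Counter()
    for url in urls:
        host = host_of(url)
        for domain in domains:
            if host == domain or host.endswith("." + domain):
                counts[domain] += 1
    return counts
```

Matching on the dotted suffix avoids counting a lookalike host such as notmicrosoft.com toward microsoft.com.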

Password occurrences

Backing up similar findings in our 2025 Breached Password Report on malware-stolen credentials, the dataset contains millions of occurrences of common weak passwords such as 123456, admin, and password. Outside of the junk data (for example, the [UNKNOWN or V70] entries in the table below), it’s clear this dataset is stealer logs cleaned up from their raw JSON into url:username:password with no concern for whether the data is good or not. You’ll also note the Telegram chat URL being appended. This is common with a lot of stealer logs; the ratio of quality clean data that can be immediately used in an attack is often low.

It typically requires some amount of careful parsing for an attacker to get down to data that can be used in an attack. If they were searching for a specific domain for a specific attack, it could be more beneficial. But if an attacker wanted to just try a large number of domains, there’s some work involved to process the dump into a clean set of data they could use for an attack.

The prevalence of ‘Spy Hunter’ is interesting and possibly related to the SpyHunter anti-malware tool – although it’s hard to say for certain.

Password	Occurrence count
123456	1,830,528
[UNKNOWN or V70]	1,657,685
admin	1,072,449
12345678	796,449
password	602,910
123456789	602,315
//t.me/+hfTW5AYawTo4NG4Ji	498,183
1234	428,201
Spy_Hunter4	358,011
[UNKNOWNorV70]	347,099

Base words

We can reach a similar conclusion to the above when looking at the base words. The dataset contains a lot of junk and also a pretty standard set of human-generated passwords – mostly weak and commonly used ones. This suggests it’s mostly an amalgamation of other data dumps, rather than a treasure trove of breached credentials for a hacker to get stuck into.

The Telegram link leads to a Telegram channel that sells stealer logs, so it’s highly likely this is one of the original sources of the data.

Base term	Occurrence count
password	1,932,026
admin	1,750,728
unknown or v	1,673,231
qwerty	1,111,319
spy_hunter	652,888
daniel	582,727
asdf	507,033
t.me/+hftw5ayawto4ngji	499,802
welcome	469,636
gabriel	465,572
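The exact rules used to derive base terms are not described here, so the following Python sketch is only one plausible approach: lowercase the password, then strip any leading and trailing characters that are not ASCII letters. The function name `base_term` is hypothetical. Under this assumption, Spy_Hunter4 reduces to spy_hunter and [UNKNOWN or V70] reduces to unknown or v, in line with the table above.

```python
import re
from collections import Counter

# Leading or trailing runs of anything that is not an ASCII letter.
_EDGE_NON_LETTERS = re.compile(r"^[^a-z]+|[^a-z]+$")


def base_term(password: str) -> str:
    """Lowercase a password and strip non-letter characters from both ends."""
    return _EDGE_NON_LETTERS.sub("", password.lower())


def count_base_terms(passwords):
    """Tally base terms, skipping passwords that contain no letters at all."""
    terms = (base_term(p) for p in passwords)
    return Counter(t for t in terms if t)
```

Grouping passwords this way shows why variants such as password1 and Password! offer little extra protection: they all collapse to the same base word. Note that this sketch treats non-ASCII letters as strippable characters, which a production analysis would probably want to handle differently.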

Password lengths

While there are some longer records in play, it’s interesting to note that a password policy enforcing a length of over 15 characters would offer protection against >97% of the passwords in this dataset. Encouraging the use of long, easy-to-remember passphrases is a valuable measure for organizations to take.

Password length	Occurrence count and %
0 (username with no password)	176,216,952 (18.99%)
8	129,699,005 (13.98%)
10	116,523,698 (12.56%)
9	109,960,521 (11.85%)
11	83,329,641 (8.98%)
12	68,409,023 (7.37%)
15	51,628,997 (5.56%)
13	44,164,784 (4.76%)
14	31,930,748 (3.44%)
6	26,910,038 (2.9%)
7	18,434,911 (1.99%)
16	16,932,733 (1.83%)
17	7,950,070 (0.86%)
4	6,504,890 (0.7%)
5	6,050,192 (0.65%)
18	6,000,837 (0.65%)
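The table can also be used to check how much of the sample a given minimum length would rule out. The Python sketch below sums the listed rows up to a chosen length and divides by the sample size of 927,790,080 records, the figure the percentages in the table are consistent with. The function name `share_at_most` is hypothetical. Because the table lists only the most common lengths, the result is a lower bound: records with unlisted lengths are not counted.

```python
# Occurrence counts by password length, copied from the table above.
LENGTH_COUNTS = {
    0: 176_216_952,
    8: 129_699_005,
    10: 116_523_698,
    9: 109_960_521,
    11: 83_329_641,
    12: 68_409_023,
    15: 51_628_997,
    13: 44_164_784,
    14: 31_930_748,
    6: 26_910_038,
    7: 18_434_911,
    16: 16_932_733,
    17: 7_950_070,
    4: 6_504_890,
    5: 6_050_192,
    18: 6_000_837,
}

# Sample size assumed from the table's percentages.
SAMPLE_SIZE = 927_790_080


def share_at_most(max_length, counts=LENGTH_COUNTS, total=SAMPLE_SIZE):
    """Fraction of the sample whose listed length is <= max_length."""
    covered = sum(n for length, n in counts.items() if length <= max_length)
    return covered / total


if __name__ == "__main__":
    for threshold in (8, 12, 15):
        print(f"length <= {threshold}: {share_at_most(threshold):.2%} of sample")
```

Running the same calculation against your own organization’s password length distribution is a quick way to estimate what a longer minimum length would actually change.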

What’s the takeaway with ALIEN TXTBASE?

It’s clear from our analysis that there isn’t much novel data in this dataset. It’s most likely a compilation of previous stealer logs, which have been collected and cleaned into a url:username:password format. One thing of interest we can note is the shift from dark web forums to Telegram for selling these kinds of data dumps, thanks to its ease of access and anonymity.

Overall, this data dump is large but not dissimilar to others that our threat intelligence team turn up. For example, the ALIEN TXTBASE compilation has been compared to others such as COMB and C1-5. So why was it ‘marketed’ as a serious data leak? Remember, cybercriminals want to either gain notoriety or profit from selling this stuff – so they have a vested interest in making out they hold an enormous leak, even if in reality a lot of it is from old breaches.

For organizations, it serves as a reminder that hackers continue to recycle old compromised data. This highlights the risk of allowing end users to choose the weak or common passwords that always show up in these ‘leaks’. There’s a critical need for educating end users on the risks of password reuse and adding multi-factor authentication. Organizations should also enhance their threat intelligence capabilities to track emerging risks from alternative platforms like Telegram.

Protect your organization from password attacks

The use of either Specops Password Policy, or an equivalent password filter to enforce sound password policies, is the best defense against attacks with these types of datasets. Our Breached Password Protection feature continuously scans your Active Directory for breached and compromised passwords, notifying end users that they need to change their password immediately.

Specops works closely with the KrakenLabs Threat Intelligence team to harvest datasets such as ALIEN TXTBASE from Telegram and the dark web, then add them to our database of over four billion unique compromised credentials. On top of that, Specops Password Policy continuously scans your Active Directory and alerts end users if they’re found to be using a breached password that’s been recently added to the database.

Interested to see how it works? Try Specops Password Policy for free.

Written by

Marcus White

Marcus is a cybersecurity product specialist based in the UK, with 8+ years of experience in the tech and cyber sectors. He writes about authentication, identity and access management, and compliance.
