Scraping the darkweb’s .onions – interview with Doctor Chaos

This time I have a fascinating interview with a fellow infosec community member, a pentester and the creator of osint.party, a research project to keep track of websites on the Tor network – Doctor Chaos.


Standard beginning – who are you, what’s your background, what’s your experience and motivation?

I’m Doctor Chaos. I became a penetration tester by accident about 10 years ago and I’ve been doing it professionally ever since. I recently decided to do more OSINT work in my free time to help out the community and spend some time building software that helps others catch bad people.

What is OSINT.PARTY?

OSINT.PARTY is my current side project. It slowly evolved over time from a small crawler that finds new and interesting onions into a complete metadata collection project. I eventually decided to publish my work and share it with others to see if anyone else could use the data to find interesting things on Tor.

So what’s the history behind this project?

OSINT.PARTY started in early to mid 2020 as a hobby project of mine to track things around the Tor network. At some point I started sharing various screenshots and bits of data on OSINT / Threat Intel related Discord servers, and people seemed very interested, so in early 2021 I decided to launch the project publicly.

The project currently consists of a simple crawler that checks every single onion address in the database for up/downtime. It extracts meaningful metadata such as HTTP headers, specific HTML tags, and other content like email addresses and BTC addresses, and it tries to automatically de-anonymize the website if possible.

At this time, the database has roughly 40 million metadata records.
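As a rough illustration of the kind of metadata extraction described above, here is a minimal Python sketch. It is not the project's actual code: the function name `extract_metadata`, the regular expressions and the choice of fields are assumptions made for this example, and the patterns only match the general shape of email, Bitcoin and v3 onion addresses without validating them.

```python
import re
from html.parser import HTMLParser

# Simplified patterns (assumptions for illustration, no checksum validation).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Legacy (1... / 3...) and bech32 (bc1...) Bitcoin address shapes.
BTC_RE = re.compile(
    r"\b(?:bc1[ac-hj-np-z02-9]{11,71}|[13][a-km-zA-HJ-NP-Z1-9]{25,34})\b"
)
# v3 onion addresses: 56 base32 characters followed by ".onion".
ONION_RE = re.compile(r"\b[a-z2-7]{56}\.onion\b")


class TitleParser(HTMLParser):
    """Collects the text inside the <title> tag."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(headers, html):
    """Return a small metadata record for one fetched page.

    `headers` is assumed to be a plain dict of response headers and
    `html` the decoded page body; fetching over Tor is out of scope here.
    """
    parser = TitleParser()
    parser.feed(html)
    return {
        "server": headers.get("Server"),
        "title": parser.title.strip(),
        "emails": sorted(set(EMAIL_RE.findall(html))),
        "btc_addresses": sorted(set(BTC_RE.findall(html))),
        "onions": sorted(set(ONION_RE.findall(html))),
    }
```

Records like this, stored per onion and per crawl, are what make later correlation possible, for example the same email or BTC address showing up on two otherwise unrelated sites.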

How can one use your platform / service / whatever you refer to it as? Is there a cost? What are the benefits?

The platform is currently quite bare. There are a few API endpoints to retrieve basic metadata about an onion address and a few endpoints to run search queries for BTC addresses, email addresses and SSH keys. I’ve also recently added support for direct integration with Maltego so users can use my tools directly in their existing workflow.

Beyond crawling, OSINT.PARTY also exposes both an HTTP and a Maltego API endpoint that let researchers and other interested people query my dataset. It’s a bit rough, but I’m planning to eventually work on a GUI and turn this into a Shodan-like product.

There are also a few other features in the pipeline such as full text & title search once I figure out how to deal with CSAM and work around the rules & regulations.

The project is always evolving and I’m always open to suggestions and new ideas from users. Give it a try and see if there is something you are missing : – )

What is your methodology for researching the darkweb entities and why do it in the first place?

I collect as much metadata as I can and start correlating it with data that I find elsewhere. I mainly focus on de-anonymizing servers and websites because that usually leads to arrests 🙂

As to why – it’s because I enjoy chasing malicious actors: finding out where their servers are hosted, figuring out how terrible their OPSEC is, sharing it with law enforcement and then seeing the website go offline a few weeks or months later. I hope to one day be able to turn it into a job and pwn markets all day 🙂

What learning resources would you recommend to people who are interested in darkweb investigations?

It really depends on what you are after. If you just want to de-anonymize servers, familiarize yourself with the basics of how web applications function. Sometimes an application has to call out to external resources to retrieve an image or a URL that you provide, and that request can reveal where the server really lives. There are a few good blog posts out there that explain these basic principles – like this.

For myself – I always look at investigations from a metadata perspective. I try to collect everything out there and just put it into a big Maltego project.

What is the most outrageous / strangest thing you encountered on the darkweb?

A Russian carding operation that had failed to properly protect their admin interface, allowing a malicious actor to drain their hot Monero and BTC wallets.

I’ve recently started running more invasive de-anonymization scans that involve looking for SVN, Git and HG directories that are served on the web. Via this I managed to de-anonymize a specific carding shop and extract their full source code, the names of the authors and the location of their build server.

I passed this information on to some friends in law enforcement and started browsing around the code.
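To make the idea of an exposed version-control directory concrete, here is a small Python sketch of how such a check could be structured. It is an assumption-laden illustration, not the scanner described in the interview: the probe paths, the content heuristics and the names `VCS_PROBES`, `candidate_urls` and `looks_exposed` are all chosen for this example, and the code deliberately performs no network requests.

```python
import re
from urllib.parse import urljoin

# Probe path -> heuristic that decides whether a response body looks like the
# real metadata file rather than a generic error page. These heuristics are
# assumptions for illustration and will not cover every repository layout.
VCS_PROBES = {
    # .git/HEAD is either a symbolic ref or a bare 40-character commit hash.
    ".git/HEAD": lambda body: body.startswith(b"ref: refs/")
    or re.fullmatch(rb"[0-9a-f]{40}\s*", body) is not None,
    # Modern SVN working copies keep their state in an SQLite database.
    ".svn/wc.db": lambda body: body.startswith(b"SQLite format 3\x00"),
    # Mercurial lists repository requirements in a small text file.
    ".hg/requires": lambda body: b"revlogv1" in body,
}


def candidate_urls(base_url):
    """Map each probe path to the full URL to request for a given site."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    return {path: urljoin(base, path) for path in VCS_PROBES}


def looks_exposed(path, status, body):
    """Return True if a response for `path` looks like real VCS metadata."""
    return status == 200 and VCS_PROBES[path](body)
```

Checking the body and not just the status code matters because many sites answer every unknown path with a 200 and a custom error page. Only run checks like this against systems you are authorized to test.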

At some point I figured out that they had a few unprotected endpoints that allow direct withdrawals of the BTC & XMR in their hot wallets. I’ve never done anything with it but I suspect others might have as I’ve seen the admins hastily change or remove the endpoints.

How much of the darkweb (in your estimate) is illegal content vs privacy or free speech enabling stuff?

I’d say a large part of the darkweb is just spam, trash and illegal content. There is very little actual good stuff out there. My tracker has ~100,000 onions and most of those can be classified as phishing sites, fake sites and other malicious content.

Do you know any positive or funny darkweb stories? As in, not related to illicit activity?
 
Hmm. One of the more fun Tor stories that I know about is how a college student got nabbed for making a bomb threat: “FBI agents tracked Harvard bomb threats despite Tor” (The Verge).
 
It basically boils down to “if you are the only one using Tor and you do something with Tor you are going to get nabbed : – )”

Can you share some darkweb opsec tips / privacy setup methods for those embarking on darkweb research?

Use separate identities. Do everything inside a VM or on a dedicated computer. And take breaks: there is a lot of bad shit out there, and sometimes you just need to take some time off an investigation to give your head a rest.

What is your daily driver operating system? What do you like seeing in an investigator’s OS build?
 
My “daily driver” for OSINT related stuff is an old Lenovo X230 running Qubes OS with Heads.
 
I’m very paranoid about the physical security of my machines so I’ve gone above and beyond to build a reasonably secure workstation that I feel safe leaving unattended at times.
 
My laptop is covered in tamper evident material, the BIOS & TPM are covered in a big blob of epoxy, /boot is signed with a PGP key, I’ve got HOTP and TOTP going, and Heads pulls various measurements and only releases the FDE key when the measurements + a password match. It’s a system that I feel safe leaving somewhere unattended 🙂
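For readers unfamiliar with the HOTP and TOTP codes mentioned above, the following Python sketch shows how such one-time codes are derived according to RFC 4226 and RFC 6238. It only illustrates the standard algorithms. It is not taken from Heads, and the function names `hotp` and `totp` are chosen for this example.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte selects a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    return hotp(secret, unix_time // step, digits)
```

The point of such codes in a measured-boot setup is that the shared secret is only available when the firmware measurements match, so a correct code tells the owner the machine booted in the expected state.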
 
Outside of that I also use a boring Windows 10 workstation because sometimes a gamer needs to play some games.
 
What are your 3 favourite privacy-enabling online services, and what makes them your favourite?
 
  • Signal – Tried and tested secure messenger that just works.
  • Protonmail – Hosting my own mail always gets quite messy so I’m just defaulting to Proton despite some odd privacy issues.
  • Tor – It stinks. But there is nothing better out there.

Whatever else you think is important / want to mention or talk about?

Yes! If there is any law enforcement agency, other government agency or even a corporation out there that wants to know more about my research, work and other cool things that I’ve found – please reach out. I’d love to share my knowledge with others. The dataset that I have is massive and I love to collaborate with others to knock out bad actors!

doc@chaos.institute

NOTE: If you work in LE and investigate cybercrime, you absolutely should reach out to Doctor Chaos – just please remember to use your official LE work email address to contact him.
