In October 2024, a hacker breached the Internet Archive website (a non-profit digital library) and updated the site to show the following JavaScript alert message:
"Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach? It just happened. See 31 million of you…"
The hacker stole a 6.4 GB SQL file that contained the email addresses, hashed passwords, and screen names of 31 million users!
So, how did this happen?
Attack Flow (as per the threat actor’s claims):
The attacker gathers details of all domains and subdomains owned by the Internet Archive (IA); many automated tools can do this in a click.
The attacker then scans them for sensitive files > Identifies an exposed GitLab config file on one of the dev servers.
This GitLab config file contains a sensitive auth token with access to IA’s code base > Using this token, the attacker downloads the entire Internet Archive source code.
The attacker scans this source code further > Finds more tokens and credentials! > One such credential is a database connection string.
The attacker now accesses the database > Downloads the IA user database > Gets access to emails, password hashes and more.
Finally, the attacker updates the database so that an alert message appears when users visit the site (defacement).
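The "scan the source code for tokens and credentials" step works just as well for defenders as for attackers. The sketch below is a minimal illustration, not a real scanner: the three regular expressions and the names PATTERNS, scan_text and scan_tree are assumptions made for this example, and dedicated secret-scanning tools ship far larger and better-tuned rule sets.

```python
import re
from pathlib import Path

# Hypothetical, deliberately small rule set for illustration only.
PATTERNS = {
    # GitLab personal access tokens commonly start with "glpat-".
    "gitlab_token": re.compile(r"glpat-[0-9A-Za-z_\-]{20,}"),
    # scheme://user:password@host style database URLs.
    "db_connection_string": re.compile(
        r"\b(?:postgres(?:ql)?|mysql|mongodb)://[^\s:/@]+:[^\s@]+@[^\s/]+"
    ),
    # Assignments such as api_key = "...." with a quoted value of 8+ characters.
    "generic_secret": re.compile(
        r"(?i)(?:api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text, source="<memory>"):
    """Return (source, line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((source, lineno, name))
    return findings


def scan_tree(root):
    """Scan every readable file under root (for example, a checked-out repository)."""
    findings = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file, skip it
        findings.extend(scan_text(text, source=str(path)))
    return findings


if __name__ == "__main__":
    # Made-up sample content; none of these values are real credentials.
    sample = "\n".join([
        'DATABASE_URL = "postgresql://app:s3cretpw@db.internal.example/users"',
        'greeting = "hello"',
        'ci_credential = "glpat-ABCDEFGHIJKLMNOPQRST"',
    ])
    for finding in scan_text(sample, source="sample.py"):
        print(finding)
```

Running something like scan_tree(".") on your own repositories before an attacker does is the point: every hit is a secret that should be moved out of the code and rotated.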
Key Insights:
Subdomains are like different rooms in the same house, each with its own specific door and purpose. Leaving even one door open can expose sensitive data across the entire domain.
Know your territory better than your enemy. As an outsider, a hacker needs to "identify" your subdomains. As the owner, you already "know" (or should know) what your subdomains are, what is hosted on them, how they are configured for security, and so on. Leverage this advantage.
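One way to use that advantage is to check your own subdomain inventory for files that should never be publicly readable. The sketch below is an assumption-laden illustration: the paths in SENSITIVE_PATHS, the example hostnames, and the function names are all hypothetical, and the source does not say which path the exposed config file was served from. Only run such checks against hosts you own.

```python
import urllib.error
import urllib.request
from itertools import product

# Hypothetical examples of paths that should not be served publicly.
SENSITIVE_PATHS = ["/.git/config", "/.env", "/config.yml"]


def candidate_urls(subdomains, paths=SENSITIVE_PATHS):
    """Combine every known subdomain with every sensitive path."""
    return [f"https://{host}{path}" for host, path in product(subdomains, paths)]


def http_status(url, timeout=5):
    """Return the HTTP status code for url, or None if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 403, 404, ... are still valid answers
    except OSError:
        return None  # DNS failure, timeout, refused connection (URLError is an OSError)


def find_exposed(urls, fetch_status=http_status):
    """Return the URLs whose status is 200, i.e. the file is publicly readable."""
    return [url for url in urls if fetch_status(url) == 200]


if __name__ == "__main__":
    # Offline demonstration with a fake fetcher instead of real network calls.
    inventory = ["www.example.org", "dev.example.org"]
    fake_responses = {"https://dev.example.org/.git/config": 200}
    exposed = find_exposed(
        candidate_urls(inventory),
        fetch_status=lambda url: fake_responses.get(url, 404),
    )
    print(exposed)
```

Passing the fetcher in as a parameter keeps the logic testable without touching the network; in real use the default http_status would be run on a schedule against the full inventory.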
Two weeks after the initial hack, the hacker replied to support tickets from an IA support email account, mocking the entire response process. This was possible because the stolen API keys had not been rotated.
The speed of your response depends on the depth of your preparation. In security, preparation means everything: having a trained IR team, having the right detections, being able to rapidly identify live secrets and rotate them, and so on. The more prepared you are, the faster you can control the situation and limit the damage.
Attackers don’t care whether you’re a for-profit or a non-profit; if you have a digital presence and a large following, you are going to be a target. Every organization with digital assets needs to understand this, regardless of its mission.
The leaked passwords are bcrypt hashed. Bcrypt is specifically designed to make password cracking difficult by incorporating a "cost factor" (also called the work factor) that determines how long it takes to compute a hash. The cost factor is adjustable, and each increment doubles the work, so administrators can raise it as computing power improves. Bcrypt also salts every password and runs many rounds of its key setup, so an attacker cannot reuse precomputed tables and has to pay the full cost for every guess against every account. This makes large-scale brute-forcing expensive, although weak or commonly used passwords can still fall to a targeted guessing attack.
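The effect of the cost factor is easy to put into numbers. The sketch below uses only the standard library and does no actual hashing: it reads the cost out of a bcrypt hash string (the "$2b$12$..." format) and estimates how the attacker's guess rate shrinks as the cost grows. The baseline of 1,000 guesses per second at cost 10 is an invented figure for illustration, not a benchmark, and the placeholder hash is not a real one.

```python
def parse_bcrypt_cost(hash_string):
    """Extract the cost factor from a bcrypt hash such as "$2b$12$<salt+digest>"."""
    parts = hash_string.split("$")
    # Expected shape: ["", "2b", "12", "<22-char salt + 31-char digest>"]
    if len(parts) != 4 or not parts[1].startswith("2"):
        raise ValueError("not a bcrypt hash")
    return int(parts[2])


def key_setup_rounds(cost):
    """bcrypt repeats its expensive key setup 2**cost times."""
    return 2 ** cost


def guesses_per_second(base_rate, base_cost, cost):
    """Scale an assumed guess rate measured at base_cost to another cost factor."""
    return base_rate / 2 ** (cost - base_cost)


def worst_case_seconds(candidates, rate):
    """Time to try every candidate password against ONE salted hash."""
    return candidates / rate


if __name__ == "__main__":
    placeholder_hash = "$2b$12$" + "x" * 53  # shape only, not a real hash
    cost = parse_bcrypt_cost(placeholder_hash)
    rate = guesses_per_second(base_rate=1000, base_cost=10, cost=cost)  # assumed baseline
    keyspace = 26 ** 6  # every 6-letter lowercase password
    days = worst_case_seconds(keyspace, rate) / 86400
    print(f"cost={cost}, rounds={key_setup_rounds(cost)}, "
          f"rate={rate:.0f}/s, worst case for one account={days:.1f} days")
```

Because each hash has its own salt, that worst-case time applies per account; multiplied across 31 million users, exhaustive cracking becomes impractical, while a short list of common passwords remains cheap to try.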