Summary
Yahoo suffered the largest data breach ever recorded: an August 2013 intrusion that, after later revisions, was found to have exposed all three billion of its user accounts, plus a separate state-sponsored 2014 breach of about 500 million accounts. Stolen data included names, emails, phone numbers, dates of birth, security questions and answers, and, in the larger 2013 breach, passwords hashed with the weak, fast MD5 algorithm, which made them practical to crack. In the 2014 breach the attackers also stole Yahoo's account-management tooling and forged authentication cookies to log into accounts with no password at all. Yahoo knew of the breaches but did not disclose them until 2016, during Verizon's acquisition. The delay cut the purchase price by $350 million and drew the first-ever SEC fine for failing to disclose a breach. It is the lesson in strong password hashing, session-cookie integrity, MFA, and timely, honest disclosure.
How it happened
There were at least two distinct breaches, and Yahoo got both halves wrong.
The 2013 breach is the record-setter. Attackers made off with Yahoo's entire user database, which after a 2017 revision was confirmed to cover all three billion accounts that existed at the time. The passwords were "protected" with MD5, a hashing algorithm so fast and outdated that cracking large batches of it is routine; many were effectively recoverable. Worse, the security questions and answers, the things people reuse to recover other accounts, were stored partly in the clear.
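To make "cracking large batches is routine" concrete, here is a minimal sketch of a dictionary attack. It assumes the hashes are plain, unsalted MD5; the leaked rows and the wordlist are hypothetical, and nothing here reflects Yahoo's actual schema.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Unsalted MD5, standing in for a weak legacy storage scheme."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def crack(leaked_hashes: set, wordlist: list) -> dict:
    """Return {hash: password} for every leaked hash found in the wordlist."""
    # Each candidate is hashed once and then checked against every leaked
    # account at the same time: without a salt, equal passwords always
    # produce equal hashes.
    lookup = {md5_hex(word): word for word in wordlist}
    return {h: lookup[h] for h in leaked_hashes if h in lookup}


# Hypothetical leaked rows and wordlist, for illustration only.
leaked = {
    md5_hex("password1"),
    md5_hex("qwerty"),
    md5_hex("T7#vQ9!mZ2-long-random"),
}
recovered = crack(leaked, ["123456", "password1", "qwerty", "letmein"])
print(sorted(recovered.values()))  # ['password1', 'qwerty']
```

The long random password survives only because it is not in the wordlist. Real attacks use far larger wordlists and mutation rules, and a fast hash makes trying each candidate cheap.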
The 2014 breach, attributed to Russian state actors, was smaller (about 500 million accounts) but technically nastier. The attackers stole the proprietary code and secret keys behind Yahoo's account-management system, then used them to forge authentication cookies. A forged cookie let them open a targeted account directly, with no password and no login prompt. While the stolen database covered all 500 million accounts, prosecutors say the cookie forging was actually used to break into about 6,500 specifically targeted accounts, including those of journalists, government officials, and company staff. (The 2014 passwords were mostly protected with the stronger bcrypt, unlike the 2013 MD5 set.) US prosecutors later indicted two officers of Russia's FSB and two criminal hackers for it.
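Why stolen keys are enough can be shown with a generic sketch of an HMAC-signed cookie. This is not Yahoo's actual cookie format, which the sources here do not describe; the function names, key, and user ID are all hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def mint_cookie(user_id: str, signing_key: bytes) -> str:
    """Issue 'user_id.tag', where tag is an HMAC-SHA256 over the user ID."""
    tag = hmac.new(signing_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{user_id}.{tag}"


def verify_cookie(cookie: str, signing_key: bytes) -> Optional[str]:
    """Return the user ID if the tag checks out, otherwise None."""
    user_id, sep, tag = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(signing_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Compare as bytes, in constant time.
    if hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return user_id
    return None


server_key = b"hypothetical-server-side-secret"

# An attacker without the key cannot produce a valid tag...
guess = mint_cookie("victim@example.com", b"attacker-guess")
print(verify_cookie(guess, server_key))   # None

# ...but one holding a stolen copy of the key mints a cookie the server
# cannot tell apart from its own. No password is ever checked.
stolen_key = server_key
forged = mint_cookie("victim@example.com", stolen_key)
print(verify_cookie(forged, server_key))  # victim@example.com
```

The signature only proves that the cookie was made by someone holding the key, which is why the key itself has to be guarded as closely as the password database.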
The damage
Three billion accounts is, to date, the largest breach in history, and the stolen data (names, phone numbers, birth dates, security answers, and crackable password hashes) is exactly the raw material that fuels years of downstream attacks. Passwords recovered from those hashes feed credential-stuffing campaigns against every other site, because so many people reuse them. The handling made it worse: Yahoo had known about the breaches well before it told anyone, disclosing only in 2016 as Verizon was buying the company. Verizon knocked $350 million off the price. The US Securities and Exchange Commission later fined the successor company $35 million, its first enforcement action for failing to disclose a breach to investors, on top of a roughly $117.5 million class-action settlement. Of the four men indicted, only the Canadian hacker Karim Baratov was ever prosecuted, sentenced to five years in a US prison; the two FSB officers remain in Russia, beyond reach.
Why Yahoo still matters
Yahoo is the standing argument for four basics. First, store passwords properly: never MD5 or any fast, unsalted hash, but a slow, salted one like Argon2, scrypt, or bcrypt (the first two are also memory-hard), so a stolen database is not an open one. Second, protect the integrity of session tokens: if attackers can steal the keys that mint your authentication cookies, they walk in without ever needing a password, so those keys are crown jewels. Third, offer and push MFA, so a cracked or stuffed password is not enough on its own. And fourth, disclose breaches promptly and honestly: Yahoo's concealment compounded the damage, cratered its sale price, and set the legal precedent that hiding a breach is itself a punishable failure.
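As a sketch of the first basic, the snippet below hashes and verifies a password with scrypt from Python's standard library. The cost parameters are assumed starting values, not a recommendation tuned for any particular system, and the function names are illustrative.

```python
import hashlib
import hmac
import os

# Assumed cost parameters; higher n means more CPU time and memory per guess.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple:
    """Return (salt, digest) using a fresh random salt for every password."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))                   # False
```

The per-password salt means two users with the same password get different digests, so an attacker has to attack each row separately, and the deliberate slowness makes each guess expensive.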
How to fix it
- Invalidate all sessions and forged cookies, and rotate the signing secrets used to mint authentication tokens.
- Force password resets and migrate every stored password to a slow, salted hash (bcrypt, scrypt, or Argon2).
- Reset security questions and push users onto MFA.
- Disclose promptly and accurately to users and regulators; concealment compounds the damage.
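One commonly described way to migrate stored hashes without waiting for every user to log in is to wrap each legacy digest in a slow hash immediately, then replace it with a direct slow hash at the next successful login. The sketch below assumes the legacy scheme is unsalted MD5; the record layout and scheme labels are hypothetical, and this is not a description of what Yahoo did.

```python
import hashlib
import hmac
import os


def legacy_md5(password: str) -> bytes:
    return hashlib.md5(password.encode("utf-8")).digest()


def slow_hash(data: bytes, salt: bytes) -> bytes:
    # Assumed scrypt cost parameters; tune for real hardware.
    return hashlib.scrypt(data, salt=salt, n=2**14, r=8, p=1, dklen=32)


def wrap_legacy_record(md5_digest: bytes) -> dict:
    """Step 1, offline: protect an existing MD5 digest without knowing the password."""
    salt = os.urandom(16)
    return {"scheme": "scrypt(md5)", "salt": salt, "hash": slow_hash(md5_digest, salt)}


def verify_and_upgrade(password: str, record: dict) -> tuple:
    """Step 2, at login: check the password and drop the MD5 layer on success."""
    if record["scheme"] == "scrypt(md5)":
        candidate = slow_hash(legacy_md5(password), record["salt"])
        ok = hmac.compare_digest(candidate, record["hash"])
        if ok:
            salt = os.urandom(16)
            record = {
                "scheme": "scrypt",
                "salt": salt,
                "hash": slow_hash(password.encode("utf-8"), salt),
            }
        return ok, record
    candidate = slow_hash(password.encode("utf-8"), record["salt"])
    return hmac.compare_digest(candidate, record["hash"]), record


wrapped = wrap_legacy_record(legacy_md5("hunter2"))
ok, upgraded = verify_and_upgrade("hunter2", wrapped)
print(ok, upgraded["scheme"])  # True scrypt
```

Wrapping protects the data at rest from the moment it runs. It does not replace a forced reset for accounts whose old hashes have already been stolen.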
How to prevent it
- Hash passwords with a slow, salted algorithm (Argon2, scrypt, or bcrypt; the first two are also memory-hard), never MD5 or any fast, unsalted hash.
- Sign and validate session cookies and tokens with protected secrets so they cannot be forged, and bind them to context.
- Offer and encourage MFA so a stolen or cracked password is not enough on its own.
- Protect account-management and admin tooling as crown jewels; its theft is what enabled the cookie forgery.
- Detect and disclose breaches quickly; have an incident-response and notification plan ready before you need it.
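For the MFA item, a time-based one-time password (TOTP, RFC 6238) is one widely used second factor. The following is a minimal standard-library sketch of generating and checking codes; the function names and the one-step tolerance window are assumptions, and a production system would also need secure secret storage, rate limiting, and replay protection.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for a given counter value."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30) -> str:
    """Time-based code: HOTP over the number of `step`-second intervals elapsed."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step))


def verify_totp(secret: bytes, submitted: str, at: Optional[float] = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current interval and `window` intervals either side."""
    now = time.time() if at is None else at
    counter = int(now // step)
    for delta in range(-window, window + 1):
        if counter + delta < 0:
            continue
        expected = hotp(secret, counter + delta)
        if hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8")):
            return True
    return False


# Shared secret from the RFC 6238 test vectors (SHA-1 variant).
secret = b"12345678901234567890"
print(totp(secret, at=59))                   # 287082
print(verify_totp(secret, "287082", at=59))  # True
```

Because the code depends on a secret that never appears in the password database, a cracked or reused password alone no longer opens the account.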
References
- https://en.wikipedia.org/wiki/Yahoo_data_breaches
- https://www.justice.gov/archives/opa/pr/us-charges-russian-fsb-officers-and-their-criminal-conspirators-hacking-yahoo-and-millions
- https://www.fenwick.com/insights/publications/yahoos-35m-sec-settlement-takeaways-from-the-first-enforcement-action-for-failure-to-disclose-a-data-breach
- https://techcrunch.com/2018/05/29/yahoo-hacker-sentenced-karim-baratov-fsb/
- https://www.npr.org/sections/thetwo-way/2017/10/03/555016024/every-yahoo-account-that-existed-in-mid-2013-was-likely-hacked
Related vulnerabilities
- HIGH OPSEC-INTERNET-ARCHIVE-2024
The Internet Archive, the nonprofit behind the Wayback Machine, had a brutal October 2024: a data breach, a website defacement, and a wave of DDoS attacks, all at once. Underneath the chaos was an unglamorous root cause. An authentication token sat in plain text in a public config file; the team rotated it repeatedly, but each new token landed right back in the same exposed file, so the leak never actually closed. With it, an attacker downloaded the source code, found more credentials hardcoded inside, and walked out with a database of 31 million users. Weeks later a second token from that same stolen code, for the support system, exposed 800,000 support tickets, some with people's ID documents. It is the lesson that rotating a secret is useless if it goes straight back into a public file, and that one leak unravels everything.
- HIGH OPSEC-MERCEDES-BENZ-2024
Publicly disclosed January 30, 2024, a Mercedes-Benz employee accidentally committed a GitHub authentication token to a public repository, leaving it exposed from September 29, 2023. RedHunt Labs found the token during an internet-wide scan; it granted unrestricted, unmonitored access to Mercedes-Benz's internal GitHub Enterprise Server, allowing anyone to download private source-code repositories that could contain API keys, cloud access keys, database connection strings, blueprints, and SSO passwords. After notification, the token was revoked on January 24, 2024. Mercedes-Benz stated customer data was not affected but could not confirm whether anyone besides the researchers accessed the repositories during the exposure window.
- CRITICAL OPSEC-MIDNIGHT-BLIZZARD-2024
In January 2024, Microsoft revealed that Russia's foreign-intelligence service, the same APT29 behind SolarWinds, had been reading the email of its senior leadership. The way in was almost insulting in its simplicity: a forgotten, non-production test account with a weak password and no MFA. The attackers guessed the password by spraying common ones across many accounts, then pivoted through a forgotten over-privileged application to grant themselves access to corporate mailboxes, including those of executives and the security and legal teams. It is the lesson that your security is only as strong as the account you forgot about, and that even Microsoft's perimeter fell to a missing MFA checkbox.
- HIGH OPSEC-OKTA-2023
Okta is an identity provider: the single front door thousands of companies use to log their employees into everything. So when Okta's customer-support system was breached in late 2023, the blast radius was a who's-who of security-conscious companies. The entry point was almost mundane. An employee had signed into their personal Google account on an Okta laptop and saved a corporate service-account password into it; the attacker got that password and walked into Okta's support system. There they downloaded diagnostic files that customers had uploaded, some of which contained live session tokens, and used those tokens to step directly into the customers' own Okta environments. It is the lesson that session tokens are as good as passwords, support systems are production systems, and a personal browser profile can be the crack in the wall.
- CRITICAL OPSEC-23ANDME-2023
23andMe held the most personal data there is: people's DNA. In 2023 attackers got into more than 18,000 accounts and, through a single social feature, turned that into the genetic and ancestry data of roughly 6.9 million people. The break-in required no flaw in 23andMe at all. Attackers simply took username-and-password pairs leaked from other companies' breaches and tried them, betting, correctly, that people reuse passwords. The accounts had no MFA, and 23andMe did not notice the five-month wave of automated logins. From those footholds, the attackers scraped relatives' data through an opt-in feature. The fallout (fines, a $50 million settlement, and ultimately bankruptcy and a fire-sale of the DNA database itself) shows that a breach can be fatal even when your own systems were never hacked.
- HIGH OPSEC-MICROSOFT-SAS-2023
Microsoft's AI research team shared open-source training data via an Azure Storage Shared Access Signature (SAS) token committed to a public GitHub repo around July 2020. The token was misconfigured to scope access to the entire storage account with full-control permissions instead of the intended read-only bucket, so anyone with the link could view, delete, and overwrite files. Wiz researchers discovered it in June 2023, finding 38 terabytes of exposed internal data including two employees' workstation disk backups with secrets, private keys, passwords, and over 30,000 internal Teams messages. Writable pickle-format models created a model-poisoning supply-chain risk; Microsoft revoked the token and reported no customer data was exposed.