Have You Been Pwned? Prevent the Use of Leaked Passwords with New Security Feature
While we're waiting for the v4.6 LTS release of Ibexa DXP, let's look into a useful feature that was introduced in v4.5 in May.
Passwords are leaked all the time. Sometimes attackers are going after specific users, but most often a vulnerability in some software or service is exploited to grab huge sets of user data, sometimes including password hashes. In poorly designed systems even clear-text passwords could be leaked. Such "dumps" are sold for profit and bought by those who wish to make further profit from the users' data, for instance by ransom attacks, phishing, or other crimes. These exploits are made easier by the fact that users often use the same password on several different services. So, what can we do to protect our users?
Users' data is dumped by the hundreds of millions of records. Service owners must of course inform their own users that a breach has happened, and the EU's GDPR imposes big fines on companies that fail to do this. But even when users are informed, will they do the right thing and change their passwords on all services where they have used that same password? Not always, and that sets them up for further attacks.
White hats to the rescue
The good thing about these breaches is that the dumps eventually end up in public forums, where the good guys also get hold of the data. One of these good guys is Troy Hunt, a security researcher and Microsoft Regional Director, who set up the service https://haveibeenpwned.com, which makes leaked usernames and password hashes searchable by anyone, for free. How is that a good thing, you may ask? Well, for one, you can search using your own email address to find out if your data has been leaked, and from where. That's certainly useful for those who know this can be done, but the API takes it to the next level, helping secure even those who know nothing about password security.
Using the API, we as website owners can check whether a password has been exposed whenever a user creates a new account on our site or changes their password. We can't check existing passwords, since we don't have them; we only have the hashes. We could check the password every time a user logs in, but that would amount to a lot of unnecessary API requests for the same passwords, repeatedly. We owe it to Troy Hunt to use his service efficiently.
Checking passwords in Ibexa DXP
Symfony has implemented the haveibeenpwned API as the NotCompromisedPassword constraint, which can be used like any other constraint in a validator. We have made use of this constraint as a configurable password validation rule in Ibexa DXP. Since v4.5, it's ready for you to use.
To avoid surprises when you upgrade, this new password rule is not enabled by default. But it's easy to start using it: In the DXP backend, edit the User content type. Expand the details for the User Account field definition. Check the box for "Password must not be contained in a public breach" and store the content type. That's it!
While you're here, you may want to consider the other password rule settings. I wrote a little about them before, see the section on "Login and user information" in the Security Checklist blog post.
But is it safe to send passwords to some external API?
No! So, it's a good thing we're not doing that. Let's take a moment to appreciate the clever way it works. On the face of it, you'd think there are only two ways to perform this check: Either we could send the password (or a hash of it) to the API, which checks if the same exists in its database, and returns a yes/no answer. As much as Troy Hunt seems to be a decent fellow, we shouldn't trust him and his service that much.
Or we could download the entire database from his servers, install it at our server, and search through it every time we want to check a password. This would be slow and take up a lot of space, and of course we'd have to update it frequently. We are in fact allowed to download it, but we wouldn't want to. So, what alternative could possibly exist?
The solution: k-anonymity
The clever solution is a middle ground between the two options above. We make a hash of the password we're about to check, using the same algorithm as the API uses (SHA-1). We then take the first five characters of the hash (five of its 40 hexadecimal characters, a small fraction) and use them as input in a range query to the API. The API searches through its database and returns a list of all the hashes that begin with the same five characters, about 800 to 1,000 entries. To save bandwidth, each entry contains only the remainder of the hash after the shared prefix.
We then search through this short list locally, which takes no time at all. Either our password hash is in that list, which means the password must be rejected as invalid and the user told that it has been leaked, or it's not in the list, and all is well. Importantly, the API never learns the outcome. It knows only that our password is either in the returned list of up to about 1,000 entries (in which case we'll discard it), or not in the database at all. It also knows the first five characters of the hash, which, given the length and properties of the hash, is not useful for any nefarious purpose.
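The local part of this check can be sketched in a few lines of Python. This is only an illustration of the principle, not the code that Symfony or Ibexa DXP runs. The function names are made up for this example, and `fetch_range` stands in for whatever HTTP client you would use to call the range endpoint (https://api.pwnedpasswords.com/range/ followed by the five-character prefix). The sketch assumes the response format of one `SUFFIX:COUNT` pair per line.

```python
import hashlib
from typing import Callable, Tuple


def split_hash(password: str, prefix_len: int = 5) -> Tuple[str, str]:
    """Return (prefix, suffix) of the upper-case SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:prefix_len], digest[prefix_len:]


def breach_count(suffix: str, range_body: str) -> int:
    """Search the range response locally for our hash suffix.

    Assumes each line of the response looks like "SUFFIX:COUNT".
    Returns 0 when the suffix is not in the list.
    """
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def is_compromised(password: str, fetch_range: Callable[[str], str]) -> bool:
    """Only the five-character prefix is ever handed to fetch_range."""
    prefix, suffix = split_hash(password)
    return breach_count(suffix, fetch_range(prefix)) > 0
```

Note that the password and its full hash never leave `is_compromised`; the only value passed to the outside world is the prefix.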
This algorithm is derived from the mathematical concept of k-anonymity, which is used for instance in medical research on anonymized patient records. A data set has k-anonymity if every record in it is indistinguishable from at least k - 1 other records with respect to the attributes that could identify a person. In our case, k is about 800 to 1,000: that is the number of hashes sharing our prefix, and only we will know if our password is one of them. For more details, please check out this excellent post on the Cloudflare Blog: Validating Leaked Passwords with k-Anonymity, by Junade Ali.
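To make the definition concrete, here is a small Python sketch that computes k for a toy data set. The records and field names are invented for illustration, and the function name is hypothetical. It groups records by their potentially identifying attributes and reports the size of the smallest group.

```python
from collections import Counter
from typing import Dict, List, Sequence


def anonymity_level(records: List[Dict[str, str]],
                    identifying_fields: Sequence[str]) -> int:
    """Return the largest k for which the records are k-anonymous.

    Records are grouped by the values of the identifying fields;
    k is the size of the smallest group (0 for an empty data set).
    """
    groups = Counter(
        tuple(record[field] for field in identifying_fields)
        for record in records
    )
    return min(groups.values()) if groups else 0


# Made-up patient records: age band and postcode prefix could identify
# a person, the diagnosis is the sensitive value.
patients = [
    {"age": "30-39", "postcode": "01**", "diagnosis": "flu"},
    {"age": "30-39", "postcode": "01**", "diagnosis": "asthma"},
    {"age": "40-49", "postcode": "02**", "diagnosis": "flu"},
    {"age": "40-49", "postcode": "02**", "diagnosis": "diabetes"},
    {"age": "40-49", "postcode": "02**", "diagnosis": "flu"},
]

k = anonymity_level(patients, ["age", "postcode"])  # smallest group has 2 records
```

In the password check, the five-character hash prefix plays the role of the identifying attributes, and every hash sharing that prefix belongs to the same group.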
In conclusion, we hope that you will enable this feature. You have nothing to lose, and your users will benefit greatly.