Thanks for HIBP and this downloader. At first I was considering using it, but the HIBP passwords API is so simple that I wrote a small shell script for it instead.
It was painfully slow, because it had no parallelism at all, so I started thinking about adding some. That's when I stumbled on curl's URL globbing feature.
curl is the Swiss Army knife of HTTP downloading: it supports URL patterns/globbing, massive parallelism, and pretty much every other aspect of HTTP downloads (proxies, HTTP/1.1/2/3, all SSL/TLS versions, etc.).
Here’s a single curl commandline that downloads the entire HIBP password hash database into the current working directory:
curl -s --remote-name-all --parallel --parallel-max 150 "https://api.pwnedpasswords.com/range/{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}"
On an always-free Oracle Cloud VM this finished (for me) in 13.5 minutes. 😉
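As a side note, bash's own brace expansion works the same way as curl's URL globbing, so you can count the URLs such a command generates without touching the network. A small offline sketch (just an illustration, not part of the download):

```shell
#!/usr/bin/env bash
# bash brace expansion mirrors curl's {0,1,...,F} URL globbing, so we can
# count the generated URLs offline. Two sets of 16 characters -> 16*16 URLs;
# the full command above uses five sets, i.e. 16^5 = 1,048,576 requests.
urls=( https://api.pwnedpasswords.com/range/{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} )
echo "${#urls[@]}"   # 256
```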
The URL globbing and the --remote-name-all options have been around in curl for ages (well over a decade), and the --parallel* options were added in September 2019 (v7.66.0). So pretty much every "recent" Linux distro already ships a curl version that fully supports this command line.
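If you're unsure whether your curl is new enough, here's a quick sketch for comparing its version against 7.66.0 (it assumes sort -V, i.e. GNU coreutils; the version_ge helper is my own, not a curl feature):

```shell
#!/usr/bin/env bash
# version_ge HAVE NEED -> true if HAVE >= NEED; sort -V compares dotted versions.
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}
have=$(curl --version 2>/dev/null | awk 'NR==1{print $2}')
if version_ge "${have:-0}" 7.66.0; then
  echo "curl $have supports --parallel"
else
  echo "curl ${have:-not found} is too old for --parallel"
fi
```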
Curl is cross-platform, e.g. you can download a Windows version too.
If you don’t have the necessary 30-40 GB free space for the entire hash dump, you can get away with less by downloading in smaller batches and instantly compressing them.
Here’s a command that you can fire and forget (i.e. disconnect / log out) on any Linux PC/server and it’ll do the job in batches:
d="$(date "+%F")"
nohup bash -c 'd="'$d'"; chars=(0 1 2 3 4 5 6 7 8 9 A B C D E F); printf -v joined "%s," "${chars[@]}"; charscomma="${joined%,}"; hibpd="$(pwd)/hibp_${d}"; for c in "${chars[@]}"; do prefix="hibp_${d}_${c}"; dir="${hibpd}/${prefix}"; mkdir -p "$dir"; cd "$dir" || exit 1; date; echo "Starting in $dir with prefix $c ..."; curl -s --remote-name-all --parallel --parallel-max 150 -w "%{url}\n" "https://api.pwnedpasswords.com/range/${c}{$charscomma}{$charscomma}{$charscomma}{$charscomma}"; cd "$hibpd"; BZIP2=-9 tar cjf "${prefix}.tar.bz2" "$prefix" && rm -r "$prefix"; done; echo "Finished"; date' > "hibp_$d.log" 2>&1 &
This doesn't support suspending and resuming the download job (other HIBP downloaders do), but since it finishes fairly quickly on a decent internet connection, I don't see much need for that feature.
You can easily assemble the server responses into a single ~38 GB file with the following commandline (on Linux):
find . -type f -print | grep -Eia '/[0-9a-f]{5}$' | xargs -r -d '\n' awk -F: '{ sub(/\r$/,""); print substr(FILENAME, length(FILENAME)-4, 5) $1 ":" $2 }' > hibp_all.txt
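To see what this does, here's a self-contained sketch on a single fabricated response file (the suffix and the count are made up, but correctly shaped):

```shell
#!/usr/bin/env bash
# One fabricated response file: the API names files after the 5-char hash
# prefix (here "21BD1") and each line holds "<35-char suffix>:<count>".
# The awk program strips the \r the server sends and glues prefix + suffix
# back into a full 40-char SHA-1, keeping the count.
dir=$(mktemp -d) && cd "$dir"
printf '0018A45C4D1DEF81644B54AB7F969B88D65:3\r\n' > 21BD1
out=$(find . -type f -print | grep -Eia '/[0-9a-f]{5}$' \
  | xargs -r -d '\n' awk -F: '{ sub(/\r$/,"");
      print substr(FILENAME, length(FILENAME)-4, 5) $1 ":" $2 }')
echo "$out"   # 21BD10018A45C4D1DEF81644B54AB7F969B88D65:3
```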
You can sort it easily based on the second field to get the most “popular” hashes:
sort -t: -k2 -rn hibp_all.txt | head -n100
To get just the hashes themselves (without the counts):
sort -t: -k2 -rn hibp_all.txt | head -n100 | cut -d: -f1
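On a fabricated three-line sample, the sort/head/cut pipeline behaves like this (the "hashes" are obviously made up):

```shell
#!/usr/bin/env bash
# Numeric, descending sort on the count field of "hash:count" lines,
# then keep only the hash of the top entry.
dir=$(mktemp -d)
printf 'AAA1:5\nBBB2:9000\nCCC3:42\n' > "$dir/sample.txt"
top=$(sort -t: -k2 -rn "$dir/sample.txt" | head -n1 | cut -d: -f1)
echo "$top"   # BBB2
```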