Thanks for HIBP and this downloader. At first I was considering using it, but the API of HIBP passwords is so easy that I wrote a small shell script for it.
It was slow as hell, because it had no parallelism at all. It was far too slow for my taste, thus I was thinking about adding parallelism. And that’s when I stumbled on curl’s URL globbing feature.
curl is the swiss army knife of HTTP downloading and it supports patterns/globbing and massive parallelism, and pretty much every aspect of HTTP downloads (proxies, HTTP1/2/3, all SSL/TLS versions, etc.).
Here’s a single curl commandline that downloads the entire HIBP password hash database into the current working directory:
curl -s --remote-name-all --parallel --parallel-max 150 "https://api.pwnedpasswords.com/range/{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}"
On an always-free Oracle Cloud VM this finished (for me) in 13.5 minutes. 😉
The URL globbing and the --remote-name-all
options have been around in curl for ages (i.e. for over a decade) and the --parallel*
options have been added in Sep 2019 (v7.66.0).
So pretty much all “recent” Linux distros already contain a curl version that fully support this commandline.
Curl is cross-platform, e.g. you can download a Windows version too.
If you don’t have the necessary 30-40 GB free space for the entire hash dump, you can get away with less by downloading in smaller batches and instantly compressing them.
Here’s a command that you can fire and forget (i.e. disconnect / log out) on any Linux PC/server and it’ll do the job in batches:
d="$(date "+%F")"
nohup bash -c 'd="'$d'"; chars=(0 1 2 3 4 5 6 7 8 9 A B C D E F); printf -v joined "%s," "${chars[@]}"; charscomma="${joined%,}"; hibpd="$(pwd)/hibp_${d}"; for c in "${chars[@]}"; do prefix="hibp_${d}_${c}" dir="${hibpd}/${prefix}"; mkdir -p "$dir"; cd "$dir"; date; echo "Starting in $dir with prefix $c ..."; curl -s --remote-name-all --parallel --parallel-max 150 -w "%{url}\n" "https://api.pwnedpasswords.com/range/${c}{$charscomma}{$charscomma}{$charscomma}{$charscomma}"; cd "$hibpd"; BZIP2=-9 tar cjf "${prefix}.tar.bz2" "$prefix" && rm -r "$prefix"; done; echo "Finished"; date' > "hibp_$d.log" 2>&1 &!
This doesn’t support suspend and resume of the download job (other HIBP downloaders do), but since it finishes pretty quickly (if you have a good enough internet connection), I don’t see any reason for this feature.
You can easily assemble the server responses into a single ~38 GB file with the following commandline (on Linux):
find . -type f -print | egrep -ia '/[0-9a-f]{5}$' | xargs -r -d '\n' awk -F: '{ sub(/\r$/,""); print substr(FILENAME, length(FILENAME)-4, 5) $1 ":" $2 }' > hibp_all.txt
You can sort it easily based on the second field to get the most “popular” hashes:
sort -t: -k2 -rn hibp_all.txt | head -n100
To get just the most popular hashes:
sort -t: -k2 -rn hibp_all.txt | head -n100 | cut -d: -f1
The cp command, short for "copy," is the main Linux utility for duplicating files and directories. Whether…
Introduction In digital investigations, images often hold more information than meets the eye. With the…
The cat command short for concatenate, It is a fast and versatile tool for viewing and merging…
What is a Port? A port in networking acts like a gateway that directs data…
The ls command is fundamental for anyone working with Linux. It’s used to display the files and…
The pwd (Print Working Directory) command is essential for navigating the Linux filesystem. It instantly shows your…