Words Scraper: Selenium-Based Web Scraper to Generate Password Lists

Words Scraper is a Selenium-based web scraper that builds password word lists from the text of web pages.

Installation

Download the Firefox webdriver (geckodriver) from https://github.com/mozilla/geckodriver/releases, then install it and the scraper:
$ tar xzf geckodriver-v{VERSION-HERE}.tar.gz
$ sudo mv geckodriver /usr/local/bin # Make sure it is in your PATH
$ geckodriver --version # Make sure webdriver is properly installed
$ git clone https://github.com/dariusztytko/words-scraper
$ sudo pip3 install -r words-scraper/requirements.txt

Use Cases

  • Scraping words from the target’s pages

$ python3 words-scraper.py -o words.txt https://www.example.com https://blog.example.com
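
Conceptually, the scraper reduces the visible text of each page to a set of unique words. The sketch below illustrates that step on a plain string. It is not taken from words-scraper.py: the function name `extract_words` and the word-matching regex are assumptions, and only the default minimum length of 3 comes from the usage text. In a Selenium session the input text could come from `driver.find_element(By.TAG_NAME, "body").text`.

```python
import re


def extract_words(text, min_length=3):
    """Return the sorted unique words in `text` with at least `min_length` characters."""
    # \w+ matches runs of letters, digits, and underscores (Unicode-aware in Python 3).
    candidates = re.findall(r"\w+", text)
    return sorted({word for word in candidates if len(word) >= min_length})


# Hypothetical page text; "An" is dropped because it is shorter than 3 characters.
words = extract_words("An example page: hello, hello world!")
print(words)
```

Case is preserved here on purpose, since capitalization often matters in passwords.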

The generated word list can be used for an online brute-force attack or for cracking password hashes:

$ hashcat -m 0 hashes.txt words.txt
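
In hashcat, mode 0 (`-m 0`) is raw MD5, so the command above hashes every word in words.txt and compares the result with the entries in hashes.txt. A minimal sketch of that dictionary check follows. The function name `crack_md5` and the sample data are hypothetical, and hashcat itself is far faster and supports rules and masks that this sketch does not.

```python
import hashlib


def crack_md5(target_hashes, candidates):
    """Map each matched MD5 hex digest to the candidate word that produced it."""
    targets = {h.strip().lower() for h in target_hashes}
    found = {}
    for word in candidates:
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        if digest in targets:
            found[digest] = word
    return found


# Hypothetical data standing in for hashes.txt and words.txt.
hashes = [hashlib.md5(b"example").hexdigest()]
print(crack_md5(hashes, ["hello", "example", "world"]))
```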

Use the --depth option to scrape words from linked pages as well. The optional --show-gui switch shows the browser window, which lets you track progress and take a quick look at each page:

$ python3 words-scraper.py -o words.txt --depth 1 --show-gui https://www.example.com
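
One way to picture the depth setting is as a breadth-first walk over links that stops following them after a fixed number of hops. The sketch below assumes that interpretation and is not the tool's actual code. `pages_to_scrape` and `get_links` are hypothetical names, and the link graph is made-up data standing in for real pages.

```python
from collections import deque


def pages_to_scrape(start_urls, get_links, depth=0):
    """Return URLs reachable from start_urls within `depth` link hops, in visit order."""
    start = list(dict.fromkeys(start_urls))  # drop duplicate start URLs, keep order
    seen = set(start)
    order = []
    queue = deque((url, 0) for url in start)
    while queue:
        url, level = queue.popleft()
        order.append(url)
        if level >= depth:
            continue  # depth limit reached: do not follow links from this page
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return order


# Hypothetical link graph standing in for real pages.
links = {
    "https://www.example.com": [
        "https://www.example.com/about",
        "https://www.example.com/blog",
    ],
    "https://www.example.com/blog": ["https://www.example.com/blog/post-1"],
}
print(pages_to_scrape(["https://www.example.com"], lambda u: links.get(u, []), depth=1))
```

With depth 0 only the start URLs are visited, which matches the documented default.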

The generated word list can be expanded with the words-converter.py script, which adds variants of each word with special characters and accents removed. For example, the Polish word źdźbło! is expanded into the following words:

  • źdźbło!
  • zdzblo!
  • źdźbło
  • zdzblo

$ cat words.txt | python3 words-converter.py | sort -u > words2.txt
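
The expansion of źdźbło! into four variants can be sketched as follows. This is an illustration under stated assumptions, not the code of words-converter.py: the function names are hypothetical, accents are removed through Unicode NFKD decomposition, and letters without a decomposition (such as the Polish ł) are handled by a small hand-written table that is far from complete.

```python
import unicodedata

# Letters such as "ł" have no Unicode decomposition, so they are mapped by hand.
# This table is an illustrative assumption and covers only one letter pair.
EXTRA_MAP = str.maketrans({"ł": "l", "Ł": "L"})


def strip_accents(word):
    """Replace accented letters with their unaccented base letters."""
    decomposed = unicodedata.normalize("NFKD", word)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.translate(EXTRA_MAP)


def strip_special(word):
    """Keep only letters and digits."""
    return "".join(ch for ch in word if ch.isalnum())


def variants(word):
    """Return the word plus its accent-free and special-character-free forms."""
    word = unicodedata.normalize("NFC", word)
    plain = strip_accents(word)
    return {word, plain, strip_special(word), strip_special(plain)}


print(sorted(variants("źdźbło!")))
```

Returning a set means a word with no accents or special characters yields a single entry, much as `sort -u` removes duplicates in the pipeline above.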

  • Scraping words from the target’s Twitter

Twitter pages load dynamically as you scroll. Use the --max-scrolls option to scroll the page and scrape the words that appear:

$ python3 words-scraper.py -o words.txt --max-scrolls 300 --show-gui https://twitter.com/example.com
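
A scrolling loop of this kind is commonly written with Selenium's `execute_script`. The sketch below assumes the scraper scrolls to the bottom of the page a fixed number of times and pauses after each scroll. The function name `scroll_page` is hypothetical, and the 1.0 second default mirrors the documented --page-scroll-delay value.

```python
import time


def scroll_page(driver, max_scrolls, scroll_delay=1.0):
    """Scroll to the bottom of the page up to max_scrolls times; return the scroll count."""
    scrolls = 0
    for _ in range(max_scrolls):
        # execute_script is the standard Selenium WebDriver call for running JavaScript.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(scroll_delay)  # give the page time to load more content
        scrolls += 1
    return scrolls
```

The `driver` argument can be any Selenium WebDriver instance, for example one created with `webdriver.Firefox()` after calling `driver.get(url)`.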

  • Scraping via Socks proxy

$ ssh -D 1080 -Nf {USER-HERE}@{IP-HERE} >/dev/null 2>&1
$ python3 words-scraper.py -o words.txt --socks-proxy 127.0.0.1:1080 https://www.example.com
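
Firefox reads its proxy settings from `network.proxy.*` preferences, so a `host:port` value like the one above can be translated into a handful of them. The sketch below shows one plausible mapping. The helper name `socks_proxy_prefs` is hypothetical, and the choice of SOCKS version 5 with remote DNS is an assumption rather than something the tool documents.

```python
def socks_proxy_prefs(proxy):
    """Turn "host:port" into Firefox preferences for a SOCKS proxy."""
    host, _, port = proxy.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {proxy!r}")
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.socks": host,
        "network.proxy.socks_port": int(port),
        "network.proxy.socks_version": 5,  # assumption: SOCKS5
        "network.proxy.socks_remote_dns": True,  # resolve DNS through the proxy
    }


prefs = socks_proxy_prefs("127.0.0.1:1080")
print(prefs)

# With Selenium installed, the preferences could be applied like this:
#   from selenium import webdriver
#   options = webdriver.FirefoxOptions()
#   for name, value in prefs.items():
#       options.set_preference(name, value)
#   driver = webdriver.Firefox(options=options)
```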

Usage

Usage: words-scraper.py [-h] [--depth DEPTH] [--max-scrolls MAX_SCROLLS]
                        [--min-word-length MIN_WORD_LENGTH]
                        [--page-load-delay PAGE_LOAD_DELAY]
                        [--page-scroll-delay PAGE_SCROLL_DELAY] [--show-gui]
                        [--socks-proxy SOCKS_PROXY] -o OUTPUT_FILE
                        url [url ...]

Words scraper (version: 1.0)

Positional Arguments:
  url                   URL to scrape

Optional Arguments:
  -h, --help            show this help message and exit
  --depth DEPTH         scraping depth, default: 0
  --max-scrolls MAX_SCROLLS
                        maximum number of the page scrolls, default: 0
  --min-word-length MIN_WORD_LENGTH
                        default: 3
  --page-load-delay PAGE_LOAD_DELAY
                        page loading delay, default: 3.0
  --page-scroll-delay PAGE_SCROLL_DELAY
                        page scrolling delay, default: 1.0
  --show-gui            show browser GUI
  --socks-proxy SOCKS_PROXY
                        socks proxy e.g. 127.0.0.1:1080
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        save words to file
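
The help text above has the shape of Python `argparse` output. As an aid to reading it, here is a parser reconstructed from that text alone. It is a sketch, not the author's code: the argument types (`int` for counts, `float` for delays) are inferred from the listed defaults, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser that accepts the options listed in the usage text."""
    parser = argparse.ArgumentParser(
        prog="words-scraper.py", description="Words scraper (version: 1.0)"
    )
    parser.add_argument("--depth", type=int, default=0, help="scraping depth, default: 0")
    parser.add_argument("--max-scrolls", type=int, default=0,
                        help="maximum number of the page scrolls, default: 0")
    parser.add_argument("--min-word-length", type=int, default=3, help="default: 3")
    parser.add_argument("--page-load-delay", type=float, default=3.0,
                        help="page loading delay, default: 3.0")
    parser.add_argument("--page-scroll-delay", type=float, default=1.0,
                        help="page scrolling delay, default: 1.0")
    parser.add_argument("--show-gui", action="store_true", help="show browser GUI")
    parser.add_argument("--socks-proxy", help="socks proxy e.g. 127.0.0.1:1080")
    parser.add_argument("-o", "--output-file", required=True, help="save words to file")
    parser.add_argument("url", nargs="+", help="URL to scrape")
    return parser


args = build_parser().parse_args(
    ["-o", "words.txt", "--depth", "1", "https://www.example.com"]
)
print(args.depth, args.output_file, args.url)
```

Only -o and at least one URL are required; every other option falls back to the default shown in the help text.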

R K
