Words Scraper: Selenium-Based Web Scraper to Generate Password Lists

Words Scraper is a Selenium-based web scraper that builds password word lists from the text of a target's web pages.

Installation

Download the Firefox webdriver (geckodriver) from https://github.com/mozilla/geckodriver/releases, then install it together with the scraper:

$ tar xzf geckodriver-v{VERSION-HERE}.tar.gz
$ sudo mv geckodriver /usr/local/bin # Make sure it is in your PATH
$ geckodriver --version # Make sure webdriver is properly installed
$ git clone https://github.com/dariusztytko/words-scraper
$ sudo pip3 install -r words-scraper/requirements.txt

Use Cases

  • Scraping words from the target’s pages

$ python3 words-scraper.py -o words.txt https://www.example.com https://blog.example.com
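How the scraper turns page text into words is not described here, so the following is only a minimal sketch of one plausible approach: split the visible text on whitespace, drop tokens shorter than the --min-word-length default of 3, and keep each word once. The function name `extract_words` is hypothetical, and the real words-scraper.py may tokenize differently.

```python
def extract_words(text, min_word_length=3):
    """Return unique whitespace-separated tokens of `text`, in first-seen order.

    Assumption: punctuation is kept (the converter example later in this
    article starts from a word that still ends in "!").
    """
    seen = set()
    words = []
    for token in text.split():
        if len(token) >= min_word_length and token not in seen:
            seen.add(token)
            words.append(token)
    return words


if __name__ == "__main__":
    # With Selenium, the text could come from something like
    # driver.find_element(By.TAG_NAME, "body").text (an assumption, not
    # necessarily what the tool does).
    sample = "Hi to the Example blog, the best blog!"
    print("\n".join(extract_words(sample)))
```

Keeping punctuation at this stage leaves the decision about special characters to a later step, such as the converter script.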

The generated word list can be used for an online brute-force attack or for cracking password hashes (here, -m 0 selects raw MD5):

$ hashcat -m 0 hashes.txt words.txt

Use the --depth option to also scrape words from linked pages. The optional --show-gui switch opens the browser window, so you can follow the progress and take a quick look at the page being scraped:

$ python3 words-scraper.py -o words.txt --depth 1 --show-gui https://www.example.com
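To illustrate what a scraping depth means, here is a rough sketch of a depth-limited, breadth-first crawl. It assumes depth 0 visits only the given URLs and depth 1 also visits pages they link to. `crawl` and the `get_links` callback are hypothetical names for illustration; this is not the tool's actual code.

```python
from collections import deque


def crawl(start_urls, depth, get_links):
    """Return URLs reachable within `depth` link hops, in visit order.

    `get_links(url)` is a caller-supplied function (hypothetical) that
    returns the URLs linked from a page.
    """
    seen = set()
    queue = deque()
    for url in start_urls:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))

    visited = []
    while queue:
        url, level = queue.popleft()
        visited.append(url)
        if level == depth:
            continue  # do not follow links beyond the requested depth
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return visited


if __name__ == "__main__":
    # A tiny in-memory link graph stands in for real pages.
    graph = {"home": ["blog", "about"], "blog": ["post"]}
    print(crawl(["home"], 1, lambda u: graph.get(u, [])))
```

The `seen` set keeps a page from being visited twice when several pages link to it.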

The generated word list can be expanded with the words-converter.py script, which adds variants of each word with special characters and accents removed. For example, the Polish word źdźbło! is expanded into the following words:

  • źdźbło!
  • zdzblo!
  • źdźbło
  • zdzblo

$ cat words.txt | python3 words-converter.py | sort -u > words2.txt
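The expansion shown above can be sketched in a few lines of Python. This is an illustrative approximation, not the actual words-converter.py: the function names are hypothetical, accents are removed through Unicode NFKD decomposition, and "ł" is mapped by hand because it has no decomposition.

```python
import unicodedata

# Letters that NFKD does not decompose; mapped by hand (assumption).
_EXTRA = str.maketrans({"ł": "l", "Ł": "L"})


def strip_accents(word):
    """Replace accented letters with their base letters."""
    decomposed = unicodedata.normalize("NFKD", word.translate(_EXTRA))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_special(word):
    """Keep only letters and digits."""
    return "".join(ch for ch in word if ch.isalnum())


def variants(word):
    """Return the word plus its accent-free and special-char-free forms."""
    candidates = (
        word,
        strip_accents(word),
        strip_special(word),
        strip_accents(strip_special(word)),
    )
    result = []
    for candidate in candidates:
        if candidate and candidate not in result:
            result.append(candidate)
    return result


if __name__ == "__main__":
    print(variants("źdźbło!"))
```

A word that is already plain ASCII with no special characters yields only itself, so the `sort -u` step in the pipeline above mainly removes duplicates across different input words.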

  • Scraping words from the target’s Twitter

A Twitter page loads more content dynamically as it is scrolled. Use the --max-scrolls option to scroll the page before scraping words:

$ python3 words-scraper.py -o words.txt --max-scrolls 300 --show-gui https://twitter.com/example.com
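As a sketch of how scrolling a dynamically loaded page might work with Selenium, the function below calls the driver's `execute_script` method to jump to the bottom of the page, waits, and repeats up to a limit. The name `scroll_page` is hypothetical, and stopping early when the page height no longer grows is an added assumption, not documented behavior of --max-scrolls.

```python
import time


def scroll_page(driver, max_scrolls, page_scroll_delay=1.0):
    """Scroll to the bottom up to `max_scrolls` times; return scrolls done.

    `driver` is any object with a Selenium-style execute_script() method.
    """
    scrolls = 0
    last_height = driver.execute_script("return document.body.scrollHeight")
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(page_scroll_delay)  # give the page time to load more content
        scrolls += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # assumption: no growth means nothing more to load
        last_height = new_height
    return scrolls
```

The delay parameter plays the role of --page-scroll-delay: too short a delay and the height check can fire before new content has arrived.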

  • Scraping via Socks proxy

$ ssh -D 1080 -Nf {USER-HERE}@{IP-HERE} >/dev/null 2>&1
$ python3 words-scraper.py -o words.txt --socks-proxy 127.0.0.1:1080 https://www.example.com
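For readers curious how a host:port value like the one above could reach Firefox, one possible route is through Firefox's `network.proxy.*` preferences. The helper below is a hypothetical sketch that only builds the preference dictionary. Whether words-scraper.py configures the proxy this way is an assumption.

```python
def socks_proxy_prefs(socks_proxy):
    """Translate 'host:port' into Firefox SOCKS proxy preferences."""
    host, _, port = socks_proxy.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("expected host:port, got %r" % socks_proxy)
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.socks": host,
        "network.proxy.socks_port": int(port),
        # Resolve DNS through the proxy as well (an optional choice).
        "network.proxy.socks_remote_dns": True,
    }


# Possible use with Selenium (requires the selenium package):
#   from selenium.webdriver.firefox.options import Options
#   options = Options()
#   for name, value in socks_proxy_prefs("127.0.0.1:1080").items():
#       options.set_preference(name, value)
```

Resolving DNS through the proxy keeps host lookups from leaking outside the SSH tunnel.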

Usage

Usage: words-scraper.py [-h] [--depth DEPTH] [--max-scrolls MAX_SCROLLS]
                        [--min-word-length MIN_WORD_LENGTH]
                        [--page-load-delay PAGE_LOAD_DELAY]
                        [--page-scroll-delay PAGE_SCROLL_DELAY] [--show-gui]
                        [--socks-proxy SOCKS_PROXY] -o OUTPUT_FILE
                        url [url ...]

Words scraper (version: 1.0)

Positional Arguments:
  url                   URL to scrape

Optional Arguments:
  -h, --help            show this help message and exit
  --depth DEPTH         scraping depth, default: 0
  --max-scrolls MAX_SCROLLS
                        maximum number of the page scrolls, default: 0
  --min-word-length MIN_WORD_LENGTH
                        default: 3
  --page-load-delay PAGE_LOAD_DELAY
                        page loading delay, default: 3.0
  --page-scroll-delay PAGE_SCROLL_DELAY
                        page scrolling delay, default: 1.0
  --show-gui            show browser GUI
  --socks-proxy SOCKS_PROXY
                        socks proxy e.g. 127.0.0.1:1080
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        save words to file

R K
