DorXNG is a modern solution for harvesting OSINT
data using advanced search engine operators through multiple upstream search providers.
On the backend, it leverages a purpose-built containerized image of SearXNG, a self-hosted, hackable, privacy-focused meta-search engine.
Our SearXNG implementation routes all search queries over the Tor network while refreshing circuits every ten seconds with Tor’s MaxCircuitDirtiness
configuration directive.
We have also disabled all of SearXNG's client-side timeout features.
These settings allow for the evasion of search engine restrictions commonly encountered while issuing many repeated search queries.
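The ten-second circuit refresh mentioned above maps to a single torrc directive. A minimal fragment (illustrative; the container's full Tor configuration is not reproduced here):

```
# torrc: consider a circuit dirty (and build a fresh one) after 10 seconds
MaxCircuitDirtiness 10
```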
The DorXNG client application is written in Python 3 and interacts with the SearXNG API to issue search queries concurrently.
It can even issue requests across multiple SearXNG instances. The resulting search results are stored in a SQLite3
database.
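As a sketch of that flow, the JSON results returned by SearXNG's `format=json` API can be flattened into a small SQLite table like this (the schema and sample data are illustrative, not DorXNG's actual layout):

```python
import json
import sqlite3

# Hypothetical SearXNG JSON response (format=json); the result field names
# (url, title, content) follow the SearXNG API, but this data is made up.
raw = json.dumps({"results": [
    {"url": "https://example.com/a", "title": "A", "content": "snippet a"},
    {"url": "https://example.com/b", "title": "B", "content": "snippet b"},
]})

conn = sqlite3.connect(":memory:")  # DorXNG writes to a file such as dorxng.db
conn.execute("CREATE TABLE IF NOT EXISTS results "
             "(url TEXT PRIMARY KEY, title TEXT, content TEXT)")
# INSERT OR IGNORE deduplicates repeat results across query iterations
conn.executemany("INSERT OR IGNORE INTO results VALUES (?, ?, ?)",
                 [(r["url"], r["title"], r["content"])
                  for r in json.loads(raw)["results"]])
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```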
We have enabled every supported upstream search engine that allows advanced search operator queries:
Google
DuckDuckGo
Qwant
Bing
Brave
Startpage
Yahoo
For more information about which search engines SearXNG supports, see: Configured Engines
Please DO NOT use the DorXNG client application against any public SearXNG instances.
Buy Us A Beer!
Shout Outs
Shout out to the developers of Tor
and SearXNG
for making this possible. Go donate to both projects!
Shout out to the developer of pagodo for the inspiration and that sweet ghdb_scraper.py script!
Last but certainly not least, shout out to j0hnny. The OG Dork.
Setup
LINUX ONLY Sorry Normies
Install DorXNG
git clone https://github.com/researchanddestroy/dorxng
cd dorxng
pip install -r requirements.txt
./DorXNG.py -h
Download and Run Our Custom SearXNG Docker Container (at least one). Multiple SearXNG instances can be used. Use the --serverlist
option with DorXNG. See: server.lst
When starting multiple containers, wait at least a few seconds between starting each one.
docker run researchanddestroy/searxng:latest
If you would like to build the container yourself,
git clone https://github.com/researchanddestroy/searxng # The URL must be all lowercase for the build process to complete
cd searxng
DOCKER_BUILDKIT=1 make docker.build
docker images
docker run <image-id>
By default, DorXNG has a hard-coded server
variable in parse_args.py set to 172.17.0.2, the IP address Docker assigns to the first container you run on your machine.
This can be changed in the source, or overridden with --server
or --serverlist
.
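That default might be wired up roughly like this (an illustrative argparse sketch, not parse_args.py verbatim):

```python
import argparse

# Sketch of the default server wiring; flag names match the options
# documented below, but this is not DorXNG's actual parse_args.py.
parser = argparse.ArgumentParser()
parser.add_argument("-s", "--server",
                    default="https://172.17.0.2/search",  # first Docker container
                    help="DorXNG Server Instance")
parser.add_argument("-S", "--serverlist",
                    help="newline-delimited file of SearXNG instance URLs")

args = parser.parse_args([])                                   # default kept
override = parser.parse_args(["--server", "https://172.17.0.3/search"])
```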
Start Issuing Search Queries
./DorXNG.py -q 'search query'
Query the DorXNG Database
./DorXNG.py -D 'regex search string'
Instructions
-h, --help show this help message and exit
-s SERVER, --server SERVER
DorXNG Server Instance - Example: 'https://172.17.0.2/search'
-S SERVERLIST, --serverlist SERVERLIST
Issue Search Queries Across a List of Servers - Format: Newline Delimited
-q QUERY, --query QUERY
Issue a Search Query - Examples: 'search query' | '!tch search query' | 'site:example.com intext:example'
-Q QUERYLIST, --querylist QUERYLIST
Iterate Through a Search Query List - Format: Newline Delimited
-n NUMBER, --number NUMBER
Define the Number of Page Result Iterations
-c CONCURRENT, --concurrent CONCURRENT
Define the Number of Concurrent Page Requests
-l LIMITDATABASE, --limitdatabase LIMITDATABASE
Set Maximum Database Size Limit - Starts New Database After Exceeded - Example: --limitdatabase 10 (10k Database Entries) - Suggested Maximum Database Size is 50k
when doing Deep Recursion
-L LOOP, --loop LOOP Define the Number of Main Function Loop Iterations - Infinite Loop with 0
-d DATABASE, --database DATABASE
Specify SQL Database File - Default: 'dorxng.db'
-D DATABASEQUERY, --databasequery DATABASEQUERY
Issue Database Query - Format: Regex
-m MERGEDATABASE, --mergedatabase MERGEDATABASE
Merge SQL Database File - Example: --mergedatabase database.db
-t TIMEOUT, --timeout TIMEOUT
Specify Timeout Interval Between Requests - Default: 4 Seconds - Disable with 0
-r NONEWRESULTS, --nonewresults NONEWRESULTS
Specify Number of Iterations with No New Results - Default: 4 (3 Attempts) - Disable with 0
-v, --verbose Enable Verbose Output
-vv, --veryverbose Enable Very Verbose Output - Displays Raw JSON Output
Tips
You might occasionally encounter a Tor exit node that upstream search providers have already shunted, in which case your search results will be sparse.
Not to worry… Just keep firing off queries.
Keep your DorXNG SQL database file and rerun your command, or use the --loop
switch to iterate the main function repeatedly.
Most often, the more passes you make over a search query, the more results you’ll find.
Also keep in mind that we have sacrificed speed for a higher degree of data output. This is an OSINT
project after all.
Each search query you make is issued to 7
upstream search providers… Especially with --concurrent
queries, this generates a lot of upstream requests. So have patience.
Keep in mind that DorXNG will continue to append new search results to your database file. Use the --database
switch to specify a database filename, the default filename is dorxng.db
.
This probably doesn’t matter for most, but if you want to keep your OSINT
investigations separate, it’s there for you.
Four concurrent search requests seem to be the sweet spot. You can issue more, but the more queries you issue at a time, the longer it takes to receive results.
It also increases the likelihood you will receive HTTP/429 Too Many Requests
responses from upstream search providers on that specific Tor circuit.
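Bounded concurrency of this kind can be sketched with a four-worker pool (the fetch below is simulated; DorXNG's real requests go to SearXNG instances):

```python
from concurrent.futures import ThreadPoolExecutor

# Simulated page fetch; in DorXNG each call would hit a SearXNG instance
# over Tor and return parsed search results.
def fetch_page(page):
    return f"results for page {page}"

# Four workers mirrors the suggested -c4 sweet spot: enough parallelism
# without flooding one Tor circuit and drawing HTTP/429 responses.
with ThreadPoolExecutor(max_workers=4) as pool:
    pages = list(pool.map(fetch_page, range(8)))
```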
If you start multiple SearXNG Docker containers too rapidly, Tor connections may fail to establish.
While initializing a container, a valid response from the Tor Connectivity Check function looks like this:
Checking Tor Connectivity..
{"IsTor":true,"IP":"<tor-exit-node>"}
If you see anything other than that, or if you start to see HTTP/500
response codes coming back from the SearXNG monitor script (STDOUT in the container), kill the Docker container and spin up a new one.
HTTP/504 Gateway Time-out
response codes are sometimes expected within DorXNG. This means the SearXNG instance did not receive a valid response within one minute.
That specific Tor circuit is probably too slow. Just keep going! There really isn’t a reason to run a ton of these containers.
Yet… How many you run really depends on what you’re doing. Each container uses approximately 1.25 GB
of RAM.
Running one container works perfectly fine, except you will likely miss search results. So use --loop
and do not disable --timeout
.
Running multiple containers is nice because each has its own Tor circuit that refreshes every 10 seconds.
When running in --serverlist
mode, disable the --timeout
feature so there is no delay between requests (the default delay interval is 4 seconds).
Keep in mind that the more containers you run, the more memory you will need. This goes for deep recursion too. We have disabled Python’s maximum recursion limit…
The more recursions your command goes through without returning to main
, the more memory the process will consume. You may come back to find that the process has crashed with a Killed
error message.
If this happens, your machine ran out of memory and killed the process. Not to worry, though… Your database file is still good.
If your database file gets exceptionally large, it inevitably slows down the program and consumes more memory with each iteration.
Those Python Stack Frames are Thicc…
We have seen a marked drop in performance with database files that exceed approximately 50,000 entries.
The --limitdatabase
option has been implemented to mitigate some of these memory consumption issues. Use it in combination with --loop
to break deep recursive iteration inside iterator.py and restart from main
right where you left off.
Once you have a series of database files, you can merge them all (one at a time) with --mergedatabase
.
You can even merge them all into a new database file if you specify an unused filename with --database
.
Do NOT merge data into a database that is currently being used by a running DorXNG process. This may cause errors and potentially corrupt the database.
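A merge of this sort boils down to SQLite's ATTACH plus INSERT OR IGNORE. A minimal sketch with an illustrative one-column schema (not DorXNG's actual tables):

```python
import os
import sqlite3
import tempfile

# Illustrative schema; DorXNG's actual table layout may differ.
def make_db(path, urls):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS results (url TEXT PRIMARY KEY)")
    conn.executemany("INSERT OR IGNORE INTO results VALUES (?)",
                     [(u,) for u in urls])
    conn.commit()
    conn.close()

tmp = tempfile.mkdtemp()
p1 = os.path.join(tmp, "part1.db")
p2 = os.path.join(tmp, "part2.db")
make_db(p1, ["https://a.example", "https://b.example"])
make_db(p2, ["https://b.example", "https://c.example"])

# Merge part2.db into part1.db; INSERT OR IGNORE drops duplicate URLs,
# in the spirit of merging one database at a time with --mergedatabase.
conn = sqlite3.connect(p1)
conn.execute("ATTACH DATABASE ? AS src", (p2,))
conn.execute("INSERT OR IGNORE INTO results SELECT url FROM src.results")
conn.commit()
merged = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
conn.close()
```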
The included query.lst file contains every dork that currently exists on the Google Hacking Database (GHDB). See: ghdb_scraper.py
We have already run through it for you… Our ghdb.db
file contains over one million entries and counting! You can download it (ghdb.db) if you’d like a copy.
An example of querying the ghdb.db
database:
./DorXNG.py -d ghdb.db -D '^http.*\.sql$'
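Plain sqlite3 has no built-in REGEXP operator, so a client has to register one before a query like the above can run. A minimal sketch of the technique (not DorXNG's actual implementation):

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# Register a REGEXP function so "column REGEXP pattern" works in SQL.
conn.create_function("REGEXP", 2,
                     lambda pattern, value: value is not None
                     and re.search(pattern, value) is not None)
conn.execute("CREATE TABLE results (url TEXT)")
conn.executemany("INSERT INTO results VALUES (?)",
                 [("https://example.com/dump.sql",),
                  ("https://example.com/index.html",)])
# Equivalent in spirit to: ./DorXNG.py -D '^http.*\.sql$'
rows = [r[0] for r in conn.execute(
    "SELECT url FROM results WHERE url REGEXP ?", (r"^http.*\.sql$",))]
```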
A rewrite of DorXNG
in Golang
is already in the works. (GorXNG
? | DorXNGNG
?)
We’re gonna need more dorks… Check out DorkGPT
Examples
Single Search Query
./DorXNG.py -q 'search query'
Concurrent Search Queries
./DorXNG.py -q 'search query' -c4
Page Iteration Mode
./DorXNG.py -q 'search query' -n4
Iterative Concurrent Search Queries
./DorXNG.py -q 'search query' -c4 -n64
Server List Iteration Mode
./DorXNG.py -S server.lst -q 'search query' -c4 -n64 -t0
Query List Iteration Mode
./DorXNG.py -Q query.lst -c4 -n64
Query and Server List Iteration
./DorXNG.py -S server.lst -Q query.lst -c4 -n64 -t0
Main Function Loop Iteration Mode
./DorXNG.py -S server.lst -Q query.lst -c4 -n64 -t0 -L4
Infinite Main Function Loop Iteration Mode with a Database File Size Limit Set to 10k Entries
./DorXNG.py -S server.lst -Q query.lst -c4 -n64 -t0 -L0 -l10
Merging a Database (One at a Time) into a New Database File
./DorXNG.py -d new-database.db -m dorxng.db
Merge All Database Files in the Current Working Directory into a New Database File
for i in *.db; do ./DorXNG.py -d new-database.db -m "$i"; done
Query a Database
./DorXNG.py -d new-database.db -D 'regex search string'