Scallion: GPU-Based Onion Hash Generator

Scallion lets you create vanity GPG keys and .onion addresses (for Tor’s hidden services) using OpenCL. It runs on Mono (tested on Arch Linux) and .NET 3.5+ (tested on Windows 7 and Server 2008).

Scallion is currently in beta and under active development. Nevertheless, we feel that it is ready for use. Improvements are expected primarily in performance, user interface, and ease of installation, not in the overall algorithm used to generate keys.

FAQ

Here are some frequently asked questions and their answers:

Why generate GPG keys? Scallion was used to find collisions for every 32-bit key ID in the Web of Trust’s strong set, demonstrating how insecure 32-bit key IDs are. The work was presented in a talk at DEFCON, and additional information can be found at https://evil32.com/.
What are valid characters? Tor .onion addresses use Base32, consisting of all letters and the digits 2 through 7, inclusive. They are case-insensitive. GPG fingerprints use hexadecimal, consisting of the digits 0-9 and the letters A-F.
Can you use Bitcoin ASICs (e.g. Jalapeno, KnC) to accelerate this process? Sadly, no. While the process Scallion uses is conceptually similar (increment a nonce and check the hash), the details are different (SHA-1 vs. double SHA-256 for Bitcoin). Furthermore, Bitcoin ASICs are as fast as they are because they are tailored very closely to Bitcoin mining. For example, according to the datasheet for the CoinCraft A-1, an ASIC that never came out but is probably indicative of the general approach, the microcontroller sends work in the form of the final 128 bits of a Bitcoin block, the hash midstate of the previous bits, a target difficulty, and the maximum nonce to try. The ASIC chooses the location to insert the nonce, and it decides which results meet the target. Scallion has to insert the nonce in a different location, and it checks for a pattern match rather than just “lower than XXXX”.
How can you use multiple devices? Run multiple Scallion instances. 😄 Scallion searches are probabilistic, so you won’t be repeating work with the second device. True multi-device support wouldn’t be too difficult, but it also wouldn’t add much. I’ve run several Scallion instances in tmux or screen with great success. You’ll just need to manually abort all the jobs when one finds a pattern (or write a shell script that monitors the output file and kills them all when it sees results).
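
The monitoring script mentioned in the last answer could look roughly like the sketch below, written in Python rather than shell. It assumes that you supply the command line for each instance yourself and that hits end up in a file whose path you pass in. The function name run_until_result, its parameters, and the result file are hypothetical, and no particular Scallion option is implied.

```python
import os
import subprocess
import time


def run_until_result(commands, result_path, poll_seconds=1.0):
    """Start every command, then stop them all once result_path is non-empty.

    commands    -- list of argument lists, one per instance (supplied by you)
    result_path -- file the instances are assumed to write their hits to
    """
    def has_result():
        return os.path.exists(result_path) and os.path.getsize(result_path) > 0

    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            # A non-empty result file is treated as "one instance found a pattern".
            if has_result():
                return True
            # If every instance exited on its own, do one last check and stop waiting.
            if all(p.poll() is not None for p in procs):
                return has_result()
            time.sleep(poll_seconds)
    finally:
        # Abort whatever is still running, then reap all child processes.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        for p in procs:
            p.wait()
```

Each entry in commands would be whatever command line you normally use to start one instance on one device. The function returns True as soon as the result file has content and False if all instances exit without producing any.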

Dependencies

OpenCL and relevant drivers installed and configured. Refer to your distribution’s documentation.
OpenSSL. For Windows, the prebuilt x86 DLLs are included.
On Windows only, the VC++ Redistributable 2008.

Build Linux

Prerequisites

Get the latest Mono for your Linux distribution: http://www.mono-project.com/download/
Install common dependencies:
    sudo apt-get update
    sudo apt-get install libssl-dev mono-devel
AMD/open-source build:
    sudo apt-get install ocl-icd-opencl-dev
NVIDIA build:
    sudo apt-get install nvidia-opencl-dev nvidia-opencl-icd
Finally, build the solution:
    msbuild scallion.sln

Docker Linux (NVIDIA GPUs only)

Have the nvidia-docker container runtime installed.
Build the container:
    docker build -t scallion -f Dockerfile.nvidia .
Run:
    docker run --runtime=nvidia -ti --rm scallion -l

Build Windows

Open ‘scallion.sln’ in VS Express for Desktop 2012.
Build the solution. I did everything in debug mode.

Multipattern Hashing

Scallion supports finding one or more of multiple patterns through a primitive regex syntax. Only character classes (ex. [abcd]) are supported. The . character represents any character. Onion addresses are always 16 characters long and GPG fingerprints are always 40 characters. You can find a suffix by putting $ at the end of the match (ex. DEAD$). Finally, the pipe syntax (ex. pattern1|pattern2) can be used to find multiple patterns. Searching for multiple patterns (within reason) will NOT produce a significant decrease in speed. Many regexps will produce a single pattern on the GPU and result in no speed reduction.
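
Before starting a long search, it can help to check that a pattern only uses characters that can actually occur. The sketch below is not Scallion’s parser. It is a small, hypothetical checker (check_pattern is an invented name) that assumes only the syntax described above: literals, character classes, the . wildcard, a trailing $, and | between alternatives.

```python
import re

ONION_ALPHABET = set("abcdefghijklmnopqrstuvwxyz234567")  # Base32, 16 characters
GPG_ALPHABET = set("0123456789abcdef")                    # hexadecimal, 40 characters

# One position in a pattern: a class like [abc], the wildcard ".", or a literal.
TOKEN = re.compile(r"\[([^\[\]]+)\]|(\.)|([^\[\].$|])")


def check_pattern(pattern, gpg=False):
    """Return a list of problems; an empty list means the pattern looks usable."""
    alphabet = GPG_ALPHABET if gpg else ONION_ALPHABET
    max_len = 40 if gpg else 16
    problems = []
    for alt in pattern.lower().split("|"):
        body = alt[:-1] if alt.endswith("$") else alt  # strip the suffix anchor
        pos = length = 0
        while pos < len(body):
            m = TOKEN.match(body, pos)
            if m is None:
                problems.append(f"{alt!r}: unsupported syntax at offset {pos}")
                break
            # Characters named by a class or a literal; the wildcard names none.
            literal = m.group(1) or m.group(3) or ""
            bad = sorted(set(literal) - alphabet)
            if bad:
                problems.append(f"{alt!r}: invalid characters {bad}")
            pos = m.end()
            length += 1
        if length > max_len:
            problems.append(f"{alt!r}: longer than {max_len} characters")
    return problems


print(check_pattern("prefix[234567]"))    # valid .onion pattern
print(check_pattern("hello1"))            # "1" is not a Base32 character
print(check_pattern("DEAD$", gpg=True))   # valid GPG suffix
```

Patterns are lowercased first because both alphabets are treated as case-insensitive here.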

Some use cases with examples:

Generate a prefix followed by a number for better readability:
    mono scallion.exe prefix[234567]
Search for several patterns at once (n.b. -c causes Scallion to continue generating even once it gets a hit):
    mono scallion.exe -c prefix scallion hashes
    mono scallion.exe -c "prefix|scallion|hashes"
Search for the suffix “badbeef”:
    mono scallion.exe .........badbeef
    mono scallion.exe --gpg badbeef$ # Generate GPG key
Complicated, self-explanatory example:
    mono scallion.exe "suffixa$|suffixb$|prefixa|prefixb|a.suffix$|a.test.$"

How Does It Work?

At a high level Scallion works as follows:

1. Generate an RSA key using OpenSSL on the CPU
2. Send the key to the GPU
3. Increase the key’s public exponent
4. Hash the key
5. If the hashed key is not a partial collision, go to step 3
6. If the key does not pass the sanity checks recommended by PKCS #1 v2.1 (checked on the CPU), go to step 3
7. Brand new key with partial collision!

The basic algorithm is described above. Speed / performance is the result of massive parallelization, both on the GPU and the CPU.
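
The inner loop (increase the exponent, hash, compare) can be illustrated on the CPU with a few lines of Python. This is only a sketch under stated assumptions: it assumes the v2 .onion derivation (Base32 of the first 10 bytes of the SHA-1 of the DER-encoded RSA public key), it uses a placeholder number instead of a real RSA modulus, it skips the PKCS #1 sanity checks, and it says nothing about how Scallion’s OpenCL kernels are actually written. All function names are invented for the example.

```python
import base64
import hashlib


def der_len(n):
    """DER length octets for a content length n."""
    if n < 0x80:
        return bytes([n])
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(raw)]) + raw


def der_int(value):
    """DER INTEGER for a non-negative value (a leading zero keeps it positive)."""
    raw = value.to_bytes(value.bit_length() // 8 + 1, "big")
    return b"\x02" + der_len(len(raw)) + raw


def onion_address(modulus, exponent):
    """Base32 of the first 10 bytes of SHA-1 over SEQUENCE { modulus, exponent }."""
    body = der_int(modulus) + der_int(exponent)
    der = b"\x30" + der_len(len(body)) + body
    return base64.b32encode(hashlib.sha1(der).digest()[:10]).decode().lower()


def search(modulus, prefix, start_exponent=65537, max_tries=1_000_000):
    """Bump the exponent until the address starts with prefix (steps 3 to 5)."""
    exponent = start_exponent
    for _ in range(max_tries):
        address = onion_address(modulus, exponent)
        if address.startswith(prefix):
            return address, exponent
        exponent += 2  # stay odd: an even exponent can never be a valid RSA exponent
    return None


# Placeholder for illustration only; this is NOT a real RSA modulus.
DEMO_MODULUS = (1 << 1023) | 0x1234567
print(search(DEMO_MODULUS, "ab"))
```

A two-character prefix needs about a thousand attempts on average, so this finishes almost immediately. Each additional character multiplies the work by 32, which is why the real search runs on the GPU.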

Speed / Performance

It is important to realize that Scallion performs a probabilistic search. Actual times may vary significantly from predicted times.

The initial RSA key generation is done on the CPU. An Ivy Bridge i7 can generate 51 keys per second using a single core. Each key provides 1 gigahash worth of exponents to mine, so a decent CPU can keep up with several GPUs as Scallion is currently implemented.
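
As a rough back-of-the-envelope check of that claim, the snippet below divides the work one CPU core can supply by the rate at which a GPU consumes it. It assumes that “1 gigahash” means 10^9 hashes per key and uses the 5760 MH/s figure this document lists for the GTX 1080. The function name is made up for the example.

```python
HASHES_PER_KEY = 1e9    # "1 gigahash" per key, taken here as 10^9 (assumption)
KEYS_PER_SECOND = 51    # single-core key generation rate quoted above


def gpus_per_core(gpu_hashes_per_second):
    """Number of GPUs at the given hash rate that one CPU core could keep supplied."""
    return KEYS_PER_SECOND * HASHES_PER_KEY / gpu_hashes_per_second


print(round(gpus_per_core(5760e6), 1))  # GTX 1080 at 5760 MH/s
```

Under these assumptions a single core could feed between eight and nine such cards.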

SHA-1 hashing is done on the GPU. The hash rates for several GPUs we have tested are below (grouped by manufacturer and sorted by power):

GPU	Speed
Intel i7-2620M	9.9 MH/s
Intel i5-5200U	118 MH/s
NVIDIA GT 520	38.7 MH/s
NVIDIA Quadro K2000M	90 MH/s
NVIDIA GTS 250	128 MH/s
NVIDIA GTS 450	144 MH/s
NVIDIA GTX 670	480 MH/s
NVIDIA GTX 970	2350 MH/s
NVIDIA GTX 980	3260 MH/s
NVIDIA GTX 1050 (M)	1400 MH/s
NVIDIA GTX 1070	4140 MH/s
NVIDIA GTX 1070 TI	5100 MH/s
NVIDIA GTX TITAN X	4412 MH/s
NVIDIA GTX 1080	5760 MH/s
NVIDIA Tesla V100	11646 MH/s
AMD A8-7600 APU	120 MH/s
AMD Radeon HD5770	520 MH/s
AMD Radeon HD6850	600 MH/s
AMD Radeon RX 460	840 MH/s
AMD Radeon RX 470	957 MH/s
AMD Radeon R9 380X	2058 MH/s
AMD FirePro W9100	2566 MH/s
AMD Radeon RX 480	2700 MH/s
AMD Radeon RX 580	3180 MH/s
AMD Radeon R9 Nano	3325 MH/s
AMD Vega Frontier Edition	7119 MH/s

MH/s = million hashes per second

It’s worth noting that Intel has released OpenCL drivers for its processors, so short collisions can also be found on the CPU.

To calculate the number of seconds required for a given partial collision (on average), use the formula:

Type	Estimated time
GPG Key	2^(4*length-1) / hashspeed
.onion Address	2^(5*length-1) / hashspeed

For example, on my NVIDIA Quadro K2000M I see around 90 MH/s. At that speed, I can generate an eight-character .onion prefix in about 1h 41m on average: 2^(5*8-1) / 90 million ≈ 6108 seconds ≈ 101 minutes.
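
The two formulas in the table can be wrapped in a small helper for trying other lengths and hash rates. This is just the arithmetic from above. The function name and its parameters are made up for the example.

```python
def estimated_seconds(length, hashes_per_second, gpg=False):
    """Average search time: 4 bits per hex character, 5 bits per Base32 character."""
    bits_per_char = 4 if gpg else 5
    return 2 ** (bits_per_char * length - 1) / hashes_per_second


# Eight-character .onion prefix at 90 MH/s (the Quadro K2000M example).
seconds = estimated_seconds(8, 90e6)
print(f"{seconds:.0f} s, about {seconds / 3600:.1f} hours")

# Eight hex characters of a GPG fingerprint (a 32-bit key ID) at the same speed.
print(f"{estimated_seconds(8, 90e6, gpg=True):.1f} s")
```

Because the search is probabilistic, these are averages. An individual run can finish much sooner or take several times longer.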

Workgroup Size

Scallion will use your device’s reported preferred work group size by default. This is a reasonable default, but experimenting with the work group size may increase performance.

Security

The keys generated by Scallion are quite similar to those generated by shallot. They have unusually large public exponents, but they are put through the full set of sanity checks recommended by PKCS #1 v2.1 via OpenSSL’s RSA_check_key function. Scallion supports several RSA key sizes, with optimized kernels for 1024-bit, 2048-bit, and 4096-bit keys. Other key sizes may work, but have not been tested.
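
To see why a key with a changed public exponent has to be re-checked, consider the toy example below. It is not OpenSSL’s RSA_check_key and does not claim to reproduce its exact checks. It only illustrates the underlying arithmetic with tiny textbook primes: a new exponent e is usable only if it has an inverse modulo lcm(p-1, q-1), and the resulting key parts must be mutually consistent. All names are hypothetical.

```python
from math import gcd


def is_prime(n):
    """Trial division; adequate only for the toy-sized primes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def carmichael(p, q):
    """lcm(p - 1, q - 1) for an RSA modulus n = p * q."""
    return (p - 1) * (q - 1) // gcd(p - 1, q - 1)


def private_exponent(p, q, e):
    """Return d for the exponent e, or None if e cannot be used with p and q."""
    lam = carmichael(p, q)
    if gcd(e, lam) != 1:
        return None         # no inverse exists, so this candidate must be rejected
    return pow(e, -1, lam)  # modular inverse (Python 3.8+)


def check_key(n, e, d, p, q):
    """Toy consistency check: prime factors, matching modulus, matching exponents."""
    return (is_prime(p) and is_prime(q) and n == p * q
            and (d * e) % carmichael(p, q) == 1)


p, q = 61, 53                       # textbook-sized primes, far too small for real use
print(private_exponent(p, q, 17))   # usable exponent
print(private_exponent(p, q, 15))   # shares a factor with lcm(60, 52), so unusable
```

In this toy setting some candidate exponents simply have no matching private exponent, which is the kind of problem a post-search sanity check is meant to catch.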

R K