Scallion : GPU-Based Onion Hash Generator

Scallion lets you create vanity GPG keys and .onion addresses (for Tor’s hidden services) using OpenCL. It runs on Mono (tested in Arch Linux) and .NET 3.5+ (tested on Windows 7 and Server 2008).

It is currently in beta stage and under active development. Nevertheless, we feel that it is ready for use. Improvements are expected primarily in performance, user interface, and ease of installation, not in the overall algorithm used to generate keys.

FAQ

Here are some frequently asked questions and their answers:

  1. Why generate GPG keys? Scallion was used to find collisions for every 32-bit key ID in the Web of Trust’s strong set, demonstrating how insecure 32-bit key IDs are. This was presented in a talk at DEFCON, and additional information can be found at https://evil32.com/.
  2. What are valid characters? Tor .onion addresses use Base32, consisting of all letters and the digits 2 through 7, inclusive. They are case-insensitive. GPG fingerprints use hexadecimal, consisting of the digits 0-9 and the letters A-F.
  3. Can you use Bitcoin ASICs (e.g. Jalapeno, KnC) to accelerate this process? Sadly, no. While the process Scallion uses is conceptually similar (increment a nonce and check the hash), the details are different (SHA-1 vs double SHA-256 for Bitcoin). Furthermore, Bitcoin ASICs are as fast as they are because they are extremely tailored to Bitcoin mining applications. For example, the datasheet for the CoinCraft A-1, an ASIC that never came out, is probably indicative of the general approach. The microcontroller sends work in the form of the final 128 bits of a Bitcoin block, the hash midstate of the previous bits, a target difficulty, and the maximum nonce to try. The ASIC chooses the location to insert the nonce, and it decides which blocks meet the target. Scallion has to insert the nonce in a different location, and it checks for a pattern match rather than just “lower than XXXX”.
  4. How can you use multiple devices? Run multiple Scallion instances. 😄 Scallion searches are probabilistic, so you won’t be repeating work with the second device. True multi-device support wouldn’t be too difficult, but it also wouldn’t add much. I’ve run several scallion instances in tmux or screen with great success. You’ll just need to manually abort all the jobs when one finds a pattern (or write a shell script to monitor the output file and kill them all when it sees results).
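
The character sets from question 2 can be checked before starting a long search. The following is a minimal Python sketch, not part of Scallion; the function name invalid_chars is hypothetical, and it only checks literal text, not pattern characters such as ".", "$", "|" or "[...]".

```python
import string

# Base32 as used by .onion addresses: all letters plus the digits 2-7.
ONION_ALPHABET = set(string.ascii_lowercase + "234567")
# Hexadecimal as used by GPG fingerprints.
HEX_ALPHABET = set("0123456789abcdef")


def invalid_chars(text, gpg=False):
    """Return the sorted list of characters that can never appear.

    Both alphabets are treated as case-insensitive, so the input is
    lowercased before checking.
    """
    alphabet = HEX_ALPHABET if gpg else ONION_ALPHABET
    return sorted({c for c in text.lower() if c not in alphabet})


print(invalid_chars("onion1"))              # ['1'] since 1 is not in Base32
print(invalid_chars("DEADBEEF", gpg=True))  # []
```

An empty list means every character of the desired text can occur in the chosen output type.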

Dependencies

  • OpenCL and relevant drivers installed and configured. Refer to your distribution’s documentation.
  • OpenSSL. For Windows, the prebuilt x86 DLLs are included.
  • On Windows only, VC++ Redistributable 2008.

Build Linux

Prerequisites

  • Get the latest Mono for your Linux distribution: http://www.mono-project.com/download/
  • Install common dependencies:
      sudo apt-get update
      sudo apt-get install libssl-dev mono-devel
  • AMD/open-source build:
      sudo apt-get install ocl-icd-opencl-dev
  • NVIDIA build:
      sudo apt-get install nvidia-opencl-dev nvidia-opencl-icd
  • Finally, build the solution:
      msbuild scallion.sln

Docker Linux (nvidia GPUs only)

Build Windows

  • Open ‘scallion.sln’ in VS Express for Desktop 2012
  • Build the solution. I did everything in debug mode.

Multipattern Hashing

Scallion supports finding one or more of multiple patterns through a primitive regex syntax. Only character classes (ex. [abcd]) are supported. The . character represents any character. Onion addresses are always 16 characters long and GPG fingerprints are always 40 characters long. You can find a suffix by putting $ at the end of the match (ex. DEAD$). Finally, the pipe syntax (ex. pattern1|pattern2) can be used to find multiple patterns. Searching for multiple patterns (within reason) will NOT produce a significant decrease in speed. Many regexps will produce a single pattern on the GPU and result in no speed reduction.

Some use cases with examples:

  • Generate a prefix followed by a number for better readability:
      mono scallion.exe prefix[234567]
  • Search for several patterns at once (n.b. -c causes Scallion to continue generating even after it gets a hit):
      mono scallion.exe -c prefix scallion hashes
      mono scallion.exe -c "prefix|scallion|hashes"
  • Search for the suffix “badbeef”:
      mono scallion.exe .........badbeef
      mono scallion.exe --gpg badbeef$ # Generate GPG key
  • Complicated, self-explanatory example:
      mono scallion.exe "suffixa$|suffixb$|prefixa|prefixb|a.suffix$|a.test.$"
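
To try out a pattern before committing GPU time to it, the syntax described above can be approximated with Python’s re module. This is only a sketch of the matching rules as I read them from the examples, not Scallion’s own matcher: it assumes that a pattern without a trailing $ is anchored at the start of the address and that a pattern ending in $ is anchored at the end. The function name matches is hypothetical.

```python
import re


def matches(pattern, candidate):
    """Return True if candidate satisfies any |-separated alternative.

    Assumed rules: an alternative ending in "$" must match the end of
    the candidate; any other alternative must match from the start.
    Splitting on "|" is naive and does not handle "|" inside a class.
    """
    for alternative in pattern.split("|"):
        if alternative.endswith("$"):
            # Suffix search: anchor the alternative at the very end.
            if re.search(alternative[:-1] + r"\Z", candidate):
                return True
        elif re.match(alternative, candidate):
            # Prefix search: re.match anchors at the start.
            return True
    return False


print(matches("prefix[234567]", "prefix2aaaaaaaaa"))    # True
print(matches("badbeef$", "aaaaaaaaabadbeef"))          # True
print(matches(".........badbeef", "aaaaaaaaabadbeef"))  # True
print(matches("prefix", "aprefixaaaaaaaaa"))            # False
```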

How Does It Work?

At a high level Scallion works as follows:

  1. Generate an RSA key using OpenSSL on the CPU.
  2. Send the key to the GPU.
  3. Increase the key’s public exponent.
  4. Hash the key.
  5. If the hashed key is not a partial collision, go to step 3.
  6. If the key does not pass the sanity checks recommended by PKCS #1 v2.1 (checked on the CPU), go to step 3.
  7. Brand new key with partial collision!

The basic algorithm is described above. Speed / performance is the result of massive parallelization, both on the GPU and the CPU.
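
The exponent loop can be illustrated on the CPU with a short Python sketch. It relies on the fact that a 16-character .onion address is the Base32 encoding of the first 80 bits of the SHA-1 hash of the DER-encoded RSA public key. Everything else is an assumption for illustration: the helper names are hypothetical, the modulus is a placeholder rather than a real RSA modulus, the PKCS #1 sanity checks are omitted, and Scallion’s GPU kernels do not re-encode the key on every try the way this sketch does.

```python
import base64
import hashlib


def der_length(n):
    """Encode a DER length (short form below 128, long form otherwise)."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_integer(value):
    """Encode a non-negative integer as a DER INTEGER."""
    # bit_length() // 8 + 1 bytes always leaves the top (sign) bit clear.
    body = value.to_bytes(value.bit_length() // 8 + 1, "big")
    return b"\x02" + der_length(len(body)) + body


def onion_address(modulus, exponent):
    """Base32 of the first 10 bytes of SHA-1 over the DER public key."""
    body = der_integer(modulus) + der_integer(exponent)
    der = b"\x30" + der_length(len(body)) + body  # SEQUENCE { n, e }
    digest = hashlib.sha1(der).digest()
    return base64.b32encode(digest[:10]).decode("ascii").lower()


def search_exponent(modulus, prefix, start=65537, max_tries=100_000):
    """Try odd exponents until the address starts with prefix."""
    for exponent in range(start, start + 2 * max_tries, 2):
        address = onion_address(modulus, exponent)
        if address.startswith(prefix):
            return exponent, address
    return None


# Placeholder only: this is NOT a real RSA modulus.
PLACEHOLDER_MODULUS = (1 << 1023) | 0x10001

print(search_exponent(PLACEHOLDER_MODULUS, "ab"))
```

A real run would start from a key generated by OpenSSL and would pass each hit through the sanity checks before accepting it.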

Speed / Performance

It is important to realize that Scallion performs a probabilistic search. Actual times may vary significantly from predicted times.

The initial RSA key generation is done on the CPU. An Ivy Bridge i7 can generate 51 keys per second using a single core. Each key provides 1 gigahash worth of exponents to mine, so a decent CPU can keep up with several GPUs as it is currently implemented.

SHA1 hashing is done on the GPU. The hashrates for several GPUs we have tested are below (grouped by manufacturer and sorted by power):

GPU                          Speed
Intel i7-2620M               9.9 MH/s
Intel i5-5200U               118 MH/s
NVIDIA GT 520                38.7 MH/s
NVIDIA Quadro K2000M         90 MH/s
NVIDIA GTS 250               128 MH/s
NVIDIA GTS 450               144 MH/s
NVIDIA GTX 670               480 MH/s
NVIDIA GTX 970               2350 MH/s
NVIDIA GTX 980               3260 MH/s
NVIDIA GTX 1050 (M)          1400 MH/s
NVIDIA GTX 1070              4140 MH/s
NVIDIA GTX 1070 TI           5100 MH/s
NVIDIA GTX TITAN X           4412 MH/s
NVIDIA GTX 1080              5760 MH/s
NVIDIA Tesla V100            11646 MH/s
AMD A8-7600 APU              120 MH/s
AMD Radeon HD5770            520 MH/s
AMD Radeon HD6850            600 MH/s
AMD Radeon RX 460            840 MH/s
AMD Radeon RX 470            957 MH/s
AMD Radeon R9 380X           2058 MH/s
AMD FirePro W9100            2566 MH/s
AMD Radeon RX 480            2700 MH/s
AMD Radeon RX 580            3180 MH/s
AMD Radeon R9 Nano           3325 MH/s
AMD Vega Frontier Edition    7119 MH/s

MH/s = million hashes per second

It’s worth noting that Intel has released OpenCL drivers for its processors, so short collisions can be found on the CPU.

To calculate the number of seconds required for a given partial collision (on average), use the formula:

Type              Estimated time (seconds)
GPG Key           2^(4*length-1) / hashspeed
.onion Address    2^(5*length-1) / hashspeed

For example, on my NVIDIA Quadro K2000M I see around 90 MH/s. At that speed I can generate an eight-character .onion prefix in about 1h 41m: 2^(5*8-1) / 90 million ≈ 6108 seconds ≈ 101 minutes.
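
The same arithmetic as a small Python helper, for plugging in your own hash rate. The function name estimated_seconds is hypothetical; it simply applies the two formulas above, using 4 bits per hexadecimal character for GPG and 5 bits per Base32 character for .onion addresses. The result is an average, not a guarantee.

```python
def estimated_seconds(length, hashes_per_second, gpg=False):
    """Average seconds to find a partial collision of the given length."""
    bits_per_char = 4 if gpg else 5
    return 2 ** (bits_per_char * length - 1) / hashes_per_second


# Eight-character .onion prefix at 90 MH/s (the Quadro K2000M above).
seconds = estimated_seconds(8, 90e6)
hours, remainder = divmod(int(seconds), 3600)
print(f"{hours}h {remainder // 60}m")  # 1h 41m
```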

Workgroup Size

Scallion will use your device’s reported preferred work group size by default. This is a reasonable default, but experimenting with the work group size may increase performance.

Security

The keys generated by Scallion are quite similar to those generated by shallot. They have unusually large public exponents, but they are put through the full set of sanity checks recommended by PKCS #1 v2.1 via OpenSSL’s RSA_check_key function. Scallion supports several RSA key sizes, with optimized kernels for 1024-bit, 2048-bit, and 4096-bit keys. Other key sizes may work, but have not been tested.