LeakDB : Web-Scale NoSQL Idempotent Cloud-Native Big-Data

LeakDB : Web-Scale NoSQL Idempotent Cloud-Native Big-Data Serverless Plaintext Credential Search

LeakDB is a tool set designed to allow organizations to build and deploy their own internal plaintext “Have I Been Pwned”-like service. The LeakDB tool set can normalize, deduplicate, index, sort, and search leaked data sets on the multi-terabyte-scale, without the need to distribute large files to individual users. Once curated, LeakDB can search terabytes of data in less than a tenth of a second, and the LeakDB server exposes a simple JSON API that can be queried using the command line client or any http client. It can be deployed in a serverless configuration with a BigQuery backend (no indexes), or as an offline/traditional server with indexes.

LeakDB uses a configurable bloom filter to remove duplicate entires, sorts indexes using external parallel quicksort (i.e., memory constrained) with a k-way binary tree merge, and binary tree search to find entries in the index.

Usage

Normalizing Data

The first step is to normalize your data sets, details on how to do this can be found here. Be sure to keep all your normalized files in a single directory. This step is the same for both the serverless and server deployments.

Next you’ll need to decide if you want to setup a traditional server deployment or a serverless deployment, there are benefits and drawbacks to each as described below.

Server Deployment

The server deployment is the cheapest way to setup LeakDB, but you’ll have to compute indexes yourself. The amount of time it takes to compute an index varies depending on the data set and your hardware. LeakDB can sort multiple terabyte data sets on almost any hardware but it could take a very long time. Domain indexes in particular can take a very long time to compute due to the high number of collisions. However, you only have to pay the computational price once. That is to say once-per-data-set, if you’ve already computed an index and want to add additional data you’ll need to re-sort the entire index (you can save bloom filter results, etc. though), so be sure to collect and normalize all your data before computing the index.

Serverless Deployment

The severless deployment generally costs more money to run as the backend is BigQuery. With BigQuery you pay for data storage as well paying per-query; it isn’t particularly expensive unless you start running lots of queries against the data set. It’s also much faster to setup since you don’t have to compute indexes yourself.

Compile From Source

Just run make <platform>, files will be put in ./bin. The client, curator, and server are pure Go and should support any valid Go compiler target but you may need to modify the Makefile. The serverless binary is Linux only, since AWS Lambda only supports Linux. The easiest way to compile the Windows binaries is to cross-compile them from a better operating system like Linux or MacOS.

For example:

make macos
make linux
make windows

Download

R K

Next Databases Worldwide are Full of Security Holes »

Previous « Kekeo : A Little Toolbox To Play With Microsoft Kerberos In C

Wifi-Hacking.py : Your Ultimate Guide To Ethical WiFi Penetration Testing

Unlock the potential of ethical hacking with Wifi-Hacking.py, a powerful cybersecurity tool designed to navigate…

14 hours ago

Cyber security

THREAT ACTORS – TTPs : Decoding The Digital Underworld Through Comprehensive Mapping

This repository was created with the aim of assisting companies and independent researchers about Tactics,…

14 hours ago

Cyber security

MagicDot : Harnessing DOT-To-NT Path Conversion For Rootkit-Like Capabilities

A set of rootkit-like abilities for unprivileged users, and vulnerabilities based on the DOT-to-NT path…

14 hours ago

Cyber security

Ethical Hacking And Penetration Testing Tools – Harnessing Python For Robust Cybersecurity Solutions

This repository contains tools created by yogSahare0 while learning Python 3 for ethical hacking and penetration testing.…

4 days ago

Cyber security

SentinelEye – Automated Wireless Security Toolkit

"NetSecChallenger" provides a suite of automated tools designed for security professionals and network administrators to…

4 days ago

Cyber security

Autohack : Your Step-By-Step Guide To Installation And Setup

The essential tool for cybersecurity enthusiasts! This guide provides a detailed walkthrough on how to…

4 days ago

Kali Linux Tutorials