WARCannon: High Speed/Low Cost CommonCrawl RegExp In Node.js

WARCannon was built to simplify and cheapify the process of ‘grepping the internet’. With WARCannon, you can:

- Build and test regex patterns against real Common Crawl data.
- Easily load Common Crawl datasets for parallel processing.
- Scale compute capabilities to asynchronously crunch through WARCs at frankly unreasonable capacity.
- Store and easily retrieve the results.

How It …
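To make "grepping" a WARC file concrete, here is a minimal Node.js sketch of the core idea. It is not WARCannon's code. It assumes an already-decompressed WARC/1.0 file (Common Crawl distributes its WARCs gzip-compressed). The helper names `parseWarcRecords`, `grepWarc` and `makeRecord` are hypothetical. The sketch walks the file record by record using each record's `Content-Length` header, then runs a regex over the body of every `response` record.

```javascript
// Hypothetical sketch, not WARCannon's implementation.
// Assumes an uncompressed WARC/1.0 buffer where each record is:
// version line + headers, CRLF CRLF, a block of Content-Length bytes, CRLF CRLF.
const SEP = Buffer.from('\r\n\r\n');

function* parseWarcRecords(buf) {
  let offset = 0;
  while (offset < buf.length) {
    const headerEnd = buf.indexOf(SEP, offset);
    if (headerEnd === -1) break; // no complete header left
    const lines = buf.toString('latin1', offset, headerEnd).split('\r\n');
    const version = lines.shift();
    if (!version.startsWith('WARC/')) {
      throw new Error(`Expected a WARC version line at byte ${offset}`);
    }
    // Header names are case-insensitive, so store them lowercased.
    const headers = {};
    for (const line of lines) {
      const idx = line.indexOf(':');
      if (idx === -1) continue;
      headers[line.slice(0, idx).trim().toLowerCase()] = line.slice(idx + 1).trim();
    }
    const length = Number.parseInt(headers['content-length'], 10);
    if (!Number.isFinite(length)) throw new Error('Missing Content-Length');
    const blockStart = headerEnd + SEP.length;
    yield { headers, block: buf.subarray(blockStart, blockStart + length) };
    // Skip the block plus the two CRLFs that terminate every record.
    offset = blockStart + length + SEP.length;
  }
}

// Collect every regex match found in the bodies of response records.
function grepWarc(buf, pattern) {
  // matchAll() requires the global flag, so add it if the caller omitted it.
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const re = new RegExp(pattern.source, flags);
  const matches = [];
  for (const { headers, block } of parseWarcRecords(buf)) {
    if (headers['warc-type'] !== 'response') continue;
    for (const m of block.toString('utf8').matchAll(re)) {
      matches.push({ uri: headers['warc-target-uri'], match: m[0] });
    }
  }
  return matches;
}

// Build a tiny in-memory WARC record for testing a pattern locally.
function makeRecord(type, uri, body) {
  const block = Buffer.from(body, 'utf8');
  const head = `WARC/1.0\r\nWARC-Type: ${type}\r\nWARC-Target-URI: ${uri}\r\n` +
    `Content-Length: ${block.length}\r\n\r\n`;
  return Buffer.concat([Buffer.from(head, 'latin1'), block, SEP]);
}

const sample = Buffer.concat([
  makeRecord('request', 'http://example.com/', 'GET / HTTP/1.1\r\nHost: example.com\r\n\r\n'),
  makeRecord('response', 'http://example.com/',
    'HTTP/1.1 200 OK\r\n\r\nContact: admin@example.com or info@example.com'),
]);

console.log(grepWarc(sample, /[\w.+-]+@[\w-]+\.[\w.]+/));
```

Relying on `Content-Length` instead of searching for the next `WARC/1.0` line keeps the parser from being fooled by page content that happens to contain that string. Against real Common Crawl data you would first have to decompress each gzip member before handing the bytes to a parser like this one.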