
CloudGrep – Efficient Direct Search For Logs In Cloud Storage

CloudGrep is a tool for searching log files directly in cloud storage. By bypassing traditional log indexing in SIEMs, it offers a potentially faster and cheaper way to search your AWS S3 logs.

Whether you’re debugging applications or conducting a security investigation, discover how CloudGrep enhances your cloud-based search capabilities.

Why?

  • Directly searching cloud storage, without indexing logs into a SIEM or Log Analysis tool, can be faster and cheaper.
  • There is no need to wait for logs to be ingested, indexed, and made available for searching.
  • It searches files in parallel for speed.
  • If you run this in the same region as the S3 bucket, using a VPC endpoint for S3, you can avoid data transfer costs. Do check this against your own setup and AWS pricing first!
  • This may be of use when debugging applications, or investigating a security incident.
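The parallel search mentioned in the list above can be sketched roughly as follows. This is a minimal illustration, not CloudGrep's actual implementation: the FAKE_OBJECTS dictionary stands in for objects in an S3 bucket so the example runs without AWS access, and the names search_object and search_all are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Stand-in for objects in a bucket (assumption: real code would
# download each object's bytes from S3 instead of reading a dict).
FAKE_OBJECTS = {
    "logs/a.log": b"GET /index.html 200 9RXXKPREHHTFQD77\nGET /favicon.ico 404 -\n",
    "logs/b.log": b"PUT /upload 200 AAAABBBBCCCCDDDD\n",
    "logs/c.log": b"GET /?list-type=2 200 9RXXKPREHHTFQD77\n",
}


def search_object(key, pattern):
    """Return (key, line) pairs for every line of one object that matches."""
    text = FAKE_OBJECTS[key].decode("utf-8", errors="replace")
    return [(key, line) for line in text.splitlines() if pattern.search(line)]


def search_all(keys, query, workers=4):
    """Search many objects concurrently; results follow the order of keys."""
    pattern = re.compile(query)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_object = pool.map(lambda k: search_object(k, pattern), keys)
        # Flatten while the pool is still open so every task has finished.
        return [hit for hits in per_object for hit in hits]


matches = search_all(sorted(FAKE_OBJECTS), "9RXXKPREHHTFQD77")
for key, line in matches:
    print(f"{key}: {line}")
```

Threads suit this kind of work because most of the time is spent waiting on network reads rather than on the CPU. Since the query is compiled as a regular expression, a literal string containing metacharacters (such as the dot in example.com) could be passed through re.escape first if an exact match is wanted.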

Example

Simple example:

python3 cloudgrep.py --bucket test-s3-access-logs --query 9RXXKPREHHTFQD77
python3 cloudgrep.py -b test-s3-access-logs -q 9RXXKPREHHTFQD77

More complicated example:

python3 cloudgrep.py -b test-s3-access-logs --prefix "logs/" --filename ".log" -q 9RXXKPREHHTFQD77 -s "2023-01-09 20:30:00" -e "2023-01-09 20:45:00" --file_size 10000 --debug

Saving the output to a file:

python3 cloudgrep.py -b test-s3-access-logs -q 9RXXKPREHHTFQD77 --hide_filenames > output.txt

Example output:

Bucket is in region: us-east-2 : Search from the same region to avoid egress charges.
Searching 11 files in test-s3-access-logs for 9RXXKPREHHTFQD77...
access2023-01-09-20-34-20-EAC533CB93B4ACBE: abbd82b5ad5dc5d024cd1841d19c0cf2fd7472c47a1501ececde37fe91adc510 bucket-72561-s3bucketalt-1my9piwesfim7 [09/Jan/2023:19:20:00 +0000] 1.125.222.333 arn:aws:sts::000011110470:assumed-role/bucket-72561-myResponseRole-1WP2IOKDV7B4Y/1673265251.340187 9RXXKPREHHTFQD77 REST.GET.BUCKET - "GET /?list-type=2&prefix=-collector%2Fproject-&start-after=&encoding-type=url HTTP/1.1" 200 - 946 - 33 32 "-" "Boto3/1.21.24 Python/3.9.2 Linux/5.10.0-10-cloud-amd64 Botocore/1.24.46" - aNPuHKw== SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader bucket-72561-s3bucketalt-1my9piwesfim7.s3.us-east-2.amazonaws.com TLSv1.2 - -

Arguments

python3 cloudgrep.py --help
usage: cloudgrep.py [-h] -b BUCKET -q QUERY [-p PREFIX] [-f FILENAME] [-s START_DATE] [-e END_DATE] [-fs FILE_SIZE] [-d] [-hf]

CloudGrep searches is grep for cloud storage like S3.

options:
  -h, --help            show this help message and exit
  -b BUCKET, --bucket BUCKET
                        Bucket to search. E.g. my-bucket
  -q QUERY, --query QUERY
                        Text to search for. Will be parsed as a Regex. E.g. example.com
  -p PREFIX, --prefix PREFIX
                        Optionally filter on the start of the Object name. E.g. logs/
  -f FILENAME, --filename FILENAME
                        Optionally filter on Objects that match a keyword. E.g. .log.gz
  -s START_DATE, --start_date START_DATE
                        Optionally filter on Objects modified after a Date or Time. E.g. 2022-01-01
  -e END_DATE, --end_date END_DATE
                        Optionally filter on Objects modified before a Date or Time. E.g. 2022-01-01
  -fs FILE_SIZE, --file_size FILE_SIZE
                        Optionally filter on Objects smaller than a file size, in bytes. Defaults to 100 Mb.
  -d, --debug           Enable Debug logging.
  -hf, --hide_filenames
                        Dont show matching filesnames.
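To make the optional filters above more concrete, here is a rough sketch of how they might combine to decide which objects get searched. It is an assumption-laden illustration rather than CloudGrep's own code: ObjectInfo and keep_object are hypothetical names, the filename filter is treated as a simple substring match, the date bounds are treated as inclusive, and the "100 Mb" default is read as 100,000,000 bytes.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ObjectInfo:
    """Hypothetical stand-in for the metadata returned when listing a bucket."""
    key: str
    size: int  # bytes
    last_modified: datetime


def keep_object(
    obj: ObjectInfo,
    prefix: Optional[str] = None,
    filename: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    max_size: int = 100_000_000,  # assumption: "100 Mb" read as 100,000,000 bytes
) -> bool:
    """Return True if the object passes every filter that was supplied."""
    if prefix and not obj.key.startswith(prefix):
        return False
    if filename and filename not in obj.key:  # assumption: substring match
        return False
    if start and obj.last_modified < start:   # assumption: inclusive bound
        return False
    if end and obj.last_modified > end:       # assumption: inclusive bound
        return False
    return obj.size < max_size


FMT = "%Y-%m-%d %H:%M:%S"
objects = [
    ObjectInfo("logs/access2023-01-09-20-34-20.log", 946, datetime(2023, 1, 9, 20, 34, 20)),
    ObjectInfo("logs/access2023-01-09-21-10-00.log", 1200, datetime(2023, 1, 9, 21, 10, 0)),
    ObjectInfo("logs/big.log", 50000, datetime(2023, 1, 9, 20, 40, 0)),
    ObjectInfo("other/x.log", 10, datetime(2023, 1, 9, 20, 40, 0)),
    ObjectInfo("logs/readme.txt", 10, datetime(2023, 1, 9, 20, 40, 0)),
]

# Same filter values as the "more complicated example" earlier in the article.
kept = [
    o.key
    for o in objects
    if keep_object(
        o,
        prefix="logs/",
        filename=".log",
        start=datetime.strptime("2023-01-09 20:30:00", FMT),
        end=datetime.strptime("2023-01-09 20:45:00", FMT),
        max_size=10000,
    )
]
print(kept)
```

Filtering on listing metadata before downloading anything keeps the number of objects fetched, and therefore the time and transfer involved, as small as possible.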

Deployment

Install with: pip3 install -r requirements.txt

You can run this from your local laptop, or from an EC2 instance in the same region as the S3 bucket with a VPC endpoint for S3 to avoid egress charges.

You can authenticate in a number of ways. If you are running on an EC2 instance, an Instance Profile is likely the best choice.

Varshini

Varshini is a Cyber Security expert in Threat Analysis, Vulnerability Assessment, and Research. Passionate about staying ahead of emerging Threats and Technologies.
