Security Automation with Python — Quickly generate common IOCs from files with Python
— Tutorial, Python, IOCs, Indicators of Compromise, MD5, SHA-1, SHA-256, Security Automation, Security Information Automation — 7 min read
I'm Brett Fullam, a creative technologist turned networking and cybersecurity specialist passionate about security automation. In this blog post, I'm going to share with you a Python script I created to quickly gather common Indicators of Compromise (IOCs) from a single file or directory that can be incorporated into tools like Mandiant's IOCe.
IOCs
Indicators of compromise (IOCs) are pieces of forensic data that organizations share with each other to determine if they've had contact with similar attacks. Some common IOCs are IP addresses, email addresses, file size values, MD5 or SHA-1 hashes, and strings.
The following script I created quickly generates file property IOCs like file name and file size in bytes, as well as file hash IOCs using MD5, SHA-1, and SHA-256 hashes. File hash IOCs are particularly useful, and can be used to search for the same file on a single system or many systems while Threat Hunting.
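To make the threat hunting idea concrete, here's a minimal sketch of searching a directory tree for a file that matches a known SHA-256 hash. The helper names sha256_of and find_matches are hypothetical and are not part of the script in this post; the sketch only assumes the standard hashlib and pathlib modules.

```python
import hashlib
from pathlib import Path


def sha256_of(path, block_size=4096):
    """Return the SHA-256 hex digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_matches(root, known_sha256):
    """Yield every regular file under root whose SHA-256 equals known_sha256."""
    wanted = known_sha256.lower()  # hex digests from hashlib are lowercase
    for candidate in Path(root).rglob("*"):
        if candidate.is_file() and sha256_of(candidate) == wanted:
            yield candidate
```

Calling list(find_matches("some-directory", known_hash)) would return the paths of any matching files, where "some-directory" and known_hash are placeholders for your own values.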
Automation
I'm always trying to identify and automate repetitive tasks in my workflows. Especially if I'll be performing the same tasks again in the future.
In this particular case, I was working my way through a tutorial on how to generate IOCs for the purpose of entering that data into Mandiant's IOC Editor (IOCe) which is a free tool to manage and edit IOCs. These IOCs can then be used to scan a host or hosts for the same file or files.
To do this, I was generating and collecting 5 types of IOC data from a single file. The file name and extension, an MD5 hash, a SHA-1 hash, a SHA-256 hash, as well as the file size in bytes. Worse yet, I had to repeat these five steps for each file.
Once all that data was collected, I would have to input each value one at a time into Mandiant's IOCe application to create an IOC entry. Not only was it ridiculously tedious to accomplish this, but it was mind-numbingly repetitive and time-consuming.
I knew that I could definitely automate the process of generating all of the IOC information for each file, which would significantly reduce the amount of time needed to complete this type of task.
The Plan
There are two options available to us that will help us generate all of the IOC information. We can either use "subprocess" to call the host system's own tools and run them at the system level, or we can make a pure Python-based script that's platform independent.
For the sake of creating something more portable and platform agnostic, we'll focus on creating our script using only the libraries included in Python.
Ok, right off the top we know that grabbing the file name should be pretty straightforward.
We also know that file hashing can be done using the hashlib module which is part of the standard Python installation.
Grabbing the file size in bytes is a little trickier, but not impossible. For this, we'll use Path from Python's pathlib module: its "name" attribute gives us the file name, and its stat() method gives us the file size in bytes (st_size).
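Here's a minimal sketch of reading both values with pathlib's Path, the class the script imports further down. The helper name name_and_size is hypothetical and only here for illustration.

```python
from pathlib import Path


def name_and_size(path):
    """Return (file name, size in bytes) for the given path."""
    file_ = Path(path)
    # .name is the final path component; .stat().st_size is the size in bytes
    return file_.name, file_.stat().st_size
```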
To make our script feel more like a CLI tool we're going to use argparse, which is a command-line parsing module in the Python standard library, to interact with it. We're going to include options to select either a single file or an entire directory of files to leverage the full benefit of automation.
While we're at it, we're also going to write all of the output to a text file to make it easier to work with or share with our colleagues.
Let's get started.
Getting started
To get started, we'll need to import the following modules:
import argparse
import time
from pathlib import Path
import hashlib
Before we can interact with our script from the command-line, we'll need to initiate the parser and define our arguments (which can also be referred to as flags or switches).
# Initiate the parser
parser = argparse.ArgumentParser(description="Python Automated IOC Generator v2.0 by Brett Fullam")
parser.add_argument("-f", "--file", help="select file as input")
parser.add_argument("-d", "--directory", help="select directory as input")
parser.add_argument("-V", "--version", help="show program version", action="store_true")
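As a variation on the parser above (a sketch, not what the finished script does), argparse can also enforce that exactly one of "-f" or "-d" is given, and can print the version for us with its built-in "version" action. The function name build_parser and the description text are assumptions made for this example.

```python
import argparse


def build_parser():
    """Build a parser that requires exactly one of -f/--file or -d/--directory."""
    parser = argparse.ArgumentParser(description="IOC generator (illustrative sketch)")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-f", "--file", help="select file as input")
    group.add_argument("-d", "--directory", help="select directory as input")
    # the built-in "version" action prints the string and exits
    parser.add_argument("-V", "--version", action="version",
                        version="IOC Generator version 2.0")
    return parser
```

With this setup, build_parser().parse_args(["-f", "2innocent.pdf"]) returns a namespace with file set and directory left as None, while passing both "-f" and "-d" makes argparse exit with a usage error.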
Next we're going to include the following code to grab the epoch timestamp from the host system. We'll use this later on to include a human readable timestamp on our reports.
# grab the epoch timestamp at run time and convert to human-readable for the artifact output document footer information
timeStamp = time.time()

# convert epoch timestamp to human-readable date time formatted
report_time = time.strftime('%c', time.localtime(timeStamp))

# create a custom string to be included at the end of the generated output
report_time_footer = str('IOCs generated on: ') + report_time + str('\ncreated by Python Automated IOC Generator') + str('\n\n')
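One thing to keep in mind is that the '%c' format depends on the locale of the host system. If you share reports across teams, a fixed UTC format might be easier to compare. Here's a small sketch of that idea; the function name utc_footer is hypothetical and this is not how the finished script builds its footer.

```python
import time


def utc_footer(epoch):
    """Build the report footer with a fixed-format UTC timestamp."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(epoch))
    return "IOCs generated on: " + stamp + "\ncreated by Python Automated IOC Generator\n\n"
```

You would call it with the same epoch value the script already collects, for example utc_footer(time.time()).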
Next we'll create and open a text document called "output.txt" that the script will write our generated IOC values to.
# create and open a file named 'output.txt' to write our data to
f = open("output.txt", "w")
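The script keeps this file handle open globally and closes it at the end. An alternative worth knowing, shown here only as a sketch, is to collect the output first and write it inside a "with" block so the file is closed automatically. The helper name write_report is an assumption for this example.

```python
def write_report(lines, path="output.txt"):
    """Write each string in lines to path; the file is closed when the block exits."""
    with open(path, "w") as report:
        for line in lines:
            report.write(line)
```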
After that, we'll write a function called "iocGrab" which is going to generate all of the IOC values for each file submitted. I've included detailed comments in the code below for a more detailed explanation of each section.
def iocGrab(arg):

    # store a single entry value from direct user input, or from the directory_as_input() function
    target_file = arg

    # use Path module to access .stat results -- 'name' to grab the file name, and 'st_size' to grab the file size in bytes
    file_ = Path(target_file)
    fileName = (file_.name)
    fileStats = (file_.stat().st_size)

    # Grab the name of the file, create a custom string, print output to screen, as well as write to 'output.txt'
    fileNameOutput = "\nFile name: " + fileName + "\n\n"
    print(fileNameOutput)
    f.write(fileNameOutput)

    # /// Hashing START

    # Open and read the file contents to create the MD5 hash of the file
    md5_hash = hashlib.md5()
    with open(target_file,"rb") as f4:
        # Read and update hash string value in blocks of 4K
        for byte_block in iter(lambda: f4.read(4096),b""):
            md5_hash.update(byte_block)
        md5hash = (md5_hash.hexdigest())

    # Output the MD5 hash value created by hashlib.md5() as reference
    md5Data = str("MD5: " + md5hash + "\n")
    print(md5Data)
    f.write(md5Data)

    # Open and read the file contents to create the SHA-1 hash of the file
    sha1_hash = hashlib.sha1()
    with open(target_file,"rb") as f3:
        # Read and update hash string value in blocks of 4K
        for byte_block in iter(lambda: f3.read(4096),b""):
            sha1_hash.update(byte_block)
        sha1hash = (sha1_hash.hexdigest())

    # Output the SHA-1 hash value created by hashlib.sha1() as reference
    sha1Data = str("SHA-1: " + sha1hash + "\n")
    print(sha1Data)
    f.write(sha1Data)

    # Open and read the file contents to create the SHA-256 hash of the file
    sha256_hash = hashlib.sha256()
    with open(target_file,"rb") as f2:
        # Read and update hash string value in blocks of 4K
        for byte_block in iter(lambda: f2.read(4096),b""):
            sha256_hash.update(byte_block)
        sha256hash = (sha256_hash.hexdigest())

    # Output the SHA-256 hash value created by hashlib.sha256() as reference
    sha256Data = str("SHA-256: " + sha256hash + "\n")
    print(sha256Data)
    f.write(sha256Data)

    # /// Hashing END

    # Create a custom string to show the file size in bytes
    fileSizeBytes = ("Size in bytes: " + str(fileStats) + ("\n\n"))

    # Store the 'size' value as fileSizeBytes, create a custom string, print output to screen, as well as write to 'output.txt'
    print(fileSizeBytes)
    f.write(fileSizeBytes)

    return
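The function above opens and reads the file three times, once per hash. That keeps each step easy to follow, but for large files it may be worth reading the file once and feeding every block to all three hash objects. Here's a minimal sketch of that approach; the function name hash_file is hypothetical and this is an alternative, not the code from the finished script.

```python
import hashlib


def hash_file(path, block_size=4096):
    """Return MD5, SHA-1 and SHA-256 hex digests from a single read of the file."""
    hashers = {
        "MD5": hashlib.md5(),
        "SHA-1": hashlib.sha1(),
        "SHA-256": hashlib.sha256(),
    }
    with open(path, "rb") as handle:
        # read in 4K blocks and update every hash object with the same block
        for block in iter(lambda: handle.read(block_size), b""):
            for hasher in hashers.values():
                hasher.update(block)
    return {label: hasher.hexdigest() for label, hasher in hashers.items()}
```

The returned dictionary uses the same labels the script prints, so each entry could be formatted as label + ": " + value when writing the report.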
Now that we have a function that will generate our IOCs from a single file, it's time to take it one step further and create a new function called "directory_as_input" which will include a loop that will iterate over a directory of files and send each one to the iocGrab() function as input.
def directory_as_input(arg):

    # store the directory_as_input value in the path_of_the_directory variable
    path_of_the_directory = arg

    # initialize list 'i'
    i = [ ]

    # use Path to get the file paths for each file in the directory
    entries = Path(path_of_the_directory)
    for i in entries.iterdir():
        # send a concatenated value using "i.parent" and "i.name" to create the relative path for each file
        # which will appear as "directory-indicated/filename" as the loop iterates over the entries
        grabPath = (i.parent / i.name)
        # the 'grabPath' value is then passed to the iocGrab() function
        iocGrab(grabPath)

    # write the footer once every file has been processed, then close 'output.txt'
    f.write(report_time_footer)
    f.close()

    print(report_time_footer)
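Keep in mind that iterdir() returns every entry in the directory, including any subdirectories, and trying to open a subdirectory for hashing would raise an error. If your target directory might contain subdirectories, one option is to filter the entries first. Here's a small sketch of that idea; the helper name files_in_directory is hypothetical and isn't part of the finished script.

```python
from pathlib import Path


def files_in_directory(directory):
    """Return the regular files directly inside directory, sorted by path."""
    return sorted(entry for entry in Path(directory).iterdir() if entry.is_file())
```

Each path returned could then be passed to iocGrab() in a loop, and sorting gives the report a predictable order from run to run.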
In the previously published version of this script, I used a main() function to handle user input once the script was run. While this worked, it wasn't the best user experience since you had to manually enter file paths and names without the benefit of autocomplete that a CLI tool would provide.
To resolve this usability issue I decided to use argparse, which is a command-line parsing module in the Python standard library, to handle user input at the same time the script is executed in the CLI. Now users can specify if it's a single file or a directory at the time of executing the script. The best part is that they can even use the CLI's autocomplete feature by simply using the "tab" key to spare them from typing the path to the file or directory manually.
Argparse allows us to include arguments to select either a single file or an entire directory of files to leverage the full benefit of automation.
We can even provide a help option to help the user interact with the script. Like any other CLI tool, a user can add "-h" or "--help" to see a usage example for our script along with all the available arguments to choose from.
python3 ioc-generator-v2.0.py --help
Here's the output from the command above that uses the "--help" option:
usage: ioc-generator-v2.0.py [-h] [-f FILE] [-d DIRECTORY] [-V]
Python Automated IOC Generator v2.0 by Brett Fullam
options:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  select file as input
  -d DIRECTORY, --directory DIRECTORY
                        select directory as input
  -V, --version         show program version
Even better, we get some basic error handling too. When the user tries to run the script without any options or inputs an invalid option, the following output is presented to help the user:
# this is the output when no options are included in the command
# python3 ioc-generator-v2.0.py
usage: ioc-generator-v2.0.py [-h] [-f FILE] [-d DIRECTORY] [-V]

# this is the output when an invalid option is included in the command
# python3 ioc-generator-v2.0.py -g
usage: ioc-generator-v2.0.py [-h] [-f FILE] [-d DIRECTORY] [-V]
ioc-generator-v2.0.py: error: unrecognized arguments: -g
All that we have left to do is read the arguments from the command line, and check them against a conditional statement to determine which argument was selected and how to proceed.
# Read arguments from the command line
args = parser.parse_args()
# Check for --file or -f
if args.file:
    iocGrab(args.file)
    print(report_time_footer)
    f.write(report_time_footer)
    f.close()
# Check for --directory or -d
elif args.directory:
    directory_as_input(args.directory)
# Check for --version or -V
elif args.version:
    print("IOC Generator version 2.0")
# Print usage information if no arguments are provided
else:
    print("usage: ioc-generator-v2.0.py [-h] [-f FILE] [-d DIRECTORY] [-V]")
That's the end of our script.
Putting it all together
At this point you can either use the script you've created using the code in the steps listed above, or you can download a finished version of the script from my Github repository which also includes sample files to test the script.
Download a copy of the Github project repository
You can download the repository from my Github account with a web browser here, or by using the following commands in a terminal session. Either option will work.
# confirm git is installed
git --version
If you don't have git installed, download and install the "Latest source Release" for Git.
If git is already installed, download the repository using the git clone command:
git clone https://github.com/b-fullam/IOC-Generator-v2.git
Once the repository is downloaded
Once the repository is downloaded, pop it open and have a look inside.
The following output was created using the "tree" command which works on Linux, Mac and Windows. You should check the options available to you for the "tree" command specific to your system. The options listed below work on both Linux and Mac.
# "-L level" descends only to the "level" of directories deep
# "-L" option and "1" limit the output to 1 level deep
tree -L 1
You could use the "ls" command on Linux and Mac, or use "dir" on Windows to view the contents of the directory. To be more platform friendly I opted to use the "tree" command which can be used by all 3.
.
├── 2innocent.pdf
├── LICENSE
├── README.md
├── ioc-generator-v2.0.py
└── ioc-samples
1 directory, 4 files
There are 4 files and 1 directory in total:
- The finished version of the script using only standard Python libraries (file name ending in ".py")
- License
- README.md file
- 2innocent.pdf sample file for testing
- ioc-samples directory that contains 2 sample files for testing
The ioc-samples directory contains 2innocent.pdf and highly_malicious.txt, which are harmless sample files for you to test the "directory_as_input" functionality included in the script and see first hand how multiple files are processed.
Review the finished code BEFORE you use it
While you could download the finished code directly from my github repository and start using it, I highly recommend reviewing the entire python script first instead for the following reasons.
Security
A secure code review should always be the first thing you do after downloading code from an untrusted source, and definitely BEFORE you execute it in a production environment. Malicious code could have been included and buried somewhere inside of it.
Have a look under the hood
It's always a good idea to see how the code works prior to executing it. This way you'll have a better understanding of what's happening in the background, and possibly even identify areas that you could improve on it as well.
Development Environment
This script was created using Python 3, and all of the necessary dependencies are already included in the standard Python installation. The only requirement is to have Python 3 installed in your development environment.
Python
Before proceeding you should check which version of Python is currently installed on your system, and install Python 3 if necessary.
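Heads-up. The following 2 commands can be version specific. On systems that have both Python 2 and Python 3 installed, "python" often points to Python 2, although on many newer systems it points to Python 3 or isn't available at all.
# on systems with both versions installed, this often returns information for Python 2 only
python --version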
You'll need to run this command instead to see information for Python 3:
python3 --version
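If you'd rather check from inside Python itself, the standard sys module exposes the running interpreter's version. Here's a small sketch; the helper name is_python3 is hypothetical and isn't part of the finished script.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the major version number is 3 or higher."""
    return version_info[0] >= 3
```

Called with no arguments, is_python3() checks the interpreter that's running the code.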
If you're having trouble confirming which version is installed or need to install Python 3 then go to python.org/downloads, and download the appropriate version for your system.
Running the script
Run the following command to test the script using the "-h" or "--help" option to view more information about the available options:
python3 ioc-generator-v2.0.py --help
You will be presented with the following output:
usage: ioc-generator-v2.0.py [-h] [-f FILE] [-d DIRECTORY] [-V]
Python Automated IOC Generator v2.0 by Brett Fullam
options:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  select file as input
  -d DIRECTORY, --directory DIRECTORY
                        select directory as input
  -V, --version         show program version
From here you can see all the available options included in our script.
Let's select a single file as input using the "-f" option, and the sample file "2innocent.pdf" that's included in the Github repository:
python3 ioc-generator-v2.0.py -f 2innocent.pdf
Just like any other CLI tool, you can use the "tab" key to leverage the power of the CLI's autocomplete feature when you enter the name of the sample file.
When the file is processed, each IOC is generated, and the output is both printed to the terminal and written to a text file called "output.txt". The output.txt file is created in the directory you run the command from, which is the same directory as the Python script if you run it from there.
Let's select a directory as input using the "-d" option, and the sample directory "ioc-samples" that's included in the Github repository:
python3 ioc-generator-v2.0.py -d ioc-samples
Here's sample output from the "output.txt" file for the "ioc-samples" directory:
File name: 2innocent.pdf
MD5: 2942bfabb3d05332b66eb128e0842cff
SHA-1: 90ffd2359008d82298821d16b21778c5c39aec36
SHA-256: 3df79d34abbca99308e79cb94461c1893582604d68329a41fd4bec1885e6adb4
Size in bytes: 13264
File name: highly_malicious.txt
MD5: c690acda02c040be19c0406385226cbf
SHA-1: 221630b12fe82060f9cda48b00c959dbd5a5ba68
SHA-256: 9eb4971f0f19809e61bfaead7428a476f0bdbcb323edb6d481606f9c3d6726c5
Size in bytes: 74
IOCs generated on: Sat Jan 29 12:15:22 2022
created by Python Automated IOC Generator
Please note ... the output.txt file will be overwritten each time the script is run. I recommend renaming or relocating the output.txt file prior to re-running the script.
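If renaming the file by hand gets tedious, one possible tweak is to put a timestamp in the output file name so each run writes to its own file. Here's a minimal sketch of that idea; the function name timestamped_output_path and the file name pattern are assumptions for this example, not something the finished script does.

```python
import time
from pathlib import Path


def timestamped_output_path(directory=".", epoch=None):
    """Build a path of the form 'output-YYYYMMDD-HHMMSS.txt' inside directory."""
    # time.localtime(None) falls back to the current time
    moment = time.localtime(epoch)
    return Path(directory) / time.strftime("output-%Y%m%d-%H%M%S.txt", moment)
```

The returned path could be passed to open() in place of the fixed "output.txt" name.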
For a complete version of this Python script, you can visit my Github account and download it directly from the project repository.