Brett Fullam

Security Automation with Python — IP address and URL analysis via VirusTotal's API v3 with HTML Reporting

Python, VirusTotal, VirusTotal API v3, Security Automation, Security Information Automation, IP address analysis, URL analysis, Bulk IP address analysis, Bulk URL analysis

I'm Brett Fullam, a creative technologist turned networking and cybersecurity specialist passionate about security automation. In this blog post, I'm going to show you how I created a Python script to automate the submission of IP addresses or URLs for analysis using VirusTotal’s API v3 that generates custom HTML reports.


As I continue my security automation journey, I'm always looking for opportunities to solve practical problems with Python. One of the repetitive tasks performed by security analysts is the reputation check of IP addresses or URLs. This task is typically done by submitting a single entry at a time, and can be very time consuming if IPs or URLs need to be analyzed in bulk.

To automate this process, we'll use Python 3 to interact programmatically with VirusTotal's API v3, allowing entire lists of IP addresses or URLs to be submitted and analyzed quickly and at volume.

Our Python script will parse the information returned by VirusTotal, and create a custom HTML report that only includes high-level information to allow analysts to quickly determine which submissions are harmless and which will need further investigation. To streamline this effort our Python script will also generate a hypertext link to a full report on VirusTotal's web-based GUI for each entry allowing the analyst seamless access to additional information directly from the HTML report.

Let's jump in.

VirusTotal free public API Key

Before we can get started, you'll need a free public API key from VirusTotal. To get one, sign up for a free account on VirusTotal.com. Once your account is set up, you can find your free public API key in the settings section of your account.

IMPORTANT ... If you're using GitHub and plan on saving your code to a public repository, make sure you do not include your API key directly in your code. In this tutorial I will be using a ".env" file to store our API key outside of our Python script. ALWAYS remember to add ".env" to your .gitignore file to keep it from being pushed to GitHub and exposing your API key in the repository. For more information on working with .env files, take a look at Drew Seewald's article "Using dotenv to Hide Sensitive Information in Python--Hide your passwords and API tokens to make your code more secure".

Development Environment

For this tutorial I'll be using VS Code. However, you can use any IDE you'd like, so long as you have Python 3 installed, as well as the dependencies listed in the requirements.txt file, which I'll explain in the next few sections.

Python

Before proceeding you should check which version of Python is currently installed on your system, and install Python 3 if necessary.

Heads-up. The following 2 commands are version specific, and will only return information for Python 2 or Python 3 individually.

# this will return information regarding Python version 2 only
python --version

You'll need to run this command instead to see information for Python 3:

python3 --version

If you're having trouble confirming which version is installed or need to install Python 3 then go to python.org/downloads, and download the appropriate version for your system.

Download a copy of the Github project repository

You can download the repository from my Github account with a web browser here, or by using the following commands in a terminal session. Either option will work.

# confirm git is installed
git --version

If you don't have git installed, download and install the "Latest source Release" for Git.

If git is already installed, download the repository using the git clone command:

git clone https://github.com/b-fullam/Automating-VirusTotal-APIv3-for-IPs-and-URLs.git

Once the repository is downloaded

Once the repository is downloaded, pop it open and have a look inside.

The following output was created using the "tree" command which works on Linux, Mac and Windows. You should check the options available to you for the "tree" command specific to your system. The options listed below work on both Linux and Mac.

# "-L level" descends only to the "level" of directories deep
# "-L" option and "1" limit the output to 1 level deep
tree -L 1

You could use the "ls" command on Linux and Mac, or use "dir" on Windows to view the contents of the directory. To be more platform friendly I opted to use the "tree" command which can be used by all 3.

.
├── LICENSE
├── README.md
├── requirements.txt
├── target-ips.txt
├── target-urls.txt
└── vt-ip-url-analysis.py
0 directories, 6 files

There are 6 files in total:

  • The finished version of the Python script (file name ending in ".py")
  • License
  • README.md file
  • requirements.txt
  • target-ips.txt
  • target-urls.txt

The last two files, target-ips.txt and target-urls.txt, are sample lists for you to test the regex patterns included in the script to see first hand how they're used to normalize and validate input from a list.
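To make the normalize-and-validate step concrete, here's a minimal sketch of how a list of IP addresses might be filtered with a regex before submission. The pattern and function name here are illustrative assumptions, not the exact code from the script:

```python
import re

# hypothetical IPv4 pattern -- the actual regex in vt-ip-url-analysis.py may differ
IP_PATTERN = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$")

def validate_ips(lines):
    """Return only the entries from a list that look like valid IPv4 addresses."""
    valid = []
    for line in lines:
        # normalize: strip whitespace and trailing newlines from each entry
        entry = line.strip()
        match = IP_PATTERN.match(entry)
        # validate: confirm each octet falls in the 0-255 range
        if match and all(0 <= int(octet) <= 255 for octet in match.groups()):
            valid.append(entry)
    return valid

print(validate_ips(["8.8.8.8\n", "not-an-ip", "999.1.1.1"]))  # → ['8.8.8.8']
```

Entries that fail validation are simply dropped, so a stray comment or malformed line in the list file won't trigger a wasted API request.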

Review the finished code BEFORE you use it

While you could download the finished code directly from my GitHub repository and start using it, I highly recommend reviewing the entire Python script first, for the following reasons.

Security

A secure code review should always be the first thing you do after downloading code from an untrusted source, and definitely BEFORE you execute it in a production environment. Malicious code could have been included and buried somewhere inside of it.

Have a look under the hood

It's always a good idea to see how code works prior to executing it. This way you'll have a better understanding of what's happening in the background, and you may even identify areas you could improve.

Python Virtual Environments

For this tutorial, we'll be using a "Python Virtual Environment". If you already have experience with this you can jump to the next section. If you've never used this feature before, it's the perfect opportunity to learn.

A "Python Virtual Environment", referred to as a "virtualenv", helps you keep your project environments separate by creating an isolated environment for each project.

Once a virtualenv has been created and activated for your project, any installed dependencies (modules) will be limited to that specific project, and won't interfere with any of your other projects.

If you're not familiar with virtualenvs in Python, then I highly recommend reviewing the article "Python Virtual Environments: A Primer" for more detailed information.

Create a new directory for our project

Open up a terminal session, create a directory for this project, and copy the Python script from the GitHub repository using the following commands:

# create a new directory called python-vt-apiv3
mkdir python-vt-apiv3
# navigate to the downloaded github repository
cd Automating-VirusTotal-APIv3-for-IPs-and-URLs

Now let's copy the python script from the downloaded github repository into the project directory /python-vt-apiv3. In my case, the downloaded github repository is located in the same directory as /python-vt-apiv3 so the file path is relatively simple.

cp vt-ip-url-analysis.py ../python-vt-apiv3

Create a virtualenv for our project

In order to make our directory a virtual environment, we'll need to navigate one step outside of it first by using the following command.

# navigate one step outside of the project directory
cd ..

From here we can use Python's built-in "venv" module to create our virtual environment from our project directory:

# Python 3 -- "python-vt-apiv3" is the name of our project directory
python3 -m venv python-vt-apiv3

Once that's complete, navigate to our /python-vt-apiv3 directory and list the contents.


cd python-vt-apiv3
tree -L 1

Here's what you should see when you list the contents of the project directory:

.
├── bin
├── include
├── lib
├── pyvenv.cfg
└── vt-ip-url-analysis.py

Activate the virtual environment we just created

Navigate one level outside of our project directory by using the following command:

cd ..

Activate our project's virtual environment with the following command:

source python-vt-apiv3/bin/activate

You should see "(python-vt-apiv3)" to the left of the command prompt in your terminal session indicating the virtual environment for our project is active.

When you're done working with the project's virtual environment, all you need to do is type "deactivate" and the "(python-vt-apiv3)" will no longer appear to the left of the command prompt. To re-activate the virtual environment, you'll need to repeat the previous steps. It's dead simple to use.

Ok, navigate back into our project directory /python-vt-apiv3 using the following command:

cd python-vt-apiv3

Now we can install any dependencies associated with our project in an isolated environment.

Installing dependencies

Before we can start working with our Python script we'll need to install some dependencies into the virtual environment for our project. In the Github repository you downloaded, there's a file named "requirements.txt". This file lists all of the necessary dependencies for our script, and we can use this file to import all of them at once.

First we need to place a copy of the requirements.txt file into our /python-vt-apiv3 directory.

.
├── bin
├── include
├── lib
├── pyvenv.cfg
├── requirements.txt
└── vt-ip-url-analysis.py

Then we can use the following command to install all of the dependencies listed in the requirements.txt file:

# pip3 is used for Python 3 install commands. pip is for Python 2
pip3 install -r requirements.txt

It takes a little bit of time to install the dependencies, especially for numpy and pandas, but you can watch the progress of the install directly in the terminal.

If you run into issues installing the dependencies, make sure you're using the most recent version of pip3, then try the command again. On my first attempt, numpy failed to install from the requirements.txt file, and an error in the output stated that I should update pip (pip, not pip3) to the most current version. After some research, this appears to be a known quirk of the numpy installation. Once I updated pip using the command included in the error output and re-ran "pip3 install -r requirements.txt", the install was quick and successful.

IMPORTANT ... Depending on your platform (hello Mac!) you may also need to put "sudo" at the beginning of the command to correct any permissions issues during the install using the "pip3 install -r requirements.txt" command.

After the dependencies are installed in our project environment, we're now ready to start working with our Python script.

Storing our API key in a .env file

To make our code more secure, we will be using an ".env" file to store our public API key from VirusTotal outside of our Python script.

To do this, we need to create a .env file in the SAME directory as the Python script, and add the following code. Make sure you insert your VirusTotal API key as indicated:

You can create the .env file any way you choose, but I'll be using the following commands in the terminal if you'd like to follow along. I'm using nano, but any text editor like vim will work.

# create a new file called ".env" and open it with nano
nano .env

Once it's open, add "API_KEY1=" followed by your public API key from VirusTotal. Don't use any quotation marks around your API key, or it won't work.

API_KEY1=insert your vt API key here

Save the changes and exit out of your text editor.

Our API_KEY1 variable is now stored outside of our Python script. Next we'll use the "dotenv" module to find the .env file and load the environmental variables from it so our script can use the API key to interact with VirusTotal's API v3.

As a reminder, ALWAYS add ".env" to your .gitignore file to keep it from being pushed to GitHub and exposing your API key in the repository.

The choice is yours. Test first, or soldier on.

Whenever I'm reading a tutorial on a coding blog, for me, "seeing is believing". At this point, for motivation's sake, you can certainly test the finished Python script from the GitHub repository before investing more time in the sections that follow, which review each part of the script.

Running the script

The virtual environment should be good to go, and your .env file with your API key is all set. All you need to do is make sure you're inside the /python-vt-apiv3 project directory, and run the following command:

python3 vt-ip-url-analysis.py -h

If everything was installed correctly, the virtual environment is "active", and the .env file with your API key is in the project directory, the script will run and present the following output:

usage: vt-ip-url-analysis.py [-h] [-s SINGLE_ENTRY] [-i IP_LIST] [-u URL_LIST] [-V]

Python Automated VT API v3 IP address and URL analysis 2.0 by Brett Fullam

optional arguments:
  -h, --help            show this help message and exit
  -s SINGLE_ENTRY, --single-entry SINGLE_ENTRY
                        ip or url for analysis
  -i IP_LIST, --ip-list IP_LIST
                        bulk ip address analysis
  -u URL_LIST, --url-list URL_LIST
                        bulk url analysis
  -V, --version         show program version

From here you can test individual entries for IP addresses or URLs, and also use the target-ips.txt and target-urls.txt list files included in the github repository to test the bulk analysis options for lists of IP addresses or URLs.

I recommend copying the target-ips.txt and target-urls.txt list files into your project folder first so you don't have to write out any lengthy file paths to their current location.

When an entry is submitted, and a successful response from VirusTotal's API v3 is received you will see the output directly in the terminal.

I would start small by selecting the "-s" option and using a single entry to confirm everything is working properly prior to attempting to use a list of IP addresses or URLs.

python3 vt-ip-url-analysis.py -s google.com

You should see the following output in your terminal:

google.com
community score        0/93 : security vendors flagged this as mali...
last_analysis_date     Sat Jan 29 14:43:04 2022
last_analysis_stats    {'harmless': 83, 'malicious': 0, 'suspicious':...
redirection_chain      [http://google.com/]
reputation             2597
times_submitted        156841
virustotal report      https://www.virustotal.com/gui/url/cf4b367e49b...

HTML Report

Upon successful completion, the script generates a time stamped HTML report, named "report.html", which is saved in the same directory that the Python script is located. To view the report, open the report.html file in any web browser.

The script also generates a hypertext link to VirusTotal's web-based GUI for each entry allowing the end user seamless access to additional information directly from the HTML report.

Please note ... the report.html report file will be overwritten each time the script is run. I recommend renaming or relocating the report prior to re-running the script.
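To make the report-generation step concrete, here's a heavily simplified sketch of what a function like outputHTML() might do. The markup, function signature, and names here are my own illustrative assumptions, not the script's actual implementation:

```python
import time

def output_html(tables, report_time):
    # assemble a minimal, self-contained HTML page from pre-rendered
    # table strings; the script's real outputHTML() adds styling and
    # more header detail than this sketch
    html_page = (
        "<html><head><title>VirusTotal Analysis Report</title></head><body>"
        f"<h1>VirusTotal Analysis Report</h1><p>Generated: {report_time}</p>"
        + "".join(tables)
        + "</body></html>"
    )
    # note: report.html is overwritten each time the script is run
    with open("report.html", "w") as report:
        report.write(html_page)

# example: one entry rendered as an HTML table, stamped with the current time
output_html(["<table><tr><td>google.com</td></tr></table>"], time.strftime("%c"))
```

Because the page is a single self-contained file, it stays portable and can be shared with less technical peers as-is.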

I hope you'll find this Python script helpful.

Now, on to the detailed code review

For the rest of the blog post, I'll walk through sections of the code to help give you a clearer understanding of why I chose to create it the way I did and how it all works together. I won't be explaining the entire script line for line, but the comments in my code are extensive and explicit. If you want to take a deeper dive, work your way through the entire script and read the comments in my code.

For full disclosure: while I'm sure there are cleaner and more concise ways to accomplish what I've created, I'm still learning, and this is all part of the process of becoming a better programmer.

Not just another Python script

When I started working through the concept of what I wanted to build, I didn't want to just build a bunch of separate scripts. I was more interested in building something that felt more like a CLI tool than a script. Something that would present more than a single option for input, that could create a report that was visually pleasing, and still remain portable for sharing with less technologically experienced peers.

I had to create 4 major functions to accomplish all of this. One function to submit entries to the VirusTotal API v3 for analysis. Two functions for handling lists as input for IP addresses and URLs for bulk requests. Another for generating the HTML reports.

Argparse

In the previously published version of this script, I used a main() function to handle user input once the script was run. While this worked, it wasn't the best user experience since you had to manually enter file paths and names without the benefit of autocomplete that a CLI tool would provide.

To resolve this usability issue I decided to use argparse, which is a command-line parsing module in the Python standard library, to handle user input at the same time the script is executed in the CLI. Now users can specify if it's a single file or a specific type of list at the time of executing the script. The best part is that they can even use the CLI's autocomplete feature by simply using the "tab" key to spare them from typing the path to the file or directory manually.

Argparse allows us to include arguments to select either a single entry or an entire list of entries, leveraging the full benefit of automation.

We can even provide a help option to help the user interact with the script. Like any other CLI tool, a user can add "-h" or "--help" to see a usage example for our script along with all the available arguments to choose from.

python3 vt-ip-url-analysis.py --help

Here's the output from the command above that uses the "--help" option:

usage: vt-ip-url-analysis.py [-h] [-s SINGLE_ENTRY] [-i IP_LIST] [-u URL_LIST] [-V]

Python Automated VT API v3 IP address and URL analysis 2.0 by Brett Fullam

optional arguments:
  -h, --help            show this help message and exit
  -s SINGLE_ENTRY, --single-entry SINGLE_ENTRY
                        ip or url for analysis
  -i IP_LIST, --ip-list IP_LIST
                        bulk ip address analysis
  -u URL_LIST, --url-list URL_LIST
                        bulk url analysis
  -V, --version         show program version
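For reference, here's a hedged sketch of the parser setup that could produce usage output like the above. The option names match the script's usage string, but the exact help text and description in the real script may differ slightly:

```python
import argparse

# parser sketch matching the usage output shown above
parser = argparse.ArgumentParser(
    description="Python Automated VT API v3 IP address and URL analysis 2.0 by Brett Fullam"
)
parser.add_argument("-s", "--single-entry", help="ip or url for analysis")
parser.add_argument("-i", "--ip-list", help="bulk ip address analysis")
parser.add_argument("-u", "--url-list", help="bulk url analysis")
parser.add_argument("-V", "--version", action="store_true",
                    help="show program version")

# simulate running "python3 vt-ip-url-analysis.py -s google.com"
args = parser.parse_args(["-s", "google.com"])
print(args.single_entry)  # → google.com
```

Note that argparse converts "--single-entry" into the attribute name "single_entry" automatically, which is why the conditional checks later in the script read args.single_entry, args.ip_list, and args.url_list.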

What's even better is that argparse performs some basic error handling as well. When the user runs the script without any options, or includes an invalid option, the following output is presented to help the user:

# this is the output when no options are included in the command
# python3 vt-ip-url-analysis.py
usage: vt-ip-url-analysis.py [-h] [-s SINGLE_ENTRY] [-i IP_LIST] [-u URL_LIST] [-V]
# this is the output when an invalid option is included in the command
# python3 vt-ip-url-analysis.py -g
usage: vt-ip-url-analysis.py [-h] [-s SINGLE_ENTRY] [-i IP_LIST] [-u URL_LIST] [-V]
vt-ip-url-analysis.py: error: unrecognized arguments: -g

All that we have left to do is read the arguments from the command line, and check them against a conditional statement to determine which argument was selected and how to proceed.

args = parser.parse_args()

# Check for --single-entry or -s
if args.single_entry:
    urlReport(args.single_entry)
    print(dataframe)
    outputHTML()
# Check for --ip-list or -i
elif args.ip_list:
    urlReportIPLst(args.ip_list)
    outputHTML()
# Check for --url-list or -u
elif args.url_list:
    urlReportLst(args.url_list)
    outputHTML()
# Check for --version or -V
elif args.version:
    print("VT API v3 IP address and URL analysis 2.0")
# Print usage information if no arguments are provided
else:
    print("usage: vt-ip-url-analysis.py [-h] [-s SINGLE_ENTRY] [-i IP_LIST] [-u URL_LIST] [-V]")

If the user submits a single IP address or URL for analysis, it's sent to the urlReport() function, where it's converted to a "url identifier" and submitted to VirusTotal's API v3 via a GET request.

If the user chooses to use a list of IP addresses or URLs for analysis, it's sent to either urlReportIPLst() or urlReportLst() where the file is read, and its contents are normalized and/or validated and stored in an array. Once this process is complete, each entry is sent to urlReport() where it will be converted to a "url identifier" and sent along to VirusTotal's API v3 via a GET request.

The urlReportIPLst() and urlReportLst() functions pass each entry through the urlReport() function. So, for all 3 types of user input, the information ultimately returned by the urlReport() function is then passed along to the outputHTML() function to generate a report.

urlReport() Function

The urlReport() function is the largest and most complex of all five functions. Its purpose is to convert entries into what VirusTotal calls a "URL identifier", which is the base64 encoded equivalent of each entry without the "=" padding that appears at the end of base64 output.

Once converted, the URL identifier is added to the end of the VirusTotal API v3 url and included in the GET request submitted for analysis along with appropriate headers that include your public API key.

Once a response is received, the json data from VirusTotal is stored in a python dictionary called decodedResponse. From there, I stripped out selected key values from the data to only include the values I wanted in my report. I grabbed the timestamp from the host system as well, and converted it from epoch to human readable date and time format.

I also wanted to use some of the returned data to create custom values in my report such as "community score", convert "last_analysis_date" from epoch to human readable format, and create a custom link to a full report in VT's web-based GUI.

Recreating the Community Score

VirusTotal's web-based GUI reports include a "community score" to quickly determine how many community members have flagged an entry as "malicious". It appears from left to right as the "number of members flagged as malicious" and "the total number of members who have analyzed the entry". For example, Google.com has a community score of "0 / 93" on VirusTotal's web-based GUI. "0" is the number of members who flagged the entry as malicious, and "93" is the total number of members who have analyzed Google.com.

I also felt this was a valuable metric, and wanted to include it in my HTML tables. Unfortunately, the json returned by VirusTotal doesn't explicitly include a data set for it, but I could re-create the same community score information using the values in the "last_analysis_stats" key.

last_analysis_stats {'harmless': 84, 'malicious': 0, 'suspicious': 0, 'undetected': 9, 'timeout': 0}

Grabbing the first number was easy since "malicious" is one of the items explicitly listed in last_analysis_stats.

# grab "malicious" key data from last_analysis_stats to create the first part of the community_score_info
community_score = (decodedResponse["data"]["attributes"]["last_analysis_stats"]["malicious"])

The second number was a little more complicated. I needed the sum of all of the values included in last_analysis_stats.

# grab the sum of last_analysis_stats to create the total number of security vendors that reviewed the URL for the second half of the community_score_info
total_vt_reviewers = (decodedResponse["data"]["attributes"]["last_analysis_stats"]["harmless"])+(decodedResponse["data"]["attributes"]["last_analysis_stats"]["malicious"])+(decodedResponse["data"]["attributes"]["last_analysis_stats"]["suspicious"])+(decodedResponse["data"]["attributes"]["last_analysis_stats"]["undetected"])+(decodedResponse["data"]["attributes"]["last_analysis_stats"]["timeout"])
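As a side note, the five chained additions above can be collapsed with Python's built-in sum() over the dictionary's values. A sketch using the sample last_analysis_stats shown earlier:

```python
# sample last_analysis_stats values as returned by VirusTotal for google.com
stats = {"harmless": 84, "malicious": 0, "suspicious": 0,
         "undetected": 9, "timeout": 0}

# sum() over the dictionary values replaces the five chained additions
total_vt_reviewers = sum(stats.values())

# assemble the custom community score string the same way the script does
community_score_info = (str(stats["malicious"]) + "/" + str(total_vt_reviewers)
                        + " : security vendors flagged this as malicious")
print(community_score_info)  # → 0/93 : security vendors flagged this as malicious
```

The upside of sum() is that if VirusTotal ever adds a new category to last_analysis_stats, the total stays correct without editing the code.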

Once I had my community score numbers ready, I still had to output it into the HTML table and in a format that would read well.

# create a custom community score using community_score and the total_vt_reviewers values
community_score_info = str(community_score)+ ("/") + str(total_vt_reviewers) + (" : security vendors flagged this as malicious")

After that, all I had to do was add a new row to the dataframe by amending it with the following command.

# amend dataframe with extra community score row
dataframe.loc['community score',:] = community_score_info

Converting epoch to human readable format

One of the values returned by VirusTotal's API v3 that I wanted to include in my HTML report table data was "last_analysis_date". However, VirusTotal returns it in epoch format rather than human readable, so it needed to be converted.

It was easy enough to grab the epoch timestamp from the returned json data for last_analysis_date.

# grab "last_analysis_date" key data to convert epoch timestamp to human readable date time formatted
epoch_time = (decodedResponse["data"]["attributes"]["last_analysis_date"])

Once that value was stored in a new variable called "epoch_time" I could convert it to human readable date time.

# convert epoch time to human readable date time and store in the time_formatted variable
# the original key last_analysis_date from the returned VirusTotal json will be removed and replaced with an updated last_analysis_date value that's now human readable
time_formatted = time.strftime('%c', time.localtime(epoch_time))

Once it was converted and stored in the new time_formatted variable, I had to exclude the last_analysis_date key from the dataframe, and then add a new row titled "last_analysis_date" back into the dataframe with our converted timestamp from time_formatted.

# amend dataframe with the updated last_analysis_date value stored in time_formatted that was converted from epoch to human readable
dataframe.loc['last_analysis_date',:] = time_formatted

Create a custom link to VirusTotal's web-based GUI

When I set out to create this script I wanted to make it something useful in a real-world application. That's when I realized that if an entry needed further investigation, the analyst would have to use VirusTotal's web-based GUI to get more information: yet another step an analyst has to perform to definitively determine whether an entry is harmless or malicious. It was also an excellent opportunity for automation, and I was determined to add it to the scope of my Python script.

Luckily, when I was working through creating Python scripts to interact with all of VirusTotal's "Universal API Endpoints" (which are "public" and free to use) I discovered a similarity between "URL identifiers" and part of the URL that appears after a report is generated in the web-based GUI. In fact, they are exactly the SAME.

Check it out.

Here's a "URL identifier" generated by my script using a "canonized URL" (http://google.com/) encoded in SHA-256:

cf4b367e49bf0b22041c6f065f4aa19f3cfe39c8d5abc0617343d1a66c6a26f5

Here's the URL grabbed from a report generated using the SAME URL (but this time only entering "google.com") in the web-based GUI:

https://www.virustotal.com/gui/url/cf4b367e49bf0b22041c6f065f4aa19f3cfe39c8d5abc0617343d1a66c6a26f5

Notice how the values are EXACTLY the SAME. Actually, in the web-based GUI, you could enter either "google.com" or "http://google.com/" and the SHA-256 value will still be exactly the same.

This isn't limited to URLs either, it even works with IP addresses so long as they are formatted as a "canonized URL" first. So, I could use this to create links to full reports on VirusTotal's web-based GUI that can now be added to my HTML report table data for EACH entry. Here's how I added it to my code:

# create a sha256 encoded vt "id" of each url or ip address to generate a hypertext link to a VirusTotal report in each table
# create a string value of the complete url to be encoded
UrlId_unEncrypted = ("http://" + target_url + "/")

# begin function for hashing our hyperlink string with sha256
def encrypt_string(hash_string):
    sha_signature = hashlib.sha256(hash_string.encode()).hexdigest()
    return sha_signature

# store the hyperlink string to be hashed in the variable hash_string
hash_string = UrlId_unEncrypted
# hash and store our sha256 hypertext string in sha_signature
sha_signature = encrypt_string(hash_string)
# create the hypertext link to the VirusTotal.com report
vt_urlReportLink = ("https://www.virustotal.com/gui/url/" + sha_signature)

After the link is generated and stored in the vt_urlReportLink variable, all I had to do was add a new row titled "virustotal report" back into the dataframe with our custom hypertext link to the full report on VirusTotal's web-based GUI.

BOOM. Just like that, the HTML report now provides analysts with the ability to perform an initial high-level analysis of the entries submitted, and click a hypertext link included for each entry to view a full report on VirusTotal.com as needed to perform a more detailed analysis.

Converting the finished dataframe to HTML

Once the dataframe is complete, I reordered it by sorting the dataframe rows alphabetically to place more emphasis on the community score, last_analysis_date, and last_analysis_stats.

From here the dataframe is ready to be converted to an html table which is stored in the html variable.

It can either be passed directly to the outputHTML() function if it's a single entry, or back to the urlReportLst() and urlReportIPLst() functions first to be stored in an array prior to being sent to the outputHTML() function.
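Here's a small illustrative sketch of that sort-and-convert step using pandas. The sample rows and the to_html() options are my own assumptions, not the script's exact code:

```python
import pandas as pd

# a few sample rows standing in for the filtered VirusTotal attributes;
# wrapping each value in a list keeps orient="index" happy
filtered = {
    "times_submitted": [156841],
    "reputation": [2597],
    "community score": ["0/93 : security vendors flagged this as malicious"],
}
dataframe = pd.DataFrame.from_dict(filtered, orient="index")

# sort rows alphabetically so "community score" and the last_analysis_*
# rows surface at the top of the rendered table
dataframe = dataframe.sort_index()

# render the dataframe as an HTML table string for the report
html = dataframe.to_html(header=False)
```

Because Python sorts strings lexicographically, "community score" lands above "reputation" and "times_submitted", which is what puts the highest-value rows first in the report tables.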

# this is the function that will take user input or input from a list to submit urls to VirusTotal for url reports, receive and format the returned json for generating our html reports
def urlReport(arg):
    # user input, ip or url, to be submitted for a url analysis stored in the target_url variable
    target_url = arg

    # for a url analysis report virustotal requires the "URL identifier" or base64 representation of the URL to scan (without padding)
    # create the virustotal "url identifier" from user input stored in target_url
    # encode the user submitted url to base64 and strip the "=" padding from the end
    url_id = base64.urlsafe_b64encode(target_url.encode()).decode().strip("=")
    # print(url_id)

    # amend the virustotal apiv3 url to include the unique generated url_id
    url = "https://www.virustotal.com/api/v3/urls/" + url_id

    # while you can enter your API key directly for the "x-apikey" header, it's not recommended as a best practice; it should be stored and accessed separately in a .env file (see the comment under "load_dotenv()" for more information)
    headers = {
        "Accept": "application/json",
        "x-apikey": API_KEY
    }

    response = requests.request("GET", url, headers=headers)

    # load the returned json from virustotal into a python dictionary called decodedResponse
    decodedResponse = json.loads(response.text)

    # grab the epoch timestamp at run time and convert it to human-readable format for the html report header information
    timeStamp = time.time()
    # set report_time to a global value to share the stored value with other functions
    global report_time
    # convert the epoch timestamp to a human-readable date time format
    report_time = time.strftime('%c', time.localtime(timeStamp))

    # set dataframe to a global value to share the stored value with other functions
    global dataframe

    # grab the "last_analysis_date" key data to convert its epoch timestamp to a human-readable date time format
    epoch_time = (decodedResponse["data"]["attributes"]["last_analysis_date"])
    # convert the epoch time to human-readable date time and store it in the time_formatted variable
    # the original last_analysis_date key from the returned virustotal json will be removed and replaced with an updated last_analysis_date value that's now human readable
    time_formatted = time.strftime('%c', time.localtime(epoch_time))

    # create a sha256 encoded vt "id" of each url or ip address to generate a hypertext link to a virustotal report in each table
    # create a string value of the complete url to be encoded
    UrlId_unEncrypted = ("http://" + target_url + "/")

    # begin function for hashing our hyperlink string with sha256
    def encrypt_string(hash_string):
        sha_signature = hashlib.sha256(hash_string.encode()).hexdigest()
        return sha_signature

    # store the hyperlink string to be hashed in the variable hash_string
    hash_string = UrlId_unEncrypted
    # hash and store our sha256 hypertext string in sha_signature
    sha_signature = encrypt_string(hash_string)
    # create the hypertext link to the virustotal.com report
    vt_urlReportLink = ("https://www.virustotal.com/gui/url/" + sha_signature)

    # strip the "data" and "attributes" keys from the decodedResponse dictionary and only include the keys listed within "attributes" to create a more concise dictionary stored in filteredResponse
    filteredResponse = (decodedResponse["data"]["attributes"])

    # create an array of keys to be removed from attributes to focus on specific content for quicker/higher-level analysis
    keys_to_remove = [
        "last_http_response_content_sha256",
        "last_http_response_code",
        "last_analysis_results",
        "last_final_url",
        "last_http_response_content_length",
        "url",
        "last_analysis_date",
        "tags",
        "last_submission_date",
        "threat_names",
        "last_http_response_headers",
        "categories",
        "last_modification_date",
        "title",
        "outgoing_links",
        "first_submission_date",
        "total_votes",
        "type",
        "id",
        "links",
        "trackers",
        "last_http_response_cookies",
        "html_meta"
    ]

    # iterate through the keys_to_remove array and pop each key from the filteredResponse dictionary
    for key in keys_to_remove:
        filteredResponse.pop(key, None)

    # create a dataframe with the remaining keys stored in the filteredResponse dictionary
    # orient="index" is necessary in order to list the attribute keys as rows and not as columns
dataframe = pd.DataFrame.from_dict(filteredResponse, orient="index")
# rename the column header to the submitted url
dataframe.columns = [target_url]
# grab "malicious" key data from last_analysis_stats to create the first part of the community_score_info
community_score = (decodedResponse["data"]["attributes"]["last_analysis_stats"]["malicious"])
# grab the sum of last_analysis_stats to create the total number of security vendors that reviewed the URL for the second half of the community_score_info
total_vt_reviewers = (decodedResponse["data"]["attributes"]["last_analysis_stats"]["harmless"])+(decodedResponse["data"]["attributes"]["last_analysis_stats"]["malicious"])+(decodedResponse["data"]["attributes"]["last_analysis_stats"]["suspicious"])+(decodedResponse["data"]["attributes"]["last_analysis_stats"]["undetected"])+(decodedResponse["data"]["attributes"]["last_analysis_stats"]["timeout"])
# create a custom community score using community_score and the total_vt_reviewers values
community_score_info = str(community_score)+ ("/") + str(total_vt_reviewers) + (" : security vendors flagged this as malicious")
# amend dataframe with extra community score row
dataframe.loc['virustotal report',:] = vt_urlReportLink
# amend dataframe with extra community score row
dataframe.loc['community score',:] = community_score_info
# amend dataframe with the updated last_analysis_date value stored in time_formatted that was converted from epoch to human readable
dataframe.loc['last_analysis_date',:] = time_formatted
# sort dataframe index in alphabetical order to put the community score at the top
dataframe.sort_index(inplace = True)
# set html to a global value to share the stored value with other functions
global html
# dataframe is output as an html table, and stored in the html variable
html = dataframe.to_html(render_links=True, escape=False)
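The two identifier schemes used above can be sketched in isolation: the API endpoint expects the padding-stripped base64 of the URL, while the web GUI report link uses the sha256 hash of the normalized "http://&lt;url&gt;/" string (the example.com value is just an illustration):

```python
import base64
import hashlib

target_url = "example.com"  # illustrative input

# API "URL identifier": urlsafe base64 of the URL with "=" padding stripped
url_id = base64.urlsafe_b64encode(target_url.encode()).decode().strip("=")
api_url = "https://www.virustotal.com/api/v3/urls/" + url_id

# GUI report link: sha256 hex digest of the normalized "http://<url>/" string
sha_signature = hashlib.sha256(("http://" + target_url + "/").encode()).hexdigest()
vt_urlReportLink = "https://www.virustotal.com/gui/url/" + sha_signature
```

Both values are derived purely from the submitted URL, which is why the script can build the GUI link without any extra API calls.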

outputHTML() function

The output function handles the HTML report generation. It takes either a single HTML table or an array of HTML tables, and writes them to a file called "report.html" styled with inline CSS to keep the report portable.
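Each table handed to this function is produced by pandas back in the analysis functions; a minimal sketch of that step, using an invented column name and dummy attribute values rather than real VirusTotal data:

```python
import pandas as pd

# dummy filtered attributes (illustrative values, not real VirusTotal data)
filtered = {
    "reputation": 0,
    "virustotal report": "https://www.virustotal.com/gui/url/abc123",
}
# orient="index" lists the dictionary keys as rows rather than columns
df = pd.DataFrame.from_dict(filtered, orient="index")
df.columns = ["example.com"]
# render_links=True turns bare URLs into clickable anchors;
# escape=False keeps the generated markup intact
html_table = df.to_html(render_links=True, escape=False)
```

The resulting string is a complete &lt;table&gt; element, which is why outputHTML() only needs to wrap the tables in boilerplate and styling.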

To accomplish this, I used several variables to generate each section of the HTML report from top to bottom, and used the following to create the file:

text_file = open("report.html", "w")

Then I used the following to open and append the file for each subsequent section:

text_file = open("report.html", "a") # append mode
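The same write-then-append pattern can be sketched with context managers, which close the file automatically (the section strings and the temp-file path here are placeholders):

```python
import os
import tempfile

report_path = os.path.join(tempfile.gettempdir(), "report.html")

# "w" creates the file (or truncates an existing one) for the first section
with open(report_path, "w") as report:
    report.write("<header>")

# "a" appends each subsequent section without disturbing what's already there
with open(report_path, "a") as report:
    report.write("<footer>")

with open(report_path) as report:
    contents = report.read()  # "<header><footer>"
```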

HTML header

The header variable contains everything from the initial "!DOCTYPE html" declaration through the opening "body" tag, including the H1 report title and the H2 VirusTotal API v3 line at the top of the report.

Report timestamp

The next variable added to the report is the report_timestamp, which is nested inside an H3 tag. This value was grabbed from the local system back in the urlReport() function, converted from an epoch timestamp to a human-readable date and time, and then shared with the outputHTML() function via a global variable. Because the timestamp comes from the local system and is captured at run time, it reflects each execution of the script.
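The epoch-to-human-readable conversion is a one-liner with the time module. The script uses time.strftime('%c', time.localtime(...)), which depends on the local timezone and locale, so this sketch substitutes an explicit format string and UTC for a deterministic result:

```python
import time

epoch_time = 0  # illustrative epoch timestamp (the Unix epoch itself)
# gmtime() converts the epoch seconds to a UTC struct_time;
# strftime() then formats it as a human-readable string
formatted = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(epoch_time))
# formatted == '1970-01-01 00:00:00'
```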

HTML table data

After the inline CSS, document title, and timestamp are added to the report.html file, we can start adding the HTML table data created by urlReport().

I had to account for either a single table from urlReport() or an array of table data from the urlReportLst() and urlReportIPLst() functions. To do this, I created a loop that iterates over the table data stored in the "html" variable, whichever of the three functions produced it.
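Iterating over the shared "html" value works in both cases because iterating a single string writes it character by character, which still reproduces the full string. An alternative is to normalize the value to a list first; a small sketch, where tables_to_write is a hypothetical helper:

```python
def tables_to_write(html):
    # wrap a single table string in a list; leave a list of tables as-is
    return [html] if isinstance(html, str) else list(html)

# both shapes yield one full table string per element
single = tables_to_write("<table>one</table>")
many = tables_to_write(["<table>one</table>", "<table>two</table>"])
```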

HTML footer

After all the HTML table data is written to the report.html file, the only thing left to do is add the closing "body" and "html" tags and close the file.

Inline CSS styles

I wanted to make sure the report had CSS styles to make it visually pleasing and easier to read. There were several ways to accomplish this, and different points in the process where the styles could be applied as the HTML tables are created.

I experimented with dynamically creating and adding the CSS styles in an extra step between generating the HTML table data and sending it to the outputHTML() function, but it was cumbersome and overcomplicated. Having worked as a web developer, an externally linked CSS stylesheet was my next choice, but that meant it would have to be distributed along with the report.html file.

For the sake of simplicity, and ultimately more control over the styling, I opted to add inline styles directly to the report.html file. This made it possible to apply styles to broad sections of the report while keeping everything in a single, self-contained document that's simple to distribute if needed.

# ////////////////////////////////// START OUTPUT TO HTML
# this function takes either a single html table or an array of html tables,
# and writes them to a CSS-styled html file called "report.html"
def outputHTML():
    # store the boilerplate html and inline CSS, through the opening <body> tag
    # and the report title, in a variable named "header"
    header = """<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Automated VirusTotal Analysis Report | API v3</title>
    <style>
        body {
            font-family: Sans-Serif;
            color: #1d262e;
        }
        h1 {
            font-size: 1.25em;
            margin: 35px 0 0 30px;
        }
        h2 {
            font-size: .75em;
            font-weight: normal;
            margin: 5px 0 15px 30px;
            color: #7d888b;
        }
        h3 {
            font-size: 1em;
            font-weight: normal;
            margin: 0 0 20px 30px;
            color: #7d888b;
        }
        table {
            text-align: left;
            width: 90%;
            border-collapse: collapse;
            border: none;
            padding: 0;
            margin-left: 20px;
            margin-bottom: 40px;
            max-width: 780px;
        }
        th {
            text-align: left;
            border: none;
            padding: 10px 0 5px 10px;
            margin-left: 10px;
        }
        tr {
            text-align: left;
            border-bottom: 1px solid #ddd;
            border-top: none;
            border-left: none;
            border-right: none;
            padding-left: 10px;
            margin-left: 0;
        }
        td {
            border-bottom: none;
            border-top: none;
            border-left: none;
            border-right: none;
            padding-left: 10px;
        }
        tr th {
            padding: 10px 10px 5px 10px;
        }
    </style>
</head>
<body>
    <h1 class="reportHeader">Automated VirusTotal Analysis Report</h1>
    <h2>VirusTotal API v3</h2>
"""
    # add the report timestamp wrapped in an h3 tag
    report_timestamp = str("<h3>" + report_time + "</h3>")
    # store the closing </body> and </html> tags in a variable named "footer"
    footer = """
</body>
</html>
"""
    # create and open the new report.html file
    text_file = open("report.html", "w")
    text_file.write(header)
    text_file.close()
    # open and append report.html with the human-readable date and time stored
    # in the report_timestamp variable
    text_file = open("report.html", "a")  # append mode
    text_file.write(report_timestamp)
    text_file.close()
    # open and append report.html with a single html table from urlReport(), or
    # an array of html tables returned by urlReportLst() or urlReportIPLst()
    text_file = open("report.html", "a")  # append mode
    # iterate through the html array and write all the html tables to report.html
    for x in html:
        text_file.write(x)
    text_file.close()
    # open and append report.html with the closing tags stored in the footer variable
    text_file = open("report.html", "a")  # append mode
    text_file.write(footer)
    text_file.close()

For a complete version of this Python script, visit my GitHub account and download it directly from the project repository.

© 2023 by Brett Fullam. All rights reserved.