UPDATE: @kylemaxwell has accepted the pull of this script into the main maltrieve repo!
*Note: For starters, we need to say thanks as usual to technoskald and point you in the right direction to the Maltrieve Code on GitHub.
Overview
We have posted Maltrieve articles a couple of times in the past, but the capabilities of this application continue to amaze us, so we thought we'd add to our past contributions. During our initial build of a malware collection box (malware zoo creation) we used the standard approach of running Maltrieve throughout the day with a cron job. As most simple things do, this became rather complex because Maltrieve does not categorize what it delivers in any way, so finding what you're looking for is, shall we say, difficult at best. This article discusses a categorization method to help you organize your malware zoo so that it is manageable.
If you would prefer this article in video format, it is provided as well:
Getting started
The box containing the malware repository is a standard Precise Pangolin Ubuntu Distro (12.04 LTS), so no big tricks or hooks here. Maltrieve is installed in a standard format, but a 1TB drive is being used to store the malware retrieved. The box has 3TB worth of space for later use, but for now we'll deal with just the 1TB drive. The malware repository is mounted at /media/malware/maltrievepulls. All scripts used (including the Maltrieve python scripts) are located at /opt/maltrieve. Again, nothing flashy in any of this, so it should be easy for you to get your box set up quickly if you'd like.
Running Maltrieve Consistently
To begin the build of the malware repository, we wanted to run the maltrieve scripts hourly so that the directory would fill with new and interesting malware consistently and quickly. This screamed “crontab”, so we fired up a terminal and ran sudo crontab -l and then sudo crontab -e so that we could edit the crontab. Our initial entry was as follows:
@hourly echo "maltrieve run at: $(date) $(time)" >> /home/username/Documents/maltrievelog.log
This simply tells the system to run the maltrieve.py python script on an hourly basis and send the results to the /media/malware/maltrievepulls directory for safe storage. The second entry adds a little stamp to a file in my home directory so I can confirm the cron job is running every hour; you can obviously leave this statement out if you don't need it. In any case, we quickly noticed that the Maltrieve app was doing its job and we went about our business, allowing the box to do what we asked. We were soon swimming in malware and ready to start analyzing to our hearts' content when we ran into the problem!
The Problem
Maltrieve does exactly what it's told and it does it well: find malware from specific sites and put it in a directory of your liking. And it finds LOTS OF MALWARE if you keep running it as we did in hopes of having a massive store. However, the files are given a hashed name that has very little use to the human eye, and they are just plopped merrily into the directory you choose when you run the maltrieve.py python script. It became quite tedious to run the file command on files that just “looked” interesting based on a hashed filename that said little about format, let alone payload. A quick look could allow you to do some judging by filesize, but basic command line sorting, grepping, awking, and loads of other tools were needed to try and fix the problem. These methods were simply tedious, and once we had hundreds of GBs of malware it became downright no fun any more. The picture below will show you a glimpse of the problem.
Hardly the beacon of light for finding what you're looking for from your malware repository.
Running the file command on a few of these things starts showing some potential though because what you get from doing this looks like:
file 818fc882dab3e682d83aabf3cb8b453b
818fc882dab3e682d83aabf3cb8b453b: PE32 executable (GUI) Intel 80386, for MS Windows
file fd8fd6d345cb630d7f1b6926ce7d28b3
fd8fd6d345cb630d7f1b6926ce7d28b3: Zip archive data, at least v1.0 to extract
So here we find that we have 2 pieces of malware, one is a Portable Executable for a Windows box and the other is a Zip archive. This is a very nice start, but was just 2 needles in a large and growing haystack, and the manual effort was laborious and downright daunting.
Bash to the Rescue
As coders love to do, our answer was to take the awesome product Maltrieve and throw some more code at it. My initial thought was to extend the python script, but since I pulled this from a GitHub repository I didn't want to modify the code and then have to “re-modify” it later if things were ever changed or upgraded. My answer was to create a small Bash Shell script and run it to help categorize our malware repository. The requirements we set upon ourselves were to categorize the code into multiple directories based on the first word output from the file command and then further categorize that by separating the code by size. We decided that 0-50KB files would be considered “small”, 51KB-1MB would be considered “medium”, 1.xMB-6MB would be considered “large”, and anything larger would be considered “xlarge”. It's a rather brutish method but it's something and it seems to work nicely. So in the end, we would want to see a directory tree that looked something like this:
--PE32
----small
----medium
----large
----xlarge
--Zip
----small
----medium
----large
----xlarge
and so on and so on.
Since we set up our maltrieve pulls to run hourly, we decided to run the bash script, which we so obviously named maltrievecategorizer.sh, on every half hour, which allows maltrieve to finish and then categorizes the latest findings. To make this happen, we cracked open crontab again with sudo crontab -e and added the following to the end of the file:
which just says to run our bash script on the half hour of every day of the year, plain and simple.
The Bash Script
The maltrievecategorizer.sh bash script can be seen below. An explanation follows the script.
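#!/bin/bash
# Sort Maltrieve downloads into <type>/<size> directories.
# Arrays and [[ ]] are bash features, so this must run under bash, not sh.

# Names of the size sub-directories
smallstr="/small"
mediumstr="/medium"
largestr="/large"
xlargestr="/xlarge"

# Upper size limits in bytes for small, medium, and large
smallfile=50001
mediumfile=1000001
largefile=6000001

root_dir="/media/malware/maltrievepulls/"
all_files="$root_dir*"

for file in $all_files
do
    # Only regular files; the category directories are skipped
    if [ -f "$file" ]; then
        # Split the output of the file command into an array of words
        outstring=($(file "$file"))
        # Word 1 is the first word of the type description
        stringsubone="${outstring[1]}"
        case $stringsubone in
            "a") stringsubone="PerlScript";;
            "very") stringsubone="VeryShortFile";;
            "empty") rm "$file"
                continue;;
            *);;
        esac
        # Create the category directory and its size sub-directories if needed
        if [ ! -d "$root_dir$stringsubone" ]; then
            mkdir -p "$root_dir$stringsubone"
            mkdir -p "$root_dir$stringsubone$smallstr"
            mkdir -p "$root_dir$stringsubone$mediumstr"
            mkdir -p "$root_dir$stringsubone$largestr"
            mkdir -p "$root_dir$stringsubone$xlargestr"
        fi
        # Move the file into the sub-directory that matches its size in bytes
        filesize=$(stat -c %s "$file")
        if [[ "$filesize" -le "$smallfile" ]]; then
            mv "$file" "$root_dir$stringsubone$smallstr/"
        elif [[ "$filesize" -le "$mediumfile" ]]; then
            mv "$file" "$root_dir$stringsubone$mediumstr/"
        elif [[ "$filesize" -le "$largefile" ]]; then
            mv "$file" "$root_dir$stringsubone$largestr/"
        else
            mv "$file" "$root_dir$stringsubone$xlargestr/"
        fi
    fi
done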
The first several lines simply create string literals for “small”, “medium”, “large”, and “xlarge” so we can use them later in the script, and then we create three variables “smallfile”, ”mediumfile”, and ”largefile” so we can compare file sizes later in the script. So far so good! The lines containing:
root_dir="/media/malware/maltrievepulls/"
all_files="$root_dir*"
for file in $all_files
do
    if [ -f "$file" ]; then
do nothing more than set our root directory where our maltrieve root is and then run a loop against every file in that directory.
outstring=($(file "$file"))
Creates a variable called outstring that is an array of words representing the output of the file command. So using the file command output from above, the outstring array would have 818fc882dab3e682d83aabf3cb8b453b: PE32 executable (GUI) Intel 80386, for MS Windows in it. The output is split on whitespace, so outstring[0] would store: 818fc882dab3e682d83aabf3cb8b453b: and outstring[1] would store: PE32 and outstring[2] would store: executable and so on and so on. We are only interested in outstring[1] to make our categorization a possibility.
Our next line in the script
stringsubone="${outstring[1]}"
creates a variable named stringsubone that contains just the string held in outstring[1] so using the example above, stringsubone would now hold PE32.
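For readers who think in Python rather than Bash, the same extraction can be sketched as below. This is only an illustration of the idea, not part of the script: the function name first_word_of_type is made up for this example, and it assumes each line of file output has the usual "name: description" shape.

```python
def first_word_of_type(file_output):
    """Return the first word of the type description in one line of `file` output."""
    # `file` prints "<name>: <description>"; keep only the description part.
    _, _, description = file_output.partition(": ")
    words = description.split()
    return words[0] if words else ""


# The two sample outputs shown earlier in the article
print(first_word_of_type(
    "818fc882dab3e682d83aabf3cb8b453b: PE32 executable (GUI) Intel 80386, for MS Windows"))
print(first_word_of_type(
    "fd8fd6d345cb630d7f1b6926ce7d28b3: Zip archive data, at least v1.0 to extract"))
```

The two calls print PE32 and Zip, which are exactly the directory names the categorizer ends up using.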
The case statement you see next
case $stringsubone in
    "a") stringsubone="PerlScript";;
    "very") stringsubone="VeryShortFile";;
    "empty") rm "$file"
        continue;;
    *);;
esac
fixes a couple problems with the file command's output. In the case of a piece of malware that is a Perl Script, the output that the file command provides is: a /usr/bin/perl\015 script. This may be helpful for a human, but it makes our stringsubone variable hold the letter “a” in it, which means we would be creating a directory later for categorization called “a” which is LESS THAN USEFUL. The same problem happens with something called Short Files where the output from the file command is: very short file (no magic) which means our stringsubone variable would hold the word “very” which isn't a great name for a directory either. The case statement takes care of these 2 and allows for a better naming method for these directories. It also allows for the removal of empty files which are found as well.
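The renaming rules are really just a small lookup table, which might be easier to see in a Python sketch. The names RENAMES and category_name are invented for this illustration; using None to mean "delete this file" is an assumption of the sketch, standing in for the rm and continue in the case statement.

```python
# Renames taken from the case statement; None marks files the script deletes.
RENAMES = {
    "a": "PerlScript",
    "very": "VeryShortFile",
    "empty": None,
}


def category_name(first_word):
    """Map the first word of `file` output to a directory name (None = delete)."""
    # Anything not listed keeps its own first word, like the *) branch.
    return RENAMES.get(first_word, first_word)


print(category_name("a"))
print(category_name("PE32"))
```

Written this way, adding another awkward first word is a one-line change to the table. One thing to keep in mind with either version: any other file description that happens to start with "a" would land in PerlScript too.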
The next lines
if [ ! -d "$root_dir$stringsubone" ]; then
    mkdir -p "$root_dir$stringsubone"
    mkdir -p "$root_dir$stringsubone$smallstr"
    mkdir -p "$root_dir$stringsubone$mediumstr"
    mkdir -p "$root_dir$stringsubone$largestr"
    mkdir -p "$root_dir$stringsubone$xlargestr"
fi
simply tell the script to look in the directory and if a directory that has the same name as stringsubone does not exist then create it. Then create the directory small, medium, large, and xlarge within that directory for further categorization. Using the PE32 example from above, basically this says “if there's no PE32 directory in this root directory, create one and create the sub-directories small, medium, large, and xlarge within that directory. If the PE32 directory already exists then do nothing”.
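If you ever did want to fold this step into Python, a rough equivalent could look like the sketch below. It assumes Python 3 (for the exist_ok argument), the helper name make_category_dirs is hypothetical, and it runs against a throwaway temporary directory rather than the real repository.

```python
import os
import tempfile

SIZE_DIRS = ("small", "medium", "large", "xlarge")


def make_category_dirs(root_dir, category):
    """Create root_dir/category and its four size sub-directories if missing."""
    created = []
    for size_dir in SIZE_DIRS:
        path = os.path.join(root_dir, category, size_dir)
        # Like mkdir -p: builds parent directories and is silent if they exist.
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created


# Demonstrate against a temporary directory, not the malware repository.
demo_root = tempfile.mkdtemp()
make_category_dirs(demo_root, "PE32")
print(sorted(os.listdir(os.path.join(demo_root, "PE32"))))
```

Because makedirs with exist_ok=True does nothing for directories that are already there, the explicit "does it exist" check from the Bash version is not needed in this sketch.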
The remaining lines look difficult but are simple:
    filesize=$(stat -c %s "$file")
    if [[ "$filesize" -le "$smallfile" ]]; then
        mv "$file" "$root_dir$stringsubone$smallstr/"
    elif [[ "$filesize" -le "$mediumfile" ]]; then
        mv "$file" "$root_dir$stringsubone$mediumstr/"
    elif [[ "$filesize" -le "$largefile" ]]; then
        mv "$file" "$root_dir$stringsubone$largestr/"
    else
        mv "$file" "$root_dir$stringsubone$xlargestr/"
    fi
fi
First we create a variable called filesize and use the stat command to store the file's size in bytes in it. Then we find out whether the file fits our category of small, medium, large, or xlarge using if and elif comparisons. The first comparison that turns out to be true decides where the file is moved.
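The size rule on its own can be sketched in a few lines of Python. The byte limits are the ones from the script; the constant and function names are made up for this example.

```python
SMALL_MAX = 50001      # bytes, smallfile in the script
MEDIUM_MAX = 1000001   # bytes, mediumfile in the script
LARGE_MAX = 6000001    # bytes, largefile in the script


def size_bucket(size_in_bytes):
    """Return the size sub-directory name for a file of the given size."""
    # Same order as the if/elif chain: the first limit that fits wins.
    if size_in_bytes <= SMALL_MAX:
        return "small"
    if size_in_bytes <= MEDIUM_MAX:
        return "medium"
    if size_in_bytes <= LARGE_MAX:
        return "large"
    return "xlarge"


print(size_bucket(267312))
```

A 267312 byte sample is above the small limit and below the medium one, so this prints medium.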
The results of this solution are in the picture below.
Conclusion
As you can plainly see, we now have the ability to quickly look for specific files in an easier fashion. If I am looking for a piece of malware that I know to be in HTML format and that is over 50KB but less than 1MB, I can easily roam to HTML->medium, run a one-liner file command with some grepping, and find what I am looking for. I'm certain there are other methods to go about this process and probably WAY better methods of categorizing this directory, so if you have some ideas please shoot them our way and we'll give them a try and see if we can help the community.
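As a small illustration of how the new layout helps, here is a Python sketch that lists everything in one category and size. The function name list_samples is hypothetical, and the HTML directory is only an example of a category that may or may not exist in your own repository.

```python
import os


def list_samples(root_dir, category, bucket):
    """Return sorted paths of the regular files in root_dir/category/bucket."""
    folder = os.path.join(root_dir, category, bucket)
    if not os.path.isdir(folder):
        return []  # nothing has been categorized there yet
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )


# Prints nothing if the directory does not exist on this machine.
for path in list_samples("/media/malware/maltrievepulls", "HTML", "medium"):
    print(path)
```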
In this episode of TekTip we take a look at performing basic static analysis with MASTIFF. While that is the focus of this episode I wanted to delve into Maltrieve first.
Maltrieve is a fork of MWCrawler, which you guys and gals may remember from a previous TekTip video. Maltrieve was created by Kyle Maxwell (@KyleMaxwell). While it has the same basic function as MWCrawler, which is downloading malware from various web resources, it works much faster and pulls from more reliable web resources. @KyleMaxwell is working to add Thug integration as well.
Once downloaded you run maltrieve without any options, as seen below:
This will download the malware to a default directory of /tmp/malware
*Make sure this directory exists or change the path in the python script to match what you want
So, with Maltrieve done and a bunch of samples downloaded it is time to see the power of MASTIFF.
MASTIFF is an automated framework for static analysis created by Tyler Hudak (@SecShoggath) and funded by the Cyber Fast Track DARPA program. Too bad Cyber Fast Track is going away; there are so many awesome projects coming out of it right now.
MASTIFF analyzes a file to determine the file type (PDF, zip, PE32) and, based on that file type, runs the appropriate static analysis tools against the sample. The output of the tools it runs is organized and packaged up, with some key information also making its way to a SQLite database.
Some of the benefits of this framework are:
Easily Extensible: Built very modular so adding to the functionality is easy
Consistent: When you have a team of analysts working on malware it is important that everyone speak the same language. MASTIFF gives a consistent standard approach to static analysis.
Quick: Manual static analysis can take a long time. With MASTIFF I can run through hundreds of samples in minutes.
Documented: As a consequence of being a DARPA funded program the creator was forced to ensure that the framework was documented well. The documentation goes beyond the normal installation and usage covering workflow and methodology.
I do not cover installation in the video as it would take too long and be very boring, but I will mention that installation is relatively easy. The only real pain is ensuring you have all of the third party tools installed. The PDF inside the archive for MASTIFF has great documentation to get you up and running. Once the dependencies and MASTIFF are installed, though, be sure to modify the mastiff.conf file to reference the paths where you installed the third party tools.
Now that the config is good and MASTIFF is installed, you are ready to start analyzing malware. Running mas.py will show you usage.
[2013-02-23 21:47:41,645] [ERROR] [Mastiff.Plugins.yara.get_sigs] : /opt/yara-1.6/yara is not a directory or does not exist.
[2013-02-23 21:47:41,645] [INFO] [Mastiff.Analysis] : Finished analysis for /tmp/malware/86658467c74b39210de96111ee6f66d5.
Navigate to the directory you have set as the work log in mastiff.conf to see the results:
tekmalinux@TekMALinux:/work/log/86658467c74b39210de96111ee6f66d5$ ls -l
total 424
-rw-r--r-- 1 root root 267312 Feb 23 21:47 86658467c74b39210de96111ee6f66d5.VIR
-rw-r--r-- 1 root root 137 Feb 23 21:47 fuzzy.txt
-rw-r--r-- 1 root root 3440 Feb 23 21:47 mastiff.log
-rw-r--r-- 1 root root 1024 Feb 23 21:47 mastiff-run.config
-rw-r--r-- 1 root root 42100 Feb 23 21:47 peinfo-full.txt
-rw-r--r-- 1 root root 13317 Feb 23 21:47 peinfo-quick.txt
drwxr-xr-x 2 root root 4096 Feb 23 21:47 resources
-rw-r--r-- 1 root root 1332 Feb 23 21:47 resources.txt
-rw-r--r-- 1 root root 7704 Feb 23 21:47 sig.der
-rw-r--r-- 1 root root 27152 Feb 23 21:47 sig.txt
-rw-r--r-- 1 root root 42606 Feb 23 21:47 strings.txt
Nice, it looks like we pulled certificate info based on the sig.txt being there. To give you an example of the type of data you get, here is a cat of the peinfo-quick.txt:
MASTIFF does not currently have a native method to scan multiple files at once. While that is on the horizon for the project, it is not a problem for us, as we can just script out a quick program to do this. Of course you can always use mine.
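#!/usr/bin/python
import os

# MASTIFF Autorun
# @TekDefense
# www.TekDefense.com
# Quick script to autorun samples from maltrieve to MASTIFF

malwarePath = '/tmp/malware/'

# Walk the sample directory and hand every file to mas.py
for r, d, f in os.walk(malwarePath):
    for files in f:
        # Join with r so files in sub-directories get the right path
        malware = os.path.join(r, files)
        print(malware)
        os.system('mas.py' + ' ' + malware)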
Simply change the directory in the script to point to where you have the samples and run the python program. Also be sure to keep this script in the same directory as mas.py.
[2013-02-23 22:00:56,698] [ERROR] [Mastiff.Plugins.yara.get_sigs] : /opt/yara-1.6/yara is not a directory or does not exist.
[2013-02-23 22:00:56,698] [INFO] [Mastiff.Analysis] : Finished analysis for /tmp/malware/ba91f309a81c1f6f1d7dcc5cb5094328.
/tmp/malware/a544ffb08f6177f6382df6101f78bfdc
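If your sample paths might ever contain spaces or other characters the shell treats specially, one possible variation is to pass the command as a list through the subprocess module instead of building a string for os.system. This is a sketch, not a replacement for the script above: the function names are made up, and it assumes mas.py can be started by that name (on your PATH and executable), which may not match your install.

```python
import os
import subprocess


def mastiff_commands(malware_path, mas_script="mas.py"):
    """Build one argument list per file found under malware_path."""
    commands = []
    for root, _dirs, files in os.walk(malware_path):
        for name in sorted(files):
            # A list keeps each path as a single argument, spaces and all.
            commands.append([mas_script, os.path.join(root, name)])
    return commands


def run_all(malware_path, mas_script="mas.py"):
    """Run mas.py once per sample, printing each path first."""
    for command in mastiff_commands(malware_path, mas_script):
        print(command[1])
        subprocess.call(command)  # raises OSError if mas_script cannot be started
```

Calling run_all('/tmp/malware/') would then work through the same directory as the original script. Building the command lists in a separate function also makes it easy to print them first and check what would be run.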
Now that you have performed analysis against a bunch of samples you can analyze the results, or open up the SQLite database to pull some statistics.
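As a starting point for poking at the database, here is a Python sketch that lists each table and how many rows it holds. It deliberately assumes nothing about MASTIFF's schema and only reads SQLite's own sqlite_master catalog; the function name is invented, and you would need to supply the database path from your own mastiff.conf.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    connection = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            # Quote the table name so unusual names do not break the query.
            quoted = '"{}"'.format(name.replace('"', '""'))
            counts[name] = connection.execute(
                "SELECT COUNT(*) FROM {}".format(quoted)).fetchone()[0]
        return counts
    finally:
        connection.close()
```

Be aware that sqlite3.connect creates an empty file if the path does not exist, so double-check the path before running it. Once you know which tables are there, you can write more specific queries against them.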
As you can probably tell by now, I am really enjoying MASTIFF; in fact I am looking for any excuse to run it daily. Last week I was given a perfect event to apply MASTIFF to, and that was Mandiant's report on APT1. VirusShare (@VXShare) was able to quickly compile a bunch of samples, which a lot of folks started playing around with. I decided to run 20 or so of the samples through MASTIFF. If you would like to download those results you can get them in the download section.
I mentioned in the video that I was getting an error when running MASTIFF. I am not sure yet exactly what is generating the error, as I have checked that all the appropriate imports are in place. Once I figure it out I'll let you guys know what is going on. The error is below:
[2013-02-23 21:47:40,904] [ERROR] [yapsy] : Unable to import plugin: /opt/mastiff/mastiff-0.5.0/plugins/EXE/EXE-singlestring
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/Yapsy-1.10.1_pythons2n3-py2.7.egg/yapsy/PluginManager.py", line 486, in loadPlugins