2003 time capsule: clusterable, fault-injecting virtual machines

In 2003, I spec'd out a full product design and implementation plan for a fault-injecting virtual machine to automatically find exploitable bugs in a highly scalable fashion with few or no false positives. Here are my thoughts four years later, now that similar ideas are being made public.

When I first had the idea of a static code analysis product for binaries, I had planned an entire roadmap. That roadmap was, sadly, not fully realised. A single product, even with a 5-year roadmap, does not a company make. Coming up with other related product ideas that were worth the trouble of starting a company was not trivial.

I knew that I wanted to start a company that would not be religious about black box testing (fuzzing) versus white box testing (runtime analysis and static code analysis). Even in 2007, the biggest problem I still see in the security testing market is that companies who invest in a narrowly focused product portfolio feel the need to protect that portfolio by spreading FUD about other approaches. Spreading out investments is usually better advice than concentrating them, so this is just bad business. In our class, Luis and I talk about how all of these technologies can (and should) be used together to find exploitable bugs more quickly. This kind of balanced knowledge is not generally imparted, and can easily be lost in the din of vendor posturing and marketing materials. These issues existed in 2002 as well, when I started planning the company I would start.

I knew I needed a set of products that would work separately to do static code analysis, fuzzing, and runtime analysis. The last, runtime analysis, was already handled quite well by open source tools like valgrind, so I planned to offer commercial support and fund valgrind development to make the improvements necessary to communicate things like inappropriate memory accesses and code coverage information back to a fuzzer. gdb and SoftICE were adding the ability to accept connections and commands over the network, and doing the same for a super-debugger like valgrind was the next logical step.

That left the issue of coming up with a fuzzing product. I knew I had to overcome the general issues with black box fuzzing I had encountered while working on a commercial fuzzer. (I left the company that made that fuzzer to start work on BugScan.) The main issue was detecting inappropriate memory accesses that would be a real problem but would not cause the program being fuzzed to crash. Integration with valgrind would fix that issue, but would not fix the second most important deficiency with fuzzing: scalability.

Fuzzing via the network can be very slow for reasons Luis and I describe in our BlackHat class. You can run fuzzers in parallel, duplicating the environment necessary for the program you are fuzzing, but it starts to get physically bothersome more than anything. I had the idea while at Sundance in January 2003, after I had left the commercial fuzzing company I had worked for. Somehow, being in a different context and away from computers just opened things up for me. I realised that virtual machines were the key.

The idea was as follows: you would install your OS and the program(s) you wanted to test into a virtual machine; you would start up the client or server program you wanted to fuzz, then enable recording and specify which process(es) you were targeting; you would then exercise the client or server against a real server or client. As you exercised the program while the virtual machine was in record mode, virtual machine images would be persisted to disk at various calls to the networking APIs and/or the virtualized ethernet driver. When you were done exercising the program you wanted to fuzz, you would stop recording. Then you could start fuzzing. The virtual machine would restore one of the saved VM images that was frozen in a network-oriented part of the OS, fuzz the buffers pointed to by that network-oriented part of the OS, and then continue execution for a few seconds. Because you had chosen the target processes, and the VM would know the stack and heap layout for each of them before the fuzzing, it could detect memory corruption issues. If fuzzing the network buffer caused an integrity issue with data on the stack or heap, you would get a report. The VM could also track code coverage in the target process's address space, a necessity for making sure you're fuzzing as much of the program's code as possible.
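The restore, fuzz, and check loop can be sketched as a toy model in Python. This is only an illustration under loud assumptions: `Snapshot`, `run_target`, `is_corrupted`, `fuzz_snapshot`, and `DEST_SIZE` are hypothetical names, the "VM image" is just a pair of byte strings, and the "target" is a deliberately buggy parser that trusts a length byte. Nothing here is from the original design documents.

```python
import random
from dataclasses import dataclass

DEST_SIZE = 8  # hypothetical: size of the buffer the toy parser copies into


@dataclass(frozen=True)
class Snapshot:
    """Toy stand-in for a VM image frozen inside a networking call."""
    net_buffer: bytes  # packet the network API was about to hand over
    memory: bytes      # target memory: DEST_SIZE bytes, then a neighbouring object


def run_target(snapshot, packet):
    """Restore the image and let a deliberately buggy parser consume the packet."""
    memory = bytearray(snapshot.memory)   # "restore": always start from a fresh copy
    claimed_len = packet[0]               # bug: the length byte is trusted
    payload = packet[1:1 + claimed_len]
    memory[0:len(payload)] = payload      # may run past DEST_SIZE into the neighbour
    return bytes(memory)


def is_corrupted(snapshot, memory_after):
    """Integrity check: did anything outside the destination buffer change?"""
    return memory_after[DEST_SIZE:] != snapshot.memory[DEST_SIZE:]


def fuzz_snapshot(snapshot, iterations, seed=0):
    """Mutate one byte of the recorded packet per run; collect corrupting inputs."""
    rng = random.Random(seed)
    reports = []
    for _ in range(iterations):
        packet = bytearray(snapshot.net_buffer)
        packet[rng.randrange(len(packet))] = rng.randrange(256)
        if is_corrupted(snapshot, run_target(snapshot, bytes(packet))):
            reports.append(bytes(packet))
    return reports


if __name__ == "__main__":
    snap = Snapshot(net_buffer=b"\x04ABCD" + b"\x00" * 8,
                    memory=b"\x00" * 8 + b"SECRET!!")
    for bad in fuzz_snapshot(snap, iterations=500, seed=1):
        print("corrupting input:", bad.hex())
```

The point of the toy is that the overflow never crashes anything; it is caught only because the state before fuzzing is known and can be compared afterwards.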

This might actually sound even slower than a standard fuzzing approach, given the expense of loading a potentially large VM image and the fact that running in a VM will possibly slow things down more than running under valgrind would. The key advantage was that the individual VM images can be loaded on separate machines and be fuzzed with different fuzzers -- effectively allowing for a fuzzing cluster. The more machines you add, the faster the fuzzing will finish and the faster you'll find the exploitable bugs. Managing cluster nodes can require an application in and of itself, which was another aspect of this long-defunct business.
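As a rough sketch of the cluster idea, the recorded images and the available fuzzers can be paired up and dealt out to machines. The `schedule` function, the node names, and the fuzzer names below are all hypothetical, and round-robin is just the simplest policy to show; a real cluster manager would also have to ship images, collect reports, and handle node failures.

```python
from itertools import cycle, product


def schedule(snapshots, fuzzers, nodes):
    """Deal every (snapshot, fuzzer) pair out to the nodes round-robin."""
    if not nodes:
        raise ValueError("at least one node is required")
    plan = {node: [] for node in nodes}
    # zip stops when the finite product of jobs is exhausted
    for node, job in zip(cycle(nodes), product(snapshots, fuzzers)):
        plan[node].append(job)
    return plan


if __name__ == "__main__":
    plan = schedule(
        snapshots=["recv-001.img", "recv-002.img", "send-001.img"],
        fuzzers=["bitflip", "length-fields"],
        nodes=["node-a", "node-b"],
    )
    for node, jobs in plan.items():
        print(node, jobs)
```

Because each job starts from its own saved image, the jobs are independent of one another, which is what lets adding machines shorten the total run.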

I'm still really proud of this idea and I'm still sure it would work pretty well. Many companies out there are building products based on the Linux Kernel-based Virtual Machine (KVM), which uses pieces of QEMU. I think this idea could be implemented using KVM. Maybe someone out there will go ahead and implement this idea in the open source arena.

When documenting these ideas, I gave each phase of the products in the company a codename. This clusterable, fault-injecting virtual machine product was code-named Project Sirius. Yes, I am a total Harry Potter nerd :)
