YaK:: WebLog #535 Topic : 2004-08-28 01.04.49 matt : QA nightmare story #2

QA nightmare story #2

Static schmatic analysis, runtime funtime paralysis


When I started doing QA in 1997, I really had no concept of what I was doing. I had done beta testing before, and seemed to have a knack for breaking software in general, but I didn't understand any of the processes or tools that had been established. In 1998, I picked up on two tools that have been part of my personal toolbox ever since: Purify and PC-Lint. (Actually, Purify has been stagnant for years; I now use valgrind when possible.)

I did blackbox testing for about a year, finding memory leaks on Solaris/Linux using 'ps auxw' and 'top', and on NT using Task Manager and later HandleEx (now known as Process Explorer). That worked better than one might think, but it made it difficult for the developers to track down and fix things, and that's when I realised that looking at code might help.

On an NT firewall/VPN product I worked on, I started by just compiling it under VC++ 6 when everyone else was using VC++ 5. That alone found about 10 really serious bugs that made me question how some of these components ever worked. (Some of these bugs dated back to the Firewall Toolkit the code was pulled from. A famous security person, Marcus R., kept making = instead of == bugs. I know it was him because he kept putting big comments in files with his name/initials mentioned all over. I later asked him about it when I saw him in Boston in 2000, and he admitted to the "youthful" mistakes.) This initial and basic success got me very interested in source code analysis tools.
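For anyone who hasn't been bitten by it, here's a minimal made-up sketch of that = instead of == bug class (my own illustration, not the actual toolkit code). A newer compiler with warnings turned up, or a lint tool, flags this kind of thing immediately:

    /* Hypothetical illustration of the assignment-instead-of-comparison bug. */
    #include <stdio.h>

    static int check_auth(int user_ok)
    {
        /* BUG: '=' assigns 1 to user_ok, so the condition is always true
         * and every caller is treated as authenticated. */
        if (user_ok = 1) {
            return 1;
        }
        return 0;
    }

    int main(void)
    {
        printf("unauthenticated user allowed? %d\n", check_auth(0));
        return 0;
    }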

At this point, I looked at a lot of things but was only really happy with PC-Lint (7.00 at the time, I think). I could configure it to show pointer custody being taken/abandoned at different points without having to actually annotate the source code, which was very nice. After about a week of setup and tweaking, it was consistently finding memory leaks in the firewall product without too many false positives.

While doing this, I also started doing manual code review. This was C code, and I had never really gotten into C. When I was 16, programming became kind of boring to me and I left off at Turbo Pascal and some simple x86 (and 80387) assembly programming. I had done bits of C, but still didn't understand basic things like heap vs. stack. This frustrated some developers because I kept asking basic questions, and since they weren't too happy about having to fix the bugs, they weren't exactly encouraging or helpful. Some were, though, and they are probably to blame/thank for my continued work in this area (Ilya M., Aaron B., Andrea V., etc.). At one point before I left the company, a whitebox QA team was formed that did nothing but code review and maintain PC-Lint in the automated builds.
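To give a rough idea of the kind of thing PC-Lint was flagging for me, here's a minimal made-up example (not code from the product) where custody of an allocation is taken and then abandoned on an error path. A lint-style checker can report this without ever running the program:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct rule {
        char *name;
    };

    /* Returns a new rule, or NULL on failure. */
    static struct rule *rule_create(const char *name)
    {
        struct rule *r = malloc(sizeof *r);
        if (r == NULL)
            return NULL;

        r->name = malloc(strlen(name) + 1);
        if (r->name == NULL)
            return NULL;   /* BUG: custody of 'r' is abandoned here -- memory leak */

        strcpy(r->name, name);
        return r;
    }

    int main(void)
    {
        struct rule *r = rule_create("allow-dns");
        printf("%s\n", r ? r->name : "(failed)");
        /* 'r' and 'r->name' are also leaked here on purpose,
         * so the runtime tools have something to report too. */
        return 0;
    }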

In my searches for other tools to help improve quality, I ran into Purify. For about 6 months, Purify was my best friend. Besides being practically psychic, it allowed for much more accurate bug reporting and was able to find very subtle memory corruption bugs. If any of the customers noticed major stability improvements between 5.0 and 5.5 of that product, this is one of the main reasons (in my opinion). Purify helped me understand more about memory layout, and this is the point at which my blackbox testing became a bit more complex. I also started reading bugtraq more closely around this time and started understanding that memory corruption and exploits/vulnerabilities were very closely related. I would telnet to the port where a proxy lived, enter long strings, and see it crash. I would then do the same thing under Purify, get the line numbers, and enter that into the bug. You'd think people would appreciate this, but I got a lot of grief and personal attacks over it. Tom P. (a developer on the Network Scanner), upon hearing of my discovery of stack-based overflows in another developer's (Mike V.) firewall component, said very loudly "MATT HARGETT IS AN ASSHOLE." Nice. This firewall developer really had it out for me, insulting me constantly, and ultimately quit in protest of the whitebox QA team I formed. I didn't really understand this mode of thinking until I understood he was scared shitless of the poor quality of his code being discovered. It was some of the worst code I've ever seen in my professional career (not counting open source stuff). This same developer wrote a threaded scanning engine in the network scanner product. If customers noticed an extreme decrease in quality between the 5.0 and 5.5 versions of the scanner, go talk to Mike V.
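The bug class I kept reproducing looked roughly like this (again, a made-up sketch rather than the actual proxy code): a fixed-size stack buffer filled with whatever comes in over the network. Telnet in, paste a long string, and the process falls over; rerun the same input under a runtime checker and the bug report gets real file and line numbers instead of guesswork:

    #include <stdio.h>
    #include <string.h>

    static void handle_field(const char *input)
    {
        char field[64];

        /* BUG: no length check -- a long string overflows 'field' and
         * smashes the stack. This is the class of crash I'd reproduce
         * under a runtime analysis tool to get line numbers into the bug. */
        strcpy(field, input);

        printf("parsed field: %s\n", field);
    }

    int main(void)
    {
        char longstring[512];
        memset(longstring, 'A', sizeof longstring - 1);
        longstring[sizeof longstring - 1] = '\0';

        handle_field(longstring);  /* likely crashes, or corrupts silently */
        return 0;
    }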

So, Purify became part of my arsenal. Its sister product, PureCoverage, intrigued me. Measuring code coverage hadn't really occurred to me before, but I instantly understood why it is incredibly important. I would manually test something under PureCoverage, then manually interpret the results and add more use cases to the test plan. Eventually, we had an engineer writing Visual Test scripts, and then integrated PureCoverage to measure the coverage of his scripts. We then had Purify run in the automated tests as well, for automatic data generation. It was great! So much so that I've replicated this process at nearly every company I've been at since. I also played with Insure++ (then 5.0), which didn't work too well. At the last company I was at, I revisited their then-new version (6.0) and it still had major issues. This time, though, I was able to work with their support to help them fix their problems, and the next release (6.1) worked much, much better. What I've found is that Purify will find things Insure++ won't and vice versa, so I use both. Insure++ instruments code at the source level whereas Purify instruments at the object code level. A few years ago I discovered a WONDERFUL tool for Linux called Valgrind, which detects many of the same classes of bugs but does it in a different way.

Given this experience, when I interviewed at a company called ClickToSecure (later renamed Cenzic) and saw their Hailstorm tool, I was very excited. This tool would've been great at my previous testing job. Hailstorm basically did what I had been doing manually: connect to a port, send a long string for a field in the protocol, then try again putting the long string in another field. Hailstorm had no good way of detecting whether memory corruption actually occurred. It tried to detect over the network whether the program crashed, but that didn't work very well; you had to monitor the server you were attacking pretty closely. Given my previous experience, I told Greg Hoglund (my boss then) that a runtime analysis tool would make it much more useful. He really liked this idea, and I installed Purify on his machine and showed him how to use it. I think he ended up talking about this in one of his Blackhat talks, though I'm not sure if he mentioned the source. Code coverage also came into play, amusingly enough. Hailstorm came with "content" to test various protocols, like HTTP. I showed one of the developers (Shawn B.) how to use gcov to measure code coverage on a Linux HTTP server, and he made some tweaks to the content to increase the coverage. Coverage showed up in another of Greg's talks too, where he demoed a UI with control flow blocks colored as "visited".
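For the curious, the loop Hailstorm automated looks roughly like this when done by hand (a sketch with made-up host, port, and HTTP fields): send one request per field, with that field replaced by a long string, while the server runs under a runtime checker or a coverage-instrumented build on the other side:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    static void send_request(const char *host, int port, const char *req)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;

        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        inet_pton(AF_INET, host, &addr.sin_addr);

        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
            write(fd, req, strlen(req));
        close(fd);
    }

    int main(void)
    {
        char longstring[2048];
        char req[4096];
        const char *fmt[] = {
            "GET /%s HTTP/1.0\r\n\r\n",                   /* long URL */
            "GET / HTTP/1.0\r\nHost: %s\r\n\r\n",         /* long Host header */
            "GET / HTTP/1.0\r\nUser-Agent: %s\r\n\r\n",   /* long User-Agent */
        };

        memset(longstring, 'A', sizeof longstring - 1);
        longstring[sizeof longstring - 1] = '\0';

        /* One oversized field per request; watch the target for crashes,
         * memory errors, or coverage changes on the other end. */
        for (size_t i = 0; i < sizeof fmt / sizeof fmt[0]; i++) {
            snprintf(req, sizeof req, fmt[i], longstring);
            send_request("127.0.0.1", 8080, req);
        }
        return 0;
    }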

This is really bizarre. All these QA tools and methodologies being applied to exploit/vulnerability discovery? What about static analysis? I'd learned previously that doing static analysis before runtime analysis is generally a better use of time and gets a lot of low-hanging fruit out of the way. When testing, this is helpful because trying to manually test something that's going to be shitting the bed anyway can be very frustrating. When you don't have source code and you're setting breakpoints in a debugger and staring at stack contents, registers, and assembly listings until your toenails curl, it's the same thing.

After ClickToSecure, I started on BugScan, which does static analysis of binaries to find security vulnerabilities. It's progress in solving the overall problem, but some pieces are still missing. Thankfully, I've had the plan in place for those pieces and the overall integration since day 1. If I hadn't gone to the Sundance Film Festival last year, some of them probably wouldn't have come to me. I had some badass ideas while at the New York Guggenheim in March 2003, but that's another story.
