YaK:: WebLog #535 Topic : 2006-07-18 01.58.52 matt : TDD parsing the x86 opcode mod R/M byte, part 1

TDD parsing the x86 opcode mod R/M byte, part 1

The mod R/M (modRM) byte in x86 is a hairy proposition thanks to inconsistencies, undocumented things, and bit-level hackery. How do we deal with it in a TDD fashion?


First, let me say that my favorite x86 opcode reference on the web is here. The layout of the tables in plain-ish HTML and the links between them make for simple, easy browsing of the clusterfuck known as the x86 instruction set. If there were a donation link on the pages somewhere, I'd definitely give. (I did something similar for HPANA. Yes, I'm a fuqn nerd.)

So, let's look at one of the customer tests (objdump output) I gave to Luis to make pass that has a "simple" modRM byte:

8048475:      89 e5                 mov    ebp,esp

Well, that's pretty easy to test. I'm using mono 1.1.16.1 under Linux, by the way. I'm using vi for my editor and mono's built-in nunit-console and nunit.framework. (nunit-gui works well under mono these days, but old habits die hard.) Their built-in NUnit is 2.2.0, but that should be okay for our purposes. If it weren't, we could just use our own NUnit 2.2.8 binaries anyways.

[Test]
public void MovEbpEsp()
{
  Byte[] opcode = new byte[] {0x89, 0xe5};
  // ref: http://sandpile.org/ia32/opc_rm32.htm
  const UInt32 ebpRegister = 5, espRegister = 4;
  UInt32 source = ModRMParser.GetSourceRegisterNumber(opcode);
  UInt32 destination = ModRMParser.GetDestinationRegisterNumber(opcode);

  Assert.AreEqual(espRegister, source);
  Assert.AreEqual(ebpRegister, destination);
}
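Why does the test expect ESP as the source and EBP as the destination? The modRM byte packs three bit fields: mod (bits 7-6), reg (bits 5-3) and r/m (bits 2-0). For opcode 0x89 (MOV r/m32, r32) the reg field names the source and the r/m field names the destination. Here's a minimal sketch of that split. The class and method names (ModRMFields, Mod, Reg, RM) are made up for illustration and aren't part of the parser being test-driven in this post.

```csharp
using System;

// Hypothetical helper, for illustration only: splits a modRM byte into its fields.
public static class ModRMFields
{
    // Bits 7-6: addressing mode; 3 means the r/m field names a register.
    public static uint Mod(byte modRM) { return (uint)(modRM >> 6) & 0x3; }

    // Bits 5-3: register operand (the source for opcode 0x89).
    public static uint Reg(byte modRM) { return (uint)(modRM >> 3) & 0x7; }

    // Bits 2-0: register-or-memory operand (the destination for opcode 0x89).
    public static uint RM(byte modRM) { return (uint)modRM & 0x7; }

    public static void Main()
    {
        byte modRM = 0xe5; // binary 11 100 101
        // Prints: mod=3 reg=4 rm=5
        Console.WriteLine("mod={0} reg={1} rm={2}", Mod(modRM), Reg(modRM), RM(modRM));
    }
}
```

So 0xe5 gives reg = 4 (ESP, the source) and r/m = 5 (EBP, the destination), which is what the asserts check.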

It's also really easy to make it pass if you Fake It Until You Make It:

public static class ModRMParser
{
  public static UInt32 GetDestinationRegisterNumber(Byte[] _opcode) { return 4; }
  public static UInt32 GetSourceRegisterNumber(Byte[] _opcode) { return 5; }
}

First, the test fails because I mixed up the return values that would've made the test pass. This is what happens when I don't take baby steps (and I'm not pairing). I should have had them both return 0, seen the test fail, made the first Assert pass, and then the second Assert. Lesson learned, code corrected.

Second, I've duplicated information by putting the word "Number" in the method names. We *know* it's returning a number (a UInt32, specifically), so I'm ditching that Number suffix. There is a time when you might need that suffix, and that is when you are overloading the method only by return value, which C# doesn't support (but the CLR does, according to CLR via C#).

That test passes, so let's look at another simple one:

8048469:  89 ec                   mov    esp,ebp

Okay, we now have two similar cases to support. To me, this implies an if/else on the exact modRM byte. There are two tests I can think of: the {0x89, 0xec} case above and an unhandled modRM case.

                [Test]
                public void MovEspEbp()
                {
                        Byte[] opcode = new byte[] {0x89, 0xec};
                        // ref: http://sandpile.org/ia32/opc_rm32.htm
                        const UInt32 ebpRegister = 5, espRegister = 4;
                        UInt32 source = ModRMParser.GetSourceRegister(opcode);
                        UInt32 destination = ModRMParser.GetDestinationRegister(opcode);

                        Assert.AreEqual(ebpRegister, source);
                        Assert.AreEqual(espRegister, destination);
                }

I make that pass with a simple if statement. This is cheating, but the next test will drive us to be more "real". I use "real" here loosely because this code does what we need it to do so far. If these were our only requirements, we'd be set to go and integrate it. Note that the method names are different now due to the aforementioned refactoring.

        public static class ModRMParser
        {
                public static UInt32 GetDestinationRegister(Byte[] _opcode)
                {
                        if (_opcode[1] == 0xe5)
                                return 5;
                        else
                                return 4;
                }

                public static UInt32 GetSourceRegister(Byte[] _opcode)
                {
                        if (_opcode[1] == 0xe5)
                                return 4;
                        else
                                return 5;
                }
        }

w00t. The next tests will make this less dumb:

                [Test]
                [ExpectedException(typeof(ArgumentException))]
                public void UnimplementedDestinationModRM()
                {
                        ModRMParser.GetDestinationRegister(0x89, 0xff);
                }

                [Test]
                [ExpectedException(typeof(ArgumentException))]
                public void UnimplementedSourceModRM()
                {
                        ModRMParser.GetSourceRegister(0x89, 0xff);
                }

First, I got rid of the local variable opcode in these tests since I don't need to use the values twice. I'll add the 'params' keyword to the front of the method parameters to make that compile. The tests compile and fail, which is what I expect.
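For anyone who hasn't used 'params' before, here's a small sketch of what it buys you: the same method accepts either a ready-made array or a bare list of arguments, and the compiler builds the array in the second case. CountBytes is a throwaway method invented for this example, not part of ModRMParser.

```csharp
using System;

// Hypothetical demo class, for illustration only.
public static class ParamsDemo
{
    // 'params' lets callers pass an existing Byte[] or a comma-separated list of bytes.
    public static int CountBytes(params Byte[] _opcode)
    {
        return _opcode.Length;
    }

    public static void Main()
    {
        Byte[] opcode = new byte[] {0x89, 0xff};
        Console.WriteLine(CountBytes(opcode));     // array form, prints 2
        Console.WriteLine(CountBytes(0x89, 0xff)); // expanded form, also prints 2
    }
}
```

That is why the earlier tests, which pass a Byte[] variable, keep compiling unchanged after the signature change.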

Second, it feels weird that I need two tests just to make sure something is invalid. I seemingly always want both the source and destination operands of the opcode, which implies a struct or a class that groups them. I'm writing it down in my test list (potential refactorings, in addition to test ideas, go there).

This code makes the tests pass:

                public static UInt32 GetDestinationRegister(params Byte[] _opcode)
                {
                        switch(_opcode[1])
                        {
                                case 0xe5:
                                        return 5;
                                case 0xec:
                                        return 4;
                                default:
                                        throw new ArgumentException("unsupported modRM", "_opcode");
                        }
                }

                public static UInt32 GetSourceRegister(params Byte[] _opcode)
                {
                        switch(_opcode[1])
                        {
                                case 0xe5:
                                        return 4;
                                case 0xec:
                                        return 5;
                                default:
                                        throw new ArgumentException("unsupported modRM", "_opcode");
                        }
                }

You'll notice I made it a switch() statement instead of if/else if/else. I think the latter is ugly, and I have a feeling I would've needed a switch shortly anyways. So, what's the next test? I don't have any in my list, but I do have a potential refactoring: grouping the source and destination together somehow. Looking at the code, it isn't screaming for this; I have a feeling a couple more tests will make it happen, so I'll wait for now. So, what's the next simple test in a similar vein?

80483e4:  89 d6                   mov    esi,edx

Great, two new registers we've never seen before. Here's the test:

                [Test]
                public void MovEsiEdx()
                {
                        Byte[] opcode = new byte[] {0x89, 0xd6};
                        // ref: http://sandpile.org/ia32
                        const UInt32 esiRegister = 6, edxRegister = 2;
                        UInt32 source = ModRMParser.GetSourceRegister(opcode);
                        UInt32 destination = ModRMParser.GetDestinationRegister(opcode);

                        Assert.AreEqual(edxRegister, source);
                        Assert.AreEqual(esiRegister, destination);
                }

At this point, it's super-easy to make this pass -- just add another case to our switch() statements. I'll forgo that and skip right to the code smells. Starting with the tests, we're duplicating our variables. Those register consts are highly suspect for being transformed into an enum since there are more than three and I know there are more to come. On top of this, they're duplicated as magic numbers in our implementation -- blech! The modRM bytes are also duplicated, so they are another target for refactoring toward a calculation. I'm putting these things into my test list for now, as I need to get on to other things right now.
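To make those test-list items concrete, here's one possible shape the refactorings could take: an enum for the registers, a struct that groups source and destination, and a calculation on the modRM bit fields instead of a switch on whole bytes. This is only a sketch under my own assumptions. Register, Operands and CalculatedModRMParser are hypothetical names, and nothing here is necessarily where the code ends up.

```csharp
using System;

// Register numbers as they are encoded in the reg and r/m fields (32-bit operands).
public enum Register : uint
{
    Eax = 0, Ecx = 1, Edx = 2, Ebx = 3, Esp = 4, Ebp = 5, Esi = 6, Edi = 7
}

// Hypothetical grouping of the two operands of an instruction.
public struct Operands
{
    public readonly Register Source;
    public readonly Register Destination;

    public Operands(Register source, Register destination)
    {
        Source = source;
        Destination = destination;
    }
}

// Hypothetical parser that calculates the registers instead of switching on the byte.
public static class CalculatedModRMParser
{
    public static Operands Parse(params Byte[] _opcode)
    {
        byte modRM = _opcode[1];

        // Only opcode 0x89 with mod == 3 (register to register) is handled in this sketch.
        if (_opcode[0] != 0x89 || (modRM >> 6) != 3)
            throw new ArgumentException("unsupported modRM", "_opcode");

        Register source = (Register)((modRM >> 3) & 0x7); // reg field, bits 5-3
        Register destination = (Register)(modRM & 0x7);   // r/m field, bits 2-0
        return new Operands(source, destination);
    }
}
```

With this, Parse(0x89, 0xd6) yields Source = Edx and Destination = Esi. One consequence worth noting: a calculation like this decodes {0x89, 0xff} as mov edi,edi rather than rejecting it, so the two "unimplemented" tests would need a different byte, such as one whose mod field isn't 3.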

With that, I'm signing off. I realize I sold this a bit short by keeping things simple; we're only dealing with one opcode (0x89) and register-to-register moves. I'll get way more complicated in part 2, coming soon to a blog post near you. Here's the code as it currently stands:

using System;
using NUnit.Framework;

namespace ModRM
{
        public static class ModRMParser
        {
                public static UInt32 GetDestinationRegister(params Byte[] _opcode)
                {
                        switch(_opcode[1])
                        {
                                case 0xd6:
                                        return 6;
                                case 0xe5:
                                        return 5;
                                case 0xec:
                                        return 4;
                                default:
                                        throw new ArgumentException("unsupported modRM", "_opcode");
                        }
                }

                public static UInt32 GetSourceRegister(params Byte[] _opcode)
                {
                        switch(_opcode[1])
                        {
                                case 0xd6:
                                        return 2;
                                case 0xe5:
                                        return 4;
                                case 0xec:
                                        return 5;
                                default:
                                        throw new ArgumentException("unsupported modRM", "_opcode");
                        }
                }
        }

        [TestFixture]
        public class ModRMParserTests
        {
                Byte[] opcode;
                UInt32 source, destination;

                // ref: http://sandpile.org/ia32
                const UInt32 ebpRegister = 5, espRegister = 4, esiRegister = 6, edxRegister = 2;

                [Test]
                [ExpectedException(typeof(ArgumentException))]
                public void UnimplementedDestinationModRM()
                {
                        ModRMParser.GetDestinationRegister(0x89, 0xff);
                }

                [Test]
                [ExpectedException(typeof(ArgumentException))]
                public void UnimplementedSourceModRM()
                {
                        ModRMParser.GetSourceRegister(0x89, 0xff);
                }

                [Test]
                public void MovEbpEsp()
                {
                        opcode = new byte[] {0x89, 0xe5};
                        source = ModRMParser.GetSourceRegister(opcode);
                        destination = ModRMParser.GetDestinationRegister(opcode);

                        Assert.AreEqual(espRegister, source);
                        Assert.AreEqual(ebpRegister, destination);
                }

                [Test]
                public void MovEspEbp()
                {
                        opcode = new byte[] {0x89, 0xec};
                        source = ModRMParser.GetSourceRegister(opcode);
                        destination = ModRMParser.GetDestinationRegister(opcode);

                        Assert.AreEqual(ebpRegister, source);
                        Assert.AreEqual(espRegister, destination);
                }

                [Test]
                public void MovEsiEdx()
                {
                        opcode = new byte[] {0x89, 0xd6};
                        source = ModRMParser.GetSourceRegister(opcode);
                        destination = ModRMParser.GetDestinationRegister(opcode);

                        Assert.AreEqual(edxRegister, source);
                        Assert.AreEqual(esiRegister, destination);
                }
        }
}

(unless otherwise marked) Copyright 2002-2014 YakPeople. All rights reserved.
(last modified 2006-07-18)