Reversing an Embedded Brainfuck VM

In my free time, I like to create my own little reversing challenges to learn from. If you’ve ever read Reverse Engineering for Beginners by Dennis Yurichev (A free & open source book you should definitely read!), you would know at the beginning of the book he writes:

When I first started learning C and, later, C++, I used to write small pieces of code, compile them, and then look at the assembly language output. This made it very easy for me to understand what was going on in the code that I had written. I did this so many times that the relationship between the C/C++ code and what the compiler produced was imprinted deeply in my mind. It’s now easy for me to imagine instantly a rough outline of a C code’s appearance and function. Perhaps this technique could be helpful for others.

Dennis Yurichev

I wanted to play around with reversing virtual machines. I didn’t want to go through the effort of writing a custom virtual language, so I opted for Brainfuck. I created a Brainfuck program that asked for a flag, and verified it through the help of BF-it. After which I wrote a small, cruddy Brainfuck interpreter in C, changed the Brainfuck commands to random bytes, and embedded the Brainfuck program in the executable.

You can download this challenge here if you want to try to solve it yourself.

Brainfuck is an esoteric programming language, that consists of eight commands, and an instruction pointer. In addition, it has a data pointer, pointing to an array of 30,000 byte cells, initialized to zero. It can also output bytes to stdout, and input a byte from stdin to the current data cell pointed by the data pointer.

Brainfuck’s commands are (shamelessly copied from Wikipedia):

> increment the data pointer (to point to the next cell to the right).
< decrement the data pointer (to point to the next cell to the left).
+ increment (increase by one) the byte at the data pointer.
- decrement (decrease by one) the byte at the data pointer.
. output the byte at the data pointer.
, accept one byte of input, storing its value in the byte at the data pointer.
[ if the byte at the data pointer is zero, then instead of moving the instruction pointer forward to the next command, jump it forward to the command after the matching ] command.
] if the byte at the data pointer is nonzero, then instead of moving the instruction pointer forward to the next command, jump it back to the command after the matching [ command.

Initial Analysis

Upon opening the program in IDA, we see the following:

void __fastcall sub_1240(__int64 this)
{
  unsigned __int8 *v1; // rcx
  unsigned __int8 v2; // al
  __int64 v3; // rbx
  __int64 v4; // rbp
  signed __int64 v5; // rsi
  _BYTE *v6; // rcx
  signed int v7; // edx
  int v8; // edi

  v1 = *(unsigned __int8 **)(this + 3008);
  v2 = *v1;
  if ( *v1 )
  {
    v3 = this;
    v4 = 0LL;
    do
    {
      if ( v2 == -71 )
      {
        ++**(_BYTE **)(v3 + 3000);
        v5 = *(_QWORD *)(v3 + 3008);
        v1 = (unsigned __int8 *)(v5 + 1);
        goto LABEL_5;
      }
      if ( v2 > 0xB9u )
      {
        if ( v2 == -26 )
        {
          --**(_BYTE **)(v3 + 3000);
          v5 = *(_QWORD *)(v3 + 3008);
          v1 = (unsigned __int8 *)(v5 + 1);
          goto LABEL_5;
        }
        if ( v2 <= 0xE6u )
        {
          if ( v2 == -52 )
          {
            putc(**(char **)(v3 + 3000), stdout);
            v5 = *(_QWORD *)(v3 + 3008);
            v1 = (unsigned __int8 *)(v5 + 1);
            goto LABEL_5;
          }
        }
        else if ( v2 == -23 )
        {
          if ( !**(_BYTE **)(v3 + 3000) )
          {
            v6 = v1 + 1;
            v7 = 1;
            while ( 1 )
            {
              *(_QWORD *)(v3 + 3008) = v6;
              v5 = (signed __int64)v6;
              if ( *v6 == -23 )
              {
                ++v7;
              }
              else if ( *v6 == 15 && !--v7 )
              {
                v1 = v6 + 1;
                goto LABEL_5;
              }
              ++v6;
            }
          }
        }
        else if ( v2 == -8 )
        {
          ++*(_QWORD *)(v3 + 3000);
          v5 = (signed __int64)v1++;
          goto LABEL_5;
        }
LABEL_17:
        v5 = (signed __int64)v1++;
        goto LABEL_5;
      }
      if ( v2 == 83 )
      {
        v5 = (signed __int64)v1;
        --*(_QWORD *)(v3 + 3000);
        ++v1;
        goto LABEL_5;
      }
      if ( v2 == -118 )
      {
        **(_BYTE **)(v3 + 3000) = getc(stdin);
        v5 = *(_QWORD *)(v3 + 3008);
        v1 = (unsigned __int8 *)(v5 + 1);
        goto LABEL_5;
      }
      if ( v2 != 15 )
        goto LABEL_17;
      v8 = 0;
      while ( 1 )
      {
        if ( v2 == -23 )
          ++v8;
        else
          v8 -= v2 == 15;
        v5 = (signed __int64)(v1 - 1);
        *(_QWORD *)(v3 + 3008) = v1 - 1;
        if ( !v8 )
          break;
        v2 = *(v1-- - 1);
      }
LABEL_5:
      *(_QWORD *)(v3 + 3008) = v1;
      v2 = *(_BYTE *)(v5 + 1);
      ++v4;
    }
    while ( v2 );
  }
  else
  {
    v4 = 0LL;
  }
  printf(1LL, "iterations: %lu\n", v4);
}

After renaming symbols, and creating a struct for the interpreter class, we see the following output. Creating structs in IDA often results in elegant and easy to read code.

brainfuck_vm struct created in IDA
void __fastcall brainfuck_vm::run(brainfuck_vm *const this)
{
  const unsigned __int8 *temp_ip; // rcx
  unsigned __int8 opcode; // al
  brainfuck_vm *bf_program; // rbx
  __int64 iter_count; // rbp
  signed __int64 curr_ip; // rsi
  const unsigned __int8 *br_search_ip; // rcx
  signed int bracket_balance1; // edx
  int bracket_balance2; // edi

  temp_ip = this->instruction_pointer;
  opcode = *temp_ip;
  if ( *temp_ip )
  {
    bf_program = this;
    iter_count = 0LL;
    do
    {
      if ( opcode == 0xB9u )                    // command: +
      {
        ++*bf_program->data_ptr;
        curr_ip = (signed __int64)bf_program->instruction_pointer;
        temp_ip = (const unsigned __int8 *)(curr_ip + 1);
        goto next_opcode;
      }
      if ( opcode > 0xB9u )
      {
        if ( opcode == 0xE6u )                  // command: -
        {
          --*bf_program->data_ptr;
          curr_ip = (signed __int64)bf_program->instruction_pointer;
          temp_ip = (const unsigned __int8 *)(curr_ip + 1);
          goto next_opcode;
        }
        if ( opcode <= 0xE6u )
        {
          if ( opcode == 0xCCu )                // command: .
          {
            putc(*bf_program->data_ptr, stdout);
            curr_ip = (signed __int64)bf_program->instruction_pointer;
            temp_ip = (const unsigned __int8 *)(curr_ip + 1);
            goto next_opcode;
          }
        }
        else if ( opcode == 0xE9u )             // command: [
        {
          if ( !*bf_program->data_ptr )
          {
            br_search_ip = temp_ip + 1;
            bracket_balance1 = 1;
            while ( 1 )
            {
              bf_program->instruction_pointer = br_search_ip;
              curr_ip = (signed __int64)br_search_ip;
              if ( *br_search_ip == 0xE9u )
              {
                ++bracket_balance1;
              }
              else if ( *br_search_ip == 0xF && !--bracket_balance1 )
              {
                temp_ip = br_search_ip + 1;
                goto next_opcode;
              }
              ++br_search_ip;
            }
          }
        }
        else if ( opcode == 0xF8u )             // command: >
        {
          ++bf_program->data_ptr;
          curr_ip = (signed __int64)temp_ip++;
          goto next_opcode;
        }
next_ip:
        curr_ip = (signed __int64)temp_ip++;
        goto next_opcode;
      }
      if ( opcode == 0x53 )                     // command: <
      {
        curr_ip = (signed __int64)temp_ip;
        --bf_program->data_ptr;
        ++temp_ip;
        goto next_opcode;
      }                                         // command: ,
      if ( opcode == 0x8Au )
      {
        *bf_program->data_ptr = getc(stdin);
        curr_ip = (signed __int64)bf_program->instruction_pointer;
        temp_ip = (const unsigned __int8 *)(curr_ip + 1);
        goto next_opcode;
      }
      if ( opcode != 0xF )                      // command: ]
        goto next_ip;
      bracket_balance2 = 0;
      while ( 1 )
      {
        if ( opcode == 0xE9u )
          ++bracket_balance2;
        else
          bracket_balance2 -= opcode == 0xF;
        curr_ip = (signed __int64)(temp_ip - 1);
        bf_program->instruction_pointer = temp_ip - 1;
        if ( !bracket_balance2 )
          break;
        opcode = *(temp_ip-- - 1);
      }
next_opcode:
      bf_program->instruction_pointer = temp_ip;
      opcode = *(_BYTE *)(curr_ip + 1);
      ++iter_count;
    }
    while ( opcode );
  }
  else
  {
    iter_count = 0LL;
  }
  printf(1LL, "iterations: %lu\n", iter_count);
}

IDA’s awesome analysis makes it very easy to clearly see the commands, regardless if the ASCII representations have been converted to random bytes.

Extracting the Bytecode

To export the Brainfuck instructions, we need to locate where they are in the actual binary. From the disassembled main function, we can locate the instructions.

Disassembly of the main function

From the disassembly, a suspicious byte array at 0x4020 is put into rax. Looking at this array, we see the following:

The suspicious byte array at 0x4020

These are indeed valid opcodes for the Branfuck interpreter. We can export this to a file by going to the Hex View, selecting the opcodes, and selecting “Save to file…”.

Brainfuck opcodes extracted from the binary opened in HxD. IDA also wrote 3 extra bytes to align the output to 16 bytes.

With the Brainfuck opcodes, we can write a quick python script to convert these into real Brainfuck commands.

#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Convert brainfuck opcodes into correct brainfuck commands

import sys

conversions = {
    '\xF8': '>',
    '\x53': '<',
    '\xB9': '+',
    '\xE6': '-',
    '\xCC': '.',
    '\x8A': ',',
    '\xE9': '[',
    '\x0F': ']',
    }


def main():
    if len(sys.argv) < 2:
        print('[!] No bytecode file supplied')

    with open(sys.argv[1], 'rb') as f:
        content = f.read()

    bf = map(lambda x: conversions.get(chr(x), '?'), list(content))
    print(''.join(list(bf)).replace('?', ''))


if __name__ == '__main__':
    main()

Reversing Brainfuck by Cheating

From this script, I have retrieved the Brainfuck program running inside of the program. And well.. it looks like a brainfuck.

>>>[-]><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]>[-]<>++++++++[-<+++++++++>]<+.>++++++[-<++++++>]<+.++.+++++.-.>+++++++[-<------------>]<.>++++++++[-<++++++++++>]<.>+++[-<----->]<.>+++[-<++++++>]<..++++.--------.+++.--------------.>++++++[-<------->]<.>+++++[-<----->]<-.[-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]++++++++++[-<++++++++++>]<++><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]+++++++++[-<++++++++++++>]<><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]++++++++[-<++++++++++++>]<+><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]++++++++++[-<++++++++++>]<+++><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]+++++++++++[-<+++++++++++>]<++><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]+++++++[-<+++++++++++++++++>]<><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]++++++++[-<+++++++++++++>]<><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]++++[-<+++++++++++++>]<><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]++++++++[-<++++++++++++++>]<++++><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]+++++[-<+++++++++++++++++++>]<><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]++++[-<+++++++++++++>]<><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]+++++[-<+++++++++++++++++++>]<><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]+++++++[-<++++++++++++++>]<><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]++++++++[-<++++++++++++++>]<++><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]++++[-<+++++++++++++>]<><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]+++++++[-<+++++++>]<><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]++++++++++[-<+++++++++++>]<><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]+++++[-<+++++++++++++++++++>]<><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]++++++++++[-<++++++++++>]<++><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]+++++++++[-<+++++++++++++>]<><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]+++++++++[-<+++++++++++>]<><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]+++++++[-<+++++++++++++++>]<++><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]><,><>[-]<<[-]>[>+<<+>-]>[<+>-]<><[-]>[-]<<[>+>+<<-]>>[<<+>>-][-]>[-]+++++++++++[-<+++++++++++>]<++++><<[->-<]>[<+>[-]]<[[-]+><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]][-]>[-]<++++++++++.[-]>[-]<<<[>>+>+<<<-]>>>[<<<+>>>-][-]+<[>->[-]><>[-]>[-]<>+++++++[-<++++++++++++>]<.>+++++[-<++++++>]<.+++++++.>++++++++[-<----------->]<-.>++++++++[-<+++++++++>]<.-------.>++++[-<++++>]<+.--------------.+.+++++++++++++.>+++++++++[-<--------->]<.>+++[-<------->]<--.<><<<[-]]>[>[-]><>[-]>[-]<>++++++++[-<+++++++++++>]<+.>+++[-<+++++++>]<+.++++++.>+++++++[-<------------>]<-.>+++++++[-<++++++++++++>]<+++.--------------.+++++.>+++++++[-<----------->]<.>+++[-<------->]<--.<><<-]<<<<

Reversing this by hand would be a total nightmare (a brain fuck, if you will). Luckily, there are a multitude of tools available online that can transform Brainfuck into C, which can be compiled with optimizations and reversed in IDA.

I used this script to convert the commands into a valid C file, and compiled with maximum optimizations.

gcc -O3 brainfudge.bf -o compiled_bf.o

We can pop this compiled file into IDA for analysis.

Disassembly of main from the compiled Brainfuck

GCC has nicely optimized the code, and makes reverse engineering the Brainfuck a lot easier.

Scrolling down a bit, we can see the part of the program that checks the input against the flag. The program compares the characters to correct value, and sets a flag if any character didn’t match.

The first character in the flag is “f”

The input characters are checked one by one, and are preceded by a call to getc. We can scroll through the binary and manually extract the flag.

The second character comparison check

After scrolling through the binary, we found the flag: flag{wh4t_4_br41n_fuck}

Leave a Reply