Your First Phoenix Program: A Walkthrough of StaticGlobalDump

01/04/2006

In this introduction to building tools with Phoenix, I'm going to start with one of the simplest tools possible: one that dumps the global and static variables in an image. As it turns out, I've actually had customers ask for a tool that could do this. You can already do this with existing tools, but the nice thing here is that a single tool now works across managed and unmanaged code.

I call the tool StaticGlobalDump. Let's walk through the code. At the end of this blog entry I give the source in full; it works with the November Phoenix RDK, which is available for download.

In terms of requirements, it must be simple to use. It takes a PE file (either a DLL or EXE) as input and writes to stdout. No fancy processing, just a straightforward use of Phoenix.

The RDK exposes a purely managed API. This allows us to use C#, C++/CLI, VB, or any other .NET language. I'll do most of my code examples in this blog either in C# or C++.

Things Covered in this Article

· Initializing Phoenix for a simple PE read scenario.

· Reading in a PE file.

· Loading the module symbol table from the PE file.

· Walking the symbol table.

· Finding basic type information associated with a symbol.

The Main Function

Let’s start with a look at Main(), which is given below. The calls made on the StaticGlobalDump class (InitializeTargets and PrintStaticGlobals) have user-defined functionality behind them, whereas the remaining code calls directly into framework code (either the CRT, STL, CLR, or Phoenix).

Looking at the code, we see the first thing (code point 1) we do is to initialize the Phoenix targets. This will be the next function we look at, in more depth after Main, but in summary this call allows Phoenix to read and/or write x86 and MSIL binaries.

Code point 2 is done after the targets are initialized. Phx.Init is a static class in Phoenix used to initialize Phoenix. BeginInit is a method on this static class that initializes some of the key aspects of Phoenix such as memory management, the global unit, the global unit's symbol table, alias package, etc…

The EndInit call lets Phoenix know that initialization is done. In this particular example there was no good reason to separate BeginInit and EndInit, as I did no work in between, but there are cases when you will. I'll talk about those scenarios in a future blog.

Code point 3: We open the file named by the command-line argument passed to the program. This is done with a call to Phx.PEModuleUnit.Open, which creates a new object of type PEModuleUnit. The PEModuleUnit holds the representation of the PE file in an object. At this point the instructions have not been raised into Phoenix Low-level Intermediate Representation (LIR). For this tool, all we care about are the names of variables, not instructions, so we will not raise them at all. But that too will come in the future.

Code point 4: We instruct the PEModuleUnit to load the global symbols for the PE file. This reads in the types and symbols into the PEModuleUnit. It reads from both the PDB and metadata (if it's a managed image).

Code point 5: We call PrintStaticGlobals. This is the function that I've written which will take the symbol table for the module and print all of the static and global variables defined for this module.

So from the 50,000-foot view, that's all there is to it. We'll dive into the user-written functions now, but as you can tell, this is conceptually straightforward.

public static int Main(String[] argv) {
    if (argv.Length != 1) {
        Phx.Output.WriteLine(
            "Usage: StaticGlobalDump <input-image-name>\n");
        return 1;
    }

    // 1
    StaticGlobalDump.InitializeTargets();

    // 2
    Phx.Init.BeginInit();
    Phx.Init.EndInit("PHX|_PHX_", argv);

    // 3
    Phx.PEModuleUnit module = Phx.PEModuleUnit.Open(argv[0]);

    // 4
    module.LoadGlobalSyms();

    // 5
    StaticGlobalDump.PrintStaticGlobals(module.SymTable);

    return 0;
}

The InitializeTargets Function

InitializeTargets initializes four objects, broken into two categories. The two categories are Arch and Runtime. Arch specifies the processor architecture that we are initializing for Phoenix to operate on. In this case we have picked two architectures: MSIL and x86. We treat MSIL as an architecture, as it is a completely different instruction set architecture.

The next thing we configure is the runtime that Phoenix targets. Phoenix can target either the x86Runtime or the msilRuntime. The difference between Arch and Runtime is that Arch focuses on characteristics of the ISA, i.e., differences in opcodes, registers, condition codes, etc. The Runtime component is focused on runtime differences between the architectures, most notably exception handling.

This is largely boilerplate code that you'll simply cut and paste for a good number of applications of this sort. In fact that’s precisely what I did for this example (it’s from a sample in the RDK).

static void InitializeTargets() {
    Phx.Targets.Archs.Arch msilArch =
        Phx.Targets.Archs.MSIL.Arch.New();
    Phx.GlobalData.RegisterTargetArch(msilArch);

    Phx.Targets.Archs.Arch x86Arch =
        Phx.Targets.Archs.X86.Arch.New();
    Phx.GlobalData.RegisterTargetArch(x86Arch);

    Phx.Targets.Runtimes.Runtime msilRuntime =
        Phx.Targets.Runtimes.VCCRT.Win32.MSIL.Runtime.New(msilArch);
    Phx.GlobalData.RegisterTargetRuntime(msilRuntime);

    Phx.Targets.Runtimes.Runtime x86Runtime =
        Phx.Targets.Runtimes.VCCRT.Win32.X86.Runtime.New(x86Arch);
    Phx.GlobalData.RegisterTargetRuntime(x86Runtime);
}

The PrintStaticGlobals Function

PrintStaticGlobals is where the real action happens, and it is probably the most important part of this entry to understand. This function takes the symbol table for the PEModuleUnit as its sole argument, and from that is able to print out the globals and statics. This is a user-leaf function in that it doesn't call any other code written by the user (although it calls some BCL and Phoenix routines).

Code point 6: This is where we create size and initialize it to 0. size is the variable that holds the number of globals and statics we’ve encountered thus far. We will use this variable to dump the total number encountered at the end of the function.

Code point 7: This is where we iterate over all of the symbols in the table. Later I’ll go into more detail as to how the symbols package works, but for now what you need to know is that each table has a set of maps, where each map maps from some characteristic to a symbol in the table. A characteristic can be the name, or a GUID, or the RVA of a symbol. In this case we use the LocalId map as it has all the symbols in the table in its map (a map can have a subset of the symbols in the table).
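To make the map idea concrete, here is a minimal sketch in plain C# that does not use Phoenix at all. ToySym, ToyTable, and their members are hypothetical stand-ins invented for illustration; the real Phx.Syms.Table is richer and may be organized differently. The point is only that one table can expose several maps, and that a map keyed on an optional characteristic (here, the name) can end up holding a subset of the symbols, while the id map holds all of them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a symbol; this is NOT the Phoenix Sym class.
public sealed class ToySym
{
    public int LocalId;
    public string Name;   // null for an unnamed symbol
}

// Hypothetical stand-in for a symbol table that exposes two lookup maps.
public sealed class ToyTable
{
    private int nextId = 1;

    // Every symbol gets an id, so this map contains every symbol.
    public readonly Dictionary<int, ToySym> LocalIdMap =
        new Dictionary<int, ToySym>();

    // Only named symbols are entered here, so this map can be a subset.
    public readonly Dictionary<string, ToySym> NameMap =
        new Dictionary<string, ToySym>();

    public ToySym Add(string name)
    {
        ToySym sym = new ToySym { LocalId = nextId++, Name = name };
        LocalIdMap[sym.LocalId] = sym;
        if (name != null)
            NameMap[name] = sym;
        return sym;
    }
}
```

With this toy model, walking table.LocalIdMap.Values visits every symbol, whereas walking table.NameMap.Values would silently miss the unnamed ones. That is the same reason the tool iterates the LocalId map.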

Code point 8: Here we determine whether the symbol we are looking at is really a global or a static. There are a lot of “Is*” properties on symbols. The ones we care about are whether the symbol is a global variable (IsGlobalVarSym) or a static field (IsStaticFieldSym). In native code, static fields are represented as GlobalVarSyms, but in managed code they are represented as StaticFieldSyms.

You may notice that we also check to make sure that the symbol is not a reference to a symbol. The symbol table has a list of all definitions of and references to a symbol. In this case we only want to dump definitions, but you can imagine that for other tools dumping references might be what you want (in fact you could use the symbol references to find out who references the global and static symbols by name, but that will only give you named references and not aliased references).

One last point on this line of code: we use “!sym.IsRef”. You’re probably wondering why I didn’t use “sym.IsDef” instead. In theory that would work, but the current RDK has an issue where StaticFieldSyms aren’t correctly setting IsDef to true in this case. It’s a pre-alpha SDK :-)
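The difference between testing “not a reference” and testing “is a definition” can be sketched without Phoenix. Everything below (ToyKind, ToyEntry, DefinitionFilter) is a hypothetical model made up for this illustration, not the RDK's API. It assumes an entry whose IsDef flag was never set, mimicking the issue described above, and shows that a filter written as !IsRef still keeps it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical symbol kinds for the sketch.
public enum ToyKind { GlobalVar, StaticField, Function }

// Hypothetical symbol-table entry; not a Phoenix type.
public sealed class ToyEntry
{
    public string Name;
    public ToyKind Kind;
    public bool IsRef;   // true when the entry only refers to a symbol
    public bool IsDef;   // assumed to be unreliable for static fields
}

public static class DefinitionFilter
{
    // Keep globals and static fields that are not references.
    // IsDef is deliberately ignored because it may not be set.
    public static bool IsVariableDefinition(ToyEntry e)
    {
        bool isVariable =
            e.Kind == ToyKind.GlobalVar || e.Kind == ToyKind.StaticField;
        return isVariable && !e.IsRef;
    }

    // Collect the names of all entries that pass the filter, in order.
    public static List<string> Names(IEnumerable<ToyEntry> entries)
    {
        List<string> result = new List<string>();
        foreach (ToyEntry e in entries)
        {
            if (IsVariableDefinition(e))
                result.Add(e.Name);
        }
        return result;
    }
}
```

A filter written as e.IsDef instead would drop the static field in this model, which is the behavior the walkthrough works around.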

Code point 9: The if-statement checks that the global or static variable name doesn’t begin with “__” as that is reserved for compiler use. If you look at all the global symbols in a typical native application you will see quite a few symbols that begin with “__”. These are either compiler reserved uses or uses in some standard header, but your code should not have global variables with such a name. This program assumes that you haven’t started your globals with “__”.

If it passes that “__” test then it simply writes out the name of the symbol. All symbols have a NameString property, which returns the name of the symbol. We print out the name and increment our size variable.

Code point 10: It would also be handy to print the type of each global and static variable. User-defined global and static variables, of course, have a type, but the PE reader may not be able to deduce types for other global symbols in the symbol table. For this reason we need to check that a given symbol has a type before attempting to generate the string corresponding to that type; otherwise we would get a null reference exception when we try to call a method on a type that doesn’t exist. If there is no type for this symbol then we simply end the line and move on to the next symbol.
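The decisions at code points 9 and 10 can be isolated into a small helper, sketched here in plain C# with no Phoenix types. GlobalLineFormatter and its methods are hypothetical names chosen for this example. The one deliberate difference from the tool's code is the use of StringComparison.Ordinal, which makes the “__” prefix test independent of the current culture; that is my own choice and not something the RDK sample does.

```csharp
using System;

// Hypothetical helper; it only models the skip-and-format logic.
public static class GlobalLineFormatter
{
    // Names beginning with "__" are treated as compiler-reserved.
    public static bool IsReservedName(string name)
    {
        return name.StartsWith("__", StringComparison.Ordinal);
    }

    // Returns the text to print for one variable, or null to skip it.
    // 'type' stands in for the symbol's type and may be null.
    public static string Format(string name, object type)
    {
        if (IsReservedName(name))
            return null;
        if (type == null)
            return name;            // no type known: print the name alone
        return string.Format("{0} [{1}]", name, type);
    }
}
```

Returning null for a skipped name keeps the caller's loop simple: it prints and counts only the non-null results.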

static void PrintStaticGlobals(Phx.Syms.Table symTable)
{
    // 6
    int size = 0;

    // 7
    foreach (Phx.Syms.Sym sym in symTable.LocalIdMap.InternalMap) {
        // 8
        if ((sym.IsStaticFieldSym || sym.IsGlobalVarSym)
            && !sym.IsRef)
            // 9
            if (!sym.NameString.StartsWith("__"))
            {
                size++;
                Console.Write("{0}", sym.NameString);

                // 10
                if (sym.Type != null)
                    Console.WriteLine(" [{0}]",
                        sym.Type.ToString());
                else
                    Console.WriteLine();
            }
    }

    // Print the total once, after the loop has visited every symbol.
    Console.WriteLine("Number of Globals: {0}", size);
}

The only thing left is to make sure that you use the correct references for the application. The necessary references are: arch-msil.dll, arch-x86.dll, phx.dll, runtime-vccrt-win32-msil, and runtime-vccrt-win32-x86. One requirement of the RDK is that native code must be compiled with VC2005 using the /Zi switch and linked with the /PROFILE switch.

That’s it, we’re done! The program dumps all globals and statics. Now try it on your favorite managed or native application or DLL.

//------------------------------------------------------------------------------
// Description:
//    StaticGlobalDump prints out the global and static variables in an image.
//    Unmanaged input files must be compiled with -Zi and linked with /PROFILE.
// Usage:
//    StaticGlobalDump <input-file1>
//------------------------------------------------------------------------------

using System;

public class StaticGlobalDump {

    public static int Main(String[] argv) {
        // Initialize the infrastructure.
        StaticGlobalDump.InitializeTargets();
        Phx.Init.BeginInit();

        // Simple usage check
        if (argv.Length != 1) {
            Phx.Output.WriteLine(
                "Usage: StaticGlobalDump <input-image-name>\n");
            return 1;
        }

        Phx.Init.EndInit("PHX|_PHX_", argv);

        // Open the module.
        Phx.PEModuleUnit module = Phx.PEModuleUnit.Open(argv[0]);

        // Read symbols in.
        module.LoadGlobalSyms();

        PrintStaticGlobals(module.SymTable);
        return 0;
    }

    static void PrintStaticGlobals(Phx.Syms.Table symTable) {
        int size = 0;
        Phx.Syms.Sym sym;
        Phx.Collections.SymIConstIterator symIter = symTable.NameMap.SymIterator;

        for (symIter.MoveNext(); symIter.MoveNext(); ) {
            sym = (Phx.Syms.Sym)symIter.Current;

            if (sym.IsGlobalVarSym && sym.IsDef)
                if (!sym.NameString.StartsWith("__")) {