XenAccess Documentation

Version 0.4

The XenAccess project was inspired by ongoing research within the Georgia Tech Information Security Center (GTISC). The purpose of this library is to make it easier for other researchers to experiment with the many uses of memory introspection without needing to focus on the low-level details of introspection. If you are using this library and come up with a useful extension to it, we are always happy to receive patches.

Please direct all questions about XenAccess to the mailing list: https://lists.sf.net/lists/listinfo/xenaccess-devel

The project was created and is maintained by Bryan D. Payne, who is currently working towards his PhD in Computer Science at Georgia Tech. Bryan may be reached by email at bryan@thepaynes.cc.

Introduction

What is XenAccess?

XenAccess is a library that simplifies the process of memory introspection for virtual machines running on the Xen hypervisor. With XenAccess, your software can run in one virtual machine and access the memory space of other virtual machines. While the Xen Control Library (libxc), which is included with Xen, provides the ability to access another virtual machine's memory at a low level, XenAccess allows you to access memory using kernel symbols, virtual addresses, and physical addresses.

What is memory introspection?

Memory introspection is the process of viewing the memory of one virtual machine from a different virtual machine. On the surface, this sounds rather simple. In fact, Xen provides a function to facilitate this type of memory access. What makes memory introspection difficult is the semantic gap between the two virtual machines. For example, to look up virtual addresses, XenAccess must walk the page tables inside the other virtual machine. However, to walk these page tables, XenAccess must first know where the page directory is located (i.e., the CR3 value), and this value depends on the process address space you are viewing. The more you think about the problem, the clearer the reasons for its difficulty become. One must know a lot of details about the target operating system in order to build these higher levels of abstraction.
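To make the page table walk described above more concrete, here is a minimal sketch in C. It assumes 32-bit x86 paging without PAE, 4 KiB pages only (no large pages), and a flat byte array standing in for guest physical memory. The names read_phys32 and walk_page_table are made up for this illustration and are not part of the XenAccess API.

```c
#include <stdint.h>
#include <string.h>

#define PTE_PRESENT 0x1u

/* Read a 32-bit value from simulated guest physical memory.
 * Assumption: the caller guarantees paddr + 4 is within the array. */
static uint32_t read_phys32(const uint8_t *phys_mem, uint32_t paddr)
{
    uint32_t value;
    memcpy(&value, phys_mem + paddr, sizeof(value));
    return value;
}

/*
 * Translate a virtual address with a two-level page table (32-bit x86,
 * no PAE, 4 KiB pages). Returns 0 and stores the physical address in
 * *paddr on success; returns -1 if an entry is not present.
 */
static int walk_page_table(const uint8_t *phys_mem, uint32_t cr3,
                           uint32_t vaddr, uint32_t *paddr)
{
    uint32_t pd_index = (vaddr >> 22) & 0x3ffu;  /* top 10 bits */
    uint32_t pt_index = (vaddr >> 12) & 0x3ffu;  /* middle 10 bits */
    uint32_t offset   = vaddr & 0xfffu;          /* low 12 bits */

    /* Step 1: CR3 gives the page directory; fetch the directory entry. */
    uint32_t pde = read_phys32(phys_mem, (cr3 & ~0xfffu) + pd_index * 4);
    if (!(pde & PTE_PRESENT))
        return -1;

    /* Step 2: the directory entry gives the page table; fetch its entry. */
    uint32_t pte = read_phys32(phys_mem, (pde & ~0xfffu) + pt_index * 4);
    if (!(pte & PTE_PRESENT))
        return -1;

    /* Step 3: the table entry gives the page frame; add the page offset. */
    *paddr = (pte & ~0xfffu) + offset;
    return 0;
}
```

Note that every step depends on the previous one, and the whole walk depends on knowing the right CR3 value for the address space in question. That dependency chain is the semantic gap in miniature.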

Figure (intro-detail.png): XenAccess must take several steps to access a memory page based on a kernel symbol in Linux.

The figure above shows the steps that XenAccess takes to access a page of memory using a kernel symbol. The figure focuses on Linux, but the procedure for Windows is similar: instead of using the System.map file, kernel symbols are converted to virtual addresses using the export values from ntoskrnl.exe.
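The first step for Linux, converting a kernel symbol to a virtual address, can be sketched as a lookup in System.map-formatted text, where each line reads "<hex address> <type> <name>". The function name sysmap_lookup is hypothetical, and the sketch scans an in-memory string rather than opening the file, so it may differ from what XenAccess does internally.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up `symbol` in System.map-formatted text. Returns 0 and fills
 * *vaddr on success, or -1 if the symbol is not present.
 */
static int sysmap_lookup(const char *map_text, const char *symbol,
                         uint32_t *vaddr)
{
    const char *line = map_text;

    while (line != NULL && *line != '\0') {
        unsigned long addr;
        char type;
        char name[128];

        /* Parse "<hex address> <type> <name>" and compare the full name. */
        if (sscanf(line, "%lx %c %127s", &addr, &type, name) == 3 &&
            strcmp(name, symbol) == 0) {
            *vaddr = (uint32_t)addr;
            return 0;
        }

        line = strchr(line, '\n');   /* advance to the next line, if any */
        if (line != NULL)
            line++;
    }
    return -1;
}
```

The resulting virtual address is then the input to the page table walk, which is why the sysmap entry in the configuration file must match the kernel actually running in the target domain.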

Memory introspection is useful because it allows you to monitor and control an operating system from a protected location. Previous research has shown that introspection can be used for a wide variety of security applications, and more ideas are coming out all the time. Using XenAccess, you can quickly experiment with your new ideas and help advance this new and exciting research direction.

Installation

Requirements

XenAccess is designed for 32-bit x86 systems running Xen 3.0.4. Work is underway to get support for Xen 3.1.0, but right now support for this platform is very limited. If you have success with another version or if you would like to help port XenAccess to another version, then please send a message to the mailing list.

Getting XenAccess

You can get the latest released version of XenAccess from SourceForge using the following link: http://sf.net/project/platformdownload.php?group_id=159196

You can also grab the development version directly from the subversion repository. To do this, you will need a subversion client capable of handling SSL. Then, perform the checkout with the following command:

 svn co https://xenaccess.svn.sf.net/svnroot/xenaccess/trunk/libxa 

If you are just getting started with XenAccess, you probably want to use the latest released version. However, if you need a new feature that hasn't been released, or you are planning on submitting a patch, then you may want to try the development version.

Building XenAccess

Before compiling XenAccess, you should make sure that you have a standard development environment installed including gcc, make, autoconf, etc. You will also need the libxc library and the libxenstore library, which are included with a typical Xen installation. XenAccess uses the standard GNU build system. To compile the library, follow the steps shown below.
./autogen.sh
./configure
make 

Note that you can specify options to the configure script to specify, for example, the installation location. For a complete list of configure options, run:

./configure --help 

Installing XenAccess

Installation is optional. It is useful if you will be developing code that uses the XenAccess library. However, if you are just running the examples, then there is no need to do an installation. If you choose to install XenAccess, you can do it using the steps shown below:
su 
make install 

Note that this will install XenAccess under the install prefix specified to the configure script. If you did not specify an install prefix, then XenAccess is installed under /usr/local.

Configuring XenAccess

In order to work properly, XenAccess requires that you install a configuration file. This file has a set of entries for each domain that XenAccess will access. These entries specify things such as the OS type (e.g., Linux or Windows), the location of symbolic information, and offsets used to access data within the domain. The file format is relatively straightforward. The generic format is shown below:
<domain name> {
    <key> = <value>;
    <key> = <value>;
} 

The domain name is what appears when you use the 'xm list' command. There are 14 different keys available for use. The ostype and sysmap keys are used by both Linux and Windows domains. The available keys are listed below:

All of the offsets can be specified in either hex or decimal. For hex, the number should be preceded by '0x'. An example configuration file is shown below:
Fedora-HVM {
    sysmap      = "/boot/System.map-2.6.18-1.2798.fc6";
    ostype      = "Linux";
    linux_tasks = 268;
    linux_mm    = 276;
    linux_pid   = 312;
    linux_pgd   = 40;
    linux_addr  = 132;
}

WinXPSP2 {
    ostype      = "Windows";
    win_tasks   = 0x88;
    win_pdbase  = 0x18;
    win_pid     = 0x84;
    win_peb     = 0x1b0;
    win_iba     = 0x8;
    win_ph      = 0x18;
} 

You can specify as many domains as you wish in this configuration file. When you are done creating this file, it must be saved to /etc/xenaccess.conf.
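As an illustration of the "hex or decimal" rule for offsets, the following sketch parses a single offset value. The function parse_offset is hypothetical and only shows one way such a rule can be implemented; XenAccess's own configuration parser is not shown in this document and may work differently.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse an offset written in decimal ("268") or hex ("0x88").
 * Returns 0 and fills *out on success, -1 on malformed input.
 */
static int parse_offset(const char *text, unsigned long *out)
{
    char *end = NULL;
    int base = 10;

    /* Treat a leading "0x"/"0X" as hex; everything else as decimal. */
    if (text[0] == '0' && (text[1] == 'x' || text[1] == 'X'))
        base = 16;

    errno = 0;
    unsigned long value = strtoul(text, &end, base);

    /* Reject overflow, empty input, and trailing garbage. */
    if (errno != 0 || end == text || *end != '\0')
        return -1;

    *out = value;
    return 0;
}
```

The base is chosen explicitly rather than passing base 0 to strtoul, because base 0 would also treat a leading '0' as octal, so a value such as "040" would silently become 32.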

Example Code

Included Examples

XenAccess comes with a variety of small examples to demonstrate how to use the library. These are also useful for checking that you have successfully completed the build and configuration steps described in the section above. The first argument for each example is the domain ID that you wish to view. This should be the same ID seen using the 'xm list' command. The domain you specify must also be included in the configuration file. The provided examples are listed below. The arguments passed on the command line to each program are specified in square brackets.

In Detail: process-list.c

In order to better understand how the examples work, let's take a look at one of the more interesting examples. The process-list example displays the processes running in an operating system by walking down the linked list data structure containing process information. For each process, the name and ID are extracted and printed to stdout. To see how this is done, we will step through the code one piece at a time.

#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/mman.h>
#include <stdio.h>
#include <xenaccess/xenaccess.h>
#include <xenaccess/xa_private.h> 

The include list is not too surprising. Note that xa_private.h is included here, but this is only to access the function that prints a memory page to stdout. Most people will not need to include xa_private.h.

#define TASKS_OFFSET (24 * 4)
#define PID_OFFSET (39 * 4)
#define NAME_OFFSET (108 * 4)
#define ActiveProcessLinks_OFFSET 0x88
#define UniqueProcessId_OFFSET 0x84
#define ImageFileName_OFFSET 0x174 

These offset values are important as they will allow us to find the necessary data within the data structures we traverse. The first three offsets are used for Linux systems and obtained by looking at the definition of task_struct. The second three offsets are used for Windows systems and obtained using windbg to view the EPROCESS struct.
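If you have headers that describe the structure layout, the standard offsetof macro is one way to derive such offsets instead of counting fields by hand. The struct below is a hypothetical, greatly simplified stand-in used only for illustration; it is not the real task_struct, whose layout depends on the kernel version and build configuration of the guest.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel process structure (NOT task_struct). */
struct fake_task {
    uint32_t state;        /* offset 0 */
    uint32_t flags;        /* offset 4 */
    uint32_t tasks_next;   /* offset 8: pointer to the next list entry */
    uint32_t tasks_prev;   /* offset 12 */
    uint32_t pid;          /* offset 16 */
    char     comm[16];     /* offset 20: process name */
};

/* Offsets computed by the compiler from the structure definition. */
#define FAKE_TASKS_OFFSET ((uint32_t)offsetof(struct fake_task, tasks_next))
#define FAKE_PID_OFFSET   ((uint32_t)offsetof(struct fake_task, pid))
#define FAKE_NAME_OFFSET  ((uint32_t)offsetof(struct fake_task, comm))
```

Keep in mind that the offsets must reflect the layout used by the guest kernel, not the machine where your introspection code is compiled. A different kernel version or configuration option can shift every field after the point of change.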

    uint32_t dom = atoi(argv[1]);

    if (xa_init(dom, &xai) == XA_FAILURE){
        perror("failed to init XenAccess library");
        goto error_exit;
    } 

Next we read in the domain ID to look at. (Yes, there is no error checking here so the program will seg fault if you fail to specify a domain ID as an argument.) Then we make our first call to XenAccess. This call initializes the instance data structure. Note that we only perform this initialization step once as it is a costly function call.
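If you would rather not crash on a missing or malformed argument, the domain ID can be validated before calling into the library. The helper below is a sketch of one way to do this; parse_domain_id is a hypothetical name and is not part of XenAccess or of the shipped example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a domain ID from a command-line argument. Returns 0 and fills
 * *dom on success; returns -1 if the argument is missing, is not a plain
 * decimal number, or does not fit in a uint32_t.
 */
static int parse_domain_id(const char *arg, uint32_t *dom)
{
    char *end = NULL;

    /* Reject NULL, empty strings, signs, and leading whitespace. */
    if (arg == NULL || *arg < '0' || *arg > '9')
        return -1;

    errno = 0;
    unsigned long value = strtoul(arg, &end, 10);
    if (errno != 0 || *end != '\0' || value > UINT32_MAX)
        return -1;

    *dom = (uint32_t)value;
    return 0;
}
```

A caller would typically check that argc is at least 2, pass argv[1] to this helper, and print a usage message if it returns -1.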

From this point forward, all of the error checking code will be omitted for the sake of clarity. In addition, we will only focus on the Linux version of the code. The Windows version operates in a similar fashion. To see the complete version of the code, look in the examples directory of your copy of XenAccess.

    memory = xa_access_kernel_symbol(&xai, "init_task", &offset);
    memcpy(&next_process, memory + offset + TASKS_OFFSET, 4);
    list_head = next_process;
    munmap(memory, xai.page_size); 

The kernel symbol 'init_task' points to the beginning of the process list in the Linux kernel. So we map this memory location and then copy the pointer to the next process into both the list_head and next_process variables. The task list is a circular linked list, so we will use list_head to know when we have visited every process. The next step here is to unmap the memory, since we are done with this page. This is important to remember since you can only map a limited number of pages at a time. Just as you would free memory after a malloc, you should unmap these pages after you are done using them.

    while (1){
        memory = xa_access_virtual_address(&xai, next_process, &offset);
        memcpy(&next_process, memory + offset, 4);

        if (list_head == next_process){
            break;
        }

        name = (char *) (memory + offset + NAME_OFFSET - TASKS_OFFSET);
        memcpy(&pid, memory + offset + PID_OFFSET - TASKS_OFFSET, 4);
        printf("[%5d] %s\n", pid, name);
        munmap(memory, xai.page_size);
    } 

This loop is the bulk of the program. We map the memory page associated with the next process. For this process we check to see if it is the init_task process. If so then we are done. If not, then we print out the process information. Note that we are obtaining this information directly from the task_struct in memory of the running Linux system that we are looking at. When we are done with this memory page, we unmap it and repeat the loop.
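The pointer arithmetic in this loop is easier to follow on a small simulation. The sketch below walks a circular list stored in a plain byte array, using the same idea: each link holds the address of the list field of the next record, so other fields are reached by subtracting the list field's offset. The record layout, the REC_* offsets, and the function walk_records are all invented for this illustration and do not correspond to a real kernel.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record layout, in bytes from the start of each record. */
#define REC_TASKS_OFFSET 8u    /* link to the next record's list field */
#define REC_PID_OFFSET   16u
#define REC_NAME_OFFSET  20u

/*
 * Walk the circular list in `mem`, starting from the record at `head_rec`.
 * Writes one "[pid] name" line per record into `out` (the head record is
 * not printed) and returns the number of records printed.
 */
static int walk_records(const uint8_t *mem, uint32_t head_rec,
                        char *out, size_t out_len)
{
    uint32_t list_head, next;
    size_t used = 0;
    int count = 0;

    if (out_len > 0)
        out[0] = '\0';

    /* Read the head record's link; remember it to detect wrap-around. */
    memcpy(&next, mem + head_rec + REC_TASKS_OFFSET, 4);
    list_head = next;

    for (;;) {
        uint32_t cur = next;              /* address of current list field */
        uint32_t pid;

        memcpy(&next, mem + cur, 4);      /* follow the link */
        if (next == list_head)
            break;                        /* back at the start: done */

        /* Step back from the list field to reach the other fields. */
        memcpy(&pid, mem + cur + REC_PID_OFFSET - REC_TASKS_OFFSET, 4);
        const char *name =
            (const char *)(mem + cur + REC_NAME_OFFSET - REC_TASKS_OFFSET);

        int n = snprintf(out + used, out_len - used, "[%u] %s\n",
                         (unsigned)pid, name);
        if (n < 0 || (size_t)n >= out_len - used)
            break;                        /* output buffer is full */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

In the real example, each iteration maps a guest page instead of indexing into a local array, which is why every pass through the loop ends with an unmap.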

    if (memory) munmap(memory, xai.page_size);
    xa_destroy(&xai); 

The final step is cleanup. We perform a sanity check to make sure that no memory pages are left mapped, unmapping the last page if one is still mapped. Then we call xa_destroy to free any memory associated with the XenAccess instance.

Running the Examples

A quick way to see XenAccess in action is to try out the example code. You should be running Xen, and running the example code as root in domain 0. You should have at least one user domain running and the configuration file set up for this domain. Note the domain ID using the 'xm list' command:

[root@bluemoon libxa]# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     1229     2 r----- 137356.4
Fedora-HVM                                 4      384     1 -b----   2292.6
fc5                                        5      384     1 -b----     15.4
[root@bluemoon libxa]# 

Then you can run the examples as follows:

[root@bluemoon libxa]# cd examples/
[root@bluemoon examples]# ./module-list 5
ipv6
binfmt_misc
lp
parport_pc
parport
nvram
usbcore
[root@bluemoon examples]# ./module-list 4
autofs4
hidp
rfcomm
l2cap
bluetooth
sunrpc
ipv6
parport_pc
lp
parport
floppy
8139cp
8139too
mii
pcspkr
serio_raw
dm_snapshot
dm_zero
dm_mirror
dm_mod
ext3
jbd
[root@bluemoon examples]# 

Note that the example code works for both para-virtualized (i.e., PV) and fully-virtualized (i.e., HVM) domains. However, the example code uses the offsets you provide in the configuration file, along with some hard-coded offsets in the example code, to locate information in the running kernels. For this reason you may find that it fails on some kernels.

[root@bluemoon examples]# ./process-list 5
[    1] init
[    2] migration/0
[    3] ksoftirqd/0
[    4] watchdog/0
[    5] events/0
[    6] khelper
[    7] kthread
[    8] xenwatch
[    9] xenbus
[   15] kblockd/0
[   57] pdflush
[   58] pdflush
[   60] aio/0
[   59] kswapd0
[  578] kseriod
[  685] kpsmoused
[  710] khubd
[  978] dhclient
[ 1006] syslogd
[ 1009] klogd
[ 1021] sshd
[ 1027] mingetty
[root@bluemoon examples]# ./process-list 4
[1408237823] ?
[14941936] ?S?
[    0]
[    0]
ERROR: address not in page table
failed to map memory for process list pointer: Success
[root@bluemoon examples]# 

Here we see that process-list works on one of the domains, but not the other. This is because the examples were written with a specific kernel version in mind. Your code will need to be built to work with the specific system that you plan on viewing. Future versions of XenAccess will provide tools to help simplify the process of finding the right offset values.

Programming With XenAccess

Most people who are familiar with C and with OS data structure layout will find that programming with XenAccess is pretty straightforward. You can probably just look at the example code and be off and running quickly. However, there are a few things to keep in mind when working with introspection that will make your life easier and improve the performance of your applications. This section provides some tips to help you get started.

Note: This section provides some preliminary ideas and help. If you find other tips that could be added to this section, please send a note to the mailing list.

Best Practices for Performance

Performance is a key concern when building any application that uses introspection. Since some of the library operations are costly, if you aren't careful, your application can run very slowly. In most cases, this can be avoided with a little planning. The two strategies to keep in mind are (1) do as much work as possible during initialization, and (2) avoid algorithms that map the same page of memory repeatedly.

The first step is to do your costly work during initialization. This includes your call to xa_init. You should call this function once during the initialization of your application and then pass the instance around to all functions that need to use the XenAccess API. XenAccess is designed to do as much work as possible during initialization, so a call to xa_init is costly. If you call it several times then your application performance will certainly suffer.

The second step is to be mindful of what you are doing. For example, an algorithm that searches memory looking for a particular pattern should not be using XenAccess to access each virtual address in the search range independently. Instead, use XenAccess to map a page of memory once. Then look at each address on that page before unmapping it and proceeding to the next page. These considerations can improve your application's performance by several orders of magnitude.
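The difference between the two approaches can be sketched with a simulation that counts how often a page gets mapped. Here sim_map_page is a hypothetical stand-in for a costly mapping call, and the guest is just a local byte array; none of these names come from XenAccess.

```c
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SIZE 4096u

static unsigned long map_calls;   /* counts simulated (costly) mappings */

/* Hypothetical stand-in for a library call that maps one guest page. */
static const uint8_t *sim_map_page(const uint8_t *guest, uint32_t page_number)
{
    map_calls++;
    return guest + (size_t)page_number * SIM_PAGE_SIZE;
}

/*
 * Count occurrences of `byte` in pages [first_page, first_page + n_pages),
 * mapping each page exactly once and scanning it locally.
 */
static unsigned long count_byte_per_page(const uint8_t *guest,
                                         uint32_t first_page,
                                         uint32_t n_pages, uint8_t byte)
{
    unsigned long hits = 0;

    for (uint32_t p = 0; p < n_pages; p++) {
        const uint8_t *page = sim_map_page(guest, first_page + p);

        /* Examine every address on the page before moving on. */
        for (uint32_t i = 0; i < SIM_PAGE_SIZE; i++) {
            if (page[i] == byte)
                hits++;
        }
        /* A real program would unmap the page here. */
    }
    return hits;
}
```

Scanning two pages this way costs two mappings, whereas mapping once per address would cost 8192. Multi-byte patterns that straddle a page boundary need extra handling that this sketch leaves out.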

Finally, you can enable the debug output (see the Debugging section below) to identify when XenAccess is getting cache hits and cache misses. The default cache size is 25 entries; however, some applications may benefit from a larger cache. You can adjust the cache size by changing the XA_CACHE_SIZE variable in xa_private.h and then recompiling XenAccess. Keep in mind that a larger cache size does not always equal better performance. Experiment to see what works best for your particular application.

Debugging

XenAccess includes the ability to show debugging output. This output is very verbose, but may be useful when tracking down bugs in your application or in XenAccess itself. To enable the debug output, uncomment the XA_DEBUG variable near the top of the xenaccess.h file. After uncommenting this variable, you will need to recompile XenAccess (and, optionally, reinstall XenAccess). With the debug output enabled, you will see lots of information on stdout about XenAccess's operation.

If you are requesting help from the mailing list, please send the debug output along with your question as it will be easier to diagnose your problem this way.

Bridging the Semantic Gap

XenAccess provides simplified access to a user domain's memory by allowing you to access virtual and physical addresses. However, knowing which address to access can be a challenging problem by itself. If you were working on the machine locally, you would have a richer set of API calls to gather the information you need. Bridging this semantic gap requires learning details about how an operating system and its programs are loaded into memory.

When available, source code provides an invaluable resource for understanding how data is organized in memory. When source code is not available, or when you need a higher-level understanding of the system's operation, I recommend finding a useful reference book. The two operating system references that seem to be most useful when working with XenAccess are listed below:

In addition to these books, the forensics community has done a lot of work in volatile memory analysis. This body of knowledge is very relevant to work with XenAccess because it involves a similar view of the system's memory. The links below are some recommended sources of information within this forensics community:

Troubleshooting

When troubleshooting your application, it is best to be able to see what is going on. If you think that the problem is within XenAccess, you can try enabling the debug output to identify the problem. If you think that you have found a bug in XenAccess, please send an email with this debug output and a description of the bug to the mailing list.

If you want to see the memory being mapped by your application, consider using the print_hex function, which will require you to include xa_private.h in your application. This function allows you to easily print the hex and ascii values from a region of memory to stdout, which can often simplify debugging.
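If you prefer not to depend on xa_private.h, a small hex dump helper of your own is easy to write. The sketch below formats one line of up to 16 bytes; format_hex_line is a hypothetical helper, and its output layout is our own choice rather than a copy of what print_hex produces.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format up to 16 bytes as "offset: hex bytes  ascii" into `out`.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if `out` is too small.
 */
static int format_hex_line(const unsigned char *data, size_t len,
                           unsigned long offset, char *out, size_t out_len)
{
    size_t used;
    int n;

    if (len > 16)
        len = 16;

    n = snprintf(out, out_len, "%08lx: ", offset);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    used = (size_t)n;

    /* Hex column: pad missing bytes so the ascii column stays aligned. */
    for (size_t i = 0; i < 16; i++) {
        if (i < len)
            n = snprintf(out + used, out_len - used, "%02x ",
                         (unsigned)data[i]);
        else
            n = snprintf(out + used, out_len - used, "   ");
        if (n < 0 || (size_t)n >= out_len - used)
            return -1;
        used += (size_t)n;
    }

    /* Ascii column: one separator, the bytes, and the terminator. */
    if (used + len + 2 > out_len)
        return -1;
    out[used++] = ' ';
    for (size_t i = 0; i < len; i++)
        out[used++] = (data[i] >= 0x20 && data[i] < 0x7f) ? (char)data[i]
                                                          : '.';
    out[used] = '\0';
    return (int)used;
}
```

Calling this in a loop over a mapped page, 16 bytes at a time, and printing each line gives a quick view of what your application is actually reading.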


Generated on Thu Apr 10 11:54:35 2008 for XenAccess by  doxygen 1.5.5