Our colleagues over at Core Security have been doing great things with Cobalt Strike, making use of it in their own engagements. They wrote up this post on creating Cobalt Strike Beacon Object Files using the MinGW compiler on Linux. It covers several ideas and best practices that will increase the quality of your BOFs.
Flexibility
Compiling to Both Object Files and Executables
While writing a BOF is great, it’s always worth making the code compile to both BOF and EXE.
This provides a lot more options: we can run our capability outside Beacon by writing the EXE to disk and executing it, or we can convert the EXE into position-independent shellcode using donut and run it from memory.
Usually, calling a Windows API from a Beacon Object File looks as follows:
program.h
WINBASEAPI size_t __cdecl MSVCRT$strnlen(const char *s, size_t maxlen);
program.c
int length = MSVCRT$strnlen(someString, 256);
BeaconPrintf(CALLBACK_OUTPUT, "The variable length is %d.", length);
Makefile
BOFNAME := program
CC_x64 := x86_64-w64-mingw32-gcc
all:
$(CC_x64) -c source/program.c -o compiled/$(BOFNAME).x64.o -masm=intel -Wall
However, we would like to create both a BOF and an EXE from the same source file. A practical way to achieve this is to add a conditional compilation clause, as shown below. In this example, we use a macro named BOF, which is defined only when building the object file:
Makefile
BOFNAME := program
CC_x64 := x86_64-w64-mingw32-gcc
all:
$(CC_x64) -c source/program.c -o compiled/$(BOFNAME).x64.o -masm=intel -Wall -DBOF
$(CC_x64) source/program.c -o compiled/$(BOFNAME).x64.exe -masm=intel -Wall
program.h
#ifdef BOF
WINBASEAPI size_t __cdecl MSVCRT$strnlen(const char *s, size_t maxlen);
#define strnlen MSVCRT$strnlen
#endif
#ifdef BOF
#define PRINT(...) { \
BeaconPrintf(CALLBACK_OUTPUT, __VA_ARGS__); \
}
#else
#define PRINT(...) { \
fprintf(stdout, __VA_ARGS__); \
fprintf(stdout, "\n"); \
}
#endif
program.c
int length = strnlen(someString, 256);
PRINT("The variable length is %d.", length);
Finally, in our program.c file, we would define the “go” (BOF’s entry point) and “main” functions:
program.c
#ifdef BOF
void go(char* args, int length)
{
// BOF code
}
#else
int main(int argc, char* argv[])
{
// EXE code
return 0;
}
#endif
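One detail worth considering in the PRINT macro: a macro body wrapped only in braces stops compiling when it is used as the body of an if that has an else, because the trailing semicolon ends the if statement. A common remedy is the do { ... } while (0) idiom. The sketch below is a minimal illustration under stated assumptions: report_length is a hypothetical helper that both go and main could call, and the BOF branch assumes the beacon.h header that ships with Cobalt Strike.

```c
#ifdef BOF
#include <windows.h>
#include "beacon.h" /* assumed: Cobalt Strike's BOF header, declares BeaconPrintf */
WINBASEAPI size_t __cdecl MSVCRT$strnlen(const char *s, size_t maxlen);
#define strnlen MSVCRT$strnlen
#define PRINT(...) do { BeaconPrintf(CALLBACK_OUTPUT, __VA_ARGS__); } while (0)
#else
#define _POSIX_C_SOURCE 200809L /* exposes strnlen on non-Windows C libraries */
#include <stdio.h>
#include <string.h>
#define PRINT(...) do { fprintf(stdout, __VA_ARGS__); fprintf(stdout, "\n"); } while (0)
#endif

/* Hypothetical shared routine: both go() and main() could call this. */
int report_length(const char *text)
{
    int length = (int)strnlen(text, 256);

    /* With a brace-only PRINT, the ';' before 'else' would be a syntax error. */
    if (length == 256)
        PRINT("The string is 256 characters or longer.");
    else
        PRINT("The variable length is %d.", length);

    return length;
}
```

With this layout, go and main stay thin wrappers: each one only unpacks its own arguments and then calls the shared routine.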
Stealth
Syswhispers2 Integration
syswhispers2 is an awesome implementation of direct syscalls. However, if we take a look under the hood, we can see that it relies on a global variable. Unfortunately, global variables do not work very well with Beacon. This is because Beacon Object Files don’t have a .bss section, which is where uninitialized global variables are typically stored.
A useful trick, originally suggested by Twitter user @the_bit_diddler, is to move the global variables to the .data section using a compiler directive, as shown below:
syscalls.c (before)
SW2_SYSCALL_LIST SW2_SyscallList;
syscalls.c (after)
SW2_SYSCALL_LIST SW2_SyscallList __attribute__ ((section(".data")));
This small change will allow the use of the syswhispers2 logic in a BOF.
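To make the effect of the attribute more concrete, here is a minimal sketch of a cached value kept in a .data-placed global. The names g_cached_address, g_lookup_count, expensive_lookup and get_cached_address are hypothetical and are not part of syswhispers2; the section attribute itself is a GCC extension, so this assumes GCC or MinGW.

```c
#include <stddef.h>

/* Zero-initialized globals normally land in .bss; the attribute forces .data. */
void *g_cached_address __attribute__((section(".data"))) = NULL;
int g_lookup_count __attribute__((section(".data"))) = 0;

/* Hypothetical stand-in for a lookup that we only want to run once. */
static void *expensive_lookup(void)
{
    g_lookup_count++;
    return &g_lookup_count; /* any stable non-NULL address works for the demo */
}

/* Runs the lookup on the first call and returns the cached result afterwards. */
void *get_cached_address(void)
{
    if (g_cached_address == NULL)
        g_cached_address = expensive_lookup();
    return g_cached_address;
}
```

Calling get_cached_address twice runs expensive_lookup only once, which is the same caching pattern used later for the syscall address.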
In addition to the global variable change, there are other minor changes that need to be made so that the syswhispers2 code compiles with MinGW. For example, the format of the API hashes needs to be changed from 0ABCD1234h to 0xABCD1234. The tool InlineWhispers should take care of the rest.
Using direct syscalls is a powerful technique to avoid userland hooks. Ironically, using them could get us caught.
There are at least two ways of detecting direct syscalls: dynamic and static.
The dynamic method is simply detecting that a syscall was called from a module that is not ntdll.dll. The static method is to find a syscall instruction by inspecting the program’s code and memory. How can we avoid both these detections? The answer is to call our syscalls from ntdll.dll.
First, we must locate where ntdll.dll is loaded. Luckily, syswhispers2 already has the code to do just that. Then, we can parse its headers and locate the code section.
Hiding the Use of syscalls
Once we know the base address and size of the code section of ntdll.dll, all we need to do is search it for the opcodes of the instructions syscall; ret. In x64, the bytes we are looking for are: { 0x0f, 0x05, 0xc3 }.
While it is true that EDRs and other tools hook (overwrite) syscall stubs in ntdll.dll, they do not hook every existing syscall, so we are almost certain to find at least one occurrence of these three bytes. We might even find them by chance at a misaligned offset.
Once we find the syscall; ret bytes, we can save the address in a global variable (stored in the .data section). That way, we only need to find it once.
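The search itself is an ordinary byte-pattern scan. As a generic, hedged sketch (find_bytes is a hypothetical helper, not part of syswhispers2), it could be written with memcmp so that it also behaves correctly for patterns that contain zero bytes. In a BOF, memcmp would have to be resolved through the usual MSVCRT$ declaration.

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the first occurrence of needle inside haystack,
   or NULL if the pattern does not occur or the arguments are degenerate. */
const unsigned char *find_bytes(const unsigned char *haystack, size_t haystack_len,
                                const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0 || haystack_len < needle_len)
        return NULL;

    /* Last offset at which the whole pattern still fits inside the buffer. */
    size_t last = haystack_len - needle_len;

    for (size_t i = 0; i <= last; i++) {
        if (memcmp(haystack + i, needle, needle_len) == 0)
            return haystack + i;
    }
    return NULL;
}
```

The upper bound is the important part: the scan stops at the last offset where the full pattern still fits, so it never reads past the end of the region.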
Everything we have just described can be seen in the following code sequence:
syscalls.c
#ifdef _WIN64
#define PEB_OFFSET 0x60
#define READ_MEMLOC __readgsqword
#else
#define PEB_OFFSET 0x30
#define READ_MEMLOC __readfsdword
#endif
PVOID SyscallAddress __attribute__ ((section(".data"))) = NULL;
__attribute__((naked)) void SyscallNotFound(void)
{
__asm__(" SyscallNotFound: \n\
mov eax, 0xC0000225 \n\
ret \n\
");
}
PVOID GetSyscallAddress(void)
{
#ifdef _WIN64
BYTE syscall_code[] = { 0x0f, 0x05, 0xc3 };
#else
BYTE syscall_code[] = { 0x0f, 0x34, 0xc3 };
#endif
// Return early if the SyscallAddress is already defined
if (SyscallAddress)
{
// make sure the instructions have not been replaced
if (!strncmp((PVOID)syscall_code, SyscallAddress, sizeof(syscall_code)))
return SyscallAddress;
}
// set the fallback as the default
SyscallAddress = (PVOID) SyscallNotFound;
// find the address of NTDLL
PSW2_PEB Peb = (PSW2_PEB)READ_MEMLOC(PEB_OFFSET);
PSW2_PEB_LDR_DATA Ldr = Peb->Ldr;
PIMAGE_EXPORT_DIRECTORY ExportDirectory = NULL;
PVOID DllBase = NULL;
PVOID BaseOfCode = NULL;
ULONG32 SizeOfCode = 0;
// Get the DllBase address of NTDLL.dll. NTDLL is not guaranteed to be the second
// in the list, so it's safer to loop through the full list and find it.
PSW2_LDR_DATA_TABLE_ENTRY LdrEntry;
for (LdrEntry = (PSW2_LDR_DATA_TABLE_ENTRY)Ldr->Reserved2[1]; LdrEntry->DllBase != NULL; LdrEntry = (PSW2_LDR_DATA_TABLE_ENTRY)LdrEntry->Reserved1[0])
{
DllBase = LdrEntry->DllBase;
PIMAGE_DOS_HEADER DosHeader = (PIMAGE_DOS_HEADER)DllBase;
PIMAGE_NT_HEADERS NtHeaders = SW2_RVA2VA(PIMAGE_NT_HEADERS, DllBase, DosHeader->e_lfanew);
PIMAGE_DATA_DIRECTORY DataDirectory = (PIMAGE_DATA_DIRECTORY)NtHeaders->OptionalHeader.DataDirectory;
DWORD VirtualAddress = DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
if (VirtualAddress == 0) continue;
ExportDirectory = SW2_RVA2VA(PIMAGE_EXPORT_DIRECTORY, DllBase, VirtualAddress);
// If this is NTDLL.dll, exit loop.
PCHAR DllName = SW2_RVA2VA(PCHAR, DllBase, ExportDirectory->Name);
if ((*(ULONG*)DllName | 0x20202020) != 0x6c64746e) continue;
if ((*(ULONG*)(DllName + 4) | 0x20202020) == 0x6c642e6c)
{
BaseOfCode = SW2_RVA2VA(PVOID, DllBase, NtHeaders->OptionalHeader.BaseOfCode);
SizeOfCode = NtHeaders->OptionalHeader.SizeOfCode;
break;
}
}
if (!BaseOfCode || !SizeOfCode)
return SyscallAddress;
// try to find a 'syscall' instruction inside of NTDLL's code section
PVOID CurrentAddress = BaseOfCode;
PVOID EndOfCode = SW2_RVA2VA(PVOID, BaseOfCode, SizeOfCode - sizeof(syscall_code) + 1);
while ((ULONG_PTR)CurrentAddress < (ULONG_PTR)EndOfCode)
{
if (!strncmp((PVOID)syscall_code, CurrentAddress, sizeof(syscall_code)))
{
// found 'syscall' instruction in ntdll
SyscallAddress = CurrentAddress;
return SyscallAddress;
}
// increase the current address by one
CurrentAddress = SW2_RVA2VA(PVOID, CurrentAddress, 1);
}
// syscall entry not found, using fallback
return SyscallAddress;
}
syscalls.h
EXTERN_C PVOID GetSyscallAddress(void);
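The two integer comparisons against 0x6c64746e and 0x6c642e6c in the loop are a compact, case-insensitive check for a name that starts with "ntdll.dl": OR-ing each byte with 0x20 lowercases ASCII letters and leaves the dot unchanged. The standalone sketch below shows the same idea; looks_like_ntdll is a hypothetical name, and the code assumes a little-endian CPU (true for x86 and x64) and at least eight readable bytes.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if name starts with "ntdll.dl" in any letter case, otherwise 0.
   Assumes little-endian byte order and at least 8 readable bytes at name. */
int looks_like_ntdll(const char *name)
{
    uint32_t first, second;

    /* memcpy avoids unaligned or aliasing-unsafe pointer casts. */
    memcpy(&first, name, 4);
    memcpy(&second, name + 4, 4);

    /* 0x6c64746e is "ntdl" and 0x6c642e6c is "l.dl" read as little-endian. */
    return (first | 0x20202020u) == 0x6c64746eu
        && (second | 0x20202020u) == 0x6c642e6cu;
}
```

Note that the OR trick also accepts bytes that differ from the expected ones only in bit 5, so it is a quick filter rather than an exact string comparison.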
In the extremely unlikely scenario in which we do not find ANY occurrence of these three bytes in the code section of ntdll.dll, we can instead use our own function: SyscallNotFound. This simply returns STATUS_NOT_FOUND. We could implement a syscall; ret, but keep in mind that we want to avoid having the syscall instruction in our code in order to evade static analysis.
Once we have the memory address of interest, all we need to do is to modify the assembly of our syscall functions to jump to this memory address:
push rcx ; save volatile registers
push rdx
push r8
push r9
sub rsp, 0x28 ; allocate some space on the stack
call GetSyscallAddress ; call the C function and get the address of the 'syscall' instruction in ntdll.dll
add rsp, 0x28
push rax ; save the address in the stack
sub rsp, 0x28 ; allocate some space on the stack
mov ecx, 0x0123ABCD ; set the syscall hash as the parameter
call SW2_GetSyscallNumber ; get the id of the syscall using syswhispers2
add rsp, 0x28
pop r11 ; store the address of the 'syscall' instruction on r11
pop r9 ; restore the volatile registers
pop r8
pop rdx
pop rcx
mov r10, rcx
jmp r11 ; jump to ntdll.dll and call the syscall from there
And voilà, we use direct syscalls from a valid module (ntdll.dll) without having a syscall instruction in our code.
Stripping the Debug Symbols
While this step is not critical, stripping your binaries is useful enough to be worth the extra effort. Once stripped, they are not only harder to analyze but also smaller.
All we need to do is modify the Makefile to look as follows:
BOFNAME := program
CC_x64 := x86_64-w64-mingw32-gcc
STRIP_x64 := x86_64-w64-mingw32-strip
all:
$(CC_x64) -c program.c -o compiled/$(BOFNAME).x64.o -masm=intel -Wall -DBOF
$(STRIP_x64) --strip-unneeded compiled/$(BOFNAME).x64.o
$(CC_x64) program.c -o compiled/$(BOFNAME).x64.exe -masm=intel -Wall
$(STRIP_x64) --strip-all compiled/$(BOFNAME).x64.exe
While the EXE does end up being smaller, stripping the BOF doesn’t reduce its size significantly (only around 500 bytes).
Once the debugging symbols are stripped, if the program is compiled without changing the code (and with the same compiler version and flags), the resulting object file and executable will be the same regardless of who compiled them.
Is that a bad thing? Potentially, but only if fingerprinting is a concern. The code could be slightly modified and recompiled. For example, the seed of syswhispers2 could be changed. If code is run from a Beacon or in memory in the form of shellcode, fingerprinting should not be worrisome, as static analysis in those cases is not possible.
Compatibility
Supporting x86 might seem hard and pointless, but we shouldn’t limit ourselves and put every 32-bit machine out of our reach. Supporting x86 is a fun challenge and pays off in the end.
Code Logic
We’ll begin by introducing some conditional compilation clauses based on the architecture:
#if _WIN64
// x64 version of some logic
#else
// x86 version of some logic
#endif
If we want to add some code that is exclusive to x64:
#if _WIN64
// some code only for x64
#endif
If we want to add some code that is exclusive to x86:
#ifndef _WIN64
// some code only for x86
#endif
X86 syscall Support
To support syscalls in x86, we will have to deal with a few difficulties that are very manageable.
Function Names Within x86 Assembly
The main issue we encounter when calling the C functions SW2_GetSyscallNumber and GetSyscallAddress from x86 inline assembly is the following pair of linker errors:
/usr/lib/gcc/i686-w64-mingw32/11.2.0/../../../../i686-w64-mingw32/bin/ld: /tmp/ccbjuGDN.o:program.c:(.text+0x68): undefined reference to `GetSyscallAddress'
/usr/lib/gcc/i686-w64-mingw32/11.2.0/../../../../i686-w64-mingw32/bin/ld: /tmp/ccbjuGDN.o:program.c:(.text+0x73): undefined reference to `SW2_GetSyscallNumber'
There is some GCC documentation which explains that, on x86 targets such as 32-bit Windows, the names of C functions (and variables) are prefixed with an underscore at the assembly level. So, in this case, GetSyscallAddress becomes _GetSyscallAddress and SW2_GetSyscallNumber becomes _SW2_GetSyscallNumber.
Instead of calling them with the underscore, we can just adapt their definition to specify their name in assembly, like this:
syscalls.h
EXTERN_C DWORD SW2_GetSyscallNumber(DWORD FunctionHash) asm ("SW2_GetSyscallNumber");
EXTERN_C PVOID GetSyscallAddress(void) asm ("GetSyscallAddress");
We also need to do the same with the definitions for all the syscalls in syscalls.h. For example, here’s how we can modify NtOpenProcess:
syscalls.h (before)
EXTERN_C NTSTATUS NtOpenProcess(
OUT PHANDLE ProcessHandle,
IN ACCESS_MASK DesiredAccess,
IN POBJECT_ATTRIBUTES ObjectAttributes,
IN PCLIENT_ID ClientId OPTIONAL);
syscalls.h (after)
EXTERN_C NTSTATUS NtOpenProcess(
OUT PHANDLE ProcessHandle,
IN ACCESS_MASK DesiredAccess,
IN POBJECT_ATTRIBUTES ObjectAttributes,
IN PCLIENT_ID ClientId OPTIONAL) asm ("NtOpenProcess");
Once this is done, the weird x86 naming system should work fine.
Syscalls With Conflicting Types
There are some syscalls that fail to compile in x86, and produce an error message like:
error: conflicting types for ‘NtClose’;
While there are surely others, these syscalls are confirmed to have this issue:
- NtClose
- NtQueryInformationProcess
- NtCreateFile
- NtQuerySystemInformation
- NtQueryObject
It appears that in x86, MinGW already has a definition of these functions somewhere. To fix this, we just need to rename the troubling syscalls by prepending an underscore to their name in the x86 version.
program.h
In program.c, we can call these functions normally, without prepending the underscore to their name.
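One way to express that rename, shown here as a sketch rather than the author's exact listing, is a set of macros guarded on the architecture. The assumption is that they are defined after the Windows headers and before syscalls.h is included, so only our own prototypes are renamed. The resolved_name_of_ntclose helper and the two stringification macros are hypothetical and exist only to show what the name expands to.

```c
/* Sketch only: rename the clashing prototypes on builds that are not 64-bit Windows. */
#ifndef _WIN64
#define NtClose                   _NtClose
#define NtQueryInformationProcess _NtQueryInformationProcess
#define NtCreateFile              _NtCreateFile
#define NtQuerySystemInformation  _NtQuerySystemInformation
#define NtQueryObject             _NtQueryObject
#endif

/* Hypothetical helpers used only to show what a name expands to. */
#define STRINGIFY(name) #name
#define EXPAND_TO_STRING(name) STRINGIFY(name)

/* Returns "_NtClose" when the rename is active and "NtClose" otherwise. */
const char *resolved_name_of_ntclose(void)
{
    return EXPAND_TO_STRING(NtClose);
}
```

Because the preprocessor rewrites every later use of NtClose, the calls in program.c keep their normal spelling, while the asm labels on the prototypes, being string literals, are left untouched.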
X86 Assembly Code
For the assembly code, we’ll need to update syscalls-asm.h to look as follows:
syscalls-asm.h
Finally, the x86 assembly will look like this:
After all these changes, we have x86 syscall support.
WoW64 Support?
WoW64 stands for Windows on Windows64, the subsystem that lets 32-bit programs run on 64-bit Windows machines. In WoW64 processes, syscalls are not called via a syscall or sysenter instruction. Instead, a jump to fs:[0xc0] is performed. Understanding the way this works requires a long explanation, but for the purpose of this article, all we need to know is that it translates syscalls from 32 to 64-bit so that the kernel can understand them.
One quick way of “supporting” syscalls on WoW64 processes is to perform the same jump from our code. However, there are a few drawbacks when doing this. First, this is by no means a direct syscall. EDRs can hook these calls. Additionally, in some syscalls that use pointers, we will not be able to reference addresses above 32-bit.
Truly supporting direct syscalls for WoW64 processes would require us to transition via a far jmp instruction into 64-bit code, translate the parameters to their 64-bit counterparts, adjust the calling convention, set the stack alignment and more. These actions alone could make up an entire post.
That being said, jumping to fs:[0xc0] is an easy trick and at least we would have some support for WoW64, which might be useful for some scenarios.
To detect if our program is running as WoW64 process, we’ll define a function called IsWoW64:
syscalls-asm.h
#if _WIN64
#define IsWoW64 IsWoW64
__asm__("IsWoW64: \n\
mov rax, 0 \n\
ret \n\
");
#else
#define IsWoW64 IsWoW64
__asm__("IsWoW64: \n\
mov eax, fs:[0xc0] \n\
test eax, eax \n\
jne wow64 \n\
mov eax, 0 \n\
ret \n\
wow64: \n\
mov eax, 1 \n\
ret \n\
");
#endif
syscalls.h
EXTERN_C BOOL IsWoW64(void) asm ("IsWoW64");
program.c
if(IsWoW64())
{
PRINT("This is a 32-bit process running on a 64-bit machine!\n");
}
If detection is a concern when running under a WoW64 context, just call IsWoW64() and bail out if it returns true.
The same condition can also be checked in the .cna script in Cobalt Strike:
program.cna
$barch = barch($1);
$is64 = binfo($1, "is64");
if($barch eq "x86" && $is64 == 1)
{
berror($1, "This program does not support WoW64");
return;
}
We’ll also need to make a small change to the function GetSyscallAddress so that it sets the syscall address to the value stored at fs:[0xc0] if the process is a WoW64 process:
PVOID GetSyscallAddress(void)
{
#ifdef _WIN64
BYTE syscall_code[] = { 0x0f, 0x05, 0xc3 };
#else
BYTE syscall_code[] = { 0x0f, 0x34, 0xc3 };
#endif
#ifndef _WIN64
if (IsWoW64())
{
// if we are a WoW64 process, jump to WOW32Reserved
SyscallAddress = (PVOID)READ_MEMLOC(0xc0);
return SyscallAddress;
}
#endif
// Return early if the SyscallAddress is already defined
if (SyscallAddress)
{
// make sure the instructions have not been replaced
if (!strncmp((PVOID)syscall_code, SyscallAddress, sizeof(syscall_code)))
return SyscallAddress;
}
// set the fallback as the default
SyscallAddress = (PVOID)DoSysenter;
…
Finally, we’ll update our Makefile to compile for both 64 and 32-bit.
Makefile
BOFNAME := program
CC_x64 := x86_64-w64-mingw32-gcc
CC_x86 := i686-w64-mingw32-gcc
STRIP_x64 := x86_64-w64-mingw32-strip
STRIP_x86 := i686-w64-mingw32-strip
all:
$(CC_x64) -c program.c -o compiled/$(BOFNAME).x64.o -masm=intel -Wall -DBOF
$(STRIP_x64) --strip-unneeded compiled/$(BOFNAME).x64.o
$(CC_x86) -c program.c -o compiled/$(BOFNAME).x86.o -masm=intel -Wall -DBOF
$(STRIP_x86) --strip-unneeded compiled/$(BOFNAME).x86.o
$(CC_x64) program.c -o compiled/$(BOFNAME).x64.exe -masm=intel -Wall
$(STRIP_x64) --strip-all compiled/$(BOFNAME).x64.exe
$(CC_x86) program.c -o compiled/$(BOFNAME).x86.exe -masm=intel -Wall
$(STRIP_x86) --strip-all compiled/$(BOFNAME).x86.exe
clean:
rm compiled/$(BOFNAME).*.*
Conclusion
To summarize, this post explored several technical solutions to achieve the following objectives:
- Create executables as well as BOFs using the same codebase
- Use syscalls from ntdll.dll instead of using them directly from an unknown module
- Strip executables to make them smaller and harder to analyze
- Run on both 64-bit and 32-bit systems
- Have partial support for syscalls in WoW64
If you want to see an example of all this working together, check out nanodump.