Revisiting the UDRL Part 3: Beacon User Data
Wednesday 04 September, 2024
The UDRL and the Sleepmask are key components of Cobalt Strike’s evasion strategy, yet historically they have not worked well together. For example, prior to CS 4.10, Beacon statically calculated its location in memory using a combination of its base address and its section table. This calculation was then modified depending on the contents of the user’s malleable C2 profile and passed to the Sleepmask irrespective of the current loader (e.g. default vs UDRL). Therefore, if the UDRL’s loading strategy did not match the malleable C2 settings, the default Sleepmask would either crash or leave parts of Beacon unmasked and susceptible to static signatures.
In CS 4.10, we sought to improve the interface between UDRLs and the Sleepmask and decouple it from the malleable C2 profile. As a result, we updated Beacon User Data (BUD) to include information about memory allocated by the loader. This means Beacon can pass accurate section information to the Sleepmask at runtime, which ensures that it is masked correctly and removes the need for static calculations/heuristics. It also makes it possible to track arbitrary memory allocations that can be used for things like BOFs/Sleepmasks/additional Postex capabilities.
The primary intention of this post is to demonstrate the UDRL’s role in runtime masking and show how Cobalt Strike’s two most important evasion tools interact. We will first demonstrate how to track Beacon with BUD. We will then load an External C2 DLL at the same time as Beacon and mask both DLLs at runtime with Sleepmask-VS. For simplicity, we will not cover masking the Sleepmask itself.
To accompany this post, we have added the extc2-loader example to UDRL-VS and ExternalC2Sleep() to Sleepmask-VS. It is therefore important to ensure that both tools are compiled and loaded into Cobalt Strike to utilize all the functionality described here.
Note: UDRL-VS has been tested on Visual Studio Community version 17.11.2 and Windows 10 SDK 10.0.22000.0. Please make sure to use the correct Windows 10 SDK as we have noticed some recent MSVC changes which can impact the project.
Beacon User Data
Beacon User Data (BUD) was originally introduced in CS 4.9 to create a mechanism to pass information between a UDRL and Beacon. Initially, it was intended to let users resolve their own syscall information to avoid using Beacon’s default methods of resolution. However, we see this feature becoming an essential part of UDRL development moving forward.
In CS 4.10, we updated BUD so that users could track the memory allocated by their UDRLs. This functionality was primarily introduced to:
Ensure the Sleepmask has accurate information about the memory it needs to mask.
Support the cleanup of the allocated memory.
To fulfill these requirements, BUD follows Microsoft’s abstractions around virtual memory and tracks both the initial allocation to facilitate cleanup and any sections within it to support masking. We refer to these as “regions” and “sections” and use the following ALLOCATED_MEMORY, ALLOCATED_MEMORY_REGION and ALLOCATED_MEMORY_SECTION structures to define them.
Note: The ALLOCATED_MEMORY structure encompasses six independent ALLOCATED_MEMORY_REGIONs. Each region can then be broken down into up to eight individual ALLOCATED_MEMORY_SECTIONs.
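As a rough illustration of the layout described in the note, the sketch below nests a fixed array of sections inside each region and a fixed array of regions inside the top-level structure. The Sketch* type names, field names and field types are assumptions made for this example and are not copied from beacon.h; only the counts of six and eight come from the note, read here as eight sections per region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_REGIONS  6  /* regions per top-level structure (from the note) */
#define SKETCH_MAX_SECTIONS 8  /* sections per region (our reading of the note) */

/* Hypothetical stand-in for ALLOCATED_MEMORY_SECTION. */
typedef struct {
    uint32_t label;         /* what the section holds; 0 means the slot is unused */
    void    *base_address;  /* start of the section inside its region */
    size_t   virtual_size;  /* number of bytes the Sleepmask would mask */
    int      mask_section;  /* non-zero if the section should be masked */
} SketchSection;

/* Hypothetical stand-in for ALLOCATED_MEMORY_REGION. */
typedef struct {
    uint32_t      purpose;          /* why the memory was allocated */
    void         *allocation_base;  /* base of the initial allocation, kept for cleanup */
    size_t        region_size;      /* size of the initial allocation */
    SketchSection sections[SKETCH_MAX_SECTIONS];
} SketchRegion;

/* Hypothetical stand-in for ALLOCATED_MEMORY. */
typedef struct {
    SketchRegion regions[SKETCH_MAX_REGIONS];
} SketchAllocatedMemory;

/* Bounds-checked access to one section slot; returns NULL when out of range. */
SketchSection *sketch_section_at(SketchAllocatedMemory *memory, size_t region, size_t section) {
    assert(memory != NULL);
    if (region >= SKETCH_MAX_REGIONS || section >= SKETCH_MAX_SECTIONS) {
        return NULL;
    }
    return &memory->regions[region].sections[section];
}
```

Keeping the arrays fixed-size means the whole structure can be copied as a single block, at the cost of a hard upper limit on what can be tracked.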
To simplify this approach to tracking memory, we have provided some helper functions in the UDRL-VS library. These functions abstract some of the details, but can easily be replaced with custom implementations as required.
TrackAllocatedMemoryRegion() – track an initial allocation of memory.
TrackAllocatedMemorySection() – track a section within an existing region.
TrackAllocatedMemoryBuffer() – a wrapper around TrackAllocatedMemoryRegion() and TrackAllocatedMemorySection().
In the following code example, we allocate a “region” of memory for the loaded Beacon (via VirtualAlloc()). We then initialize the relevant structures and use TrackAllocatedMemoryRegion() to save the information to BUD.
It is also important to track each PE section independently as this information is required by the Sleepmask. To simplify this process, we have added the following CopyPESectionsAndTrackMemory() function to the UDRL-VS library. It is a slightly modified version of the existing CopyPESections() function that uses TrackAllocatedMemorySection() to automatically save the section information.
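The region and section tracking described above can be sketched in portable C as follows. This is not the UDRL-VS implementation: calloc() stands in for VirtualAlloc(), RawSection is a hypothetical cut-down section header rather than IMAGE_SECTION_HEADER, and track_region() and copy_sections_and_track() are illustrative names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SECTIONS 8

/* Hypothetical, simplified section header (not IMAGE_SECTION_HEADER). */
typedef struct {
    uint32_t virtual_address;  /* offset of the section in the loaded image */
    uint32_t raw_offset;       /* offset of the section data in the raw file */
    uint32_t raw_size;         /* number of bytes to copy */
} RawSection;

typedef struct {
    void  *base;
    size_t size;
} TrackedSection;

typedef struct {
    uint32_t       purpose;
    void          *allocation_base;
    size_t         region_size;
    size_t         section_count;
    TrackedSection sections[MAX_SECTIONS];
} TrackedRegion;

/* Allocate the image buffer and record it as a region.
 * calloc() is a portable stand-in for VirtualAlloc(). Returns 1 on success. */
int track_region(TrackedRegion *region, uint32_t purpose, size_t image_size) {
    assert(region != NULL);
    memset(region, 0, sizeof(*region));
    region->allocation_base = calloc(1, image_size);
    if (region->allocation_base == NULL) {
        return 0;
    }
    region->purpose = purpose;
    region->region_size = image_size;
    return 1;
}

/* Copy each raw section to its virtual address and record where it landed. */
int copy_sections_and_track(TrackedRegion *region, const uint8_t *raw, size_t raw_len,
                            const RawSection *headers, size_t count) {
    assert(region != NULL && raw != NULL && headers != NULL);
    if (region->section_count + count > MAX_SECTIONS) {
        return 0;
    }
    for (size_t i = 0; i < count; i++) {
        const RawSection *h = &headers[i];
        /* Reject sections that fall outside either buffer. */
        if ((uint64_t)h->raw_offset + h->raw_size > raw_len) return 0;
        if ((uint64_t)h->virtual_address + h->raw_size > region->region_size) return 0;
        uint8_t *dst = (uint8_t *)region->allocation_base + h->virtual_address;
        memcpy(dst, raw + h->raw_offset, h->raw_size);
        region->sections[region->section_count].base = dst;
        region->sections[region->section_count].size = h->raw_size;
        region->section_count++;
    }
    return 1;
}
```

Recording each section at the moment it is copied means the tracked addresses always reflect where the loader actually placed the data, whatever layout the loader chose.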
To pass the completed USER_DATA structure to Beacon, we simply add an additional call to DllMain() to our loader with the ul_reason_for_call set to DLL_BEACON_USER_DATA.
Note: Beacon copies this information locally so that the BUD structures do not need to remain in memory.
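The hand-over can be modelled with a mock entry point, as in the minimal sketch below. SketchUserData is a hypothetical cut-down structure, mock_beacon_entry() only imitates the shape of DllMain(), and the numeric value given to SKETCH_DLL_BEACON_USER_DATA is a placeholder for the example; the real definitions live in beacon.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Placeholder reason code for this sketch; check beacon.h for the real value. */
#define SKETCH_DLL_BEACON_USER_DATA 0x0d

/* Hypothetical, cut-down USER_DATA: a version and a custom data blob. */
typedef struct {
    uint32_t version;
    uint8_t  custom[32];
} SketchUserData;

/* Beacon-side private copy, so the loader's structure does not need to persist. */
static SketchUserData g_beacon_copy;
static int g_has_user_data = 0;

/* Mock of Beacon's entry point with a DllMain-like shape. */
int mock_beacon_entry(void *instance, uint32_t reason, void *reserved) {
    (void)instance;
    if (reason == SKETCH_DLL_BEACON_USER_DATA && reserved != NULL) {
        memcpy(&g_beacon_copy, reserved, sizeof(g_beacon_copy));
        g_has_user_data = 1;
    }
    return 1;
}

/* Loader side: pass the user data through the extra entry point call. */
void loader_pass_user_data(const SketchUserData *bud) {
    assert(bud != NULL);
    SketchUserData local = *bud;  /* loader-owned, temporary */
    mock_beacon_entry(NULL, SKETCH_DLL_BEACON_USER_DATA, &local);
    /* Wiping the loader's copy is safe because the callee kept its own. */
    memset(&local, 0, sizeof(local));
}
```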
And that’s it! Once Beacon is up and running, it will operate in the same fashion as before. However, when it is time to use the Sleepmask, it will have a much more accurate picture of the loaded Beacon image. The Sleepmask will then take the information in BUD and use it to apply runtime masking.
This approach allows users to create more generic masking capabilities that can automatically handle different memory layouts. For example, the obfuscation-loader uses a custom PE header which means the original is not present in the loaded image. Previously, this missing PE header would have required changing the Sleepmask code to avoid a crash. However, BUD makes it possible to record this information at load time.
Case Study: BUD vs External C2
Now that we have covered the basics, we can demonstrate how to use BUD to track an additional memory allocation. In the following sections, we will load an External C2 DLL at the same time as Beacon and mask them both at runtime with Sleepmask-VS.
Raphael Mudge originally introduced External C2 in November 2016 to allow operators to create custom command and control channels. Whilst this feature was never announced as part of a release, there are several public projects that are built on top of External C2, most notably C3, which provides a complete framework for creating custom C2 channels.
At a high-level, External C2 is a specification that allows third-party programs to act as a communication layer for Cobalt Strike’s Beacon payload. In practice, this means using an SMB Beacon to communicate with a third-party client over a named pipe. The third-party client then communicates with a third-party controller, which interacts with Cobalt Strike’s External C2 server. This specification makes it possible to tunnel Beacon traffic over any service that allows you to read/write data.
As part of the original implementation, External C2 required the third-party controller to request a stage from the External C2 server before it could begin sending/receiving data. In addition, this stage was provided by the team server which meant that whilst the transformations in the malleable C2 profile were applied, it was not possible to use Aggressor Script to apply UDRLs/Sleepmasks/custom obfuscation and masking.
In CS 4.10, we added a “pass thru” mode to External C2 that allows the third-party controller to begin sending/receiving data immediately without requesting a stage. As a result, it is now possible to export an SMB Beacon from the CS client and use it in combination with a third-party client/controller to connect to the team server. This provides a higher degree of flexibility as it makes it possible to create a single payload file that contains both Beacon and an External C2 channel. In addition, it makes it possible to use Aggressor Script to transform the exported payload.
Introducing extc2-loader
We have added an extc2-loader example to UDRL-VS. In the extc2-loader folder there are two projects: the first is extc2-dll which ports Raphael’s original External C2 example to a DLL and the second is the extc2-loader.
The extc2-loader is a simple reflective loader that abstracts most of its functionality into a separate function (ExternalC2LoaderLoadDll()) so it can be called multiple times to load each DLL. It is a Double Pulsar/sRDI style reflective loader which means that it is prepended to a single payload file. To ensure that the loader can easily identify the two DLLs, the extc2-loader’s prepend-udrl.cna creates a payload consisting of the loader, the size of Beacon, Beacon and the External C2 DLL.
This approach makes it possible for the extc2-loader to determine the base address of Beacon and use its length to find the base address of the raw External C2 DLL as well.
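To make the arithmetic concrete, here is a minimal sketch of walking that layout. It assumes the size of Beacon is stored as a 4-byte little-endian integer immediately after the loader, which the post does not specify; PayloadParts and locate_dlls() are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *beacon;       /* raw Beacon DLL */
    size_t         beacon_size;
    const uint8_t *extc2;        /* raw External C2 DLL */
    size_t         extc2_size;
} PayloadParts;

/* `blob` points just past the loader code and `blob_len` is the remaining length.
 * Assumes a 4-byte little-endian size field (an assumption of this sketch).
 * Returns 1 on success, 0 if the blob is too short or the size is inconsistent. */
int locate_dlls(const uint8_t *blob, size_t blob_len, PayloadParts *out) {
    assert(blob != NULL && out != NULL);
    if (blob_len < 4) {
        return 0;
    }
    uint32_t beacon_size = (uint32_t)blob[0] | ((uint32_t)blob[1] << 8) |
                           ((uint32_t)blob[2] << 16) | ((uint32_t)blob[3] << 24);
    if (beacon_size > blob_len - 4) {
        return 0;  /* size field points past the end of the payload */
    }
    out->beacon = blob + 4;
    out->beacon_size = beacon_size;
    out->extc2 = out->beacon + beacon_size;  /* the second DLL follows Beacon */
    out->extc2_size = blob_len - 4 - beacon_size;
    return 1;
}
```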
Once it has located the base address of each DLL, it can then load them independently via consecutive calls to ExternalC2LoaderLoadDll(). As part of this process, it also tracks the memory and passes the information to Beacon via BUD.
Note: To easily differentiate between these two regions of memory, we set the purpose field of Beacon’s region to PURPOSE_BEACON_MEMORY and the purpose of the External C2 DLL to an arbitrary value of 2000 to demonstrate using a custom ALLOCATED_MEMORY_PURPOSE value. This makes it possible to easily identify the region of memory in the Sleepmask.
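Looking a region up by its purpose could be sketched as a simple scan over the tracked regions. The structures and enum below are simplified stand-ins defined for this example; apart from the custom value of 2000 mentioned in the note, the numeric values are placeholders rather than the real ALLOCATED_MEMORY_PURPOSE values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 6

/* Placeholder values for this sketch; only 2000 comes from the post. */
enum {
    SKETCH_PURPOSE_EMPTY         = 0,
    SKETCH_PURPOSE_BEACON_MEMORY = 1,
    SKETCH_PURPOSE_EXTC2_DLL     = 2000
};

typedef struct {
    uint32_t purpose;
    void    *allocation_base;
    size_t   region_size;
} SketchRegion;

typedef struct {
    SketchRegion regions[MAX_REGIONS];
} SketchAllocatedMemory;

/* Return the first region tagged with `purpose`, or NULL if none is tracked. */
SketchRegion *find_region_by_purpose(SketchAllocatedMemory *memory, uint32_t purpose) {
    assert(memory != NULL);
    for (size_t i = 0; i < MAX_REGIONS; i++) {
        if (memory->regions[i].purpose == purpose) {
            return &memory->regions[i];
        }
    }
    return NULL;
}
```

Because the lookup is keyed on the purpose rather than on a slot index, the masking code does not need to know the order in which the loader tracked its allocations.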
To launch the capability, we call the External C2 DLL’s entry point to initialize the CRT and ensure that its startup routines have finished. We then resolve its exported go() function and pass it to CreateThread() along with a pointer to BUD’s custom data field. We then call Beacon’s entry point to do the same initialization, pass it a pointer to the USER_DATA structure and start Beacon.
To reliably apply runtime masking, we had to find a way to synchronize the threads to ensure that they “Sleep” at the same time. It is safe to mask Beacon when execution reaches the Sleepmask as the thread is no longer executing the Beacon code. However, this is not true for the External C2 DLL which is either waiting for the External C2 server to send data or waiting for Beacon to send it.
To overcome this, we modified Raphael’s original External C2 example to use non-blocking calls when reading data from the pipe/socket. This “non-blocking” approach means the External C2 DLL can check if data is available instead of waiting for something to arrive. For example, the following ReadFrameFromBeaconPipe() function uses PeekNamedPipe() to check for data.
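The peek-then-read idea can be illustrated without any Windows APIs. In the sketch below, MockPipe is an in-memory stand-in for the named pipe and mock_peek() plays the role of PeekNamedPipe(); both are hypothetical, as is the assumption that each frame is prefixed with a 4-byte little-endian length. This is not the ReadFrameFromBeaconPipe() implementation, only the shape of a read that never blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* In-memory stand-in for the named pipe so the sketch stays portable. */
typedef struct {
    const uint8_t *data;
    size_t         len;  /* bytes that have arrived so far */
    size_t         pos;  /* bytes already consumed */
} MockPipe;

/* Plays the role of PeekNamedPipe(): report what is available, consume nothing. */
static size_t mock_peek(const MockPipe *pipe) {
    return pipe->len - pipe->pos;
}

/* Returns 1 if a complete frame was read (length stored in *out_len),
 * 0 if a full frame is not available yet, and -1 if it would not fit in `out`.
 * The caller is never made to wait. */
int try_read_frame(MockPipe *pipe, uint8_t *out, size_t out_cap, size_t *out_len) {
    assert(pipe != NULL && out != NULL && out_len != NULL);
    if (mock_peek(pipe) < 4) {
        return 0;  /* the length prefix has not fully arrived */
    }
    const uint8_t *p = pipe->data + pipe->pos;
    uint32_t frame_len = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    if (frame_len > out_cap) {
        return -1;
    }
    if (mock_peek(pipe) - 4 < frame_len) {
        return 0;  /* the frame body is still in flight */
    }
    memcpy(out, p + 4, frame_len);
    pipe->pos += 4 + (size_t)frame_len;
    *out_len = frame_len;
    return 1;
}
```

Returning immediately when no frame is ready gives the calling loop a chance to check for other signals, such as a request to pause, between polls.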
The extc2-loader also creates three anonymous event objects. A handle to each event is then saved to BUD’s custom data field and passed to the External C2 DLL’s go() function when the thread is created. This makes it possible to retrieve the same information from within the Sleepmask via BeaconGetCustomUserData().
This approach puts Sleepmask-VS in the driving seat. It can mask Beacon and then use the event objects created by the loader to synchronize the threads. In the below example, Sleepmask-VS:
Sets ExtC2StopEvent to instruct the External C2 DLL to wait.
Waits for the External C2 DLL to signal that it has entered a waiting state (ExtC2SleepEvent).
Masks the External C2 DLL’s PE sections.
Sleeps for three seconds.
Unmasks the External C2 DLL’s PE sections.
Signals ExtC2ContinueEvent to let the External C2 DLL continue execution.
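The ordering of the steps above can be modelled in a few lines of portable C. This is deliberately a single-threaded model: plain integers stand in for the Windows event objects, worker_step() stands in for one iteration of the External C2 DLL's loop, the actual sleep is omitted, and a one-byte XOR stands in for the masking routine. All of these are assumptions made for illustration, and real code would use event objects and wait functions across two threads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Integers stand in for the three event objects (model only). */
typedef struct {
    int stop_event;      /* Sleepmask to DLL: please wait */
    int sleep_event;     /* DLL to Sleepmask: I am now waiting */
    int continue_event;  /* Sleepmask to DLL: carry on */
} SketchEvents;

typedef struct {
    int paused;            /* the DLL is parked and not touching its image */
    int ran_while_masked;  /* set if the DLL ever ran against masked memory */
} SketchWorker;

static void xor_mask(uint8_t *buf, size_t len, uint8_t key) {
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= key;
    }
}

/* One iteration of the External C2 DLL's loop in this model. */
void worker_step(SketchWorker *w, SketchEvents *ev, int image_masked) {
    if (w->paused) {
        if (ev->continue_event) { ev->continue_event = 0; w->paused = 0; }
        return;
    }
    if (ev->stop_event) {
        ev->stop_event = 0;
        w->paused = 1;
        ev->sleep_event = 1;  /* acknowledge before parking */
        return;
    }
    if (image_masked) {
        w->ran_while_masked = 1;  /* in practice this would likely crash */
    }
}

/* The Sleepmask-side sequence from the list above. */
void sleepmask_cycle(SketchWorker *w, SketchEvents *ev, uint8_t *image, size_t len, uint8_t key) {
    assert(w != NULL && ev != NULL && image != NULL);
    ev->stop_event = 1;                               /* 1. ask the DLL to wait */
    while (!ev->sleep_event) worker_step(w, ev, 0);   /* 2. wait for the acknowledgement */
    ev->sleep_event = 0;
    xor_mask(image, len, key);                        /* 3. mask */
    worker_step(w, ev, 1);                            /* 4. sleep: the DLL stays parked */
    xor_mask(image, len, key);                        /* 5. unmask */
    ev->continue_event = 1;                           /* 6. release the DLL */
    worker_step(w, ev, 0);
}
```

The point of the acknowledgement in step 2 is that the masking side never assumes the other thread has stopped; it only masks once the other side has said so.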
ExternalC2Sleep() is called from within the Sleepmask’s PivotSleep() function shown below. This makes it possible to keep Beacon masked and continuously mask/unmask the External C2 DLL whilst it waits to receive data.
             * A small Sleep between checking the pipe for data
             * for default pivot Sleep and also gives the External C2
             * client time to process any requests after waking up.
             */
            Sleep(500);
        }
    }
    return;
}
After executing our payload, we can see in ProcessHacker that whilst our memory is RWX (it is an example), it has all been sufficiently masked to avoid simple static signatures.
Note: You may also be wondering why the start of the hex dump of the two DLLs looks the same. This is because we used the same key to mask both DLLs (sleepmaskinfo->maskKey) and we’re seeing the masked DOS/PE header. The key passed to the Sleepmask is randomly generated which will likely be sufficient. However, it would also be trivial to use different keys.
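The effect described in the note is easy to reproduce with a repeating-key XOR, which is used here as an assumed masking routine for illustration. Two buffers that begin with the same header bytes produce the same masked prefix under the same key, and applying the mask a second time restores the original data. xor_mask() is a hypothetical helper, not the Sleepmask-VS function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR `buf` in place with a repeating key. Applying it twice restores the data.
 * XOR is an assumption of this sketch, not a statement about Sleepmask-VS. */
void xor_mask(uint8_t *buf, size_t len, const uint8_t *key, size_t key_len) {
    assert(buf != NULL && key != NULL && key_len > 0);
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= key[i % key_len];
    }
}
```

Masking each region with its own key would remove the shared prefix, at the cost of storing one key per region.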
Conclusion
That brings us to the end of this post. We hope that it has demonstrated the power of the UDRL and the Sleepmask and their central role in Cobalt Strike’s evasion strategy. We also hope it has shown why users should start to think of the UDRL and the Sleepmask together, and the ways in which they can interoperate to create more advanced capabilities.
The code shown here is now available in the UDRL-VS library in the Arsenal Kit. To try it out, simply open the solution and compile the Release build of both the extc2-loader and extc2-dll. You can then load the ./bin/extc2-loader/prepend-udrl.cna script into the Cobalt Strike console and export an artefact.