LoadLibrary madness: dynamically load WinHTTP.dll

For the last few weeks, I have been developing a fully custom Command and Control (C2) framework. This C2 uses several Windows DLLs for network communication, especially WINHTTP.DLL, which handles the HTTP requests used by the HTTP and HTTPS listeners.

As everyone knows, when developing a C2 and the corresponding agent, OPSEC must be the priority, so the agent code must raise as few events (ETW) as possible.

The most common way to increase OPSEC when using external DLLs is to load them dynamically, so that the DLL names do not appear in the agent's import table. This can be done using the LoadLibrary Win32 API.

This API allows a program to load a specific DLL from disk. However, the drawback is that LoadLibrary generates several events and telemetry that an EDR can analyze to detect the malicious C2 agent.

To avoid these events, I chose to implement a custom LoadLibrary that does not raise them.

 

State of the art – LoadLibrary

I will not go too deep into the implementation details, as they have already been documented several times in blog posts[1] and books (Windows Internals, Part 1).

The goal here is to create a function that takes as an input a DLL path and loads the DLL in memory. Doing it manually has a lot of advantages:

  • Limits ETW and Microsoft telemetry.
  • More choices in the way sections are allocated and written.
  • Possibility to hide a malicious loaded DLL when it is not in use.

However, many edge cases can make a custom loader unreliable, as mentioned in the SpecterOps blog post on perfect loader implementations[2]:

The quality of these reimplementations may be judged by comparing the feature set of these custom loaders against what the OS’s native loader supports. As such, the native OS loader may be considered a “perfect loader,” but it should not be considered the only perfect loader.

Basic implementation

The basic implementation consists of copying the DLL image into memory, applying relocations, loading the imported modules and resolving the IAT entries.
The different steps can be found in the Windows Internals, Part 1 book (page 178), and a more detailed implementation can be found here[3].
This is the most common way to load a DLL manually. Once the DLL is loaded as-is in memory, it can be used for basic purposes. However, any call to standard Win32 API functions such as GetModuleHandle or GetProcAddress against this DLL will fail.
This implementation does not provide any of the additional features of the Windows DLL loader: it just performs textbook DLL loading.
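The relocation step of such a textbook loader can be sketched in portable C. This is a minimal sketch. It assumes the image has already been copied into a buffer with each section at its RVA, and that the relocation directory RVA and size were read from the optional header. The function name apply_base_relocations is hypothetical. Only the two relocation types normally found in x86 and x64 images are handled, and a little-endian host is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Relocation types from the PE/COFF specification. */
#define IMAGE_REL_BASED_ABSOLUTE 0   /* padding entry, nothing to patch */
#define IMAGE_REL_BASED_HIGHLOW  3   /* 32-bit absolute address */
#define IMAGE_REL_BASED_DIR64   10   /* 64-bit absolute address */

/* Unaligned little-endian reads (assumes a little-endian host). */
static uint32_t rd32(const uint8_t *p) { uint32_t v; memcpy(&v, p, 4); return v; }
static uint16_t rd16(const uint8_t *p) { uint16_t v; memcpy(&v, p, 2); return v; }

/* Add `delta` (actual base - preferred ImageBase) to every absolute address
   listed in the base relocation directory. Returns the number of patched slots. */
static unsigned apply_base_relocations(uint8_t *image, size_t image_size,
                                       uint32_t reloc_rva, uint32_t reloc_size,
                                       uint64_t delta)
{
    unsigned patched = 0;
    if ((uint64_t)reloc_rva + reloc_size > image_size)
        return 0;                                   /* directory outside the image */

    for (uint32_t off = 0; off + 8 <= reloc_size; ) {
        const uint8_t *block = image + reloc_rva + off;
        uint32_t page_rva = rd32(block);            /* IMAGE_BASE_RELOCATION.VirtualAddress */
        uint32_t block_size = rd32(block + 4);      /* IMAGE_BASE_RELOCATION.SizeOfBlock */
        if (block_size < 8 || block_size > reloc_size - off)
            break;                                  /* malformed block: stop */

        for (uint32_t i = 0; i < (block_size - 8) / 2; i++) {
            uint16_t entry = rd16(block + 8 + 2 * i);
            unsigned type = entry >> 12;                            /* high 4 bits */
            uint64_t rva = (uint64_t)page_rva + (entry & 0x0FFF);   /* low 12 bits */

            if (type == IMAGE_REL_BASED_DIR64 && rva + 8 <= image_size) {
                uint64_t v;
                memcpy(&v, image + rva, 8);
                v += delta;
                memcpy(image + rva, &v, 8);
                patched++;
            } else if (type == IMAGE_REL_BASED_HIGHLOW && rva + 4 <= image_size) {
                uint32_t v;
                memcpy(&v, image + rva, 4);
                v += (uint32_t)delta;
                memcpy(image + rva, &v, 4);
                patched++;
            }
            /* IMAGE_REL_BASED_ABSOLUTE entries are padding; other types are ignored. */
        }
        off += block_size;
    }
    return patched;
}
```

The function is called once, before imports are resolved. A delta of 0 means the image landed at its preferred base and nothing needs patching.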

 

Fixing compatibility with Microsoft WIN32API

The previous implementation has the merit of working, and it has helped me out more times than I can count. However, it is not reliable, the most important edge case being that the DLL cannot be found using GetModuleHandle or LoadLibrary.

Therefore, if other DLLs need access to the loaded DLL, they will not find it with the standard Win32 API and will load it again using LoadLibrary, leading to a nice ETW event: exactly what we wanted to avoid in the first place.

Batsec[4] wrote an article[5] on how a DLL can be loaded in memory while remaining compatible with the Microsoft Win32 API (at least GetProcAddress, LoadLibrary and GetModuleHandle) and without raising a bunch of events.

His research shows that the standard Windows DLL loader does not just load the image into memory: it performs at least two additional actions:

  • Add the DLL to the PEB linked list that contains all the DLLs loaded by a process.
  • Create a hash identifying the DLL and add it to another structure called LdrpHashTable.

During the loading process, the DLL loader calls the LdrpInsertDataTableEntry function. This function computes a hash identifying the DLL and adds it to the LdrpHashTable structure, as shown in the following figure:

Figure 1: use of LdrpHashTable during DLL loading
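The sketch below makes the hash concrete by computing the bucket a base DLL name would fall into. It assumes, as public reimplementations of the loader do, that the name is hashed case-insensitively with the X65599 algorithm offered by RtlHashUnicodeString. It also assumes that the low 5 bits then select one of 32 LdrpHashTable buckets. The bucket count and the function names are assumptions, upper-casing is restricted to ASCII, and details may differ between Windows versions.

```c
#include <stddef.h>
#include <stdint.h>

#define LDR_HASH_TABLE_ENTRIES 32   /* assumption: number of LdrpHashTable buckets */

/* X65599 hash over UTF-16 code units, case-insensitive:
   hash = hash * 65599 + upcase(c), wrapping modulo 2^32 like a ULONG. */
static uint32_t x65599_hash_upcase(const uint16_t *name, size_t len)
{
    uint32_t hash = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t c = name[i];
        if (c >= 'a' && c <= 'z')            /* ASCII-only upper-casing in this sketch */
            c = (uint16_t)(c - 'a' + 'A');
        hash = hash * 65599u + c;
    }
    return hash;
}

/* Bucket index for a base DLL name such as "winhttp.dll". */
static uint32_t ldr_hash_bucket(const uint16_t *base_name, size_t len)
{
    return x65599_hash_upcase(base_name, len) & (LDR_HASH_TABLE_ENTRIES - 1);
}
```

Because the hash is case-insensitive, "winhttp.dll" and "WINHTTP.DLL" land in the same bucket. This is what lets a name lookup find the module regardless of how the caller spelled it.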

This mechanism was implemented by Microsoft to speed up DLL lookups. Alongside it, the loader also indexes modules by base address in a red-black tree (LdrpModuleBaseAddressIndex), which allows a DLL to be found in O(log(n)) instead of O(n) with the PEB linked list. This mechanism will not be explained here, but it can be seen in the DarkLoadLibrary project, in the FindModuleBaseAddressIndex function.

Adding the DLL to the PEB linked list AND to the LdrpHashTable can be seen as fully registering the DLL, which makes it known to the process.

Once this link has been established, the DLL can be looked up, freed or copied through the Win32 API.

 

Problems with this implementation

When I saw this implementation, I thought that all my problems were solved and started reimplementing it myself to understand and customize the process.

For a while it worked well. All the DLLs I loaded with it worked out of the box, and the agent raised no specific event related to loading an additional DLL.

The trouble began when I tried to dynamically load a specific DLL: WinHTTP.dll.

The DLL was successfully loaded and most of its functions worked well, but one function refused to work: WinHttpOpen.

This function initializes the environment and prepares the structures used by the other WinHTTP functions to perform an HTTP connection. So, without it, no HTTP communication was possible through the WinHTTP API.

When I called WinHttpOpen, the call failed with error code 126 (ERROR_MOD_NOT_FOUND). This error code means that a module could not be found, which did not make any sense, as all the DLLs had been successfully loaded.

 

Dive into WinHTTP.DLL madness

Macroscopic investigation

The error code hinted at a problem with a DLL that had not been loaded, so my first reflex was to monitor the process using Procmon, looking for an imported DLL that could have failed to load.

However, even when comparing the DLLs loaded with the standard LoadLibrary against the ones loaded through the custom loader, no difference could explain error code 126.

 

Microscopic investigation

For a while I set this problem aside and continued developing the agent, but it still bothered me, and I had no idea how to debug it. Until one day, I put my sanity aside and decided to decompile WINHTTP.DLL and debug it step by step until I saw error code 126 show up in one of the registers.

 

Finding the initial problem

Through step-by-step debugging, I quickly found that the problem occurred in the INTERNET_SESSION_HANDLE_OBJECT::SetProxySettings function of WINHTTP.DLL.

Following the call stack led me to the following functions:

  • INTERNET_HANDLE_BASE::SetProxySettingsWithInterfaceIndex
  • WxReferenceDll
  • TakeSingleDllRef

In TakeSingleDllRef, I found the following piece of code:

Figure 2: TakeSingleDllRef code

The error code 126 I got when calling WinHttpOpen is generated by the GetModuleHandleExA function.

This function is usually used to retrieve the base address of an already loaded DLL from its name. However, here, two unusual parameters are passed to this API:

  • dwFlags: 4 instead of 2
  • dllName: the address of the current function instead of the name of the DLL to search for.

The Microsoft documentation shows that the dwFlags value 4 is named GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS, which explains why an address is passed instead of a DLL name.

Indeed, when this flag is passed to GetModuleHandleExA, the function does not search for the DLL by its name but returns the DLL that contains the given address.

 

Narrow down the problem

The problem comes from the GetModuleHandleExA function. This is interesting because, during my tests, the custom loader worked fine with GetModuleHandle (which calls GetModuleHandleEx under the hood with dwFlags set to 2 instead of 4).

So, I decompiled KERNELBASE.DLL to find how the implementation differs when dwFlags 4 is passed to GetModuleHandleEx.

The call stack shows that GetModuleHandleEx calls the BasepGetModuleHandleExW function.

Figure 3: BasepGetModuleHandleExW code

The first part of the BasepGetModuleHandleExW function explains the difference in behavior between GetModuleHandle and GetModuleHandleEx with dwFlags set to 4.

When dwFlags is set to 4, the function uses RtlPcToFileHeader to find the base address of the module containing the address passed as a parameter.

Step-by-step debugging shows that this function returns the correct value for a DLL loaded with LoadLibrary but always returns 0 for a DLL loaded with the custom DLL loader.

 

Analysis of RtlPcToFileHeader

If I had to implement a function that, given a specific address, returns the base address of the image containing it, I would naturally use the VirtualQuery Win32 API (the AllocationBase field it returns points to the start of the allocation holding the address). So, I did not see why this function could fail.

RtlPcToFileHeader does indeed use VirtualQuery to get the base address:

Figure 4: use of VirtualQuery in RtlPcToFileHeader

However, before reaching this execution branch, it performs some additional tests:

Figure 5: Tests performed in RtlPcToFileHeader

If the execution flow enters the if(!v10) branch, the function returns 0; otherwise, it has a chance to reach the VirtualQuery call and return the correct base address.

When this function is used on a DLL loaded by the custom loader, it always takes the wrong code path and returns 0.

 

LdrpInvertedFunctionTable

The test performed by RtlPcToFileHeader relies on a lookup in the LdrpInvertedFunctionTable.

This table appears to be used to handle SEH exceptions. It can be parsed using the two following structures:

Figure 6: Structure used to parse the inverted table
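The sketch below reproduces these two structures together with the kind of range lookup RtlPcToFileHeader relies on. The field layout is the one used by public reimplementations for recent x64 builds and should be treated as an assumption, since it has changed across Windows versions. The fixed array size SKETCH_MAX_ENTRIES and lookup_image_base are hypothetical. The binary search is an illustration rather than the exact code in NTDLL. An image that was never registered falls through to NULL, which mirrors the 0 returned for the custom-loaded WINHTTP.DLL.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ENTRIES 16   /* hypothetical: the real array holds MaxCount entries */

/* Assumed layout (recent x64 builds, as used by public reimplementations). */
typedef struct {
    void    *ExceptionDirectory;      /* image's runtime function table (.pdata) */
    void    *ImageBase;
    uint32_t ImageSize;
    uint32_t ExceptionDirectorySize;
} RTL_INVERTED_FUNCTION_TABLE_ENTRY;

typedef struct {
    uint32_t Count;                   /* entries in use */
    uint32_t MaxCount;                /* capacity of Entries */
    uint32_t Epoch;
    uint8_t  Overflow;                /* set when an image could not be added */
    RTL_INVERTED_FUNCTION_TABLE_ENTRY Entries[SKETCH_MAX_ENTRIES];
} RTL_INVERTED_FUNCTION_TABLE;

/* Return the ImageBase of the registered image whose range
   [ImageBase, ImageBase + ImageSize) contains pc, or NULL if none does.
   Entries are sorted by ImageBase, so a binary search is sufficient. */
static void *lookup_image_base(const RTL_INVERTED_FUNCTION_TABLE *table, const void *pc)
{
    uintptr_t addr = (uintptr_t)pc;
    uint32_t lo = 0;
    uint32_t hi = table->Count < SKETCH_MAX_ENTRIES ? table->Count : SKETCH_MAX_ENTRIES;

    while (lo < hi) {                 /* a match, if any, lies in [lo, hi) */
        uint32_t mid = lo + (hi - lo) / 2;
        const RTL_INVERTED_FUNCTION_TABLE_ENTRY *e = &table->Entries[mid];
        uintptr_t base = (uintptr_t)e->ImageBase;

        if (addr < base)
            hi = mid;
        else if (addr - base >= e->ImageSize)
            lo = mid + 1;
        else
            return e->ImageBase;
    }
    return NULL;                      /* unregistered image: the "return 0" path */
}
```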

So, it seems that the custom loader fails to register the DLL for exception handling. Indeed, using WinDbg with the DLL loaded through LoadLibrary, it is possible to see that an entry corresponding to WINHTTP.DLL has been registered:

Figure 7: Analysis of the inverted table with WinDBG

The same test with the custom-loaded DLL shows that no new entry was added to the LdrpInvertedFunctionTable.

 

Solutions

The messy one

The root cause of the problem is that, when loading the DLL, no entry is added to the LdrpInvertedFunctionTable, which leads to a hard failure in the RtlPcToFileHeader function.

However, the problem only surfaces here because GetModuleHandleEx relies on RtlPcToFileHeader.

While adding a new entry to the LdrpInvertedFunctionTable can be a hard problem, hijacking the GetModuleHandleEx function when loading the DLL is an easy one.

Indeed, during the DLL loading process, we resolve the imported function addresses ourselves when filling the IAT, so it is possible to hijack the entry related to GetModuleHandleExA.

The following code can be used instead of GetModuleHandleExA:

Figure 8: custom GetModuleHandleExA code

This code iterates over the DLLs registered in the PEB linked list, checks whether the given address is located in each DLL, and returns the base address of the matching DLL.
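The idea can be sketched portably as follows. The list and entry types are hypothetical, simplified stand-ins for LIST_ENTRY and LDR_DATA_TABLE_ENTRY. They keep only the fields used here, with the links first as InLoadOrderLinks is in the real structure. The list head is passed explicitly instead of being read from PEB->Ldr. A real replacement would keep the GetModuleHandleExA signature and forward every call that does not use GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS to the genuine API. It would also need to handle reference counting. This sketch does none of that.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS 0x4   /* documented flag value */

/* Hypothetical stand-ins for LIST_ENTRY and LDR_DATA_TABLE_ENTRY. The links
   come first so a list pointer can be cast back to its entry. */
typedef struct list_entry {
    struct list_entry *Flink;
    struct list_entry *Blink;
} list_entry;

typedef struct {
    list_entry InLoadOrderLinks;
    uintptr_t  DllBase;
    uint32_t   SizeOfImage;
} ldr_entry;

/* Walk the circular in-load-order list and return the base of the module
   whose image range contains addr, or 0 when no module does. */
static uintptr_t find_module_by_address(const list_entry *head, uintptr_t addr)
{
    for (const list_entry *it = head->Flink; it != head; it = it->Flink) {
        const ldr_entry *mod = (const ldr_entry *)it;
        if (addr >= mod->DllBase && addr - mod->DllBase < mod->SizeOfImage)
            return mod->DllBase;
    }
    return 0;
}

/* Hypothetical replacement for the GetModuleHandleExA IAT slot. Only the
   FROM_ADDRESS case is handled; other calls report failure here, where a
   real hook would forward them to the genuine API. */
static bool hooked_get_module_handle_ex(uint32_t flags, const void *name_or_addr,
                                        const list_entry *peb_list, uintptr_t *module)
{
    if (!(flags & GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS) || module == NULL)
        return false;
    *module = find_module_by_address(peb_list, (uintptr_t)name_or_addr);
    return *module != 0;
}
```

Because the walk only needs DllBase and SizeOfImage, it works for any module present in the PEB list, whether it was loaded natively or by the custom loader.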

This solution worked for WINHTTP.DLL, but what about other use cases or other functions relying on RtlPcToFileHeader? We would have to hook each of them explicitly, which is not the best way to operate.

 

The elegant one

When two components have to work well together, we have to comply with the rules of the one we are integrating into. In this case, the custom loader should do what the native loader does: add the corresponding entry to the LdrpInvertedFunctionTable.

 

Locate the use of RtlInsertInvertedFunctionTable

The RtlInsertInvertedFunctionTable function can be used to add an entry to the LdrpInvertedFunctionTable. So, if the Windows DLL loader does this, it should be possible to find a call to this function in the LoadLibrary call stack.

Indeed, the call to RtlInsertInvertedFunctionTable can be found in the LdrpProcessMappedModule function:

Figure 9: use of RtlInsertInvertedFunctionTable during DLL loading

This function is called with a security cookie generated using the LdrInitSecurityCookie function:

Figure 10: Use of LdrInitSecurityCookie

While LdrInitSecurityCookie is an exported function, RtlInsertInvertedFunctionTable is not. So, if we want to use the latter, there are two choices:

  • Using pattern matching to find the function in NTDLL, knowing that the pattern can change between Windows builds (this technique has been implemented in MemoryModulePP[6])
  • Reimplementing the function

I’m not a fan of pattern matching because it is an unreliable technique that must be maintained for each Windows build.

 

Analysis of the RtlInsertInvertedFunctionTable function

Decompiling RtlInsertInvertedFunctionTable shows the following code:

Figure 11: RtlInsertInvertedFunctionTable function

Among the functions it calls, the only exported ones are RtlAcquireSRWLockExclusive and RtlReleaseSRWLockExclusive. However, the other ones are quite simple to implement:

  • RtlCaptureImageExceptionValues retrieves the address and size of the image exception directory (on x64, the .pdata runtime function table)
  • LdrProtectMrData performs a VirtualProtect on the .mrdata section
  • RtlpInsertInvertedFunctionTableEntry populates an RTL_INVERTED_FUNCTION_TABLE_ENTRY and adds the new element to LdrpInvertedFunctionTable, which is an RTL_INVERTED_FUNCTION_TABLE.
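For the first of these helpers, here is a portable sketch of what RtlCaptureImageExceptionValues has to return on x64. It parses the PE32+ headers of a mapped image and returns the address and size of its exception directory (data directory index 3). The offsets come from the PE/COFF specification. The function name capture_exception_directory is hypothetical, and only the x64 (PE32+) case is covered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_DIRECTORY_ENTRY_EXCEPTION 3
#define PE32PLUS_MAGIC 0x20B

static uint32_t rd32(const uint8_t *p) { uint32_t v; memcpy(&v, p, 4); return v; }
static uint16_t rd16(const uint8_t *p) { uint16_t v; memcpy(&v, p, 2); return v; }

/* Return the exception directory of the PE32+ image mapped at `image`.
   Returns false if the headers are malformed or the image has no such directory. */
static bool capture_exception_directory(const uint8_t *image, size_t image_size,
                                        const uint8_t **dir, uint32_t *dir_size)
{
    if (image_size < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return false;
    uint32_t nt = rd32(image + 0x3C);                 /* e_lfanew */
    uint64_t opt = (uint64_t)nt + 4 + 20;             /* skip "PE\0\0" + IMAGE_FILE_HEADER */
    if (opt + 112 + 8 * (IMAGE_DIRECTORY_ENTRY_EXCEPTION + 1) > image_size)
        return false;
    if (memcmp(image + nt, "PE\0\0", 4) != 0 || rd16(image + opt) != PE32PLUS_MAGIC)
        return false;
    if (rd32(image + opt + 108) <= IMAGE_DIRECTORY_ENTRY_EXCEPTION)   /* NumberOfRvaAndSizes */
        return false;

    /* DataDirectory starts at offset 112 of the PE32+ optional header. */
    const uint8_t *entry = image + opt + 112 + 8 * IMAGE_DIRECTORY_ENTRY_EXCEPTION;
    uint32_t rva = rd32(entry), size = rd32(entry + 4);
    if (rva == 0 || size == 0 || (uint64_t)rva + size > image_size)
        return false;
    *dir = image + rva;
    *dir_size = size;
    return true;
}
```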

The only remaining problem is that no exported function allows retrieving the LdrpInvertedFunctionTable object.

 

Locate the LdrpInvertedFunctionTable

So, against all my principles, some pattern matching needs to be coded in order to locate the LdrpInvertedFunctionTable structure. However, finding this structure is easier and more reliable than finding instruction sequences in the whole NTDLL .text section.

Indeed, some properties can be used to narrow down the search and avoid false positives:

  • The structure is located in the .mrdata section
  • The MaxCount field must be less than 512
  • The Count field must be greater than 0 and less than MaxCount

The LdrpInvertedFunctionTable is located in the NTDLL .mrdata section. Like .rdata, this section is configured with ReadOnly protection; however, its protection is often switched from ReadOnly to ReadWrite.

This section stores sensitive structures that the OS must modify under specific circumstances (hence the temporary ReadWrite protection) but that must be protected against programming errors that could write arbitrary data into them (hence the ReadOnly protection the rest of the time).

Then, some conditions on each entry can be checked to ensure that the tested address really is the LdrpInvertedFunctionTable and not a false positive. For each entry:

  • The exception directory address must be contained in the DLL image
  • The exception directory address must match the one computed from the DLL image headers
  • The exception directory size must match the one computed from the DLL image headers

These conditions do not guarantee that the solution is unique, but I doubt that random garbage in memory could satisfy all of them, especially the last three.

The following function can be used to locate the LdrpInvertedFunctionTable:

Figure 12: Code looking for LdrpInvertedFunctionTable
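A portable sketch of this search runs over any pointer-aligned byte range standing in for .mrdata. It checks the header bounds first. Then, for every entry, it checks that the exception directory lies inside the image and matches the data directory read from the image's own PE32+ headers. Both bounds are treated as inclusive (Count may equal MaxCount, MaxCount may equal 512), a slightly looser reading of the heuristics above so that a full table is not missed. The struct layouts carry the same assumptions as in the earlier sketch, and the function names are hypothetical. In a live process, a garbage candidate could hold an unreadable ImageBase, so a real scanner would also need to check readability before parsing headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PLAUSIBLE_COUNT 512            /* bound taken from the heuristics above */
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION 3

/* Assumed x64 layouts, as used by public reimplementations. */
typedef struct {
    void    *ExceptionDirectory;
    void    *ImageBase;
    uint32_t ImageSize;
    uint32_t ExceptionDirectorySize;
} RTL_INVERTED_FUNCTION_TABLE_ENTRY;

typedef struct {
    uint32_t Count;
    uint32_t MaxCount;
    uint32_t Epoch;
    uint8_t  Overflow;
    RTL_INVERTED_FUNCTION_TABLE_ENTRY Entries[1];   /* really MaxCount entries */
} RTL_INVERTED_FUNCTION_TABLE;

/* Read DataDirectory[EXCEPTION] (RVA and size) from the PE32+ headers at base. */
static bool exception_dir_from_headers(const uint8_t *base, uint32_t image_size,
                                       uint32_t *rva, uint32_t *size)
{
    uint32_t nt, n_dirs;
    uint16_t magic;
    if (image_size < 0x40 || base[0] != 'M' || base[1] != 'Z')
        return false;
    memcpy(&nt, base + 0x3C, 4);                        /* e_lfanew */
    uint64_t opt = (uint64_t)nt + 24;                   /* skip "PE\0\0" + IMAGE_FILE_HEADER */
    if (opt + 144 > image_size || memcmp(base + nt, "PE\0\0", 4) != 0)
        return false;
    memcpy(&magic, base + opt, 2);
    memcpy(&n_dirs, base + opt + 108, 4);               /* NumberOfRvaAndSizes */
    if (magic != 0x20B || n_dirs <= IMAGE_DIRECTORY_ENTRY_EXCEPTION)
        return false;
    memcpy(rva, base + opt + 112 + 8 * IMAGE_DIRECTORY_ENTRY_EXCEPTION, 4);
    memcpy(size, base + opt + 116 + 8 * IMAGE_DIRECTORY_ENTRY_EXCEPTION, 4);
    return true;
}

/* Per-entry checks: directory inside the image, and same address and size
   as the ones recomputed from the image headers. */
static bool entry_is_consistent(const RTL_INVERTED_FUNCTION_TABLE_ENTRY *e)
{
    uintptr_t base = (uintptr_t)e->ImageBase;
    uintptr_t dir = (uintptr_t)e->ExceptionDirectory;
    uint32_t rva, size;
    if (base == 0 || dir < base || dir - base >= e->ImageSize)
        return false;
    if (!exception_dir_from_headers((const uint8_t *)e->ImageBase, e->ImageSize, &rva, &size))
        return false;
    return dir - base == rva && e->ExceptionDirectorySize == size;
}

/* Return the first pointer-aligned offset that looks like the inverted table. */
static const uint8_t *find_inverted_table(const uint8_t *mrdata, size_t size)
{
    const size_t hdr = offsetof(RTL_INVERTED_FUNCTION_TABLE, Entries);
    const size_t esz = sizeof(RTL_INVERTED_FUNCTION_TABLE_ENTRY);

    for (size_t off = 0; off + hdr <= size; off += sizeof(void *)) {
        uint32_t count, max_count;
        memcpy(&count, mrdata + off, 4);
        memcpy(&max_count, mrdata + off + 4, 4);
        if (max_count == 0 || max_count > MAX_PLAUSIBLE_COUNT)
            continue;                                   /* implausible capacity */
        if (count == 0 || count > max_count)
            continue;                                   /* implausible fill level */
        if ((size - off - hdr) / esz < count)
            continue;                                   /* entries would overrun the range */

        bool ok = true;
        for (uint32_t i = 0; ok && i < count; i++) {
            RTL_INVERTED_FUNCTION_TABLE_ENTRY e;
            memcpy(&e, mrdata + off + hdr + i * esz, esz);
            ok = entry_is_consistent(&e);
        }
        if (ok)
            return mrdata + off;
    }
    return NULL;
}
```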

We now have everything we need to implement the RtlInsertInvertedFunctionTable.

 

Implement the RtlInsertInvertedFunctionTable

RtlInsertInvertedFunctionTable can be implemented as follows:

  • Locate the LdrpInvertedFunctionTable as explained before
  • Unprotect the .mrdata section from ReadOnly to ReadWrite using VirtualProtect
  • Locate the index where the new DLL entry must be stored (entries are sorted by image base address)
  • Write the RTL_INVERTED_FUNCTION_TABLE_ENTRY element into the LdrpInvertedFunctionTable
  • Restore the ReadOnly protection of the .mrdata section

Figure 13: RtlpInsertInvertedFunctionTableEntry implementation
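The insertion step itself can be sketched as follows, assuming the table has already been located and .mrdata made writable. The fixed capacity SKETCH_MAX_ENTRIES and the function name insert_inverted_entry are hypothetical; the real array holds MaxCount entries. The sketch also omits the SRW lock and any Epoch bookkeeping the native routine may perform. Entries stay sorted by ImageBase. A full table is only flagged through its Overflow field, which is what the field's name suggests rather than verified behavior.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_ENTRIES 8   /* hypothetical fixed capacity for the sketch */

/* Assumed x64 layouts, as in the previous sketches. */
typedef struct {
    void    *ExceptionDirectory;
    void    *ImageBase;
    uint32_t ImageSize;
    uint32_t ExceptionDirectorySize;
} RTL_INVERTED_FUNCTION_TABLE_ENTRY;

typedef struct {
    uint32_t Count;
    uint32_t MaxCount;
    uint32_t Epoch;
    uint8_t  Overflow;
    RTL_INVERTED_FUNCTION_TABLE_ENTRY Entries[SKETCH_MAX_ENTRIES];
} RTL_INVERTED_FUNCTION_TABLE;

/* Insert a new image while keeping Entries sorted by ImageBase.
   Returns false and sets Overflow when the table is full. */
static bool insert_inverted_entry(RTL_INVERTED_FUNCTION_TABLE *t,
                                  void *image_base, uint32_t image_size,
                                  void *exc_dir, uint32_t exc_size)
{
    uint32_t max = t->MaxCount < SKETCH_MAX_ENTRIES ? t->MaxCount : SKETCH_MAX_ENTRIES;
    if (t->Count >= max) {
        t->Overflow = 1;
        return false;
    }

    /* First slot whose ImageBase is not below the new image. */
    uint32_t idx = 0;
    while (idx < t->Count && (uintptr_t)t->Entries[idx].ImageBase < (uintptr_t)image_base)
        idx++;

    /* Shift the tail one slot to the right to make room. */
    memmove(&t->Entries[idx + 1], &t->Entries[idx],
            (t->Count - idx) * sizeof t->Entries[0]);
    t->Entries[idx] = (RTL_INVERTED_FUNCTION_TABLE_ENTRY){
        exc_dir, image_base, image_size, exc_size };
    t->Count++;
    return true;
}
```

Keeping the array sorted is what allows lookups such as the one in RtlPcToFileHeader to search by address instead of scanning every entry.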

This code can be added to the DarkLoadLibrary[7] project to get a fully functional DLL loader.

 

Conclusion

When developing a custom C2, the most difficult part is not getting something functional: that is mostly basic development. The most difficult and interesting part is getting something OPSEC-safe.

This part requires a deep understanding of Windows internals in order to know which IOCs will be raised, how they can be bypassed, and how the custom components can be adapted to integrate fully with the native Windows ecosystem.

This blog post shows not only how a specific part of the Windows DLL loader can be reimplemented, but also how IOCs can be hunted and how Windows internals can be reversed to adapt our work to the ecosystem.

 

[1] https://otterhacker.github.io/Malware/Reflective DLL injection.html

[2] https://posts.specterops.io/perfect-loader-implementations-7d785f4e1fa

[3] https://otterhacker.github.io/Malware/Reflective DLL injection.html

[4] https://twitter.com/_batsec_

[5] https://www.mdsec.co.uk/2021/06/bypassing-image-load-kernel-callbacks/

[6] https://github.com/strivexjun/MemoryModulePP/blob/master/MemoryModulePP.c

[7] https://github.com/bats3c/DarkLoadLibrary
