By Michael Schöbel
In this post we try to determine how much kernel memory is required when creating a new thread. This amount of memory is relevant for the upper bound of the number of possible threads in the system as investigated in detail by Mark Russinovich.
As a starting point, we looked at the system service call implementation of NtCreateThread and followed every possible code path down to the memory allocation functions.
The following picture shows the resulting flow chart. Vertical connections are call relations, e.g. NtCreateThread calls the function PspCreateThread. Horizontal connections are call sequences: PspCreateThread first calls ObCreateObject and afterwards ExCreateHandle, and so on. The graph is far from complete; only functions leading to memory allocations are shown.
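To make the reading of the chart concrete, the relations named above can be written down as a small ordered call tree. This is only a sketch in Python, not code from the Windows sources: the dictionary holds just the calls mentioned so far, and paths_to_leaves is a hypothetical helper introduced for illustration.

```python
# Ordered call tree: a key calls each function in its list (vertical
# connection); the list order is the call sequence (horizontal connection).
# Only the calls named in the text are included, so the tree is incomplete.
CALLS = {
    "NtCreateThread": ["PspCreateThread"],
    "PspCreateThread": ["ObCreateObject", "ExCreateHandle"],
}


def paths_to_leaves(root, calls=CALLS):
    """Return every call path from root to a function with no recorded callees."""
    callees = calls.get(root, [])
    if not callees:
        return [[root]]
    paths = []
    for callee in callees:  # preserve the left-to-right call sequence
        for tail in paths_to_leaves(callee, calls):
            paths.append([root] + tail)
    return paths


for path in paths_to_leaves("NtCreateThread"):
    print(" -> ".join(path))
```

Each printed line is one top-to-bottom path through the chart, and the order of the lines follows the horizontal call sequence.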
As you can see, memory is allocated in four different places while creating a new thread:
ObCreateObject: A new thread object is created, which will be managed by the object manager. The allocated memory contains the thread data structure ETHREAD and object meta-information.
ExCreateHandle: A thread handle has to be stored in the process handle table. If free entries are available in the table, an entry can be used directly for the new thread. If the handle table is full, ExpAllocateHandleTableEntrySlow allocates memory and extends the handle table of the process.
MmCreateTeb: Every (user-mode) thread gets a thread environment block (TEB), which contains, for example, information about thread-local storage memory (see sdk/inc/pebteb.h for further details). The function MiCreatePebOrTeb allocates a virtual address descriptor and reserves memory for the TEB data structures.
KeInitThread: A thread requires kernel stack space for activities in kernel mode. This stack is created as part of the call.
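The fast and slow paths described for ExCreateHandle can be illustrated with a toy model. The sketch below is not the kernel's data structure: ToyHandleTable, its chunk size of four entries, and the allocations counter are all assumptions made for illustration. It only shows that memory is allocated when no free entry is left, not on every handle creation.

```python
class ToyHandleTable:
    """Toy model of a per-process handle table that grows in fixed chunks."""

    def __init__(self, entries_per_chunk=4):
        self.entries_per_chunk = entries_per_chunk  # hypothetical chunk size
        self.entries = []      # slot index -> stored object, or None if unused
        self.free_slots = []   # indices of unused slots
        self.allocations = 0   # how often the table had to be extended

    def _grow(self):
        # Slow path: allocate one more chunk and put its slots on the free list.
        start = len(self.entries)
        self.entries.extend([None] * self.entries_per_chunk)
        self.free_slots.extend(range(start, start + self.entries_per_chunk))
        self.allocations += 1

    def create_handle(self, obj):
        if not self.free_slots:  # no free entry left (or table still empty)
            self._grow()
        slot = self.free_slots.pop(0)  # fast path: reuse an existing free entry
        self.entries[slot] = obj
        return slot


table = ToyHandleTable(entries_per_chunk=4)
handles = [table.create_handle(f"thread{i}") for i in range(5)]
print(handles, table.allocations)
```

With a chunk size of four, the first and the fifth handle trigger an extension, while the three in between reuse free entries. For the memory estimate this means the handle table contribution to a single thread creation is either zero or a whole chunk.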
The amount of memory used by a new thread depends on the platform. There are differences in 64-bit Windows compared to 32-bit Windows with regard to memory page size and data structures. Again, Mark Russinovich covers different aspects in his ‘Pushing the Limits of Windows’ blog post series.
The following table shows the results of our source code analysis:
In short, creating a thread uses around 20 KB of memory on a 32-bit system and around 40 KB on a 64-bit system.
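As a rough illustration of how these per-thread figures translate into an upper bound on the number of threads, the following sketch divides a memory budget by the per-thread cost. The 1 GiB budget is a hypothetical value chosen only to make the arithmetic visible, and max_threads is an illustrative helper. Treating the whole per-thread figure as if it came out of a single budget is a simplification, since the four allocations listed above do not all draw on the same kind of memory.

```python
KIB = 1024

# Approximate per-thread cost taken from the figures above.
PER_THREAD_BYTES = {"32-bit": 20 * KIB, "64-bit": 40 * KIB}


def max_threads(budget_bytes, platform):
    """Upper bound on the thread count if budget_bytes were spent on threads only."""
    return budget_bytes // PER_THREAD_BYTES[platform]


# Hypothetical budget of 1 GiB; not a measured or documented limit.
budget = 1024 * 1024 * KIB
for platform in PER_THREAD_BYTES:
    print(platform, max_threads(budget, platform))
```

Doubling the per-thread cost halves the bound for the same budget, which is why the difference between the 32-bit and 64-bit figures matters for the limit discussed at the beginning of this post.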
Disclaimer: There might be certain memory alignment/padding effects which are not considered in the presented calculation. Furthermore, we might just have missed memory allocations.