Friday, May 29, 2009

How to find the thread which triggered a GC

How to find the thread which triggered a GC

1. Take a memory dump of the application at the correct time ( The time its using high CPU)
2. Run ~* kb 2000 to get all native stacks
3. Search for mscorwks!SVR::GCHeap::GarbageCollectGeneration, its the thread that triggered the GC ()

In addition you can see if a thread is blocked as a result of GC call by searching GCHeap::WaitUntilGCComplete

OS Thread Id: 0xcf4 (37)
Current frame: ntdll!KiFastSystemCallRet
ChildEBP RetAddr Caller,Callee
1a2ee3e4 7c90e9c0 ntdll!ZwWaitForSingleObject+0xc
1a2ee3e8 7c8025db kernel32!WaitForSingleObjectEx+0xa8, calling ntdll!NtWaitForSingleObject
1a2ee43c 7c8399f3 kernel32!_except_handler3
1a2ee44c 79e77fd1 mscorwks!PEImage::LoadImage+0x199
1a2ee490 79e77f9a mscorwks!CLREvent::WaitEx+0x117, calling mscorwks!PEImage::LoadImage+0x16a
1a2ee4b0 79f8eb00 mscorwks!WaitForMultipleObjectsEx_SO_TOLERANT+0x165, calling ntdll!RtlSetLastWin32Error
1a2ee4b4 79f8eb08 mscorwks!WaitForMultipleObjectsEx_SO_TOLERANT+0x16d, calling mscorwks!_SEH_epilog4
1a2ee4e0 79e77f50 mscorwks!CLREvent::Wait+0x17, calling mscorwks!CLREvent::WaitEx
1a2ee4f0 79f762f7 mscorwks!WKS::GCHeap::WaitUntilGCComplete+0x32, calling mscorwks!CLREvent::Wait
1a2ee500 79f75d6a mscorwks!Thread::RareDisablePreemptiveGC+0x1a1

Error: Failed to load data access DLL, 0x80004005

Problem: Ever got the following error while executing a command in WinDbg?

Failed to load data access DLL, 0x80004005
Verify that 1) you have a recent build of the debugger (6.2.14 or newer)
2) the file mscordacwks.dll that matches your version of mscorwks.dll is
in the version directory
3) or, if you are debugging a dump file, verify that the file
mscordacwks___.dll is on your symbol path.
4) you are debugging on the same architecture as the dump file.
For example, an IA64 dump file must be debugged on an IA64
machine.

You can also run the debugger command .cordll to control the debugger's
load of mscordacwks.dll. .cordll -ve -u -l will do a verbose reload.
If that succeeds, the SOS command should work on retry.

If you are debugging a minidump, you need to make sure that your executable
path is pointing to mscorwks.dll as well.

Resolution:

Make sure you have correct version of mscordacwks.dll in your .Net Framework path (normally C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727). To know which version the WinDbg is looking for run the following command.
.symfix nosiy
The above command will enable the verbose mode of all module loads, including any erorr found by WinDbg.
Run .reload command to know which module is failing to load.
If the above command passes without any error, and you are still getting the same error while exeucting any command, check the output of WinDbg after you ran the command. It will be something like

CLRDLL: Unable to get version info for 'C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\mscordacwks.dll', Win32 error 0n87
CLRDLL: Unable to find mscordacwks_x86_x86_2.0.50727.42.dll by mscorwks search
CLRDLL: Unable to find 'mscordacwks_x86_x86_2.0.50727.42.dll' on the path
CLRDLL: Unable to find mscorwks.dll by search
CLRDLL: ERROR: Unable to load DLL mscordacwks_x86_x86_2.0.50727.42.dll, Win32 error 0n2

Now you need to find the correct version and rename the dll to the above bold convention mscordacwks_x86_x86_.dll . You can get this file from the system the dump was taken on.
Place this dll in your framework path and everything should work now.

Tuesday, May 19, 2009

Scripting WinDbg: .foreach command

The .foreach token parses the output of one or more debugger commands and uses each value in this output as the input to one or more additional commands.

Syntax
.foreach [Options] ( Variable { InCommands } ) { OutCommands }

Options
Can be any combination of the following options:
/pS InitialSkipNumber
Causes some initial tokens to be skipped. InitialSkipNumber specifies the number of output tokens that will not be passed to the specified OutCommands.
/ps SkipNumber
Causes tokens to be skipped repeatedly each time a command is processed. After each time a token is passed to the specified OutCommands, a number of tokens equal to the value of SkipNumber will be ignored.

Sample Usage
Consider a example where we need to dump all GCRoot references of System.Drawing.Icon objects. The MT for System.Drawing.Icon in our application is 7ae7613
To find all objects of MT 7ae76134, we will run the command !DumpHeap -mt 7ae76134
The output of the above command looks like

0:000> !DumpHeap -mt 7ae76134
Address MT Size
0989d6a4 7ae76134 40
09ab24b8 7ae76134 40
09b6a044 7ae76134 40
09b6cc70 7ae76134 40
09b6cdb4 7ae76134 40
09d444e4 7ae76134 40
0a0e480c 7ae76134 40
total 7 objects
Statistics:
MT Count TotalSize Class Name
7ae76134 7 280 System.Drawing.Icon
Total 7 objects

Now in normal scenario we have to run !GCRoot on each of the object address, that is seven times to get the desired result. .foreach comes to rescue here. We can run it for each object address for a specific command. As you have noted the above output have three columns with Address as the first which will be needed to passed on to the GCRoot command.

Executing !DumpHeap -mt 7ae76134 -short gives us the following, which only gives us the required object instance Address
0:000> !DumpHeap -mt 7ae76134 -short
0989d6a4
09ab24b8
09b6a044
09b6cc70
09b6cdb4
09d444e4
0a0e480c

We can now get the desired output (that is find all references to all System.Drawing.Icon objects) by .foreach using the following two variants

  1. .foreach ( ico {!DumpHeap -mt 7ae76134 -short}) {!GCRoot ico }
  2. .foreach /ps2 ( ico {!DumpHeap -mt 7ae76134 }) {!GCRoot ico }

Both of above commands prints the exact same output, but they differ in the way its parsed.

Example 1 uses the -short option that outputs only the address of the objects, which are directly passed on to the OutCommands
Example 2 is more versatile as it does not depend on the -short option. It uses the /ps option which causes the token to be skipped. In our case the first token (Which is Address) of the Output is read and passed to GCRoot, and next two subsequent tokens ( which are MT and Size are skipped for every line)

Monday, January 12, 2009

Detecting A Managed Memory Leak

What is a Memory Leak?
In Computer Science, “Memory Leak” is a particular type of unintentional memory consumption by a computer program which occurs when the program fails to release memory that is no longer needed. Usually, this condition arises due to a bug in the program which prevents it from freeing up memory that it no longer needs.

Does traditional memory leak exist in .Net?
.Net CLR does an excellent job of reclaiming the memory that it has assigned to a program on the request. So, in theory if a memory has been assigned by .Net CLR, it can reclaim this memory by cleaning up our live references. However, we can have a native leak in the following cases:
• Improper use of Interops
• Unsafe code
• GCHandles
• Failure to dispose the objects that internally use native resources

There are many more reasons, but those are not so common.

Memory Management by .Net CLR
Though .Net CLR manages the memory in an excellent way, it does not take into account any unintentional leak done by a programmer. The “unintentional leak” here means that the programmer fails to remove object references for an object instance that should not be alive anymore.

Code Snippet
Let us write a simple application where supposedly the programmer wanted to put some data in a hashtable data structure and remove the item from hashtable when the task for that item was over. Apparently, the programmer forgot to write the code to remove items from one of the code flow paths.


public partial class Form1 : Form
{
private Hashtable hshDataHolder = new Hashtable();
public Form1()
{
InitializeComponent();
}

private void btnLeak_Click(object sender, EventArgs e)
{
for (int i=0; i<50;i++)
{
byte[] _data = new byte[1024*1024];
hshDataHolder.Add(DateTime.Now.Ticks + i,_data);
Thread.Sleep(1);
}
}
}

In this program, things will work fine for a while, as our programmer friend wrote correct business logic to process the data stored in hashtable. However, over a period of time, the memory (specifically, the Virtual memory) consumption by the application shoots up as the data items that should have been removed start piling up.

Detecting a Managed Memory Leak
Code Review
One of the best practices in programming is to have a thorough code review. To an experienced eye, detecting a memory leak is easy if the code footprint is small.

WinDbg tool
However, in the case where lines of code run into tens of thousands, it is next to impossible to detect memory leak just by manual review process. This is especially the case if you are not the author of the program 

Using WinDbg, a code debugging tool, we are better equipped to find a memory leak.

WinDbg can be downloaded from Microsoft web site at: http://www.microsoft.com/whdc/devtools/debugging/installx86.Mspx.

The tool needs symbols files for the operating system, which can be downloaded at: http://www.microsoft.com/whdc/DevTools/Debugging/symbolpkg.mspx

Here are the steps to use WinDbg:
1. Run the application to be debugged for memory leak.

2. Make sure the application has reached a stage where it has consumed more memory than normally expected (hence, suffering from a ‘memory leak’).

3. Open WinDbg and attach the debugger to the application.

4. In the command box, type
.loadby sos mscorwks and press Enter. This loads the managed debugging extension SOS (Son Of Strike). This extension provides useful statistics about the managed code.

5. Run the command
!DumpHeap –stat
This command provides memory usage of each class type in the application. The output reads something like this:


--------------------------------------------------------------------------------------
MT Count Size Type
--------------------------------------------------------------------------------------
0b223718 18 720 System.Windows.Forms.Internal.IntNativeMethods+LOGFONT
7ae3c59c 17 748 System.Drawing.BufferedGraphics
79332f40 27 1512 System.Collections.Hashtable
7932b16c 68 1632 System.Collections.Stack
79331a6c 84 1680 System.RuntimeType
7933151c 17 1700 System.Char[]
79330508 143 1716 System.Object
79332a88 20 2032 System.Int32[]
79332178 150 2400 System.Int64
7b22394c 68 4896 System.Windows.Forms.Internal.DeviceContext
648c8c4c 95 5320 System.Configuration.FactoryRecord
7933303c 27 10200 System.Collections.Hashtable+bucket[]
793040bc 120 37832 System.Object[]
793308ec 832 69552 System.String
0015c5c8 675 181748 Free
7933335c 153 157299228 System.Byte[]
Total 3217 objects

6. We see that about 153 MB of memory is taken up by System.Byte[]! That’s a good starting point for our memory leak detection process. Next, let us inspect exactly at which point this memory is consumed by the application.

7. Run
!DumpHeap -type System.Byte[]
This command displays all the instances of System.Byte[] created by the application, along with their memory pointers and size of the objects.


--------------------------------------------------------------------------------------
Address MT Size
--------------------------------------------------------------------------------------
0b391110 7933335c 1048592
0b491130 7933335c 1048592
0b591150 7933335c 1048592
0b691170 7933335c 1048592
0b791190 7933335c 1048592
0b8911b0 7933335c 1048592
0ba91000 7933335c 1048592
0bb91020 7933335c 1048592
0bc91040 7933335c 1048592
0bd91060 7933335c 1048592
0be91080 7933335c 1048592
0bf910a0 7933335c 1048592
0c0910c0 7933335c 1048592
0c1910e0 7933335c 1048592
0c291100 7933335c 1048592
0c391120 7933335c 1048592
0c491140 7933335c 1048592
0c591160 7933335c 1048592
0c691180 7933335c 1048592
0c7911a0 7933335c 1048592
0c8911c0 7933335c 1048592
total 153 objects
Statistics:
MT Count TotalSize Class Name
7933335c 153 157299228 System.Byte[]
Total 153 objects

8. We observe that many byte[] of size about 1 MB are stored somewhere by the application. Let us check if they are live (referenced) instances, and if they are, where they are referenced.

9. Run
!GCRoot 0bc91040
Here 0bc91040 is the memory address of any one of the byte[] instance that we found in the previous step. Display shows an output similar to the following:


0:004> !GCRoot 0bc91040
Note: Roots found on stacks may be false positives. Run "!help gcroot" for
more info.
eax:Root:0128a4ec(WindowsApplication2.Form1)->
0128a638(System.Collections.Hashtable)->
012c1bd4(System.Collections.Hashtable+bucket[])
Scan Thread 0 OSTHread c6c
Scan Thread 2 OSTHread 8fc

10. Here we can see that the byte[] is referenced (live) and it is stored in a hashtable. (Think this could have been found without looking into the code?). The hashtable is created in Form1 class. This limits our scope further, as now we know that we have to look for a hashtable which is declared in Form1 class which stores a byte[].

11. Let us dump the object Form1 (that stores the hashtable in question) by the command
!do 0128a4ec
Here 0128a4ec is the address of Form1

Display shows output similar to the following:


--------------------------------------------------------------------------------------
MT Field Offset Type VT Attr Value Name
--------------------------------------------------------------------------------------
79332b38 4001e82 fe0 System.Int32 1 static 58 PropMainMenuStrip
79332b38 4001e83 fe4 System.Int32 1 static 59 PropMdiWindowListStrip
79332b38 4001e84 fe8 System.Int32 1 static 60 PropMdiControlStrip
79332b38 4001e85 fec System.Int32 1 static 61 PropSecurityTip
79332b38 4001e86 ff0 System.Int32 1 static 62 PropOpacity
79332b38 4001e87 ff4 System.Int32 1 static 63 PropTransparencyKey
79330508 4001e9a bc0 System.Object 0 static 0128aa04 EVENT_MAXIMIZEDBOUNDSCHANGED
00000000 4000004 13c 0 instance 00000000 components
7b21a924 4000005 140 ...dows.Forms.Button 0 instance 012a8f14 btnLeak
79332f40 4000006 144 ...ections.Hashtable 0 instance 0128a638 hshDataHolder

12. Studying the above output, we ascertain the name of the hashtable as hshDataHolder.

This definitely makes it easier to find the related code! Next, it is up to the programmer to refine and fine-tune the lines of code in order to get rid of the memory leak.