You know you’re having a bad day when…
You walk into the office and get told that you need to drop everything you were going to be working on because there’s a recurring problem on the production servers. For the last few days we’ve been experiencing a 100% CPU utilization on the w3wp.exe (the asp.net worker) process. Killing it off will fix the problem temporarily – but it will start up again sooner or later. Time to start production debugging!
- Install the Debugging Tools for Windows and thank the good Lord you’ve been reading Tess’s blog.
- Create a “hang” dump of the process.
> adplus -hang -p 1680
- Open the dump in windbg.exe
>windbg.exe -z Hang_Mode__Date_06-28-2007__Time_11-21-2020
- Load sos.dll – the debug helper for the CLR.
> .load C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll
- Checkout what’s happening in the CLR thread pool.
0:000> !threadpool
CPU utilization 100%
Worker Thread: Total: 3 Running: 2 Idle: 1 MaxLimit: 200 MinLimit: 2
Work Request in Queue: 0
————————————–
Number of Timers: 9
————————————–
Completion Port Thread:Total: 2 Free: 1 MaxFree: 4 CurrentLimit: 2 MaxLimit: 200 MinLimit: 2
- There are two CLR threads running. Wonder what they’re doing? Are they GC’ing or in user code? Let’s take a look at what’s happening with the OS threads.
0:000> !threads
ThreadCount: 8
UnstartedThread: 0
BackgroundThread: 8
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
PreEmptive GC Alloc Lock
ID OSID ThreadOBJ State GC Context Domain Count APT Exception
13 1 90c 000dc6e8 1808220 Enabled 02b64538:02b65458 000d9710 0 Ukn (Threadpool Worker)
17 2 fdc 000e2fc8 b220 Enabled 00000000:00000000 000d9710 0 MTA (Finalizer)
18 3 38c 000f7798 80a220 Enabled 00000000:00000000 000d9710 0 MTA (Threadpool Completion Port)
19 4 84c 000fa800 1220 Enabled 00000000:00000000 000d9710 0 Ukn
11 5 cd0 00116b30 880a220 Enabled 00000000:00000000 000d9710 0 MTA (Threadpool Completion Port)
21 6 e94 0ebd6bc8 180b220 Disabled 00000000:00000000 000fb1a0 1 MTA (Threadpool Worker)
22 8 b70 0ebde1a8 180b220 Disabled 00000000:00000000 000fb1a0 1 MTA (Threadpool Worker)
23 7 e14 0ec0a378 880b220 Enabled 00000000:00000000 000d9710 0 MTA (Threadpool Completion Port)
- There’s two threads running user code there – are they the ones that are pegging the CPU? Let’s see.
0:000> !runaway
User Mode Time
Thread Time
21:e94 0 days 0:19:50.296
22:b70 0 days 0:15:38.625
13:90c 0 days 0:00:07.062
15:934 0 days 0:00:00.156
16:a38 0 days 0:00:00.140
3:bec 0 days 0:00:00.062
5:538 0 days 0:00:00.046
0:2e0 0 days 0:00:00.046
28:6f0 0 days 0:00:00.000
27:a3c 0 days 0:00:00.000
26:530 0 days 0:00:00.000
25:358 0 days 0:00:00.000
24:b8c 0 days 0:00:00.000
23:e14 0 days 0:00:00.000
20:9e8 0 days 0:00:00.000
19:84c 0 days 0:00:00.000
18:38c 0 days 0:00:00.000
17:fdc 0 days 0:00:00.000
14:5d4 0 days 0:00:00.000
12:ab8 0 days 0:00:00.000
11:cd0 0 days 0:00:00.000
10:f1c 0 days 0:00:00.000
9:ac4 0 days 0:00:00.000
8:a40 0 days 0:00:00.000
7:a24 0 days 0:00:00.000
6:c84 0 days 0:00:00.000
4:f94 0 days 0:00:00.000
2:bb0 0 days 0:00:00.000
1:270 0 days 0:00:00.000
- Yep – the two CLR threads are chewing all the CPU. Let’s switch our thread context to the first one. (21).
0:006> ~21 s
eax=026bcd88 ebx=026bcd01 ecx=026bcd88 edx=02702a44 esi=026bcd88 edi=026bf064
eip=0ee0709e esp=0eeff1ec ebp=026bcd88 iopl=0 nv up ei ng nz na pe cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000287
Demo!Models.CMSPage.IsSubPage(Models.CMSPage)+0×56:
0ee0709e 0fb6db movzx ebx,bl
- Let’s take a look at what managed objects are reachable from the current stack.
0:021> !dso
OS Thread Id: 0xe94 (21)
ESP/REG Object Name
eax 026bcd88 Models.CMSPage
ecx 026bcd88 Models.CMSPage
esi 026bcd88 Models.CMSPage
edi 026bf064 Models.CMSPage
ebp 026bcd88 Models.CMSPage
0eeff1f8 026bcd88 Models.CMSPage
0eeff1fc 026bf064 Models.CMSPage
0eeff208 06529858 System.String <ul id=”menu-level{0}”>
0eeff238 026cd1cc System.Object[] (Models.CMSPage[])
0eeff240 790d6584 System.String
0eeff244 026b906c ASP.skins_demo_master_master
0eeff284 026b906c ASP.skins_demo_master_master
0eeff28c 026bdea4 System.Text.StringBuilder
0eeff29c 026cd1cc System.Object[] (Models.CMSPage[])
0eeff2a0 026bdea4 System.Text.StringBuilder
0eeff2a4 026cd1cc System.Object[] (Models.CMSPage[])
0eeff2a8 026bcd88 Models.CMSPage
0eeff2b8 026b906c ASP.skins_demo_master_master
0eeff2bc 026bdd2c System.Web.UI.HtmlControls.HtmlLink
0eeff2cc 026df978 System.Object
0eeff2d0 024af58c System.String class
0eeff2dc 0651f798 System.String screen.css
0eeff2f8 026b906c ASP.skins_demo_master_master
0eeff310 024a793c System.EventArgs
0eeff33c 026b906c ASP.skins_demo_master_master
0eeff350 026b4bdc ASP.default_aspx
0eeff374 790d6584 System.String
0eeff37c 026afcf4 System.Web.HttpContext
0eeff380 026b4bdc ASP.default_aspx
0eeff514 026b4bdc ASP.default_aspx
0eeff528 026b4bdc ASP.default_aspx
0eeff558 02464340 System.Globalization.CultureInfo
0eeff55c 02464110 System.Globalization.CultureInfo
0eeff560 025beff0 System.Threading.Thread
0eeff564 026b4bdc ASP.default_aspx
0eeff59c 026afcf4 System.Web.HttpContext
0eeff5a0 026b4bdc ASP.default_aspx
0eeff5bc 026afcf4 System.Web.HttpContext
0eeff5c0 025d1084 System.Web.HttpApplication+CallHandlerExecutionStep
0eeff5f8 025d1084 System.Web.HttpApplication+CallHandlerExecutionStep
0eeff5fc 025beeb0 ASP.global_asax
0eeff62c 025beeb0 ASP.global_asax
0eeff630 026affb4 System.Web.HttpApplication+ThreadContext
0eeff634 025e54a0 System.Web.AspNetSynchronizationContext
0eeff63c 025beeb0 ASP.global_asax
0eeff668 026aff98 System.Web.HttpAsyncResult
0eeff690 026afcf4 System.Web.HttpContext
0eeff694 026af6ac System.Web.Hosting.ISAPIWorkerRequestInProcForIIS6
0eeff698 02477fb0 System.Web.HttpRuntime
0eeff6e0 0247b8e4 System.Web.Caching.CacheSingle
0eeff700 0248e4a4 System.Web.Hosting.ISAPIRuntime
- Himmm.. we have the same instance (026bcd88) of that CMSPage object floating around in a bunch of places. I wonder if we have some bad data in the database. This class has a parent/child relationship with itself. Could it be causing an infinite loop?
0:021> !do 026bcd88
Name: Models.CMSPage
MethodTable: 0e941234
EEClass: 0244cc60
Size: 92(0x5c) bytes
(C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\Temporary ASP.NET Files\root\5bf6c3da\19d72350\assembly\dl3\36150507\00d93296_9ca7c701\Demo.DLL)
Fields:
MT Field Offset Type VT Attr Value Name
0e942118 4000140 40 …sionFactoryHolder 0 static 024c4e34 holder
791036b0 4000147 4 …ections.ArrayList 0 instance 026bcde4 __validators
79113dfc 4000148 8 …tions.IDictionary 0 instance 00000000 __failedProperties
79124228 4000149 c System.Object[] 0 instance 00000000 _errorMessages
790fed1c 40000f7 40 System.Int32 0 instance 102 _id
790fa3e0 40000f8 18 System.String 0 instance 026bc668 _name
790fa3e0 40000f9 1c System.String 0 instance 00000000 _url
790fa3e0 40000fa 20 System.String 0 instance 026bc6a0 _title
790fa3e0 40000fb 24 System.String 0 instance 026bc768 _content
790fa3e0 40000fc 28 System.String 0 instance 026bc6c8 _heading
790fa3e0 40000fd 2c System.String 0 instance 026bc6e4 _metaDescription
790fa3e0 40000fe 30 System.String 0 instance 026bc71c _metaKeyWords
0e941174 40000ff 44 System.Int32 0 instance 2 _status
791059c0 4000100 10 System.Double 0 instance 1.000000 _displayOrder
0e941234 4000101 34 Models.CMSPage 0 instance 026bcd88 _parent
7910e274 4000102 38 …Collections.IList 0 instance 026bcef0 _children
79105ef0 4000103 48 System.DateTime 1 instance 026bcdd0 _dateUpdatedAt
79105ef0 4000104 50 System.DateTime 1 instance 026bcdd8 _dateCreatedAt
0e94337c 4000105 3c Models.Site 0 instance 026b8e30 _Site
- 11. Yep – check that – the _parent field of instance 026bcd88 is pointing at itself (026bcd88). Log into the database and manually patch up the data. Next job – write some more defensive code around how that works.
Well – I hope you enjoyed that. Production debugging is always fun!
2 Comments