thinking in geek tagline’s are so web2.0

Posted
28 June 2007 @ 5pm

Tagged
.net, asp.net, debugging, iis

You know you’re having a bad day when…

You walk into the office and get told that you need to drop everything you were going to be working on because there’s a recurring problem on the production servers. For the last few days we’ve been experiencing a 100% CPU utilization on the w3wp.exe (the asp.net worker) process. Killing it off will fix the problem temporarily - but it will start up again sooner or later. Time to start production debugging!

  1. Install the Debugging Tools for Windows and thank the good Lord you’ve been reading Tess’s blog.
  2. Create a “hang” dump of the process.

    > adplus -hang -p 1680

  3. Open the dump in windbg.exe

    >windbg.exe -z Hang_Mode__Date_06-28-2007__Time_11-21-2020

  4. Load sos.dll - the debug helper for the CLR.

    > .load C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll

  5. Checkout what’s happening in the CLR thread pool.

    0:000> !threadpool
    CPU utilization 100%
    Worker Thread: Total: 3 Running: 2 Idle: 1 MaxLimit: 200 MinLimit: 2
    Work Request in Queue: 0
    ————————————–
    Number of Timers: 9
    ————————————–
    Completion Port Thread:Total: 2 Free: 1 MaxFree: 4 CurrentLimit: 2 MaxLimit: 200 MinLimit: 2

  6. There are two CLR threads running. Wonder what they’re doing? Are they GC’ing or in user code? Let’s take a look at what’s happening with the OS threads.

    0:000> !threads
    ThreadCount: 8
    UnstartedThread: 0
    BackgroundThread: 8
    PendingThread: 0
    DeadThread: 0
    Hosted Runtime: no
    PreEmptive GC Alloc Lock
    ID OSID ThreadOBJ State GC Context Domain Count APT Exception
    13 1 90c 000dc6e8 1808220 Enabled 02b64538:02b65458 000d9710 0 Ukn (Threadpool Worker)
    17 2 fdc 000e2fc8 b220 Enabled 00000000:00000000 000d9710 0 MTA (Finalizer)
    18 3 38c 000f7798 80a220 Enabled 00000000:00000000 000d9710 0 MTA (Threadpool Completion Port)
    19 4 84c 000fa800 1220 Enabled 00000000:00000000 000d9710 0 Ukn
    11 5 cd0 00116b30 880a220 Enabled 00000000:00000000 000d9710 0 MTA (Threadpool Completion Port)
    21 6 e94 0ebd6bc8 180b220 Disabled 00000000:00000000 000fb1a0 1 MTA (Threadpool Worker)
    22 8 b70 0ebde1a8 180b220 Disabled 00000000:00000000 000fb1a0 1 MTA (Threadpool Worker)

    23 7 e14 0ec0a378 880b220 Enabled 00000000:00000000 000d9710 0 MTA (Threadpool Completion Port)

  7. There’s two threads running user code there - are they the ones that are pegging the CPU? Let’s see.

    0:000> !runaway
    User Mode Time
    Thread Time
    21:e94 0 days 0:19:50.296
    22:b70 0 days 0:15:38.625

    13:90c 0 days 0:00:07.062
    15:934 0 days 0:00:00.156
    16:a38 0 days 0:00:00.140
    3:bec 0 days 0:00:00.062
    5:538 0 days 0:00:00.046
    0:2e0 0 days 0:00:00.046
    28:6f0 0 days 0:00:00.000
    27:a3c 0 days 0:00:00.000
    26:530 0 days 0:00:00.000
    25:358 0 days 0:00:00.000
    24:b8c 0 days 0:00:00.000
    23:e14 0 days 0:00:00.000
    20:9e8 0 days 0:00:00.000
    19:84c 0 days 0:00:00.000
    18:38c 0 days 0:00:00.000
    17:fdc 0 days 0:00:00.000
    14:5d4 0 days 0:00:00.000
    12:ab8 0 days 0:00:00.000
    11:cd0 0 days 0:00:00.000
    10:f1c 0 days 0:00:00.000
    9:ac4 0 days 0:00:00.000
    8:a40 0 days 0:00:00.000
    7:a24 0 days 0:00:00.000
    6:c84 0 days 0:00:00.000
    4:f94 0 days 0:00:00.000
    2:bb0 0 days 0:00:00.000
    1:270 0 days 0:00:00.000

  8. Yep - the two CLR threads are chewing all the CPU. Let’s switch our thread context to the first one. (21).

    0:006> ~21 s
    eax=026bcd88 ebx=026bcd01 ecx=026bcd88 edx=02702a44 esi=026bcd88 edi=026bf064
    eip=0ee0709e esp=0eeff1ec ebp=026bcd88 iopl=0 nv up ei ng nz na pe cy
    cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000287
    Demo!Models.CMSPage.IsSubPage(Models.CMSPage)+0×56:
    0ee0709e 0fb6db movzx ebx,bl

  9. Let’s take a look at what managed objects are reachable from the current stack.

    0:021> !dso
    OS Thread Id: 0xe94 (21)
    ESP/REG Object Name
    eax 026bcd88 Models.CMSPage
    ecx 026bcd88 Models.CMSPage
    esi 026bcd88 Models.CMSPage
    edi 026bf064 Models.CMSPage
    ebp 026bcd88 Models.CMSPage
    0eeff1f8 026bcd88 Models.CMSPage
    0eeff1fc 026bf064 Models.CMSPage
    0eeff208 06529858 System.String <ul id=”menu-level{0}”>
    0eeff238 026cd1cc System.Object[] (Models.CMSPage[])
    0eeff240 790d6584 System.String
    0eeff244 026b906c ASP.skins_demo_master_master
    0eeff284 026b906c ASP.skins_demo_master_master
    0eeff28c 026bdea4 System.Text.StringBuilder
    0eeff29c 026cd1cc System.Object[] (Models.CMSPage[])
    0eeff2a0 026bdea4 System.Text.StringBuilder
    0eeff2a4 026cd1cc System.Object[] (Models.CMSPage[])
    0eeff2a8 026bcd88 Models.CMSPage
    0eeff2b8 026b906c ASP.skins_demo_master_master
    0eeff2bc 026bdd2c System.Web.UI.HtmlControls.HtmlLink
    0eeff2cc 026df978 System.Object
    0eeff2d0 024af58c System.String class
    0eeff2dc 0651f798 System.String screen.css
    0eeff2f8 026b906c ASP.skins_demo_master_master
    0eeff310 024a793c System.EventArgs
    0eeff33c 026b906c ASP.skins_demo_master_master
    0eeff350 026b4bdc ASP.default_aspx
    0eeff374 790d6584 System.String
    0eeff37c 026afcf4 System.Web.HttpContext
    0eeff380 026b4bdc ASP.default_aspx
    0eeff514 026b4bdc ASP.default_aspx
    0eeff528 026b4bdc ASP.default_aspx
    0eeff558 02464340 System.Globalization.CultureInfo
    0eeff55c 02464110 System.Globalization.CultureInfo
    0eeff560 025beff0 System.Threading.Thread
    0eeff564 026b4bdc ASP.default_aspx
    0eeff59c 026afcf4 System.Web.HttpContext
    0eeff5a0 026b4bdc ASP.default_aspx
    0eeff5bc 026afcf4 System.Web.HttpContext
    0eeff5c0 025d1084 System.Web.HttpApplication+CallHandlerExecutionStep
    0eeff5f8 025d1084 System.Web.HttpApplication+CallHandlerExecutionStep
    0eeff5fc 025beeb0 ASP.global_asax
    0eeff62c 025beeb0 ASP.global_asax
    0eeff630 026affb4 System.Web.HttpApplication+ThreadContext
    0eeff634 025e54a0 System.Web.AspNetSynchronizationContext
    0eeff63c 025beeb0 ASP.global_asax
    0eeff668 026aff98 System.Web.HttpAsyncResult
    0eeff690 026afcf4 System.Web.HttpContext
    0eeff694 026af6ac System.Web.Hosting.ISAPIWorkerRequestInProcForIIS6
    0eeff698 02477fb0 System.Web.HttpRuntime
    0eeff6e0 0247b8e4 System.Web.Caching.CacheSingle
    0eeff700 0248e4a4 System.Web.Hosting.ISAPIRuntime

  10. Himmm.. we have the same instance (026bcd88) of that CMSPage object floating around in a bunch of places. I wonder if we have some bad data in the database. This class has a parent/child relationship with itself. Could it be causing an infinite loop?

    0:021> !do 026bcd88
    Name: Models.CMSPage
    MethodTable: 0e941234
    EEClass: 0244cc60
    Size: 92(0×5c) bytes
    (C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\Temporary ASP.NET Files\root\5bf6c3da\19d72350\assembly\dl3\36150507\00d93296_9ca7c701\Demo.DLL)
    Fields:
    MT Field Offset Type VT Attr Value Name
    0e942118 4000140 40 …sionFactoryHolder 0 static 024c4e34 holder
    791036b0 4000147 4 …ections.ArrayList 0 instance 026bcde4 __validators
    79113dfc 4000148 8 …tions.IDictionary 0 instance 00000000 __failedProperties
    79124228 4000149 c System.Object[] 0 instance 00000000 _errorMessages
    790fed1c 40000f7 40 System.Int32 0 instance 102 _id
    790fa3e0 40000f8 18 System.String 0 instance 026bc668 _name
    790fa3e0 40000f9 1c System.String 0 instance 00000000 _url
    790fa3e0 40000fa 20 System.String 0 instance 026bc6a0 _title
    790fa3e0 40000fb 24 System.String 0 instance 026bc768 _content
    790fa3e0 40000fc 28 System.String 0 instance 026bc6c8 _heading
    790fa3e0 40000fd 2c System.String 0 instance 026bc6e4 _metaDescription
    790fa3e0 40000fe 30 System.String 0 instance 026bc71c _metaKeyWords
    0e941174 40000ff 44 System.Int32 0 instance 2 _status
    791059c0 4000100 10 System.Double 0 instance 1.000000 _displayOrder
    0e941234 4000101 34 Models.CMSPage 0 instance 026bcd88 _parent
    7910e274 4000102 38 …Collections.IList 0 instance 026bcef0 _children
    79105ef0 4000103 48 System.DateTime 1 instance 026bcdd0 _dateUpdatedAt
    79105ef0 4000104 50 System.DateTime 1 instance 026bcdd8 _dateCreatedAt
    0e94337c 4000105 3c Models.Site 0 instance 026b8e30 _Site

  11. 11. Yep - check that - the _parent field of instance 026bcd88 is pointing at itself (026bcd88). Log into the database and manually patch up the data. Next job - write some more defensive code around how that works.

Well - I hope you enjoyed that. Production debugging is always fun!


2 Comments

Posted by
Darron
29 June 2007 @ 12pm

Hate that !


[...] has a post on hardcore debugging in production. I didn’t even know that was doable! Great stuff! Can’t wait to have a production [...]


Leave a Comment

Products vs Services BackgroundMotion code == Great