Attack of Old Bugs – NetApp high CPU

Just an FYI: when you jump from ONTAP 7-Mode 7.3.x to 8.1.4, you can hit a recurrence of NetApp bug 568758, which had to do with block deletes killing performance and spiking CPU due to serialization of volume cleanup processes. (Even though NetApp ‘fixed’ this bug in 8.1.4 so it shouldn't happen fresh, snapshots laid out on the volumes in a certain way can still cause it to crop up. Perfect storm.)

http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=568758

I have run into something totally new which is really based on something old with that bug…

I just did a headswap from a FAS3140 to a FAS3220, and what I at first thought was high CPU due to dedupe runs and fingerprint updating ended up lasting through the morning. The customer called me in a panic before I headed out of town.

High CPU, as seen in System Manager’s performance view: sitting at 100% all morning, even though the real average is just 33% and the issue isn’t a major meltdown.

I never ever ever trust any CPU information, because most tools collect the wrong counter to be useful. OK, let’s look deeper.

Well, now my customer is all freaked to hell. CPU3 is pegged and the “ANY” or CPU Domain is 100%. Let’s look deeper at wtf is going on!
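To make it concrete why "ANY" can sit at 100% while the average stays low, here is a minimal Python sketch. It assumes the "ANY" figure means the share of time at least one CPU was busy, and the sample data and function names are made up for illustration; none of this is output from the filer.

```python
# Hypothetical busy flags for 4 CPUs over 6 sampling intervals
# (1 = busy, 0 = idle). CPU3 is pegged; the others are mostly idle.
samples = [
    (0, 0, 0, 1),
    (1, 0, 0, 1),
    (0, 0, 0, 1),
    (0, 1, 0, 1),
    (0, 0, 1, 1),
    (0, 0, 0, 1),
]


def per_cpu_busy(samples):
    """Percent of intervals each CPU was busy."""
    n = len(samples)
    cpus = len(samples[0])
    return [100.0 * sum(s[i] for s in samples) / n for i in range(cpus)]


def average_busy(samples):
    """Plain average across all CPUs."""
    per_cpu = per_cpu_busy(samples)
    return sum(per_cpu) / len(per_cpu)


def any_busy(samples):
    """Percent of intervals in which at least one CPU was busy (assumed meaning of ANY)."""
    return 100.0 * sum(1 for s in samples if any(s)) / len(samples)


if __name__ == "__main__":
    print("per CPU:", [round(p, 1) for p in per_cpu_busy(samples)])
    print("average:", round(average_busy(samples), 1))
    print("ANY    :", round(any_busy(samples), 1))
```

One pegged CPU is enough to drive the "ANY" number to 100% even when the box as a whole has plenty of headroom, which is why the single scary number needs a second look.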

Huh? No RAID or Kahuna, but Kahuna exempt [WAFL_Ex(Kahu)] CPU is pegged at 103%?

Let’s look deeper!

I looked to see if throttling settings were set right. This has caused major CPU issues on other systems in the past.

They look great.

How’s wafl scan status? Perfect. No scans.

How’s aggr status? Perfect, in RLW_Upgrading but no active scrubs.
How’s options raid.scrub.perf_impact ? low.

Well, crap. What could be wrong?

Screw this, let’s get crazy.

WAFL_DELAYED_FREE_WO ?? Awwww bawls. This is going to screw my day up. So much for sleep or food any time soon.

Let’s just nip this off right now.

Well, how’s about that crap. boooo ya!!!
