Sunday, February 12, 2012

HP SAN vs "Dividing by Zero". 0-1. SAN hardware crashes.

Hello All,

Never, ever, ever, ever divide by zero; otherwise, very bad things can happen. A client's less-than-one-year-old HP SAN environment crashed last week thanks to a SAN firmware bug (dividing by zero) that caused a kernel panic. And of course this only happened when uptime hit 208.5 days. Hence, HP calls it the "208 Day" bug. I call it poor software development.

Absolutely ridiculous. I have never really liked the HP SAN hardware, which was brought over from LeftHand Networks. For mission-critical environments, I'm a believer in proper SAN hardware such as the Dell EqualLogic line. REEF has deployed both HP and Dell SAN hardware, and without a doubt, the Dell SAN hardware is better. Even the HP software has problems with Hyper-V and running under Windows Core. Disappointing. The Dell SAN hardware is better built and cheaper, and this is why REEF Solutions is a Dell Premier Partner.


HP bug which causes reboots after 208.5 days


-Ben

P.S. The IT Director of the client who experienced the problem at least has a good sense of humor. This "dividing by zero" programming mistake is clearly a common issue. Enjoy the image below.

Preparing for D Day for Me...

Hello All,

I've been doing a lot of housekeeping lately before "D" Day, "D" Day being delivery day. My wife is due with our 3rd child. While my wife is nesting, I'm doing the equivalent for an IT person. We had a false alarm when we thought it was happening, so now I feel like I'm living on borrowed time and have all this "extra" time. In the last week I've done the following:
  • Updated our REEF NY & TX SonicWall firewalls to the latest code (VPN tunnel speed to my TX off-site environment doubled).
  • Rolled out a SonicWall-based network bandwidth and auditing solution for REEF's networks (we currently monitor them using another solution).
  • Re-installed the operating system on the NY on-site replication server (for the REEF environment, the on-site server is 2008 R2 based; the replication data was not touched, since it is iSCSI based).
  • Upgraded the replication software on the NY on-site server for the REEF environment (to improve performance; there is a noticeable positive difference between AppAssure Replay 4.6.1.31257 and 4.7.2.40512). I found a bug in the replication UI, alerted AppAssure about it, and received a support response in 5 minutes. Impressive. I wish all AppAssure support techs responded so quickly. Enjoy the image below.
  • Upgraded replication on the TX off-site servers (same AppAssure Replay version upgrade).
  • Rolled out my digital photo album solution based on a BlackBerry PlayBook. I considered an iPad, but the security, performance, and low cost of the 64GB PlayBook ($300) made it the better solution.
  • Re-installed the operating system on the NY on-site server (for the clients' environment, the server environment is Windows 2003 x86 based; currently using a stable release of Ahsay and planning to upgrade to the latest stable version shortly).
  • Started work on deploying a new SonicWall-based wireless solution so guests at home will be on a separate VLAN, in preparation for all those home visitors.
Notice the replication speed showing “10.22MBit/sec”. It should be “Mb”, not “MB”. A capital “B” means bytes, while a lowercase “b” means bits. This is on the latest version, 4.7.2.40512. Dev has been alerted per support's response.
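To show why the capitalization matters, here is a minimal sketch of the conversion, using the 10.22 figure from the image. The function name is just something I made up for illustration; it has nothing to do with AppAssure's code.

```python
def mbit_to_mbyte(mbit_per_sec: float) -> float:
    """Convert megabits per second (Mb/s) to megabytes per second (MB/s)."""
    # There are 8 bits in a byte, so the byte rate is one eighth of the bit rate.
    return mbit_per_sec / 8


rate_mbit = 10.22  # value shown in the replication UI
rate_mbyte = mbit_to_mbyte(rate_mbit)
print(f"{rate_mbit} Mb/s = {rate_mbyte:.4f} MB/s")  # prints "10.22 Mb/s = 1.2775 MB/s"
```

Read the label as megabytes and you would think the link is moving eight times more data than it really is.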



Back to spending time with the existing kids and wife,
-Ben