18.4.13

“Disk Failure”, watch your S.M.A.R.T!


Yes, for those in any sort of "ops" or sysadmin position, the very phrase makes your blood run cold, causing cranial dermis spasms (usually around the eyes) along with that sinking feeling that you are exiting this reality at warp speed.

Well, it's that taboo topic that needs talking about.  You may say or think, “I have backups, I am fine” or “I use various RAID levels with redundancy, I’ll be fine.”  But are you really?  If this is you, READ ON!  :)

Recently I had a NAS failure.  Well, not so much a complete failure, but rather a NAS that was seemingly not happy, causing significant service disruptions and delays.  Big deal, right?  Deal with it and move on.  Well, after fully integrating a storage area network in my workplace, I was thinking this is great.  I could spin up a VM, move it from node to node, add disks when I needed more space, and all seemed hunky dory.

Many months into its service, services seemed to sporadically crash, or have significant delays.   As I was diagnosing issues, I was thinking there was an issue with my compute node setup in my ESXi cloud.  My virtual machines would simply freeze.  They would show many disk errors.  I thought there was an interruption in the transport layer (or on the wire if you will) because once I rebooted the compute node all machines seemed to work again.  Slowly these issues began to bleed through to other servers connected to the SAN.  

I thought, and I admit perhaps naively, that the particular NAS (Network Attached Storage) unit that was serving these servers had RAID5, so I was covered.  If there was a disk issue, it would fail the disk over to the hot spare and, given the redundant nature of RAID5, it would continue to operate and alert me of the failed disk.  WRONG!
For those who are not familiar with it, in a nutshell RAID is the way you configure multiple hard disks to operate together.  RAID5, which is what I was using, is the practice of using 3 or more disks with fault tolerance, such that if one disk in a 3-disk array failed, the other 2 would keep functioning and there would be no data loss.
So, in my case, I was using 7 2TB disks, effectively giving me 6 2TB drives, or 12TB of storage (less formatting).  RAID5's parity allows any one disk to fail without any service-impacting issues.  Especially with the “hot spare” waiting.  (A hot spare is an installed disk on standby in case another fails.)
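
For anyone who wants to check that math, here is a minimal sketch of the RAID5 capacity arithmetic.  The function name is my own invention for illustration, and I am assuming the hot spare sits outside the 7 disks in the array, which is what the 12TB figure implies.

```python
def raid5_usable_tb(disk_count: int, disk_size_tb: float) -> float:
    """Usable capacity of a RAID5 array, before formatting overhead.

    RAID5 spends one disk's worth of space on parity, so the usable
    space is (number of disks - 1) * size of each disk.
    Assumption: hot spares are NOT included in disk_count.
    """
    if disk_count < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (disk_count - 1) * disk_size_tb


# The array from this post: 7 disks of 2TB each.
print(raid5_usable_tb(7, 2.0))  # 12.0
```

The same formula gives 4TB for the minimum 3-disk array of 2TB drives, which is the "other 2 would function" case described above.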

Now back to the point.  I began to suspect that a particular NAS server was showing signs of complete failure.  After making sure all my backups were up to date, I began to take a look at the system.  When I looked at the console, it was violently spewing out disk error and retry messages.  Every time there was a service stall, the console would begin to projectile puke these messages that make you freeze in fear.  Could my NAS be completely failing???  Could RAID5 have deceived me?

With poise and professionalism, I regained my composure and exited my server room to face an office at 9am slowly growing full of people asking questions like “why is FogBugz not working?” and “why can't I access the shared files on the server?” … and on and on.  I was looking at a systems failure of about 50% of all the services I operate.

Looking at the web client of the NAS, I began to check the health of all the disks.  What was actually happening was a SMART drive abnormality.  (SMART is the self-monitoring function built into all modern hard drives that can warn of pending failure.)  A disk in the RAID5 array had not completely failed; rather, it was about to.  As the drive was frantically trying to reallocate bad sectors, it stalled the whole disk array.  Each time, it resumed operations just before it would have timed out and been marked as failed, so the array never failed over, but every stall was service impacting.

So what I am trying to say here is that you cannot rely on RAID alone for redundancy.  You MUST also monitor the SMART status of all the attached drives.  A drive with an imminent failure can take out an entire array and play havoc with your services.  All I needed to do was remove the bad disk and the system failed over to the hot spare.  After the RAID rebuild completed, I added a new hot spare and all was good in the land of NAS once again.

The moral of my story is to make sure all your monitoring systems are in check.  Set up regular SMART tests on all production systems.  In the case of this drive, the SMART system was telling me that there was an abnormally high rate of bad sector reallocation.  It was a bad disk.
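
As a rough idea of what that kind of check can look like, here is a minimal sketch that scans a SMART attribute table for reallocated or pending sectors.  I found my problem through the NAS web client, so everything below is an assumption for illustration: the sample text is made up (laid out like the attribute table that smartmontools' `smartctl -A` prints), the function names are hypothetical, and the alert limit of 0 is just a starting point you would tune for your own drives.

```python
from typing import Dict, List

# Made-up sample in the column layout of a `smartctl -A` attribute table.
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   092   092   036    Pre-fail  Always       -       328
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       17012
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
"""

# Attributes that point at a drive busy remapping bad sectors.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def parse_raw_values(report: str) -> Dict[str, int]:
    """Map attribute name -> raw value for every numeric attribute row."""
    values: Dict[str, int] = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values


def smart_warnings(report: str, limit: int = 0) -> List[str]:
    """Return the watched attributes whose raw value exceeds the limit."""
    raw = parse_raw_values(report)
    return [name for name in WATCHED if raw.get(name, 0) > limit]


print(smart_warnings(SAMPLE_REPORT))
# ['Reallocated_Sector_Ct', 'Current_Pending_Sector']
```

In real use you would feed it the actual report for each drive on a schedule and send yourself an alert whenever the list is not empty.  Comparing the raw counts between runs would also show the rate of reallocation, which is what gave my bad disk away.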

As a side note, we have the incorrect thought that modern hard drives are more stable than older hard drives.  But in actuality, they have MUCH higher bad sector counts (more sectors, more failures).  They just reallocate these bad sectors and mark them as bad more frequently.  We never know about it unless we run disk utilities to view this internal data.

Also, if you rely on a NAS unit for as many services as I do here, I recommend that you have a redundant, mirrored NAS unit that you can fail over to in cases like this.  That way you can fix the issues and keep services running.  Sadly, this was not in my budget this year.

Until Next time,
//Ian\\

PS: If there is any topic you are interested in, please let me know, and feel free to leave comments and start a discussion.  It would be nice to see an actual comment rather than spam :)

19.3.13

Digital Signage Expo 2013


Well folks, better late than never.  This year I had the opportunity once again to attend Digital Signage Expo, or DSE, which marked its 10th anniversary.  I will avoid talking about typical displays; rather, I would like to highlight the unique installations and technologies.  LCD displays are simply displays, in various sizes and aspect ratios.  There is not much to be said (well, with the exception of 4K displays!).  The teaser here is uniquely shaped displays, unique applications/installations, LED-assisted bezels, and sub-$100 players that are capable of playing HTML5 content.



Let's start with the show floor and talk about the technologies I have seen.  Planar had a rather unique mosaic installation (above), using their software to deliver your content.  As digital advertising becomes more of the 'norm', being creative and unique with your displays is going to be more important.  Although this install was not groundbreaking, it was a rather nice and unique installation.  It may be costly to install, but it can deliver a memorable experience.





Touch Revolution also had some new products.  This year, they debuted their commercial Android tablets!  Seen here, they are rated for 24/7 operation and run on PoE (Power over Ethernet), so no cabling is required other than Ethernet, and of course they also run wireless.  These are made for fixed installation (shelf side or other applications).  They also come with a software toolkit to lock the devices down at a software level.  Onboard they have an Ethernet port, a MicroSD card slot, USB ports and a power port.  I really like these and can see many applications for use in retail.  Unlike consumer tablets used in a commercial application, these are designed for 24/7 operation and are NON-branded on the front side.  I can't stress this enough.  Commercial devices should be non-branded.  You are using their products to advertise OTHER products.  So why would one pay prime dollars for branded commercial devices?  This was refreshing to see.



Among other innovative products were e-paper-based, Wi-Fi-enabled shelf price tags from Opticon.  These are disposable products; however, based on 5 price changes a day, they would run for about 5 years.  Various companies also had other e-ink/e-paper-based signs, which were quite effective for things like room signage.  These are great for B/W, NON-INTRUSIVE, low-energy signage.  At first glance you would not realize they were digital signs.

One product did grab my attention.  As we know, many manufacturers are trying to eliminate the screen bezel altogether.  However, we know this is not without limitations.  Lxi International came up with a very interesting way to work around it.  Their displays are simply displays WITH a bezel.  However, the bezels are strips of LED lights which change colour to match what’s happening on the edges of the screen.  Although you can see the edges...   they do look better than black lines, in my opinion.  Have a look!
  


Another very interesting product, one that would be a serious asset for actually gaining quantitative data from installations, was a self-contained customer profiling box.  We have had technology that would assess a viewer's dwell time, approximate age and sex for a while, but these have been very complex installations that required some high-end hardware.  Now Aware Live Technologies has come up with an appliance to do just this.  It's a small box with Ethernet, power and an Asus (Kinect-like) camera attached.  This can be hidden within an installation to log and graph quantitative data.


Seen above is a picture of me on its monitoring screen.  It has assigned me a unique visitor number, sex and age, and added me to the count of people in front of it.  You can aggregate this data by the direction in which the person moved off the screen, along with other valuable data for the client.

Another interesting product was a video delivery device made by Matrox which would scale its output to match the screen.  This would be a very useful device when sending video to many different screens.  So long as the aspect ratio is the same, the resolution would not matter, as it would scale the content for us.
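
To show why matching aspect ratios make this easy, here is a minimal sketch of the arithmetic.  This is only my own illustration of the idea, not how the Matrox device works internally, and the function name is hypothetical.

```python
from fractions import Fraction
from typing import Optional, Tuple


def uniform_scale(src: Tuple[int, int], dst: Tuple[int, int]) -> Optional[Fraction]:
    """Scale factor that maps a source resolution onto a destination screen.

    Returns None when the aspect ratios differ, because then one factor
    cannot fill the screen without cropping, stretching or letterboxing.
    """
    src_w, src_h = src
    dst_w, dst_h = dst
    # Fractions compare the ratios exactly, with no rounding error.
    if Fraction(src_w, src_h) != Fraction(dst_w, dst_h):
        return None
    return Fraction(dst_w, src_w)


print(uniform_scale((1920, 1080), (1280, 720)))   # 2/3
print(uniform_scale((1920, 1080), (3840, 2160)))  # 2
print(uniform_scale((1920, 1080), (1024, 768)))   # None
```

One piece of 16:9 content can feed a 720p screen and a 4K screen with a single scale factor each, while a 4:3 screen would need a different decision about cropping or letterboxing.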

There were also many other oddly shaped displays such as long aspects and rounded displays. 

Among other things were a plethora of signage players.  Many were too complex, or useful only for very specific installations.  However, the last item I would like to profile is Broadsign's $99 Android player.  This small device is very lightweight, and by small, I mean very small.  Sadly, I did not get a picture of the device.  I was so excited about this product that I got talking to the rep about it for a while, then walked away without taking a snap.  Let's just say it will fit in the palm of your hand.  Also, since the device runs Android on a dual-core 1.5GHz processor, it should be able to deliver signage through our stack.

This year's show was much larger than 2012's, and full of the energy you see at the likes of InfoComm.  If you are into signage or displays of any sort, a visit to DSE is in order.  Don't forget, all, that next year's show will be at the Sands, Feb 11-13, not the Las Vegas Convention Centre as it has been in the past.  Mark your calendars now.

//Ian\\




13.2.13

Computing Power Paradigm Shift


So I have noticed that I am now part of a paradigm shift.  I am not talking about a major shift in assumptions here, nor the dreaded marketing speak, but a shift in how I consume my computing power.  I personally see computers as part of our lives now.  We are far from being the Star Trek characters known as Bynars...  yet...  But having grown up as a small child entering the computer revolution, and the loving geekdom that my life became, I remember when it was all about how bad ass your 286, 386 and 486 rigs could be!  Ok, that dates me.  The Pentium entered the scene when I was in my OAC, or grade 13, year in high school.

But I digress; back to the point.  I feel that unless you are a serious gamer, or somebody who requires graphics or scientific computing power, it's no longer about how powerful your PC is.  I realized this (at least for myself) in the last few days as I was getting increasingly frustrated with my Samsung Galaxy Note.

This is a very fine phone.  In fact, for most people it's more than they need.  I, however, want the best from my mobile device.  I use it more than I do my own laptop now.  It keeps me posted on new messages, I take calls on it, I listen to music on it.  A LOT of music.  While I do that, I am usually playing a game, watching a movie, surfing, or reading and replying to emails before I arrive at the office.

Since I used my phone so much, I began to get frustrated at how slowly it switched between tasks.  Since I used so many apps frequently, it had issues keeping up with my scatter-minded multitasking.  I found that it was becoming too slow.


So even though my laptop is a Core 2 Duo MacBook Pro with 8GB of RAM, and I run flight simulators and music editing suites on it, my priority was to have a better, faster phone.  Let's face it, the laptop I have does its job.  This is where I realized, as I walked out of the local wireless store with my Samsung Galaxy Note 2 in hand, in a mild state of cognitive dissonance, that I was more willing to drop my hard-earned money on a new phone than on a new laptop or an upgrade.

I have entered the computing power paradigm shift.  It's all about the portable device now.  I never thought in my increasingly Luddite mind that this would ever happen, but hey, perhaps I am not going to become a Luddite after all!  Am I ahead of the curve?

Thoughts?






1.2.13

Digital Signage Expo 2013

Ok folks!  It's that time of year again, and at the end of the month I will be headed down to Sin City once again to take in the latest in Digital Signage technology at DSE2013.  As usual, I plan to post while I am there.

So stay tuned.  Also, to my few followers, sorry I have not posted in some time.  Busy time of year, but I am back at it once again!

Cheers Folks!
//Ian\\