Category: Disaster Recovery

Comprehensive Capacity Optimization – Deduplication 2.0


Technology is great, isn’t it?  When someone thinks they have a new idea built on the same old technology foundation, they call it “X 2.0”.  I have been watching the banter between analysts and vendors (specifically NTAP’s Dr. Dedupe and Permabit’s CEO Tom Cook) on the topic of Deduplication 2.0 for the past few weeks, and it is my belief that the proverbial boat is being missed (since we are using water analogies), so I decided I had to jump in.  The real value of these conversations is what the end user gets out of them.  At the end of the day, it doesn’t really matter who ‘coined’ or ‘invented’ a term like deduplication 2.0.  What matters is whether the term actually helps describe a technology and how that technology can be leveraged to make things better in the data center.  We should focus on the implications of this new generation of deduplication, ‘deduplication 2.0’.

In May I delivered a presentation to a number of EMC customers on the topic of Data Deduplication 2.0 – Comprehensive Capacity Optimization.  The point of my presentation was simple (and keep in mind this was before the Data Domain acquisition): there are a number of capacity optimization technologies and capabilities available to customers today.  Originally these deduplication technologies were used primarily for backup, but slowly deduplication is making its way into primary storage.  Deduplication in primary storage makes a lot of sense FOR DATA THAT IS STATIC.  Why only static data?  Static data is data that isn’t accessed often (that doesn’t mean it’s not important); because access is infrequent, its performance requirements are lower than those of active data.  Remember: nothing in IT is free.  If I deduplicate data, I must ‘rehydrate’ it before I can use it, and that carries a performance cost, so I want to be careful where I deduplicate so as not to inhibit performance on production data.
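To make the dedupe/rehydrate trade-off concrete, here is a minimal sketch in Python.  It is not how any particular product works: the fixed 4-byte chunk size, the in-memory `store` dictionary and the function names are all assumptions chosen to keep the example small.  What it does show is that duplicate chunks are stored once, and that every read has to look each chunk up again to rebuild the data.

```python
import hashlib

# Assumption: tiny fixed-size chunks for illustration only.
# Real systems use much larger (and often variable-size) chunks.
CHUNK_SIZE = 4


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into chunks, keep one copy of each unique chunk in `store`,
    and return the ordered list of chunk hashes needed to rebuild the data."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a repeated chunk is not stored again
        recipe.append(digest)
    return recipe


def rehydrate(recipe: list, store: dict) -> bytes:
    """Rebuild the original data: one store lookup per chunk, on every read."""
    return b"".join(store[digest] for digest in recipe)


store = {}
original = b"ABCDABCDABCDEFGH"
recipe = deduplicate(original, store)
restored = rehydrate(recipe, store)
```

In this toy run the 16 bytes become four chunk references but only two stored chunks.  The saving is real, and so is the extra lookup work on every read, which is why static data is the comfortable place to start.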

A Data Protection Reference Architecture – The Final Chapter


The Architecture

This ‘architecture’ diagram, as you can see, is not a typical architecture diagram, but hopefully it can be used to align your business and its objectives with the technologies that are available and can best be applied to solve your issues, helping you balance cost, complexity and compliance.

This diagram can also be used to do a couple of other things.  It can help you begin to classify your data and align it to your business objectives.  It also lets you begin to identify which data or data services in your environment are more important to you than others and, based on that, choose areas you may want to outsource or move to the cloud.

As you can tell, there really is not one solution for meeting all your data protection needs.  The challenge comes with managing multiple solutions in an effort to meet your business objectives.  While only a few technologies allow you to manage your environment across all your RPOs and RTOs, it is important that I point out that EMC’s NetWorker is able to do this, centralizing your data protection infrastructure for ease of management.  It allows you to manage traditional backup, source based deduplicated backup with Avamar, and CDP with RecoverPoint, as well as the EMC disk libraries and tape where the data is stored.  Now, I am not saying that NetWorker solves all of your data protection challenges, nor am I suggesting that replacing one traditional backup technology with another is the right answer.  What I am saying is that if you’re looking for all the functionality required to meet your business objectives and you want easier management, NetWorker is one avenue to get you there.  Additionally, the underlying image of the triangle represents data protection management.  Putting all the new technology in place is one thing; managing it and ensuring you are now meeting your business needs is another.  EMC’s Data Protection Advisor can help here as well.

A Data Protection Reference Architecture – Part 4


Business Critical Applications

The tip of the triangle focuses on the applications (or data) that drive your business.  These are the applications that cost you money should they go down for any length of time.  The recovery of this information, in the event of a ‘disaster’, needs to be very fast (RTO in minutes) and the data can’t be very ‘old’ when it is recovered (short RPO, less than 24 hours).  Typically, the technologies used for these types of applications are replication (synchronous or asynchronous) or continuous data protection (CDP).  These technologies ensure that recovery at the alternate location is instant (or near instant) and/or give users the ability to pick the point in time they want to recover to, in order to ensure no data loss and bring up the applications as quickly and accurately as possible.  This category, much like the rest of them, carries the same disclaimer: ‘one size (product) does not fit all’.  The value of the data in this tier, and the risk to the business if this data is unavailable, drive the technology and spend in this part of the triangle.  Keep in mind that the right technology (don’t choose CDP if you need an active remote file system) gives you the best recovery (RPO) for your business needs and can keep you on the Road to Recovery.
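As a rough illustration of how the RTO and RPO targets above can be used to classify an application, here is a small Python sketch.  The 24 hour RPO ceiling comes from the text; the 60 minute RTO ceiling is purely my assumption for what ‘RTO in minutes’ might mean, and the function name is hypothetical.  Your own thresholds should come from your business objectives.

```python
from datetime import timedelta

# Assumption: "RTO in minutes" is read here as "under one hour".
MAX_RTO = timedelta(minutes=60)
# From the text: short RPO, less than 24 hours.
MAX_RPO = timedelta(hours=24)


def is_business_critical(rto: timedelta, rpo: timedelta) -> bool:
    """Return True when both recovery targets fall inside the
    'tip of the triangle' limits defined above."""
    return rto < MAX_RTO and rpo < MAX_RPO


# Hypothetical examples: an order system that must be back in 5 minutes
# with at most 1 hour of lost data, and a file share that can wait 8 hours.
order_system = is_business_critical(timedelta(minutes=5), timedelta(hours=1))
file_share = is_business_critical(timedelta(hours=8), timedelta(hours=1))
```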


Process vs. Technology


The hardest thing to change inside IT is not technology, it is process!  I say this because all too often a technology is available that provides a far superior solution to a complex IT problem, but it does not fit into your existing business process.  Need proof?  Let’s take data protection as an example.  Did you know that VTLs (virtual tape libraries) and data deduplication technologies came out at the same point in history, 10 years ago?  Which technology had faster market adoption?  VTLs, of course, because implementing them didn’t cause a major disruption in processes.

Let’s take a look at a simple backup environment.  We won’t worry about archiving or compliance for the moment, just operational backup and recovery.  Today’s backup has a number of complexities.  Some data sets have weekly full backups and daily incremental backups.  Other data sets sit under applications that, for faster recovery and simplicity, require daily full backups.  Once the backups are done, to ensure true data protection reliability, the backup logs are checked to confirm that every system was successfully protected.  Next, backup tapes are either created (if it is a disk based backup) or taken from the library, and then moved to a transportable box, hopefully a secure one.  Finally, a third party vendor comes to pick up the tapes and take them off site for safe-keeping.  Additionally, if the data is backed up using encryption, the encryption keys are also kept off site for security purposes.
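Two of the steps above, deciding between a full and an incremental backup and checking the logs afterwards, can be sketched in a few lines of Python.  This is only an illustration: the function names, the choice of Sunday for the weekly full, and the `{"system": "success"}` log format are my assumptions, not the behavior of any backup product.

```python
from datetime import date


def backup_type(day: date, requires_daily_full: bool, full_weekday: int = 6) -> str:
    """Pick the backup type for a data set on a given day.

    full_weekday uses date.weekday() numbering (Monday=0 ... Sunday=6);
    Sunday as the weekly full day is an assumption for this sketch.
    """
    if requires_daily_full or day.weekday() == full_weekday:
        return "full"
    return "incremental"


def unprotected_systems(expected: set, log_results: dict) -> list:
    """Return every expected system whose log entry is missing or not 'success'."""
    return sorted(s for s in expected if log_results.get(s) != "success")


# Hypothetical nightly run: file01 never reported and web01 failed.
expected = {"db01", "web01", "file01"}
log_results = {"db01": "success", "web01": "failed"}
needs_attention = unprotected_systems(expected, log_results)
```

Note that a system missing from the log is treated the same as a failure.  A backup nobody heard from is not a backup you can count on.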

Customers face these standard backup challenges:

1) Backups take too long and cannot meet backup windows as a result of too much data.

2) Backups fail due to poorly configured (networked) backup environments.

3) Backups at remote offices are ‘unreliable’ (they don’t follow the best practices set in the data center).

a. No one with the appropriate skill set is available to monitor these backups.

b. No one with the appropriate skill set is available to troubleshoot these backups.