Tag: "Archive"

Data Protection, Retention and Archive Starts with Data Value


 It feels good to open up the blogging again to new topics, especially ones I am intimately familiar with.  (But have no fear, there will be references to primary storage optimization / compression.)

This weekend I had an interesting conversation with my Dad.  We were discussing backup.  My dad basically runs IT for the State of Maine.  The State of Maine uses CommVault backup software.  So I posed the question to him, “What would it take for you to rip out CommVault and replace it with another solution?”  He thought about it for a moment and replied, “I wouldn’t.”  His answer came down to a couple of reasons.

First was the expense.  It’s not just about buying the new software; it would be about training people to run the new software, throwing away the massive investment they have in their existing product, and converting all the years of backup tapes created with one software to the new software.  This is one of the biggest things vendors forget when trying to sell a customer on their backup software.

Second was the fact that, feature for feature, the top 5 traditional backup software products are not really that different from one another.  Sure, I do agree that some products have features that others don’t, and some products have features that work better than others, but in reality, the delta is so small and the workarounds are so simple that it doesn’t really matter.  Unless you’re replacing traditional backup software with an evolutionary source-based data deduplication software (which is only applicable for some environments), there is no advantage to switching software.

The challenge is this: if Data Protection is still one of the biggest and most expensive pain points within IT, how do the problems get resolved when replacing the software that controls it all is too costly?


Storage’s 2010 Hottest Technology


Each year there tends to be one technology that stands out in the storage space.  In 2009 it was data deduplication.  At the end of 2006 EMC acquired a source-based deduplication solution called Avamar.  Later, in 2009, they announced a strategic partnership with Quantum for data deduplication at the target.  Then in 2009 EMC made a bid against NetApp for Data Domain and won.  In addition, NetApp had data deduplication announcements with its ASIS technology.  Quantum, FalconStor, and Symantec all had their own story with data deduplication, and a host of non-public companies such as Permabit, Sepaton, and ExaGrid were all talking about the merits of data deduplication.

As the story goes, if you haven't put data deduplication in your backup environment yet you're either in an environment where there is not one iota of duplicate data, which is highly unlikely, or the company you work for has gobs of money and has no problem:

  1. Backing up to slow tape
  2. Recovering slowly from tape
  3. Keeping massive amounts of data on unreliable tape
  4. Backing up full streams of data to disk (and wasting valuable storage space)

What I am saying is that if you haven't implemented a data deduplication solution by now, you have been left in the technology dust.  Data deduplication just makes too much sense.  I know we have all heard the expression "No one ever got fired for buying X."  But has anyone ever got promoted because they bought X?  I have to believe that the IT team that can save their company 50% or more of their storage will get promoted.  Storage is a cost drain on IT.  It's the applications that make a company money.  It's time to start focusing some of those valuable IT dollars on the applications that make your company money.  It's time to be the IT Super Hero!


Comprehensive Capacity Optimization – Deduplication 2.0


Technology is great, isn't it?  When someone thinks they have a new idea on the same old technology foundation they call it "X 2.0".  I have been watching the banter between analysts and vendors (specifically NTAP’s Dr. Dedupe and Permabit’s CEO Tom Cook) on the topic of Deduplication 2.0, and it is my belief that the proverbial boat is being missed (since we are using water analogies).  I have been watching these guys hash it out for the past few weeks and decided I have to jump in.  I find the real value of these conversations is the value to the end user.  At the end of the day, it doesn't really matter who 'coined' or 'invented' a term (like deduplication 2.0); what does matter is whether the term actually helps describe a technology and how that technology can be leveraged to make things better in the data center.  We should focus on the implications of this new generation of deduplication - ‘deduplication 2.0’.

In May I delivered a presentation to a number of EMC customers on the topic of Data Deduplication 2.0 - Comprehensive Capacity Optimization.  The point of my presentation was simple (and keep in mind this was before the Data Domain acquisition): there are a number of capacity optimization technologies/capabilities that are available to customers today.  Originally these deduplication technologies were used primarily for backup purposes, but slowly deduplication is making its way into primary storage.  Deduplication in primary storage makes a lot of sense FOR DATA THAT IS STATIC.  Why only static data?  Static data is data that isn't used frequently (that doesn't mean it's not important, it simply is not accessed often); because access to this data is infrequent, the performance requirements for this data are lower than those of active data.  Remember: nothing in IT is free.  If I deduplicate data, then in order to use it I must ‘rehydrate’ it, and thus there is a performance implication, so I want to be careful where I deduplicate data so as not to inhibit performance on production data.
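To make the 'rehydrate' cost concrete, here is a minimal sketch in Python.  It is not how any particular product works; the `DedupStore` class, the fixed 4 KB chunk size and the file names are all assumptions made up for illustration.  The idea is simply that every unique chunk is stored once, and every read has to look the chunks up again and stitch them back together.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


class DedupStore:
    """Toy chunk store: each unique chunk is kept once, keyed by its SHA-256."""

    def __init__(self):
        self.chunks = {}   # digest -> chunk bytes
        self.recipes = {}  # object name -> ordered list of digests

    def write(self, name, data):
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only if unseen
            recipe.append(digest)
        self.recipes[name] = recipe

    def read(self, name):
        # "Rehydration": one lookup per chunk, then reassembly.
        return b"".join(self.chunks[d] for d in self.recipes[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = DedupStore()
report = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
store.write("monday.bak", report)   # hypothetical file names
store.write("tuesday.bak", report)
print(store.stored_bytes(), "bytes stored for", 2 * len(report), "bytes written")
```

The space saving shows up in `stored_bytes()`, and the price shows up in `read()`: data that is read constantly pays that lookup-and-reassemble step on every access, which is why static data is the safer place to start.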


A Data Protection Reference Architecture – The Final Chapter


The Architecture

This ‘architecture’ diagram, as you can see, is not a typical architecture diagram, but hopefully it can be used to align your business and business objectives with the technologies that are available and can best be applied to solve your issues, helping to balance cost, complexity and compliance.

This diagram can also be used to do a couple of other things.  It can help you begin to classify your data and align your data to your business objectives.  It also lets you begin to identify which data or data services in your environment may be more important to you than others and, based on this, helps you choose areas you may want to outsource or move to the cloud.

As you can tell, there really is not one solution for meeting all your data protection needs.  The challenge comes with managing multiple solutions in an effort to meet your business objectives.  While there are only a few technologies available that allow you to manage your environment across all your RPOs and RTOs, it is important that I point out EMC’s NetWorker is able to do this, centralizing your data protection infrastructure for ease of management.  It allows you to manage traditional backup, source-based deduplicated backup with Avamar, CDP with RecoverPoint, as well as the EMC disk libraries and tape where the data is stored.  Now, I am not saying that NetWorker solves all of your data protection challenges, nor am I suggesting that replacing one traditional backup technology with another is the right answer, but what I am saying is that if you’re looking to have all the feature functionality required to meet all your business objectives and you want easier management, NetWorker is one avenue to get you there.  Additionally, the underlying image of the triangle represents data protection management.  Putting all the new technology in place is one thing; managing it and ensuring you are now meeting your business needs is another.  EMC's Data Protection Advisor can help here as well.


A Data Protection Reference Architecture – Part 1


This blog will have multiple parts.  I will introduce my view of a data protection reference architecture and the next few blog posts will talk to components of that architecture.

The other day I had a very interesting conversation with a colleague of mine in Australia.  He was looking for a data protection reference architecture that he could use to speak to his customer.  As you can imagine, having this conversation over the phone could prove to be a difficult challenge.  When the conversation began, my fear was he was looking for an ‘architecture’ diagram that included data protection appliances, backup servers, disk libraries, tape libraries and backup agents.  I quickly realized that this is an impossible conversation to have with him without knowing:

A)     the customer’s environment or challenges

B)      the customer’s business objectives

I find that most vendors don’t know A or B when speaking to a customer about their data protection ‘issues’, but they really should.  Having a more thoughtful conversation with customers in a consultative fashion is more relevant to customers in understanding their challenges and helping to align these challenges to the best possible solution.

I started my conversation with the diagram shown below (Figure 1).  It is a simple triangle divided horizontally into 4 segments, with the middle two segments divided vertically in half.  Each segment represents different business objectives within a company.  As you go around the triangle, you can see that there are different technologies and different methodologies for attacking data protection challenges, which is why there is no longer a “one size fits all” approach when it comes to protecting data today.  Let’s face it: the two most important commodities in backup are time and capacity.  One of the primary drivers behind the type of protection that is used is the Recovery Point Objective or RPO.  Different technologies provide different RPOs, each at a different price point, and there are also different processes that can be applied to attack RPOs.

Figure 1



What Happened in Vegas, Stayed in Vegas


Well, until now.  This is an interesting story about archiving and how it could have, but didn't, help a friend of mine.

Often, when speaking with customers, I talk to them about the 4 fundamental principles with regard to data protection:

  1. Assess
  2. Archive
  3. Backup
  4. Manage

The assessment phase is a multi-dimensional phase.  It's about people, process and technology.  As with most things, the technology piece is the easy piece.  EMC has tools that allow us to scan file systems, databases and email systems and report back a litany of information including but not limited to:

  • Number of files
  • Age of files
  • Volume of data
  • Owner of the data
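To give a feel for what that kind of report contains, here is a rough sketch in Python that walks a file system and tallies the four items above.  This is not EMC's tooling; the `assess` function, the age buckets and the use of the numeric owner id are all assumptions chosen to keep the example short.

```python
import os
import time
from collections import Counter

SECONDS_PER_DAY = 86400


def age_bucket(age_days):
    # Assumed buckets; a real assessment would use whatever the business asks for.
    if age_days < 90:
        return "under 90 days"
    if age_days < 365:
        return "90 days to 1 year"
    return "over 1 year"


def assess(root, now=None):
    """Walk `root` and summarize file count, volume, age and owner."""
    now = time.time() if now is None else now
    summary = {"files": 0, "bytes": 0, "by_age": Counter(), "by_owner": Counter()}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            age_days = (now - info.st_mtime) / SECONDS_PER_DAY
            summary["files"] += 1
            summary["bytes"] += info.st_size
            summary["by_age"][age_bucket(age_days)] += 1
            summary["by_owner"][info.st_uid] += 1  # numeric owner id
    return summary
```

Calling `assess` on a share (any path you have read access to) returns a dictionary you can print or export.  Note that it only tells you how much data there is, how old it is and who owns it.  It says nothing about what the data is worth.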

Once EMC passes the information to the customer about their data, the real hard work begins.  Armed with the information, IT now has to go and speak to line of business managers in order to determine the value of the data, and how data of a specific value needs to be managed and protected.  The problem is line of business managers want everything saved forever, until IT tells them what the bill would be.  IT begins to describe the different 'classes' of service capabilities and line of business managers, who don't really care about the details (not because they don't care, they are just too busy), finally say "Just give me the highest level of protection I can get for the least amount of money."  IT now does the best they can to align their perceived value of the data, to the most appropriate backup and archive capabilities they have.

Now, in Vegas, I think we can all agree that video surveillance has a ton of value to the stakeholders of the hotels and casinos.  With the amount of debauchery that takes place in Vegas and the amount of money that is 'rolling' around, it is important to 'know what is going on' and to make sure all situations can be handled as efficiently as possible.  This is where video surveillance comes into play: the more you 'save' on high speed disk, the easier it is to get to the truth or solve the mystery.


Information Classification – IT's Hardest Job


I have decided that information today is like a group of friends. If you look at my LinkedIn page or my Facebook page you see that I have over 600 connections and over 180 friends respectively. What does this really mean? Obviously I don't stay in touch with all of these people. So why do we have these connections? I think it is because we believe that in the future, each one of these connections will offer some kind of value to us. It may be that they will be a friend to us, they may share common experiences to help us through a personal issue, or they may help us find a mate or even a job. We just don't know, so we hang on to the connection.

This is not unlike information. We are all tired of hearing that "data is growing at an exponential rate" but we never look at why. It is simple. We believe that 'someday' we may need that 'valuable' piece of content so we better not delete it. More importantly, the people who are accountable for managing that data (IT) are one step removed from the 'value' discussion (usually), so rather than delete anything and be responsible for "losing data" they save and protect everything.

Recently I spent 4 hours on my Facebook page 'categorizing' my friends. I created a number of categories: friends from high-school, friends from college, colleagues from work (current), colleagues from work (past), industry connections and relatives. As you can imagine there are some friends that belong in more than one category - so how do I choose which one they should go in? Also, what happens if I change jobs? Where do the 'colleagues (work)' friends go? When do I move them? Do I remember to move them?


Road to Recovery


Our domain, Backup & Beyond, was the tagline for Avamar Technologies, a company EMC acquired in November of 2006.  This tagline was very fitting from a data protection standpoint because Avamar utilized a traditional client / server architecture to protect data, but with a twist.  Avamar utilizes a more intelligent client side agent that provides source-based, variable block deduplication to enable the most efficient backups available in the market for more than 80% of a data center's data.  Avamar also leverages this same technology to replicate this data between disk-based backup targets, thereby dramatically reducing the reliance on tape.  This new technology, which has enabled new processes, is taking backup beyond.
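For readers wondering what 'variable block' buys you, here is a toy sketch in Python.  To be clear, this is not Avamar's algorithm; the window size, the divisor and the simple sum-of-bytes fingerprint are assumptions picked to keep the example readable.  The point is that chunk boundaries are decided by the content itself, so inserting a byte at the front of a file does not shift every block after it.

```python
import hashlib
import random

WINDOW = 16   # bytes of context that decide a boundary (assumed)
DIVISOR = 64  # roughly one boundary per 64 bytes on random data (assumed)


def chunk(data):
    """Cut after byte i when the sum of the last WINDOW bytes divides by DIVISOR."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # slide the window forward
        if i >= WINDOW - 1 and rolling % DIVISOR == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def digests(data):
    return {hashlib.sha256(c).hexdigest() for c in chunk(data)}


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20000))
edited = b"!" + original  # one byte inserted at the front

old, new = digests(original), digests(edited)
print(len(new - old), "of", len(new), "chunks would need to be sent")
```

With fixed-size blocks, that single inserted byte would change every block in the file.  Here only the chunks around the edit are new, and a source-side agent that already knows the other digests has no reason to send them again.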

The title of our blog, Road to Recovery - well, like every good title it is a play on words and trust me, as with every title it took us a while to come up with it.  That said, the industry has been talking about the fact that backup is really about recovery.  The same can be said for other data protection tools.  This is why our goal is to talk about methodologies (technologies and processes) that help you to recover data.  When IT professionals are polled, they often say that data protection (backup) is still the number one issue they have in the data center.  We say it is time to step up and admit it and start the 'Road to Recovery' when it comes to your data protection environment.

Let us know what your challenges are.  We are here for you as your support system, and we welcome your comments and questions.
