Category: Archive

A Blueprint for Primary Storage Optimization


During the past three to four months the storage industry has seen a spike in the number of reports, white papers and news articles surrounding the evolution of one primary storage technology: capacity optimization (it is 2010’s hottest storage technology).

The reason this technology is getting a lot of ‘air play’ these days is that it is critical to controlling the growth and cost of storage.  In 2010 the EMC-sponsored IDC report The Digital Universe Decade – Are You Ready? was released, and it stated that:

  • In 2009, amid the “Great Recession,” the amount of digital information grew 62% over 2008 to 800 billion gigabytes (0.8 Zettabytes).
  • The amount of digital information created annually will grow by a factor of 44 from 2009 to 2020…
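As a quick sanity check on those two figures, the arithmetic can be worked through in a few lines.  This is a back-of-the-envelope sketch that uses only the numbers quoted above; the function names are mine, and the compound annual rate assumes smooth growth across the 11 years from 2009 to 2020, which the report does not claim.

```python
# Back-of-the-envelope arithmetic on the IDC figures quoted above.
ZB_2009 = 0.8        # zettabytes of digital information in 2009 (from the report)
GROWTH_2009 = 0.62   # 62% growth over 2008 (from the report)
FACTOR_2020 = 44     # growth factor from 2009 to 2020 (from the report)


def implied_2008(zb_2009=ZB_2009, growth=GROWTH_2009):
    """Amount in 2008 implied by 62% growth to the 2009 figure."""
    return zb_2009 / (1 + growth)


def projected_2020(zb_2009=ZB_2009, factor=FACTOR_2020):
    """Amount in 2020 if the 2009 figure grows by the quoted factor."""
    return zb_2009 * factor


def implied_annual_rate(factor=FACTOR_2020, years=2020 - 2009):
    """Compound annual growth rate that yields `factor` over `years` (assumes smooth growth)."""
    return factor ** (1 / years) - 1


if __name__ == "__main__":
    print(f"Implied 2008 amount: {implied_2008():.2f} ZB")
    print(f"Projected 2020 amount: {projected_2020():.1f} ZB")
    print(f"Implied annual growth: {implied_annual_rate():.0%}")
```

In other words, a factor of 44 over 11 years works out to roughly 41% compound growth per year, sustained for more than a decade.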

The folks at Wikibon also released an infographic that shows the true explosion of data.

Information Explosion & Cloud Storage
Via: Wikibon

When you combine storage capacity (and the footprint it takes up) with the power it takes to run and cool it, as well as the human resources it takes to manage it, you soon realize we cannot keep ‘just adding more cheap disk’ in an effort to manage storage demands.  High tech companies with high tech labs are also telling IT that ‘they are out of tricks’ when it comes to continuing to deliver disk drives that double in capacity every 18 months.  It is for these reasons that primary storage optimization technologies have stepped into the ‘limelight’: they serve as a means to help control the growth of primary storage, including the footprint, power, cooling and manpower required to manage it.

However, as we all know in IT, no two environments are the same and what may be good for one may not be good for another.  When looking at primary storage optimization there are a number of available technologies and a number of ways to deploy them.  The key question is what is right for ‘my’ environment.

Storage’s 2010 Hottest Technology


Each year there tends to be one technology that stands out in the storage space.  In 2009 it was data deduplication.  At the end of 2006 EMC acquired Avamar, a source-based deduplication solution.  Later, in 2009, they announced a strategic partnership with Quantum for data deduplication at the target.  Then in 2009 EMC made a bid against NetApp for Data Domain and won.  In addition, NetApp had data deduplication announcements with its ASIS technology.  Quantum, Falconstor, and Symantec all had their own stories with data deduplication, and a host of non-public companies such as Permabit, Sepaton, and Exagrid were all talking about the merits of data deduplication.

As the story goes, if you haven’t put data deduplication in your backup environment yet you’re either in an environment where there is not one iota of duplicate data, which is highly unlikely, or the company you work for has gobs of money and has no problem with:

  1. Backing up to slow tape
  2. Recovering slowly from tape
  3. Keeping massive amounts of data on unreliable tape
  4. Backing up full streams of data to disk (and wasting valuable storage space)

What I am saying is that if you haven’t implemented a data deduplication solution by now, you have been left in the technology dust.  Data deduplication just makes too much sense.  I know we have all heard the expression “No one ever got fired for buying X.”  But has anyone ever been promoted because they bought X?  I have to believe that the IT team that can save their company 50% or more of their storage will get promoted.  Storage is a cost drain on IT.  It’s the applications that make a company money.  It’s time to start focusing some of those valuable IT dollars on the applications that make your company money.  It’s time to be the IT Super Hero!

Comprehensive Capacity Optimization – Deduplication 2.0


Technology is great, isn’t it?  When someone thinks they have a new idea on the same old technology foundation they call it “X 2.0”.  I have been watching the banter between analysts and vendors (specifically NTAP’s Dr. Dedupe and Permabit’s CEO Tom Cook) on the topic of Deduplication 2.0 and it is my belief that the proverbial boat is being missed (since we are using water analogies).  I have been watching these guys hash it out for the past few weeks and decided I have to jump in.  I find the real value of these conversations is the value to the end user.  At the end of the day, it doesn’t really matter who ‘coined’ or ‘invented’ a term (like deduplication 2.0); what does matter is whether the term actually helps describe a technology and how that technology can be leveraged to make things better in the data center.  We should focus on the implications of this new generation of deduplication – ‘deduplication 2.0’.

In May I delivered a presentation to a number of EMC customers on the topic of Data Deduplication 2.0 – Comprehensive Capacity Optimization.  The point of my presentation was simple (and keep in mind this was before the Data Domain acquisition): there are a number of capacity optimization technologies and capabilities available to customers today.  Originally these deduplication technologies were used primarily for backup purposes, but slowly deduplication is making its way into primary storage.  Deduplication in primary storage makes a lot of sense FOR DATA THAT IS STATIC.  Why only static data?  Static data is data that isn’t used frequently (that doesn’t mean it’s not important, it simply is not accessed often); because access to this data is infrequent, the performance requirements for this data are lower than those for active data.  Remember, nothing in IT is free.  If I deduplicate data then, in order to use it, I must ‘rehydrate’ it, and that has a performance implication.  So I want to be careful where I deduplicate data so as not to inhibit performance on production data.
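To make the ‘rehydrate’ cost concrete, here is a minimal sketch of a deduplicating store.  It is a toy, not any vendor’s implementation: the class name, the fixed block size and the in-memory dictionaries are all assumptions made for illustration.  Each unique block is kept once under its SHA-256 fingerprint, and every read has to reassemble the object block by block, which is the extra work that active data can ill afford.

```python
import hashlib


class DedupStore:
    """Toy fixed-size-block deduplicating store (illustrative only)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}    # fingerprint -> the single stored copy of a block
        self.recipes = {}   # object name -> ordered list of fingerprints

    def write(self, name, data):
        """Split data into blocks and keep only the blocks not already stored."""
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fingerprint, block)
            recipe.append(fingerprint)
        self.recipes[name] = recipe

    def read(self, name):
        """Rehydrate: one lookup per block before the data is usable."""
        return b"".join(self.blocks[fp] for fp in self.recipes[name])

    def stored_bytes(self):
        """Physical bytes held after deduplication."""
        return sum(len(block) for block in self.blocks.values())


if __name__ == "__main__":
    store = DedupStore(block_size=4)
    store.write("a", b"AAAABBBBAAAA")
    store.write("b", b"BBBBCCCC")
    print(store.read("a"), store.stored_bytes())
```

In the usage above, 20 logical bytes are held as three unique 4-byte blocks.  The saving is real, but so is the lookup on every read, which is why static data is the natural first candidate.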

A Data Protection Reference Architecture – The Final Chapter


The Architecture

This ‘architecture’ diagram, as you can see, is not a typical architecture diagram, but hopefully it can be used to align your business and business objectives with the technologies that are available and can best be applied to solve your issues, helping to balance cost, complexity and compliance.

This diagram can also be used to do a couple of other things.  It can help you begin to classify your data and align your data to your business objectives.  It also lets you begin to identify which data or data services in your environment may be more important to you than others and, based on this, helps you choose areas you may want to outsource or move to the cloud.

As you can tell, there really is not one solution for meeting all your data protection needs.  The challenge comes with managing multiple solutions in an effort to meet your business objectives.  While there are only a few technologies available that allow you to manage your environment across all your RPOs and RTOs, it is important that I point out EMC’s NetWorker is able to do this, centralizing your data protection infrastructure for ease of management.  It allows you to manage traditional backup, source-based deduplicated backup with Avamar, CDP with RecoverPoint, as well as the EMC disk libraries and tape where the data is stored.  Now, I am not saying that NetWorker solves all of your data protection challenges, nor am I suggesting that replacing one traditional backup technology with another is the right answer, but what I am saying is that if you’re looking to have all the feature functionality required to meet all your business objectives and you want easier management, NetWorker is one avenue to get you there.  Additionally, the underlying image of the triangle represents data protection management.  Putting all the new technology in place is one thing; managing it and ensuring you are now meeting your business needs is another.  EMC’s Data Protection Advisor can help here as well.

A Data Protection Reference Architecture – Part 2


Archive

The most fundamental part of developing a good data protection architecture starts at the base of the triangle with Archive.  Archive is often an overlooked component of data protection.  It’s not just for regulated businesses anymore.  Archive essentially gives users 100% data deduplication efficiency.  What I mean by this is that you have the ability to remove ‘stale’ data (and by ‘stale’ I don’t mean unimportant data, I just mean data that is not accessed frequently) completely from your backup stream so you don’t continue to back it up.  Let’s face it; the two most important commodities in backup are time and capacity, and the two are interdependent.  The more capacity you have, the longer it takes to back up and the more money it costs to store.  The longer it takes you to back up, the less likely you are to be meeting your business objectives.  Data capacities aren’t shrinking, they are growing.  According to the latest IDC data, capacity is growing at a staggering pace of 65% year over year, and the digital pack rat in all of us is too afraid to get rid of anything, compromising backup windows and hence the business.  By archiving data that hasn’t been touched in some period of time and removing it from the backup stream, you can relieve some of the pressure on your backups and possibly not have to make any significant changes to your backup infrastructure.

Also, you don’t have to back up to a special purpose device or appliance for archive.  You can archive data to any file system.  I would keep in mind, however, that you want to archive to a platform that can keep costs low.  Remember this data is not unimportant, just not highly used.  Take into account your RTO and store the data on the most cost effective platform possible that also aligns to the business objectives.  This may be tape, it may be optical or it may be disk.  If it is disk, you want to store it on disk that is optimized for this type of data: optimized for capacity (deduplication, compression, single instancing), with low power and cooling costs, able to replicate for availability, and highly reliable.  You will also want to make sure that it is integrated to some extent with an application that lets you find the data pretty quickly when you need it and puts you further down the Road to Recovery.
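A rough way to see how much pressure archiving could take off a backup is to split a file tree into ‘active’ and ‘stale’ data and measure the stale share.  The sketch below is only an illustration: the function names and the 365-day threshold are my assumptions, and it treats the later of the access and modification times as ‘last touched’, which can mislead on file systems mounted without access-time updates.

```python
import os
import time


def split_for_archive(root, stale_after_days=365, now=None):
    """Return (active, stale) lists of (path, size) for files under root."""
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400
    active, stale = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # unreadable file or broken link: skip it
            # Assumption: "touched" means read or modified, whichever is later.
            last_touched = max(st.st_atime, st.st_mtime)
            target = stale if last_touched < cutoff else active
            target.append((path, st.st_size))
    return active, stale


def backup_reduction(active, stale):
    """Fraction of total bytes that would leave the backup stream if stale data were archived."""
    stale_bytes = sum(size for _path, size in stale)
    total_bytes = stale_bytes + sum(size for _path, size in active)
    return 0.0 if total_bytes == 0 else stale_bytes / total_bytes
```

Running `split_for_archive` against a file share and passing the result to `backup_reduction` gives a first estimate of how much of each full backup is data nobody has touched in a year.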

A Data Protection Reference Architecture – Part 1


This blog will have multiple parts.  I will introduce my view of a data protection reference architecture and the next few blog posts will talk to components of that architecture.

The other day I had a very interesting conversation with a colleague of mine in Australia.  He was looking for a data protection reference architecture that he could use to speak to his customer.  As you can imagine, having this conversation over the phone could prove to be a difficult challenge.  When the conversation began, my fear was he was looking for an ‘architecture’ diagram that included data protection appliances, backup servers, disk libraries, tape libraries and backup agents.  I quickly realized that this is an impossible conversation to have with him without knowing:

A)     the customer’s environment or challenges

B)      the customer’s business objectives

I find that most vendors don’t know A or B when speaking to a customer about their data protection ‘issues’, but they really should.  Having a more thoughtful conversation with customers in a consultative fashion is more relevant to customers in understanding their challenges and helping to align these challenges to the best possible solution.

I started my conversation with the diagram shown below (Figure 1): a simple triangle divided horizontally into 4 segments, with the middle two segments divided vertically in half.  Each segment represents different business objectives within a company.  As you go around the triangle, you can see that there are different technologies and different methodologies for attacking data protection challenges, which is why there is no longer a “one size fits all” approach when it comes to protecting data today.  Let’s face it; the two most important commodities in backup are time and capacity.  One of the primary drivers behind the type of protection that is used is the Recovery Point Objective or RPO.  Different technologies provide different RPOs, each at a different price point, and there are also different processes that can be applied to meet those RPOs.

Figure 1

What Happened in Vegas, Stayed in Vegas


Well, until now.  This is an interesting story about archiving and how it could have, but didn’t, help a friend of mine.

Often, when speaking with customers, I talk to them about the 4 fundamental principles of data protection:

  1. Assess
  2. Archive
  3. Backup
  4. Manage

The assessment phase is a multi-dimensional phase.  It’s about people, process and technology.  As with most things, the technology piece is the easy piece.  EMC has tools that allow us to scan file systems, databases and email systems and that report back a litany of information including but not limited to:

  • Number of files
  • Age of files
  • Volume of data
  • Owner of the data
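For file systems, the kind of report described above can be approximated with a short script.  To be clear, this is not EMC’s tooling, just a hypothetical sketch: the `assess` function and its report layout are invented for illustration, owners are reported as numeric user IDs, and age is based on modification time.

```python
import os
import time
from collections import defaultdict


def assess(root, now=None):
    """Summarize files under root per owner: file count, bytes and oldest file age in days."""
    now = time.time() if now is None else now
    report = defaultdict(lambda: {"files": 0, "bytes": 0, "oldest_days": 0.0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # unreadable file or broken link: skip it
            entry = report[st.st_uid]  # numeric owner ID (always 0 on Windows)
            entry["files"] += 1
            entry["bytes"] += st.st_size
            age_days = (now - st.st_mtime) / 86400
            entry["oldest_days"] = max(entry["oldest_days"], age_days)
    return dict(report)


if __name__ == "__main__":
    for owner, stats in assess(".").items():
        print(owner, stats)
```

A report like this is only the starting point; as described next, the hard part is deciding what the numbers mean to the business.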

Once EMC passes the information to the customer about their data, the real hard work begins.  Armed with the information, IT now has to go and speak to line of business managers in order to determine the value of the data, and how data of a specific value needs to be managed and protected.  The problem is line of business managers want everything saved forever, until IT tells them what the bill would be.  IT begins to describe the different ‘classes’ of service capabilities and line of business managers, who don’t really care about the details (not because they don’t care, they are just too busy), finally say “Just give me the highest level of protection I can get for the least amount of money.”  IT now does the best they can to align their perceived value of the data, to the most appropriate backup and archive capabilities they have.

Now, in Vegas, I think we can all agree that video surveillance has a ton of value to the stakeholders of the hotels and casinos.  With the amount of debauchery that takes place in Vegas and the amount of money that is ‘rolling’ around, it is important to ‘know what is going on’ and to make sure all situations can be handled as efficiently as possible.  This is where video surveillance comes into play: the more you ‘save’ on high speed disk, the easier it is to get to the truth or solve the mystery.

Road to 'Data' Recovery – 12 Steps


Hi, my name is Steve and I have a recovery problem.  Well, a data recovery problem that is.  So, I think it is about time that I apply the ’12 steps’ to help me with my data recovery problem.

Step 1 – It is time that I admit that I am powerless over my backup environment and my data protection world is unmanageable.

Step 2 – I have come to believe that there is a Technology greater than I that can help me restore (my sanity).

Step 3 – I have made a decision to put our company’s data and the process of recovery into the hands of a true data protection specialist.

Step 4 – I have helped to create a classified inventory of our company’s data.

Step 5 – I will admit to our CEO that I have failed at 63% of my recovery attempts costing the business $MMs.

Step 6 – I am prepared to have the new data protection administrator remove all of my defective technologies.

Step 7 – I will humbly ask ‘him’ to remove all of my failed processes.

Step 8 – I must make a list of all the people I have been unable to recover data for and be willing to try to restore their lost information.

Step 9 – I must make amends to all the people I have been unable to recover data for.

Step 10 – I must continue to take an inventory of all the tapes we have and promptly convert them to a newer technology to enable faster recovery.

Step 11 – I will seek out best-of-breed technology, partners and vendors to improve our company’s capabilities for daily operational recovery.

Step 12 – Having had this spiritual awakening as a result of these steps, I will carry this message to all IT administrators who are challenged with data recovery issues.

I believe that by following these 12 steps, I will have put our company back on… the Road to ‘Data’ Recovery.

Road to Recovery


Our domain, Backup & Beyond, was the tagline for Avamar Technologies, a company EMC acquired in November of 2006.  This tagline was very fitting from a data protection standpoint because Avamar utilized a traditional client / server architecture to protect data, but with a twist.  Avamar utilizes a more intelligent client-side agent that provides source-based, variable block deduplication to enable the most efficient backups available in the market for more than 80% of a data center’s data.  Avamar also leverages this same technology to replicate this data between disk-based backup targets, thereby dramatically reducing the reliance on tape.  This new technology, which has enabled new processes, is taking backup beyond.

The title of our blog, Road to Recovery – well, like every good title it is a play on words and trust me, as with every title it took us a while to come up with it.  That said, the industry has been talking about the fact that backup is really about recovery.  The same can be said for other data protection tools.  This is why our goal is to talk about methodologies (technologies and processes) that help you to recover data.  When IT professionals are polled, they often say that data protection (backup) is still the number one issue they have in the data center.  We say it is time to step up, admit it and start the ‘Road to Recovery’ when it comes to your data protection environment.

Let us know what your challenges are.  We are here for you as your support system, and we welcome your comments and questions.
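For readers who have not seen source-based, variable block deduplication up close, the general idea can be sketched as follows.  This is emphatically not Avamar’s algorithm, which is not described here; it is a generic content-defined chunking sketch using a simple ‘gear’ rolling hash, and the names `chunk` and `backup`, the size limits and the dictionary standing in for the backup server are all assumptions.  Chunk boundaries are chosen from the content itself, and the client sends only chunks whose fingerprints the server has not already seen.

```python
import hashlib

# One deterministic pseudo-random 32-bit value per possible byte value.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big") for b in range(256)]


def chunk(data, min_size=64, avg_bits=8, max_size=1024):
    """Split data into variable-size chunks at content-defined boundaries."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Rolling hash: older bytes shift out of the 32-bit value over time.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= min_size and (h >> (32 - avg_bits)) == 0
        if at_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def backup(data, server_index):
    """Send only chunks the server has not seen; return the number of bytes sent."""
    sent = 0
    for piece in chunk(data):
        fingerprint = hashlib.sha256(piece).hexdigest()
        if fingerprint not in server_index:
            server_index[fingerprint] = piece
            sent += len(piece)
    return sent
```

Backing up the same data a second time sends nothing, because every fingerprint is already known.  Because boundaries follow the content rather than fixed offsets, a small insertion typically disturbs only the chunks around it, which is what makes the approach attractive at the source.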
