Category: Cloud

A Blueprint for Primary Storage Optimization


During the past three to four months the storage industry has seen a spike in the number of reports, white papers and news articles surrounding the evolution of one primary storage technology: capacity optimization (it is 2010’s Hottest Storage Technology).

This technology is getting a lot of ‘air play’ these days because it is so critical to controlling the growth and cost of storage. In 2010 the EMC-sponsored IDC report The Digital Universe Decade – Are You Ready? was released and stated that:

  • In 2009, amid the “Great Recession,” the amount of digital information grew 62% over 2008 to 800 billion gigabytes (0.8 Zettabytes).
  • The amount of digital information created annually will grow by a factor of 44 from 2009 to 2020…
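To put those two figures in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch, and it assumes the factor of 44 applies to the 0.8 zettabyte figure quoted for 2009 and that growth compounds evenly over the 11 years; the variable names are mine, not IDC’s.

```python
# Figures quoted from the IDC report above.
base_zb = 0.8           # zettabytes of digital information in 2009
growth_factor = 44      # growth from 2009 to 2020
years = 2020 - 2009     # 11 years

# Assumption: the factor of 44 applies to the 2009 figure.
projected_zb = base_zb * growth_factor

# Assumption: growth compounds at a constant annual rate.
annual_growth = growth_factor ** (1 / years) - 1

print(f"Projected volume in 2020: {projected_zb:.1f} ZB")
print(f"Implied compound annual growth: {annual_growth:.0%}")
```

Under those assumptions the digital universe lands at roughly 35 zettabytes in 2020, which works out to growth of about 41% every year for more than a decade.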

The folks at Wikibon also released an infographic that exposes the true explosion of data.

Information Explosion & Cloud Storage
Via: Wikibon

When you combine storage capacity (and the footprint it takes up) with the power it takes to run and cool it and the people it takes to manage it, you soon realize we cannot keep ‘just adding more cheap disk’ to meet storage demands. High-tech companies with high-tech labs are also telling IT that ‘they are out of tricks’ when it comes to continuing to deliver disk drives that double in capacity every 18 months. It is for these reasons that primary storage optimization technologies have stepped into the limelight: they serve as a means to help control the growth of primary storage, including the footprint, power, cooling and manpower required to manage it.

However, as we all know in IT, no two environments are the same and what may be good for one may not be good for another. When looking at primary storage optimization, there are a number of available technologies and a number of ways to deploy them. The key question is what is right for ‘my’ environment.

Comprehensive Capacity Optimization – Deduplication 2.0


Technology is great, isn’t it? When someone thinks they have a new idea on the same old technology foundation they call it “X 2.0”. I have been watching the banter between analysts and vendors (specifically NTAP’s Dr. Dedupe and Permabit’s CEO Tom Cook) on the topic of Deduplication 2.0 for the past few weeks, and it is my belief that the proverbial boat is being missed (since we are using water analogies), so I decided I have to jump in. I find the real value of these conversations is the value to the end user. At the end of the day, it doesn’t really matter who ‘coined’ or ‘invented’ a term (like deduplication 2.0). What does matter is whether the term actually helps describe a technology and how that technology can be leveraged to make things better in the data center. We should focus on the implications of this new generation of deduplication, ‘deduplication 2.0’.

In May I delivered a presentation to a number of EMC customers on the topic of Data Deduplication 2.0 – Comprehensive Capacity Optimization. The point of my presentation was simple (and keep in mind this was before the Data Domain acquisition): there are a number of capacity optimization technologies and capabilities available to customers today. Originally these deduplication technologies were used primarily for backup, but deduplication is slowly making its way into primary storage. Deduplication in primary storage makes a lot of sense FOR DATA THAT IS STATIC. Why only static data? Static data is data that isn’t used frequently (that doesn’t mean it’s not important, it simply is not accessed often). Because access to this data is infrequent, its performance requirements are lower than those of active data. Remember, nothing in IT is free. If I deduplicate data, I must ‘rehydrate’ it in order to use it, and that has a performance implication, so I want to be careful where I deduplicate data so as not to inhibit performance on production data.
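To make the deduplicate/rehydrate trade-off concrete, here is a minimal sketch in Python. It is an illustration only and not how any EMC, NetApp or Permabit product works: it assumes fixed-size chunks and SHA-256 fingerprints, and the names `CHUNK_SIZE`, `deduplicate` and `rehydrate` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen for illustration


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into chunks, keep one copy of each unique chunk in
    `store`, and return the list of fingerprints needed to rebuild it."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # stored only the first time
        recipe.append(fingerprint)
    return recipe


def rehydrate(recipe: list, store: dict) -> bytes:
    """Rebuild the original data: one lookup per chunk on every read."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


# Four chunks are written, but only two distinct chunks are stored.
store = {}
data = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
recipe = deduplicate(data, store)
restored = rehydrate(recipe, store)
```

The capacity saving comes from `store` holding each unique chunk once. The cost shows up in `rehydrate`, where every read has to chase a fingerprint per chunk before the data can be used, which is why rarely accessed static data is the more comfortable fit.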

A Data Protection Reference Architecture – The Final Chapter


The Architecture

This ‘architecture’ diagram, as you can see, is not a typical architecture diagram, but hopefully it can be used to align your business and business objectives with the technologies that are available and can best be applied to solve your issues, helping to balance cost, complexity and compliance.

This diagram can also be used to do a couple of other things. It can help you begin to classify your data and align your data to your business objectives. It also lets you begin to identify which data or data services in your environment may be more important to you than others and, based on this, helps you choose areas you may want to outsource or move to the cloud.

As you can tell, there really is not one solution for meeting all your data protection needs. The challenge comes with managing multiple solutions in an effort to meet your business objectives. While there are only a few technologies available that allow you to manage your environment across all your RPOs and RTOs, it is important that I point out that EMC’s NetWorker is able to do this, centralizing your data protection infrastructure for ease of management. It allows you to manage traditional backup, source-based deduplicated backup with Avamar, CDP with RecoverPoint, as well as the EMC disk libraries and tape where the data is stored. Now, I am not saying that NetWorker solves all of your data protection challenges, nor am I suggesting that replacing one traditional backup technology with another is the right answer. What I am saying is that if you’re looking to have all the functionality required to meet all your business objectives and you want easier management, NetWorker is one avenue to get you there. Additionally, the underlying image of the triangle represents data protection management. Putting all the new technology in place is one thing; managing it and ensuring you are now meeting your business needs is another. EMC’s Data Protection Advisor can help here as well.

A Data Protection Reference Architecture – Part 1


This blog will have multiple parts. I will introduce my view of a data protection reference architecture, and the next few blog posts will cover the components of that architecture.

The other day I had a very interesting conversation with a colleague of mine in Australia. He was looking for a data protection reference architecture that he could use to speak to his customer. As you can imagine, having this conversation over the phone could prove to be a challenge. When the conversation began, my fear was that he was looking for an ‘architecture’ diagram that included data protection appliances, backup servers, disk libraries, tape libraries and backup agents. I quickly realized that this is an impossible conversation to have with him without knowing:

A)     the customer’s environment or challenges

B)      the customer’s business objectives

I find that most vendors don’t know A or B when speaking to a customer about their data protection ‘issues’, but they really should. A more thoughtful, consultative conversation does more to surface a customer’s challenges and to align those challenges with the best possible solution.

I started my conversation with the diagram shown below (Figure 1): a simple triangle divided horizontally into 4 segments, with the middle two segments divided vertically in half. Each segment represents different business objectives within a company. As you go around the triangle, you can see that there are different technologies and different methodologies for attacking data protection challenges, which is why there is no longer a “one size fits all” approach when it comes to protecting data today. Let’s face it: the two most important commodities in backup are time and capacity. One of the primary drivers behind the type of protection that is used is the Recovery Point Objective, or RPO. Different technologies provide different RPOs, each has a different price point, and there are different processes that can be applied to attack RPOs.

Figure 1

EMC World Kicks Off with Clouds and Virtualization


EMC World kicked off this morning, first with a presentation from yours truly on Data Deduplication 2.0 – Comprehensive Capacity Optimization. We discussed how data deduplication 1.0 is morphing into all areas of EMC’s storage ecosystem in order to optimize capacity everywhere. I talked about how data deduplication, single instancing and compression are the technology components that will help EMC achieve this goal.

Next, Joe Tucci spoke in his keynote about how data deduplication and compression are key technologies for the data center of the future and how these technologies will aid in delivering a more efficient cloud computing strategy. Not only will these technologies help in building out a cloud infrastructure, they will also help to protect a cloud infrastructure (which is what we are all about here).

Finally, Paul Maritz gave his keynote on how the virtual infrastructure will help to fulfill the goals of a private cloud. He also argued that it is time to invest in software and people rather than hardware, as VMware continues to drive value into its software to help make your data center better, smarter, stronger and faster for less.

Each of these initiatives will have an impact on how data is stored and ultimately protected, but new storage services will enable more efficient storage and protection across the virtual data center and the cloud, and ultimately take backup beyond and put you on the road to recovery.

Stay tuned for more updates about the show.
