Tag: "Deduplication"

Fixed Input vs. Variable Input Compression


As a number of you know, I have been blogging about the merits of Real-time Compression.  It may interest you to know that when Ed Walsh, CEO of Storwize, asked me to join and told me the company focused on "compression", I first thought he was joking.  After all, the industry has had compression available for years.  The reality is that no other vendor offers a technology like Real-time Compression, and it is even clearer today why IBM chose to own it.  In the next few blog pieces I plan to cover some of the concepts behind the IP that put this technology so far ahead of all of its competition.

Today's piece is about fixed input versus variable input compression, which is really a simple concept.  Traditional compression uses a process called 'fixed input' / 'variable output'.  In the diagrams below, the original file is on the left and the compressed file is on the right.  Traditional compression works like this (you can actually watch it happen on your home computer when you zip a file with WinZip): the compression algorithm 'chunks up' the original file into 'manageable' sizes before it compresses the file.  As with almost anything in computer science, the reason for this is a tradeoff of performance against optimization.  The first diagram shows the large file being 'chunked up', compressed and stuffed into the smaller file.

Figure 1: the original file (left) is 'chunked up', compressed and stuffed into the smaller file (right)

There are two significant issues with this.  The first is that the compression dictionary is not shared across ‘chunks’ during compression.  The example in Figure 2 shows that the letter “F” in ‘chunk’ 1 is not compressed against the letter “F” in ‘chunk’ 4.  As a result, the compression ratio is not optimized across the whole file.
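To make the cross-chunk effect concrete, here is a minimal Python sketch that uses the standard-library zlib (DEFLATE) as a stand-in for a 'traditional' compressor. It is not IBM's Real-time Compression or any vendor's implementation; the 1 KiB chunk size and the repeated-block test data are assumptions chosen so that, like the “F” in chunk 1 and chunk 4, the same content shows up in several chunks. Compressing each chunk separately means every chunk starts with an empty history, while compressing the whole buffer lets later data refer back to earlier data (DEFLATE can look back at most 32 KiB). It needs Python 3.9 or later for `randbytes`.

```python
import random
import zlib


def chunked_compress(data: bytes, chunk_size: int) -> list[bytes]:
    """'Fixed input' / 'variable output': split the input into fixed-size
    chunks and compress each one independently. Each zlib.compress call
    starts with an empty history, so nothing seen in one chunk can be
    reused in another."""
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def whole_compress(data: bytes) -> bytes:
    """Compress the input as one stream, so later bytes can refer back to
    earlier ones (within DEFLATE's 32 KiB window)."""
    return zlib.compress(data)


# Hypothetical test data: one 1 KiB block of pseudo-random bytes repeated
# 16 times, so every 1 KiB chunk holds the same content.
block = random.Random(42).randbytes(1024)
data = block * 16

chunks = chunked_compress(data, chunk_size=1024)
chunked_total = sum(len(c) for c in chunks)
whole_total = len(whole_compress(data))

# Each chunk can be decompressed on its own.
assert b"".join(zlib.decompress(c) for c in chunks) == data

print(f"original:       {len(data)} bytes")
print(f"chunk by chunk: {chunked_total} bytes")
print(f"single stream:  {whole_total} bytes")
```

On this data the chunk-by-chunk total should come out many times larger than the single-stream result, because every independently compressed chunk has to store its block as literals again instead of pointing back to an earlier copy.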


Storage Efficiency Panel – SNW 2011 Fall


Yesterday I was on a panel at SNW in Orlando, Florida.  The panel was hosted by Dave Vellante, Founder of Wikibon, who is always a great host for these kinds of things.  On the panel were Larry Freeman of NetApp, Craig Nunes of HP (formerly 3Par), Jered Floyd, CTO / Founder at Permabit, and myself, IBM (formerly Storwize).

Some interesting data came out of this panel.  With probably over 150 people in the audience, it was a well-attended session, and Dave is VERY good about asking the audience questions.  First, let me make sure we all know where each panelist sits at the “storage efficiency table”:

  • Larry Freeman is from NetApp; they claim, and I believe them, that they have 10 storage efficiency technologies embedded in WAFL
  • Craig Nunes' main focus on the panel was ‘zero reclamation’ to optimize storage
  • I am beating the Real-time Compression drum
  • Jered Floyd focuses on data deduplication

Here are some of the questions Dave asked the audience, with their responses (approximate percentages):

  • How many people use deduplication / compression in their storage? About 60% said they use one or both of these technologies in their environment.
  • Do users run these technologies embedded or as an appliance? 100% of that 60% said "embedded".
  • Which storage vendor provided these technologies? 100% of that 60% said NTAP (NetApp).
  • What is the number one issue with the embedded solution that keeps it from being more widely adopted? Performance.  They all believed that for 70% of their applications the embedded solution was “good enough”, but for the 30% where performance is critical, it couldn’t do the job.
  • Why aren't more appliances deployed to solve the performance issues? Customers didn’t want to manage multiple solutions in their environment that do the same thing.

Storage in Eastern Europe



Today I begin a 12-day trip to Eastern Europe to talk about IBM Storage.

The trip will take me to:

  • Moscow, Russia
  • Warsaw, Poland
  • Prague, Czech Republic
  • Ljubljana, Slovenia
  • Umag, Croatia

In Russia, on September 6, I will be at the Information Infrastructure Conference, and the following day I will be meeting with customers to discuss storage and storage efficiency.

In Poland on September 8, I will be presenting IBM’s Real-time Compression at Storage University.

In Prague I will be meeting with the press as well as speaking with customers.  Additionally, I will be spending the weekend in Prague, a city I have always wanted to visit.

In Slovenia on September 14, I will be presenting at IBM’s Innovation Center at an IBM Solutions Event.

Finally, in Croatia on September 15, I will be at the IBM Forum, the largest IBM event in Croatia.

In each location, I will be speaking with partners and customers about IBM’s innovation in storage, storage efficiency and Real-time Compression.  I am looking forward to learning what the largest storage challenges across Eastern Europe are and how users go about solving them.  Additionally, I will be doing some local enablement for our partners and sellers.

I will blog from each location, covering the professional part of my travels as well as, hopefully, one personal event.  I have tried to make sure that in each city I have time to do one interesting thing.  I don’t know when, if ever, I’ll be back to these cities, and these are places I have always hoped to go.  Too often we travel and it’s all business.

Also, stay tuned: when I land I will have an update from my trip to VMworld.  It was fantastic, truly the best end-user show around.  I learned a great deal and can’t wait to share some of what I saw.  As always, comments are welcome.


Storage Efficiency Spotlight at VMworld


VMworld Live 2011
Via: Wikibon


Efficiency vs. Optimization


“Storage Efficiency” has become a big topic over the past 12 months.  A number of technologies introduced in the last few years are helping IT deal with storage growth.  We all know that data is the root of the decisions that drive business today: the more data you have, hopefully, the better decisions you can make to drive your business to success.  The question is, “What is the value (and hence the cost) of the infrastructure needed to create that success?”  What we do know is that the ability to put more data in a highly efficient footprint can give your company a competitive edge.  Five technologies can help an IT organization create an efficient storage infrastructure:

1) Tiering
2) Virtualization
3) Thin Provisioning
4) Compression
5) Deduplication

When talking about storage efficiency, it is also important to distinguish between efficiency technologies and optimization technologies.  Defining the two terms helps us pick the right solutions for what we are trying to accomplish.  For the purpose of this post, efficiency means making existing capacity more useful, and optimization means making more capacity out of existing capacity.

Using these definitions, Tiering, Virtualization and Thin Provisioning are efficiency technologies: they help you make better use of the capacity you already have.

Tiering is a technology that is used on about 10% of your data or less.  It moves data that requires higher performance to flash storage.  Good tiering technology analyzes data access patterns and moves the most active data to the highest-performing disk.  It doesn’t really change the amount of physical capacity required; it just changes the type of capacity required and lets IT make sure data is served as fast and efficiently as possible.
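The access-pattern analysis described above can be sketched in a few lines of Python. This is a hypothetical illustration, not any product's tiering algorithm: it assumes the array divides capacity into numbered extents, that an access log records one extent ID per I/O, and that flash can hold 10% of the extents (echoing the rough figure above). The function name `plan_tiers` and its parameters are made up for the example.

```python
from collections import Counter


def plan_tiers(access_log: list[int], num_extents: int,
               flash_fraction: float = 0.10) -> dict[int, str]:
    """Assign each extent to 'flash' or 'hdd' by access frequency.

    access_log: hypothetical list of extent IDs, one entry per I/O.
    flash_fraction: share of extents that fit on flash (assumed 10%).
    """
    counts = Counter(access_log)  # extents never accessed count as 0
    # Rank every extent, including never-touched ones, hottest first.
    ranked = sorted(range(num_extents), key=lambda e: counts[e], reverse=True)
    flash_slots = int(num_extents * flash_fraction)
    hot = set(ranked[:flash_slots])
    # Every extent is placed somewhere: total capacity is unchanged,
    # only the type of capacity each extent lives on differs.
    return {e: ("flash" if e in hot else "hdd") for e in range(num_extents)}


# Hypothetical workload: 20 extents, with extents 3 and 17 doing most of the I/O.
log = [3] * 50 + [17] * 40 + [5] * 2 + list(range(20))
tiers = plan_tiers(log, num_extents=20)
print(sorted(e for e, t in tiers.items() if t == "flash"))  # [3, 17]
```

Every extent still gets a home on one tier or the other, which matches the point above: tiering changes what type of capacity data sits on, not how much physical capacity is needed. This sketch deliberately ignores recency and the cost of migrating data between tiers.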


Data Protection, Retention and Archive Starts with Data Value


 It feels good to open up the blogging again to new topics, especially ones I am intimately familiar with.  (But have no fear, there will be references to primary storage optimization / compression.)

This weekend I had an interesting conversation with my Dad about backup.  My dad basically runs IT for the State of Maine, which uses CommVault backup software.  So I posed the question to him: “What would it take for you to rip out CommVault and replace it with another solution?”  He thought about it for a moment and replied, “I wouldn’t.”  His answer came down to two reasons.

First was the expense.  It’s not just the cost of buying the new software; it’s also training people to run it, throwing away the massive investment in the existing product, and converting all the years of backup tapes created with the old software to the new software.  This is one of the biggest things vendors forget when trying to sell a customer on their backup software.

Second was the fact that, feature for feature, the top 5 traditional backup software products are not really that different from one another.  Sure, some products have features that others don’t, and some features work better in one product than in another, but in reality the delta is so small and the workarounds are so simple that it doesn’t really matter.  Unless you’re replacing traditional backup software with evolutionary source-based data deduplication software (which is only applicable to some environments), there is no advantage to switching.

The challenge: if data protection is still one of the biggest and most expensive pain points within IT, how do those problems get resolved when the software controlling it all is too costly to replace?


Real-time Compression “Meets Minimum”


IBM's Ed Walsh, Director of Storage Efficiency, sits down with Steve Duplessie, Founder of ESG, to talk about how IBM Real-time Compression sets the bar for storage optimization in NAS.  At the end of the day, if you can do compression in real time without sacrificing performance or the transparency of the implementation, why wouldn't you, given the savings you can get over traditional compression?

We all know compression is not new, and it is becoming a standard feature in a number of storage systems.  The issue is that each of these implementations has a significant impact on performance: both primary storage performance and the performance of back-end operations such as backups and replication.

IBM's Real-time Compression doesn't have any of these limitations; listen to Ed to hear more.


Key Competitive Advantages to IBM Real-time Compression


It still baffles me that, with so much information available for people to learn about any topic, it so often goes unused.  Many times people just rely on the information provided by their employer (which in many cases is just competitive FUD).  This video was the result of reading an email between IBM and one of its key partners about their competitive knowledge of each other's products.


Storage Alchemist Video Update #2


See how data deduplication and IBM Real-time Compression work hand in hand.


LinkedIn Storage Discussion on Storage Efficiency


Great conversation on LinkedIn about deduplication and compression for storage efficiency in the Data Storage Professionals Group.  Help the storage community answer this question:

Does anyone have any experience with NAS deduplication at the filesystem level, like NetApp's? Does it really work? Any concerns/limitations?
