Who Will Cache in on Cloud Storage?

Stacey Higginbotham, Saturday, April 26, 2008 at 12:02 AM PT

As data moves into the cloud, storage companies are taking advantage of virtualization and adding more memory to the data center. Techniques such as storage virtualization can improve the usage of existing storage hardware and make provisioning easier, while adding memory to the data center can make accessing information faster.

Many companies are evaluating their use of memory in the data center as they try to strike a balance between easily accessible cache memory powered by flash and slower-to-access disk memory powered by hard drives. At the same time, they’re trying to make their storage easier to provision and more reliable by looking at some form of virtualization. Both trends will change the dynamic for large storage vendors in the years to come.

As you move along the storage technology continuum, you’re trading price for speed. Getting information stored on tape, which is cheap, can take hours or days, while accessing something on flash, which costs a pretty penny, takes microseconds. Plus, solid-state drives using flash can’t possibly store all of the data people are creating. There’s also the question of how reliable flash is.
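To put a rough number on that trade-off, here is a minimal sketch of how a fast tier in front of a slow one changes the average access time. The latencies and the hit ratio are hypothetical placeholders chosen for illustration, not measurements from any vendor mentioned here.

```python
def expected_access_time(hit_ratio, fast_seconds, slow_seconds):
    """Average time per read when a fraction of reads is served by the fast tier.

    hit_ratio    -- share of reads answered by the fast tier (0.0 to 1.0)
    fast_seconds -- assumed latency of the fast tier (e.g. flash cache)
    slow_seconds -- assumed latency of the slow tier (e.g. spinning disk)
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0.0 and 1.0")
    return hit_ratio * fast_seconds + (1.0 - hit_ratio) * slow_seconds


# Hypothetical figures: 100 microseconds for flash, 10 milliseconds for disk.
FLASH_SECONDS = 1e-4
DISK_SECONDS = 1e-2

if __name__ == "__main__":
    for ratio in (0.0, 0.5, 0.9, 0.99):
        avg = expected_access_time(ratio, FLASH_SECONDS, DISK_SECONDS)
        print(f"hit ratio {ratio:.2f}: {avg * 1000:.3f} ms per read")
```

Under these assumed numbers the slow tier dominates the average until the hit ratio gets high, which is one way to see why keeping the right data in the fast tier matters more than buying a lot of it.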

Given this, most companies requiring huge storage arrays rely on expensive machines from the likes of EMC or HP. Or they make their own “storage cloud” using commodity disk drives and a proprietary layer of software. Because that software layer lets companies allocate and provision the storage, it effectively virtualizes the storage array. It’s essentially the same model that underpins the storage services offered by Amazon S3 and Nirvanix.

Meanwhile, tier-one storage equipment vendors such as EMC, IBM and HP have recognized that cloud storage is the future of computing, and are attempting to ride that wave without cannibalizing their high-margin box business. For example, EMC is offering services for SMBs through its Mozy acquisition. IBM last year purchased XIV, which makes software that can be used to virtualize storage. Large companies such as NetApp and 3Par are attempting to virtualize storage as well.

But once the cloud is in place, there’s still the issue of calling up data and delivering it relatively quickly. For certain applications, such as those requiring instantaneous access to large quantities of data like seismic graphing or historical financial analysis, cloud storage may never replace a spinning drive connected to a server via Fibre Channel.

But for many applications, including media delivery and most application delivery, tweaking storage for the cloud means adding faster cache memory or optimizing the storage infrastructure by geographic location. Nirvanix, the startup providing hosted storage in competition with Amazon’s S3, touts its multiple storage clusters as a way to deliver faster access to stored content. It’s also looking to provide nodes on the customer premises called “NAS heads” that will basically allow frequently called-up “hot data” to be stored there.
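The idea of keeping “hot data” close to the customer can be sketched as a small least-recently-used cache sitting in front of a slower remote store. This is an illustration only, not how Nirvanix or any other vendor actually builds it; the class name `HotDataCache`, the `fetch_from_cloud` callback and the file names are all hypothetical.

```python
from collections import OrderedDict


class HotDataCache:
    """Tiny LRU read-through cache in front of a slower backing store."""

    def __init__(self, fetch_from_cloud, capacity=2):
        self._fetch = fetch_from_cloud  # hypothetical slow remote read
        self._capacity = capacity       # how many objects fit locally
        self._items = OrderedDict()     # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            # Hot data: serve locally and mark as most recently used.
            self._items.move_to_end(key)
            self.hits += 1
            return self._items[key]
        # Cold data: go to the remote store, then keep a local copy.
        self.misses += 1
        value = self._fetch(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict least recently used
        return value


def demo():
    # A dict stands in for the remote storage cluster.
    cloud = {"a.mp4": b"A", "b.mp4": b"B", "c.mp4": b"C"}
    cache = HotDataCache(cloud.__getitem__, capacity=2)
    for name in ["a.mp4", "b.mp4", "a.mp4", "c.mp4", "a.mp4", "b.mp4"]:
        cache.get(name)
    return cache.hits, cache.misses


if __name__ == "__main__":
    print(demo())
```

In the demo, the file that keeps getting requested stays local while the others are fetched remotely and cycled out. A real appliance would also have to deal with writes, invalidation and objects larger than the cache, which this sketch ignores.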

Alternatively, or possibly in conjunction with such a setup, a customer interested in amping up the speed of cloud storage might buy equipment from startups providing different levels of cache to aid in hasty data retrieval. We’ve covered some before, such as Atrato, which offers a box of disks attached to a controller that runs software designed to access and configure the hundreds of spinning disks. The result is the reliability of spinning disks with a faster information retrieval speed. Others that rely strictly on intelligently routing needed data to cache include Gear6 and Xiotech Corp.

Storage being served via the cloud is a foregone conclusion. It only remains to be seen if a startup like Nirvanix can grow to compete with the big players in storage or hosted computing, and how the larger storage vendors will walk the line of creating cloud products without jeopardizing their hardware business.

A far more interesting trend to watch will be how the growing amount of stored data is kept and delivered in the least amount of time. For proof that storage is relevant, check out Facebook’s hardware. A little more than 8% of their servers are devoted to the distributed caching system, memcached. The entire purpose of those servers is to speed delivery of information for the social network. In this age of instant gratification, we may find that cache is king.

5 comments so far

April 26th, 2008
8:33 AM PT
Deppy said:

Hah…”cache is king”…nice piece, Stacy.

April 26th, 2008
10:47 AM PT
dave said:

you know, for some very interesting insights into all of this, perhaps consider a fup piece looking at the new world of database startups - nothing for years and then out of nowhere a group of companies get funded including luminaries like stonebraker (vertica), tan at greenplum, and so on…they’re all looking at how data stores scale for the web, the basic idea being that these “things” that were built 25 years ago for banks really no longer make sense.

and i’m not just talking about interesting stuff like swiveldb and dabbledb, i also mean oracle, microsoft and ibm and hp…

funny, business week just has a piece where hp’s own cto could not explain to the journalist reporting what “cloud” actually means! very comical, but darkly comical, right?

April 26th, 2008
11:00 AM PT
Stacey Higginbotham said:

@Dave, good idea. As for HP’s CTO, I would say that clouds are nebulous :)
There’s even been some debate in our comment sections about defining the cloud.

April 26th, 2008
11:59 AM PT
Nati Shalom said:

“In this age of instant gratification, we may find that cache is king”

Another option is combining a pure in-memory storage, a.k.a. Data Grid, and using SimpleDB/S3 as back-end storage. In this way the application can benefit from the speed of memory storage while keeping the data backed by low-cost storage, and even get better scaling and performance than high-end storage devices. See an example of such a project here:
(link)

April 26th, 2008
7:02 PM PT
snyggast said:

nirvanix vs s3. can’t seem to decide. but check this out

When you’re really pushing traffic, Amazon S3 is more expensive than a CDN

(link)
