Elements to Architecting Database Services into an All-Flash Cloud Infrastructure (part 2)

In part 1 we saw that the next generation of flash-based storage is less about raw performance and more about coupling that performance to the storage features needed to architect a cloud. In this post we look at building out database services in our cloud. The critical thing to understand here is that many flash vendors provide high IOPS, but not all of them provide critical resiliency and storage features. In part 3 we will discuss some of the architectures around virtualization.

Let’s start with a common pattern architecture used for virtualization, databases and middleware. We can aim for using a high performance network as the backbone.

[Image: gen_arch1]

Database. Let’s start by looking at databases. Suppose we are aiming to build something that can compete against an Oracle Exadata. We can use SQL Server or Oracle Database; both work nicely with flash-based arrays. In this example, we will use Oracle 11g Database Enterprise Edition and RAC running on the Dell R910 server. For reference, you can look at:

[Image: dellarch1]

Because the architecture must support an always-on Oracle RAC, the arrays must have full redundancy and non-disruptive upgrades (NDU). Redundancy is important because a single failure should not take down your array, your data and, with them, your applications. NDU, in turn, allows you to upgrade the system without taking the array out of service. For this exercise we can select the Pure Storage (FA-400-based) array. There are a number of other arrays that we could use in this example, like NetApp’s EF540 or HDS’s flash array. The key thing here is not to trade IOPS for possible data corruption or outages, but to couple performance with non-disruptive upgrades and operational resilience/redundancy. In our hypothetical architecture below we get high-performance compute, fast networking, overall redundancy, array compression and NDU.

[Image: archO5]

Non-Disruptive Upgrades. We could choose other server hardware, such as HP servers, but for this hypothetical example we use the Dell R910. The Pure Storage array gives the architecture high performance while at the same time providing non-disruptive upgrades. NDU is a critical enterprise feature. Let’s take a quick peek at what this NDU feature provides in this video:

It should be pointed out that the array’s redundancy allows non-disruptive capacity expansion, controller upgrades, hardware replacements and software updates, all with virtually no performance impact.

One simple approach to increasing the performance of Oracle Database is to put the redo logs on flash storage. You can see what happens in this example:

[Image: PureO]

Now let’s look at the advantages of running Oracle itself on flash. Pure Storage highlights these advantages:

[Image: PureO2]

In this post I have highlighted the advantages of running Oracle’s database on flash, but only together with the key ingredients of hardware redundancy and non-disruptive upgrades.

In a future post we will dive further into Oracle’s database running on flash.

 
 


Go to more posts on storage and flash storage blogs at http://digitalcld.com/cld/category/storage.


 

Eight Elements to Architecting An All-Flash Cloud Infrastructure

The emphasis on ‘all-flash’ is somewhat misplaced. Hybrid solutions make a lot of sense from a transition point of view and from a price standpoint; not all storage has high performance requirements, for example. Many vendors are talking about an all-flash cloud (storage) infrastructure, and in this post I will reference a number of companies offering some key features. Some vendors only like to talk about performance. The question is how we take advantage of the performance that is available from dozens of new flash startups and older storage companies offering flash-based technologies. Performance is only one part of the equation. Many companies want to make performance the main part of the discussion because they are simply trying to sell flash-based products and performance is the part they have. Instead of simply incorporating a flash-based solution whether it makes sense or not, let’s look at what we really need for cloud infrastructure. The reality is that flash alone is often not the whole answer. If the flash array does not solve key problems and does not provide the critical features that a flash-based cloud infrastructure will need, then moving to flash without careful consideration may cause as many problems as it solves.

Let’s start simply and abstractly. What is the chance that this all-flash cloud will have highly multi-tenant virtualization? Very, very high. It doesn’t matter whether we are talking about KVM, VMware, OVM, Solaris Zones or any other virtualization: with lots of tenants comes competition for CPU, memory and also storage, in the form of both capacity and IOPS. It is all about quality of service for each tenant. Such a cloud needs resource management for CPU and memory to keep run-away applications from consuming huge amounts of both. VMware and other virtualization solutions have usually done a reasonable job of providing these resource controls. It shouldn’t stop there and it hasn’t. vStorage APIs for Array Integration (VAAI) and VMware Virtual SAN are other aspects of the solution.

First, resource management for storage, in the form of quality of service that assures IOPS to the many tenants, is just as critical. IOPS starvation, or the notion of ‘noisy neighbors’, happens in virtualization. To avoid that, a modern flash storage array should have a built-in quality-of-service feature that offers threshold settings for minimum IOPS, maximum IOPS and burst IOPS. Any vendor that suggests that quality of service is not needed is not really serious about how a cloud offers resources to its many tenants. Within this context, consider that in a cloud or cloud hosting environment IOPS thresholds can not only be guaranteed, they can also be monetized. Two companies, SolidFire and NetApp, attack the quality-of-service problem head-on.

Second, within a highly virtualized cloud setting there are many duplicate images, so native deduplication within the array is not an option, it is a necessity. Native deduplication keeps redundant copies of images from being stored and wasting considerable storage. Not having this feature is great for the storage vendor but not so great for the cloud provider. A number of storage vendors that sell flash solutions into virtualized and cloud environments offer this; Nutanix, Skyera and SolidFire are three examples.

Third, compression and encryption are important, and it is hard to find companies that don’t offer these features. Compression reduces the storage being used and can be applied to databases, applications and seldom-used files. Encryption can be seen as a method of providing security for data.

Fourth, many vendors sell islands to put your data on. That is, they sell arrays that cannot be clustered and then sell a monitoring and management system to manage them. In reality, this doesn’t change the fact that each array acts alone as an island. What a cloud needs is the ability to add arrays or nodes that aggregate into scale-out clusters, which can not only be managed and monitored together but also present the storage in the cluster as one single instance with capacity measured in petabytes.

Fifth, thin provisioning is an important offering that allows on-demand storage as opposed to storage fully allocated up front. Thin provisioning of both virtualization and storage (not to mention other resources) keeps allocated but under-utilized storage to a minimum.

Sixth, replication, snapshots and clones offer a way to do a fast point-in-time copy.

Seventh, one key feature that mostly enterprises use today, but that will increasingly be important to cloud environments, is the ability to do a non-disruptive upgrade (NDU) on the entire storage array. Any vendor that uses the word “enterprise” would normally have this, but some don’t. Imagine an array software upgrade that requires you to take an outage: offload the data from the array, upgrade the array’s software and then copy your data back onto the array. Unimaginable? It’s a real example. Examples in the opposite direction, where NDU allows for no downtime, are NetApp, Hitachi and Pure Storage. Pure Storage, for example, references non-disruptive everything (ND*): it supports non-disruptive maintenance, expansion and upgrading of everything without a performance impact. Other vendors, like Hitachi with its newest flash arrays, have this NDU feature as well. Non-disruptive upgrades also imply no single point of failure, which these vendors handle well.

Finally, offering block storage is common, but not all flash arrays offer the key file protocols NFS and SMB. Both are pervasively used.
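To make the thin-provisioning element above a little more concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not any vendor’s implementation: the `ThinPool` class, its method names and the block counts are all hypothetical. The idea it illustrates is that a volume advertises a logical size up front, while physical blocks are only consumed when a block is first written.

```python
class ThinPool:
    """Toy thin-provisioned pool (hypothetical names, for illustration only)."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.volumes = {}  # volume name -> logical size in blocks
        self.mapped = {}   # (volume name, logical block) -> data

    def create_volume(self, name, logical_blocks):
        # No physical space is reserved here; that is the point of thin provisioning.
        self.volumes[name] = logical_blocks

    def write(self, name, block, data):
        if not 0 <= block < self.volumes[name]:
            raise IndexError("block outside the volume's logical size")
        key = (name, block)
        # Physical space is only consumed on the first write to a block.
        if key not in self.mapped and len(self.mapped) >= self.physical_blocks:
            raise RuntimeError("pool is out of physical space")
        self.mapped[key] = data

    def provisioned(self):
        # Total logical blocks promised to tenants.
        return sum(self.volumes.values())

    def used(self):
        # Physical blocks actually consumed.
        return len(self.mapped)


pool = ThinPool(physical_blocks=100)
pool.create_volume("tenant-a", 80)
pool.create_volume("tenant-b", 80)  # 160 logical blocks promised on 100 physical
pool.write("tenant-a", 0, b"x")
pool.write("tenant-b", 5, b"y")
print(pool.provisioned(), pool.used())
```

The pool is over-subscribed by design, which is why a real array would also need capacity monitoring and alerting before the physical space runs out.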

There are of course other key elements; we didn’t cover everything. The various flash vendors are offering RAID and RAID-like solutions. They are also offering monitoring and management solutions. Another emerging aspect is the extremely fast networking now being offered, such as 16Gbps Gen 5 Fibre Channel cards and switches. It also goes without saying that high-performance servers are naturally part of the equation. More on this in the next segment.

It is easy to focus on performance, but overall performance, low latency and throughput are just part of the equation and are reasonably offered by most of the flash and storage vendors. People are sometimes drawn to numbers (1 million IOPS, 2 million IOPS, 5 million IOPS, etc.), but the key to success in the cloud, as well as in the enterprise, is not simply performance but also these storage features.

Vendors that only show big numbers sometimes mask the lack of storage features that create a fast, holistic, balanced and reliable architecture.

In part 2 we will look at what an all-flash cloud infrastructure looks like.

 




 

Recommended Reading : Beware of the Giraffes in your Data

This is a nice article that discusses how we approach data and why we should question the obvious. The obvious, or the portions of data that dominate, can fool us into adopting the wrong strategies and making seriously flawed decisions. The article provides some nice examples of this.

[Image: gigaom]

Recommended Reading : Google Releases ‘The Datacenter As A Computer’

Google has just released a new introductory PDF book – The Datacenter As A Computer : An Introduction to the Design of Warehouse-Scale Machines (2nd Edition). This book covers warehouse-scale machines, cost efficiencies at scale, datacenter deployments, architectural overviews of the infrastructure, workloads, software infrastructures, datacenter basics, energy/power efficiency and quite a bit more.  This is an excellent read for anyone dealing with datacenter deployments or aspects of the datacenter.

[Image: screenshot_17]

 
 




 

Recommended : Java 7 Provides Support for Infiniband SDP/RDMA

In case you missed it: if you run Java applications over InfiniBand, think Java 7. Java 7 includes support for the Sockets Direct Protocol (SDP), which leverages InfiniBand Remote Direct Memory Access (RDMA). In low-latency use cases of SDP, RDMA bypasses the operating system. This does three good things: it gives you the lowest latency possible, the highest throughput and the smallest CPU footprint.

[Image: java7sdp]

Flash Memory Summit 2013 Presentations (PDF)

If you didn’t go to this year’s Flash Memory Summit, wouldn’t it be nice to see what people were talking about? A wealth of presentations is available:

[Image: flashsummit]

There are simply too many interesting things to go through in the proceedings. One item that caught my eye, from Fusion-io, the leader in PCIe flash storage cards, covers its open source projects around flash: Creating Flash-Aware Applications. Another one that looked highly interesting is Flash at Facebook, which provides the dimensions and scale of flash use at Facebook. Another point of interest, Flash in the Cloud : A Gentle Introduction, is worth looking at.

 
 




 

Cloud Storage : In Search of the Next Generation Cloud Storage Platform

Today’s post is the second one on Cloud Storage. Today we look at SolidFire and examine some of the things SolidFire has built into their platform. A year ago there were only a few flash storage vendors; today it is a crowded field, and last year’s leaders are running only on last year’s success and marketing momentum. A new group of flash storage companies has emerged with as much emphasis on software and storage features as on hardware. Everyone has IOPS this year, but not everyone has key storage features like QoS, dedup, etc.

In highly multi-tenant environments, such as a cloud or a highly virtualized architecture, resource management is an extremely important feature: not just of CPU, network bandwidth and memory, but also of IOPS. There is a lot of learning going on with regard to virtualization and what it can teach us about the next generation of cloud deployments. SolidFire has an interesting read:

[Image: solidfire1]

There are two interesting meta-cloud projects aimed at cloud infrastructures. For those unaware of these projects, their scope is stunning. OpenStack is a project aimed at providing infrastructure as a service (IaaS). The OpenStack Foundation manages the project, and over 200 companies are part of it. Within the OpenStack project are a number of inter-related sub-projects aimed at controlling :

It should be noted that a cornerstone of the project is that OpenStack’s APIs be compatible with Amazon’s EC2 and Amazon’s S3.

As if that wasn’t interesting enough, there is another cloud IaaS project, CloudStack, which provides many of the same features and has been around long enough to have significant adoption.

You can see what SolidFire is doing with Citrix’s CloudPlatform (which is based on CloudStack) in this reference architecture document.

[Image: SolidFire2]

OpenStack has a lot of backers and the adoption rate is quite high. IBM’s endorsement of OpenStack has given the project a big boost. Enterprise flash array vendors are providing reference architectures that show how OpenStack plays with their flash storage arrays. Today we will focus on one such company, SolidFire, which has done a lot of work to make sure that their storage products work with OpenStack and CloudStack, not to mention VMware. Both open source stacks have large followings, but for today we are looking at what SolidFire is up to. For example, with regard to OpenStack, they have spent considerable energy providing reference architecture documents :

They have provided a short OpenStack 101 video :

[Image: solidfire10]

One thing one notices is that SolidFire provides a Quality-of-Service (QoS) architecture. Anyone who has worked in virtualized environments immediately recognizes the need for resource controls on the tenants. IO is a natural place to have such a control. Some vendors pretend that this feature is unnecessary, but the opposite is true. In a cloud or a highly virtualized environment, with many tenants making demands on the IO subsystems, it makes perfectly good sense to have QoS. SolidFire provides an elegant solution that limits ‘noisy neighbors’ (tenants making extremely high demands on the IO subsystem and affecting the performance of other tenants). One extremely important point is that SolidFire’s storage system was not only architected with QoS in mind: each SolidFire node is self-contained, but when combined with other nodes it functions as a cluster, exactly what you would expect for cloud storage.

It does, however, get even better. SolidFire’s ElementOS delivers features that other storage vendors lack. One is deduplication. In a virtualized environment this is a pretty important feature; often it can reduce disk use by 25% to 40%. Some vendors don’t provide dedup, and the result is an enormous number of redundant files that waste significant portions of available storage. Also delivered are thin provisioning and real-time compression.
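As a rough illustration of why deduplication matters when many tenants run near-identical images, here is a minimal Python sketch of block-level dedup by content fingerprint. It is a toy model and not how ElementOS or any other product implements it: the `DedupStore` class, the tiny 4-byte block size and the image names are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store (hypothetical, for illustration only)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # fingerprint -> block bytes, stored once
        self.files = {}   # file name -> ordered list of fingerprints

    def put(self, name, data):
        refs = []
        for i in range(0, len(data), self.block_size):
            chunk = data[i:i + self.block_size]
            fp = hashlib.sha256(chunk).hexdigest()
            # Only the first copy of a given block is physically stored.
            self.blocks.setdefault(fp, chunk)
            refs.append(fp)
        self.files[name] = refs

    def get(self, name):
        # Rebuild the file from its block references.
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def logical_bytes(self):
        # What the tenants believe they have stored.
        return sum(len(self.blocks[fp]) for refs in self.files.values() for fp in refs)

    def physical_bytes(self):
        # What the store actually holds.
        return sum(len(block) for block in self.blocks.values())


store = DedupStore(block_size=4)
base = b"AAAABBBBCCCC"
store.put("vm1.img", base)
store.put("vm2.img", base)            # an identical clone adds no new blocks
store.put("vm3.img", base + b"DDDD")  # only the extra block is new
print(store.logical_bytes(), store.physical_bytes())
```

In this contrived case three images share almost all of their blocks, so the savings are far larger than one would expect from real workloads; the sketch only shows the mechanism.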

In one recent cloud-oriented move, OnApp is working together with SolidFire to allow finely tuned billing of IOPS. OnApp’s control panel can be used to specify minimum, maximum and burst IOPS.
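The minimum, maximum and burst settings mentioned above can be pictured with a small Python sketch. This is a deliberately naive allocator written for illustration, not SolidFire’s or OnApp’s algorithm or API: the `Tenant` fields, the `burst_credits` notion and the `allocate` function are assumptions, and it assumes the sum of all minimums fits within the cluster’s capacity.

```python
from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    min_iops: int    # guaranteed floor under contention
    max_iops: int    # sustained ceiling
    burst_iops: int  # short-term ceiling while burst credits last
    demand: int      # IOPS the tenant is asking for in this interval
    burst_credits: int = 0  # hypothetical: earned while running below max_iops


def ceiling(tenant):
    # A tenant may exceed max_iops, up to burst_iops, only while it has credits.
    return tenant.burst_iops if tenant.burst_credits > 0 else tenant.max_iops


def allocate(capacity, tenants):
    """Grant IOPS for one interval; assumes the sum of minimums <= capacity."""
    grant = {}
    # Pass 1: every tenant gets its demand up to its guaranteed minimum.
    for t in tenants:
        grant[t.name] = min(t.demand, t.min_iops)
    remaining = capacity - sum(grant.values())
    # Pass 2: hand out the remainder, smallest demand first, up to each ceiling.
    for t in sorted(tenants, key=lambda t: t.demand):
        want = min(t.demand, ceiling(t)) - grant[t.name]
        extra = max(0, min(want, remaining))
        grant[t.name] += extra
        remaining -= extra
    return grant


tenants = [
    Tenant("noisy", min_iops=1000, max_iops=5000, burst_iops=8000, demand=50000),
    Tenant("db", min_iops=3000, max_iops=6000, burst_iops=8000, demand=4000),
    Tenant("web", min_iops=1000, max_iops=2000, burst_iops=4000, demand=3000,
           burst_credits=1),
]
grants = allocate(10000, tenants)
print(grants)
```

Here the noisy tenant asks for 50,000 IOPS but cannot starve the others: the database and web tenants are served in full (the web tenant briefly bursting above its maximum), and the noisy tenant receives only what is left. Since each tenant’s thresholds are explicit numbers, they are also a natural basis for billing.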

[Image: solidfire11]

One notices that, unlike a number of vendors that like to point to their high peak IOPS numbers, SolidFire’s aim is quite a bit more sophisticated and centered around what is increasingly the future: cloud deployments. That means delivering QoS resource controls for cloud flash storage (which many vendors lack), delivering high-performance flash storage, providing compression, snapshots/cloning, dedup and thin provisioning, and providing hardware that aggregates into useful storage clusters. In the end, unlike some vendors that deliver terabytes of isolated islands of arrays, SolidFire delivers petabytes of cloud-optimized and resource-managed clusters, and that is what the next generation of cloud storage should look like.

 
 




 

Recommended : AWS Self-Paced Learning

If you have always been interested in learning how to use AWS, you may be interested in a self-paced learning program that is now available. The training covers the basics of Amazon EC2, Elastic Block Store (EBS), Elastic Load Balancing and Auto Scaling. Once you’re done with those you can move on to more advanced learning.

[Image: AWS]

More details can be found on the AWS Blog.  By the way, these are self-paced online training labs that run within a live AWS environment.