It is certainly worth viewing the presentation by Mohit Thatte, as he provides a deep dive into the Clojure data structures. The video can be found here :
The slides can be found at slideshare.
The move to the cloud is on. Increasingly, even companies that are mandated to comply with various corporate and national privacy and security standards, such as HIPAA, are also looking at ways they can extend their company networks to include auto-scaling clouds while at the same time abiding by those security standards. With the availability of sophisticated cloud providers such as AWS, Azure, Joyent and others it is increasingly attractive for companies to figure out ways to leverage these cloud providers and burst out of the corporate networks and transparently use these clouds.
In thinking about this, it becomes of interest to figure out the “how-to” of doing this. A number of cloud providers continue to work on stretching on-premises datacenters into the cloud. We can look at what Microsoft Azure has been doing to understand why companies are looking at merging the on-premises datacenter and the cloud. Microsoft has provided an example of how a company could extend its on-premises datacenter with the Azure cloud. Their informational diagram lets you turn information layers within the datacenter/cloud architecture on and off. You can see the example in the image below.
The diagram above represents an example of how companies can now securely merge the Azure cloud and their company network. You can find more details on this at the link below.
There are all sorts of challenges but companies like Microsoft are increasingly delivering ways to securely extend corporate networks into auto-scaling clouds.
Another company that allows bursting to the cloud is Cloudian. Their focus on providing an enterprise hybrid cloud allows corporate networks to connect safely with clouds. In Cloudian’s case, their product, HyperStore, combined with the Amazon cloud, allows for a next-generation hybrid IT cloud. The Cloudian/Amazon combination provides a 100 percent S3-compliant hybrid cloud storage platform. Dynamic migration from on-premises physical storage to off-premises cloud storage allows near-infinite capacity scaling to meet the security and cost requirements of enterprise environments. Service providers who offer multi-SLA storage services also benefit from this hybrid structure. You can read more about it :
In the next extending-into-the-cloud post, we will look at extending Microsoft SQL Server into the cloud. SQL Server 2016, when it arrives, will encrypt all data by default and is integrated with the R statistical programming language. More interestingly, it allows a stretch into the Azure cloud. More on this in the next post. In the post that follows we will also discuss HIPAA cloud providers and whether they can remain relevant in the face of substantial improvements in merging on-premises networks with clouds.
I’ve been heavily invested in learning and working on Solr deployments and also learning Chef these past few months. More on those technologies is coming shortly.
It is worth reading a Clojure introduction if you are trying to learn Clojure quickly. Here are two quick and useful reads.
Also, there is a nice Stack Overflow question and answer on learning how to write Clojure web services.
Some time ago I discussed deduplication and the state of it within the flash array community in a post, Deduplication Fears From Those That Don’t Offer It, that was primarily prompted by a series of negative posts by two CTOs at Violin Memory and a tweet that was somewhat startling :
Our thoughts on why the advantages of Inline Deduplication do not exist: http://t.co/7TCSMfl8V5
— Violin Memory (@ViolinMemory) December 17, 2013
Both inline dedup and dedup offer substantial benefits. There are applications where they may offer minimal advantages but on the whole they are a net positive. And in many applications such as VDI, virtualization, etc – they are simply very, very useful. And yes, one can find edge cases where dedup may not give you anything except overhead. You can read my original post. The tweet was not simply inflammatory in suggesting that the benefits of inline dedup were nonexistent; it also suggested that the entire rest of the flash array community was wrong to say otherwise.
An announcement of a Violin Memory and Symantec collaboration, which said dedup was coming, was made on August 13, 2012. What transpired in the intervening timeframe were the dissonant posts and tweets above and then, relatively recently, Violin Memory introduced a Microsoft-centric Windows Flash Array running a Windows Server OS which included dedup, and a Concerto 2200 appliance which also offered dedup and could work in concert with their arrays. However, up to last week’s launch, their mainline array operating system lacked dedup and, more generally, the inline data reduction found in competitive systems.
In my original post I offered a preference : “Nimbus Data’s Halo approach to de-duplication seems to me to be very reasonable – it allows you to turn de-duplication on and off on a LUN/filesystem basis. In any case, at the very least, they should offer their customers the opportunity to use deduplication on good-fit use-cases by including it in their array’s operating system.” I liked and still do like the idea of being able to turn off dedup if you feel you don’t need it. However, vendors like Tegile, Pure Storage and SolidFire routinely prove to me that having it always turned on works just fine across the gamut of applications running on their platforms, from databases to virtualization. Regardless, it is nice (in my mind) to be able to turn a feature on and off if you feel you need to.
In this week’s launch, which in my mind was primarily about erasing the competitive deficiency of not having dedup and, more generally, inline data reduction, they launched a rebranded version of their OS which now offers ‘granular’ inline block deduplication and ‘granular’ inline block compression. And the tune has changed dramatically with this tweet.
— Violin Memory (@ViolinMemory) February 17, 2015
Basically, they now offer a form of inline data reduction with the option to turn these features off if you want. This is really very good news – they are no longer missing this key enterprise data reduction feature set. Congratulations. This is a milestone that allows Violin Memory to stretch raw storage capacity more efficiently. Check out their website to learn more.
It certainly is worth looking at the latest white paper from Symantec and Intel. They have dropped a small bomb on the flash array party. In a white paper, Software-defined Storage at the Speed of Flash, the duo provide a look at a nice Oracle database architecture where they show both price/performance advantages and comparable performance to flash arrays from Violin Memory and EMC and a Cisco solution. Two Intel R1 208WTTGS 1RU servers were outfitted with four Intel P3700 Series SSDs, 128 GB DDR4 Memory, Symantec Storage Foundation Cluster File System 6.2, Oracle 11gR2 and Red Hat Enterprise 6.5 OS. The two servers are interconnected with a high speed dual-port Intel Ethernet Converged Network Adapter. The white paper goes into quite a bit of detail and offers a nice chart comparing the converged solution with the flash array solutions.
Update 3 : I have added two new aspects to consider – the notion of the guarantee of the array you buy and the aspect of capacity.
Update 2 : If you are interested in my recommendations for All-Flash Arrays please read my most recent post on the new class of flash arrays that fully support deduplication and a full range of storage features, Recommendations for All-Flash Storage Arrays; Aiming Beyond Simply IOPS and Low Latency. But make sure to read this as well as it provides a structure for helping you to determine how to go about choosing an all-flash array.
Update 1 : If you are grappling with what questions to ask your SSD or flash storage array vendor who says his last customer saw a 10x increase in performance but can’t really give you a nice write-up of the hardware and software configuration – consider that maybe the 10x increase wasn’t only the flash in the array. Interestingly, if everything on the market represents “next generation”, “cloud” or “enterprise” infrastructure and software, then the words “next generation” kind of lose their meaning. Here are a few candidate questions I have assembled to consider when talking to flash array vendors. I am continuously adding other questions that should be asked of flash storage array vendors. In this post I have added some more – Replication, Backup and Restore software, Monitoring, Automation and Data (Protection) and Company Viability. In the past, I aimed at the heart of the matter in another post, Today’s IOPS Matter Less Than A Good Architecture and Storage Features. In today’s post, I am aiming to provide anyone looking at purchasing a new flash storage array with some questions that might cause tachycardia among sales folks – and perhaps even worse among marketing folks, who often do their best to hide a product’s warts. These are part of the notes from a long-running work-in-progress, An Introduction to Using High-Performance Flash-Based Storage in Constructing High Volume Transaction Architectures – A Manager’s Guide to Selecting Flash Storage – which will be coming out in the near future. This post is only part of the story – one of the architectural choices being made today is whether to use server-local flash or networked, array-shared flash. Architects have made, and continue to make, both architectures work. And the choices are becoming more interesting because the SSD and PCIe flash vendors are providing increasingly higher capacities and in some cases their solutions are hot-swappable.
It is possible to choose, for example, Fusion IO’s PCIe flash card solutions to provide in-memory databases with extremely high performance. Obviously, even though networks are getting faster – not traversing a network provides advantages. We will discuss these aspects in a future post.
If you have a fire-breathing dragon of a flash array that can deliver millions of IOPS but you can’t leverage the features you need to increase the storage capacity via data reduction (deduplication, compression and thin provisioning), or upgrade the array while it’s running live in production, or easily replicate the data on it, or scale out twenty or so arrays into one unified view of the storage, or guarantee service levels, or throttle those IOPS with quality-of-service, or protect running operations with scale-out forms of high availability – what do those IOPS serve ? Features that support cloud and enterprise operations within these flash storage arrays are as important as or more important than sheer IOPS, and certainly architecture and price are important considerations as well. In reality, flash usually does provide better performance, but in the end the IOPS that flash delivers matter less than a good architecture, good storage features and capacity, which together give you the critical capabilities you need – some of which you may already have with your traditional disk architecture.
Some flash array vendors want you to ask the other vendors ‘gotcha’ questions. This gets silly and often it turns into asking questions of low relevance to you. Also, the focus on high performance and low latency is a fraction of the problem we all face – some things can be very fast and not have the features that you need – yet some vendors have overly focused on performance and low latency because historically they have been very weak in other areas. Here is my list of questions to ask – these are not ‘gotcha’ questions; some may relate to what you are doing and some may be less relevant.
All the flash array vendors today have managed to do quite well at offering great performance and latency numbers on their arrays. Not all of them are equal in terms of storage features.
A quick word about benchmarks. When benchmarks are offered up, it’s good to know what is being demonstrated. Some things to keep in mind include knowing exactly what type of benchmarks or workloads are executed and what the measured IOPS are in terms of block sizes, and also whether the benchmarks are read-only or a mixed read/write workload. Obviously, benchmarks done by reputable benchmarking sites like thessdperformanceblog, MySQL Performance Blog, Tom’s Hardware and StorageReview.com can speak volumes about the product in question. However, vague anecdotal evidence is increasingly used to avoid providing these sites with the hardware to test and to keep customers from seeing a true comparison. Anecdotes like “Customer X saw a 10x improvement in metric Y”, while interesting, are often virtually meaningless without a detailed whitepaper covering the workload, network, hardware and software configuration and rigorous test information. Consider that a boost in performance can often be demonstrated by simply going from 8 Gbps FC to 16 Gbps FC cards. Did customer X really see a 10x improvement because of the flash array alone, or was it a combination of adding memory or adding high speed networking ? Anecdotes are increasingly used loosely to avoid hard benchmark results. Beware of the anecdote bomb – it is usually delivered with the intention of avoiding giving up hard information or distracting from reputable, independent third-party benchmark sites. The anecdote bomb looks something like this : “X was able to achieve 8x faster processing times”, and then the discussion turns to that specific 8x improvement – there is no real comparison of before/after configurations (including possible network upgrades). The 8x or 20x number is intended to distract and to become the number the buyer will assume they will get.
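To make the block-size point concrete, here is a minimal Python sketch of the arithmetic behind an IOPS figure. The function names and the sample numbers are made up for illustration and are not taken from any vendor's results. The mixed read/write estimate uses a weighted harmonic mean, which assumes reads and writes are serviced one after another; real arrays will behave differently, so treat it as a rough sanity check rather than a prediction.

```python
def throughput_mib_per_s(iops: float, block_size_kib: float) -> float:
    """Convert an IOPS figure at a given block size into MiB/s."""
    return iops * block_size_kib / 1024.0


def blended_iops(read_iops: float, write_iops: float, read_fraction: float) -> float:
    """Rough mixed-workload estimate (assumption: weighted harmonic mean).

    read_fraction is the share of operations that are reads, from 0.0 to 1.0.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0.0 and 1.0")
    write_fraction = 1.0 - read_fraction
    # Average time per operation is the weighted sum of per-op service times.
    seconds_per_op = read_fraction / read_iops + write_fraction / write_iops
    return 1.0 / seconds_per_op


if __name__ == "__main__":
    # Hypothetical headline numbers: the "smaller" IOPS figure moves more data.
    print(throughput_mib_per_s(1_000_000, 4))   # 4 KiB blocks
    print(throughput_mib_per_s(100_000, 64))    # 64 KiB blocks
    # Hypothetical 70/30 read/write mix on an array with slower writes.
    print(round(blended_iops(1_000_000, 250_000, 0.7)))
```

A million 4 KiB IOPS and a hundred thousand 64 KiB IOPS are very different claims, which is why the block size and the read/write mix need to come with the number.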
On to some questions you might consider asking.
Does the array support full redundancy with hot-swap everything ? For example, does it have two controllers ? Do the scale-out arrays support each other if a full array fails ? This type of availability translates into increased resiliency. It also translates into an ability to hot-swap all components of an array in production without downtime. Or is resilience built specifically into a cluster of nodes to avoid outages ? There are different ways to handle resiliency.
How does the flash array handle failure ? What if I pull a hot component out ? How will the array behave ? If you have data corruption – you have a serious problem. This is not a science question. It’s good to know what happens before someone accidentally pulls the wrong component out of the wrong array in the middle of the night.
Does the array support full non-disruptive upgrades to all aspects of the array ? Anytime you need to upgrade the operating environment in your flash array – you want to do so without taking an outage. Some vendors in the very recent past actually told you they had non-disruptive upgrades, but the sordid truth is they didn’t have full non-disruptive upgrades. They actually could partially upgrade their array without taking an outage – but that’s not a full upgrade of the array and you are left with a schizophrenic array inhabited by two versions of their operating system. If you don’t have full non-disruptive upgrade capability, any major upgrade will probably require taking an outage. Why put the burden of this work on your storage administrators when the array should be doing this for you ? Array vendors like Nimbus Data, SolidFire, Pure Storage and HDS provide full non-disruptive upgrades.
Does the flash array’s operating environment support data reduction features ? Specifically : de-duplication, compression and thin-provisioning ? De-duplication saves you considerably in storage costs. There is a lot of effort and discussion around dedup. It effectively increases the usable capacity of your array. Is there an add-on cost to these in-line data-reduction features ? Data reduction matters. It makes you spend less on additional flash storage. It saves you on storage, storage costs and datacenter space plus the power and cooling costs. HDS, Pure Storage, SolidFire and Nimbus Data offer all three forms of data reduction with their arrays.
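For readers who want to see what block-level data reduction does mechanically, here is a small Python sketch. It is not how any vendor named here implements dedup or compression. It assumes fixed 4 KiB blocks, SHA-256 fingerprints and zlib compression purely for illustration, and the function names are hypothetical.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size; real arrays vary


def reduce_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Return (logical_bytes, stored_bytes, unique_blocks) for a byte stream.

    Each unique block is stored once (deduplication) and compressed.
    """
    store = {}  # fingerprint -> compressed block
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).digest()
        if fingerprint not in store:
            store[fingerprint] = zlib.compress(block)
    stored_bytes = sum(len(chunk) for chunk in store.values())
    return len(data), stored_bytes, len(store)


def effective_capacity_tb(raw_tb: float, reduction_ratio: float) -> float:
    """Usable capacity if the workload really achieves the given ratio."""
    return raw_tb * reduction_ratio


if __name__ == "__main__":
    # Ten blocks, only two distinct: a stand-in for cloned VM images.
    sample = b"A" * BLOCK_SIZE * 8 + b"B" * BLOCK_SIZE * 2
    logical, stored, unique = reduce_blocks(sample)
    print(logical, stored, unique)
    print(effective_capacity_tb(10.0, 5.0))
```

The ratio you get depends entirely on the data. Highly repetitive workloads such as VDI reduce well, while already-compressed or encrypted data may not reduce at all, which is worth raising when a vendor quotes an "effective" capacity.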
Does the array support scale-out features ? Can you natively cluster multiple arrays to produce a single view of storage ? Some vendors like SolidFire and Kaminario offer an ability to cluster their arrays from five to a hundred nodes. You get not only a single view of your flash storage but you also get cluster redundancy included. A number of flash arrays can scale only up to four nodes, others can scale much higher. If you are building an enterprise cloud or public cloud it is important to know that you can scale to petabytes rather than in the low hundreds of terabytes. For example, some vendors can scale out to four (and later eight) nodes – 280 TB or 560 TB – versus others that can scale to multiple petabytes and over 100 nodes.
Does the array natively support snapshots and clones ? Is this an add-on cost ? Consider that this feature gives you a quick and easy way to do point-in-time snapshots of your data. In virtualization you can leverage clones with VMware linked clones, which help you conserve storage space.
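The scale-out arithmetic is simple enough to sketch in Python. The 70 TB per node below is derived from the 280 TB over four nodes example above. The 2 PB target and the function names are assumptions for illustration only.

```python
import math


def nodes_needed(target_tb: float, per_node_tb: float) -> int:
    """Smallest whole number of nodes that reaches the target capacity."""
    return math.ceil(target_tb / per_node_tb)


def max_cluster_tb(per_node_tb: float, max_nodes: int) -> float:
    """Largest capacity a cluster can reach given its node limit."""
    return per_node_tb * max_nodes


def fits(target_tb: float, per_node_tb: float, max_nodes: int) -> bool:
    """True if the target capacity fits within the cluster's node limit."""
    return nodes_needed(target_tb, per_node_tb) <= max_nodes


if __name__ == "__main__":
    per_node = 280 / 4                 # 70 TB per node, from the example above
    target = 2000                      # hypothetical 2 PB requirement, in TB
    print(nodes_needed(target, per_node))
    print(max_cluster_tb(per_node, 8))
    print(fits(target, per_node, 8), fits(target, per_node, 100))
```

Running the numbers against your own growth plan quickly shows whether a four- or eight-node ceiling is a real constraint or not.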
Does your flash storage vendor support NFS and SMB/CIFS ? Both of these are pervasive. For example, some companies rely completely on NFS as their shared file system. Others that are Windows-centric rely on SMB. Key question – what version of these protocols ?
How is the flash array serviced ? Some arrays require sliding them out of the rack and taking the top off to expose components for service. Others can be serviced in the rack from the front or back. This makes a difference. Also, how service-safe are the components ? Does the act of servicing the array itself pose a risk, or has the vendor designed fool-proof serviceability into the array ?
In some environments, like cloud providers – there is a desire for quality-of-service (QoS). Does the array offer QoS ? The quality-of-service feature allows performance guarantees and keeps some hosts from over-consuming IOPS. Some of these cloud providers use QoS to monetize storage IOPS guarantees. This is something SolidFire does extremely well. Consider that in virtualized environments – memory, CPU and storage space are controlled via VM/system resource management. Already, some vendors treat IOPS in the same way and this is becoming extremely important in heavily virtualized cloud settings.
How easy is it to use the array ? In other words, does the array provide an easy-to-use user interface to create, export and delete LUNs or file systems ? Does it provide a way to easily view the LUNs and exported file systems ?
Does the array support replication, and if yes, what type ? Replication is an important feature. It provides an efficient method of keeping a remote copy of the data elsewhere. It is a strategy which underlies disaster recovery. Does the flash array in question offer replication ?
When one or more SSDs or flash modules suddenly die – what happens ? What protection is provided to support continued operations ? How fast are the rebuilds ? What mechanism is used for protection – RAID or a RAID-like technology ?
Do they have a backup and restore strategy for the product ? Does the vendor bundle in backup and restore software or do they provide you with recommendations ? Or do they leave it totally up to the customer to figure out the backup/restore strategy ?
Does the vendor provide detailed reference architectures for building out a cloud or database cluster ? Do these reference architectures provide the details of the architecture – servers used, storage used, software used, etc ? Are they collaborating with the companies that create the servers and operating systems ?
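One common way to cap a tenant's IOPS is a token bucket, sketched below in Python. This illustrates the general technique only and is not SolidFire's or any other vendor's implementation. The class name and parameters are hypothetical, and it models only the limiting side of QoS, not minimum guarantees. Time is passed in explicitly so the behaviour is easy to follow.

```python
class IopsLimiter:
    """Token-bucket limiter: sustained rate of max_iops, bursts up to `burst`."""

    def __init__(self, max_iops: float, burst: float):
        self.rate = max_iops      # tokens added per second
        self.capacity = burst     # most tokens the bucket can hold
        self.tokens = burst       # start with a full bucket
        self.last = 0.0           # timestamp of the previous request

    def allow(self, now: float, ios: int = 1) -> bool:
        """Return True if `ios` operations may proceed at time `now` (seconds)."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= ios:
            self.tokens -= ios
            return True
        return False


if __name__ == "__main__":
    tenant = IopsLimiter(max_iops=100, burst=10)
    # A burst of 20 requests at t=0: only the burst allowance gets through.
    print(sum(tenant.allow(0.0) for _ in range(20)))
    # One second later the bucket has refilled, capped at the burst size.
    print(sum(tenant.allow(1.0) for _ in range(20)))
```

A noisy host that floods the array simply runs out of tokens, while other hosts keep their share. A real array would also need minimum guarantees and per-volume accounting, which this sketch leaves out.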
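As a reminder of what single-parity, RAID-like protection does underneath, here is a minimal Python sketch that rebuilds one lost block from the survivors using XOR. It assumes equal-sized blocks and a single failure. Real arrays use more elaborate schemes (dual parity, erasure coding, distributed rebuilds), and the function names are illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    stripe = [b"abcd", b"efgh", b"ijkl"]   # data blocks on three devices
    parity = xor_blocks(stripe)            # parity block on a fourth device
    # The device holding the second block dies; rebuild it from the rest.
    recovered = rebuild_missing([stripe[0], stripe[2]], parity)
    print(recovered == stripe[1])
```

The rebuild has to read every surviving block in the stripe, which is why rebuild speed and the array's behaviour during a rebuild are worth asking about.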
What tools are provided to monitor, first, the array and also the cloud storage aggregated arrays into a single storage view ? Some vendors have provided nice UI and command line tools for monitoring while others have remarkably sophisticated GUIs that
What level of automation and management is provided ? In other words, are there user interface and command line automation tools that allow quick ways of managing storage ? Is there serious support for leading management frameworks such as OpenStack, CloudStack, VMware and other automation frameworks ? Support for SNMP ?
Does the array support third-party (Microsoft, Oracle, etc) in-memory databases ? Increasingly there is support for in-memory databases that leverage both main memory and flash storage (whether PCIe cards or flash arrays). In the PCIe card world, Fusion IO has demonstrated remarkable performance by leveraging in-memory features of databases like MySQL and Microsoft SQL Server. The open question for flash array vendors increasingly is whether they can demonstrate a qualitative advantage in leveraging any of these in-memory database features.
What levels of network performance are supported ? New and extremely high performance network cards are becoming increasingly available. For example, 16 Gbps Fibre Channel, 40 Gbps Ethernet and 56 Gbps InfiniBand are or will be available – does the vendor support these networking cards ?
What support is there for virtualization standards and products ? Some vendors have been extremely slow to support OpenStack and have simply offered rudimentary support; other vendors have chosen to support OpenStack by offering cloud reference platforms based on it. Obviously there is a difference in your enthusiasm for a vendor if you have chosen to go the OpenStack road. Besides OpenStack, VMware is another virtualization choice where support is important. Does the vendor support VMware ? In what ways ? Do they support VMware’s storage APIs ? Some companies, like Tintri and Nutanix, have attacked the problem by offering products that provision VMs, highlight rapidly changing VMs, and create highly distributed architectures with a number of features aimed at heavily virtualized environments and cloud stacks.
What support is there for cloud use cases ? If you are building a cloud it is worthwhile asking a simple question – what public and private cloud infrastructures use your flash storage hardware ? Some companies have won over cloud providers because they support the features cloud providers need. A simple example : scale-out and quality-of-service give cloud providers the ability to scale out the storage (basically aggregate the arrays’ storage into one large storage namespace) and then control that storage with quality-of-service, which allows them to provide guaranteed service levels to customers.
Is the company viable ? Is it stable ? It should go without saying, but a company’s viability needs to be taken into consideration. I’m not talking about large versus small companies but about the question of how viable the company is. There are plenty of small, innovative, viable companies with unique and excellent products. The thing to watch is the stability of the execs at a company and the earnings reports if they are public. For example, if the company loses their CTO, COO and CEO within a month – it is something to be extremely concerned about. If they are focused on storage, have only a few products and are losing money on storage quarter after quarter, that is a concern. These are clear warning signs. Is the company in heavy debt – with investors wanting to part it out or sell it ? Or are they relatively debt free ? Or are they a large company with a long history in storage with extremely sluggish growth ? Is the company viable for the long term ? All these are important considerations. Worse yet are the companies where sales people are talking about the company being in hyper-growth mode, yet the company is quietly laying off people – it indicates a deep disconnect with reality. Large lay-offs indicate a loss of corporate memory and corporate talent and often point to the company being poorly managed. When you see a company where the technical and sales people are turning over quickly – that is a red flag. It usually indicates a poorly managed company. Viability can also be represented by the vibe the company produces in the marketplace. Is it one of continuous losses, mass exodus of executives, lawsuits, steep stock declines, large lay-offs and competitive pressures ? Is it a somewhat boring lack of activity ? Or is the vibe one of creativity, quiet growth, a vision of direction and a lack of bad news ? If the vibe produced is negative, it’s worth avoiding them and looking at a company that has a better chance of being here next year.
What is the maximum storage capacity provided ? You may need more capacity. Many flash arrays may not be able to provide the amount of storage capacity you need. In that case, you may want to look at hybrid arrays like those from Tegile (which also produces flash arrays). Hybrid arrays are a combination of traditional disk storage and flash storage and provide excellent performance characteristics when compared to traditional disk arrays. A recent example at a university saw a hybrid array provide between 25x and 40x the performance of their traditional storage while giving them much more capacity. The interesting aspect of this option is that it provides both a capacity and a performance choice, and one can couple these as a high performance duo (made easier if it is from the same vendor with the same OS).
Finally, does your flash storage vendor offer you a guarantee ? One such guarantee (again from Tegile) sets minimum levels for array performance, pricing per raw GB of storage, data reduction, heavy-write endurance (with free controller updates) and availability (downtime per year). The notion of a guarantee for your arrays is something of a game-changer in my mind. It puts the onus for living up to performance, endurance and cost claims squarely on the vendor and provides the buyer with a significant advantage.
Depending on what you are doing, some or all of these may be of interest to you. In the end, it is about what you are trying to do with applications, performance, storage capacity required and a host of other aspects.
[ Photo : Dragon, Shanghai Art Museum. digitalcld.com ]
Go to more posts on storage and flash storage at http://digitalcld.com/cld/category/storage.
[Updated: I’ve added a new recommendation and some new comments on the flash storage landscape.]
A lot has happened in the past year in the flash storage market. I think most flash storage array companies or the flash storage business units within large storage companies should be growing at least between 10% and 20%. The market is sufficiently hot that it has attracted a lot of customers and a fiercely competitive crowd of flash storage array companies. Today’s post is about which flash storage arrays I would recommend if a friend asked me, depending on what they were doing. These three companies offer solid products with very high performance and excellent features, and all three companies are managed extremely well. All three are growing at hyper-growth rates. In the past I have offered Key Questions To Ask Your Flash Storage Vendor Before You Buy. More recently I have highlighted scalability and aggregation of the storage as two key features in the post, Why Lego Makes Sense in Toys, Software, Servers and Storage. Increasingly, smaller, more nimble storage companies are coming out with excellent flash storage solutions.
If you are looking for flash storage – the reality is that almost all the all-flash array vendors are fast enough. In fact, the fastest of the lot are actually the weakest in terms of storage/software features, and that should be an important aspect of your focus. Companies slog along trying to retrofit various solutions into their architectures – basically bolting on features. Take a storage feature such as dedup. I’ve written before about how some vendors, who until recently didn’t have dedup, have written some very defensive posts against dedup. Today these vendors have been forced to recant and adopt dedup. If you don’t get the storage features you need – you can’t use them. And if you get them the wrong way – they cost more. It’s worth paying attention to the feature set in the array you intend to purchase and making sure it has what you need.
Recently, someone asked me what flash array I would recommend to a friend. Here are my top three all-flash storage companies to look at. No slight to other companies producing arrays; they all have good aspects, but these three all-flash arrays excel in providing the technology (both hardware and software), product innovation and product stability I would want in a recommendation. Two are scale-up arrays and the other a scale-out array. These three all-flash arrays provide excellent performance, a wealth of features, consistency and best practices. In addition, they offer support for various virtualization environments such as VMware, Citrix, OpenStack and CloudStack. These arrays have a large number of companies using their solutions successfully in a number of different settings. If you are looking at an all-flash array – keep in mind the questions you should be asking – see : Top Thirteen Questions to Ask Your (All) Flash Storage Array Vendor. Here are the three all-flash arrays I would recommend to someone – keep in mind that these three choices transcend speed – these arrays will provide high IOPS and low latencies, and they also offer a wealth of useful features. I have added their strengths as a side chart. One of these companies has the advantage of also offering hybrid arrays.
Of course, there are many more all-flash arrays out there, and many are also very good. In my opinion, if you are looking at all-flash arrays – SolidFire and Pure Storage have stood out and Tegile has emerged with a highly competitive offering. It’s not an accident that in Gartner’s latest all-flash array critical capabilities survey SolidFire and Pure Storage out-performed the rest of the field (with the exception of Kaminario, which is also a very good offering); they have a wealth of features. You can read more about this in the post Gartner Releases Flash Array Critical Capabilities Study – Solidfire, Pure Storage Come In First. Increasingly these companies are giving fits to the older flash storage companies that have been trying to regain traction. If you don’t think this is causing fits, all you have to do is look at IDC’s Worldwide All-Flash Array and Hybrid Flash Array 2014-2018 1H14 Vendor Shares. Just two (SolidFire and Pure Storage) accounted for over 18% of the flash capacity shipped. Both companies grew at 700% in 2013, and SolidFire was growing at 50% quarter over quarter in 2014. Tegile just recently released their all-flash arrays – they have been able to watch and avoid many of the mistakes of their competitors. EMC comes out looking good in this particular report. If you look at the Gartner Critical Capabilities Report you can see how these smaller companies are faring competitively – remarkably, they are ahead of the pack. It will be interesting to see how not only Violin Memory but also NetApp stave off these newcomers – let alone battle against large companies like IBM and EMC. It is a very rough competitive landscape. Regardless, this trio of Pure Storage, SolidFire and the newcomer, Tegile, is having a huge effect on the flash storage landscape.
Agree ? Don’t agree ? Feel free to send me a question @ email@example.com.