Datomic : Moving Beyond the Current Generation of Relational and NoSQL Databases

It’s rare to see a different way of thinking emerge in database design, one that covers the key advantages of both relational and NoSQL databases and offers a new set of features. Datomic is an aggressive attempt to create a next-generation database that moves us beyond the usual and is built to run in current and future cloud architectures. Let’s start with some of the features in Datomic. It provides support for ACID transactions, joins, a sound data model and a logical query language called Datalog. Most of all, it breaks with the rigidity of document-based and relational database models. It also breaks with mutable data and the lack of an audit trail that comes with it. To understand it, it is worth taking a tour of both Datomic and its query language. One surprising aspect is that it provides all this while using engines like Couchbase, Cassandra, Riak, etc. as nothing more than storage engines or services. Let’s start with the tour of Datomic :

Datomic deconstructs the current generation of databases and implements a modern database focused on breaking with designs from decades ago and with some of the issues of the newer NoSQL databases (lack of joins, lack of ACID, etc.). To understand the power of the database you have to explore the power of the Datalog query language:
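To get a feel for what a Datalog-style query does, here is a minimal Python sketch. It is a toy and not Datomic’s API: it assumes facts are stored as plain (entity, attribute, value) tuples, and every name in it (`match`, `query`, the `:person/...` and `:company/...` attributes, the sample data) is made up for illustration. The point it tries to show is that a join is simply the same variable appearing in more than one pattern.

```python
def is_var(term):
    """Variables are strings starting with '?', as in Datalog notation."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, fact, binding):
    """Try to unify one pattern with one fact; return the extended binding or None."""
    extended = dict(binding)
    for term, value in zip(pattern, fact):
        if is_var(term):
            if term in extended and extended[term] != value:
                return None          # variable already bound to something else
            extended[term] = value
        elif term != value:
            return None              # constant in the pattern does not match
    return extended


def query(find, where, facts):
    """Evaluate the patterns in order, carrying variable bindings along (a join)."""
    bindings = [{}]
    for pattern in where:
        next_bindings = []
        for binding in bindings:
            for fact in facts:
                extended = match(pattern, fact, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return {tuple(binding[var] for var in find) for binding in bindings}


# Hypothetical facts: (entity, attribute, value)
facts = [
    (1, ":person/name", "Ada"),
    (1, ":person/works-at", 10),
    (2, ":person/name", "Grace"),
    (2, ":person/works-at", 11),
    (10, ":company/name", "Acme"),
    (11, ":company/name", "Initech"),
]

# "Find the names of people who work at Acme": ?p and ?c join the three patterns.
acme_staff = query(
    find=["?name"],
    where=[
        ("?p", ":person/name", "?name"),
        ("?p", ":person/works-at", "?c"),
        ("?c", ":company/name", "Acme"),
    ],
    facts=facts,
)
print(acme_staff)
```

A real Datomic fact also carries the transaction that asserted it, which is where the audit trail comes from; the sketch leaves that out to stay short.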

Large, medium and small companies are using Datomic, so you may be interested in the “why”. In this presentation, developers from the Brazilian company Nubank expose four hidden super-powers of Datomic. In this case, the Clojure programming language is used.

and here is another use case from Room Key (a joint project of Hilton, Choice, Hyatt, IHG, Marriott and Wyndham) :


More reading :





Meeting: What’s New in Solr 5 Security & Solr Custom Collector: The Anti-Score

The arrival of Apache Solr 5 has brought with it a number of features. In these talks you will see some of the advantages of using Solr as your search engine.

A Solr Meetup on Tuesday, August 11th in downtown Seattle will discuss two Solr topics :

  • What’s new in Solr 5 Security. Presented by Anshum Gupta, Lucidworks. Apache Solr has evolved into a highly scalable system, capable of handling a lot of data and a high number of queries, but only recently was a mechanism to secure access in Solr provided. Apache Solr 5.2 shipped with pluggable authentication and authorization modules. These modules enable users to write their own plugins for managing security in Solr.

    This talk will cover an overview of both the authentication and authorization frameworks, and how they work together within Solr. It will also provide an overview of existing plugins and how to enable them to restrict user access to resources within Solr.

  • Solr Custom Collector: The Anti-Score. Presented by Michael Kosten, Getty Images. Sometimes, you don’t want to return just the top scoring documents as your search results.  If you have an eCommerce site, you may want to ensure that multiple lines of business are represented. If you incorporate customer interaction in your score, you may want to ensure that newer documents or certain categories are still represented and that your results don’t become stale. This requirement could be handled in middleware that post processes the search results, by requesting extra rows and rearranging them or by interleaving multiple queries. A better solution is to implement your own custom collector in Solr, so that search results can be arranged in any order. Michael will demonstrate a solution that returns top scoring documents, but grouped within categories. For example, a search for books could interweave the best fiction and non-fiction in a single query result. He will also demonstrate how to implement a custom priority queue to reduce memory requirements if there are many categories, and how the custom collector can be integrated into Solr without modifying the base distribution.
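The idea behind the second talk can be sketched in a few lines of Python. This is not Solr’s collector API (that is Java, and the talk’s actual implementation is not shown here); it is only an illustration, under my own assumptions, of keeping a small bounded priority queue per category and then interleaving the categories. The function names and the sample documents are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import zip_longest


def collect_by_category(docs, per_category):
    """Keep only the top `per_category` hits for each category.

    `docs` yields (doc_id, category, score). Each category gets a bounded
    min-heap, so memory stays at categories * per_category entries no matter
    how many documents match. Assumes per_category >= 1.
    """
    heaps = defaultdict(list)
    for doc_id, category, score in docs:
        heap = heaps[category]
        entry = (score, doc_id)
        if len(heap) < per_category:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)   # evict the current weakest hit
    # Best hit first within each category.
    return {category: sorted(heap, reverse=True) for category, heap in heaps.items()}


def interleave(ranked):
    """Round-robin across categories, starting with the category whose best hit scores highest."""
    columns = sorted(ranked.values(), key=lambda hits: hits[0], reverse=True)
    result = []
    for row in zip_longest(*columns):
        result.extend(hit[1] for hit in row if hit is not None)
    return result


# Hypothetical search hits for "books": (doc_id, category, score)
hits = [
    ("f1", "fiction", 9.0),
    ("f2", "fiction", 8.5),
    ("f3", "fiction", 8.0),
    ("n1", "nonfiction", 7.0),
    ("n2", "nonfiction", 3.0),
]

page = interleave(collect_by_category(hits, per_category=2))
print(page)  # ['f1', 'n1', 'f2', 'n2']
```

A plain top-4 by score would have returned three fiction titles and one non-fiction title; the per-category queues are what keep both lines of business on the page.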

How to register for the talk : http://www.meetup.com/Seattle-Solr-Lucene-Meetup/events/223899316/

Climbing Over the Walls You Have Built : Extending Your Corporate Network to the Cloud (Part 1)

The move to the cloud is on. Increasingly, even companies that must comply with corporate and national privacy and security standards, such as HIPAA, are looking at ways to extend their company networks to include auto-scaling clouds while still abiding by those standards. With the availability of sophisticated cloud providers such as AWS, Azure, Joyent and others, it is increasingly attractive for companies to find ways to leverage these providers, burst out of the corporate network and use these clouds transparently.

In thinking about this, the interesting part is the “how-to”. A number of cloud providers continue to work on stretching on-premises infrastructure into the cloud. We can look at what Microsoft Azure has been doing to see why companies are looking at a merge of on-premises datacenter and cloud. Microsoft has provided an example of how a company could extend its on-premises datacenter with the Azure cloud. In their informational diagram it is possible to turn information layers within the datacenter/cloud architecture on and off. You can see the example in the image below.


The diagram above represents an example of how companies can now securely merge the Azure cloud and their company network. You can find more details on this at the link below.


There are all sorts of challenges but companies like Microsoft are increasingly delivering ways to securely extend corporate networks into auto-scaling clouds.

Another company that allows bursting to the cloud is Cloudian. Their focus on providing an enterprise hybrid cloud allows corporate networks to connect safely with clouds. In Cloudian’s case, their product, HyperStore, combined with the Amazon cloud, allows for a next-generation hybrid IT cloud. The Cloudian/Amazon combination provides a 100 percent S3-compliant hybrid cloud storage platform. Dynamic migration from on-premises physical storage to off-premises cloud storage allows near-infinite capacity scaling while meeting the security and cost requirements of enterprise environments. Service providers who offer multi-SLA storage services also benefit from this hybrid structure. You can read more about it :


In the next extending-into-the-cloud post, we will look at extending Microsoft SQL Server into the cloud. SQL Server 2016, when it arrives, will encrypt all data by default and is integrated with the R statistical programming language. More interestingly, it allows a stretch into the Azure cloud. More on this in the next post. In the post that follows we will also discuss HIPAA cloud providers and whether they can remain relevant in the face of substantial improvement in merging on-premises networks with clouds.

Violin Memory Fixes Its Deduplication Problem : Take 2

Some time ago I discussed deduplication and its state within the flash array community in a post, Deduplication Fears From Those That Don’t Offer It, which was primarily prompted by a series of negative posts by two CTOs at Violin Memory and a tweet that was somewhat startling :

Inline dedup, and dedup in general, offer substantial benefits. There are applications where they may offer minimal advantages but on the whole they are a net positive. And in many applications, such as VDI and virtualization, they are simply very, very useful. And yes, one can find edge cases where dedup may not give you anything except overhead. You can read my original post. The tweet was not simply inflammatory in suggesting that the benefits of inline dedup were nonexistent; it also suggested that the entire rest of the flash array community was wrong to suggest otherwise.
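To make the good-fit versus edge-case point concrete, here is a rough Python sketch of fixed-size block deduplication. It is my own illustration, not any vendor’s implementation: the 4 KiB block size, the SHA-256 fingerprint and the sample data are all assumptions, and real arrays add compression, metadata and collision handling on top.

```python
import hashlib


def dedup_ratio(data: bytes, block_size: int = 4096) -> float:
    """Return logical blocks divided by unique blocks for fixed-size blocks."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return 1.0
    # Fingerprint each block; identical blocks collapse to one stored copy.
    unique = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks) / len(unique)


# Hypothetical "golden image": 8 distinct 4 KiB blocks.
golden_image = b"".join(bytes([i]) * 4096 for i in range(8))

# VDI-like workload: 20 desktops cloned from the same image.
vdi_like = golden_image * 20

# Edge case: 160 blocks that are all different from each other.
all_unique = b"".join(i.to_bytes(2, "big") * 2048 for i in range(160))

print(dedup_ratio(vdi_like))    # 20.0
print(dedup_ratio(all_unique))  # 1.0
```

The cloned desktops reduce 20:1, while the all-unique data gets nothing back for the hashing work, which is exactly the kind of workload where being able to switch dedup off is attractive.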

An announcement of a Violin Memory and Symantec collaboration, which said dedup was coming, was made on August 13, 2012. What transpired in the intervening timeframe were the dissonant posts and tweets above and finally, relatively recently, Violin Memory introduced a Microsoft-centric Windows Flash Array running a Windows Server OS, which included dedup, and a Concerto 2200 appliance, which also offered dedup and could work in concert with their arrays. However, up to last week’s launch, their mainline array operating system lacked dedup and, more generally, the inline data reduction found in competitive systems.

In my original post I offered a preference : “Nimbus Data’s Halo approach to de-duplication seems to me to be very reasonable – it allows you to turn de-duplication on and off on a LUN/filesystem basis. In any case, at the very least, they should offer their customers the opportunity to use deduplication on good-fit use-cases by including it in their array’s operating system.” I liked and still do like the idea of being able to turn off dedup if you feel you don’t need it. However, vendors like Tegile, Pure Storage and SolidFire routinely prove to me that having it always turned on works just fine across the gamut of applications running on their platforms, from databases to virtualization. Regardless, it is nice (in my mind) to be able to turn a feature on and off if you feel you need to.

This week’s launch, which in my mind was primarily about erasing the competitive deficiency of not having dedup and inline data reduction generally, brought a rebranded version of their OS which now offers ‘granular’ inline block deduplication and ‘granular’ inline block compression. And the tune has changed dramatically with this tweet.

Basically, they are offering a form of inline data reduction with the option to turn these things off if you want. This is really very good news: they are no longer missing this key enterprise data reduction feature set. Congratulations. This is a milestone that allows Violin Memory to stretch its raw storage further. Check out their website to learn more.

Recommended Reading: Symantec/Intel Architecture Compares to Flash Array Architecture for Oracle Databases

It certainly is worth looking at the latest white paper from Symantec and Intel. They have dropped a small bomb on the flash array party. In a white paper, Software-defined Storage at the Speed of Flash, the duo provide a look at a nice Oracle database architecture where they show both price/performance advantages and comparable performance to flash arrays from Violin Memory and EMC and a Cisco solution. Two Intel R1208WTTGS 1RU servers were outfitted with four Intel P3700 Series SSDs, 128 GB DDR4 memory, Symantec Storage Foundation Cluster File System 6.2, Oracle 11gR2 and Red Hat Enterprise Linux 6.5. The two servers are interconnected with a high speed dual-port Intel Ethernet Converged Network Adapter. The white paper goes into quite a bit of detail and offers a nice chart comparing the converged solution with the flash array solutions.



Recommended Reading : Key Questions To Ask Your Flash Storage Array Vendor Before You Buy [Update 3]

Update 3 : I have added two new aspects to consider – the notion of the guarantee of the array you buy and the aspect of capacity.

Update 2 : If you are interested in my recommendations for All-Flash Arrays please read my most recent post on the new class of flash arrays that fully support deduplication and a full range of storage features, Recommendations for All-Flash Storage Arrays; Aiming Beyond Simply IOPS and Low Latency. But make sure to read this as well as it provides a structure for helping you to determine how to go about choosing an all-flash array.

Update 1. If you are grappling with what questions to ask your SSD or flash storage array vendor who says his last customer saw a 10x increase in performance but can’t really give you a nice write-up of the hardware and software configuration, consider that maybe the 10x increase wasn’t only the flash in the array. Interestingly, if everything on the market represents “next generation”, “cloud” or “enterprise” infrastructure and software, then the words “next generation” kind of lose their meaning. Here are a few candidate questions I have assembled to consider when talking to flash array vendors. I am continuously adding other questions that should be asked of flash storage array vendors. In this post I have added some more: Replication, Backup and Restore software, Monitoring, Automation and Data (Protection) and Company Viability. In the past, I aimed at the heart of the matter in another post, Today’s IOPS Matter Less Than A Good Architecture and Storage Features. In today’s post, I am aiming to provide anyone looking at purchasing a new flash storage array with some questions that might cause tachycardia among sales folks, and perhaps even worse among marketing folks who often do their best to hide a product’s warts. These are part of the notes from a long-running work in progress, An Introduction to Using High-Performance Flash-Based Storage in Constructing High Volume Transaction Architectures – A Manager’s Guide to Selecting Flash Storage, which will be coming out in the near future.

This post is only part of the story. One of the architectural choices being made today is whether server-local flash or networked array-shared flash will be chosen. Architects have made, and today still make, both architectures work. And the choices are becoming more interesting because the SSD and PCIe flash vendors are providing increasingly higher capacities and in some cases their solutions are hot-swappable. It is possible to choose, for example, Fusion IO’s PCIe flash card solutions to provide in-memory databases with extremely high performance. Obviously, even though networks are getting faster, not traversing a network provides advantages. We will discuss these aspects in a future post.

If you have a fire-breathing dragon of a flash array that can deliver millions of IOPS but you can’t leverage the features you need to increase the storage capacity via data reduction (deduplication, compression and thin provisioning), can’t upgrade the array while it’s running live in production, can’t easily replicate the data on it, can’t scale out twenty or so arrays into one unified view of the storage, can’t guarantee service levels or throttle those IOPS with quality-of-service, or can’t protect running operations with scale-out forms of high availability, what do those IOPS serve ? Features that support cloud and enterprise operations within these flash storage arrays are as important as or more important than sheer IOPS, and certainly architecture and price are important considerations as well. In reality, flash usually does provide better performance, but in the end the IOPS that flash delivers matter less than a good architecture, good storage features and the capacity that together give you critical capabilities, some of which you may already have with your traditional disk architecture.

Some flash array vendors want you to ask the other vendors ‘gotcha’ questions. This gets silly and often it turns into asking questions of low relevance to you. Also, the focus on high performance and low latency is a fraction of the problem we all face. Some things can be very fast and not have features that you need, yet some vendors have overly focused on performance and low latency because historically they have been very weak in other areas. Here is my list of questions to ask. These are not ‘gotcha’ questions; some may relate to what you are doing and some may be less relevant.

All the flash array vendors today have managed to do quite well at offering great performance and latency numbers on their arrays. Not all of them are equal in terms of storage features.  

A quick word about benchmarks. When benchmarks are offered up, it’s good to know what is being demonstrated. Some things to keep in mind include knowing exactly what type of benchmarks or workloads are executed and at what block sizes the IOPS are measured, and also whether the benchmarks are read-only or a mixed read/write workload. Obviously, benchmarks done by reputable benchmarking sites like the SSD Performance Blog, MySQL Performance Blog, Tom’s Hardware and StorageReview.com can speak volumes about the product in question. However, vague anecdotal evidence is increasingly used to avoid providing these sites with the hardware to test and to keep customers from seeing a true comparison. An anecdote like “Customer X saw a 10x improvement in metric Y”, while interesting, is often virtually meaningless without a detailed whitepaper covering workload, network, hardware and software configuration details and rigorous test information. Consider that a boost in performance can often be demonstrated by simply going from 8 Gbps FC to 16 Gbps FC cards. Did customer X really see a 10x improvement because of the flash array only, or was it a combination of adding memory or adding high speed networking ? Beware of the anecdote bomb. It is usually delivered with the intention of avoiding giving up hard information or distracting from reputable, independent third-party benchmark sites. The anecdote bomb looks something like this: “Customer X was able to achieve 8x the performance of what they had”, and then the discussion turns to that specific 8x increase; there are no real comparisons of before/after configurations (including possible network upgrades). The 8x or 20x number that is provided is intended to distract and to become the number the buyer will assume they will get.
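Why the block size matters is easy to show with a little arithmetic. The sketch below is only an illustration with made-up numbers, not a measurement of any array: throughput is simply IOPS multiplied by block size, so an IOPS figure quoted without its block size says very little.

```python
def throughput_mib_per_s(iops: float, block_size_kib: float) -> float:
    """Throughput in MiB/s implied by an IOPS figure at a given block size."""
    return iops * block_size_kib / 1024


# Two hypothetical vendor claims.
small_blocks = throughput_mib_per_s(iops=1_000_000, block_size_kib=4)
large_blocks = throughput_mib_per_s(iops=100_000, block_size_kib=64)

print(small_blocks)  # 3906.25
print(large_blocks)  # 6250.0
```

The array advertising a tenth of the IOPS is actually moving more data per second, which is why the workload, block size and read/write mix need to come with the headline number.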

On to some questions you might consider asking.

Does the array support full redundancy with hot-swap everything ? For example, does it have two controllers ? Do the scale-out arrays support each other if a full array fails ? This type of availability translates into increased resiliency. It also translates into an ability to hot-swap all components of an array in production without downtime. Or is resilience built specifically into a cluster of nodes to avoid outages ? There are different ways to handle resiliency.

How does the flash array handle failure ? What if I pull a hot component out ? How will the array behave ? If you have data corruption, you have a serious problem. This is not a science question. It’s good to know what happens before someone accidentally pulls the wrong component out of the wrong array in the middle of the night.

Does the array support full non-disruptive upgrades to all aspects of the array ? Anytime you need to upgrade the operating environment in your flash array, you want to do so without taking an outage. Some vendors in the very recent past actually told you they had non-disruptive upgrades, but the sordid truth is they didn’t have full non-disruptive upgrades. They could partially upgrade their array without taking an outage, but that’s not a full upgrade of the array and you are left with a schizophrenic array inhabited by two versions of their operating system. If you don’t have full non-disruptive upgrade capability, any major upgrade will probably require taking an outage. Why put the burden of this work on your storage administrators when the array should be doing this for you ? Array vendors like Nimbus Data, SolidFire, Pure Storage and HDS provide full non-disruptive upgrades.

Does the flash array’s operating environment support data reduction features ? Specifically : de-duplication, compression and thin-provisioning ? De-duplication saves you considerably in storage costs. There is a lot of effort and discussion around dedup. It effectively increases the capacity of your array. Is there an add-on cost to these in-line data-reduction features ? Data reduction matters. It makes you spend less on additional flash storage. It saves you on storage costs and datacenter space, plus the power and cooling costs. HDS, Pure Storage, SolidFire and Nimbus offer all three forms of data reduction with their arrays.
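The capacity and cost effect of data reduction is simple arithmetic, sketched below. The price, raw capacity and 5:1 reduction ratio are hypothetical numbers chosen for illustration; your real ratio depends entirely on your data.

```python
def effective_capacity_tb(raw_tb: float, reduction_ratio: float) -> float:
    """Usable capacity after data reduction (a ratio of 5 means 5:1)."""
    return raw_tb * reduction_ratio


def cost_per_effective_gb(price: float, raw_tb: float, reduction_ratio: float) -> float:
    """Price divided by effective capacity, using 1 TB = 1000 GB."""
    return price / (effective_capacity_tb(raw_tb, reduction_ratio) * 1000)


# Hypothetical array: 10 TB raw for 100,000 (any currency).
without_reduction = cost_per_effective_gb(price=100_000, raw_tb=10, reduction_ratio=1)
with_reduction = cost_per_effective_gb(price=100_000, raw_tb=10, reduction_ratio=5)

print(effective_capacity_tb(10, 5))  # 50
print(without_reduction)             # 10.0
print(with_reduction)                # 2.0
```

This is also why it is worth asking whether the data-reduction features cost extra: an add-on license changes the `price` term and can eat into the saving.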

Does the array support scale-out features ? Can you natively cluster multiple arrays to produce a single view of storage ? Some vendors like SolidFire and Kaminario offer an ability to cluster their arrays from five to a hundred nodes. You get not only a single view of your flash storage but you also get cluster redundancy included. A number of flash arrays can scale only up to four nodes, others can scale much higher. If you are building an enterprise cloud or public cloud it is important to know that you can scale to petabytes rather than in the low hundreds of terabytes. For example, some vendors can scale out to four (and later eight) nodes, 280 TB or 560 TB, versus others that can scale to multiple petabytes and over 100 nodes.

Does the array natively support snapshots and clones ? Is this an add-on cost ? Consider that this feature gives you a quick and easy way to take point-in-time snapshots of your data. In virtualization you can leverage clones with VMware linked clones, which help you conserve storage space.

Does your flash storage vendor support NFS and SMB/CIFS ? Both of these are pervasive. For example, some companies rely completely on NFS as their shared file system. Others that are Windows-centric rely on SMB. Key question: what version of these protocols ?

How is the flash array serviced ? Some arrays require sliding them out of the rack and taking the top off to expose components for service. Others can be serviced in the rack from the front or back. This makes a difference. Also, how service-safe are the components ? Does the act of servicing the array itself pose a risk, or has the vendor designed fool-proof serviceability into the array ?


In some environments, like cloud providers, there is a desire for quality-of-service (QoS). Does the array offer QoS ? The quality-of-service feature allows performance guarantees and keeps some hosts from over-consuming IOPS. Some of these cloud providers use QoS to monetize storage IOPS guarantees. This is something SolidFire does extremely well. Consider that in virtualized environments memory, CPU and storage space are controlled via VM/system resource management. Already, some vendors treat IOPS in the same way and this is becoming extremely important in heavily virtualized cloud settings.
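One common way to cap a host’s IOPS is a token bucket, sketched below in Python. This is a generic illustration and an assumption on my part; I am not claiming it is how SolidFire or any other vendor implements QoS. The class name, the limits and the timestamps are all hypothetical.

```python
class IopsLimiter:
    """Token bucket: refill at `max_iops` tokens per second, hold at most `burst`."""

    def __init__(self, max_iops: float, burst: float):
        self.rate = max_iops
        self.capacity = burst
        self.tokens = float(burst)   # start with a full bucket
        self.last = 0.0              # time of the previous request, in seconds

    def allow(self, now: float) -> bool:
        """Return True if an I/O arriving at time `now` may proceed."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                 # over the limit: queue or reject the I/O


# A hypothetical tenant limited to 100 IOPS with a burst of 5.
limiter = IopsLimiter(max_iops=100, burst=5)

# Six I/Os arrive at the same instant: the burst allows five of them.
burst_results = [limiter.allow(now=0.0) for _ in range(6)]
print(burst_results)  # [True, True, True, True, True, False]

# Half a second later the bucket has refilled.
later_result = limiter.allow(now=0.5)
print(later_result)   # True
```

A guaranteed minimum is the harder half of QoS: it needs the array to hold capacity in reserve, which a simple limiter like this does not do.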


How easy is it to use the array ? In other words, does the array provide an easy-to-use user interface to create, export and delete LUNs or file systems ? Does it provide a way to easily view the LUNs and exported file systems ?


Does the array support replication, and if yes, what type ? Replication is an important feature. It provides an efficient method of keeping a remote copy of the data elsewhere. It is a strategy which underlies disaster recovery. Does the flash array in question offer replication ?


When one or more SSDs or flash modules suddenly die, what happens ? What protection is provided to support continued operations ? How fast are the rebuilds ? What mechanism is used for protection : RAID or a RAID-like technology ?


Do they have a backup and restore strategy for the product ? Does the vendor bundle in backup and restore software or do they provide you with recommendations ? Or do they leave it totally up to the customer to figure out the backup/restore strategy ?


Does the vendor provide detailed reference architectures for building out a cloud or database cluster ? Do these reference architectures provide the details of the architecture : servers used, storage used, software used, etc. ? Are they collaborating with the companies that create the servers and operating systems ?

What tools are provided to monitor, first, the array and, second, the arrays aggregated into a single cloud storage view ? Some vendors have provided nice UI and command line tools for monitoring while others have remarkably sophisticated GUIs that


What level of automation and management is provided ? In other words, are there user interface and command line automation tools that allow quick ways of managing storage ? Is there serious support for leading management frameworks such as OpenStack, CloudStack, VMware and other automation frameworks ? Support for SNMP ?

Does the array support third-party (Microsoft, Oracle, etc.) in-memory databases ? Increasingly there is support for in-memory databases that leverage both main memory and flash storage (whether PCIe cards or flash arrays). In the PCIe card world, Fusion IO has demonstrated remarkable performance by leveraging in-memory features of databases like MySQL and Microsoft SQL Server. The open question for flash array vendors increasingly is whether they can demonstrate a qualitative advantage in leveraging any of these in-memory database features.


What networking performance levels are supported ? New and extremely high performance network cards have become and are becoming increasingly available. For example, 16 Gbps Fibre Channel, 40 Gbps Ethernet and 56 Gbps InfiniBand are all available or soon will be. Does the vendor support these networking cards ?


What support is there for virtualization standards and products ? Some vendors have been extremely slow to support OpenStack and have simply offered rudimentary support; other vendors have chosen to support OpenStack by offering cloud reference platforms based on it. Obviously there is a difference in your enthusiasm for a vendor if you have chosen to go the OpenStack road. VMware is another virtualization choice where support is important. Does the vendor support VMware ? In what ways ? Do they support VMware’s storage APIs ? Some companies, like Tintri and Nutanix, have attacked the problem by offering products that provision VMs, highlight rapidly changing VMs and create highly distributed architectures, with a number of features aimed at heavily virtualized environments and cloud stacks.

What support is there for cloud use cases ? If you are building a cloud it is worthwhile asking a simple question: what public and private cloud infrastructures use your flash storage hardware ? Some companies have offered support to cloud providers because they support features cloud providers need. A simple example: scale-out and quality of service give cloud providers an ability to scale out the storage (basically aggregate the arrays’ storage into one large storage namespace) and then control that storage with quality-of-service, which allows for providing guaranteed service levels to customers.

Is the company viable ? Is it stable ? It should go without saying, but a company’s viability needs to be taken into consideration. I’m not talking about large versus small companies but about how viable the company is. There are plenty of small, innovative, viable companies with unique and excellent products. The thing to watch is the stability of the execs at a company and the earnings reports if they are public. For example, if the company loses its CTO, COO and CEO within a month, it is something to be extremely concerned about. If they are focused on storage, have only a few products and are losing money on storage quarter after quarter, that is a concern. These are clear warning signs. Is the company in heavy debt, with investors wanting to part it out or sell it ? Or are they relatively debt free ? Or are they a large company with a long history in storage and extremely sluggish growth ? Is the company viable for the long term ? All these are important considerations. Worse yet are the companies where sales people are talking about the company being in hyper-growth mode, yet the company is quietly laying off people; it indicates a deep disconnect with reality. Large lay-offs indicate a loss of corporate memory and corporate talent and often point to the company being poorly managed. When you see a company where the technical and sales people are turning over quickly, that is a red flag. It usually indicates a poorly managed company. Viability can also be read from the vibe the company produces in the marketplace. Is it one of continuous losses, mass exodus of executives, lawsuits, steep stock declines, large lay-offs, competitive pressures, etc., or a somewhat boring lack of activity, or is the vibe one of creativity, quiet growth, a vision of direction and a lack of bad news ? If the vibe produced is negative, it’s worth avoiding them and looking at a company that has a better chance of being here next year.

What is the maximum storage capacity provided ? You may need more capacity. Most flash arrays may not be able to address the amount of storage capacity you need. In that case, you may want to look at hybrid arrays like those from Tegile (which also produces flash arrays). Hybrid arrays are a combination of traditional disk storage and flash storage and provide excellent performance characteristics when compared to traditional disk arrays. A recent example at a university saw a hybrid array provide between 25x and 40x the performance of their traditional storage while giving them much more capacity. The interesting aspect of this option is that it provides both a capacity and a performance choice, and one can couple these as a high performance duo (made easier if it is from the same vendor with the same OS).

Finally, does your flash storage vendor offer you a guarantee ? One such guarantee (again from Tegile) offers minimum levels for array performance, pricing per raw GB of storage, minimum levels of data reduction, minimum heavy-write endurance (with free controller updates) and availability (minimal downtime per year). The notion of a guarantee for your arrays is something of a game-changer in my mind. It puts the onus of not living up to performance, endurance and cost claims squarely on the vendor and provides the buyer with a significant advantage.
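When a guarantee quotes availability as a percentage, it helps to translate it into downtime per year. The sketch below does that arithmetic; the percentages are generic examples and not the terms of Tegile’s or any other vendor’s guarantee.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum downtime per year implied by an availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for percent in (99.9, 99.99, 99.999):
    print(percent, round(downtime_minutes_per_year(percent), 3))
```

Three nines works out to roughly 8.8 hours a year and five nines to a little over 5 minutes, so it is worth checking which one the vendor is actually signing up for.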


Depending on what you are doing, some or all of these may be of interest to you. In the end, it is about what you are trying to do with applications, performance, storage capacity required and a host of other aspects.

[ Photo : Dragon, Shanghai Art Museum. digitalcld.com  ]

Go to more posts on storage and flash storage at http://digitalcld.com/cld/category/storage.


Recommended Reading : Reference Architecture for Oracle on Cisco UCS Server and Tegile

There is a nice reference architecture white paper which provides a lot of details into setting up an architecture that supports Oracle Database deployment and operational details on infrastructure consisting of Cisco UCS servers and Tegile Hybrid Arrays. Select the item below to see it.

In addition, there are a number of new flash arrays. The following brief discusses Oracle and these new arrays :



Recommended Reading : Microservices Architectures and Clojure (A Quick Survey of Resources)

Increasingly people are describing microservices as small services with specific functions that combine to create a larger application. They often have intelligence in the endpoints, automation in deployment and no dependency on any specific language. There is a very nice description of this approach written by James Lewis and Martin Fowler. You can read more of it here :

A counter-position or cautionary position was recently written by Benjamin Wootton, which is also worth reading.


And in InfoQ, Chris Richardson offers some details on decomposing applications into microservices.


In the Clojure community there are some examples of how this is being used. Let’s start with three recent articles (the first one from a Ruby shop experimenting with Clojure) :




and the following describes one developer’s views on a Microservice Clojure stack.


In addition, here is a way to integrate Unix (micro)services with Clojure into all of this :