Datomic : Moving Beyond the Current Generation of Relational and NoSQL Databases

It is rare to see a different thought pattern emerge in database design that covers the key advantages of both relational and NoSQL databases and offers a new set of features. Datomic is an aggressive attempt to create a next-generation database that moves us beyond the usual and is built to run in current and future cloud architectures. Let’s start with some of the features in Datomic. It provides support for ACID transactions, joins, a sound data model and a logical query language called Datalog. Most of all, it breaks with the rigidity of document-based and relational database models. It also breaks with the model of mutability and the lack of audit trails. To understand it, it is worth taking a tour of both Datomic and its query language. One surprising aspect is that it provides all this while using engines like Couchbase, Cassandra, Riak, etc. as nothing more than storage services. Let’s start with the tour of Datomic:

Datomic deconstructs the current generation of databases and implements a modern database that breaks with designs from decades ago and addresses some of the issues with the newer NoSQL databases (lack of joins, lack of ACID, etc.). To understand the power of the database you have to explore the power of the Datalog query language:
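To make the ideas of immutability, audit trails and joins a little more concrete, here is a minimal sketch in Python. It is not Datomic's API or its Datalog syntax; the `Datom` tuple, the `FactLog` class, the `employees_in_city` helper and the attribute names are all hypothetical, chosen only to illustrate an append-only store of facts that can be queried as of an earlier transaction.

```python
from typing import Any, NamedTuple


class Datom(NamedTuple):
    """One immutable fact (hypothetical shape, for illustration only)."""
    entity: int
    attribute: str
    value: Any
    tx: int       # transaction number that recorded this fact
    added: bool   # True = assertion, False = retraction


class FactLog:
    """Append-only log of facts; nothing is updated or deleted in place."""

    def __init__(self):
        self._log = []
        self._next_tx = 1

    def transact(self, changes):
        """Append (entity, attribute, value, added) changes as one transaction."""
        tx = self._next_tx
        self._next_tx += 1
        for entity, attribute, value, added in changes:
            self._log.append(Datom(entity, attribute, value, tx, added))
        return tx

    def as_of(self, tx):
        """Return the set of (entity, attribute, value) facts true as of tx."""
        facts = set()
        for d in self._log:
            if d.tx > tx:
                break  # the log is ordered by transaction number
            key = (d.entity, d.attribute, d.value)
            if d.added:
                facts.add(key)
            else:
                facts.discard(key)
        return facts

    def history(self, entity, attribute):
        """Audit trail: every assertion and retraction for one attribute."""
        return [d for d in self._log
                if d.entity == entity and d.attribute == attribute]


def employees_in_city(facts, city):
    """Join person -> company -> city and return the matching person names."""
    companies = {e for (e, a, v) in facts if a == "company/city" and v == city}
    people = {e for (e, a, v) in facts
              if a == "person/works-for" and v in companies}
    return sorted(v for (e, a, v) in facts
                  if a == "person/name" and e in people)


log = FactLog()
t1 = log.transact([
    (1, "company/city", "Austin", True),
    (2, "person/name", "Ada", True),
    (2, "person/works-for", 1, True),
])
# The company moves: the old fact is retracted and a new one asserted,
# but the earlier state stays queryable.
t2 = log.transact([
    (1, "company/city", "Austin", False),
    (1, "company/city", "Denver", True),
])

print(employees_in_city(log.as_of(t1), "Austin"))  # ['Ada']
print(employees_in_city(log.as_of(t2), "Austin"))  # []
print(employees_in_city(log.as_of(t2), "Denver"))  # ['Ada']
```

The design choice worth noticing is that a "change" is just two more rows in the log, so the same query can be run against the past and the present without any special audit tables.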

Large, medium and small companies are using Datomic, so you may be interested in the “why”. In this presentation, developers from the Brazilian company Nubank expose four hidden super-powers of Datomic. In this case, the Clojure programming language is used.

Here is another use case from Room Key (a joint project of Hilton, Choice, Hyatt, IHG, Marriott and Wyndham):

 

More reading :

flying-studios


examples

 

 

Architecture : SSD-Based Solutions Show Advantages In the NoSQL DB Tier (Video)

Today we look at the NoSQL database tier. Some of this is taken from notes for a work in progress, An Introduction to Using High-Performance Flash-Based Storage in Constructing High Volume Transaction Architectures – A Manager’s Guide to Selecting Flash Storage. This is not a complete look at Big Data, but rather a partial look at some of the things Aerospike, one of the more interesting NoSQL databases, is doing.

Aerospike and the NoSQL Database Tier. As an alternative or in addition to the relational database tier, there is a NoSQL database tier. With the arrival in recent years of Big Data architectures, elements of a new architecture for dealing with both structured and unstructured data have arrived, and with them some databases, like Aerospike, that offer an extremely high-performance solution in transaction-oriented environments. This is quite a bit different from typical Hadoop implementations; one of Aerospike’s real differentiators is that it was built as an in-memory database. Traditionally, this tier has run on a number of spinning disks. However, in the past few years, especially with the need for real-time information, there has been a move to SSDs and PCIe-based flash cards. Aerospike’s NoSQL database provides a means to get those high-performance results. It is built to run in memory or in flash, on relatively low-cost clustered hardware with lots of memory and/or flash storage. It supports ACID properties and, as a NoSQL database, also leverages a key-value store.

The example below gives a partial glimpse into an architecture in this tier, where various transactions occur within applications and Aerospike interacts with them. It should be noted that in the app tier, Aerospike uses a Smart Client to communicate with the Aerospike cluster.

nosqlarch
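One way to picture a cluster-aware client in the app tier is as a client that holds its own table of which node owns which slice of the key space, so each request goes straight to the right server. The Python sketch below illustrates that general idea only. It is not Aerospike's client API: the `SketchClient` class, the partition count, the SHA-1 hash and the node names are all assumptions made for the example.

```python
import hashlib

N_PARTITIONS = 4096  # assumed partition count for this sketch


def partition_id(key: str) -> int:
    """Hash a key to a partition number (SHA-1 is a stand-in hash here)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % N_PARTITIONS


class SketchClient:
    """Hypothetical cluster-aware client that routes keys without a proxy."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # Assumed ownership scheme: partitions are dealt out round-robin.
        self.partition_map = {
            p: self.nodes[p % len(self.nodes)] for p in range(N_PARTITIONS)
        }

    def node_for(self, key: str) -> str:
        """Return the node that currently owns the key's partition."""
        return self.partition_map[partition_id(key)]

    def remove_node(self, node: str) -> None:
        """Reassign only the partitions owned by a node that left the cluster."""
        self.nodes.remove(node)
        for p, owner in list(self.partition_map.items()):
            if owner == node:
                self.partition_map[p] = self.nodes[p % len(self.nodes)]


client = SketchClient(["node-a", "node-b", "node-c"])
print(client.node_for("user:42"))  # one of the three node names
client.remove_node("node-b")
print(client.node_for("user:42"))  # never "node-b" after the removal
```

Because only the departed node's partitions are reassigned, keys that lived on the surviving nodes keep their owner, which is the property that makes client-side routing cheap to maintain.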

Of course, the producing/consuming sources may vary dramatically, from applications, web services, Hadoop clusters, mobile devices, weblogs, marketing data repositories and many more. Aerospike is among the best-of-breed NoSQL databases. You can see an example of a typical deployment (from the Aerospike presentation below):

aerospike100

And some of the Aerospike server deployments :

aerospike101

Aerospike offers support for ACID properties along with a high-performance, clustered architecture.

aero_ssd

Of course, there are other databases such as MongoDB, Cassandra and HBase, to name a few. You may choose to use a NoSQL database over a relational database; it depends wholly on what you are doing. The NoSQL database tier’s storage on these servers can use SSDs, flash PCIe cards and flash arrays. Traditionally this tier has adopted a “shared-nothing” philosophy using traditional spinning disks, SSDs or flash PCIe cards. Until recently, flash arrays seemed not only like overkill but also to go against the grain of the “shared-nothing” philosophy. SSDs and cards like Micron’s P320h offer excellent performance and a price/performance advantage over arrays. As flash prices drop, flash arrays are becoming a consideration in this tier, and there are a number of recent deployments leveraging flash arrays for the NoSQL DB tier. Recently, Aerospike tested Micron’s P320h (SLC SSD) PCIe card. It “blew away the competition” according to the people doing the testing. You can read more here:

aero_ssd

More information on the P320h :

tomshwp320h

It should be noted that there are two versions of this card from Micron. Micron also offers a 2.5″ flash PCIe form factor that is hot-swappable. You can read more here:

digimicron

It should be noted that competitors are not standing still: Virident, Fusion-io and others have released, and are coming out with, new cards that are worth looking at.

To understand what Aerospike is doing it is worth watching this video :

Aerospike_video

If you want to learn more, it is worth visiting Aerospike’s BrightTALK site:

brighttalk_aerospike

 


Go to more posts on storage and flash storage at http://digitalcld.com/cld/category/storage.


 

Recommended Viewing : The Netflix Cloud and Cassandra

Netflix is doing some amazing things. If you have the service, you know they are dependent on Amazon Web Services, but their cloud practices transcend that dependency. Adrian Cockcroft has delivered some really excellent talks explaining how they do what they do.

He also gave a nice talk on how Netflix moved to Cassandra to do a lot of the heavy lifting.

He provided another very nice presentation at the Cassandra conference C*2012 about running Cassandra on AWS.

Interestingly, in the two Cassandra talks he discusses the use of SSDs to improve Cassandra performance. He talks about moving from 2 drives (1.7 TB) to 2 SSD volumes (2 TB), and he shows results from a hard disk versus SSD comparison. Netflix is offering a number of Cassandra-related tools as open source, such as Priam (for Cassandra automation), Astyanax (a client front-end into Cassandra) and more (like Aegisthus, Zeno, Chaos Monkey, Zuul, Pytheas, etc.). Note that AppDynamics is used throughout these presentations. One other project I’m aware of, STAASH, is a non-JVM way of getting to the recipes in Astyanax. You can follow all of this on the Netflix technical blog.

 

Also a post that may be of interest : Some Thoughts on Why We Want To Run Databases on Flash