Datomic : Moving Beyond the Current Generation of Relational and NoSQL Databases

It’s really seldom that you get a chance to see a different thought pattern emerge in database design, one that covers the key advantages of both relational and NoSQL databases and offers a new set of features. Datomic is an aggressive attempt to create a next-generation database that moves us beyond the usual and is built to run in current and future cloud architectures. Let’s start with some of Datomic’s features. It provides support for ACID transactions, joins, a sound data model and a logical query language called Datalog. Most of all, it breaks with the rigidity of document-based and relational database models. It also breaks with the model of mutability and the lack of audit trails. To understand it, it is worth taking a tour of both the database and its query language. One surprising aspect is that it provides all of this while using engines like Couchbase, Cassandra, Riak, etc. as nothing more than storage engines or services. Let’s start with the tour of Datomic.
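The next sketch makes the accumulate-only idea concrete. It is a minimal Java example (assuming Java 17) of a store in which an update appends a new fact instead of overwriting the old one. Because of that, any past state can be read back and every change leaves an audit trail. This is not Datomic's API: `FactStore`, `Datom`, `assertFact`, `valueAsOf` and `history` are hypothetical names. A real database would also index the facts rather than scan a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical accumulate-only fact store illustrating immutability and audit trails.
// NOT Datomic's API; all names here are made up for illustration.
final class FactStore {
    // One immutable fact: entity id, attribute name, value, and the transaction that added it.
    record Datom(long entity, String attribute, Object value, long tx) {}

    private final List<Datom> log = new ArrayList<>();  // append-only, kept in tx order
    private long nextTx = 1;

    // "Updating" an attribute appends a new fact; earlier facts are never overwritten.
    long assertFact(long entity, String attribute, Object value) {
        long tx = nextTx++;
        log.add(new Datom(entity, attribute, value, tx));
        return tx;
    }

    // Point-in-time read: the latest value asserted at or before asOfTx.
    Optional<Object> valueAsOf(long entity, String attribute, long asOfTx) {
        Object latest = null;
        for (Datom d : log) {
            if (d.tx() > asOfTx) break;  // later facts are invisible to this view
            if (d.entity() == entity && d.attribute().equals(attribute)) latest = d.value();
        }
        return Optional.ofNullable(latest);
    }

    // Audit trail: every value this attribute has ever had, oldest first.
    List<Datom> history(long entity, String attribute) {
        List<Datom> out = new ArrayList<>();
        for (Datom d : log) {
            if (d.entity() == entity && d.attribute().equals(attribute)) out.add(d);
        }
        return out;
    }
}
```

Because nothing is overwritten, asking "what was the value at transaction t1?" and "how did this value change over time?" are both simple reads over the same facts.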

Datomic deconstructs the current generation of databases and implements a modern database focused on breaking with decades-old designs and with some of the issues of the newer NoSQL databases (lack of joins, lack of ACID, etc.). To understand the power of the database, you have to explore the power of the Datalog query language.
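The next sketch roughly illustrates how Datalog clauses join. This hypothetical Java example (assuming Java 17) evaluates `where`-style patterns over entity/attribute/value triples. A term starting with `?` is a variable, and a variable shared between clauses acts as the join key. It is not Datomic's query engine or syntax, only a minimal pattern-matching model of the idea. `MiniDatalog`, `Triple` and `query` are made-up names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini Datalog-style matcher over (entity, attribute, value) triples.
// Not Datomic's engine; it only models how shared variables join clauses.
final class MiniDatalog {
    record Triple(String e, String a, String v) {}

    // Terms starting with '?' are variables; anything else must match exactly.
    static boolean isVar(String term) { return term.startsWith("?"); }

    // Evaluates a conjunction of clauses, returning every consistent variable binding.
    static List<Map<String, String>> query(List<Triple> facts, List<String[]> clauses) {
        List<Map<String, String>> bindings = new ArrayList<>();
        bindings.add(new HashMap<>());  // start with one empty binding
        for (String[] clause : clauses) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> b : bindings) {
                for (Triple f : facts) {
                    Map<String, String> extended = unify(clause, f, b);
                    if (extended != null) next.add(extended);
                }
            }
            bindings = next;  // variables already bound constrain later clauses (the join)
        }
        return bindings;
    }

    // Tries to extend binding b so that the clause matches fact f; returns null if impossible.
    private static Map<String, String> unify(String[] clause, Triple f, Map<String, String> b) {
        String[] values = {f.e(), f.a(), f.v()};
        Map<String, String> out = new HashMap<>(b);
        for (int i = 0; i < 3; i++) {
            String term = clause[i];
            if (isVar(term)) {
                String bound = out.get(term);
                if (bound == null) out.put(term, values[i]);
                else if (!bound.equals(values[i])) return null;
            } else if (!term.equals(values[i])) {
                return null;
            }
        }
        return out;
    }
}
```

For example, the clauses `[?e person/employer ?c]`, `[?c company/name Acme]` and `[?e person/name ?name]` find the names of people employed by a company named Acme. They join people to companies through `?c`, with no separate join syntax.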

Large, medium and small companies are using Datomic, so you may be interested in the “why”. In this presentation, developers from the Brazilian company NuBank expose four hidden super-powers of Datomic. The presentation uses the Clojure programming language.

And here is another use case from Room Key (a joint project of Hilton, Choice, Hyatt, IHG, Marriott and Wyndham).

 


Storm Persistence and Real-Time Analytics

A short while back I mentioned some of the nice work being done on Storm. Brian Bulkowski, founder and CTO at Aerospike, has given two very nice presentations that highlight some of Storm’s advantages. In them he goes over some of the architecture associated with Storm. He points out that it is read/write optimized for flash and provides high performance. Used in conjunction with Aerospike’s flash-optimized in-memory database, the combination provides a real performance boost. He also points to an actual customer analysis where 500,000 transactions/second were required: Aerospike’s server requirements were dramatically lower (14 servers versus 186). You can read more in the presentation.
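The figures from the customer analysis divide out as follows. This assumes the 500,000 transactions/second load is spread evenly across servers and reads the lower count (14) as Aerospike's server requirement:

```latex
\frac{500{,}000\ \text{tx/s}}{14\ \text{servers}} \approx 35{,}714\ \text{tx/s per server},
\qquad
\frac{500{,}000\ \text{tx/s}}{186\ \text{servers}} \approx 2{,}688\ \text{tx/s per server},
\qquad
\frac{186}{14} \approx 13.3
```

In other words, each server in the smaller cluster carries roughly 13 times the load.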


A more up-to-date version of the talk was given recently, covering real-time analytics with Storm, Aerospike and Hadoop. In this talk he covered Storm in more depth. He explained spouts (connections to data sources), bolts (units of analysis and data manipulation), Nimbus (a control entity that allows rebalancing in real time) and much more. He also provides a nice example (Trending Words) and describes running and coding in Storm. You can see more in the talk.
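The Trending Words example maps naturally onto the spout/bolt model. The plain-Java sketch below has no Storm dependency and imitates that data flow in a single process. A spout emits sentences, one bolt splits them into words, and a second bolt keeps counts and reports the top N. The class and method names are hypothetical. A real topology would implement Storm's own spout and bolt interfaces and run distributed across workers.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.stream.Collectors;

// Hypothetical in-process simulation of a Storm-style "Trending Words" topology.
// Plain Java only; this does not use Storm's API.
final class TrendingWords {
    // "Spout": emits raw sentences from a data source into the pipeline.
    static void sentenceSpout(List<String> source, Consumer<String> emit) {
        source.forEach(emit);
    }

    // "Bolt" 1: splits a sentence into lower-case words.
    static void splitBolt(String sentence, Consumer<String> emit) {
        for (String w : sentence.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) emit.accept(w);
        }
    }

    // "Bolt" 2: keeps running counts per word.
    static final class CountBolt {
        private final Map<String, Integer> counts = new HashMap<>();

        void execute(String word) { counts.merge(word, 1, Integer::sum); }

        // Top-N words by count; ties are broken alphabetically for stable output.
        List<String> top(int n) {
            return counts.entrySet().stream()
                    .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                            .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                    .limit(n)
                    .map(Map.Entry::getKey)
                    .collect(Collectors.toList());
        }
    }

    // Wires spout -> split bolt -> count bolt, the way a topology wires streams together.
    static List<String> run(List<String> sentences, int n) {
        CountBolt counter = new CountBolt();
        sentenceSpout(sentences, s -> splitBolt(s, counter::execute));
        return counter.top(n);
    }
}
```

The point of the split is that each stage only sees a stream of tuples. In Storm, that is what lets the split and count stages be parallelized and rebalanced independently.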

Here is more on Storm, though the audio isn’t perfect:

Incidentally, in one of the presentations he mentions using the Micron P320h to deliver exceptionally high performance.

 

Recommended Use-Case : Snapdeal, India’s Largest Online Marketplace, Using Aerospike

Here is a nice use-case write-up on Snapdeal, India’s largest online marketplace, which uses Aerospike (NoSQL) databases to power its services. Snapdeal looked at MongoDB, Couchbase, Redis, Terracotta BigMemory Max, Amazon’s DynamoDB and Aerospike. They chose Aerospike for a number of reasons; the rationale and results can be found in the write-up. Among them:

- the in-memory Aerospike database maintained sub-millisecond latency on Amazon EC2 while managing 100 million objects;
- it delivered predictable low latency, with 95-99% of transactions completing within 10 milliseconds;
- it required low maintenance;
- it provided full replication across EC2 servers;
- and more.
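Claims like "95-99% of transactions completing within 10 milliseconds" are percentile statements. The hypothetical Java helpers below show one way to check such a target against measured latencies. They compute the fraction of samples at or under a threshold and a nearest-rank percentile. The method names and sample data are illustrative only and do not come from the Snapdeal write-up.

```java
import java.util.Arrays;

// Hypothetical helpers for checking a latency target against measured samples.
final class LatencyCheck {
    private static void requireSamples(double[] latenciesMs) {
        if (latenciesMs.length == 0) throw new IllegalArgumentException("no samples");
    }

    // Fraction (0.0 to 1.0) of samples at or below the threshold.
    static double fractionWithin(double[] latenciesMs, double thresholdMs) {
        requireSamples(latenciesMs);
        long within = Arrays.stream(latenciesMs).filter(x -> x <= thresholdMs).count();
        return (double) within / latenciesMs.length;
    }

    // Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
    static double percentile(double[] latenciesMs, double p) {
        requireSamples(latenciesMs);
        double[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        rank = Math.min(Math.max(rank, 1), sorted.length);  // clamp to a valid 1-based rank
        return sorted[rank - 1];
    }
}
```

Take the ten made-up samples `{0.4, 0.6, 0.8, 1.2, 2.0, 3.5, 5.0, 7.0, 9.5, 14.0}` ms. Here 90% of requests finish within 10 ms and the 90th percentile is 9.5 ms, while the single 14 ms outlier sets the 99th percentile.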

Interesting read. If you want to learn more about the architecture and choices made by Snapdeal, they are presenting on this topic on Wednesday, March 12th.


Recommended Viewing : Hazelcast Intro, Management Center and an Example

If you have been following Hazelcast, you know it is one of the most interesting startups. If you haven’t, and you are working on big data applications or looking at in-memory data grid solutions, you should look at Hazelcast.

Hazelcast has a Community Edition, an Enterprise Edition and a Management Center. Hazelcast is an open source clustering and highly scalable data distribution platform. Hazelcast allows you to easily share and partition your application data across your cluster. Hazelcast is a peer-to-peer solution (there is no master node, every node is a peer) so there is no single point of failure.

 

Hazelcast Enterprise Edition (EE) is an extension to Community Edition. It contains extra features such as Elastic Memory and Security. Elastic memory helps businesses on storing large amounts of data with high throughput. The off-heap technology used in enterprise version, resolves the performance problems experienced handling terabytes of data. With Enterprise Edition, big data will not be a big challenge.

– from the Hazelcast web site.
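The off-heap idea mentioned in the quote can be illustrated with plain Java. The sketch below keeps `long` values in a direct `ByteBuffer`, whose contents live in native memory outside the garbage-collected heap. As a result, a large payload is not scanned or moved by the collector. It only illustrates the general technique and assumes nothing about how Hazelcast's Elastic Memory is implemented; `OffHeapLongArray` is a hypothetical name.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of off-heap storage; not Hazelcast's Elastic Memory.
final class OffHeapLongArray {
    private final ByteBuffer buffer;

    OffHeapLongArray(int capacity) {
        // allocateDirect reserves native memory (zero-initialized); only the small
        // ByteBuffer object itself lives on the Java heap.
        this.buffer = ByteBuffer.allocateDirect(capacity * Long.BYTES);
    }

    // Absolute put/get by byte offset; out-of-range indexes throw IndexOutOfBoundsException.
    void set(int index, long value) { buffer.putLong(index * Long.BYTES, value); }

    long get(int index) { return buffer.getLong(index * Long.BYTES); }

    int capacity() { return buffer.capacity() / Long.BYTES; }
}
```

The trade-off is that data must be serialized into bytes (trivially here, as `long`s) and is addressed by offset rather than through object references.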

JVMs running Hazelcast will dynamically cluster. Miko Matsumura has two nice videos on Hazelcast in-memory data grid technologies. In the first one he gives a quick intro to Hazelcast:

In the second video, he shows how to set up the Hazelcast Management Center and walks you through it:

Finally, there is a nice blog post on how to get started with Hazelcast that walks you through a simple example.
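The partitioning described in the Hazelcast quote above can be sketched in a few lines. In this hypothetical Java example, every key hashes to one of a fixed number of partitions, and partitions are spread round-robin across the cluster members. Every member therefore holds a share of the data. The count 271 mirrors Hazelcast's default partition count, but the hashing and assignment here are simplifying assumptions, not Hazelcast's actual algorithm or API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of key partitioning across peers, in the spirit of Hazelcast.
// Not Hazelcast's API or its exact hashing; 271 mirrors its default partition count.
final class PartitionSketch {
    static final int PARTITION_COUNT = 271;

    // Maps a key to a fixed partition; floorMod keeps negative hash codes in range.
    static int partitionId(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // Spreads partitions round-robin over the current members.
    static Map<Integer, String> assignOwners(List<String> members) {
        Map<Integer, String> owners = new HashMap<>();
        for (int p = 0; p < PARTITION_COUNT; p++) {
            owners.put(p, members.get(p % members.size()));
        }
        return owners;
    }

    // Counts how many partitions each member owns.
    static Map<String, Integer> load(Map<Integer, String> owners) {
        Map<String, Integer> counts = new TreeMap<>();
        owners.values().forEach(m -> counts.merge(m, 1, Integer::sum));
        return counts;
    }
}
```

With three members, the 271 partitions split 91/90/90. Because the key-to-partition mapping is deterministic, any member can compute which partition a key belongs to.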


And there is a nice presentation (PDF) from Team High Calibre at San Jose State University.


Recommended : Nimbus Data Aims High and Delivers; Also Releases VDI Benchmark

Fast performance is one aspect, but when you couple it with a suite of data reduction technologies and storage features you get something much more useful and resilient. Some of the features in the latest arrays from Nimbus Data are well thought out and absolutely great from an enterprise and cloud perspective.

Nimbus Data has really arrived. Its new Gemini arrays challenge its competitors in a serious way. It has leapfrogged the leading flash array competitor by offering several features together:

- full non-disruptive upgrades;
- full array redundancy;
- hot-swap everything;
- inline data reduction and data services (thin provisioning, replication, deduplication and compression);
- NFS and CIFS support.

The amazing thing is that those are just the tip of the iceberg. A deep-dive video reveals an excellent design and some surprisingly great advancements to flash array technology in general.
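Inline deduplication is easy to picture at block level. Data is cut into fixed-size blocks, each block is identified by a content hash, and a block that has already been seen is stored only once. The hypothetical Java sketch below (assuming Java 17 for `HexFormat`) shows that bookkeeping. It is not Nimbus Data's implementation, and real arrays add reference counting, hash-collision handling and persistence.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of inline block-level deduplication; not any vendor's implementation.
final class DedupStore {
    private final int blockSize;
    private final Map<String, byte[]> uniqueBlocks = new HashMap<>();  // content hash -> block
    private final List<String> layout = new ArrayList<>();           // logical block -> hash

    DedupStore(int blockSize) { this.blockSize = blockSize; }

    // Splits data into blocks; a block whose hash is already known is not stored again.
    void write(byte[] data) {
        MessageDigest sha256;
        try {
            sha256 = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
        for (int off = 0; off < data.length; off += blockSize) {
            byte[] block = Arrays.copyOfRange(data, off, Math.min(off + blockSize, data.length));
            String hash = HexFormat.of().formatHex(sha256.digest(block));
            uniqueBlocks.putIfAbsent(hash, block);  // store the payload only the first time
            layout.add(hash);                        // every logical block still gets a reference
        }
    }

    // Rebuilds the logical data from the references, showing nothing was lost.
    byte[] read() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String hash : layout) out.writeBytes(uniqueBlocks.get(hash));
        return out.toByteArray();
    }

    int logicalBlocks() { return layout.size(); }

    int physicalBlocks() { return uniqueBlocks.size(); }
}
```

Writing `AAAABBBBAAAAAAAA` with 4-byte blocks produces four logical blocks but only two physical ones. Reading the layout back reproduces the original bytes.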

It has also demonstrated something that many other leading flash vendors have not been able to do. It leverages eight 16 Gb/s Fibre Channel ports in its Nimbus Gemini arrays, and it offers two hot-swappable controllers. Nimbus has advanced the multi-protocol capability of the product: it can run 40 Gb Ethernet and InfiniBand at the same time, or alternatively Ethernet and Fibre Channel at the same time. There are also adapters that can run at 10 Gb Ethernet. The controllers parallelize I/O across all 24 flash drives. The flash modules can be removed from the front, which is an excellent design. There is no pulling the array out of the rack and taking the top off, with the potential servicing dilemmas that involves for some vendors.
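One common way a controller spreads I/O across many drives is round-robin striping. Consecutive logical blocks land on different drives, so a large request can be serviced by all of them in parallel. The post does not describe Nimbus's actual data layout, so the Java sketch below is only an assumption-level illustration using the 24-drive count. `StripeMap` and its methods are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical round-robin striping map; the 24-drive count comes from the post,
// the layout itself is an assumption.
final class StripeMap {
    static final int DRIVES = 24;

    record Location(int drive, long blockOnDrive) {}

    // Block n goes to drive n mod 24, at row n / 24 on that drive.
    static Location locate(long logicalBlock) {
        return new Location((int) (logicalBlock % DRIVES), logicalBlock / DRIVES);
    }

    // Number of distinct drives a run of consecutive blocks touches (i.e. usable parallelism).
    static int drivesTouched(long firstBlock, int count) {
        Set<Integer> drives = new HashSet<>();
        for (long b = firstBlock; b < firstBlock + count; b++) drives.add(locate(b).drive());
        return drives.size();
    }
}
```

Under this layout, any run of 24 or more consecutive blocks touches all 24 drives, while a short 8-block request touches only 8.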


In a new benchmark they demonstrated the strength of the new arrays at handling VDI.  The benchmark was run with a Nimbus Gemini dual-controller 2U F400 all-flash array with 24 TB of raw capacity.

Data Point : The single array had 17.6 TB of usable capacity for the test and featured 24 one-terabyte solid-state disks and a 4 TB cache with write-back caching. A single Nimbus Gemini F400 can support more than 4,000 simultaneous VDI users at less than $40 per desktop.
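The benchmark figures imply a few simple ratios. These use decimal units and assume capacity is split evenly across desktops. The last term is the upper bound on total cost implied by "less than $40 per desktop" at 4,000 users:

```latex
\frac{17.6\ \text{TB usable}}{24\ \text{TB raw}} \approx 73\%,
\qquad
\frac{17.6\ \text{TB}}{4{,}000\ \text{desktops}} = 4.4\ \text{GB per desktop},
\qquad
4{,}000 \times \$40 = \$160{,}000
```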

You can read the full report :


An important aspect of the new arrays is the focus on a unified array operating system (more on this in a future post) that offers the full range of storage features I have written about in earlier posts.

 


Go to more posts on storage and flash storage at http://digitalcld.com/cld/category/storage.


 

Recommended Reading : Beware of the Giraffes in your Data

This is a nice article that discusses how we approach data and why we should question the obvious. Sometimes the obvious can lead us to adopt the wrong strategies: the obvious, or the portions of the data that dominate, can fool us into making seriously flawed decisions. The article provides insight into this, with some nice examples.

gigaom