Datomic : Moving Beyond the Current Generation of Relational and NoSQL Databases

It is rare to see a new line of thinking emerge in database design that covers the key advantages of both relational and NoSQL databases and also offers a new set of features. Datomic is an aggressive attempt to create a next-generation database that moves us beyond the usual and is built to run in current and future cloud architectures. Let’s start with some of the features in Datomic. It provides support for ACID transactions, joins, a sound data model and a logical query language called Datalog. Most of all, it breaks with the rigidity of document-based and relational data models. It also breaks with the model of in-place mutation and the resulting lack of an audit trail. To understand it, it is worth taking a tour of both Datomic and its query language. One surprising aspect is that it provides all this while using engines like Couchbase, Cassandra, Riak, etc. as nothing more than storage engines or services. Let’s start with the tour of Datomic:

Datomic deconstructs the current generation of databases and implements a modern database that breaks with designs from decades ago and with some of the issues of the newer NoSQL databases (lack of joins, lack of ACID, etc.). To understand the power of the database you have to explore the power of the Datalog query language:
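
To give a feel for what a Datalog-style query does, here is a minimal sketch in Python. It is not Datomic's API or its query engine: the fact layout `(entity, attribute, value)`, the attribute names and the helper functions `match` and `query` are all assumptions made for illustration. Variables start with `?`, and a variable shared between two patterns acts as a join.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Fact = Tuple[Any, str, Any]      # (entity, attribute, value), a hypothetical layout
Binding = Dict[str, Any]         # variable name -> bound value


def is_var(term: Any) -> bool:
    """Logic variables are strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern: Sequence[Any], fact: Fact, binding: Binding) -> Optional[Binding]:
    """Try to unify one pattern with one fact; return the extended binding or None."""
    result = dict(binding)
    for term, value in zip(pattern, fact):
        if is_var(term):
            if term in result and result[term] != value:
                return None          # variable already bound to something else
            result[term] = value
        elif term != value:
            return None              # constant in the pattern does not match
    return result


def query(find: List[str], where: List[Sequence[Any]], facts: List[Fact]) -> List[tuple]:
    """Evaluate the patterns left to right, joining on shared variables."""
    bindings: List[Binding] = [{}]
    for pattern in where:
        extended = []
        for binding in bindings:
            for fact in facts:
                new_binding = match(pattern, fact, binding)
                if new_binding is not None:
                    extended.append(new_binding)
        bindings = extended
    return sorted({tuple(b[v] for v in find) for b in bindings})


FACTS: List[Fact] = [
    (1, "person/name", "Alice"),
    (1, "person/works-at", 10),
    (2, "person/name", "Bob"),
    (2, "person/works-at", 11),
    (10, "company/name", "Acme"),
    (11, "company/name", "Globex"),
]

# "Find the names of people who work at the company named Acme."
acme_staff = query(
    find=["?name"],
    where=[
        ("?p", "person/name", "?name"),
        ("?p", "person/works-at", "?c"),
        ("?c", "company/name", "Acme"),
    ],
    facts=FACTS,
)
```

The join across people and companies falls out of reusing `?p` and `?c` in more than one pattern; no join syntax is needed. A real engine would use indexes rather than scanning every fact for every pattern.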

Large, medium and small companies are using Datomic, so you may be interested in the “why”. In this presentation, developers at the Brazilian company Nubank expose four hidden super-powers of Datomic. In this case, the Clojure programming language is used.

And here is another use case from Room Key (a joint project of Hilton, Choice, Hyatt, IHG, Marriott and Wyndham):

 

More reading :

flying-studios


examples

 

 

Recommended Learning Exercise : Simulating A Parking Garage in Clojure

Here is a nice pair of examples of software that simulates a parking garage. If you are learning Clojure, they might be of interest to you. The problem is to simulate operations on a garage used for parking vehicles: vehicles come into the parking garage, park and then later leave, and they are identified by their license plate number. In the first example, Clojure Refs are used.
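
For readers who want to see the shape of the problem before opening the Clojure version, here is a minimal sketch in Python. It is not a translation of the linked example: the class name `Garage`, its methods and the fixed capacity are assumptions for illustration, and a plain lock stands in for the coordinated updates that Clojure Refs provide through software transactional memory.

```python
import threading
from typing import List, Set


class Garage:
    """Toy parking garage keyed by license plate (hypothetical API)."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._parked: Set[str] = set()
        self._lock = threading.Lock()   # stand-in for Clojure's STM

    def enter(self, plate: str) -> bool:
        """Park a vehicle; refuse if the garage is full or the plate is already inside."""
        with self._lock:
            if plate in self._parked or len(self._parked) >= self._capacity:
                return False
            self._parked.add(plate)
            return True

    def leave(self, plate: str) -> bool:
        """Remove a vehicle; refuse if that plate is not parked here."""
        with self._lock:
            if plate not in self._parked:
                return False
            self._parked.remove(plate)
            return True

    def parked(self) -> List[str]:
        """Return the plates currently inside, sorted for stable output."""
        with self._lock:
            return sorted(self._parked)


garage = Garage(capacity=2)
first = garage.enter("ABC-123")
second = garage.enter("XYZ-789")
third = garage.enter("JKL-456")     # garage is full at this point
left = garage.leave("ABC-123")
```

Each operation checks and changes the shared state inside one critical section, which is the property the Refs version gets from running inside a transaction.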

parkgarage1

In the second example, the same parking garage problem from the first example is solved using clojure.spec. See more on the clojure.spec rationale and the guide. Here is the second example:

garagespec2

 

 

Recommended Reading : An Archaeology-Inspired Database

Yoav Rubin shows how a change in a common perspective affects the design and implementation of a well-studied type of software: a database. In this excellent examination of how a shift in perspective changes everything, Yoav revisits what we thought we knew and shows that there may be better ways to approach databases. “Database systems are designed to store and query data. This is something that all information workers do; however, the systems themselves were designed by computer scientists. As a result, modern database systems are highly influenced by computer scientists’ definition of what data is, and what can be done with it.

For example, most modern databases implement updates by overwriting old data in-place instead of appending the new data and keeping the old. This mechanism, nicknamed “place-oriented programming” by Rich Hickey, saves storage space but makes it impossible to retrieve the entire history of a particular record. This design decision reflects the computer scientist’s perspective that “history” is less important than the price of its storage.”

As Yoav puts it succinctly: if you were to instead ask an archaeologist where the old data can be found, the answer would be “hopefully, it’s just buried underneath”.
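
The difference between overwriting in place and appending can be sketched in a few lines of Python. This is not the design from Yoav’s article (which is written in Clojure); the class `AppendOnlyStore`, its methods and the integer transaction counter are assumptions for illustration only.

```python
from typing import Any, List, Optional, Tuple


class AppendOnlyStore:
    """Toy key-value store that never overwrites: every write is a new log entry."""

    def __init__(self) -> None:
        self._log: List[Tuple[int, str, Any]] = []   # (tx, key, value)
        self._tx = 0

    def put(self, key: str, value: Any) -> int:
        """Append a new version and return its transaction number."""
        self._tx += 1
        self._log.append((self._tx, key, value))
        return self._tx

    def get(self, key: str, as_of: Optional[int] = None) -> Any:
        """Return the latest value, or the value as of an earlier transaction."""
        for tx, k, value in reversed(self._log):
            if k == key and (as_of is None or tx <= as_of):
                return value
        return None

    def history(self, key: str) -> List[Tuple[int, Any]]:
        """Return every version ever written for a key, oldest first."""
        return [(tx, value) for tx, k, value in self._log if k == key]


store = AppendOnlyStore()
t1 = store.put("owner", "Alice")
t2 = store.put("owner", "Bob")

current = store.get("owner")              # latest value
earlier = store.get("owner", as_of=t1)    # the old value is still "buried underneath"
versions = store.history("owner")
```

The cost is the one the quote names: the log only grows, so storage is traded for the ability to ask what the data looked like at any earlier point.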

Leveraging the richness of the Clojure language and 360 lines of code, we have an “archaeology-inspired” database. To read the article, select:

archdb

Samsung and Seagate Unleash Killer SSDs

While some of the flash array vendors struggle to even get 3D NAND into their arrays, Samsung and Seagate have unleashed next-generation SSDs that simply topple the scales of measurement for these devices. Consider that some flash array vendors offer a storage capacity of 35 TB in 3 rack units while Samsung is offering a 32 TB 2.5″ SSD, and you get a feel for the magnitude of what is happening in this segment. Of course there are other aspects to be considered, but it is a stunning development that offers a number of happy alternatives to storage engineers.

Samsung put their 32 TB SSD into a 2.5″ form factor while Seagate’s is in a 3.5″ form factor.

You can read more on Samsung’s SSD here and here.

You can read more on Seagate’s SSD here.

NetApp Buys SolidFire And With It a Solid All-Flash Storage Advantage

In a huge move, NetApp has announced its intention to buy SolidFire. This acquisition bodes well for NetApp. It indicates an understanding of the market they are transitioning into, informed by the mistakes they have made. Having erred in trying to build an all-flash array themselves, they are now purchasing one of the most sophisticated and scalable flash storage array companies available. SolidFire has been my favorite company because they have done so many things correctly, from the management of the company to the technology they have built into their all-flash arrays. My hope is that NetApp leverages SolidFire’s strong flash storage platform and does not worry too much about cannibalizing its existing products.

SolidFire offers an excellent scale-out architecture that is without doubt ahead of all the other vendors when it comes to providing cloud storage features for cloud storage providers and enterprise cloud deployments. They have won over a large number of cloud providers. It is not a surprise: they have built in a number of critical scale-out and storage features which I have previously reviewed. SolidFire scale-out arrays scale to 100 nodes to provide a highly available view of storage, with quality-of-service controls and all the usual suspects of data services built in to the operating system (including replication, dedup, compression, etc.).

In my view, focusing only on storage misses an important point: storage lives within larger ecosystems. SolidFire works with the OpenStack, CloudStack, Citrix and VMware frameworks and offers a solid, well-rounded group of storage features with a focus on complete virtualization and cloud solutions. You can look at the post OpenStack Announcement: Solidfire/Dell/Red Hat Unleash Solidfire Agile Infrastructure (Flash-Storage-Based) Cloud Reference Architecture to understand their involvement in cloud solutions.

The SolidFire architecture allows from four to 100 nodes to be clustered to provide petabytes of highly available flash storage. Couple this with quality of service and all the standard data reduction features and you end up with a really nice flash storage foundation. An advantage of their approach is that unlike arrays, with different types of SSDs, can be clustered together. They can also make use of either iSCSI or 8/16Gb Fibre Channel. It’s worth looking at some of the excellent features of this platform, which include seamlessly upgrading storage and seamlessly scaling out. Here are some nice videos that demonstrate some of their advanced features:

Scale and Upgrade Storage Seamlessly

 

Provision, Control and Change Storage Performance

 

SolidFire Cluster Install and Setup in 5 Minutes

 

Quality of Service

 

 

Presentation/Demo : Couchbase in Containers with Bare-Metal Performance and Triton’s Remarkable Elastic Provisioning

Virtualization is a key aspect of modern computing architectures. Often the choice to go to hardware-level virtualization degrades the performance characteristics of our virtual machines. As I have mentioned before, SmartOS zones and Docker offer a better way to go. In this presentation, Bryan Cantrill of Joyent provides a rapid-fire and humorous presentation highlighting the history of virtualization and the advantages of running Couchbase containers leveraging Triton, SmartOS and Docker. Also demonstrated is a remarkable display of Triton elasticity: easily creating a number of Couchbase servers on the fly, all within lightweight virtualized containers running across a datacenter. What is offered is a sophisticated, highly scalable, highly performant, elastic solution for a datacenter.

The presentation can also be found here.

It gets better.  If you are interested in deploying the Couchbase containers yourself it is fairly straightforward and you can get the “recipe” from the following blog :

screenshot_605

 

Couchbase Use-Case : LinkedIn

Following on the previous post, today’s post discusses an interesting Couchbase use-case.

Often the questions about a particular technology or product are –

  • who is using it successfully ?
  • how is it being used ?
  • how scalable is it ?
  • does it have good performance (usually within a context) ?

In Couchbase’s case there is a large number of customer use-case examples. One example is LinkedIn. In the first presentation there is a discussion of how LinkedIn uses Couchbase:

Within this context, the obvious follow-up is a second presentation on Couchbase Server scalability and performance at LinkedIn:

 

Meeting: What’s New in Solr 5 Security & Solr Custom Collector: The Anti-Score

The arrival of Apache Solr 5 has brought with it a number of features. In these talks you will see some of the advantages of using Solr as your search engine.

The Solr Meetup on Tuesday, August 11th in downtown Seattle covers two Solr topics:

  • What’s new in Solr 5 Security. Presented by Anshum Gupta, Lucidworks. Apache Solr has evolved into a highly scalable system, capable of handling a lot of data and a high number of queries, but only recently was a mechanism to secure access in Solr provided. Apache Solr 5.2 shipped with pluggable authentication and authorization modules. These modules enable users to write their own plugins for managing security in Solr.

    This talk will cover an overview of both the authentication and authorization frameworks, and how they work together within Solr. It will also provide an overview of existing plugins and how to enable them to restrict user access to resources within Solr.

  • Solr Custom Collector: The Anti-Score. Presented by Michael Kosten, Getty Images. Sometimes, you don’t want to return just the top scoring documents as your search results.  If you have an eCommerce site, you may want to ensure that multiple lines of business are represented. If you incorporate customer interaction in your score, you may want to ensure that newer documents or certain categories are still represented and that your results don’t become stale. This requirement could be handled in middleware that post processes the search results, by requesting extra rows and rearranging them or by interleaving multiple queries. A better solution is to implement your own custom collector in Solr, so that search results can be arranged in any order. Michael will demonstrate a solution that returns top scoring documents, but grouped within categories. For example, a search for books could interweave the best fiction and non-fiction in a single query result. He will also demonstrate how to implement a custom priority queue to reduce memory requirements if there are many categories, and how the custom collector can be integrated into Solr without modifying the base distribution.
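
The idea behind the second talk can be sketched outside of Solr. The Python below is not Solr’s collector API (Solr collectors are written in Java) and is not Michael’s implementation; the function names, the `(doc_id, category, score)` tuple layout and the sample documents are assumptions for illustration. It keeps one small bounded heap per category, so memory grows with the number of categories times the per-category limit rather than with the number of matching documents, and then interleaves the categories in the result.

```python
import heapq
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, Iterable, List, Tuple

Doc = Tuple[str, str, float]    # (doc_id, category, score), a hypothetical layout


def collect_by_category(docs: Iterable[Doc], per_category: int) -> Dict[str, List[str]]:
    """Keep only the top `per_category` documents of each category, best first."""
    heaps: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for doc_id, category, score in docs:
        heap = heaps[category]
        entry = (score, doc_id)
        if len(heap) < per_category:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # The new document beats the weakest one kept so far: swap it in.
            heapq.heapreplace(heap, entry)
    return {
        category: [doc_id for _, doc_id in sorted(heap, reverse=True)]
        for category, heap in heaps.items()
    }


def interleave(ranked: Dict[str, List[str]]) -> List[str]:
    """Round-robin across categories (in name order) so each one is represented."""
    columns = [ranked[category] for category in sorted(ranked)]
    return [doc for row in zip_longest(*columns) for doc in row if doc is not None]


DOCS: List[Doc] = [
    ("f1", "fiction", 0.9),
    ("f2", "fiction", 0.8),
    ("f3", "fiction", 0.7),
    ("n1", "non-fiction", 0.6),
    ("n2", "non-fiction", 0.5),
]

top = collect_by_category(DOCS, per_category=2)
results = interleave(top)
```

A plain top-four by score would return three fiction titles and one non-fiction title; the grouped version returns two of each, which is the kind of result the talk describes for a book search.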

How to register for the talk : http://www.meetup.com/Seattle-Solr-Lucene-Meetup/events/223899316/