Hadoop in the Cloud

Increasingly, Big Data processing is moving to the cloud, with Amazon, Joyent, and Microsoft among the providers.

Amazon offers a way of attacking ‘big data’ from its cloud: Amazon Elastic MapReduce (Amazon EMR), which the company describes as “a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).”  To learn more, see Amazon’s EMR product page.

Meanwhile, Joyent has created Manta, an object store built on ZFS. Manta offers strongly consistent writes and highly available reads, with no object size limits and with per-object replication policies.  The service can be coupled to MapReduce frameworks, and the coupling happens locally: compute runs where the objects are stored, unlike Amazon’s approach, in which data held in S3 is processed on separate EC2 instances.  Joyent’s Manta documentation covers this coupling in more detail.

Microsoft has also released HDInsight, Azure’s Hadoop-based service.
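
All three services ultimately run MapReduce-style jobs. As a rough illustration of that programming model, here is a minimal in-memory word count in plain Python. It is only a sketch: it does not use EMR, Manta, or HDInsight, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for this example, not part of any provider's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the job a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data in the cloud", "Hadoop in the cloud"]
    print(word_count(sample))
```

In a hosted service the map and reduce steps would run in parallel across many machines, and the framework would handle the shuffle; the sketch only shows the shape of the computation.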

Ericsson’s Geoff Hollingsworth provided an example of the company’s use of Manta: