Software Engineering discussion

Making Sense of NoSQL > Ch 4: NoSQL Data Architecture Patterns

message 1: by [deleted user] (new)

This is one of the best introductions to the basic NoSQL data models I have seen. I think it is very helpful to keep this taxonomy in mind when assessing and learning about a specific product.

I think the key/value model may confuse readers who are familiar with, or trying to learn, the Hadoop MapReduce system. The Hadoop Distributed File System (HDFS) conforms to the model described in this chapter: the key is the directory/file name and the value is the file contents. This is analogous to Amazon's S3 store.

Hadoop's MapReduce model also uses keys and values, but in a very different way. Individual HDFS values (files) contain records, and each record can be divided into fields that act as keys and fields that act as values. This division is made dynamically at run time, so there is no need for a predefined schema, and two applications can (usually) interpret different fields of the same file as keys and values. Keys and values can be either primitive or complex types. Unlike the model described in this chapter, however, there is no indexing over keys and no retrieval by key: everything in MapReduce is done with a sequential scan over all records. So the terms key and value are used at two different layers in Hadoop, with very different semantics.
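To make the two layers concrete, here is a minimal sketch in plain Python. It is not Hadoop's actual API: the dictionary only stands in for HDFS, and the file name, sample records, and the helper names `map_records` and `reduce_sum` are hypothetical, made up for illustration.

```python
from collections import defaultdict

# Layer 1: key/value in the chapter's sense.
# A plain dict stands in for HDFS (an assumption for illustration):
# key = file path, value = the whole file contents.
store = {
    "/data/sales.csv": "alice,books,12\nbob,music,7\nalice,music,5\n",
}

# Layer 2: key/value in the MapReduce sense.
# The caller decides at run time which field is the key and which is
# the value; no schema is declared anywhere.
def map_records(contents, key_field, value_field):
    """Scan every record sequentially and emit (key, value) pairs."""
    for line in contents.splitlines():
        fields = line.split(",")
        yield fields[key_field], int(fields[value_field])

def reduce_sum(pairs):
    """Group the emitted pairs by key and sum the values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Retrieval by key happens only at layer 1: fetch the file by its path.
contents = store["/data/sales.csv"]

# Two "applications" read the same file with different key fields.
by_customer = reduce_sum(map_records(contents, key_field=0, value_field=2))
by_category = reduce_sum(map_records(contents, key_field=1, value_field=2))

print(by_customer)
print(by_category)
```

The only lookup by key is the `store[...]` access at the file layer. Inside `map_records` every record is visited in order, and the same file yields customer totals or category totals depending solely on which field the caller chooses as the key.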

On another point, Hive can run directly over HBase as well as over HDFS, which gives SQL-like access to that particular column family store.

