Making Sense of NoSQL
Ch 1: NoSQL: It's about making intelligent choices


This is a good point. Some people don't think that a product is a database unless it has a query language. A simple API (PUT, GET, DELETE) would not count. They prefer the term "data store" to differentiate these items.
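To make the distinction concrete, here is a minimal sketch of what a "simple API" data store looks like. The class `KeyValueStore` is hypothetical and is not any particular product's API. Values are opaque bytes, and the only operations are PUT, GET and DELETE by key, so there is no way to ask "which values match X?" the way a query language would.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory store exposing only PUT, GET and DELETE."""

    def __init__(self) -> None:
        # Keys are strings; values are opaque bytes the store never inspects.
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Store (or overwrite) a value under a key.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Return the stored value, or None if the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        # Remove the key if present; deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("customer:42:photo", b"\x89PNG")
print(store.get("customer:42:photo") is not None)  # True
store.delete("customer:42:photo")
print(store.get("customer:42:photo"))  # None
```

Everything beyond lookup by exact key (filtering, joins, aggregation) has to happen in the application, which is why some people would call this a "data store" rather than a database.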
> Haven't we seen this all before, in the 1990s, when object-oriented databases were supposed to take over the world (and they fizzled)?
Yes, I agree that we have seen database innovations in the past that have not become mainstream. PICK and MUMPS are good examples, and both are still in use today. What I think is different now is the number of companies that are adopting new databases because relational/SQL systems really are not meeting their needs.
> The chapter mentions standards several times, but I don't know of any standards across NoSQL products
This is a key question! I think we are just starting to see standards appear, but it is too early for many of the newer database architectures, such as graph stores. I have been using XQuery since 2006, and I can port my apps between native XML databases (eXist, MarkLogic, etc.) and to some document stores with adapters (JSONiq), but other databases give standards a low priority. It is interesting to note that Amazon's S3 APIs have started to emerge as a standard in several key-value stores (Riak) and object stores such as Swift. There is even an "S3QL" effort on Bitbucket now. Each year at the NoSQL Now! conference we hold a panel discussion on the topic, and each year many vendors say "maybe next year". I predict that in five years we will have roughly six core standards, one or two for each of the core architectures (SQL, MDX, S3QL, RDF/SPARQL, Cypher, and XQuery). Some databases already support two of these, and one supports three. Until we have these standards, third-party apps that use NoSQL will be few and far between.
> I haven't seen much about doing object persistence with NoSQL databases.
I think the key thing most people are finding is that if they use document stores, they don't need any object-relational (OR) mapping layer. Document stores effectively become the serialization of business objects. Document stores are the one area that developers cite as the biggest win in developer productivity. Spoiler alert: we have several case studies about this later in the book.
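As a rough illustration of "the document is the serialized business object", here is a sketch using Python dataclasses and JSON. The `Order` and `LineItem` classes are hypothetical, and a plain dict stands in for a document store's collection; a real product would have its own client API. The point is that the nested line items travel inside the order document, so no join tables or mapping layer are needed.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: str
    customer: str
    items: List[LineItem] = field(default_factory=list)


def to_document(order: Order) -> str:
    # asdict() recurses into nested dataclasses, so the whole object graph
    # (order plus its line items) becomes a single JSON document.
    return json.dumps(asdict(order))


def from_document(doc: str) -> Order:
    # Rebuild the object from the document; nested dicts become LineItems.
    data = json.loads(doc)
    return Order(
        order_id=data["order_id"],
        customer=data["customer"],
        items=[LineItem(**item) for item in data["items"]],
    )


# Hypothetical stand-in for a document store collection, keyed by document ID.
collection = {}
order = Order("o-1001", "Acme", [LineItem("widget", 3), LineItem("gear", 1)])
collection[order.order_id] = to_document(order)

restored = from_document(collection["o-1001"])
print(restored == order)  # True
```

In a relational design the same order would typically be split across an orders table and a line-items table and reassembled by an OR mapper; here the object and the stored document have the same shape.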


I am intrigued by the concept of using NoSQL as a toolkit, not just as a database. What sort of leverage could be achieved for a CODASYL (network) database?
NoSQL is an umbrella for about a half-dozen different paradigms. Moving within a paradigm (for example, between graph DBs) is easier than moving between paradigms (for example, from a graph DB to a key/value DB).
Your CODASYL question is an interesting one. If the network has shared elements, then a graph DB might be the closest equivalent. If the elements are not shared, but nested, then a document DB or column family DB might be a better fit. Dan might have a better answer to this.
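A small sketch of the shared-versus-nested distinction, using plain Python structures rather than any real database. All names (`order_document`, `nodes`, `edges`, `owner_of`, `members_of`) are hypothetical. Records owned by exactly one parent can be embedded (document or column-family fit); a record reached from several parents is stored once and referenced by edges (graph fit), which is loosely how CODASYL owner/member sets behave.

```python
# Nested data: each line belongs to exactly one purchase order, so it can be
# embedded inside the order (a document or column-family fit).
order_document = {
    "order_id": "PO-7",
    "lines": [
        {"part": "bolt", "qty": 100},
        {"part": "nut", "qty": 100},
    ],
}

# Shared data: one supplier record is reached from several parts, so it is
# stored once as a node and referenced by edges (a graph fit).
nodes = {
    "supplier:acme": {"type": "Supplier", "name": "Acme"},
    "part:bolt": {"type": "Part", "name": "bolt"},
    "part:nut": {"type": "Part", "name": "nut"},
}
edges = [
    ("part:bolt", "SUPPLIED_BY", "supplier:acme"),
    ("part:nut", "SUPPLIED_BY", "supplier:acme"),
]


def owner_of(node_id: str, relation: str) -> list:
    # Member -> owner navigation, loosely like following a CODASYL set upward.
    return [dst for src, rel, dst in edges if src == node_id and rel == relation]


def members_of(node_id: str, relation: str) -> list:
    # Owner -> members navigation: every node with an edge pointing here.
    return sorted(src for src, rel, dst in edges if dst == node_id and rel == relation)


print(len(order_document["lines"]))                        # 2
print(owner_of("part:bolt", "SUPPLIED_BY"))                # ['supplier:acme']
print(members_of("supplier:acme", "SUPPLIED_BY"))          # ['part:bolt', 'part:nut']
print(nodes[owner_of("part:nut", "SUPPLIED_BY")[0]]["name"])  # Acme
```

Updating the supplier's name touches one node in the graph version; had the supplier been copied into each part document, every copy would need updating.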

Isn’t the choice of key the crucial decision impacting the performance of a key-value store? If so, I can see why the key might contain the name of the directory containing the value. But what happens when the DBA splits the directory to handle a scaling issue? Must all the keys (wherever they are stored) be updated to reflect the change? How do you know the key is actually of the format that contains a directory name, and that the inclusion of the string ‘Plant’ is the name of a directory and not a person? Do keys typically have a format identifier, tag, version, etc. identifying how to read its contents? I could see putting the key in a CODASYL database to link to, for example, a picture of the customer. But such a database might not get updated instantly, so there would be a time when the old key might be used instead of the new key. I have seen DBAs use surrogate keys to guard against the case in a relational database where the format of the primary key must change. I guess changing the format of the key might be analogous to schema migration in a database.

Yes, that would be one approach. The tricky thing is being able to do "joins" and "merges" between two external databases that were not originally designed to work together. This is what RDF and SPARQL do. You can extract data from many sources, ideally assigning the same items the same URI, and then join them together in a single triple store. If Tom Petters has the same SSN in both systems, this is easy. But if items don't share the same ID, the problem becomes much harder. Since Tom Petters has a Wikipedia page (http://en.wikipedia.org/wiki/Tom_Petters), you could use it as the "person" ID, as long as you knew it was the same "Tom Petters".
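Here is a toy version of that merge-then-join idea in plain Python, not a real triple store or RDF library. The URIs under `example.org` and the predicates `ex:name` and `ex:balance` are hypothetical; an agreed-upon URI such as a Wikipedia page would play the same role as the shared person URI. Two sources are loaded into one set of triples, and a join on the subject only succeeds where both sources used the same ID.

```python
# Hypothetical shared URIs for two people.
JANE = "http://example.org/person/jane-doe"
JOHN = "http://example.org/person/john-roe"

# Each source exports its data as (subject, predicate, object) triples.
crm_triples = {
    (JANE, "ex:name", "Jane Doe"),
    (JOHN, "ex:name", "John Roe"),
}
billing_triples = {
    (JANE, "ex:balance", "120.50"),
    # John's balance, but keyed by an account ID instead of his person URI.
    ("http://example.org/acct/9917", "ex:balance", "75.00"),
}

# Loading both sources into one "triple store" is just a set union.
store = crm_triples | billing_triples


def name_and_balance(triples: set) -> list:
    # Same idea as the SPARQL pattern:
    #   ?person ex:name ?name . ?person ex:balance ?balance .
    names = {s: o for s, p, o in triples if p == "ex:name"}
    return sorted(
        (names[s], o) for s, p, o in triples if p == "ex:balance" and s in names
    )


print(name_and_balance(store))  # [('Jane Doe', '120.50')]
```

John's balance is in the merged store but never joins to his name, because the billing source identified him differently; that is exactly the harder case where the IDs have to be reconciled first.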
> Isn’t the choice of key the crucial decision impacting the performance of a key-value store?
Not usually. In most simple key-value stores the key is just a string that is indexed, so how it is designed is not critical. In other types of databases, such as column-family stores, key design is important to performance. For example, in GIS systems, using IDs in which each additional part of the key identifies a smaller sub-region of the map allows you to quickly "zoom in" to get more detail.
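One way to build such GIS keys is a quadtree-style key, the same idea behind the "quadkeys" of some web-map tile systems; this is an illustration, not necessarily what any given GIS or column-family product uses. Each digit picks one quadrant of the current cell, so a longer key names a smaller region and every prefix names the enclosing region. A sorted list stands in for the lexicographically sorted row keys of a store such as HBase, and `quadkey`, `features` and `scan_prefix` are hypothetical names.

```python
import bisect


def quadkey(x: float, y: float, depth: int) -> str:
    """Encode a point in the unit square as a quadtree key of `depth` digits.

    Each digit (0-3) selects one quadrant of the current cell, so a longer key
    names a smaller sub-region and every prefix names an enclosing region.
    """
    digits = []
    x0, y0, size = 0.0, 0.0, 1.0
    for _ in range(depth):
        size /= 2
        qx = 1 if x >= x0 + size else 0
        qy = 1 if y >= y0 + size else 0
        digits.append(str(qx + 2 * qy))
        x0 += qx * size
        y0 += qy * size
    return "".join(digits)


# Hypothetical map features, keyed by location; row keys are kept sorted.
features = {
    quadkey(0.10, 0.10, 6): "well",
    quadkey(0.12, 0.14, 6): "pump",
    quadkey(0.90, 0.80, 6): "tower",
}
row_keys = sorted(features)


def scan_prefix(prefix: str) -> list:
    # Range scan over [prefix, prefix + "\xff"): every key inside that region.
    # Keys contain only the digits 0-3, so "\xff" sorts after all of them.
    lo = bisect.bisect_left(row_keys, prefix)
    hi = bisect.bisect_left(row_keys, prefix + "\xff")
    return [features[k] for k in row_keys[lo:hi]]


print(quadkey(0.10, 0.10, 6))  # 000330
print(scan_prefix("00"))       # ['well', 'pump']
print(scan_prefix("0003"))     # ['well']
```

Because nearby points share key prefixes, "zooming in" is just a longer prefix, and the store answers it with one contiguous range scan instead of checking every row.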
> Must all the keys (wherever they are stored) be updated to reflect the change?
Not usually for simple key-value stores. The design of a key usually does not determine physical location on a disk.
> How do you know the key is actually of the format that contains a directory name, and that the inclusion of the string ‘Plant’ is the name of a directory and not a person?
It frequently depends on the implementation. In your example it sounds like you are referring to a hierarchical key-value store, which is a bit more specialized and works like a distributed file system such as HDFS or Ceph.
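In a hierarchical store the key really is a path, so splitting a directory does change keys. The surrogate-key idea from the question still helps: other systems (the CODASYL database, say) keep a stable ID, and only one lookup table changes when data moves. This is a minimal sketch of that indirection under those assumptions; `hier_store`, `locator`, `register`, `move` and `fetch` are hypothetical names, not any product's API.

```python
import uuid

# Hypothetical hierarchical store: the key is a path, as in a file system.
hier_store = {"/plant/photos/c42.jpg": b"..."}

# Indirection table: stable surrogate ID -> current path.
# External systems store only the surrogate ID, never the path.
locator = {}


def register(path: str) -> str:
    # Hand out a stable surrogate ID for a stored value.
    surrogate = str(uuid.uuid4())
    locator[surrogate] = path
    return surrogate


def move(old_path: str, new_path: str) -> None:
    # E.g. splitting a directory: move the value, then repoint the locator.
    # References held elsewhere (the surrogate IDs) do not change.
    hier_store[new_path] = hier_store.pop(old_path)
    for sid, path in locator.items():
        if path == old_path:
            locator[sid] = new_path


def fetch(surrogate: str) -> bytes:
    # Resolve the surrogate to its current path, then read the value.
    return hier_store[locator[surrogate]]


photo_id = register("/plant/photos/c42.jpg")  # kept by the external system
move("/plant/photos/c42.jpg", "/plant/photos/a-m/c42.jpg")  # directory split
print(fetch(photo_id))  # b'...'
```

The trade-off is one extra lookup per read, in exchange for never having to chase down every copy of the old path.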
> Do keys typically have a format identifier, tag, version, etc. identifying how to read its contents?
Not usually. However, most column-family stores do have a 64-bit "version stamp" that is often used to store a timestamp. You can also use a column name to store the data type, although I have never seen this done. Most document stores store documents with file extensions, just like a file system, and associate a MIME type with each extension. So files that end in '.xml' would be indexed, but files that end in '.jpg' are just stored as binaries.
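A sketch of that extension-driven behavior: the extension, not anything inside the key, decides the MIME type, and the MIME type decides whether the content is parsed and indexed or stored as an opaque binary. The `MIME_TYPES` table, the `INDEXED` set and `classify` are hypothetical; real products ship their own (usually configurable) mappings.

```python
from pathlib import PurePosixPath

# Hypothetical extension-to-MIME-type table; real products ship their own.
MIME_TYPES = {
    ".xml": "application/xml",
    ".json": "application/json",
    ".jpg": "image/jpeg",
    ".png": "image/png",
}

# MIME types this hypothetical store parses and indexes; others are opaque.
INDEXED = {"application/xml", "application/json"}


def classify(uri: str) -> tuple:
    # Look only at the extension (case-insensitive) to pick a MIME type,
    # falling back to a generic binary type for unknown extensions.
    ext = PurePosixPath(uri).suffix.lower()
    mime = MIME_TYPES.get(ext, "application/octet-stream")
    action = "index" if mime in INDEXED else "store-binary"
    return mime, action


print(classify("/orders/po-7.xml"))  # ('application/xml', 'index')
print(classify("/photos/c42.JPG"))   # ('image/jpeg', 'store-binary')
```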
Some comments/questions to get us going:
- Here is a list of 150 NoSQL databases: http://nosql-database.org/
- Is Memcached really a NoSQL database, or a performance/scale improvement technology sitting on top of traditional SQL databases?
- I also like to say that NoSQL stands for "Not Only SQL" instead of "No SQL". Some have said that NoSQL should be called NoREL "Not Relational". I think the name is unfortunate, but it is already set in concrete by now.
- Haven't we seen this all before, in the 1990s, when object-oriented databases were supposed to take over the world (and they fizzled)?
- The chapter mentions standards several times, but I don't know of any standards across NoSQL products (except, variations of SQL!).
- I haven't seen much about doing object persistence with NoSQL databases. Has anyone seen Hibernate-like products for persisting Java objects to one of the NoSQL stores? Which NoSQL model seems to provide the best fit for this use case?