Software Engineering discussion

Making Sense of NoSQL > Ch 1: NoSQL: It's about making intelligent choices

message 1: by [deleted user] (new)

Jan 01, 2014 06:50AM

Happy New Year, and welcome to the start of our discussion on the new book. I will post a topic for each chapter, like this one, about once a week, which will keep us on pace to finish this book over the next three months.

Some comments/questions to get us going:

- Here is a list of 150 NoSQL databases: http://nosql-database.org/

- Is Memcached really a NoSQL database, or a performance/scale improvement technology sitting on top of traditional SQL databases?

- I also like to say that NoSQL stands for "Not Only SQL" instead of "No SQL". Some have said that NoSQL should be called NoREL, for "Not Relational". I think the name is unfortunate, but it is set in concrete by now.

- Haven't we seen this all before, in the 1990s, when object-oriented databases were supposed to take over the world (and they fizzled)?

- The chapter mentions standards several times, but I don't know of any standards across NoSQL products (except variations of SQL!).

- I haven't seen much about doing object persistence with NoSQL databases. Has anyone seen Hibernate-like products for persisting Java objects to one of the NoSQL stores? Which NoSQL model seems to provide the best fit for this use case?

message 2: by Dan (new)

Jan 02, 2014 10:53AM

> Is Memcached really a NoSQL database
This is a good point. Some people don't think that a product is a database unless it has a query language. A simple API (PUT, GET, DELETE) would not count. They prefer the term "data store" to differentiate these items.
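
To make the distinction concrete, here is a minimal sketch of what a "data store" with only PUT, GET and DELETE might look like. The class and method names are hypothetical and are not taken from Memcached or any other product; the point is only that there is no query language, just lookups by key.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory store exposing only PUT, GET and DELETE (illustrative only)."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The store never inspects the value; it is an opaque blob.
        self._items[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Lookup is by exact key only; there is no way to query by content.
        return self._items.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        return self._items.pop(key, None) is not None


store = KeyValueStore()
store.put("user:42", b'{"name": "Ada"}')
print(store.get("user:42"))   # b'{"name": "Ada"}'
store.delete("user:42")
print(store.get("user:42"))   # None
```

Anything beyond exact-key lookup (filtering, joins, aggregation) would have to be done by the application, which is why some people reserve the word "database" for products with a query language.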

> Haven't we seen this all before, in the 1990s, when object-oriented databases were supposed to take over the world (and they fizzled)?
Yes, I agree that we have seen database innovations in the past that have not become mainstream. PICK and MUMPS are good examples that are still in use today. What I think is different now is the number of companies that are using new databases because relational/SQL systems really are not meeting their needs.

> The chapter mentions standards several times, but I don't know of any standards across NoSQL products
This is a key question! I think we are just starting to see standards appear, but it is too early for many of the newer database architectures like graph stores. Although I have been doing XQuery since 2006, and I can port my apps between native-XML databases (eXist, MarkLogic, etc.) and some document stores with adapters (JSONiq), other databases give standards a low priority. It is interesting to note that Amazon's S3 APIs have started to emerge as a standard in several key-value stores (Riak) and object stores like SWIFT. There is even an "S3QL" effort on Bitbucket now. Each year at the NoSQL Now! conference we have a panel discussion on the topic, and each year many vendors say "maybe next year". I predict that in five years we will have roughly six core standards, one or two for each of the core architectures (SQL, MDX, S3QL, RDF/SPARQL, Cypher and XQuery). Some databases already support two of these and one supports three of them. Until we have these standards, third-party apps that use NoSQL will be few and far between.

> I haven't seen much about doing object persistence with NoSQL databases.
I think the key thing most people are finding is that if they use document stores they don't need any OR mapping layer. Document stores really become the serialization of business objects. Using document stores is the one area that developers cite as the biggest win in developer productivity. Spoiler alert: We have several case studies about this later on in the book.
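
As a rough illustration of "the document is the serialized business object", here is a minimal sketch in Python. The `Order` and `LineItem` classes are hypothetical, and a plain dict stands in for the document store; no particular product's API is implied.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: str
    customer: str
    items: List[LineItem] = field(default_factory=list)


def to_document(order: Order) -> str:
    # asdict() recurses into nested dataclasses, so the whole object,
    # including its line items, becomes a single JSON document.
    return json.dumps(asdict(order))


def from_document(doc: str) -> Order:
    data = json.loads(doc)
    items = [LineItem(**item) for item in data["items"]]
    return Order(order_id=data["order_id"], customer=data["customer"], items=items)


# A dict keyed by order id stands in for the document store (assumption).
collection: Dict[str, str] = {}
order = Order("o-1001", "Ada", [LineItem("widget", 2), LineItem("gadget", 1)])
collection[order.order_id] = to_document(order)

restored = from_document(collection["o-1001"])
print(restored == order)  # True
```

In a relational design the same order would typically be split across an orders table and a line-items table and reassembled with a join; here the nesting is stored as-is, which is where the mapping layer goes away.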

message 3: by Dan (new)

Jan 03, 2014 09:40AM

I forgot to mention, if people are not sure if they want to buy the book they can get the first chapter free. The download is at http://manning.com/mccreary

message 4: by James (new)

Jan 04, 2014 02:26PM

Each relational vendor supports at least the SQL-92 Entry Level standard. Vendors also implement many extensions, which are crucial to application development. One such area is built-in functions (BIFs). This area is so extension-prone that there is a book detailing the similarities and differences between implementations: SQL Functions Programmer's Reference (Programmer to Programmer), by Arie Jones. Perhaps a book comparing NoSQL databases, for the purpose of conversion, would be interesting?

I am intrigued by the concept of using NoSQL as a toolkit, not just as a database. What sort of leverage could be achieved for a CODASYL (network) database?

message 5: by [deleted user] (new)

Jan 08, 2014 03:29PM

NoSQL is an umbrella for about a half-dozen different paradigms. Moving within a paradigm (for example, between graph DBs) is easier than moving between paradigms (for example, from a graph DB to a key/value DB).

Your CODASYL question is an interesting one. If the network has shared elements, then a graph DB might be the closest equivalent. If the elements are not shared, but nested, then a document DB or column family DB might be a better fit. Dan might have a better answer to this.

message 6: by James (new)

Jan 22, 2014 12:24PM

Now that I have read about graph databases (chapter 4), I see that graphs excel at documenting relationships. As you point out, if there are no relationships within the CODASYL database, the graph is not very interesting. But, what about relationships between databases? Suppose someone’s bank account had some large deposits. If that person were associated with Tom Petters (a local crook), the deposits could be “interesting”. Would an easy way to identify this situation be to load both the CODASYL and social information into the same graph database?

Isn’t the choice of key the crucial decision impacting the performance of a key-value store? If so, I can see why the key might contain the name of the directory containing the value. But, what happens when the DBA splits the directory to handle a scaling issue? Must all the keys (wherever they are stored) be updated to reflect the change? How do you know the key is actually of the format that contains a directory name, and that the inclusion of the string ‘Plant’ is the name of a directory and not a person? Do keys typically have a format identifier, tag, version, etc. identifying how to read its contents? I could see putting the key in a CODASYL database to link to, for example, a picture of the customer. But, such a database might not get updated instantly, so there would be a time when the old key might be used instead of the new key. I have seen DBAs use surrogate keys to guard against the case in a relational database where the format of the primary key must change. I guess changing the format of the key might be analogous to schema migration in a database.

message 7: by Dan (new)

Jan 22, 2014 06:21PM

> Would an easy way to identify this situation be to load both the CODASYL and social information into the same graph database?
Yes, that would be one approach. The tricky thing is to be able to do "joins" and "merges" between two external databases that were not originally designed to work together. This is what RDF and SPARQL do. You can extract data from many sources, hopefully assigning the same items the same URI, and then join them together in a single triple store. If Tom Petters has the same SSN in both systems this is easy. But if items don't have the same ID the problem becomes much harder. Since Tom Petters has a Wikipedia page (http://en.wikipedia.org/wiki/Tom_Petters), you could use this as the "person" ID as long as you knew it was the same "Tom Petters".
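
The idea of joining on a shared URI can be sketched without any RDF tooling. The following is a toy illustration in plain Python, not SPARQL and not a real triple store: the predicate names, the `example.org` URI, and the account and deposit IDs are all made up for the example.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

PETTERS = "http://en.wikipedia.org/wiki/Tom_Petters"
ALICE = "http://example.org/person/alice"  # hypothetical shared person URI

# Triples extracted from the banking system (hypothetical data).
bank_triples: Set[Triple] = {
    (ALICE, "ownsAccount", "account:123"),
    ("account:123", "hasDeposit", "deposit:9001"),
}

# Triples extracted from the social data (hypothetical data).
social_triples: Set[Triple] = {
    (ALICE, "associatedWith", PETTERS),
}


def merge(*sources: Set[Triple]) -> Set[Triple]:
    # Because both extracts use the same URI for the same person,
    # merging is just a set union; no key translation is needed.
    store: Set[Triple] = set()
    for source in sources:
        store |= source
    return store


def accounts_of_associates(store: Set[Triple], person: str) -> List[Tuple[str, str]]:
    # Step 1: find everyone associated with the person of interest.
    associates = {s for (s, p, o) in store if p == "associatedWith" and o == person}
    # Step 2: "join" on the subject URI to find the accounts those people own.
    return sorted((s, o) for (s, p, o) in store if p == "ownsAccount" and s in associates)


store = merge(bank_triples, social_triples)
print(accounts_of_associates(store, PETTERS))
# [('http://example.org/person/alice', 'account:123')]
```

If the two systems had used different identifiers for Alice, the second step would return nothing, which is the hard part of the problem: the join only works once the IDs have been reconciled.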

> Isn’t the choice of key the crucial decision impacting the performance of a key-value store?
Not usually. In most simple key-value stores the key is just a string that is indexed, so how it is designed is not critical. In other types of databases, such as column-family stores, key design is important to performance. For example, in GIS systems, using IDs that identify ever-smaller sub-regions of a map as the key gets longer allows you to quickly "zoom in" to get more detail of a map.
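
One way to picture the GIS case is a quadtree-style key, where each extra character picks one of four sub-quadrants of the parent tile. This is a sketch under that assumption, with made-up tile keys; a sorted Python list stands in for a store that keeps keys in sorted order, so "zooming in" becomes a scan over one key prefix.

```python
from bisect import bisect_left
from typing import List

# Hypothetical tile keys: "1" is a top-level quadrant, "12" is a
# sub-quadrant of "1", and "120" is a sub-quadrant of "12".
tile_keys = sorted(["0", "1", "10", "11", "12", "120", "121", "13", "2", "3"])


def scan_prefix(keys: List[str], prefix: str) -> List[str]:
    """Return every key that starts with prefix, assuming keys is sorted."""
    # In sorted order all keys sharing a prefix are contiguous, so we can
    # jump to the first candidate and stop at the first non-match.
    start = bisect_left(keys, prefix)
    matches = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches


print(scan_prefix(tile_keys, "12"))  # ['12', '120', '121']
print(scan_prefix(tile_keys, "1"))   # ['1', '10', '11', '12', '120', '121', '13']
```

A longer prefix reads a smaller, contiguous slice of the key space, which is why key design matters for stores that keep rows in key order.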

> Must all the keys (wherever they are stored) be updated to reflect the change?
Not usually for simple key-value stores. The design of a key usually does not determine physical location on a disk.
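
A minimal sketch of why that is, assuming the store places items by hashing the key (a common approach, though details vary by product and many systems use consistent hashing rather than the simple modulo shown here). The function name and the sample keys are hypothetical.

```python
import hashlib


def partition_for(key: str, partitions: int) -> int:
    """Map a key to a partition number by hashing it (simplified scheme)."""
    # The hash scrambles the key, so a 'directory-like' prefix in the key
    # has no influence on where the value physically lives.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


keys = ["Plant/photo1.jpg", "Plant/photo2.jpg", "Office/photo1.jpg"]

# Placement with 4 partitions, then after the cluster grows to 8.
for key in keys:
    print(key, partition_for(key, 4), partition_for(key, 8))
```

When the number of partitions changes, the store recomputes placement and may move data, but the keys held by applications stay exactly the same.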

> How do you know the key is actually of the format that contains a directory name, and that the inclusion of the string ‘Plant’ is the name of a directory and not a person?
It frequently depends on the implementation. In your example it sounds like you are referring to a hierarchical key-value store, which is a bit more specialized and works like a distributed file system such as HDFS or CEPH.

> Do keys typically have a format identifier, tag, version, etc. identifying how to read its contents?
Not usually. However, most column-family stores do have a 64-bit "version stamp" that is often used to store a timestamp. You can also use a column name to store the data type, although I have never seen this. Most document stores store documents with extensions, just like files, and associate a MIME type with these extensions. So files that end in '.xml' would be indexed, but files that end in '.jpg' are just stored as binaries.
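
The extension-to-MIME-type behavior could be sketched like this. The mapping table and the rule for what gets indexed are assumptions for illustration; each real document store has its own configurable table.

```python
import os
from typing import Tuple

# Hypothetical extension table; real products ship their own.
MIME_TYPES = {
    ".xml": "application/xml",
    ".jpg": "image/jpeg",
}

# Assumption: only XML content is parsed and indexed.
INDEXED_TYPES = {"application/xml"}


def classify(name: str) -> Tuple[str, bool]:
    """Return (mime_type, should_index) for a document name."""
    extension = os.path.splitext(name)[1].lower()
    # Unknown extensions fall back to a generic binary type.
    mime_type = MIME_TYPES.get(extension, "application/octet-stream")
    return mime_type, mime_type in INDEXED_TYPES


print(classify("customer.xml"))  # ('application/xml', True)
print(classify("customer.jpg"))  # ('image/jpeg', False)
```

So the "format identifier" lives in the document name and the store's configuration rather than in a tag embedded in the key itself.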
