Software Engineering discussion


Making Sense of NoSQL > Ch 7: Finding Information with NoSQL Search


message 1: by [deleted user] (new)

Mar 07, 2014 08:12AM

This chapter is chock-full of information retrieval theory and application, primarily focused on the document store model. Text processing technology goes back many decades, well before the Internet, but is still important today because so much of our information is in unstructured, free-form text.

Although I agree that XML-based document storage can enable powerful query semantics, many studies show that users do not leverage that power; they just want to enter one or two keywords and live with the result. So the simpler bag-of-words model suffices for many use cases.
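For readers new to the term, here is a minimal sketch of what bag-of-words keyword search might look like. It is not taken from the book; the function names (`tokenize`, `bag_of_words`, `score`, `search`) and the scoring rule (summing raw term counts) are assumptions chosen purely for illustration. Real engines use more refined weighting.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text):
    """Reduce a text to word counts, discarding word order and structure."""
    return Counter(tokenize(text))


def score(query, document):
    """Illustrative scoring rule: sum the document's counts for each distinct query term."""
    doc_bag = bag_of_words(document)
    return sum(doc_bag[term] for term in set(tokenize(query)))


def search(query, documents):
    """Return the ids of matching documents, highest score first (ties broken by id)."""
    scored = [(score(query, text), doc_id) for doc_id, text in documents.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in ranked if s > 0]


if __name__ == "__main__":
    docs = {
        "a": "NoSQL search with Solr",
        "b": "Solr, Solr and Lucene",
        "c": "Graph databases",
    }
    print(search("solr", docs))
```

Note that nothing in this sketch knows about document structure, which is exactly the trade-off being discussed: it is simple and handles one- or two-keyword queries, but it cannot answer the structure-aware queries an XML document store allows.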

Overall, a concise and well-written introduction to this topic.


message 2: by Dan (new)

Mar 08, 2014 11:16AM

Thanks for the kind words, Brad.

I agree with you that the bag-of-words model does work in many cases, especially when document collections are small and the specific keywords are well known to both the people doing the searches and the document authors. The fact that many projects simply use Lucene, Solr, and/or the popular Elasticsearch is good testimony to this. Solr continues to have new releases each year that make it a more viable NoSQL database by itself.

I also talked to Doug Cutting at the MinneAnalytics conference, and he predicted that within a few years Solr and Hadoop will overlap and complement each other in functionality.

I have personally found that many search projects are very sensitive to users who are too impatient to even go to the second or third page of search results. So any tricks we can use to rank documents seem to help.

I also wanted to mention that this entire chapter is free on the Manning web site if you still have not purchased the book. Many people like to forward it to their friends when the topics of NoSQL and search come up.

http://www.manning-source.com/books/m...

When I wrote this chapter I was very focused on helping people new to the NoSQL space select a NoSQL system and write good requirements for evaluating the search components before a specific system was considered. What I have learned is that search-centric comparisons seem complicated to do, and that it takes a long time to set up and evaluate the "F" scores of two systems. Since this is only one part of the evaluation criteria, most companies don't really compare the search abilities of NoSQL systems.
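To give a sense of what comparing "F" scores involves, here is a rough sketch of the standard precision, recall, and F-measure calculation for a single query. The function name `precision_recall_f` and the sample document ids are made up for illustration. The hard part in practice is not this arithmetic but collecting the relevance judgments (the `relevant` set) for enough queries on each system.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Compute precision, recall, and the F-beta score for one query.

    retrieved: ids the search system returned
    relevant:  ids a human judged relevant (the hard-to-get ground truth)
    beta:      weight of recall relative to precision (1.0 gives F1)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)

    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0

    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0

    beta_sq = beta * beta
    f_score = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Hypothetical results for one query on one system.
    system_results = ["d1", "d2", "d3", "d4"]
    judged_relevant = ["d1", "d2", "d5"]
    print(precision_recall_f(system_results, judged_relevant))
```

Comparing two systems would mean running the same set of queries against both, computing this per query, and averaging, which is where most of the setup time goes.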

I also recently heard from a user that the "search creation wizards" being added to at least one NoSQL product are starting to get much more sophisticated. These tools make it easier for novice users to set up and configure a search service.

We also did not spend much time in the book showing how social graphs can be used to enhance search results. This is something that the IBM "Watson" level systems really take advantage of.

Perhaps we will cover these topics in more detail in a future version of the book!

I would like to hear from anyone who has feedback on this chapter, and to hear how it would drive your NoSQL database selection process. I would especially love to hear from anyone who creates public "RFPs" based on enterprise search requirements. Those documents offer good insights into what search requirements organizations have.
