Posts Tagged ‘collective intelligence’

Book review: Programming Collective Intelligence

Tuesday, July 14th, 2009

“Programming Collective Intelligence” is a new book from O’Reilly, written by Toby Segaran. The author graduated from MIT and currently works at Metaweb Technologies, where he develops ways to put large public datasets into Freebase, a free online semantic database. You can find more information about him on his blog: http://blog.kiwitobes.com/.

Web 2.0 cannot exist without collective intelligence. The “giants” use it everywhere: YouTube recommends similar videos, Last.fm knows what you would like to listen to, Flickr knows which photos are your favorites, and so on. This technology powers intelligent search, clustering, price modelling and ranking on the web. I cannot imagine a modern service without data analysis, which is why it is worth starting to read about it.

There are many titles about collective intelligence, and recently I have read two of them: this one and “Collective Intelligence in Action”. Both are very pragmatic, but the O’Reilly one is more focused on the essence of CI. Its code listings are much shorter (the examples are written in Python, which makes that easy). In general, comparing these books is like comparing Java and Python. If you want to build a recommendation engine the “in Action”/Java way, you have to read the whole book, add extra JARs and design dozens of classes. The rapid Python way requires reading only 15 pages and voilà, you have your first recommendations. It is awesome!
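To give a feel for how short such a first recommender can be, below is a minimal sketch of user-based collaborative filtering in Python. It is my own illustration, not the book's listing: the toy `ratings` dictionary and the function names `similarity` and `recommend` are hypothetical, and similarity is assumed to be 1 / (1 + Euclidean distance) over the items two users have both rated.

```python
from math import sqrt

# Hypothetical toy data (not from the book): user -> {item: rating}.
ratings = {
    "anna":  {"Alien": 5.0, "Brazil": 3.0, "Casablanca": 4.0},
    "bart":  {"Alien": 4.0, "Brazil": 3.0, "Casablanca": 5.0, "Dune": 4.0},
    "celia": {"Alien": 1.0, "Brazil": 5.0, "Dune": 2.0},
}


def similarity(prefs, a, b):
    """Assumed similarity: 1 / (1 + Euclidean distance) over shared items; 0 if none are shared."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0
    dist = sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared))
    return 1.0 / (1.0 + dist)


def recommend(prefs, user):
    """Rank items the user has not rated by the similarity-weighted average of other users' ratings."""
    totals, weights = {}, {}
    for other in prefs:
        if other == user:
            continue
        sim = similarity(prefs, user, other)
        if sim <= 0:
            continue  # users with nothing in common contribute nothing
        for item, score in prefs[other].items():
            if item in prefs[user]:
                continue  # only suggest items the user has not rated yet
            totals[item] = totals.get(item, 0.0) + sim * score
            weights[item] = weights.get(item, 0.0) + sim
    ranked = [(totals[i] / weights[i], i) for i in totals]
    return sorted(ranked, reverse=True)  # best predicted score first


if __name__ == "__main__":
    for score, item in recommend(ratings, "anna"):
        print(f"{item}: {score:.2f}")
```

Because "bart" rates films much like "anna" does, his opinion of "Dune" weighs more than "celia"'s. Swapping the distance-based score for a Pearson correlation is a common alternative when users rate on different scales.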

So what about the rest of the book? There are still 319 pages! Further chapters cover discovering groups, searching and ranking, optimization, document filtering, decision trees, price models and genetic algorithms. The book explains how to implement simulated annealing, k-nearest neighbors, a Bayesian classifier and many more. Take a look at the table of contents here; it does not list all the algorithms, but you will find more information there.

Each chapter is about 20–30 pages long. You do not have to read them all; you can pick the ones most important to you and still follow what is going on. Every chapter contains only a minimal theoretical introduction, which might not be enough for total beginners. I recommend this book to students who have taken a statistics course (not only in IT or computer science): it will show them how to use that knowledge in practice, and there are many inspiring examples.

If you do not know Python, do not be afraid: at the beginning you will find an introduction to the language syntax. All listings are very short and well described by the author, sometimes line by line. The book also provides the necessary information about the basic standard libraries used for XML processing and downloading web pages.

If you would like to start learning about collective intelligence, I strongly recommend reading “Programming Collective Intelligence” first and then “Collective Intelligence in Action”. The first shows how easy it is to implement the basic algorithms; the second shows you how to use existing open source machine learning projects.

You can find more about this book on its catalogue page here.

Book review: Collective Intelligence in Action

Monday, June 29th, 2009

I am a member of the Poznan Java User Group. We have an active book review program with O’Reilly, Apress and Manning. I have written several reviews and I will post them all; today comes my first one, “Collective Intelligence in Action”.

Collective intelligence is very popular these days. Thanks to Internet companies, we benefit from this concept every day. I have to admit that I am a big fan of “intelligent” services: I look for news on Digg, listen to music through the Last.fm player, and use Wikipedia, YouTube, Amazon and others every day. I have often wondered how these sites work, and that curiosity led me to read “Collective Intelligence in Action”. I was surprised by how practical the book is. Theory is kept to a minimum, and after reading it you should be able to add CI features to existing sites.

The author, Satnam Alag, has organized his work perfectly. Every chapter has an introduction, a summary and very handy references, which I used many times. All the mathematical concepts and definitions are illustrated with examples. There are a lot of Java code listings, so basic knowledge of the language is useful, but a theoretical background is not necessary. Each chapter can be read on its own, yet together they form a coherent account of building a recommendation system. In his code, the author cares not only about correctness but also about efficiency and scalability.

It is worth noting that Satnam presents a lot of stable and useful open source software. Projects like Nutch, Lucene and Weka can easily be adapted to our own services, and the book shows how to do it from a programmer’s point of view (through their APIs).

I recommend “Collective Intelligence in Action” to Java developers who would like to learn how to build recommendation systems, intelligent search over their resources, automatic tagging or web crawling. The book is worth reading even if you do not plan to use CI in your application. The basic Web 2.0 mechanisms are easy to implement and do not require much theoretical knowledge; Satnam Alag demonstrates this in his book.

I would not recommend this book to data mining or text analysis experts. This is not an academic work, and people looking for theoretical information about CI might be disappointed.