Department of Computer Science, Rutgers University

CS672: Recommendation and Search Engines

A Social Network Approach

 

Professor Apostolos Gerasoulis, gerasoul@cs.rutgers.edu

Fall 2011

 

Knowledge Needed: High-level programming, MATLAB, MySQL, Linux, linear algebra, numerical algorithms, machine learning, algorithms.

Description:

Search engines have had a significant impact during the last decade. Searching has become a dominant web activity, while recommendation engines have shown some promise as part of vertical activities. Examples are the Amazon shopping recommendation engine and the Netflix movie recommendation engine. The research community has been active in the area, and the KDD Cup for 2011 is on recommending music items based on the Yahoo! Music dataset. In this course we plan to investigate both recommendation engines and search engines using a social network approach. The users and items (movies, music, shopping, etc.) are represented as action matrices, where each entry records an action, such as a ranking by a user or a click on a web page or product. Once the action matrix has been formed with sufficient density, the problem to be studied is how to fill in the remaining entries of the matrix (that is, to predict them) with sufficiently high accuracy and recall. There are many similarities between search engines and recommendation engines. In search engines, the social network matrix is formed from web links, user clicks, and/or web hierarchies. In recommendation engines, the users explicitly and/or implicitly define the social network matrix by buying a product, watching a movie, or listening to music, and then ranking the items. The major emphasis in this course is to discuss the similarities and differences between these two major emerging technologies in the social web.
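To make the action-matrix idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the user and item names and the ratings are made up, the function names are hypothetical, and the item-mean predictor is a simple baseline chosen for brevity, not a method prescribed by the course.

```python
from collections import defaultdict

# Hypothetical (user, item, rating) actions; made-up data for illustration only.
ACTIONS = [
    ("u1", "movie_a", 5.0),
    ("u1", "movie_b", 3.0),
    ("u2", "movie_a", 4.0),
    ("u3", "movie_b", 2.0),
    ("u3", "movie_c", 4.0),
]


def build_action_matrix(actions):
    """Store the sparse action matrix as {user: {item: value}}."""
    matrix = defaultdict(dict)
    for user, item, value in actions:
        matrix[user][item] = value
    return dict(matrix)


def density(matrix):
    """Fraction of (user, item) cells that hold an observed action."""
    users = len(matrix)
    items = len({item for row in matrix.values() for item in row})
    observed = sum(len(row) for row in matrix.values())
    return observed / (users * items) if users and items else 0.0


def predict_item_mean(matrix, user, item):
    """Baseline prediction (an assumption, not the course's method):
    return the observed entry if present, otherwise the mean value
    of the item over the users who acted on it, or None if nobody did."""
    if item in matrix.get(user, {}):
        return matrix[user][item]
    values = [row[item] for row in matrix.values() if item in row]
    return sum(values) / len(values) if values else None


matrix = build_action_matrix(ACTIONS)
print(density(matrix))
print(predict_item_mean(matrix, "u2", "movie_b"))
```

Here 5 of the 9 possible cells are observed, and the missing entry for u2 and movie_b is filled with the mean of the two known ratings for that item. Real datasets are far sparser, which is why more careful prediction heuristics are needed.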

Organization and Expectations: There are three important parts in the course: heuristic study and development, software, and manipulation of large data sets. Students will participate through presentations and by writing software for heuristics and system integration. Some of the main systems that we plan to use, as well as a sample of heuristics to study, are given below. Grading will be determined by programming assignments and presentations.

 

1.       The Lucene search engine. We will discuss the engine, ranking, proximity and other search engine technologies. http://lucene.apache.org/java/docs/index.html.

2.       The CWIS portal, http://scout.wisc.edu/Projects/CWIS/ . We will use the portal to interface with the MySQL and MATLAB environments, and will also use it as our recommendation engine interface and for user registration. It is written in PHP.

3.       We plan to use both small user datasets from the web, such as MovieLens http://www.grouplens.org/node/12 , and massive datasets from Yahoo! music and movie data. For crawling and indexing we plan to use the Nutch infrastructure http://nutch.apache.org/ on a movie or music site such as IMDb, http://www.imdb.com/ , and Wikipedia.

4.       A few examples of papers to gauge the level of the course material to be covered:

i.                     Slope One Predictors for Online Rating-Based Collaborative Filtering http://www.daniel-lemire.com/fr/documents/publications/lemiremaclachlan_sdm05.pdf

ii.                   Item-Based Top-N Recommendation Algorithms: http://glaros.dtc.umn.edu/gkhome/node/127

iii.                  Towards Decentralized Recommender Systems:  http://www.informatik.uni-freiburg.de/~cziegler/papers/A4-Thesis.pdf

 

iv.                 Co-clustering Documents and Words Using Bipartite Spectral Graph Partitioning: http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_bipartite.pdf
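As a rough indication of the level of the heuristics, the following Python sketch shows the weighted Slope One scheme as it is commonly described: a missing rating is predicted from the average rating differences between pairs of items, weighted by how many users rated both. The ratings, the user and item names, and the function name are made up for illustration, and the sketch is not taken from the paper's own code.

```python
# Hypothetical ratings {user: {item: rating}}; made-up data for illustration only.
RATINGS = {
    "john": {"A": 5.0, "B": 3.0, "C": 2.0},
    "mark": {"A": 3.0, "B": 4.0},
    "lucy": {"B": 2.0, "C": 5.0},
}


def slope_one_predict(ratings, user, target):
    """Weighted Slope One prediction of `user`'s rating for `target`.

    For each item the user has rated, compute the average deviation
    (target - item) over all users who rated both, add it to the user's
    own rating of that item, and weight by the number of co-raters.
    Returns None when no co-rated pair exists.
    """
    numerator = 0.0
    denominator = 0
    for item, user_rating in ratings[user].items():
        if item == target:
            continue
        diffs = [
            row[target] - row[item]
            for row in ratings.values()
            if target in row and item in row
        ]
        if not diffs:
            continue
        count = len(diffs)
        deviation = sum(diffs) / count
        numerator += (deviation + user_rating) * count
        denominator += count
    return numerator / denominator if denominator else None


print(slope_one_predict(RATINGS, "lucy", "A"))
```

With this data, item A is rated on average 0.5 above B (two co-raters) and 3 above C (one co-rater), so the prediction for lucy is ((0.5 + 2) * 2 + (3 + 5) * 1) / 3 = 13/3, about 4.33.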

 

 

 

If you have more questions regarding the course, please send me an email at gerasoul@cs.rutgers.edu. One part of the seminar will be a music recommendation engine.