Department of Computer Science, Rutgers University
CS672: Recommendation and Search Engines
A Social Network Approach

Professor Apostolos Gerasoulis, gerasoul@cs.rutgers.edu
Fall 2011
Knowledge Needed: High-level programming, Matlab, MySQL, Linux, linear algebra, numerical algorithms, machine learning, algorithms.
Description:
Search engines have had a significant impact during the last decade. Searching has become a dominant web activity, while recommendation engines have shown promise as part of vertical activities. Examples are the Amazon shopping recommendation engine and the Netflix movie recommendation engine. The research community has been active in the area, and the KDD Cup for 2011 is on recommending music items based on the Yahoo! Music dataset. In this course we plan to investigate both recommendation and search engines using a social network approach. The users and items (movies, music, shopping, etc.) are represented as action matrices, where each entry represents an action: a ranking by a user, a click on a web page or product, and so on. Once the action matrix has been formed with sufficient density, the problem to be studied is filling in the remaining entries of the matrix (predicting the entries) with sufficiently high accuracy and recall. There are many similarities between search engines and recommendation engines. In search engines, the social network matrix is formed by web links, user clicks, and/or web hierarchies. In recommendation engines, the users explicitly and/or implicitly define the social network matrix by buying a product, watching a movie, or listening to music, and then ranking the items. The major emphasis of this course is to discuss the similarities and differences between these two major emerging technologies in the social web.
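As a rough illustration of the action matrix described above, the following Python sketch stores (user, item, value) actions sparsely, measures the density of the matrix, and lists the entries that a prediction method would have to fill in. The function names and the toy data are assumptions made for illustration only; they are not part of the course software.

```python
from collections import defaultdict


def build_action_matrix(actions):
    """Store (user, item, value) triples as a sparse dict of dicts."""
    matrix = defaultdict(dict)
    for user, item, value in actions:
        matrix[user][item] = value
    return dict(matrix)


def all_items(matrix):
    """Sorted list of every item that appears in at least one row."""
    return sorted({item for row in matrix.values() for item in row})


def density(matrix):
    """Fraction of the user-by-item entries that are observed."""
    total = len(matrix) * len(all_items(matrix))
    observed = sum(len(row) for row in matrix.values())
    return observed / total if total else 0.0


def missing_entries(matrix):
    """(user, item) pairs with no recorded action, i.e. entries to predict."""
    items = all_items(matrix)
    return [(u, i) for u in sorted(matrix) for i in items if i not in matrix[u]]


# Hypothetical toy data: each triple is one action (here, a rating).
actions = [
    ("alice", "movie_a", 5),
    ("alice", "movie_b", 3),
    ("bob", "movie_a", 4),
    ("carol", "movie_b", 2),
    ("carol", "movie_c", 5),
]
matrix = build_action_matrix(actions)
print(density(matrix))
print(missing_entries(matrix))
```

A dict of dicts is used here because real action matrices are typically very sparse; a dense array would waste memory on unobserved entries.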
Organization and Expectations: The course has three important parts: heuristic study and development, software, and manipulation of large data sets. Students will participate via presentations, writing software for heuristics, and system integration. Some of the main systems that we plan to use, as well as a sample of heuristics to study, are given below. Grading will be determined by programming assignments and presentations.
1. The Lucene search engine. We will discuss the engine, ranking, proximity and other search engine technologies. http://lucene.apache.org/java/docs/index.html.
2. The CWIS portal. http://scout.wisc.edu/Projects/CWIS/ . We will use the portal to interface with the MySQL and Matlab environments, as well as for our recommendation engine interface and user registration. It is written in PHP.
3. We plan to use both small user datasets from the web, such as MovieLens http://www.grouplens.org/node/12 , and massive datasets from Yahoo! music and movie data. For crawling and indexing we plan to use the Nutch infrastructure http://nutch.apache.org/ on a movie or music site such as IMDb, http://www.imdb.com/ , and Wikipedia.
4. A few examples of papers to gauge the level of the course material to be covered:
i. Slope One Predictors for Online Rating-Based Collaborative Filtering http://www.daniel-lemire.com/fr/documents/publications/lemiremaclachlan_sdm05.pdf
iii. Towards Decentralized Recommender Systems: http://www.informatik.uni-freiburg.de/~cziegler/papers/A4-Thesis.pdf
iv. Co-clustering documents and words using Bipartite Spectral Graph Partitioning : http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_bipartite.pdf
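To give a feel for the kind of heuristic covered in the first paper above, here is a minimal Python sketch of a weighted Slope One style predictor: it averages the rating difference between each pair of items over the users who rated both, then predicts a missing rating from the user's known ratings, weighting each item pair by how many users co-rated it. The function names and the toy ratings are assumptions for illustration; this is a sketch of the idea, not the authors' reference implementation.

```python
from collections import defaultdict


def slope_one_fit(ratings):
    """Return (dev, count): average deviation dev[j][i] of item j over
    item i, and the number of users who rated both items."""
    diff_sum = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i == j:
                    continue
                diff_sum[j][i] += r_j - r_i
                count[j][i] += 1
    dev = {
        j: {i: diff_sum[j][i] / count[j][i] for i in diff_sum[j]}
        for j in diff_sum
    }
    counts = {j: dict(c) for j, c in count.items()}
    return dev, counts


def slope_one_predict(user_ratings, target, dev, counts):
    """Predict the rating of `target` for one user; None if no item the
    user rated has been co-rated with `target`."""
    numerator = 0.0
    denominator = 0
    for i, r_i in user_ratings.items():
        if i == target:
            continue
        c = counts.get(target, {}).get(i, 0)
        if c:
            numerator += (dev[target][i] + r_i) * c
            denominator += c
    return numerator / denominator if denominator else None


# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "u1": {"A": 5, "B": 3, "C": 2},
    "u2": {"A": 3, "B": 4},
    "u3": {"B": 2, "C": 5},
}
dev, counts = slope_one_fit(ratings)
prediction = slope_one_predict(ratings["u3"], "A", dev, counts)
print(prediction)
```

In this toy data, item A is rated 0.5 higher than B on average (two users) and 3 higher than C (one user), so the prediction for u3 on A is (2.5 * 2 + 8 * 1) / 3.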
If you have more questions regarding the course, please send me an email at gerasoul@cs.rutgers.edu. One part of the seminar will be a music recommendation engine.