Movie Recommender

Jifu Zhao, 15 August 2017

Hadoop Implementation of Movie Recommender Systems

Data source: Netflix Prize Data Set

Original Input:

image

Workflow

image

image

image

Explanation

image

  • MapReduce Job 1:
    • Mapper: read raw input information
      • input: userID, movieID, rating
      • output: < key=userID, value=movieID: rating >
    • Reducer: merge the output from Mapper according to unique userID
      • input: < key=userID, value=< movie1: rating1, movie2: rating2, … > >
      • output: < key=userID, value="movie1: rating1, movie2: rating2, …" >
  • MapReduce Job 2:
    • Mapper: read output from MapReduce job 1
      • input: userID \t "movie1: rating1, movie2: rating2, …"
      • output: < key="movie_A: movie_B", value=1 >
    • Reducer: merge the output from Mapper according to unique movie_A: movie_B
      • input: < key="movie_A: movie_B", value=< 1, 1, 1, … > >
      • output: < key="movie_A: movie_B", value=count >
  • MapReduce Job 3:
    • Mapper: read output from MapReduce job 2 and split
      • input: movie_A: movie_B \t count
      • output: < key=movie_A, value="movie_B=count" >
    • Reducer: calculate the normalized co-occurrence matrix value (each count divided by the total count for movie_A)
      • input: < key=movie_A, value=< movie_B=count, movie_C=count, … > >
      • output: < key=movie_B, value="movie_A=count/total" >
  • MapReduce Job 4:
    • Mapper: read the original user rating information
      • input: userID, movieID, rating
      • output: < key=userID, value=rating >
    • Reducer: merge the output from Mapper according to unique userID
      • input: < key=userID, value=< rating1, rating2, … > >
      • output: < key=userID, value=< average rating > >
  • MapReduce Job 5:
    • Mapper 1: read co-occurrence matrix from MapReduce Job 3
      • input: movie_B \t movie_A=ratio
      • output: < key=movie_B, value="movie_A=ratio" >
    • Mapper 2: read the original user rating information to build the rating matrix
      • input: userID, movieID, rating
      • output: < key=movie_B, value="userID: rating" >
    • Reducer: multiply each co-occurrence ratio by each user rating for movie_B
      • input: < key=movie_B, value="movie_A=ratio1, movie_C=ratio2, …, user1: rating1, user2: rating2, …" >
      • output: < key="userID: movieID", value=ratio * rating >
      • setup: read the user average rating
  • MapReduce Job 6:
    • Mapper: read the output from MapReduce Job 5
      • input: userID: movieID \t ratio * rating
      • output: < key="userID: movieID", value=ratio * rating >
    • Reducer: sum the partial products for each userID: movieID
      • input: < key="userID: movieID", value=< subrating1, subrating2, … > >
      • output: < key="userID: movieID", value=rating prediction >
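A minimal sketch of the data flow through Jobs 1 to 3, written as plain in-memory Python rather than Hadoop code. It is not the project's Java implementation: the function names, the three sample ratings, and the choice to count every ordered pair (including a movie paired with itself) are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def group_by_user(ratings):
    """Job 1: collect each user's movie -> rating entries."""
    by_user = defaultdict(dict)
    for user, movie, rating in ratings:
        by_user[user][movie] = rating
    return dict(by_user)


def cooccurrence_counts(by_user):
    """Job 2: count the users who rated both movie_A and movie_B."""
    counts = defaultdict(int)
    for movies in by_user.values():
        # Assumption: every ordered pair is emitted, including (A, A).
        for movie_a, movie_b in product(movies, repeat=2):
            counts[(movie_a, movie_b)] += 1
    return dict(counts)


def normalize(counts):
    """Job 3: divide each count by its movie_A total, keyed by movie_B."""
    row_total = defaultdict(int)
    for (movie_a, _movie_b), count in counts.items():
        row_total[movie_a] += count
    normalized = defaultdict(dict)
    for (movie_a, movie_b), count in counts.items():
        normalized[movie_b][movie_a] = count / row_total[movie_a]
    return dict(normalized)


# Made-up sample input in the "userID, movieID, rating" layout.
ratings = [("u1", "m1", 5.0), ("u1", "m2", 3.0), ("u2", "m1", 4.0)]
by_user = group_by_user(ratings)
counts = cooccurrence_counts(by_user)
matrix = normalize(counts)
```

Keying the normalized output by movie_B, as the Job 3 reducer does, lets Job 5 join it with the ratings on the same movie key.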

Demo

Input:

image

Co-occurrence Matrix (un-normalized)

image

User Rating Matrix

image

Output:

image

image
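The output comes from combining the co-occurrence matrix (after normalization) with the user rating matrix, which is what Jobs 4 to 6 do. The sketch below shows that step in plain Python. It is an illustration under assumptions, not the project's Java code: the numbers are made up and are not the demo's values, the function names are hypothetical, and using a user's average rating in place of a missing rating is a guess at why the Job 5 reducer reads the averages in its setup.

```python
from collections import defaultdict


def average_ratings(ratings):
    """Job 4: mean rating per user."""
    per_user = defaultdict(list)
    for user, _movie, rating in ratings:
        per_user[user].append(rating)
    return {user: sum(vals) / len(vals) for user, vals in per_user.items()}


def multiply(matrix, ratings, averages):
    """Job 5: emit ratio * rating for every (user, movie_A) via movie_B."""
    rated = defaultdict(dict)
    for user, movie, rating in ratings:
        rated[movie][user] = rating
    partial = []
    for movie_b, row in matrix.items():
        for user, average in averages.items():
            # Assumption: the user's average fills in a missing rating.
            rating = rated[movie_b].get(user, average)
            for movie_a, ratio in row.items():
                partial.append(((user, movie_a), ratio * rating))
    return partial


def sum_predictions(partial):
    """Job 6: add up the partial products per (user, movie) key."""
    result = defaultdict(float)
    for key, value in partial:
        result[key] += value
    return dict(result)


# Made-up inputs: matrix[movie_B][movie_A] = ratio, as Job 3 writes it.
matrix = {"m1": {"m1": 2 / 3, "m2": 0.5}, "m2": {"m1": 1 / 3, "m2": 0.5}}
ratings = [("u1", "m1", 5.0), ("u1", "m2", 3.0), ("u2", "m1", 4.0)]
averages = average_ratings(ratings)
predictions = sum_predictions(multiply(matrix, ratings, averages))
```

Each prediction for (user, movie_A) is the sum over movie_B of ratio(movie_A, movie_B) * rating(user, movie_B), which is a row of the normalized matrix multiplied by that user's rating column.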

Run code

$: cd RecommenderSystem/
$: hdfs dfs -mkdir /input
$: hdfs dfs -put input/* /input
$: cd src/main/java/
$: hadoop com.sun.tools.javac.Main -d class/ *.java
$: cd class/
$: jar cf recommender.jar *.class
$: hadoop jar recommender.jar Driver /input /dataDividedByUser /coOccurrenceMatrix /Normalize /averageRating /Multiplication /Sum
$: hdfs dfs -cat /Sum/*