2025 · Completed · Final Year Project

Book Recommendation System

Indian literature deserves better recommendation algorithms.

Python · ML/DL · Pandas · NumPy · Collaborative Filtering

What

A recommendation engine focused on Indian literature that benchmarks multiple ML and DL algorithms (collaborative filtering, content-based filtering, and hybrid models) to identify the most effective strategy for a culturally niche, sparse dataset.

Why

Most recommendation systems are trained on Western literature datasets. Indian literature — across languages, regions, and genres — is severely underrepresented. I wanted to understand how standard recommendation algorithms degrade when applied to sparse, niche datasets, and whether hybrid approaches could compensate.

How I built it

01

Sourced and cleaned a dataset of Indian literature titles, authors, genres, and reader ratings.

02

Performed exploratory data analysis to understand sparsity, rating distribution, and genre imbalance.

03

Implemented collaborative filtering (user-based and item-based), content-based filtering (TF-IDF on metadata), and a hybrid model combining both signals.

04

Benchmarked all approaches using RMSE, precision@k, and recall@k on a held-out test set.

05

Documented findings in a structured report comparing algorithm performance trade-offs for sparse cultural datasets.
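
The hybrid idea from step 03 can be illustrated with a minimal sketch. This is not the project's actual code: the toy ratings, the metadata strings, the `recommend` function, and the blending weight `alpha` are all assumptions made up for illustration. It combines an item-based collaborative signal (cosine similarity between rating columns) with a content signal (cosine similarity between TF-IDF vectors of book metadata) using a simple weighted sum.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy data, invented purely for illustration (not the project's dataset).
ratings = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u2", "u3", "u3"],
    "book": ["b1", "b2", "b1", "b3", "b2", "b4"],
    "rating": [5, 4, 4, 5, 3, 4],
})
books = pd.DataFrame({
    "book": ["b1", "b2", "b3", "b4"],
    "metadata": [
        "historical fiction partition punjab",
        "historical fiction bengal family saga",
        "poetry devotional tamil",
        "poetry devotional kannada",
    ],
}).set_index("book")

# User-item matrix; unrated cells are filled with 0 (a simplifying assumption).
matrix = (
    ratings.pivot_table(index="user", columns="book", values="rating")
    .reindex(columns=books.index)
    .fillna(0.0)
)

# Item-based collaborative signal: cosine similarity between rating columns.
cf_sim = cosine_similarity(matrix.T.values)

# Content signal: cosine similarity between TF-IDF vectors of the metadata.
tfidf = TfidfVectorizer().fit_transform(books["metadata"])
content_sim = cosine_similarity(tfidf)


def recommend(user, alpha=0.5, k=2):
    """Rank a user's unrated books by a weighted blend of both signals.

    alpha=1.0 uses only the collaborative signal, alpha=0.0 only content.
    """
    hybrid_sim = alpha * cf_sim + (1 - alpha) * content_sim
    user_ratings = matrix.loc[user].values
    # Score each book as the similarity-weighted sum of the user's ratings.
    scores = hybrid_sim @ user_ratings
    # Never recommend books the user has already rated.
    scores[user_ratings > 0] = -np.inf
    top = np.argsort(scores)[::-1][:k]
    return [books.index[i] for i in top]


print(recommend("u1", alpha=0.5, k=2))
```

A weighted sum is only one way to combine the two signals. Its appeal on sparse data is that the content term still produces a ranking for books with too few ratings for the collaborative term to be meaningful.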

Challenges

Dataset sparsity was severe — most books had very few ratings, which breaks standard collaborative filtering assumptions.

Content metadata (genre, themes, language) was inconsistent and required significant manual cleaning.

Hybrid weighting between collaborative and content signals required extensive tuning.
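
To make the sparsity problem concrete, here is a minimal sketch of how it could be quantified with pandas. The ratings table, the column names, and the "fewer than 2 ratings" threshold are placeholders assumed for illustration, not values from the project.

```python
import pandas as pd

# Toy ratings, invented for illustration; assumes one row per (user, book) pair.
ratings = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u3"],
    "book": ["b1", "b2", "b1", "b3"],
    "rating": [5, 4, 3, 4],
})

n_users = ratings["user"].nunique()
n_books = ratings["book"].nunique()

# Sparsity = share of cells in the user-book matrix that have no rating.
sparsity = 1 - len(ratings) / (n_users * n_books)

# Ratings per book: a long tail of rarely rated books is what leaves
# collaborative filtering with too little overlap to compare items.
ratings_per_book = ratings.groupby("book")["rating"].count()
cold_books = ratings_per_book[ratings_per_book < 2].index.tolist()

print(f"sparsity: {sparsity:.2%}")
print("books with fewer than 2 ratings:", cold_books)
```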

Outcome

The hybrid model outperformed both pure approaches on sparse subsets. The project contributed a structured analysis of recommendation algorithm performance on underrepresented cultural datasets — submitted as the MCA final year project.
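
For readers unfamiliar with the metrics behind this comparison, the sketch below shows one common way to compute RMSE, precision@k, and recall@k. The function names and the example inputs are hypothetical and chosen for illustration; they are not the project's evaluation code or results.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user.

    recommended: ranked list of book ids, best first.
    relevant: set of held-out book ids the user actually liked.
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: one of the top 2 recommendations is relevant,
# and one of the user's 2 relevant books was found.
precision, recall = precision_recall_at_k(
    recommended=["b3", "b1", "b4", "b2"], relevant={"b1", "b2"}, k=2
)
print(precision, recall)
print(rmse([4, 3], [5, 3]))
```

In practice these per-user values would be averaged over all users in the held-out test set before comparing models.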

Tech Stack

Language

Python

ML

Collaborative Filtering · Content-Based Filtering · Hybrid Models · Scikit-Learn

Data

Pandas · NumPy · TF-IDF