Book Recommendation System
Indian literature deserves better recommendation algorithms.
What
A recommendation engine focused on Indian literature that benchmarks multiple machine learning and deep learning approaches (collaborative filtering, content-based filtering, and hybrid models) to identify the most effective strategy for a culturally niche, sparse dataset.
Why
Most recommendation systems are trained on Western literature datasets. Indian literature — across languages, regions, and genres — is severely underrepresented. I wanted to understand how standard recommendation algorithms degrade when applied to sparse, niche datasets, and whether hybrid approaches could compensate.
How I built it
Sourced and cleaned a dataset of Indian literature titles, authors, genres, and reader ratings.
Performed exploratory data analysis to understand sparsity, rating distribution, and genre imbalance.
Implemented collaborative filtering (user-based and item-based), content-based filtering (TF-IDF on metadata), and a hybrid model combining both signals.
Benchmarked all approaches using RMSE, precision@k, and recall@k on a held-out test set.
Documented findings in a structured report comparing algorithm performance trade-offs for sparse cultural datasets.
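The three approaches in the steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the toy ratings, the metadata strings, the function names, and the blend weight alpha are all assumptions made for the example, and the hybrid shown is a simple weighted sum of item-based and TF-IDF cosine similarities.

```python
import math
from collections import Counter

# Hypothetical toy data: user -> {book: rating} and book -> metadata text.
RATINGS = {
    "u1": {"Godan": 5, "Malgudi Days": 4},
    "u2": {"Godan": 4, "Malgudi Days": 5, "Train to Pakistan": 2},
    "u3": {"Train to Pakistan": 5, "Tamas": 4},
}
METADATA = {
    "Godan": "hindi novel rural realism",
    "Malgudi Days": "english short stories small town",
    "Train to Pakistan": "english novel partition",
    "Tamas": "hindi novel partition",
}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def item_vectors(ratings):
    """Invert user -> book ratings into book -> {user: rating} vectors."""
    items = {}
    for user, rated in ratings.items():
        for book, r in rated.items():
            items.setdefault(book, {})[user] = r
    return items

def tfidf_vectors(docs):
    """TF-IDF over whitespace tokens, with smoothed IDF: ln((1+N)/(1+df)) + 1."""
    n = len(docs)
    tokens = {book: text.split() for book, text in docs.items()}
    df = Counter(t for toks in tokens.values() for t in set(toks))
    return {
        book: {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in Counter(toks).items()}
        for book, toks in tokens.items()
    }

def hybrid_scores(user, ratings, metadata, alpha=0.5):
    """Score each unseen book as a rating-weighted average of blended similarity.

    alpha = 1.0 is pure item-based collaborative filtering,
    alpha = 0.0 is pure content-based filtering.
    """
    seen = ratings.get(user, {})
    norm = sum(seen.values())
    if not norm:
        return {}
    cf = item_vectors(ratings)
    content = tfidf_vectors(metadata)
    scores = {}
    for book in metadata:
        if book in seen:
            continue
        total = 0.0
        for liked, r in seen.items():
            cf_sim = cosine(cf.get(book, {}), cf.get(liked, {}))
            content_sim = cosine(content[book], content[liked])
            total += (alpha * cf_sim + (1 - alpha) * content_sim) * r
        scores[book] = total / norm
    return scores

cf_only = hybrid_scores("u1", RATINGS, METADATA, alpha=1.0)
content_only = hybrid_scores("u1", RATINGS, METADATA, alpha=0.0)
```

In this toy data, "Tamas" shares no raters with anything u1 has read, so the collaborative score is zero, while its metadata overlap with "Godan" still lets the content signal rank it. That is the kind of gap a hybrid is meant to cover on sparse data.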
Challenges
Dataset sparsity was severe: most books had very few ratings, which leaves too few overlapping ratings for collaborative filtering to compute reliable user or item similarities.
Content metadata (genre, themes, language) was inconsistent and required significant manual cleaning.
Hybrid weighting between collaborative and content signals required extensive tuning.
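Two of the challenges above lend themselves to a short sketch: measuring how sparse the rating matrix is, and tuning the hybrid weight. The code below is an illustration under assumptions, not the project's tuning procedure. It assumes the collaborative and content scores are already computed per book, treats the weight as a single scalar alpha, and selects it by a plain grid search on precision@k. All names and toy values are hypothetical.

```python
def sparsity(ratings, n_items):
    """Fraction of the user-item matrix that has no rating."""
    filled = sum(len(rated) for rated in ratings.values())
    return 1.0 - filled / (len(ratings) * n_items)

def precision_recall_at_k(ranked, relevant, k):
    """Precision@k and recall@k for one user's ranked recommendation list."""
    hits = sum(1 for book in ranked[:k] if book in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def tune_alpha(cf_scores, content_scores, relevant, k, alphas):
    """Grid-search the blend weight; return (best_alpha, best_precision_at_k).

    Each book is scored as alpha * cf + (1 - alpha) * content.
    Ties in score are broken alphabetically so the ranking is deterministic.
    """
    best = None
    books = set(cf_scores) | set(content_scores)
    for alpha in alphas:
        blended = {
            b: alpha * cf_scores.get(b, 0.0) + (1 - alpha) * content_scores.get(b, 0.0)
            for b in books
        }
        ranked = sorted(blended, key=lambda b: (-blended[b], b))
        precision, _ = precision_recall_at_k(ranked, relevant, k)
        if best is None or precision > best[1]:
            best = (alpha, precision)
    return best

# Hypothetical validation data for a single user.
toy_ratings = {"u1": {"a": 5}, "u2": {"a": 4, "b": 3}}
matrix_sparsity = sparsity(toy_ratings, n_items=4)

best_alpha, best_precision = tune_alpha(
    cf_scores={"x": 1.0, "y": 0.0},
    content_scores={"x": 0.0, "y": 1.0},
    relevant={"y"},
    k=1,
    alphas=[0.0, 0.5, 1.0],
)
```

In practice the precision would be averaged over many validation users rather than one, and the weight chosen on a validation split kept separate from the held-out test set.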
Outcome
The hybrid model outperformed both pure approaches on sparse subsets. The project contributed a structured analysis of recommendation algorithm performance on underrepresented cultural datasets and was submitted as my MCA final-year project.
Tech Stack
Language
ML
Data