Yelp Image Classifier

Enhancing Yelp Reviews Using Image Clustering

The Yelp Dataset Challenge invites participants to explore datasets made available by Yelp and to find innovative ways to use them, in the hope of uncovering insights and correlations within the data that can help the business model grow.

The motivation for our project is to enrich the user experience by creating tags for photos based on contextual information, so that a review can be enhanced automatically with a relevant image.


User experience is one of the pillars of human-computer interaction. Successful websites improve the quality of the user's interaction with their content in many different ways. The Yelp Dataset Challenge invites participants to explore the datasets made available by Yelp to find relevant insights that could be useful to users, to business owners, or to both. Most Yelp reviews do not contain images that illustrate the message the reviewer is trying to convey. Thus, showing the user images that are related to a review could significantly enhance the user experience. In this work, we propose an automated, data-mining-based framework that enhances Yelp restaurant reviews by suggesting images, uploaded by other users, that are relevant to a particular review. The framework consists of three main components: 1) a Convolutional Neural Network image classifier used to predict the label of each new image, 2) a Long Short-Term Memory neural network that generates a caption for an image when no caption is provided, and 3) a Latent Dirichlet Allocation (LDA) topic model, with which we identify the most probable topic of each review and the top words present in the review, and then map the review to captions that belong to the same topic and contain one or more of those top words. The results show that our framework is severely affected by the low quality of many captions and reviews, particularly with respect to the predicted captions. However, a qualitative analysis of the predicted images shows promising results. To perform the qualitative analysis, we deployed a Django-based website that shows the results for the first two steps of the framework.
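
The review-to-caption mapping in the third component can be illustrated with a minimal sketch. It is not the project's actual code: it assumes the topic of each review and caption has already been inferred by the topic model, and the names `Caption` and `match_captions`, the whitespace tokenization, and the ranking by number of shared words are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    image_id: str  # hypothetical identifier of the photo
    text: str      # user-provided or generated caption
    topic: int     # most probable topic, assumed to be precomputed


def match_captions(review_topic, review_top_words, captions):
    """Return image ids whose caption shares the review's topic
    and contains at least one of the review's top words."""
    top_words = {w.lower() for w in review_top_words}
    matches = []
    for caption in captions:
        # Keep only captions assigned to the same topic as the review.
        if caption.topic != review_topic:
            continue
        # Naive whitespace tokenization (an assumption of this sketch).
        shared = top_words & set(caption.text.lower().split())
        if shared:
            matches.append((caption.image_id, len(shared)))
    # Rank by number of shared top words; ties keep their input order.
    matches.sort(key=lambda m: -m[1])
    return [image_id for image_id, _ in matches]
```

Ranking by overlap is only one possible ordering; the description above only requires that a caption share the topic and at least one top word with the review.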