CAPSTONE PROJECTS

PRAXIS BUSINESS SCHOOL, BANGALORE

BATCH OF JANUARY 2019

Trouvaille: The Travel App.

Explore the most popular activities in a place

When it comes to travel, the online reviews posted by tourists and travellers on websites like TripAdvisor can be a great source of information for potential travellers choosing an appropriate destination and planning their vacation. These reviews may give potential visitors valuable information about the places they would like to visit, e.g. the best season to travel, the best time to visit different places, common activities, the dos and don’ts, the general mood of each place, etc. However, reading and summarizing such a massive quantity of reviews is an extremely tedious task. We see potential in using these reviews to design an automated framework that derives valuable insights about various places and recommends them to potential visitors. Information mining of this kind will benefit both visitors and service providers. In this project, we aimed to extract the most popular activities related to a place, as mentioned by travellers in their reviews on platforms like TripAdvisor, using NLP and Machine Learning techniques. In addition, we mined other useful information that will help travellers plan their trips.

KGF: Kustomer Golden Feedback

Comparative Analysis of Similar Products based on Customer Reviews

The internet has changed the way we shop. Online shopping has become very popular due to increased access to the internet via desktop and mobile. With the advent of various e-commerce platforms, the shopping experience of customers has improved significantly because of convenience, pricing, product variety and time savings. To enhance the shopping experience, online sellers encourage customers to share feedback on the products they purchase in the form of quantitative ratings, textual reviews or a combination of both. These reviews help potential customers get an idea of the products and compare them with other similar products. In this research, we aim to suggest a framework that enables easier comparison of similar products based on the opinions of reviewers on different product features. The process will use NLP and Machine Learning to 1) mine the product features of the given products from the customers’ reviews, 2) mine the customer opinions on each product feature and 3) put these opinions on a comparative scale for easy comparison.
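Step 3 above can be illustrated with a minimal sketch. It assumes that steps 1 and 2 have already produced (product, feature, score) triples, with scores in a hypothetical range of -1 (negative) to +1 (positive); the function name and data layout are illustrative assumptions, not the project's actual pipeline.

```python
from collections import defaultdict

def comparison_table(opinions):
    """Average opinion scores per feature and product.

    `opinions` is an iterable of (product, feature, score) triples.
    The triples and the -1..+1 score range are assumptions made for
    this sketch; the abstract does not specify them.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (feature, product) -> [total, count]
    for product, feature, score in opinions:
        cell = sums[(feature, product)]
        cell[0] += score
        cell[1] += 1

    table = defaultdict(dict)  # feature -> {product: mean score}
    for (feature, product), (total, count) in sums.items():
        table[feature][product] = total / count
    return {feature: dict(scores) for feature, scores in table.items()}

# Made-up example opinions for two hypothetical products.
opinions = [
    ("Phone A", "battery", 1), ("Phone A", "battery", -1),
    ("Phone B", "battery", 1), ("Phone A", "camera", 1),
]
table = comparison_table(opinions)
```

Reading one row of the table (for example, table["battery"]) then places all products side by side on the same scale for that feature.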

WhatsFake: Fake Message Identifier

An Attempt to Identify Fake Messages circulating in WhatsApp

At present, WhatsApp is used by 1.5 billion people. Though the platform is meant for connecting with family, friends and colleagues, it has been misused by a few to spread rumours and fake messages. When users receive a forwarded WhatsApp message, they have very limited options at their disposal to verify its veracity. They can attempt to verify it with the forwarder or with a Google search. However, verifying a WhatsApp message through a Google search is exhausting, and users cannot resort to Google for every forward they receive. This is the motivation for this research. We used Natural Language Processing to explore the semantics of fake data in depth and applied supervised Machine Learning techniques to identify patterns that distinguish fake messages from genuine ones. We also extended our exploration to the behavioural and contextual patterns of fake messages.

Zoom-In Zoom-Out: Controlling Computer using Gestures

Use gestures to control the keyboard and mouse functionalities

The primary objective of this project is to achieve zoom-in and zoom-out operations using both hands. The goal is to attain high accuracy in the level of zooming, based on the inward and outward movement of the hands. The project also focuses on making a computer recognize other gestures in order to control a number of keyboard and mouse functionalities. A further focus is on addressing the challenges involved in data collection and model fitting by comparing the various techniques currently in use.
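As a rough illustration of mapping hand movement to a zoom level, the sketch below assumes that the (x, y) positions of both hands are already available from some hand-tracking step, which is not shown. The function name, the ratio-based mapping and the clamping limits are assumptions for this sketch and are not taken from the project.

```python
import math

def zoom_factor(start_left, start_right, cur_left, cur_right,
                min_zoom=0.25, max_zoom=4.0):
    """Return a zoom factor from the change in distance between two hands.

    Hands moving apart (outward) give a factor above 1 (zoom in);
    hands moving together (inward) give a factor below 1 (zoom out).
    The clamp limits are hypothetical defaults.
    """
    start = math.dist(start_left, start_right)
    if start == 0:
        raise ValueError("starting hand positions must differ")
    ratio = math.dist(cur_left, cur_right) / start
    return max(min_zoom, min(max_zoom, ratio))

# Hands start 100 units apart.
zoom_in = zoom_factor((0, 0), (100, 0), (-50, 0), (150, 0))   # hands move apart
zoom_out = zoom_factor((0, 0), (100, 0), (25, 0), (75, 0))    # hands move together
```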

Network-Based Approach to Detect Spam Reviews

Spam review detection

Spam reviews have become a widespread problem, with spam reviewers often writing fake reviews to unjustly promote or demote certain products or businesses. Existing approaches to spam detection have been partly successful because they separately utilized linguistic clues of deception, behavioural footprints, or relational ties between agents in a review system. However, none of them tied together the review text and metadata in an unsupervised fashion. In this project, we attempt a network-based classification approach that harnesses clues from all metadata (user ID, timestamp, rating) as well as text data (word frequency, review length, etc.) to spot suspicious users and reviews, as well as products targeted by spammers. In this relatively new approach, a tripartite undirected graph network (a Markov Random Field) is built using features derived from the reviews, their text and their metadata. On this graph, a classification algorithm based on Loopy Belief Propagation is written, which uses the context of the metadata to label each user, review and product as spam or not.

Keypad Word Prediction

As you type, your device will try to predict what you will type

Our algorithm predicts text using machine learning and neural networks to figure out the words you use most often. It builds a personalized dictionary of these words and scores them based on the probability that you will use them again; the higher the score, the more likely the word will be suggested to you. The algorithm predicts customized words according to how frequently the user types them. For example, if someone often uses a particular slang term or abbreviation such as “OMW”, which stands for “On My Way”, then the algorithm will predict it accordingly. As another example, if someone uses “thanks a lot” very often and starts typing “th”, then our algorithm will suggest “thanks a lot”.
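The personalized dictionary and its scores can be illustrated with a minimal sketch. It uses plain frequency counts rather than the machine learning and neural network models mentioned above, so it is a simplified stand-in and not the project's algorithm; the class and method names are hypothetical.

```python
from collections import Counter

class FrequencyPredictor:
    """Suggest previously typed words or phrases that match a prefix."""

    def __init__(self):
        self.counts = Counter()  # personalized dictionary: phrase -> times used

    def learn(self, phrase):
        """Record one use of a word or phrase."""
        self.counts[phrase.lower()] += 1

    def suggest(self, prefix, k=3):
        """Return up to k matching phrases, highest score first.

        The score is the relative frequency of the phrase, used here as a
        simple estimate of the probability that it will be typed again.
        """
        prefix = prefix.lower()
        total = sum(self.counts.values())
        matches = [(phrase, count / total)
                   for phrase, count in self.counts.items()
                   if phrase.startswith(prefix)]
        # Sort by descending score, then alphabetically to break ties.
        matches.sort(key=lambda item: (-item[1], item[0]))
        return [phrase for phrase, _ in matches[:k]]

predictor = FrequencyPredictor()
for phrase in ["thanks a lot", "thanks a lot", "thanks a lot", "the", "OMW", "OMW"]:
    predictor.learn(phrase)
suggestions = predictor.suggest("th")
```

With this typing history, "thanks a lot" ranks ahead of "the" for the prefix "th" because it has been used more often.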

Credit Card Default Prediction using Highly Imbalanced Data

Techniques to handle the class imbalance problem

Financial institutions need to carry out a Credit Risk Assessment of all applications received for new loans and credit cards. This allows lenders to assess the creditworthiness of a borrower and determine whether they will be able to recover their money. They use data from previous applicants and their credit history over the years and apply machine learning techniques to identify potential defaulters. The problem arises when this data turns out to be imbalanced, with only a handful of defaulters to guide the algorithm. The objective of this project is to predict credit card default from a highly imbalanced dataset that includes only 4% default cases. With such class-imbalanced data, standard machine learning algorithms tend to get overwhelmed by the majority class and fail to identify the minority class. Predicting defaults becomes immensely difficult as the algorithm gets biased towards the non-default class. Our primary challenge will be to correct the class imbalance in the dataset, which may help to improve model performance. Through this work, we will also discuss the performance of various machine learning algorithms and various techniques to handle such class imbalance problems.
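One common way to handle class imbalance is random oversampling of the minority class. The sketch below is only an illustration: the abstract does not say which techniques the project used, and the function name, the label encoding (1 for default, 0 for non-default) and the toy data are assumptions.

```python
import random

def random_oversample(rows, labels, minority_label=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size.

    Assumes a binary problem. Returns new (rows, labels) lists in shuffled
    order; the input is returned unchanged (as copies) if there is no
    minority row or the classes are already balanced.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority or len(minority) >= len(majority):
        return list(rows), list(labels)

    # Sample minority indices with replacement to fill the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    indices = majority + minority + extra
    rng.shuffle(indices)
    return [rows[i] for i in indices], [labels[i] for i in indices]

# Toy data mirroring the 4% default rate: 96 non-defaults, 4 defaults.
rows = [[i] for i in range(100)]
labels = [0] * 96 + [1] * 4
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Oversampling is normally applied to the training split only, so that duplicated default cases do not leak into the data used for evaluation.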