Mini Project Notice for BTech-CSE(BDA – 4th and 6th)

The following are the mini-project topics for students of BTech-CSE (Big Data Analytics), 4th and 6th semester. All relevant information is provided with the descriptions. You are required to select one topic from the choices available. For any queries, contact D. Bordoloi.

Last Date for Evaluation: 04 May, 2019

6th Semester

1.     Build an application / web-page / mobile app which will perform the following tasks:

The program will take the following inputs: Weather (e.g., sunny, rainy), Season (e.g., summer, winter), Geographic Scene (e.g., hilly terrain, open field, crowded market) and any other inputs the students can think of themselves. Given the inputs, the program will generate a virtual reality scene. The generated virtual scene can be used for training ML algorithms to detect objects in varying environmental conditions. (Max Two students in a group)

2.     Build a mobile app for the following analytics task:

The app will perform analytics on the user's interaction with their mobile phone over a period of time (say 15 days or 1 month). It will specifically perform analytics on the time spent on social media, the time spent playing games, the time spent on messaging services like WhatsApp etc. Once the analytics are done, the app will report unhealthy practices (for example: you are spending 5 hours daily on social media) and suggest healthy ways of engaging the mind and body. (Max Two students in a group)
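As an illustration only, the reporting step could be sketched in Python as below. The app-to-category mapping, the daily limits and the function names are assumptions made for this sketch and are not part of the project statement; a real app would read usage data from the phone's own usage statistics.

```python
from collections import defaultdict

# Hypothetical mapping from app name to category (assumption for this sketch).
APP_CATEGORY = {
    "whatsapp": "messaging",
    "facebook": "social_media",
    "instagram": "social_media",
    "ludo": "games",
}

# Hypothetical daily limits in hours, chosen only for illustration.
DAILY_LIMIT_HOURS = {"social_media": 2.0, "games": 1.0, "messaging": 1.5}


def average_daily_hours(sessions, num_days):
    """sessions: iterable of (app_name, minutes). Returns hours per day per category."""
    totals = defaultdict(float)
    for app, minutes in sessions:
        category = APP_CATEGORY.get(app.lower(), "other")
        totals[category] += minutes
    return {cat: mins / 60.0 / num_days for cat, mins in totals.items()}


def unhealthy_report(daily_hours):
    """Return one message for each category whose daily average exceeds its limit."""
    messages = []
    for cat, hours in sorted(daily_hours.items()):
        limit = DAILY_LIMIT_HOURS.get(cat)
        if limit is not None and hours > limit:
            label = cat.replace("_", " ")
            messages.append(f"You are spending {hours:.1f} hours daily on {label}")
    return messages
```

For example, 4500 minutes of a social media app logged over 15 days averages to 5 hours a day, which exceeds the illustrative limit and is reported.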

3.     Perform analytics at massive scale on the National Hospital Directory dataset and create high-quality visualizations from it. This dataset is provided by the Ministry of Health and Family Welfare. It contains the National Hospital Directory with geocodes and additional parameters like Location, Category, Systems of Medicine, Contact Details, Area Pin Code, Email address, Website link and Specializations. The following objectives have to be compulsorily achieved:

i)               The hospitals have to be marked in an interactive visualization on the map of India. Provide full resolution at city/town-level granularity and aggregate resolution based on specific categories (for example state-wise, region-wise, hospital-type-wise etc). Also provide Filtered-Resolution, Sample-Resolution and Headline-Resolution views. (50)

ii)              Perform analysis based on the name of the hospital to approximate the best category type for the hospital, as no such category is given in the dataset. For example, ‘Sri Prakash Eye Hospital’ will most probably be an ‘Eye Hospital’. Decide the categories yourself and use them in the visualization. (30)

iii)             Decide for yourself how you would use the column ‘Location_coordinates’ in a way that helps the end user, and use it in the analytics/visualization. (20)

(Max Two students in a group)
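For objective (ii) above, one possible starting point is simple keyword matching on the hospital name. The following Python sketch is illustrative only: the category names, the keyword lists and the function name guess_category are assumptions, and students are expected to decide their own categories.

```python
import re
from collections import Counter

# Hypothetical (category, keywords) rules, checked in order.
# These lists are assumptions for illustration, not part of the dataset.
CATEGORY_KEYWORDS = [
    ("Eye Hospital", ["eye", "ophthal"]),
    ("Dental Hospital", ["dental", "dentist"]),
    ("Maternity Hospital", ["maternity"]),
    ("Children's Hospital", ["child", "paediatric", "pediatric"]),
]


def guess_category(name):
    """Guess a category from the hospital name using word-prefix matching."""
    # Split the name into lowercase words so that 'eye' does not match 'Meyer'.
    tokens = re.findall(r"[a-z]+", name.lower())
    for category, keywords in CATEGORY_KEYWORDS:
        if any(tok.startswith(kw) for tok in tokens for kw in keywords):
            return category
    return "General / Unknown"


def category_counts(names):
    """Count hospitals per guessed category, e.g. for an aggregate view."""
    return Counter(guess_category(n) for n in names)
```

With these rules, guess_category("Sri Prakash Eye Hospital") gives "Eye Hospital". Names that match no keyword fall into a catch-all category, which is where most of the manual refinement would be needed.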

4.     Build a prototype application / web-page / mobile app which will perform the following tasks: You are required to predict the spread of seasonal diseases in a non-clinical way based on three parameters: (i) user input data, (ii) analytics done on news sites, social media etc., and (iii) current weather. For this purpose, take the city as Dehradun and two diseases: swine flu and dengue. Most of the time, an outbreak of a disease is reported only when there are clinically confirmed reports (from hospitals), but you are required to predict an outbreak even before that. One reason outbreaks are not reported on time is that the symptoms are almost the same as those of a regular flu or fever. Your system would take input from the user through a symptom checker at the onset of symptoms. It would also ask for additional important information such as recent travel. It then analyzes reports of similar outbreaks in nearby places, or in places the person might have visited, through news sites and social media posts. It also analyzes the current weather conditions of the city (temperature, humidity etc.) which facilitate the outbreak of these diseases. Based on these inputs, your model will try to predict the probability of an outbreak. Your prototype should compulsorily have modules for reading from news sites/social media and for reading weather-related data. (Max Two students in a group)
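The project statement does not prescribe a model. As one possible sketch, the three inputs could each be reduced to a score between 0 and 1 and combined with a logistic function. The weights, the bias and the function names below are made-up placeholders with no clinical or statistical basis; in a real prototype they would have to be learned or justified.

```python
import math

# Hypothetical weights and bias: NOT calibrated on any real data.
WEIGHTS = {"symptoms": 2.0, "news": 1.5, "weather": 1.0}
BIAS = -3.0


def symptom_score(reported, checklist):
    """Fraction of checklist symptoms that the user reported (0 to 1)."""
    if not checklist:
        raise ValueError("checklist must not be empty")
    reported_set = {s.lower() for s in reported}
    hits = sum(1 for s in checklist if s.lower() in reported_set)
    return hits / len(checklist)


def outbreak_probability(symptoms, news, weather):
    """Combine three scores in [0, 1] into a probability via a logistic function."""
    for value in (symptoms, news, weather):
        if not 0.0 <= value <= 1.0:
            raise ValueError("each score must lie in [0, 1]")
    z = (BIAS
         + WEIGHTS["symptoms"] * symptoms
         + WEIGHTS["news"] * news
         + WEIGHTS["weather"] * weather)
    return 1.0 / (1.0 + math.exp(-z))
```

The news and weather scores are assumed to come from the separate modules that read news sites/social media and weather data; how those modules compute their scores is left open here.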


4th Semester

1.     The aim of this project is to study the impact of General Elections on the stock market. Find out the years when General Elections (Lok Sabha Elections) were held. Your task is to study the stock market from four months prior to the elections to two months after the elections. Historical stock market data are available on many websites. You need not study a large number of company stocks; just take the SENSEX index or the NIFTY index. Write code (either R or Python) to calculate various statistical parameters and establish the correlation between elections and the market. The output format has to be decided by the student himself/herself.  (Single student project)
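A minimal Python sketch of the windowing step is given below, assuming the index data has already been loaded as a date-sorted list of (date, closing value) pairs. The function names and the choice of statistics are assumptions for illustration; which parameters to compute is up to the student.

```python
import calendar
from datetime import date
from statistics import mean, pstdev


def shift_months(d, months):
    """Move a date by whole calendar months, clamping the day to the month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def daily_returns(closes):
    """Simple day-over-day returns from a list of closing values."""
    return [(b - a) / a for a, b in zip(closes, closes[1:])]


def window_stats(prices, start, end):
    """Statistics for prices with start <= date < end (prices sorted by date)."""
    closes = [close for day, close in prices if start <= day < end]
    returns = daily_returns(closes)
    if not returns:
        raise ValueError("need at least two prices in the window")
    return {
        "mean_daily_return": mean(returns),
        "volatility": pstdev(returns),
        "total_return": closes[-1] / closes[0] - 1.0,
    }


def election_windows(prices, election_day):
    """Stats for the four months before and the two months after election_day."""
    pre = window_stats(prices, shift_months(election_day, -4), election_day)
    post = window_stats(prices, election_day, shift_months(election_day, 2))
    return pre, post
```

Comparing the pre and post dictionaries across several election years gives a first view of whether returns or volatility change around elections; the election dates themselves must be looked up, as the project statement requires.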

2.     The aim of this project is to perform analytics on the dataset: This dataset contains information on different cereals which are commonly eaten for breakfast in many countries. Your analytics should cover the following objectives:

i)               Regular analytics (for example calories vs. brands, distribution of fiber for types C & H etc)

ii)              Regular statistics (for example mean, mode, median etc of potassium)

iii)             Find a uniform method to compare different products in the presence of misleading information. For example, each product advertises its useful content (like calories, protein, fiber etc) per serving. But there is no standard meaning of a serving; each product uses its own definition of a serving, as is evident from the ‘cups’ and ‘weight’ variables.

iv)             Select one question of your own which you think is important from the customer's point of view and find the answer from your analytics. (For example: is there a big difference in sugar content across the different manufacturers?)

You should decide for yourself how to present your results. Coding language: either R or Python. (Max Two students in a group)
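As a sketch of objectives (ii) and (iii), the per-serving figures can be divided by the serving weight so that every product is compared per unit weight rather than per serving. The rows below are made-up values for illustration and are not taken from the dataset; the column names calories and potass are assumptions, while ‘weight’ and ‘cups’ are the variables named above.

```python
from statistics import mean, median, mode


def per_unit_weight(row, nutrient):
    """Nutrient amount per one unit of weight instead of per serving."""
    if row["weight"] <= 0:
        raise ValueError("weight must be positive")
    return row[nutrient] / row["weight"]


def summary(values):
    """Regular statistics for one numeric column."""
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}


# Made-up example rows (NOT real data from the dataset).
cereals = [
    {"name": "A", "calories": 100, "potass": 40, "weight": 1.0, "cups": 0.75},
    {"name": "B", "calories": 120, "potass": 90, "weight": 1.5, "cups": 1.0},
    {"name": "C", "calories": 110, "potass": 40, "weight": 1.0, "cups": 1.0},
]

# Per serving, B looks the most calorie-heavy; per unit weight it is the lightest.
ranked = sorted(cereals, key=lambda row: per_unit_weight(row, "calories"))
potassium_stats = summary([row["potass"] for row in cereals])
```

The same idea works with ‘cups’ in place of ‘weight’ to compare products per unit volume; which basis is fairer to the customer is part of the analysis.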

3.     The aim of this project is to cover all the steps of statistical analysis, starting from deciding the operational definition, deciding the variables, formulating survey questions, collecting data, arranging data and performing analytics to address the query. The query is: Does the quality of friendship with the hostel roommate(s) affect the overall experience in GEU? Coding Language: R or Python (Max Two students in a group. Important information: If you select this particular project, you need to decide the variables, prepare the survey questions and get them approved by 12th April before proceeding further)
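Once the survey data is collected, one possible analysis is sketched below in Python. It assumes that both variables are measured with several 1-to-5 rating questions that are averaged into a single score per respondent; this operational definition, the function names and the use of Pearson's correlation are assumptions for illustration, not requirements.

```python
import math
from statistics import mean


def scale_score(answers):
    """Average several 1-5 ratings into one score for a respondent."""
    if not answers or any(not 1 <= a <= 5 for a in answers):
        raise ValueError("answers must be non-empty ratings between 1 and 5")
    return mean(answers)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)
```

Here xs would hold each respondent's friendship score and ys the overall experience score. A coefficient near +1 or -1 indicates a strong linear association, but it does not by itself show that one variable affects the other.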