Kishor Kumar Sridhar


About Me

I am a graduate student in Information Systems with 3+ years of industry experience in data analytics, data science, business intelligence, client/stakeholder engagement, requirement elicitation, project management and database management.

Proven experience in statistical analysis, implementing ML algorithms, performing cohort analysis for user retention, handling Big Data using HDFS, and presenting compelling dashboards and data-driven stories to support evidence-based business decision-making.

Work Experience

The only source of knowledge is experience.

-Albert Einstein

Data Science Fellow - Data Science for Public Good Fellowship (May 2020 - Aug 2020)

• Led two Data Science projects to implement an end-to-end Data Science framework using statistical modeling and analyses
• Developed an R Shiny dashboard to improve public awareness of Iowa's resources related to mental and physical
health and childcare for evidence-based policy-making in substance abuse prevention & recovery programs
• Performed data collection by web-scraping, built data pipelines, assessed data quality, and spatially mapped county-level composite indicators of Social and Natural Assets related to upward economic mobility
• Communicated analytical findings to non-technical audiences using an R Shiny dashboard for assessing
community well-being in Iowa
• Performed data wrangling using the tidyverse (dplyr) packages in R and used the Google Maps API with R and QGIS
to geocode physical addresses to latitude and longitude coordinates
• Conducted Exploratory Data Analysis on the Behavioral Risk Factor Surveillance System (BRFSS) data
to uncover insights on the binge drinking problem in Iowa
• Performed time series forecasting to estimate alcohol sales in Iowa and spatially segmented the high-risk
alcohol-using population
• Created a prototype of an interactive dashboard that performs sentiment analysis (NLP) on transcripts
generated from hotline calls to support improved customer service and auto-generate reports



Data Science Intern - Iowa Department of Transportation (DOT) (June 2018 - May 2020)

• Implemented time series forecasting using the FB Prophet model to estimate customer wait times for
live display on the DOT website
• Predicted the deterioration of pavement conditions through statistical modeling on various KPIs,
using the Kolmogorov–Smirnov test for hypothesis significance testing
• Utilized Natural Language Processing (NLP) algorithms such as LDA for topic modeling, with a
coherence score of 0.45 on 10,000 public feedback comments, to identify factors governing highway maintenance
• Used Python and Unix scripts to read from and write to HDFS and analyzed high volumes of data using
Hadoop and Spark
• Analyzed data on 4 TB Hadoop clusters using AWS EMR and reduced data storage by 75%
using Parquet files
• Performed statistical analysis using MLlib in PySpark to identify correlations between crashes and weather conditions
• Improved application performance by 50% by converting RDD-based Apache Spark ETL processes to Spark DataFrames
• Developed efficient SQL scripts for data cleansing and transformation, and performed data modeling for
ad-hoc analysis and reporting

Data Analyst Intern - Iowa State University (May 2018 - Aug 2018)

• Conducted variance analyses within departments in Tableau to identify spending gaps and presented
the results to stakeholders as a data story
• Performed factor analysis and what-if analysis on departmental purchase trends, facilitating 20% savings
in vendor contract renewals
• Assisted the Procurement Services Department of Iowa State University in making better purchase decisions
through insightful Tableau dashboards

Data Analyst - Torus (Aug 2016 - Oct 2017)

• Automated insurance claims processing systems with integrated document management to reduce the
turn-around time by 30%
• Handled MongoDB database activities such as locking, transactions, indexing, sharding, and replication for advanced analytics
• Assisted in planning and implementing experimental designs for A/B testing to improve conversion rates
based on various KPIs
• Analyzed and manipulated data in SQL for client requests and presented findings using interactive
Power BI visualizations

Programmer Analyst - Infosys (Feb 2014 - July 2016)

• Performed extensive data governance activities using the Enablon EHS tool for a data migration project to
modernize legacy systems
• Improved database performance by reducing query execution time by 50% using efficient SQL queries
• Served as a liaison between the client and the technical team to deliver key business solutions
using Agile methodologies
• Leveraged Microsoft SSIS to transform data for ETL processes
• Coordinated with QA teams to execute UAT cycles

Education

Education is the manifestation of the perfection already in man.

-Swami Vivekananda

Master of Science

Information Systems
Minor in Statistics

Jan 2018 - Dec 2020

ISU

Iowa State University

Bachelor of Engineering

Electrical & Electronics

Sep 2009 - May 2013



Anna University, Chennai, India

Projects

To strive, to seek, to find, and not to yield.

- Alfred Lord Tennyson


Customer Segmentation using RFM analysis

The data set contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion gifts, and many of its customers are wholesalers. The aim of the project is to analyze the online retail dataset, perform customer segmentation using RFM (Recency, Frequency, Monetary value) analysis, and come up with business recommendations and marketing strategies for the various customer segments.
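
To give a rough idea of what the RFM step computes, here is a minimal sketch in plain Python. It is an illustration only, not the code in the linked repository: the function name `rfm_table`, the tuple layout, and the toy transactions are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, snapshot):
    """Return {customer: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, invoice_id, invoice_date, amount).
    snapshot: reference date from which recency is measured.
    """
    last_seen = {}
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for customer, invoice, day, amount in transactions:
        # Recency uses the most recent purchase date per customer.
        last_seen[customer] = max(last_seen.get(customer, day), day)
        # Frequency counts distinct invoices, not line items.
        invoices[customer].add(invoice)
        # Monetary value is the total spend.
        spend[customer] += amount
    return {
        c: ((snapshot - last_seen[c]).days, len(invoices[c]), round(spend[c], 2))
        for c in last_seen
    }


# Hypothetical toy transactions, not taken from the project's dataset.
sample = [
    ("C1", "INV1", date(2011, 12, 1), 120.0),
    ("C1", "INV2", date(2011, 12, 8), 80.0),
    ("C2", "INV3", date(2011, 6, 15), 40.0),
]
rfm = rfm_table(sample, snapshot=date(2011, 12, 10))
```

In this toy data, `C1` bought recently, often, and spent more, while `C2` has a long recency gap; segments are typically formed by ranking or binning these three values.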

Github



Sales Forecasting using FB Prophet and SARIMAX models

The data set contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion gifts, and many of its customers are wholesalers. The goal of the project is to perform sales forecasting using FB Prophet and SARIMAX time series models and evaluate them based on the Mean Absolute Percentage Error (MAPE).
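
For reference, MAPE averages the absolute forecast error as a percentage of each actual value. The sketch below shows the metric in plain Python; the helper name `mape`, the choice to skip zero actuals, and the numbers are assumptions for illustration and are not results from this project.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent.

    Periods whose actual value is zero are skipped, because the
    percentage error is undefined there (an assumption of this sketch).
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


# Hypothetical sales figures and two hypothetical model forecasts.
actual = [100.0, 200.0, 400.0]
model_a = [110.0, 190.0, 400.0]
model_b = [150.0, 200.0, 300.0]

score_a = mape(actual, model_a)
score_b = mape(actual, model_b)
```

The model with the lower score tracks the held-out sales more closely in relative terms.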

Github



Spatial Exploratory Data Analysis using Geopandas and Folium

The goal of this project is to perform a spatial mapping and analysis of the performance of robotic vacuum cleaners worldwide to provide business insights and design recommendations. The data comes from a series of Wi-Fi-connected robotic vacuum cleaners available for sale worldwide. These robots are capable of autonomously navigating a home to vacuum its floors. Upon mission completion, they send a summary report of the mission to cloud services, where it is processed and stored as a row in a database.

Github



Cohort analysis using Python

The goal of this project is to analyze the data to show the daily active users over the month, calculate the daily retention curve for the cohort of users who used the app for the first time on specific dates, and determine if there are any differences in usage based on where the users came from. I used NumPy, pandas, Matplotlib, and Seaborn in Python to perform the cohort analysis, visualize the insights, and provide business recommendations.
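
A daily retention curve can be sketched as follows. This is a simplified standard-library illustration rather than the pandas code used in the project; the function name `daily_retention`, the `(user_id, activity_date)` event layout, and the sample events are assumptions.

```python
from collections import defaultdict
from datetime import date


def daily_retention(events):
    """events: iterable of (user_id, activity_date).

    Returns {cohort_date: {days_since_first_use: fraction_of_cohort_active}},
    where a user's cohort is the date of their first recorded activity.
    """
    events = list(events)  # iterated twice below

    first_use = {}
    for user, day in events:
        first_use[user] = min(first_use.get(user, day), day)

    cohort_size = defaultdict(int)
    for day in first_use.values():
        cohort_size[day] += 1

    # cohort date -> day offset -> set of users active on that offset
    active = defaultdict(lambda: defaultdict(set))
    for user, day in events:
        cohort = first_use[user]
        active[cohort][(day - cohort).days].add(user)

    return {
        cohort: {
            offset: len(users) / cohort_size[cohort]
            for offset, users in sorted(by_offset.items())
        }
        for cohort, by_offset in active.items()
    }


# Hypothetical app-usage events for three users.
events = [
    ("u1", date(2020, 1, 1)), ("u1", date(2020, 1, 2)), ("u1", date(2020, 1, 3)),
    ("u2", date(2020, 1, 1)), ("u2", date(2020, 1, 3)),
    ("u3", date(2020, 1, 2)),
]
curves = daily_retention(events)
```

Each inner dictionary is one cohort's retention curve: offset 0 is always 1.0, and later offsets show the share of that cohort still active.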

Github



Pilot a ‘Systems of Care’ Data Infrastructure to Support State Prevention, Treatment and Safety Response Efforts

The goal of the project is to describe the services and resources that, when present in a community, provide the elements needed to support the successful recovery of individuals living and engaging within that community.

Dashboard



Develop a Community Capitals Data Infrastructure to Support Community Economic Mobility

The goal of this project is to utilize the Community Capitals Framework to identify, collect, and spatially map county-level indicators of human, financial, natural, and social assets related to upward economic mobility. Interactive dashboards are intended to be paired with CES administrative files, so leaders can visually identify opportunities to more effectively match resource/programming supports with community needs.

Dashboard



Identify communities in greatest need of excessive alcohol prevention efforts in Iowa

Goals of the project:
a) Provide a county-level demographic profile of alcohol use in Iowa.
b) Spatially analyze and map the alcohol-using population in Iowa to visually represent the prevalence, density, and distribution of the alcohol-using and at-risk population by county.
c) Develop a public-facing dashboard to help the Substance Abuse Bureau of the Iowa Department of Public Health monitor alcohol use.

Dashboard



Perform sentiment analysis on the Iowa State University Extension community helpline services

The use of hotlines has increased dramatically with the onset of COVID-19. The goal of this project is to create a prototype of an interactive dashboard that performs sentiment analysis for the Iowa State University’s Cooperative Extension Services Hotline. This tool includes visualizations of call topics and outcomes that the Cooperative Extension Services Hotline team can use to perform real-time analysis of call center patterns.

Dashboard



Estimate customer wait-times at 18 major driving license stations across Iowa

The goal of this project is to perform exploratory data analysis on customer wait times and build a predictive time-series forecasting model that estimates the wait time at each Driving License (DL) station across Iowa for each hour of the day, for live display on the official website of the Iowa Department of Transportation.

Dashboard

Dashboard

Dashboard



Bank Customer Churn Analysis

The goal of this project was to see which variables in the dataset played the biggest role in customers exiting the bank. I examined the data through various plots and implemented ML classification models such as Logistic Regression, Naive Bayes, Decision Trees, Random Forest, and XGBoost, validated using grid search cross-validation, to learn more about the bank's customers and what influences them to leave.

Github



Implement ML classification algorithms to predict the Majors of college students

Given historical student records, the goal of this project is to implement machine learning classification models to predict the majors of new students. I used SVM, Decision Tree, KNN, and Random Forest models to predict the student majors.

Github



Predict the readmission of diabetes patients

The goal of this project is to predict whether a patient will be readmitted within 30 days, readmitted after 30 days, or not readmitted at all. I performed data wrangling and data exploration, implemented classification models, and plotted ROC curves to compare the performance of the different models.

Github