• Led two Data Science projects to implement an end-to-end Data Science framework using statistical modeling and analyses
• Developed an R Shiny dashboard to improve public awareness of Iowa's resources related to mental and physical
health and childcare for evidence-based policy-making in substance abuse prevention & recovery programs
• Performed data collection by web-scraping, built data pipelines, assessed data quality, and spatially mapped county-level composite indicators of Social and Natural Assets related to upward economic mobility
• Communicated analytical findings to non-technical audiences using an R Shiny dashboard for assessing
community well-being in Iowa
• Performed data wrangling using the tidyverse (dplyr) in R and used the Google Maps API with R and QGIS
to geocode physical addresses to latitude and longitude coordinates
• Conducted Exploratory Data Analysis on the Behavioral Risk Factor Surveillance System (BRFSS) data
to uncover insights on the binge drinking problem in Iowa
• Performed time series forecasting to estimate alcohol sales in Iowa and spatially segmented the high-risk
alcohol-using population
• Created a prototype of an interactive dashboard that performs sentiment analysis (NLP) on transcripts
generated from hotlines to support improved customer service and auto-generate reports
• Implemented time series forecasting using the Facebook Prophet model to estimate customer wait times for
live display on the DOT website
• Modeled the deterioration of pavement conditions through statistical modeling and hypothesis
significance testing (Kolmogorov–Smirnov test) based on various KPIs
• Applied Natural Language Processing (NLP) algorithms such as LDA for topic modeling, with a
coherence score of 0.45 on 10,000 pieces of public feedback, to identify factors governing highway maintenance
• Used Python and Unix scripts to read from and write to HDFS and analyzed high volumes of data using
Hadoop and Spark
• Worked with 4 TB Hadoop clusters to analyze data using AWS EMR and reduced data storage by 75%
using Parquet files
• Performed statistical analysis using MLlib in PySpark to identify correlations between crashes and weather conditions
• Improved application performance by 50% in Apache Spark ETL processes by converting RDDs to Spark DataFrames
• Developed efficient SQL scripts for data cleansing and transformation, and performed data modeling for
ad-hoc analysis and reporting
• Conducted variance analyses within departments in Tableau to identify spending gaps and presented
the results to stakeholders as a storyline
• Performed factor analysis and what-if analysis on departmental purchase trends, facilitating 20% savings
in vendor contract renewals
• Assisted the Procurement Services Department of Iowa State University in making better purchase decisions
through insightful Tableau dashboards
Data Analyst
Torus (Aug 2016 - Oct 2017)
• Automated insurance claims processing systems with integrated document management to reduce the
turn-around time by 30%
• Handled MongoDB database activities such as locking, transactions, indexing, sharding, and replication for advanced analytics
• Assisted in planning and implementing experimental designs for A/B testing to improve conversion rates
based on various KPIs
• Analyzed and manipulated data in SQL for client requests and presented findings using interactive
Power BI visualizations
Programmer Analyst
Infosys (Feb 2014 - July 2016)
• Performed extensive data governance activities using the Enablon EHS tool for a data migration project to
modernize legacy systems
• Improved database performance by reducing query execution time by 50% using efficient SQL queries
• Served as a liaison between the client and technical team to deliver key business solutions
using Agile methodologies
• Leveraged Microsoft SSIS to transform data for ETL processes
• Coordinated with QA teams to execute UAT cycles