Data

The data comes from a series of wifi-connected robotic vacuum cleaners available for sale worldwide. These robots are capable of autonomously navigating a home to vacuum its floors. Upon mission completion, they send a summary report of the mission to cloud services, where it is processed and stored as a row in a database. However, any cleaning mission performed while the robot is not connected to wifi (either by the user's choice or because of a faulty connection) will not be saved in the database. In addition, there are occasional periods where cloud services malfunction and no missions are reported, resulting in discrete periods of data loss.

These robots are programmed with an automatic recharge and resume function, which means that when the robot detects its battery reaching a critically low level, it will navigate back to the charging dock, if available, and charge for up to 90 minutes before resuming the mission. In addition, if a robot becomes stuck on an obstacle in its environment or is manually paused by a button press, it will cease cleaning for up to 90 minutes before terminating the mission. If the user restarts the mission with a button press within 90 minutes of the pause, the robot will continue cleaning normally. The number of minutes spent cleaning, charging, or paused is reported for each mission, as is the mission outcome (a field describing whether the mission was cancelled, the robot got stuck, the battery died, or the robot completed the job successfully).

Feature Engineering

Task 1.

Exploratory Data Analysis

Insights

We notice that most of the missions performed by the robots completed successfully (195337 "ok" missions), followed by the missions cancelled by the users (148890 "cncl" missions). Comparatively few missions ended as "stuck" (the robot got stuck) or "bat" (the robot's battery got too low for it to return to the dock). Based on this chart, the overall performance of the robots appears to be good.

Insights

Based on the above chart, we notice that most of the missions happened in Asia and Europe. Interestingly, in Asia the count of successful missions ("ok") and the count of missions cancelled by the users ("cncl") are close to each other, whereas in Europe there is a wide gap between the two, suggesting that little human intervention was needed for missions to complete.

Since most of the missions were performed in Asia and Europe, let us look at how the countries in those two continents differ in terms of robot usage.

Asia and Europe

Insights

Asia:

In Asia, as we can see from the above charts and tables, the most robots have been sold in Israel (6065 robots), and hence Israel also has the most missions of any country. It is also worth noting that the number of missions cancelled by the user in Israel is higher than the number of successful missions. China comes second.

Europe:

In Europe, we notice that France has the most robots (487) and hence the most missions as well. One thing to note is that more than 1000 missions got stuck in France, which is high. Investigating further could reveal why and how the Roomba robots get stuck there. Other European countries with many missions include Austria, Germany, Belgium, and Russia.

Insights

We notice that the Roomba robots are used the most during afternoons and the least in the evenings.

Insights

We notice that the number of times the Roomba robots are used keeps steadily increasing through the year, from January to December, before dropping back again in January.

We could also factor in weather patterns that cause the Roomba robots to run more during certain periods than others. This could be because the initial months of the year (January through April) are usually colder than the later months (September through December). One possible reason for the difference in Roomba usage is that, during cold months, people usually keep doors and windows closed most of the time, so less dust accumulates in the rooms and there is less need to use the robots.

Insights

The count of missions is nearly the same on each day of the week, suggesting that the day of the week is not a significant factor in how often the robots are used.

Average run time, charge time and pause time of robots across continents

Insights

We notice that, in general, the average runtime of missions in which the battery got low ("bat") is high across all the continents, while the average runtime of "ok" and "cncl" missions is almost the same in all the continents.
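The per-continent averages discussed here amount to a grouped mean. A minimal sketch of that aggregation follows, using only the Python standard library. The sample rows and the field layout (continent, outcome, runtime in minutes) are made up for illustration and are not taken from the actual dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (continent, outcome, runtime_minutes).
# The values are illustrative only, not real mission data.
sample_rows = [
    ("Asia", "ok", 40), ("Asia", "ok", 50), ("Asia", "bat", 80),
    ("Europe", "ok", 45), ("Europe", "cncl", 47), ("Europe", "bat", 90),
]

def mean_runtime_by_group(rows):
    """Average runtime for each (continent, outcome) pair."""
    groups = defaultdict(list)
    for continent, outcome, runtime in rows:
        groups[(continent, outcome)].append(runtime)
    return {key: mean(values) for key, values in groups.items()}

avg_runtime = mean_runtime_by_group(sample_rows)
for (continent, outcome), value in sorted(avg_runtime.items()):
    print(continent, outcome, value)
```

The same grouping would apply to charge time and pause time by swapping the third field.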

Business Recommendation

The average charge time is much higher in Asia than on the other continents. The missions that got terminated because of battery issues also seem to be more frequent there, even with the higher average charge time. So, looking into increasing the battery capacity of the robots sent to Asia could be a possible solution.

Plotting the geographic data on the world map

Insights

We notice that the average pause time for the "ok" missions is similar across continents, except in Asia, where it is higher. This could also be because the Roomba robots are used for a longer time in those countries, which is a good sign overall.

Plotting the average runtime of robots across countries

Insights

We notice that the average runtime of robots is high in the USA, Canada, Europe, Russia, and the southern part of South America. This could mean that the robots are used for longer in these regions than in countries such as China and India, where the average runtime is low.

Business Recommendation

When a robot runs for longer periods of time, its parts naturally experience more wear and tear. Hence, based on the average runtime across countries, we could send out personalized offers to customers for replacement parts such as brushrolls, filters, side brushes, wheels, and sensors. This would help keep the robots running smoothly.

Plotting the average charge time of robots across countries

Insights

We notice that the average charge time of robots is similar in almost all the countries, except for some countries in Europe and some regions in Israel.

Plotting the average pause time of robots across countries

Insights

We notice that the average pause time of robots is moderate in the USA, Canada, Europe, Russia, and parts of South America. This could mean that the robots there are paused for a shorter period of time than in countries such as Kazakhstan and Belarus. We may need to investigate further whether these paused robots were resumed within 90 minutes to complete the mission, or whether the mission was terminated because the pause time exceeded 90 minutes.

Plotting the average run time of "ok" missions across countries

Insights

We notice that there are many countries where the runtime of missions with an "ok" outcome is high. Countries such as Finland and Singapore have higher runtimes than other countries.

Plotting the average run time of cancelled missions across countries

Business Recommendation

From the above chart, we notice that the average runtime of missions cancelled by the users is high (around 47 minutes) in most of the countries. This could indicate that there is a large potential for robots in these countries, since they are being used for a longer period of time. But since the missions are being cancelled by the users, following up with that particular segment of customers to find out the reasons for cancellation could be very effective in generating design recommendations for future robots. Some of the reasons could be high noise and the frequent need to replace bins.

Plotting the average run time of stuck missions across countries

Design recommendation

There are some countries, such as Argentina, Chile, Kazakhstan, Greece, Hungary, and Finland, that stand out for having a high runtime before the robots get stuck on an obstacle. In contrast, there are countries such as Namibia and Singapore where the robots get stuck quite quickly and are not recovered. So, including a sensor in the Roomba robots to identify which kinds of obstacles make a robot get stuck without being recovered, and comparing the robot navigation data between these two sets of regions, could help us navigate the problem (pun intended).

Design recommendation

Based on the above graph, we notice that there are many countries, such as Singapore, the Philippines, Guinea, and Great Britain, where the average runtime is much higher than in other countries. Hence, increasing the battery capacity of the Roomba robots sent to these particular countries could be a solution to this problem.

Plotting the total number of robots present across countries

Insights

The number of robots sold seems to be roughly the same in most countries.

Successful vs unsuccessful missions across countries

Assumption: Missions with the outcome "ok" are successful, and missions with any of the "stuck", "bat", or "cncl" outcomes are unsuccessful.
Percentage of successful missions per country = 100 * (total number of missions with "ok" outcome in that country) / (total number of missions in that country)
Percentage of unsuccessful missions per country = 100 * (total number of missions with "stuck", "bat", or "cncl" outcome in that country) / (total number of missions in that country)
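The two percentages defined above can be sketched in a few lines of standard-library Python. This is only an illustration under the stated assumption about which outcomes count as successful. The country codes and sample rows are hypothetical, and the outcome labels follow the ones used in this report.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (country_code, outcome). Illustrative only.
sample_missions = [
    ("IL", "ok"), ("IL", "cncl"), ("IL", "cncl"), ("IL", "stuck"),
    ("FR", "ok"), ("FR", "ok"), ("FR", "bat"), ("FR", "ok"),
]

# Assumption from the text: only "ok" counts as successful.
UNSUCCESSFUL = ("stuck", "bat", "cncl")

def outcome_percentages(rows):
    """Return {country: (pct_successful, pct_unsuccessful)}."""
    counts = defaultdict(Counter)
    for country, outcome in rows:
        counts[country][outcome] += 1
    result = {}
    for country, per_outcome in counts.items():
        total = sum(per_outcome.values())
        ok = per_outcome["ok"]
        bad = sum(per_outcome[o] for o in UNSUCCESSFUL)
        result[country] = (100.0 * ok / total, 100.0 * bad / total)
    return result

percentages = outcome_percentages(sample_missions)
print(percentages)
```

Note that in this sketch a mission with an unknown outcome would still count toward the country total, so the two percentages need not sum to 100 when such rows are present.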

Top 10 countries in which the mission outcome of the robots was "ok", "bat", "stuck", and "cncl" (cancelled manually)

Insight

The one country that tops all 4 charts above is Israel. But this could be because Israel has the most robots (6065 robots, as we calculated earlier).

Insights

The above chart shows the "nan" outcomes, where the mission outcome is not known. This could be a result of data loss during transmission to the cloud. Most "nan" outcomes occur in Asia. However, they are few in number (just 136 out of over 367679 records). Nevertheless, this is still a reason for concern and needs to be investigated further.

The daily mission counts for different outcomes

Insights

The above graph shows that the number of missions with "ok" and "cncl" outcomes has been increasing over time. This could simply be because the number of robots sold keeps growing over the years. Nevertheless, it is a good sign for the company that the robots remain consistently efficient as usage grows. The number of missions with "stuck" and "bat" outcomes has stayed almost constant or increased very little over time, which is also a good thing.

Task 2.

We are aware that data loss exists among the mission records, but are unsure of the cause. Quantify the extent of the loss, differentiating between discrete catastrophic events and random mission loss for individual robots.

Percentage of data loss for each country

Data loss definition:

We know that upon mission completion, the robots send a summary report of the mission to cloud services, where it is processed and stored as a row in a database. However, any cleaning mission performed while the robot is not connected to wifi (either by user's choice or a faulty connection) will not be saved in the database. In addition, there are occasional periods where cloud services malfunction and no missions are reported, resulting in discrete periods of data loss.

It is also given that the max mission number per robot should reflect its total number of missions to date reported to the database.

Assumption: We treat any mission numbers missing from the "nmssn" column as missions that were not reported to the cloud services, and therefore as data loss.

Based on this assumption, taking the missing missions as a percentage of the total missions in a country gives the percentage of data loss for that country.
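A minimal sketch of this calculation is shown below. It assumes that each robot's highest "nmssn" equals the number of missions it actually performed, and that every database row is one reported mission. The robot identifiers, country codes, and sample rows are hypothetical, and this is one possible reading of the definition rather than the exact procedure behind the map.

```python
from collections import defaultdict

# Hypothetical sample rows: (country_code, robot_id, nmssn). Illustrative only.
sample_rows = [
    ("IL", "r1", 1), ("IL", "r1", 2), ("IL", "r1", 5),  # r1: 3 of 5 reported
    ("IL", "r2", 3), ("IL", "r2", 4),                   # r2: 2 of 4 reported
    ("FR", "r3", 1), ("FR", "r3", 2),                   # r3: 2 of 2 reported
]

def data_loss_pct_by_country(rows):
    """Percentage of missions per country that never reached the database."""
    max_nmssn = {}               # (country, robot) -> highest mission number
    reported = defaultdict(int)  # (country, robot) -> number of rows
    for country, robot, nmssn in rows:
        key = (country, robot)
        max_nmssn[key] = max(max_nmssn.get(key, 0), nmssn)
        reported[key] += 1

    expected_total = defaultdict(int)
    reported_total = defaultdict(int)
    for key, highest in max_nmssn.items():
        country = key[0]
        expected_total[country] += highest   # assumed true mission count
        reported_total[country] += reported[key]

    return {
        country: 100.0 * (expected_total[country] - reported_total[country])
        / expected_total[country]
        for country in expected_total
    }

loss_pct = data_loss_pct_by_country(sample_rows)
print(loss_pct)
```

Because this version counts rows rather than distinct mission numbers, a robot with duplicated rows would report more missions than its highest "nmssn", which would push the percentage below zero.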

We notice that Vietnam has a data loss percentage below 0. This could be an anomaly, or some data point could have been misrecorded. We would need to know more about how the data is collected to investigate why there were more reported missions than the total number of missions in Vietnam. For now, we disregard this data point.

Insights

We notice that the percentage of data loss is quite similar across the countries, at around 40-50%. This is a cause for concern because almost half of the missions performed by the robots are not being recorded. Venezuela and Namibia are countries with relatively high data loss. Russia fares better in this regard and has the lowest data loss.

Design recommendations to reduce data loss:

Catastrophic loss ideation

Based on the above map, we can find the countries with the highest percentage of Roomba robot missions that were not recorded in the database. One way to think about a catastrophic data loss is as a big chunk of missions missing in a row. For each country, we could compute the ratio of consecutive missing mission numbers to the total missions performed by the individual robots, which gives the percentage of consecutive missing missions in that country. Using a threshold percentage of consecutive missing missions chosen by the business stakeholders, we could then flag a country as a candidate for further investigation into whether its loss comes from discrete catastrophic events or from random loss of individual missions.
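The idea above can be sketched for a single robot as follows. The sketch assumes mission numbering starts at 1 and treats a run of at least `min_run` consecutive missing mission numbers as a candidate catastrophic chunk. Both `min_run` and the sample mission numbers are hypothetical choices for illustration, not values derived from the data.

```python
def missing_runs(nmssns):
    """Lengths of each run of consecutive missing mission numbers."""
    runs = []
    previous = 0  # assumption: mission numbering starts at 1
    for number in sorted(set(nmssns)):
        gap = number - previous - 1
        if gap > 0:
            runs.append(gap)
        previous = number
    return runs

def consecutive_loss_ratio(nmssns, min_run=2):
    """Share of a robot's missions lost in runs of at least min_run."""
    total_missions = max(nmssns)  # assumed true mission count for the robot
    lost_in_runs = sum(r for r in missing_runs(nmssns) if r >= min_run)
    return lost_in_runs / total_missions

# Hypothetical robot that reported missions 1, 2, 5, 6, 8 and 15.
reported_numbers = [1, 2, 5, 6, 8, 15]
runs = missing_runs(reported_numbers)             # missing 3-4, 7, 9-14
ratio = consecutive_loss_ratio(reported_numbers)  # runs of length >= 2 only
print(runs, ratio)
```

Summing the numerator and denominator over all robots in a country would give the country-level ratio to compare against the stakeholder threshold. Isolated single gaps (runs shorter than `min_run`) are left out, on the assumption that they look more like random per-robot loss.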



I enjoyed working on the data challenge!

Thank you very much for the opportunity!

Kishor Kumar Sridhar

kishorkumarsridhar@gmail.com