Using A Netflix-Inspired Algorithm To Create A Recommendation Engine For Rock Climbers


About the Authors

Kristen Colley is a data scientist with a background in marketing and economics. One of her biggest professional achievements was creating a successful human behavior model out of seemingly unrelated numeric data. She now focuses on implementing analytics and machine learning solutions across a wide variety of enterprise problems. In her free time, she can be found climbing, reading non-fiction, or taking a new course on an upcoming technology.

Dipanjan (DJ) Sarkar is a data science lead and published author, and was recognized as a Google Developer Expert in Machine Learning by Google in 2019. Dipanjan has led advanced analytics initiatives working with Fortune 500 companies like Applied Materials and Intel. He works on leveraging data science, machine learning and deep learning to build large-scale intelligent systems. He was recognized as one of the top ten data scientists in India in 2020, and among the ‘40 under 40 Data Scientists’ in 2021.

As an avid rock climber, Kristen Colley counts reaching the summit of Volcán Cayambe, an 18,996-foot peak in Ecuador, as one of her biggest personal achievements. She and her partner, who is also a climber, have conquered every type of terrain: ice, alpine, glacier.

It’s not surprising, then, that when it was time for Colley to create her first capstone project while enrolled in the Data Science Career Track at Springboard, she immediately thought of ways she could use data science to help climbers plan their expeditions. “There’s no recommendation system where you can say ‘I’m this type of climber and I’m going to this area,’” she said. “You have to go to an app and look up the area, look through all this user-entered data to find out more about the route and what kind of equipment you’ll need. If it’s not on the app, you have to buy a local guide or just wing it.”

While the sport of rock climbing has grown exponentially (ranking 17th among the 111 most popular sports in the U.S. in 2015), mountaineers still go about route finding and navigation in a surprisingly quaint fashion: consulting guidebooks and local websites, studying maps, and reading reviews from other climbers. There are websites where climbers can leave reviews of routes they’ve taken—think Yelp for the climbing community—but there is no simple way to receive recommendations tailored to the type of climber you are, gear requirements, grade (climbing difficulty), or other factors.

Colley decided to create a recommendation system for climbers using the Singular Value Decomposition (SVD) algorithm popularized by Simon Funk, a software developer who became renowned for his work on the 2006 Netflix Prize, a competition offering $1 million to the first individual or team to boost the accuracy of the company’s nascent movie recommendation system by 10%. A recommendation system would allow climbers to find routes better suited to their climbing experience, discover new routes, and find inspiration for international rock climbing trips.

Recommendation systems help us make decisions every single day 

Recommendation systems have become a mainstay of modern digital life. They guide us towards certain products while we’re shopping online, furnish our newsfeeds on Facebook, Twitter and Instagram with posts we’re statistically predetermined to engage with, and serve us with personalized online ads. 

The core component of these systems is the recommender function, which synthesizes information about the user—stated preferences, usage habits, or third-party data—and predicts what rating the user might assign to a product. Some of these inferences are based on stated preferences, others are based on generalizations made about the demographic group we belong to. For example, if a Netflix viewer showed an interest in horror films, they might rate thrillers more favorably than romantic comedies. Or if a Facebook user falls into the 18-24 age range and has been searching for university degree programs, they might show interest in ads for student housing. 

Recommendation engines are different from search engines, which rely on search filters and semantics such as item descriptions, tags, category structure, and synonym recognition. Most existing rock climbing websites offer search engines to help climbers find routes—for example, Mountain Project allows users to search for routes by location, star rating, and terrain—but the results are not personalized. Additionally, climbers have to synthesize information from a range of sources to plan their route, taking into account variables such as humidity fluctuations, weather forecasts, areas that are in the shade versus exposed to the sun, and nearby options for food and supplies. Improper trip planning can result in wasted time, or worse, a serious safety hazard. 


Creating a recommendation system for the rock climbing industry

Colley used Kaggle data scraped from the website 8a.nu, one of the world’s largest databases of rock climbing routes focused on international climbing destinations, as a data source for her recommendation system. The database contains over four million entries of climbs and ratings entered by online users. From this data, Colley pulled a series of three SQLite tables for her analysis:

  1. Ascent table — 4 million rock climbing entries, one entry per route climbed

  2. User table — 67,000 profiles of the climbers who logged the 4 million climbs

  3. Grade table — A reference table that translates rock climbing grades to French or American difficulty ratings
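If the Kaggle dump ships as a single SQLite database, pulling these tables into pandas takes only a few lines. A minimal sketch, assuming a hypothetical file name:

```python
import sqlite3

import pandas as pd

# Hypothetical file name for the Kaggle 8a.nu dump.
conn = sqlite3.connect("8anu.sqlite")

ascents = pd.read_sql_query("SELECT * FROM ascent", conn)  # ~4M logged climbs
users = pd.read_sql_query("SELECT * FROM user", conn)      # ~67K climber profiles
grades = pd.read_sql_query("SELECT * FROM grade", conn)    # grade conversion table
```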

Unfortunately, the data was exceptionally dirty because the submission forms for entering reviews had been poorly designed, consisting only of free-text fields without any formatting rules. There was also no spellcheck or validation once the user entered their data, resulting in entries containing discrepancies, misspellings, invalid characters, and other miscellaneous errors. “You literally typed in the country and route name, there were no dropdowns,” Colley explained. “You typed in ‘M’ for male, ‘F’ for female. So the data was very, very dirty.”

The problems Colley described are common when working with user-entered data, but data collectors can obtain cleaner data by designing forms that guide the user through the submission process. “If the data was automatically generated or there were predefined buckets or dropdowns or selected lists, the rate of error or bad data would be less,” said Dipanjan Sarkar, a data scientist and Colley’s mentor during her Springboard course.

Cleaning noisy, user-entered data to make it machine-readable

The average data scientist spends 45% of their time cleaning data, so it came as no surprise to Colley that data wrangling would be difficult. However, several mentors she spoke with warned her that the data was too dirty and complicated for her very first capstone project. 

“Dipanjan was the only one that encouraged me. He said, ‘you can do this,’” Colley said of her mentor. “Now looking back, I would still have chosen to tackle it, but being my first end-to-end data science project, I would have dropped it if it hadn’t been for him.”

Colley started cleaning the data by eliminating extraneous information. Essentially, she would only need three columns of data to feed the machine learning algorithm: ‘user_id,’ ‘item_id’ and ‘item_rating.’ The next step was data wrangling, which she carried out in three steps, starting with assigning standard naming conventions to the dataset to make it machine-readable.
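In pandas terms, that selection is a one-liner. A sketch, continuing the hypothetical tables above (the column names are illustrative):

```python
# The recommender only needs to know who rated which route, and how highly.
ratings = ascents[["user_id", "item_id", "item_rating"]].dropna()
```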

1. Normalizing route names to ASCII (American Standard Code for Information Interchange) standards

Users would spell one route name in a variety of different ways—for example, “red rocks canyon” could be entered as “Red rock,” “red rocks,” or “red canyon.” Colley created a filter to eliminate misspelled route names. The logic was that if enough users entered the same route name, chances were good that it was the correct spelling. She tested a number of filters and found that the most accurate results came from the dataset that excluded any route listed fewer than six times.
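A minimal sketch of that frequency filter, using the same hypothetical column names:

```python
# Keep only route names that appear at least six times; rarer spellings
# are far more likely to be typos than real routes.
name_counts = ascents["route_name"].value_counts()
common_names = name_counts[name_counts >= 6].index
ascents = ascents[ascents["route_name"].isin(common_names)]
```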

2. Normalizing the data

To eliminate invalid characters, Colley used regular expressions (also called regexes), sequences of characters that specify search patterns, to filter out accents and other special characters. She also filtered out invalid phrases such as “I don’t know,” “noname,” and “none.”
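A minimal sketch of this kind of normalization, assuming the route names live in a pandas column (the exact rules shown are illustrative):

```python
import re
import unicodedata

def normalize_name(name: str) -> str:
    # Lower-case and trim surrounding whitespace.
    name = name.lower().strip()
    # Decompose accented characters and drop the accent marks (ASCII-fold).
    name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Replace '&' with 'and', remove special characters, collapse repeated spaces.
    name = name.replace("&", "and")
    name = re.sub(r"[,\.\-!']", "", name)
    name = re.sub(r"\s+", " ", name)
    return name

ascents = ascents.dropna(subset=["route_name"])
ascents["route_name"] = ascents["route_name"].map(normalize_name)

# Drop placeholder entries and names too short to be real.
invalid = {"i dont know", "noname", "none", "na"}
mask = ~ascents["route_name"].isin(invalid) & (ascents["route_name"].str.len() >= 3)
ascents = ascents[mask]
```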

The regexes in the code snippet above normalize the data and remove invalid characters. For example, the code lower-cases all the columns and replaces ‘&’ with ‘and.’ It also removes the following sources of noise:

  • Spaces before, after, or in between names

  • Special characters (, . - ! ')

  • Accent marks in foreign-language names

  • Names fewer than three characters long, to catch fake entries like "x", "8", or "na"

3. Grouping items by mode

After normalizing and cleaning all the columns, Colley created a three-tier ‘groupby’ system to sort items by mode. For example, a route listed twelve times was associated with Greece, but one user had incorrectly labeled the country as ‘USA.’ By grouping three other indicator columns and computing the mode, she could catch user errors like this and improve the accuracy of the dataset.
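A sketch of that idea, again with hypothetical column names: compute the mode of each indicator column per route and use it to overwrite stray entries.

```python
# For each route, take the most common value of each indicator column;
# a single mislabeled 'USA' among twelve 'Greece' entries gets outvoted.
def column_mode(series):
    return series.mode().iloc[0]

indicator_cols = ["country", "crag", "sector"]  # hypothetical indicator columns
corrections = ascents.groupby("route_name")[indicator_cols].agg(column_mode)
ascents = ascents.drop(columns=indicator_cols).join(corrections, on="route_name")
```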

Colley then performed an exploratory data analysis to check that the data would be sufficient for her recommender system. She checked for things like the overall distribution of ratings, distribution of ratings given to each route, and overall distribution of ratings by user. An even distribution of ratings would give rise to a more accurate recommendation system.
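Those checks amount to a few one-liners on the cleaned ratings dataframe. A sketch, continuing the hypothetical names above:

```python
# Overall distribution of star ratings.
print(ratings["item_rating"].value_counts(normalize=True))

# How many ratings each route receives, and how many each user gives.
print(ratings.groupby("item_id")["item_rating"].count().describe())
print(ratings.groupby("user_id")["item_rating"].count().describe())
```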

Colley found that the majority of users provided only one rating and that there were more positive ratings (three on a scale of 1-3) than negative. When it comes to customer reviews in general, people are more likely to leave a review if they had either an exceptionally positive or exceptionally negative experience. Unfortunately, this can lead to imbalances in recommender systems that are based on user reviews. 

  • Highest rating (3 stars): 49% of reviews

  • Intermediate rating (2 stars): 36.5% of reviews

  • Lowest rating (1 star): 14% of reviews

Did You Know?

Outside of creating her recommendation system, Colley wanted to use the data to test a couple of well-known stereotypes in the climbing community. 

1. Do women tend to climb routes of the same difficulty as men?

Results: Women climb on average two grades lower than men. A hypothesis test showed this difference is unlikely to be due to chance.

2. Do taller people climb harder grades than shorter people?

There is a rock climbing stereotype that if you are taller, you are naturally better at climbing. Colley tested this hypothesis using height data. To define the boundaries of “short,” “average” and “tall,” Colley used the interquartile range of the population in the dataset, separating them by gender.
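A sketch of how those boundaries could be derived, assuming the user table carries hypothetical ‘sex’ and ‘height’ columns:

```python
# 25th and 75th percentiles of height, computed separately per gender;
# climbers below Q1 count as 'short', above Q3 as 'tall', the rest 'average'.
quartiles = users.groupby("sex")["height"].quantile([0.25, 0.75]).unstack()
print(quartiles)
```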

Female (in centimeters):

  • short: < 158

  • average: 158-170

  • tall: > 170

Male (in centimeters):

  • short: < 174

  • average: 174-183

  • tall: > 183

Results: The dataset shows that “short” climbers climb on average two grades higher than “tall” climbers and one grade higher than “average”-height climbers. At least in this dataset, then, shorter climbers are the better performers.

The next step for Colley was to decide what type of recommender system would be most suitable for her dataset. Recommendation systems work by establishing relationships between users and products, similar items, or similar users, which requires an established dataset of user preferences and/or user ratings. There are three types of recommender systems in use today:

1. Content-based filter—Recommends items whose features are similar to those of items the user previously ‘liked.’ The model is less accurate for new users without any history and grows more accurate over time.

2. Collaborative filter—Recommends products based on ratings from similar users. This type of filter needs a large base of explicit user ratings, so it doesn’t work well for a new customer base.

3. Hybrid filter—Leverages both content-based and collaborative filtering. Typically, new users are given content-based recommendations and are switched to collaborative filtering after a few interactions.

Colley selected a user-based collaborative filtering system for her recommender. This type of filter made the most sense because of the amount of user ratings data she had: of the four million entries, half included a star rating for the climb. A hybrid approach would have been preferable, said Colley, but because the data submission forms had been poorly designed, the dataset did not have very detailed “item features,” making it difficult to provide accurate content-based recommendations.

Building the recommendation system

To build her recommendation system, Colley used Surprise, a Python scikit (machine learning library) designed for building and testing recommendation systems. She tested her three filtered datasets on the 11 different algorithms provided in the scikit and found that the Singular Value Decomposition++ (SVD++) algorithm performed the best. SVD is a data reduction tool for high-dimensional data, such as high-resolution images and video, or large datasets like the one Colley was using. It helps reduce data to the key features necessary for analyzing, describing, and understanding it. This family of algorithms underpins Google’s PageRank, Facebook’s newsfeed, and the recommender systems used by Amazon and Netflix.
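A minimal sketch of that evaluation step with the Surprise library, assuming the cleaned ratings dataframe from earlier (column names remain hypothetical):

```python
from surprise import Dataset, Reader, SVDpp
from surprise.model_selection import cross_validate

# Star ratings in this dataset run from 1 to 3.
reader = Reader(rating_scale=(1, 3))
data = Dataset.load_from_df(ratings[["user_id", "item_id", "item_rating"]], reader)

# Evaluate SVD++ with 5-fold cross-validation, reporting RMSE per fold.
cross_validate(SVDpp(), data, measures=["RMSE"], cv=5, verbose=True)
```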

Colley selected the SVD++ algorithm because it returned the most accurate predictions as measured by Root Mean Squared Error (RMSE). RMSE, one of the most common accuracy metrics for continuous variables, represents the standard deviation of the residuals (prediction errors). Residuals measure how far data points sit from the regression line; in other words, RMSE tells you how concentrated the data is around the line of best fit—in this case, how closely the model’s predicted ratings lined up with the actual ratings. A low RMSE (between 0.6 and 1.0 here) indicates that the residuals are close to the line of best fit and the model is making mostly accurate predictions. The SVD++ algorithm returned an RMSE of 0.66, falling within that acceptable range.
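Concretely, for $N$ predictions, where $\hat{r}_i$ is the model’s predicted rating and $r_i$ the rating the user actually gave:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{r}_i - r_i\right)^2}$$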

The cold-start dilemma: how to give recommendations to new users

One problem with collaborative filtering is deciding what to recommend to new users with little or no historical data. Known as the cold-start problem, it is commonly encountered when building any type of recommendation system. Only after a user had left a certain number of reviews would Colley’s model have enough data to predict their future preferences. Setting a cold-start threshold means that any user under that threshold is given a generic, non-personalized top-ten recommendation until they rate enough routes.

Implementing a threshold involves a trade-off: a higher threshold would theoretically result in more accurate predictions. However, given that the majority of users had only left one rating, setting a higher threshold would eliminate a significant portion of users from the dataset. 

Raising the user threshold to five would improve the RMSE (a proxy for model accuracy) by a mere 0.005 while eliminating 40% of the data; raising it to 13 would improve the RMSE by 0.0075 while losing 60% of the data. Colley decided that the RMSE improvement was too small to justify giving up 40-60% of the data on which to train the model. She chose to keep some of the outliers to help the model train, and left the cold-start threshold at five.
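Applying such a threshold is a short filter on the ratings dataframe. A sketch, continuing the hypothetical names above:

```python
# Keep only users who have logged at least five ratings; everyone else
# falls back to a generic top-ten list until they rate enough routes.
rating_counts = ratings["user_id"].value_counts()
active_users = rating_counts[rating_counts >= 5].index
ratings = ratings[ratings["user_id"].isin(active_users)]
```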

Colley used a dataframe to compute the accuracy of the model’s predictions for each star rating by comparing the model’s predicted ratings with the actual user ratings. As expected, the comparison showed a discrepancy between the lowest rating (one star) and the highest rating (three stars): the model mischaracterized a higher proportion of one-star ratings than two- or three-star ratings, likely because the initial dataset contained a disproportionately low number of one-star ratings, as mentioned before. However, the one-star ratings were not critical to Colley’s analysis, as the recommendation system would not display any routes with one-star ratings.
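A sketch of that per-star accuracy check, using a Surprise train/test split and the earlier assumptions:

```python
import pandas as pd
from surprise import SVDpp
from surprise.model_selection import train_test_split

trainset, testset = train_test_split(data, test_size=0.25)
model = SVDpp()
model.fit(trainset)

# Collect (actual, predicted) pairs into a dataframe and round predictions
# to the nearest star so they can be compared with the true 1-3 ratings.
preds = model.test(testset)
results = pd.DataFrame(
    [(p.r_ui, p.est) for p in preds], columns=["actual", "predicted"]
)
results["rounded"] = results["predicted"].round().clip(1, 3)
print((results["rounded"] == results["actual"]).groupby(results["actual"]).mean())
```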

The final outcome

The final outcome of the project was a predictions notebook where a user could enter their user_id and receive a list of the top ten routes recommended for them, based on the ratings of climbers with a similar rating distribution, climb type, and number of ratings. These recommended routes would serve as a jumping-off point for climbers planning their next trip by helping to narrow their search.
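A sketch of how such a lookup could work in Surprise, following the library’s standard top-N pattern (the user_id shown is hypothetical):

```python
from surprise import SVDpp

# Train on every rating, then score all routes the user hasn't climbed
# and return the ten with the highest predicted rating.
trainset = data.build_full_trainset()
model = SVDpp()
model.fit(trainset)

def top_ten_routes(user_id, n=10):
    inner_uid = trainset.to_inner_uid(user_id)
    climbed = {inner_iid for (inner_iid, _) in trainset.ur[inner_uid]}
    predictions = [
        model.predict(user_id, trainset.to_raw_iid(inner_iid))
        for inner_iid in trainset.all_items()
        if inner_iid not in climbed
    ]
    predictions.sort(key=lambda p: p.est, reverse=True)
    return [(p.iid, round(p.est, 2)) for p in predictions[:n]]

print(top_ten_routes(12345))  # hypothetical user_id
```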

This type of recommendation system could inspire climbers to discover new routes outside of their home country and undertake international rock climbing trips, thereby benefiting the tourism industry. Colley envisions that her recommendation system could inspire beginner climbers who don’t know where to start. “Right now, when you drive to a new area you’re just looking at lists,” she said. “To have a recommendation system tell you the top 10 routes from 1,000 routes in the area that it thinks you might like would be incredibly helpful and save a lot of time.”

Financial institutions use recommendation systems to determine whether to approve an applicant for a loan and insurance companies use them to assess risk. However, recommendation systems also help people make decisions in their everyday lives. With sufficient data on user ratings and/or user preferences, recommendation systems can be made for any purpose.

“I wish there was a recommendation system to tell me the right data science articles to read,” said Sarkar. “It’s a mess out there with so much noise where everyone is publishing something, even if they’re not that knowledgeable.”

Future improvements

With nearly two years of experience as a data scientist at DISH Network, a satellite TV company, under her belt since graduating from Springboard, Colley sees a lot of ways she could now improve her model. One is building a filter system where a climber can filter by type of climb, difficulty, and country before receiving a list of recommendations, leading to more tailored results. If given the chance to redo the project with her new skills, Colley said she would use natural language processing to clean the route names rather than relying on character filters and frequency thresholds.

“I just wasn’t ready to do a model within a model for my first project even though my model needed that,” she said. “Now that I have more experience, I know how to do those things.”

However, she is glad that she persevered in working with a dirty, complicated dataset for her first capstone project because it has given her the confidence to tackle other, more complex datasets and problems ever since.

“I’ve never worked a dataset that dirty since, and it’s given me confidence throughout my career since then that I can tackle whatever is thrown at me,” said Colley. “So having my mentor encourage me to persevere with this project and tackle such a hard dataset was a pretty pivotal moment in my career.”