Scraping NBA Odds
Web scraping NBA Asian Handicaps and Over/Unders from oddsportal.com to evaluate the accuracy of the NBA betting market.
In the final year of my Bachelor's degree in Statistical Sciences for Economics and Business, I had the opportunity to assist a university friend with his thesis.
A passionate sports fan, he decided to combine that interest with statistics, an ideal match, particularly in the context of betting markets.
His goal was to evaluate the accuracy of betting odds by comparing pre-match Asian Handicaps and Over/Unders with the actual match outcomes.
Asian Handicap
A form of betting in which one team is given a point advantage or disadvantage to level the playing field. A bet on the stronger team succeeds only if it wins by more than the handicap margin.
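To make the settlement rule concrete, here is a minimal sketch of how a single AH bet could be graded. The function name `settle_ah` is hypothetical, and it assumes whole- or half-point lines only; quarter lines (e.g. -3.25), which split the stake across two neighbouring lines, are deliberately not handled.

```python
def settle_ah(team_points: int, opponent_points: int, handicap: float) -> str:
    """Grade an Asian Handicap bet on `team` (hypothetical helper).

    The handicap is added to the team's score, so a handicap of -3.5
    means the team must win by more than 3.5 points. Half-point lines
    can never push; whole-point lines can. Quarter lines are not handled.
    """
    adjusted_margin = team_points + handicap - opponent_points
    if adjusted_margin > 0:
        return "win"
    if adjusted_margin < 0:
        return "loss"
    return "push"  # stake returned on an exact whole-point tie
```

For example, a bet on a team at -3.5 wins if it prevails 110-106 but loses if it prevails 108-105.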
For each match, every available Asian Handicap (AH) or Over/Under (O/U) line is associated with two odds: one for the outcome falling below the line and one for it exceeding it. The AH or O/U line at which the two odds are equal, or nearly equal, represents the market's most balanced prediction.
Over/Under
A bet on the combined score of both teams. Bettors wager on whether the actual total will be over or under a specified number.
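The balanced-line rule described above amounts to picking the line whose two odds are closest. The sketch below assumes the odds for one match have already been scraped into a dictionary mapping each line to its pair of decimal odds; the function name and input layout are hypothetical, not taken from the original scraper.

```python
from typing import Dict, Tuple


def most_balanced_line(
    odds_by_line: Dict[float, Tuple[float, float]],
) -> Tuple[float, float]:
    """Return (line, diff) for the line whose two decimal odds are closest."""
    if not odds_by_line:
        raise ValueError("no lines scraped for this match")
    # Choose the line minimising the absolute gap between its two odds;
    # on ties, min() keeps the first line in insertion order.
    line = min(
        odds_by_line,
        key=lambda k: abs(odds_by_line[k][0] - odds_by_line[k][1]),
    )
    first, second = odds_by_line[line]
    # The gap itself is kept as a measure of how balanced the line really is.
    return line, abs(first - second)
```

With hypothetical AH odds `{-2.5: (1.75, 2.10), -3.5: (1.90, 1.90), -4.5: (2.05, 1.78)}`, the function returns `(-3.5, 0.0)`. The same function works unchanged for O/U lines.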
Example
In Game 1 of the 2022 NBA Finals between the Golden State Warriors and the Boston Celtics, the AH odds were equal at -3.5 (1.90 each), implying that the market's best estimate was a 3.5-point Warriors win. Similarly, the O/U odds balanced at 214 points, indicating that the market predicted a total combined score of 214.
The Objective
The aim was to compare the balanced AH and O/U values with the actual point differential and total score of the game to assess how accurate the betting market was.
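One way to express this comparison is as a signed error per game, which can then be averaged across matches. The sketch below assumes `ah` is the balanced handicap quoted for the team whose points are passed first (so -3.5 means a predicted 3.5-point win for that team). Mean absolute error is shown as one common summary; it is an illustrative choice, not necessarily the metric used in the thesis.

```python
from typing import Sequence, Tuple


def market_errors(
    team_points: int, opponent_points: int, ah: float, ou: float
) -> Tuple[float, float]:
    """Return (margin_error, total_error) for one match.

    Positive errors mean the actual margin or total exceeded the
    market's estimate; negative errors mean it fell short.
    """
    predicted_margin = -ah  # an AH of -3.5 implies a 3.5-point win
    actual_margin = team_points - opponent_points
    margin_error = actual_margin - predicted_margin
    total_error = (team_points + opponent_points) - ou
    return margin_error, total_error


def mean_absolute_error(errors: Sequence[float]) -> float:
    """Average size of the errors, ignoring their sign."""
    if not errors:
        raise ValueError("no errors to average")
    return sum(abs(e) for e in errors) / len(errors)
```

For a hypothetical 110-104 win with the Example lines (AH -3.5, O/U 214), `market_errors(110, 104, -3.5, 214)` gives a margin error of 2.5 points and a total error of 0.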
The Scraper
Since no suitable dataset existed, I offered to help by building a web scraper. I had just completed the Scientific Computing with Python and Data Analysis with Python certifications from freeCodeCamp and was eager to apply my skills to a real-world project.
The scraper, built with Selenium and BeautifulSoup, performs the following steps:
- Access oddsportal.com and set the desired time zone.
- Loop through NBA seasons and their corresponding pages.
- For each match:
  - Scrape the teams, final score, date, competition category (Pre-Season, Regular Season, Play-In, Play-Off), and link to the odds page.
- For each match's odds page:
  - Scrape the Asian Handicap odds and identify the AH with the closest pair of odds.
  - Scrape the Over/Under odds and identify the O/U with the closest pair of odds.
  - Scrape quarter-by-quarter scores.
- Run a correction function to recover any missing or failed data.
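The final correction step can be pictured as a retry pass over the matches that failed the first time. This is a minimal sketch, not the original function: it assumes results are stored in a dictionary keyed by odds-page link, with `None` marking a failure, and `scrape_match` stands for a hypothetical per-page scraping function (in practice the Selenium and BeautifulSoup code).

```python
import time
from typing import Callable, Dict, Optional


def correct_missing(
    records: Dict[str, Optional[dict]],
    scrape_match: Callable[[str], Optional[dict]],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> Dict[str, Optional[dict]]:
    """Re-scrape every link whose record is missing (None).

    `scrape_match` is a hypothetical function that visits one odds page
    and returns a dict of fields, or None on failure. Links that still
    fail after `max_attempts` stay None so they can be inspected by hand.
    """
    fixed = dict(records)
    for link, record in records.items():
        if record is not None:
            continue  # already scraped successfully
        for _ in range(max_attempts):
            try:
                result = scrape_match(link)
            except Exception:  # e.g. a page timeout; treated as a failure
                result = None
            if result is not None:
                fixed[link] = result
                break
            time.sleep(delay_seconds)  # back off before the next attempt
    return fixed
```

Keeping permanent failures as `None`, instead of dropping them, makes it easy to count how many matches could not be recovered.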
The Final Dataset
The resulting dataset includes matches from the 2010/2011 to the 2021/2022 NBA seasons. It contains 15,868 observations across 12 variables:
- Names: Teams involved in the match
- Link: URL to the odds page
- Score: Final score of the match
- Score.q: Scores at the end of each quarter
- OverTime: Boolean indicating if the game went to overtime
- Score.OT: Score at the beginning of overtime (i.e., at the end of regulation)
- Date: Date of the match
- Category: Pre-Season, Regular Season, Play-In, or Play-Off
- AH: Asian Handicap with the most balanced odds
- diff.AH: Difference between the two odds at the selected AH
- O/U: Over/Under with the most balanced odds
- diff.OU: Difference between the two odds at the selected O/U
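Several of these variables can be derived from the quarter-by-quarter scores alone. The sketch below assumes the scraped periods arrive as a list of per-period point pairs (four quarters, then one entry per overtime period) and that Score.q stores cumulative scores at the end of each quarter; both the input layout and the function name are assumptions for illustration.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def quarter_fields(periods: List[Tuple[int, int]]) -> Dict[str, object]:
    """Derive Score.q, OverTime and Score.OT from per-period points.

    `periods` holds (team1_points, team2_points) for each period in order.
    """
    if len(periods) < 4:
        raise ValueError("expected at least four quarters")
    # Running totals: the score at the end of each period.
    running = list(
        accumulate(periods, lambda a, b: (a[0] + b[0], a[1] + b[1]))
    )
    overtime = len(periods) > 4
    return {
        "Score.q": running[:4],
        "OverTime": overtime,
        # Score after the fourth quarter, i.e. at the start of overtime.
        "Score.OT": running[3] if overtime else None,
    }
```

For hypothetical periods `[(25, 20), (30, 28), (22, 27), (23, 25), (10, 8)]`, the game is tied 100-100 after regulation, so OverTime is true and Score.OT is `(100, 100)`.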