Create a CSV File from a Triathlon Results Website
I completed the Outlaw Ironman in 2016 and wanted to extract the results to a CSV file before doing further analysis. I used requests to fetch each page and BeautifulSoup to parse it, iterating through all 22 pages of results, starting from: https://resultsbase.net/event/3513/results?page=1
Import Libraries
In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
Main Code
In [2]:
base_url = "https://resultsbase.net/event/3513/results?page={}"
all_data_rows = []

# The results are spread over 22 pages, numbered from 1
for page_num in range(1, 23):
    url = base_url.format(page_num)
    print(f"Scraping data from page {page_num}...")
    response = requests.get(url)
    response.raise_for_status()  # stop on HTTP errors (4xx/5xx)
    soup = BeautifulSoup(response.content, 'html.parser')
    tables = soup.find_all('table', {'class': 'table table-striped smaller-on-mobiletable-striped-noborder'})
    if tables:
        data_rows = []
        for table in tables:
            rows = table.find_all('tr')
            # Skip the first row of each table (the header row)
            for row in rows[1:]:
                columns = row.find_all('td')
                # Keep only rows with all 11 result columns
                if len(columns) == 11:
                    row_data = [column.get_text(strip=True) for column in columns]
                    data_rows.append(row_data)
        all_data_rows.extend(data_rows)
    else:
        print(f"No results table found on page {page_num}")
Scraping data from page 1...
Scraping data from page 2...
Scraping data from page 3...
Scraping data from page 4...
Scraping data from page 5...
Scraping data from page 6...
Scraping data from page 7...
Scraping data from page 8...
Scraping data from page 9...
Scraping data from page 10...
Scraping data from page 11...
Scraping data from page 12...
Scraping data from page 13...
Scraping data from page 14...
Scraping data from page 15...
Scraping data from page 16...
Scraping data from page 17...
Scraping data from page 18...
Scraping data from page 19...
Scraping data from page 20...
Scraping data from page 21...
Scraping data from page 22...
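The row-extraction step can be tried offline on a small piece of HTML. The sketch below is illustrative only: `SAMPLE_HTML` and the helper `extract_rows` are hypothetical names that do not appear in the notebook, and the sample table has two columns rather than the eleven on the real results pages. It relies on the cell count alone to skip header rows (which use `th` cells here), whereas the loop in this notebook also skips the first row of each table.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real results table has 11 columns per row.
SAMPLE_HTML = """
<table class="results">
  <tr><th>Position</th><th>Participant</th></tr>
  <tr><td> 1 </td><td>Alice Example</td></tr>
  <tr><td>2</td><td>Bob Example</td></tr>
  <tr><td colspan="2">Row with the wrong number of cells</td></tr>
</table>
"""


def extract_rows(html, expected_cols):
    """Return the cell text of every table row with exactly expected_cols <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for table in soup.find_all("table"):
        for row in table.find_all("tr"):
            cells = row.find_all("td")
            # Header rows (no <td>) and malformed rows fail this check
            if len(cells) == expected_cols:
                rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


sample_rows = extract_rows(SAMPLE_HTML, expected_cols=2)
print(sample_rows)
```

Filtering on the number of cells is what keeps stray rows out of the final data, and `get_text(strip=True)` removes the surrounding whitespace from each cell.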
Build DataFrame
In [3]:
headers = ['Position', 'Bib Number', 'Participant', 'Category',
           'Club/Company/Sponsor', 'Finish time', 'Swim Time',
           'Transition1', 'Bike Time', 'Transition2', 'Run Time']
df = pd.DataFrame(all_data_rows, columns=headers)
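Every value scraped this way is a string, so the time columns would need converting before any numeric analysis. The following is a minimal sketch on made-up rows, assuming the times are formatted as `H:MM:SS`; that format, the sample names, and the `sample` and `finish_seconds` variables are assumptions for illustration, not taken from the scraped data.

```python
import pandas as pd

# Hypothetical rows; the "H:MM:SS" time format is an assumption.
sample = pd.DataFrame({
    "Participant": ["Alice Example", "Bob Example"],
    "Finish time": ["10:15:30", "12:02:05"],
})

# errors="coerce" turns values that cannot be parsed into NaT instead of raising
sample["Finish time"] = pd.to_timedelta(sample["Finish time"], errors="coerce")

# Total seconds are convenient for sorting, averaging and plotting
finish_seconds = sample["Finish time"].dt.total_seconds()
print(finish_seconds.tolist())
```

If the real columns contain blanks or non-time entries, `errors="coerce"` leaves them as missing values that can be inspected separately.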
Convert to CSV File
In [4]:
csv_filename = "Outlaw2016Results.csv"
df.to_csv(csv_filename, index=False)
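A quick way to confirm that the export worked is to read the file back and compare it with the DataFrame. The sketch below does this with a small made-up frame written to a temporary directory, so it does not touch `Outlaw2016Results.csv`; the sample rows and the file name `sample_results.csv` are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical rows standing in for the scraped results
sample = pd.DataFrame(
    [["1", "101", "Alice Example"], ["2", "102", "Bob Example"]],
    columns=["Position", "Bib Number", "Participant"],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_results.csv")
    sample.to_csv(path, index=False)
    # dtype=str keeps every column as text, matching the scraped strings
    restored = pd.read_csv(path, dtype=str)

print(restored.shape)
print(list(restored.columns))
```

Passing `index=False` when writing means the row index is not stored as an extra column, so the file read back has the same columns as the original frame.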