Practice: Simple Linear Regression

Using the LinearRegression class to find a relationship between the horsepower of a car and its retail price.

Import Libraries, Load the File and Inspect the Data

In [1]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
In [2]:
df = pd.read_csv('CarDataset.csv')
In [3]:
df.head()
Out[3]:
| | Make | Model | Year | Engine Fuel Type | Engine HP | Engine Cylinders | Transmission Type | Driven_Wheels | Number of Doors | Market Category | Vehicle Size | Vehicle Style | highway MPG | city mpg | Popularity | MSRP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BMW | 1 Series M | 2011 | premium unleaded (required) | 335.0 | 6.0 | MANUAL | rear wheel drive | 2.0 | Factory Tuner,Luxury,High-Performance | Compact | Coupe | 26 | 19 | 3916 | 46135 |
| 1 | BMW | 1 Series | 2011 | premium unleaded (required) | 300.0 | 6.0 | MANUAL | rear wheel drive | 2.0 | Luxury,Performance | Compact | Convertible | 28 | 19 | 3916 | 40650 |
| 2 | BMW | 1 Series | 2011 | premium unleaded (required) | 300.0 | 6.0 | MANUAL | rear wheel drive | 2.0 | Luxury,High-Performance | Compact | Coupe | 28 | 20 | 3916 | 36350 |
| 3 | BMW | 1 Series | 2011 | premium unleaded (required) | 230.0 | 6.0 | MANUAL | rear wheel drive | 2.0 | Luxury,Performance | Compact | Coupe | 28 | 18 | 3916 | 29450 |
| 4 | BMW | 1 Series | 2011 | premium unleaded (required) | 230.0 | 6.0 | MANUAL | rear wheel drive | 2.0 | Luxury | Compact | Convertible | 28 | 18 | 3916 | 34500 |
In [4]:
df.dtypes
Out[4]:
Make                  object
Model                 object
Year                   int64
Engine Fuel Type      object
Engine HP            float64
Engine Cylinders     float64
Transmission Type     object
Driven_Wheels         object
Number of Doors      float64
Market Category       object
Vehicle Size          object
Vehicle Style         object
highway MPG            int64
city mpg               int64
Popularity             int64
MSRP                   int64
dtype: object
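Before dropping anything, it can help to see which columns actually contain missing values. A minimal sketch, assuming a small made-up DataFrame that reuses three of the column names above in place of `CarDataset.csv`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for CarDataset.csv: three of its columns, made-up values
df = pd.DataFrame({
    "Engine HP": [335.0, np.nan, 300.0, 230.0],
    "Market Category": ["Luxury", None, None, "Luxury,Performance"],
    "MSRP": [46000, 40000, 36000, 29000],
})

# isna() marks each missing cell as True; sum() counts them per column
missing = df.isna().sum()
print(missing)
```

On the real file, the same `df.isna().sum()` call shows how many rows the next step is about to discard and which columns are responsible.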

Remove Rows with NaN Values

In [5]:
df.dropna(inplace=True)
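`df.dropna()` with no arguments removes a row if any column is missing, so a car with a valid horsepower and price is still discarded when an unrelated column such as Market Category is empty. If only the two modelling columns matter, the `subset` parameter restricts the check. A sketch on made-up data (the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 lacks Engine HP, rows 2 and 3 lack only Market Category
df = pd.DataFrame({
    "Engine HP": [335.0, np.nan, 300.0, 230.0],
    "Market Category": ["Luxury", "Luxury", None, None],
    "MSRP": [46000, 40000, 36000, 29000],
})

# Drops every row with a NaN in ANY column (what df.dropna() does above)
all_cols = df.dropna()

# Drops only rows missing one of the columns the model actually uses
model_cols = df.dropna(subset=["Engine HP", "MSRP"])

print(len(all_cols), len(model_cols))
```

Here the plain call keeps one row while the `subset` version keeps three, which is the trade-off to weigh on the full dataset.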

Create an Array of Horsepower Values

In [6]:
HP = df['Engine HP']
In [7]:
HP_array = HP.to_numpy()
In [8]:
HP_array = HP_array.reshape(-1, 1)
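The reshape is needed because scikit-learn estimators expect the feature matrix `X` to be 2-D, with shape `(n_samples, n_features)`, while a single column converted with `to_numpy()` is 1-D. A small sketch with three made-up horsepower values showing the shapes involved:

```python
import pandas as pd

hp = pd.Series([335.0, 300.0, 230.0], name="Engine HP")

hp_1d = hp.to_numpy()
print(hp_1d.shape)  # (3,): a single axis

# -1 lets NumPy infer the number of rows; 1 asks for exactly one column (one feature)
hp_2d = hp_1d.reshape(-1, 1)
print(hp_2d.shape)  # (3, 1): rows are samples, the column is the feature

# An equivalent route that stays in pandas: a one-column DataFrame
hp_frame = hp.to_frame()
print(hp_frame.shape)  # (3, 1)
```

Selecting the column as `df[['Engine HP']]` (double brackets) also yields a one-column DataFrame that `fit` accepts directly.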

Create an Array of Car Price Values

In [9]:
Price = df['MSRP']
In [10]:
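# No separate dropna is needed: df.dropna() above already removed rows with missing values.
# Dropping NaNs from Price alone could misalign it with HP, so just check the lengths match.
assert len(Price) == len(HP)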
In [11]:
Price_array = Price.to_numpy()
In [12]:
Price_array = Price_array.reshape(-1, 1)
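Unlike `X`, the target does not have to be reshaped: `LinearRegression.fit` also accepts a 1-D `y`. The shape of `y` decides the shape of the fitted attributes, which is why the gradient and intercept printed at the end of this notebook appear inside brackets. A sketch on made-up points that lie exactly on y = 2x + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up points on the line y = 2x + 1
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y_1d = np.array([3.0, 5.0, 7.0, 9.0])
y_2d = y_1d.reshape(-1, 1)

model_1d = LinearRegression().fit(X, y_1d)
model_2d = LinearRegression().fit(X, y_2d)

# 1-D y: coef_ has shape (n_features,) and intercept_ is a scalar
print(model_1d.coef_.shape, np.shape(model_1d.intercept_))
# 2-D y: coef_ has shape (n_targets, n_features) and intercept_ has shape (n_targets,)
print(model_2d.coef_.shape, model_2d.intercept_.shape)
```

Keeping `y` 1-D means `coef_[0]` and `intercept_` are plain numbers rather than one-element arrays.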

Plot HP vs. Price

In [13]:
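# Scatter plot: one marker per car, no connecting lines
plt.plot(HP_array, Price_array, 'o')
# Zoom to 0-800 HP and 0-100,000 in price; cars priced above 100,000 fall outside this view
plt.axis([0, 800, 0, 100000])
plt.ylabel('Retail Price (MSRP, $)')
plt.xlabel('Engine Horsepower')
plt.show()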

Fit a Linear Regression Model to This Data¶

In [14]:
line_fitter = LinearRegression()
In [15]:
line_fitter.fit(HP_array, Price_array)
Out[15]:
LinearRegression()
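With a single feature, ordinary least squares has a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. A sketch that checks this formula against `LinearRegression` on made-up horsepower/price pairs (not taken from `CarDataset.csv`):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up horsepower/price pairs
hp = np.array([150.0, 200.0, 250.0, 300.0, 400.0])
price = np.array([22000.0, 27000.0, 35000.0, 41000.0, 60000.0])

# Closed-form least-squares estimates for one feature
hp_dev = hp - hp.mean()
price_dev = price - price.mean()
slope = np.sum(hp_dev * price_dev) / np.sum(hp_dev ** 2)
intercept = price.mean() - slope * hp.mean()

# The same fit through scikit-learn (X must be 2-D, y may stay 1-D)
model = LinearRegression().fit(hp.reshape(-1, 1), price)

print(slope, model.coef_[0])
print(intercept, model.intercept_)
```

Both routes should agree to floating-point precision, which is a quick way to confirm what `fit` computes for simple linear regression.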
In [16]:
# Predicted price for every car, according to the fitted line
PricePredict = line_fitter.predict(HP_array)
plt.plot(HP_array, Price_array, 'o')   # actual prices
plt.plot(HP_array, PricePredict)       # fitted regression line
plt.axis([0, 800, 0, 100000])
plt.ylabel('Retail Price (MSRP, $)')
plt.xlabel('Engine Horsepower')
plt.show()
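The plot shows the fit visually; the coefficient of determination R² summarises it as a number (1 is a perfect fit, 0 is no better than predicting the mean price). `LinearRegression.score` returns R² for the data passed to it. A sketch on made-up data comparing it with `sklearn.metrics.r2_score` and the textbook formula:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Made-up horsepower/price pairs
hp = np.array([[150.0], [200.0], [250.0], [300.0], [400.0]])
price = np.array([22000.0, 27000.0, 35000.0, 41000.0, 60000.0])

model = LinearRegression().fit(hp, price)
predicted = model.predict(hp)

# score() returns R^2; r2_score computes the same quantity from predictions
r2_method = model.score(hp, price)
r2_func = r2_score(price, predicted)

# Manual R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((price - predicted) ** 2)
ss_tot = np.sum((price - price.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_method, r2_func, r2_manual)
```

In this notebook the equivalent call would be `line_fitter.score(HP_array, Price_array)`.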

Equation of the Line

In [17]:
gradient = line_fitter.coef_[0]
y_intercept = line_fitter.intercept_

print("Gradient:", gradient)
print("Y-Intercept:", y_intercept)

print("Cost of Car = ", gradient, " * Horsepower + ", y_intercept)
Gradient: [401.36907559]
Y-Intercept: [-60160.43766107]
Cost of Car =  [401.36907559]  * Horsepower +  [-60160.43766107]
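The equation can be used directly to predict a price. A sketch that hard-codes the coefficients printed above (as plain numbers, so no brackets) and evaluates the line; `predict_price` is a hypothetical helper, not part of scikit-learn. Because the intercept is negative, the line also predicts a zero or negative price for low-horsepower engines, which is one limitation of fitting a straight line to this data:

```python
# Coefficients copied from the output above
gradient = 401.36907559
y_intercept = -60160.43766107

def predict_price(horsepower: float) -> float:
    """Hypothetical helper: price predicted by the fitted line."""
    return gradient * horsepower + y_intercept

print(f"Cost of Car = {gradient:.2f} * Horsepower + ({y_intercept:.2f})")
print(f"Predicted price at 300 HP: {predict_price(300):.2f}")

# Horsepower at which the line crosses zero price
zero_price_hp = -y_intercept / gradient
print(f"The line predicts a price <= 0 below about {zero_price_hp:.1f} HP")
```

Each extra unit of horsepower adds roughly 401 to the predicted MSRP. With the fitted model itself, `line_fitter.predict([[300]])` gives the same prediction without copying coefficients by hand.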