Support Vector Machine Introduction Practice¶

Import Libraries¶

In [1]:
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
from sklearn.datasets import make_circles
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

Create 300 data points (giving each point a label of 0 or 1)¶

In [2]:
X, y = make_classification(n_samples=300, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=2, class_sep=1.5, random_state=1)
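
As a quick check (my own addition, not part of the original run), the cell below confirms that each point received a label of 0 or 1 and shows how the 300 samples split between the two classes.

In [ ]:
# Added check: confirm the labels are 0/1 and count the samples per class
print(np.unique(y, return_counts=True))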

Plot the Coordinates¶

In [3]:
# Plotting points with label 0
plt.scatter(X[y == 0, 0], X[y == 0, 1], c='red', label='Label: 0')

# Plotting points with label 1
plt.scatter(X[y == 1, 0], X[y == 1, 1], c='blue', label='Label: 1')


plt.title('Linearly Separable Data')
plt.xlabel('X coordinate')
plt.ylabel('Y coordinate')
plt.legend()
plt.grid(True)
plt.show()

Create SVC Model¶

In [4]:
classifier = SVC(kernel='linear')
classifier.fit(X, y)
Out[4]:
SVC(kernel='linear')
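
Since the kernel is linear, the fitted model exposes its hyperplane directly. The cell below (my own addition) prints the weight vector w and intercept b of the boundary w·x + b = 0, along with the number of support vectors per class.

In [ ]:
# Added sketch: for a linear kernel, coef_ and intercept_ define the
# learned hyperplane w . x + b = 0
print("w =", classifier.coef_[0])
print("b =", classifier.intercept_[0])
print("support vectors per class:", classifier.n_support_)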

Replot points (with Decision Boundary included)¶

In [5]:
# Plotting points with label 0
plt.scatter(X[y == 0, 0], X[y == 0, 1], c='red', label='Label: 0')

# Plotting points with label 1
plt.scatter(X[y == 1, 0], X[y == 1, 1], c='blue', label='Label: 1')

# Creating meshgrid to plot decision boundary
x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01), np.arange(y_min, y_max, 0.01))

# Predicting labels for each point in meshgrid
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])

# Plotting decision boundary
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.2, cmap='viridis')

plt.title('Linearly Separable Data - with SVC Decision Boundary')
plt.xlabel('X coordinate')
plt.ylabel('Y coordinate')
plt.legend()
plt.grid(True)
plt.show()

Score the Model¶

In [6]:
print(classifier.score(X, y))
0.96
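
Note that this scores the model on the same data it was trained on, so 0.96 is a training accuracy. As an alternative (my own addition), cross-validation gives a less optimistic estimate.

In [ ]:
# Added sketch: 5-fold cross-validation avoids scoring on the training data
from sklearn.model_selection import cross_val_score

print(cross_val_score(SVC(kernel='linear'), X, y, cv=5).mean())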

Repeat the Experiment with Non-Linear Data¶

Create 300 data points with 2 classes (in the shape of concentric circles)¶

In [7]:
points, labels = make_circles(n_samples=300, factor=.4, noise=.05)
In [8]:
# Plotting points with label 0
plt.scatter(points[labels == 0, 0], points[labels == 0, 1], c='red', label='Label: 0')

# Plotting points with label 1
plt.scatter(points[labels == 1, 0], points[labels == 1, 1], c='blue', label='Label: 1')

plt.title('Non-Linearly Separable Data')
plt.xlabel('X coordinate')
plt.ylabel('Y coordinate')
plt.legend()
plt.grid(True)
plt.show()

This Time I'm Going to Split the Data with train_test_split¶

In [9]:
training_data, validation_data, training_labels, validation_labels = train_test_split(
    points, labels, train_size=0.8, test_size=0.2, random_state=100)

Using a Linear Kernel, the Model's Accuracy Is Poor¶

In [10]:
classifier = SVC(kernel='linear')
classifier.fit(training_data, training_labels)
print(classifier.score(validation_data, validation_labels))
0.5666666666666667

Changing the Kernel to 'poly' Gives Excellent Accuracy¶

In [11]:
classifier = SVC(kernel='poly', degree=2)
classifier.fit(training_data, training_labels)
print(classifier.score(validation_data, validation_labels))
1.0
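
The polynomial kernel is not the only option. As a comparison (my own addition, using sklearn's default gamma), an RBF kernel typically also handles concentric circles well.

In [ ]:
# Added sketch: an RBF kernel is another common choice for data separated
# by a closed curve such as a circle
rbf_classifier = SVC(kernel='rbf')
rbf_classifier.fit(training_data, training_labels)
print(rbf_classifier.score(validation_data, validation_labels))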

Replot Points With Decision Boundary¶

In [12]:
# The same code as above

plt.figure(figsize=(8, 6))

plt.scatter(points[labels == 0, 0], points[labels == 0, 1], c='red', label='Label: 0')

plt.scatter(points[labels == 1, 0], points[labels == 1, 1], c='blue', label='Label: 1')

plt.title('Non-Linearly Separable Data - with SVC Decision Boundary')
plt.xlabel('X coordinate')
plt.ylabel('Y coordinate')
plt.legend()
plt.grid(True)


# Plotting the decision boundary to demonstrate the classification

x_min, x_max = points[:, 0].min() - 0.1, points[:, 0].max() + 0.1
y_min, y_max = points[:, 1].min() - 0.1, points[:, 1].max() + 0.1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01), np.arange(y_min, y_max, 0.01))

Z = classifier.predict(np.c_[xx.ravel(), yy.ravel()])

Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.2, cmap='viridis')
plt.show()

Explanation of How the Decision Boundary Is Found¶

First, All Points Are Converted to 3D Using the Mapping (x, y) → (√2·x·y, x², y²)¶

In [13]:
# Map each 2D point (x, y) to 3D: (sqrt(2) * x * y, x**2, y**2)
new_points = [[2 ** 0.5 * pt[0] * pt[1], pt[0] ** 2, pt[1] ** 2] for pt in points]
In [14]:
# print(new_points)
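
As a sanity check (my own addition): this 3D map is the feature map of the homogeneous degree-2 polynomial kernel, meaning (a·b)² = φ(a)·φ(b) for any two points a and b. (sklearn's 'poly' kernel is (γ⟨a, b⟩ + r)^d, which with the default r = 0 and degree = 2 matches this map up to the scale factor γ.)

In [ ]:
# Added sketch: verify the kernel trick identity (a . b)**2 == phi(a) . phi(b)
def phi(p):
    return np.array([2 ** 0.5 * p[0] * p[1], p[0] ** 2, p[1] ** 2])

a, b = points[0], points[1]
print(np.isclose(np.dot(a, b) ** 2, np.dot(phi(a), phi(b))))  # expect True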

This Allows the Non-Linearly Separable Data to Be Divided by a Planar Decision Boundary¶

In [15]:
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')

new_points = np.array(new_points)
labels = np.array(labels)

# Plot points with label 0
ax.scatter(new_points[labels == 0, 0], new_points[labels == 0, 1], new_points[labels == 0, 2], c='red', label='Label: 0')

# Plot points with label 1
ax.scatter(new_points[labels == 1, 0], new_points[labels == 1, 1], new_points[labels == 1, 2], c='blue', label='Label: 1')

ax.set_xlabel('X coordinate')
ax.set_ylabel('Y coordinate')
ax.set_zlabel('Z coordinate')
plt.title('3D Scatter Plot To Easily Visualise Classification Boundary')
plt.legend()
plt.show()
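
To make the "divided by a plane" claim concrete, the sketch below (my own addition) fits a linear SVC on the mapped 3D points and overlays its separating plane, w₀x + w₁y + w₂z + b = 0, rearranged as z = -(w₀x + w₁y + b) / w₂.

In [ ]:
# Added sketch: fit a linear SVC in the mapped 3D space and draw its
# separating plane over the same scatter plot
clf_3d = SVC(kernel='linear')
clf_3d.fit(new_points, labels)
w = clf_3d.coef_[0]
b = clf_3d.intercept_[0]

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(new_points[labels == 0, 0], new_points[labels == 0, 1], new_points[labels == 0, 2], c='red', label='Label: 0')
ax.scatter(new_points[labels == 1, 0], new_points[labels == 1, 1], new_points[labels == 1, 2], c='blue', label='Label: 1')

# Plane: w[0]*x + w[1]*y + w[2]*z + b = 0  =>  z = -(w[0]*x + w[1]*y + b) / w[2]
gx, gy = np.meshgrid(np.linspace(new_points[:, 0].min(), new_points[:, 0].max(), 10),
                     np.linspace(new_points[:, 1].min(), new_points[:, 1].max(), 10))
gz = -(w[0] * gx + w[1] * gy + b) / w[2]
ax.plot_surface(gx, gy, gz, alpha=0.3, color='green')

ax.set_xlabel('X coordinate')
ax.set_ylabel('Y coordinate')
ax.set_zlabel('Z coordinate')
plt.title('Separating Plane in the Mapped 3D Space')
plt.legend()
plt.show()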