I used the California Housing dataset (built into scikit-learn) to practice L1 and L2 regularization. This dataset contains features of California housing districts and their median house prices.
I first fit a simple baseline model, then compared its coefficients with the coefficients obtained after L1 and L2 regularization had been applied.
Regularization discourages overfitting by adding a "penalty term" to the loss function that grows with the size of the model's coefficients.
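To make the idea concrete (a small illustration I've added, not part of the original notebook), the two penalty terms differ only in how they measure coefficient size: L1 (Lasso) sums the absolute values of the coefficients, while L2 (Ridge) sums their squares.

import numpy as np

def l1_penalty(coefs, alpha):
    # L1 (Lasso) penalty: alpha * sum of |w_j|
    return alpha * np.sum(np.abs(coefs))

def l2_penalty(coefs, alpha):
    # L2 (Ridge) penalty: alpha * sum of w_j ** 2
    return alpha * np.sum(coefs ** 2)

example_coefs = np.array([0.8, -0.05, 0.0, 1.2])  # hypothetical coefficient vector
print(l1_penalty(example_coefs, alpha=0.1))  # 0.205
print(l2_penalty(example_coefs, alpha=0.1))  # ~0.20825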
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
housing_df = pd.DataFrame(data=housing.data, columns=housing.feature_names)
housing_df['Median Price (in $100k)'] = housing.target
housing_df.head()
|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | Median Price (in $100k) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.023810 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.971880 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.802260 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |
X = housing.data
y = (housing.target > 2.0).astype(int)  # Binary classification: 1 if target > 2.0, else 0
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
feature_names = ['MedInc','HouseAge','AveRooms','AveBedrms','Population','AveOccup','Latitude','Longitude']
model = LogisticRegression()
model.fit(X_train_scaled, y_train)
LogisticRegression()
coefficients = model.coef_[0]
plt.figure(figsize=(10, 6))
plt.bar(feature_names, coefficients)
plt.ylabel('Coefficient')
plt.title('Coefficients of Logistic Regression Model')
plt.show()
Regularization addresses overfitting: a model that performs well on training data but poorly on test data because it is overly complex. L1 regularization can get rid of less useful features by driving the coefficients of the least important variables to 0 (automatic feature selection). The resulting model is simpler and less prone to overfitting. https://www.youtube.com/watch?v=LmpBt0tenJE
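As a quick sanity check (a minimal sketch I've added, reusing the logistic regression model fitted above), comparing training and test accuracy is one simple way to spot overfitting: a large gap between the two suggests the model is too complex.

print('Train accuracy:', model.score(X_train_scaled, y_train))
print('Test accuracy:', model.score(X_test_scaled, y_test))
# If train accuracy is much higher than test accuracy, the model is likely overfitting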
lasso = Lasso()
param_grid = {
'alpha':[0.00001, 0.0001,0.001,0.01,0.1,1,10,100]
}
lasso_cv = GridSearchCV(lasso, param_grid, cv=3, n_jobs = -1)
lasso_cv.fit(X_train,y_train)
GridSearchCV(cv=3, estimator=Lasso(), n_jobs=-1, param_grid={'alpha': [1e-05, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100]})
lasso_cv.best_estimator_
Lasso(alpha=1e-05)
lasso1 = Lasso(alpha = 0.00001)
lasso1.fit(X_train, y_train)
Lasso(alpha=1e-05)
plt.figure(figsize=(10, 6))
plt.bar(feature_names, lasso1.coef_)
plt.ylabel('Coefficient')
plt.title('Coefficients of Model (with L1 Regularization)')
plt.show()
In this model, the coefficients of less useful features are shrunk toward 0, but with such a small alpha (1e-05) they are not driven all the way to 0; larger alphas would zero out more of them.
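To see this more clearly (a hedged sketch I've added, not part of the original run), refitting Lasso with progressively larger alphas on the same training data typically drives more and more coefficients exactly to 0:

for a in [0.00001, 0.01, 0.1, 1]:
    lasso_a = Lasso(alpha=a)
    lasso_a.fit(X_train, y_train)
    n_zero = (lasso_a.coef_ == 0).sum()
    print(f'alpha={a}: {n_zero} of {len(lasso_a.coef_)} coefficients are exactly 0')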
ridge = Ridge()
param_grid = {
'alpha':[0.00001, 0.0001,0.001,0.01,0.1,1,10,100]
}
ridge_cv = GridSearchCV(ridge, param_grid, cv=3, n_jobs = -1)
ridge_cv.fit(X_train,y_train)
GridSearchCV(cv=3, estimator=Ridge(), n_jobs=-1, param_grid={'alpha': [1e-05, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100]})
ridge_cv.best_estimator_
Ridge(alpha=1e-05)
ridge1 = Ridge(alpha = 0.00001)
ridge1.fit(X_train, y_train)
Ridge(alpha=1e-05)
plt.figure(figsize=(10, 6))
plt.bar(feature_names, ridge1.coef_)
plt.ylabel('Coefficient')
plt.title('Coefficients of Model (with L2 Regularization)')
plt.show()
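Finally, to compare the coefficients side by side (a sketch I've added, not part of the original notebook; note the logistic regression coefficients come from a classifier trained on scaled data, while the Lasso and Ridge models were fit on unscaled data, so only the overall pattern is comparable):

coef_comparison = pd.DataFrame({
    'Logistic (baseline)': coefficients,
    'Lasso (L1)': lasso1.coef_,
    'Ridge (L2)': ridge1.coef_,
}, index=feature_names)
coef_comparison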