In this short lesson, I will show you how to perform Logistic Regression in Python. It is quite easy, and you will get all the code.
These are the steps:
Step 1: Import the required modules
We will import the following modules:
make_classification: available in sklearn.datasets and used to generate a dataset
matplotlib.pyplot: for plotting
LogisticRegression: imported from sklearn.linear_model and used for performing logistic regression
train_test_split: imported from sklearn.model_selection and used to split the dataset into training and test datasets
confusion_matrix: imported from sklearn.metrics and used to generate the confusion matrix of the classifier
pandas: for managing datasets
The complete import statement is given below:
from sklearn.datasets import make_classification
from matplotlib import pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import pandas as pd
Step 2: Generate the dataset
Now you need to generate the dataset using the make_classification() function. You need to specify the number of samples, the number of features, the number of classes and other parameters.
The code for the make_classification is given below:
# Generate a dataset for Logistic Regression
x, y = make_classification(
    n_samples=100,
    n_features=1,
    n_classes=2,
    n_clusters_per_class=1,
    flip_y=0.03,
    n_informative=1,
    n_redundant=0,
    n_repeated=0
)
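To get a feel for what make_classification() returns, you can inspect the shapes of x and y. The sketch below repeats the call from this step; the random_state=42 argument is an assumption added here (it is not in the lesson's code) so that the same data is generated on every run.

```python
import numpy as np
from sklearn.datasets import make_classification

# random_state=42 is an assumption added for reproducibility;
# the lesson's own call does not set it.
x, y = make_classification(
    n_samples=100,
    n_features=1,
    n_classes=2,
    n_clusters_per_class=1,
    flip_y=0.03,
    n_informative=1,
    n_redundant=0,
    n_repeated=0,
    random_state=42,
)

print(x.shape)       # (100, 1): one row per sample, one column per feature
print(y.shape)       # (100,): one class label per sample
print(np.unique(y))  # the class labels that occur in y
```

Note that x is two-dimensional even with a single feature, which is the shape scikit-learn estimators expect for their input.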
Step 3: Visualize the Data
Now we will create a simple scatter plot to see what the data looks like. The code is given below:
# Create a scatter plot
plt.scatter(x, y, c=y, cmap='rainbow')
plt.title('Scatter Plot of Logistic Regression')
plt.show()
Step 4: Split the Dataset
Now we will split the dataset into a training dataset and a test dataset. The training dataset is used to train the model while the test dataset is used to test the model’s performance on new data.
# Split the dataset into training and test dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1)
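Since the call above does not pass test_size, train_test_split() falls back to its default and holds out 25% of the samples for testing. The sketch below checks the resulting sizes; the random_state=42 passed to make_classification() is an assumption added here so the data is reproducible.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# random_state=42 is an assumption added for reproducibility
x, y = make_classification(
    n_samples=100, n_features=1, n_classes=2, n_clusters_per_class=1,
    flip_y=0.03, n_informative=1, n_redundant=0, n_repeated=0,
    random_state=42,
)

# Same call as in the lesson: no test_size, so the default (0.25) is used
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1)

print(len(x_train), len(x_test))  # 75 25
```

If you want a different ratio, you can pass it explicitly, for example test_size=0.2.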
Step 5: Perform Logistic Regression
Here we will create a LogisticRegression object and fit it with our training dataset (this is similar to Linear Regression).
# Create a Logistic Regression Object, perform Logistic Regression
log_reg = LogisticRegression()
log_reg.fit(x_train, y_train)
The logistic regression output is given below:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
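If you do not see this output, it is most likely because you are running the code as a script. An interactive session (such as a notebook) displays the value returned by fit(), but a script shows nothing unless you print it. Also, as far as I know, newer versions of scikit-learn print only the parameters that differ from their defaults, so the text may be much shorter than the one above. The sketch below shows a way to display the model and its parameters explicitly; the random_state=42 value is an assumption added for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# random_state=42 is an assumption added for reproducibility
x, y = make_classification(
    n_samples=100, n_features=1, n_classes=2, n_clusters_per_class=1,
    flip_y=0.03, n_informative=1, n_redundant=0, n_repeated=0,
    random_state=42,
)

log_reg = LogisticRegression()
fitted = log_reg.fit(x, y)

# fit() returns the estimator itself, which is what a notebook displays
print(fitted is log_reg)  # True

# Print the model explicitly (the exact text depends on your version)
print(log_reg)

# get_params() returns the parameters as a dictionary
params = log_reg.get_params()
print(params["C"], params["max_iter"])
```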
You can view the logistic regression coefficient and intercept using the code below:
# Show the Coefficient and Intercept
print(log_reg.coef_)
print(log_reg.intercept_)
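To see what the coefficient and intercept mean, you can plug them into the logistic (sigmoid) function yourself and compare the result with the probabilities the model reports. This is a minimal sketch for the two-class case with default settings; the names z, manual and model are introduced here for illustration, and random_state=42 is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# random_state=42 is an assumption added for reproducibility
x, y = make_classification(
    n_samples=100, n_features=1, n_classes=2, n_clusters_per_class=1,
    flip_y=0.03, n_informative=1, n_redundant=0, n_repeated=0,
    random_state=42,
)

log_reg = LogisticRegression().fit(x, y)

# Linear part: z = coefficient * x + intercept
z = x @ log_reg.coef_.T + log_reg.intercept_

# Sigmoid turns z into the probability of class 1
manual = (1.0 / (1.0 + np.exp(-z))).ravel()

# Probability of class 1 as reported by the model
model = log_reg.predict_proba(x)[:, 1]

print(np.allclose(manual, model))  # True
```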
Step 6: Make prediction using the model
We now use the model to predict the outputs given the test dataset.
# Perform prediction using the test dataset
y_pred = log_reg.predict(x_test)
(You can view the predicted values using print(y_pred).)
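For a two-class model like this one, predict() effectively picks class 1 whenever the predicted probability of class 1 is above 0.5. The sketch below compares the two (barring an exact tie at 0.5, they should agree) and also prints the accuracy on the test data using score(). The name y_from_proba is introduced here for illustration, and random_state=42 is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# random_state=42 is an assumption added for reproducibility
x, y = make_classification(
    n_samples=100, n_features=1, n_classes=2, n_clusters_per_class=1,
    flip_y=0.03, n_informative=1, n_redundant=0, n_repeated=0,
    random_state=42,
)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1)

log_reg = LogisticRegression().fit(x_train, y_train)

# Class labels predicted directly
y_pred = log_reg.predict(x_test)

# Probabilities: one column per class, in the order of log_reg.classes_
proba = log_reg.predict_proba(x_test)

# Apply the 0.5 threshold to the probability of class 1
y_from_proba = (proba[:, 1] > 0.5).astype(int)

print(np.array_equal(y_pred, y_from_proba))
print(log_reg.score(x_test, y_test))  # fraction of correct predictions
```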
Step 7: Display the Confusion Matrix
The confusion matrix helps you to see how the model performed. It tells you the number of true positives, true negatives, false positives and false negatives. To see the confusion matrix, use:
# Show the Confusion Matrix
confusion_matrix(y_test, y_pred)
The output is:
array([[13, 1], [ 0, 11]], dtype=int64)
We can deduce from the confusion matrix that (rows are the actual classes, columns are the predicted classes, and class 1 is treated as the positive class):
# True negative: 13 (upper-left) – Number of class 0 samples we correctly predicted as 0
# False positive: 1 (upper-right) – Number of class 0 samples we wrongly predicted as 1
# False negative: 0 (lower-left) – Number of class 1 samples we wrongly predicted as 0
# True positive: 11 (lower-right) – Number of class 1 samples we correctly predicted as 1
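In scikit-learn, the rows of the confusion matrix are the actual classes and the columns are the predicted classes, in sorted label order. For two classes labelled 0 and 1, flattening the matrix with ravel() therefore gives the four counts in the order TN, FP, FN, TP. The sketch below shows this on a small hand-made example (the y_true and y_hat arrays are hypothetical, not the lesson's data) and then applies the same unpacking to the matrix shown in this step.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels, made up for illustration only
y_true = np.array([0, 0, 0, 1, 1, 1, 1])
y_hat = np.array([0, 0, 1, 1, 1, 1, 0])

cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 2 1 1 3

# The same unpacking applied to the matrix from this lesson
lesson_cm = np.array([[13, 1], [0, 11]])
l_tn, l_fp, l_fn, l_tp = lesson_cm.ravel()

# Accuracy = correct predictions / all predictions
accuracy = (l_tp + l_tn) / lesson_cm.sum()
print(accuracy)  # 0.96
```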
Thanks for reading. You can find the video lesson below:
Excelent class
Thanks
this was awesome
minor code issue, “lr” should change to “log_reg”
we r not gtting output when we perform step 5
can u please tell us why