AI02, Classification
Back to the previous page | Machine learning
List of posts to read before reading this article
Contents
- A wide variety of alternative algorithms for classification
- Implementation with sklearn
- Implementation with tensorflow
- Implementation with pytorch
A wide variety of alternative algorithms for classification
- Logistic regression (regression-based)
- KNN, k-nearest neighbor methods (distance-based)
- SVM, support vector machines (distance-based)
- Decision trees (rule-based)
- Random forest methods (rule-based)
- Naive Bayes (probabilistic; see the sketch after this list)
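Naive Bayes is the only method in this list without a worked example in the sklearn section below; here is a minimal sketch in the same style, assuming GaussianNB (a common choice when the features are continuous, as the iris measurements are):
from sklearn import datasets
from sklearn import model_selection
from sklearn import naive_bayes
from sklearn import metrics
# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
# classification for loaded dataset; GaussianNB models each feature as normally distributed within each class
classifier = naive_bayes.GaussianNB()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
# result
print(metrics.classification_report(y_test, y_test_pred))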
Implementation with sklearn
Classification through logistic regression
Classification through logistic regression on the iris dataset
from sklearn import datasets
from sklearn import model_selection
from sklearn import linear_model
from sklearn import metrics
# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
# classification for loaded dataset
classifier = linear_model.LogisticRegression()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
# result
print(metrics.classification_report(y_test, y_test_pred), '\n\n\n')
print(y_test_pred, '\n')
print(metrics.confusion_matrix(y_test, y_test_pred))
OUTPUT
In the confusion matrix below, the diagonal elements correspond to the number of samples correctly classified for each class, and the off-diagonal elements are the numbers of incorrectly classified samples. More precisely, element C[i, j] of the confusion matrix C is the number of samples of class i that were classified as class j. Note that train_test_split is called without a fixed random_state, so the split, and therefore the exact numbers shown, will vary from run to run.
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        14
           1       1.00      0.93      0.97        15
           2       0.94      1.00      0.97        16

    accuracy                           0.98        45
   macro avg       0.98      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45

[2 2 2 2 2 0 2 2 1 0 0 0 2 1 1 0 0 0 1 1 1 2 1 2 1 0 1 0 1 0 1 0 2 1 1 1 2
 2 0 0 1 1 0 2 0]

[[14  0  0]
 [ 0 14  1]
 [ 0  0 16]]
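Continuing from the code above, the overall accuracy is the sum of the diagonal of the confusion matrix divided by the total number of test samples, (14 + 14 + 16) / 45 ≈ 0.98, which matches the accuracy row of the report:
cm = metrics.confusion_matrix(y_test, y_test_pred)
print(cm.diagonal().sum() / cm.sum())  # overall accuracy, about 0.98 here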
SUPPLEMENT1
>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> type(iris)
<class 'sklearn.utils.Bunch'>
>>> type(iris.data)
<class 'numpy.ndarray'>
>>> iris.target_names
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
>>> iris.feature_names
['sepal length (cm)',
'sepal width (cm)',
'petal length (cm)',
'petal width (cm)']
>>> iris.data.shape
(150, 4)
>>> iris.target.shape
(150,)
SUPPLEMENT2
iris dataset
import pandas as pd
from sklearn import datasets
iris = datasets.load_iris()
iris.feature_names.append('target')  # name for the fifth column, which holds the class codes (0, 1, 2), not the names
df1 = pd.DataFrame(iris.data)
df2 = pd.DataFrame(iris.target)
df = pd.concat([df1,df2], axis=1)
df.columns = iris.feature_names
print(df)
     sepal length (cm)  sepal width (cm)  ...  petal width (cm)  target
0                  5.1               3.5  ...               0.2       0
1                  4.9               3.0  ...               0.2       0
2                  4.7               3.2  ...               0.2       0
3                  4.6               3.1  ...               0.2       0
4                  5.0               3.6  ...               0.2       0
..                 ...               ...  ...               ...     ...
145                6.7               3.0  ...               2.3       2
146                6.3               2.5  ...               1.9       2
147                6.5               3.0  ...               2.0       2
148                6.2               3.4  ...               2.3       2
149                5.9               3.0  ...               1.8       2

[150 rows x 5 columns]
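As a side note, scikit-learn 0.23 and later can build the same DataFrame (with the class codes in a 'target' column) in a single call:
from sklearn import datasets
df = datasets.load_iris(as_frame=True).frame  # features plus a 'target' column
print(df)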
Classification through logistic regression on a real-world dataset
download csv file of iris data
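No code block accompanies this case on the page; below is a minimal sketch, assuming the same iris.csv path and column layout (sepal.length, sepal.width, petal.length, petal.width, variety) as the decision-tree example later in this section:
import pandas as pd
from sklearn.linear_model import LogisticRegression
df = pd.read_csv(r'C:\Users\userd\Desktop\dataset\iris.csv')
df = df.sample(frac=1)  # shuffle all rows
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = df['variety']
clf = LogisticRegression(max_iter=1000)  # max_iter raised from the default to ensure convergence
clf.fit(X[0:120], y[0:120])
print(clf.predict(X[120:150]))  # an array of predicted variety labels for the 30 held-out rows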
Classification through k-nearest neighbor methods
Classification through k-nearest neighbor methods on the iris dataset
from sklearn import datasets
from sklearn import model_selection
from sklearn import neighbors
from sklearn import metrics
# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
# classification for loaded dataset
classifier = neighbors.KNeighborsClassifier()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
# result
print(metrics.classification_report(y_test, y_test_pred), '\n\n\n')
print(y_test_pred, '\n')
print(metrics.confusion_matrix(y_test, y_test_pred))
OUTPUT
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       0.93      0.93      0.93        15
           2       0.92      0.92      0.92        13

    accuracy                           0.96        45
   macro avg       0.95      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45

[1 1 2 2 0 2 1 2 0 1 2 0 2 1 0 2 2 0 0 0 2 2 0 2 2 2 1 1 1 1 2 2 0 2 0 0 2
 1 1 2 2 1 0 1 0]

[[17  0  0]
 [ 0 14  1]
 [ 0  1 12]]
Classification through k-nearest neighbor methods on a real-world dataset
download csv file of iris data
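As above, no code was given for this case; a minimal sketch under the same iris.csv assumptions as the logistic-regression sketch:
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
df = pd.read_csv(r'C:\Users\userd\Desktop\dataset\iris.csv')
df = df.sample(frac=1)  # shuffle all rows
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = df['variety']
clf = KNeighborsClassifier()  # default k = 5 neighbors
clf.fit(X[0:120], y[0:120])
print(clf.predict(X[120:150]))  # an array of predicted variety labels for the 30 held-out rows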
Classification through support vector machines
Classification through support vector machines on the iris dataset
from sklearn import datasets
from sklearn import model_selection
from sklearn import svm
from sklearn import metrics
# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
# classification for loaded dataset
classifier = svm.SVC()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
# result
print(metrics.classification_report(y_test, y_test_pred), '\n\n\n')
print(y_test_pred, '\n')
print(metrics.confusion_matrix(y_test, y_test_pred))
OUTPUT
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       1.00      1.00      1.00        17
           2       1.00      1.00      1.00        11

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

[2 1 1 1 0 2 0 1 2 2 0 0 0 0 2 1 1 0 2 2 2 0 2 2 2 1 0 1 0 2 0 0 1 0 0 1 2
 2 1 0 0 2 2 2 1]

[[17  0  0]
 [ 0 17  0]
 [ 0  0 11]]
Classification through support vector machines on a real-world dataset
download csv file of iris data
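Again, no code was given for this case; a minimal sketch under the same iris.csv assumptions as the earlier sketches:
import pandas as pd
from sklearn.svm import SVC
df = pd.read_csv(r'C:\Users\userd\Desktop\dataset\iris.csv')
df = df.sample(frac=1)  # shuffle all rows
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = df['variety']
clf = SVC()  # default RBF kernel
clf.fit(X[0:120], y[0:120])
print(clf.predict(X[120:150]))  # an array of predicted variety labels for the 30 held-out rows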
Classification through decision trees
Classification through decision trees on the iris dataset
from sklearn import datasets
from sklearn import model_selection
from sklearn import tree
from sklearn import metrics
# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
# classification for loaded dataset
classifier = tree.DecisionTreeClassifier()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
# result
print(metrics.classification_report(y_test, y_test_pred), '\n\n\n')
print(y_test_pred, '\n')
print(metrics.confusion_matrix(y_test, y_test_pred))
OUTPUT
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       0.92      0.92      0.92        13
           2       0.94      0.94      0.94        17

    accuracy                           0.96        45
   macro avg       0.95      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45

[2 0 1 1 0 0 1 1 1 1 2 2 2 1 0 0 2 1 1 0 1 2 2 1 0 1 0 0 0 0 2 2 1 2 1 0 1
 2 1 2 0 2 0 2 0]

[[15  0  0]
 [ 0 12  1]
 [ 0  1 16]]
Classification through decision trees on a real-world dataset
download csv file of iris data
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
df = pd.read_csv(r'C:\Users\userd\Desktop\dataset\iris.csv')
df = df.sample(frac=1) # Shuffle, frac=1 means return all rows in random order
y = df['variety']
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y_train = y[0:120]
y_test = y[120:150]
X_train = X[0:120]
X_test = X[120:150]
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
clf.predict(X_test)
OUTPUT
array(['Virginica', 'Setosa', 'Virginica', 'Virginica', 'Virginica',
'Versicolor', 'Setosa', 'Setosa', 'Setosa', 'Virginica',
'Virginica', 'Virginica', 'Versicolor', 'Versicolor', 'Setosa',
'Setosa', 'Setosa', 'Virginica', 'Setosa', 'Setosa', 'Versicolor',
'Virginica', 'Setosa', 'Virginica', 'Setosa', 'Setosa',
'Versicolor', 'Virginica', 'Virginica', 'Virginica'], dtype=object)
y_test
OUTPUT
array(['Virginica', 'Setosa', 'Virginica', 'Virginica', 'Virginica',
'Versicolor', 'Setosa', 'Setosa', 'Setosa', 'Virginica',
'Virginica', 'Virginica', 'Versicolor', 'Versicolor', 'Setosa',
'Setosa', 'Setosa', 'Virginica', 'Setosa', 'Setosa', 'Versicolor',
'Virginica', 'Setosa', 'Virginica', 'Setosa', 'Setosa',
'Versicolor', 'Virginica', 'Virginica', 'Versicolor'], dtype=object)
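The predicted labels match y_test in all but the last element (Virginica predicted, Versicolor actual), i.e. 29 of 30 test samples are correct, about 0.97 accuracy. This can be checked directly:
print((clf.predict(X_test) == y_test).mean())  # fraction of matching labels, 29/30 here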
Classification through random forest methods
Classification through random forest methods on the iris dataset
from sklearn import datasets
from sklearn import model_selection
from sklearn import ensemble
from sklearn import metrics
# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
# classification for loaded dataset
classifier = ensemble.RandomForestClassifier()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
# result
print(metrics.classification_report(y_test, y_test_pred), '\n\n\n')
print(y_test_pred, '\n')
print(metrics.confusion_matrix(y_test, y_test_pred))
OUTPUT
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      1.00      1.00        14
           2       1.00      1.00      1.00        16

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

[1 0 0 1 2 1 0 0 2 1 0 1 2 2 0 0 2 2 1 2 0 0 2 2 0 2 2 1 2 0 1 2 2 0 0 1 0
 0 0 1 2 2 1 0 0]

[[15  0  0]
 [ 0 14  0]
 [ 0  0 16]]
Classification through random forest methods on a real-world dataset
download csv file of iris data
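As with the other real-world cases, no code was given; a minimal sketch under the same iris.csv assumptions as the sketches above:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
df = pd.read_csv(r'C:\Users\userd\Desktop\dataset\iris.csv')
df = df.sample(frac=1)  # shuffle all rows
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = df['variety']
clf = RandomForestClassifier()  # default settings
clf.fit(X[0:120], y[0:120])
print(clf.predict(X[120:150]))  # an array of predicted variety labels for the 30 held-out rows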
The resulting classification accuracy for each classifier
The resulting classification accuracy for each classifier on the iris dataset
from sklearn import datasets
from sklearn import model_selection
from sklearn import linear_model
from sklearn import metrics
from sklearn import tree
from sklearn import neighbors
from sklearn import svm
from sklearn import ensemble
import matplotlib.pyplot as plt
import numpy as np
train_size_vec = np.linspace(0.1, 0.9, 30)
classifiers = [linear_model.LogisticRegression,
               neighbors.KNeighborsClassifier,
               svm.SVC,
               tree.DecisionTreeClassifier,
               ensemble.RandomForestClassifier]
cm_diags = np.zeros((3, len(train_size_vec), len(classifiers)), dtype=float)
iris = datasets.load_iris()
for n, train_size in enumerate(train_size_vec):
    X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=train_size)
    for m, Classifier in enumerate(classifiers):
        classifier = Classifier()
        classifier.fit(X_train, y_train)
        y_test_p = classifier.predict(X_test)
        # the diagonal of the confusion matrix divided by the per-class counts
        # in y_test gives the per-class accuracy (recall) for this split
        cm_diags[:, n, m] = metrics.confusion_matrix(y_test, y_test_p).diagonal()
        cm_diags[:, n, m] /= np.bincount(y_test)
fig, axes = plt.subplots(1, len(classifiers), figsize=(12, 3))
for m, Classifier in enumerate(classifiers):
    axes[m].plot(train_size_vec, cm_diags[2, :, m], label=iris.target_names[2])
    axes[m].plot(train_size_vec, cm_diags[1, :, m], label=iris.target_names[1])
    axes[m].plot(train_size_vec, cm_diags[0, :, m], label=iris.target_names[0])
    axes[m].set_title(Classifier.__name__)  # class name, without instantiating
    axes[m].set_ylim(0, 1.1)
    axes[m].set_ylabel("classification accuracy")
    axes[m].set_xlabel("training size ratio")
    axes[m].legend(loc=4)
plt.show()
OUTPUT: five side-by-side panels, one per classifier, each plotting the per-class classification accuracy of the three iris classes against the training size ratio.
Implementation with tensorflow
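This section was left empty on the page. A minimal sketch of how the iris classification might look with tf.keras; the layer width, optimizer, and epoch count are arbitrary choices, not taken from the original:
import tensorflow as tf
from sklearn import datasets
from sklearn import model_selection
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
# a small fully connected network: 4 features in, 3 class probabilities out
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax')])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=100, verbose=0)
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(accuracy)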
Implementation with pytorch
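Likewise empty on the page; a minimal pytorch sketch under the same assumptions (network size, learning rate, and epoch count are arbitrary):
import torch
import torch.nn as nn
from sklearn import datasets
from sklearn import model_selection
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.long)
# a small fully connected network: 4 features in, 3 class scores out
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(200):
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)
    loss.backward()
    optimizer.step()
with torch.no_grad():
    accuracy = (model(X_test).argmax(dim=1) == y_test).float().mean()
print(accuracy.item())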
List of posts that follow this article
Reference