
AI02, Classification

List of posts to read before reading this article




A wide variety of alternative algorithms for classification

  • Logistic regression (regression-based)
  • KNN, k-nearest neighbor methods (distance-based)
  • SVM, support vector machines (distance-based)
  • Decision trees (rule-based)
  • Random forest methods (rule-based)
  • Naive Bayes (probabilistic)
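
Each of these families maps to a scikit-learn estimator. As a quick orientation, a minimal sketch of the corresponding imports (GaussianNB is one common Naive Bayes variant; the other five appear in the implementations below):

from sklearn.linear_model import LogisticRegression    # regression-based
from sklearn.neighbors import KNeighborsClassifier     # distance-based
from sklearn.svm import SVC                            # distance-based (maximum margin)
from sklearn.tree import DecisionTreeClassifier        # rule-based
from sklearn.ensemble import RandomForestClassifier    # rule-based (ensemble of trees)
from sklearn.naive_bayes import GaussianNB             # probabilistic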





Implement with sklearn

Classification through logistic regression

Classification through logistic regression on the built-in iris dataset

from sklearn import datasets
from sklearn import model_selection
from sklearn import linear_model
from sklearn import metrics

# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)

# classification for loaded dataset
classifier = linear_model.LogisticRegression()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)

# result
print(metrics.classification_report(y_test, y_test_pred), '\n\n\n')
print(y_test_pred, '\n')
print(metrics.confusion_matrix(y_test, y_test_pred))
OUTPUT

In the confusion matrix below, the diagonal elements are the numbers of samples that are correctly classified for each level of the categorical variable, and the off-diagonal elements are the numbers of incorrectly classified samples. More specifically, element C[i, j] of the confusion matrix C is the number of samples of true category i that were classified as category j. Note that because train_test_split shuffles randomly and no random_state is fixed here, the exact numbers change from run to run, which is why the classification reports and confusion matrices shown in this post do not always agree with each other. (A small worked example of the C[i, j] convention follows the output below.)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        14
           1       1.00      0.93      0.97        15
           2       0.94      1.00      0.97        16

    accuracy                           0.98        45
   macro avg       0.98      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45



[2 2 2 2 2 0 2 2 1 0 0 0 2 1 1 0 0 0 1 1 1 2 1 2 1 0 1 0 1 0 1 0 2 1 1 1 2
 2 0 0 1 1 0 2 0] 

[[12  0  0]
 [ 0 13  1]
 [ 0  1 18]]
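
As a worked illustration of the C[i, j] convention described above (the labels here are hypothetical, chosen only to make the indexing visible):

from sklearn import metrics

# hypothetical true and predicted labels for a three-class problem
y_true = [0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 1, 2, 2]

# C[i, j] counts samples of true class i predicted as class j;
# the single error (a class-1 sample predicted as class 2) shows up as C[1, 2] == 1
print(metrics.confusion_matrix(y_true, y_pred))
OUTPUT
[[2 0 0]
 [0 2 1]
 [0 0 2]]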

SUPPLEMENT 1: exploring the Bunch object returned by load_iris
>>> from sklearn import datasets
>>> iris = datasets.load_iris() 

>>> type(iris) 
<class 'sklearn.utils.Bunch'>

>>> type(iris.data)
<class 'numpy.ndarray'>

>>> iris.target_names
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

>>> iris.feature_names 
['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

>>> iris.data.shape 
(150, 4)

>>> iris.target.shape 
(150,)

SUPPLEMENT 2: the iris dataset as a pandas DataFrame

import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# combine the four feature columns with the numeric target codes (0, 1, 2)
df1 = pd.DataFrame(iris.data, columns=iris.feature_names)
df2 = pd.DataFrame(iris.target, columns=['target'])
df = pd.concat([df1, df2], axis=1)

print(df)
OUTPUT
     sepal length (cm)  sepal width (cm)  ...  petal width (cm)  target
0                  5.1               3.5  ...               0.2       0
1                  4.9               3.0  ...               0.2       0
2                  4.7               3.2  ...               0.2       0
3                  4.6               3.1  ...               0.2       0
4                  5.0               3.6  ...               0.2       0
..                 ...               ...  ...               ...     ...
145                6.7               3.0  ...               2.3       2
146                6.3               2.5  ...               1.9       2
147                6.5               3.0  ...               2.0       2
148                6.2               3.4  ...               2.3       2
149                5.9               3.0  ...               1.8       2

[150 rows x 5 columns]




Classification through logistic regression on a real-world dataset (downloaded iris CSV file)
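
A minimal sketch of how this could look, assuming the same file path and column layout (sepal.length, sepal.width, petal.length, petal.width, variety) as the decision-tree CSV example later in this post:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

# assumed path and column names, matching the decision-tree CSV example below
df = pd.read_csv(r'C:\Users\userd\Desktop\dataset\iris.csv')
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = df['variety']

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7)

classifier = LogisticRegression(max_iter=1000)    # generous max_iter to avoid convergence warnings
classifier.fit(X_train, y_train)
print(metrics.classification_report(y_test, classifier.predict(X_test)))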






Classification through k-nearest neighbor methods

Classification through k-nearest neighbor methods on the built-in iris dataset

from sklearn import datasets
from sklearn import model_selection
from sklearn import neighbors
from sklearn import metrics

# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)

# classification for loaded dataset
classifier = neighbors.KNeighborsClassifier()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)

# result
print(metrics.classification_report(y_test, y_test_pred), '\n\n\n')
print(y_test_pred, '\n')
print(metrics.confusion_matrix(y_test, y_test_pred))
OUTPUT
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       0.93      0.93      0.93        15
           2       0.92      0.92      0.92        13

    accuracy                           0.96        45
   macro avg       0.95      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45



[1 1 2 2 0 2 1 2 0 1 2 0 2 1 0 2 2 0 0 0 2 2 0 2 2 2 1 1 1 1 2 2 0 2 0 0 2
 1 1 2 2 1 0 1 0] 

[[16  0  0]
 [ 0 14  2]
 [ 0  0 13]]




Classification through k-nearest neighbor methods on a real-world dataset (downloaded iris CSV file)
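
Again a minimal sketch under the same assumed CSV path and column layout; only the classifier changes:

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

df = pd.read_csv(r'C:\Users\userd\Desktop\dataset\iris.csv')
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = df['variety']

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7)

classifier = KNeighborsClassifier()    # default: k = 5 neighbors
classifier.fit(X_train, y_train)
print(metrics.classification_report(y_test, classifier.predict(X_test)))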






Classification through support vector machines

Classification through support vector machines on the built-in iris dataset

from sklearn import datasets
from sklearn import model_selection
from sklearn import svm
from sklearn import metrics

# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)

# classification for loaded dataset
classifier = svm.SVC()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)

# result
print(metrics.classification_report(y_test, y_test_pred), '\n\n\n')
print(y_test_pred, '\n')
print(metrics.confusion_matrix(y_test, y_test_pred))
OUTPUT
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       1.00      1.00      1.00        17
           2       1.00      1.00      1.00        11

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45



[2 1 1 1 0 2 0 1 2 2 0 0 0 0 2 1 1 0 2 2 2 0 2 2 2 1 0 1 0 2 0 0 1 0 0 1 2
 2 1 0 0 2 2 2 1] 
 
[[12  0  0]
 [ 0 11  0]
 [ 0  7 15]]




Classification through support vector machines on a real-world dataset (downloaded iris CSV file)
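
A minimal sketch under the same assumed CSV path and column layout:

import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn import metrics

df = pd.read_csv(r'C:\Users\userd\Desktop\dataset\iris.csv')
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = df['variety']

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7)

classifier = SVC()    # default: RBF kernel
classifier.fit(X_train, y_train)
print(metrics.classification_report(y_test, classifier.predict(X_test)))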






Classification through decision trees

Classification through decision trees on the built-in iris dataset

from sklearn import datasets
from sklearn import model_selection
from sklearn import tree 
from sklearn import metrics

# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)

# classification for loaded dataset
classifier = tree.DecisionTreeClassifier()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)

# result
print(metrics.classification_report(y_test, y_test_pred), '\n\n\n')
print(y_test_pred, '\n')
print(metrics.confusion_matrix(y_test, y_test_pred))
OUTPUT
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       0.92      0.92      0.92        13
           2       0.94      0.94      0.94        17

    accuracy                           0.96        45
   macro avg       0.95      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45



[2 0 1 1 0 0 1 1 1 1 2 2 2 1 0 0 2 1 1 0 1 2 2 1 0 1 0 0 0 0 2 2 1 2 1 0 1
 2 1 2 0 2 0 2 0] 
 
[[16  0  0]
 [ 0 12  0]
 [ 0  2 15]]




Classification through decision trees on a real-world dataset (downloaded iris CSV file)

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv(r'C:\Users\userd\Desktop\dataset\iris.csv')
df = df.sample(frac=1)    # shuffle; frac=1 returns all rows in random order
y = df['variety']
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]

# positional split: first 120 rows for training, last 30 for testing
y_train = y.iloc[0:120]
y_test = y.iloc[120:150]
X_train = X.iloc[0:120]
X_test = X.iloc[120:150]

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
clf.predict(X_test)
OUTPUT
array(['Virginica', 'Setosa', 'Virginica', 'Virginica', 'Virginica',
       'Versicolor', 'Setosa', 'Setosa', 'Setosa', 'Virginica',
       'Virginica', 'Virginica', 'Versicolor', 'Versicolor', 'Setosa',
       'Setosa', 'Setosa', 'Virginica', 'Setosa', 'Setosa', 'Versicolor',
       'Virginica', 'Setosa', 'Virginica', 'Setosa', 'Setosa',
       'Versicolor', 'Virginica', 'Virginica', 'Virginica'], dtype=object)


y_test
OUTPUT
array(['Virginica', 'Setosa', 'Virginica', 'Virginica', 'Virginica',
       'Versicolor', 'Setosa', 'Setosa', 'Setosa', 'Virginica',
       'Virginica', 'Virginica', 'Versicolor', 'Versicolor', 'Setosa',
       'Setosa', 'Setosa', 'Virginica', 'Setosa', 'Setosa', 'Versicolor',
       'Virginica', 'Setosa', 'Virginica', 'Setosa', 'Setosa',
       'Versicolor', 'Virginica', 'Virginica', 'Versicolor'], dtype=object)
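
The two arrays agree everywhere except the last element, so a single test sample is misclassified. A one-line check, reusing clf, X_test, and y_test from the block above:

from sklearn import metrics

# fraction of correct test predictions: 29/30 ≈ 0.97 for the run shown above
print(metrics.accuracy_score(y_test, clf.predict(X_test)))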





Classification through random forest methods

Classification through random forest methods on the built-in iris dataset

from sklearn import datasets
from sklearn import model_selection
from sklearn import ensemble
from sklearn import metrics

# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)

# classification for loaded dataset
classifier = ensemble.RandomForestClassifier()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)

# result
print(metrics.classification_report(y_test, y_test_pred), '\n\n\n')
print(y_test_pred, '\n')
print(metrics.confusion_matrix(y_test, y_test_pred))
OUTPUT
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      1.00      1.00        14
           2       1.00      1.00      1.00        16

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45



[1 0 0 1 2 1 0 0 2 1 0 1 2 2 0 0 2 2 1 2 0 0 2 2 0 2 2 1 2 0 1 2 2 0 0 1 0
 0 0 1 2 2 1 0 0] 

[[17  0  0]
 [ 0 12  1]
 [ 0  3 12]]




Classification through random forest methods on a real-world dataset (downloaded iris CSV file)
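
A minimal sketch under the same assumed CSV path and column layout:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

df = pd.read_csv(r'C:\Users\userd\Desktop\dataset\iris.csv')
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = df['variety']

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7)

classifier = RandomForestClassifier()    # default: 100 trees in recent scikit-learn versions
classifier.fit(X_train, y_train)
print(metrics.classification_report(y_test, classifier.predict(X_test)))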






The resulting classification accuracy for each classifier

The resulting classification accuracy for each classifier on the built-in iris dataset

from sklearn import datasets
from sklearn import model_selection
from sklearn import linear_model
from sklearn import metrics
from sklearn import tree
from sklearn import neighbors
from sklearn import svm
from sklearn import ensemble

import matplotlib.pyplot as plt
import numpy as np



train_size_vec = np.linspace(0.1, 0.9, 30)
classifiers = [linear_model.LogisticRegression,
               neighbors.KNeighborsClassifier,
               svm.SVC,
               tree.DecisionTreeClassifier,
               ensemble.RandomForestClassifier]
cm_diags = np.zeros((3, len(train_size_vec), len(classifiers)), dtype=float)


iris = datasets.load_iris()
for n, train_size in enumerate(train_size_vec):
    X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=train_size)
    for m, Classifier in enumerate(classifiers):
        classifier = Classifier()
        classifier.fit(X_train, y_train)
        y_test_p = classifier.predict(X_test)
        # diagonal counts divided by class sizes give the per-class accuracy (recall);
        # labels=[0, 1, 2] keeps the matrix 3x3 even if a class is absent from a small test set
        cm_diags[:, n, m] = metrics.confusion_matrix(y_test, y_test_p, labels=[0, 1, 2]).diagonal()
        cm_diags[:, n, m] /= np.bincount(y_test, minlength=3)


fig, axes = plt.subplots(1, len(classifiers), figsize=(12, 3))
for m, Classifier in enumerate(classifiers):
    axes[m].plot(train_size_vec, cm_diags[2, :, m], label=iris.target_names[2])
    axes[m].plot(train_size_vec, cm_diags[1, :, m], label=iris.target_names[1])
    axes[m].plot(train_size_vec, cm_diags[0, :, m], label=iris.target_names[0])
    axes[m].set_title(type(Classifier()).__name__)
    axes[m].set_ylim(0, 1.1)
    axes[m].set_ylabel("classification accuracy")
    axes[m].set_xlabel("training size ratio")
    axes[m].legend(loc=4)

plt.show()
OUTPUT

[Figure: per-class classification accuracy versus training-size ratio for each of the five classifiers]






Implement with TensorFlow
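
A minimal sketch of the same iris classification with a small Keras network (the layer sizes and epoch count are arbitrary, untuned choices):

import tensorflow as tf
from sklearn import datasets, model_selection

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)

# a small dense network: 4 features in, 3 class probabilities out
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=100, verbose=0)
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(accuracy)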





Implement with PyTorch
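
A minimal PyTorch sketch of the same task (network shape, learning rate, and epoch count are arbitrary, untuned choices):

import torch
from sklearn import datasets, model_selection

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)

# convert the numpy arrays to tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.long)

# a small dense network: 4 features in, 3 class scores (logits) out
model = torch.nn.Sequential(
    torch.nn.Linear(4, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

# full-batch training loop
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    optimizer.step()

# accuracy on the held-out test set
with torch.no_grad():
    accuracy = (model(X_test).argmax(dim=1) == y_test).float().mean()
print(accuracy.item())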





List of posts that follow this article

