AI02, Classification
Back to the previous page | Machine learning
List of posts to read before reading this article
Contents
- A wide variety of alternative algorithms for classification
- Implementation with sklearn
- Implementation with tensorflow
- Implementation with pytorch
A wide variety of alternative algorithms for classification
- Logistic regression (regression-based)
- KNN, k-nearest neighbor methods (distance-based)
- SVM, support vector machines (distance-based)
- Decision trees (rule-based)
- Random forest methods (rule-based)
- Naive Bayes (probabilistic; see the sketch after this list)
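Naive Bayes is the only method in this list without a worked example in the sklearn section below; here is a minimal sketch in the same style, assuming GaussianNB (a common choice when the features are continuous, as the iris measurements are):
from sklearn import datasets
from sklearn import model_selection
from sklearn import naive_bayes
from sklearn import metrics
# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
# classification for loaded dataset; GaussianNB models each feature as normally distributed within each class
classifier = naive_bayes.GaussianNB()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
# result
print(metrics.classification_report(y_test, y_test_pred))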
Implementation with sklearn
Classification through logistic regression
Classification through logistic regression on the iris dataset
from sklearn import datasets
from sklearn import model_selection
from sklearn import linear_model
from sklearn import metrics
# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
# classification for loaded dataset
classifier = linear_model.LogisticRegression()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
# result
print(metrics.classification_report(y_test, y_test_pred), '\n\n\n')
print(y_test_pred, '\n')
print(metrics.confusion_matrix(y_test, y_test_pred))
OUTPUT
In the confusion matrix below, the diagonal elements correspond to the number of samples correctly classified for each class, and the off-diagonal elements are the numbers of incorrectly classified samples. More precisely, element C[i, j] of the confusion matrix C is the number of samples of class i that were classified as class j. Note that train_test_split is called without a fixed random_state, so the split, and therefore the exact numbers shown, will vary from run to run.
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        14
           1       1.00      0.93      0.97        15
           2       0.94      1.00      0.97        16

    accuracy                           0.98        45
   macro avg       0.98      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45

[2 2 2 2 2 0 2 2 1 0 0 0 2 1 1 0 0 0 1 1 1 2 1 2 1 0 1 0 1 0 1 0 2 1 1 1 2
 2 0 0 1 1 0 2 0]

[[14  0  0]
 [ 0 14  1]
 [ 0  0 16]]
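Continuing from the code above, the overall accuracy is the sum of the diagonal of the confusion matrix divided by the total number of test samples, (14 + 14 + 16) / 45 ≈ 0.98, which matches the accuracy row of the report:
cm = metrics.confusion_matrix(y_test, y_test_pred)
print(cm.diagonal().sum() / cm.sum())  # overall accuracy, about 0.98 here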
SUPPLEMENT1
>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> type(iris)
<class 'sklearn.utils.Bunch'>
>>> type(iris.data)
<class 'numpy.ndarray'>
>>> iris.target_names
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
>>> iris.feature_names
['sepal length (cm)',
'sepal width (cm)',
'petal length (cm)',
'petal width (cm)']
>>> iris.data.shape
(150, 4)
>>> iris.target.shape
(150,)
SUPPLEMENT2
iris dataset
import pandas as pd
from sklearn import datasets
iris = datasets.load_iris()
iris.feature_names.append('target')  # name for the fifth column, which holds the class codes (0, 1, 2), not the names
df1 = pd.DataFrame(iris.data)
df2 = pd.DataFrame(iris.target)
df = pd.concat([df1,df2], axis=1)
df.columns = iris.feature_names
print(df)
     sepal length (cm)  sepal width (cm)  ...  petal width (cm)  target
0                  5.1               3.5  ...               0.2       0
1                  4.9               3.0  ...               0.2       0
2                  4.7               3.2  ...               0.2       0
3                  4.6               3.1  ...               0.2       0
4                  5.0               3.6  ...               0.2       0
..                 ...               ...  ...               ...     ...
145                6.7               3.0  ...               2.3       2
146                6.3               2.5  ...               1.9       2
147                6.5               3.0  ...               2.0       2
148                6.2               3.4  ...               2.3       2
149                5.9               3.0  ...               1.8       2

[150 rows x 5 columns]
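As a side note, scikit-learn 0.23 and later can build the same DataFrame (with the class codes in a 'target' column) in a single call:
from sklearn import datasets
df = datasets.load_iris(as_frame=True).frame  # features plus a 'target' column
print(df)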
Classification through logistic regression on a real-world dataset
download csv file of iris data
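No code block accompanies this case on the page; below is a minimal sketch, assuming the same iris.csv path and column layout (sepal.length, sepal.width, petal.length, petal.width, variety) as the decision-tree example later in this section:
import pandas as pd
from sklearn.linear_model import LogisticRegression
df = pd.read_csv(r'C:\Users\userd\Desktop\dataset\iris.csv')
df = df.sample(frac=1)  # shuffle all rows
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = df['variety']
clf = LogisticRegression(max_iter=1000)  # max_iter raised from the default to ensure convergence
clf.fit(X[0:120], y[0:120])
print(clf.predict(X[120:150]))  # an array of predicted variety labels for the 30 held-out rows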
Classification through k-nearest neighbor methods
Classification through k-nearest neighbor methods on the iris dataset
from sklearn import datasets
from sklearn import model_selection
from sklearn import neighbors
from sklearn import metrics
# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
# classification for loaded dataset
classifier = neighbors.KNeighborsClassifier()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
# result
print(metrics.classification_report(y_test, y_test_pred), '\n\n\n')
print(y_test_pred, '\n')
print(metrics.confusion_matrix(y_test, y_test_pred))
OUTPUT
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       0.93      0.93      0.93        15
           2       0.92      0.92      0.92        13

    accuracy                           0.96        45
   macro avg       0.95      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45

[1 1 2 2 0 2 1 2 0 1 2 0 2 1 0 2 2 0 0 0 2 2 0 2 2 2 1 1 1 1 2 2 0 2 0 0 2
 1 1 2 2 1 0 1 0]

[[17  0  0]
 [ 0 14  1]
 [ 0  1 12]]
Classification through k-nearest neighbor methods on a real-world dataset
download csv file of iris data
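As above, no code was given for this case; a minimal sketch under the same iris.csv assumptions as the logistic-regression sketch:
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
df = pd.read_csv(r'C:\Users\userd\Desktop\dataset\iris.csv')
df = df.sample(frac=1)  # shuffle all rows
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = df['variety']
clf = KNeighborsClassifier()  # default k = 5 neighbors
clf.fit(X[0:120], y[0:120])
print(clf.predict(X[120:150]))  # an array of predicted variety labels for the 30 held-out rows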
Classification through support vector machines
Classification through support vector machines on the iris dataset
from sklearn import datasets
from sklearn import model_selection
from sklearn import svm
from sklearn import metrics
# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
# classification for loaded dataset
classifier = svm.SVC()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
# result
print(metrics.classification_report(y_test, y_test_pred), '\n\n\n')
print(y_test_pred, '\n')
print(metrics.confusion_matrix(y_test, y_test_pred))
OUTPUT
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       1.00      1.00      1.00        17
           2       1.00      1.00      1.00        11

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

[2 1 1 1 0 2 0 1 2 2 0 0 0 0 2 1 1 0 2 2 2 0 2 2 2 1 0 1 0 2 0 0 1 0 0 1 2
 2 1 0 0 2 2 2 1]

[[17  0  0]
 [ 0 17  0]
 [ 0  0 11]]
Classification through support vector machines on a real-world dataset
download csv file of iris data
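Again, no code was given for this case; a minimal sketch under the same iris.csv assumptions as the earlier sketches:
import pandas as pd
from sklearn.svm import SVC
df = pd.read_csv(r'C:\Users\userd\Desktop\dataset\iris.csv')
df = df.sample(frac=1)  # shuffle all rows
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = df['variety']
clf = SVC()  # default RBF kernel
clf.fit(X[0:120], y[0:120])
print(clf.predict(X[120:150]))  # an array of predicted variety labels for the 30 held-out rows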
Classification through decision trees
Classification through decision trees on the iris dataset
from sklearn import datasets
from sklearn import model_selection
from sklearn import tree
from sklearn import metrics
# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
# classification for loaded dataset
classifier = tree.DecisionTreeClassifier()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
# result
print(metrics.classification_report(y_test, y_test_pred), '\n\n\n')
print(y_test_pred, '\n')
print(metrics.confusion_matrix(y_test, y_test_pred))
OUTPUT
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       0.92      0.92      0.92        13
           2       0.94      0.94      0.94        17

    accuracy                           0.96        45
   macro avg       0.95      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45

[2 0 1 1 0 0 1 1 1 1 2 2 2 1 0 0 2 1 1 0 1 2 2 1 0 1 0 0 0 0 2 2 1 2 1 0 1
 2 1 2 0 2 0 2 0]

[[15  0  0]
 [ 0 12  1]
 [ 0  1 16]]
Classification through decision trees on a real-world dataset
download csv file of iris data
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
df = pd.read_csv(r'C:\Users\userd\Desktop\dataset\iris.csv')
df = df.sample(frac=1) # Shuffle, frac=1 means return all rows in random order
y = df['variety']
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y_train = y[0:120]
y_test = y[120:150]
X_train = X[0:120]
X_test = X[120:150]
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
clf.predict(X_test)
OUTPUT
array(['Virginica', 'Setosa', 'Virginica', 'Virginica', 'Virginica',
'Versicolor', 'Setosa', 'Setosa', 'Setosa', 'Virginica',
'Virginica', 'Virginica', 'Versicolor', 'Versicolor', 'Setosa',
'Setosa', 'Setosa', 'Virginica', 'Setosa', 'Setosa', 'Versicolor',
'Virginica', 'Setosa', 'Virginica', 'Setosa', 'Setosa',
'Versicolor', 'Virginica', 'Virginica', 'Virginica'], dtype=object)
y_test
OUTPUT
array(['Virginica', 'Setosa', 'Virginica', 'Virginica', 'Virginica',
'Versicolor', 'Setosa', 'Setosa', 'Setosa', 'Virginica',
'Virginica', 'Virginica', 'Versicolor', 'Versicolor', 'Setosa',
'Setosa', 'Setosa', 'Virginica', 'Setosa', 'Setosa', 'Versicolor',
'Virginica', 'Setosa', 'Virginica', 'Setosa', 'Setosa',
'Versicolor', 'Virginica', 'Virginica', 'Versicolor'], dtype=object)
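The predicted labels match y_test in all but the last element (Virginica predicted, Versicolor actual), i.e. 29 of 30 test samples are correct, about 0.97 accuracy. This can be checked directly:
print((clf.predict(X_test) == y_test).mean())  # fraction of matching labels, 29/30 here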
Classification through random forest methods
Classification through random forest methods on the iris dataset
from sklearn import datasets
from sklearn import model_selection
from sklearn import ensemble
from sklearn import metrics
# loading dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
# classification for loaded dataset
classifier = ensemble.RandomForestClassifier()
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
# result
print(metrics.classification_report(y_test, y_test_pred), '\n\n\n')
print(y_test_pred, '\n')
print(metrics.confusion_matrix(y_test, y_test_pred))
OUTPUT
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      1.00      1.00        14
           2       1.00      1.00      1.00        16

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

[1 0 0 1 2 1 0 0 2 1 0 1 2 2 0 0 2 2 1 2 0 0 2 2 0 2 2 1 2 0 1 2 2 0 0 1 0
 0 0 1 2 2 1 0 0]

[[15  0  0]
 [ 0 14  0]
 [ 0  0 16]]
Classification through random forest methods on a real-world dataset
download csv file of iris data
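As with the other real-world cases, no code was given; a minimal sketch under the same iris.csv assumptions as the sketches above:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
df = pd.read_csv(r'C:\Users\userd\Desktop\dataset\iris.csv')
df = df.sample(frac=1)  # shuffle all rows
X = df[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = df['variety']
clf = RandomForestClassifier()  # default settings
clf.fit(X[0:120], y[0:120])
print(clf.predict(X[120:150]))  # an array of predicted variety labels for the 30 held-out rows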
The resulting classification accuracy for each classifier
The resulting classification accuracy for each classifier on the iris dataset
from sklearn import datasets
from sklearn import model_selection
from sklearn import linear_model
from sklearn import metrics
from sklearn import tree
from sklearn import neighbors
from sklearn import svm
from sklearn import ensemble
import matplotlib.pyplot as plt
import numpy as np
train_size_vec = np.linspace(0.1, 0.9, 30)
classifiers = [linear_model.LogisticRegression,
               neighbors.KNeighborsClassifier,
               svm.SVC,
               tree.DecisionTreeClassifier,
               ensemble.RandomForestClassifier]
cm_diags = np.zeros((3, len(train_size_vec), len(classifiers)), dtype=float)
iris = datasets.load_iris()
for n, train_size in enumerate(train_size_vec):
    X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=train_size)
    for m, Classifier in enumerate(classifiers):
        classifier = Classifier()
        classifier.fit(X_train, y_train)
        y_test_p = classifier.predict(X_test)
        # the diagonal of the confusion matrix divided by the per-class counts
        # in y_test gives the per-class accuracy (recall) for this split
        cm_diags[:, n, m] = metrics.confusion_matrix(y_test, y_test_p).diagonal()
        cm_diags[:, n, m] /= np.bincount(y_test)
fig, axes = plt.subplots(1, len(classifiers), figsize=(12, 3))
for m, Classifier in enumerate(classifiers):
    axes[m].plot(train_size_vec, cm_diags[2, :, m], label=iris.target_names[2])
    axes[m].plot(train_size_vec, cm_diags[1, :, m], label=iris.target_names[1])
    axes[m].plot(train_size_vec, cm_diags[0, :, m], label=iris.target_names[0])
    axes[m].set_title(Classifier.__name__)  # class name, without instantiating
    axes[m].set_ylim(0, 1.1)
    axes[m].set_ylabel("classification accuracy")
    axes[m].set_xlabel("training size ratio")
    axes[m].legend(loc=4)
plt.show()
OUTPUT: five side-by-side panels, one per classifier, each plotting the per-class classification accuracy of the three iris classes against the training size ratio.
Implementation with tensorflow
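This section was left empty on the page. A minimal sketch of how the iris classification might look with tf.keras; the layer width, optimizer, and epoch count are arbitrary choices, not taken from the original:
import tensorflow as tf
from sklearn import datasets
from sklearn import model_selection
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
# a small fully connected network: 4 features in, 3 class probabilities out
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax')])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=100, verbose=0)
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(accuracy)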
Implementation with pytorch
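Likewise empty on the page; a minimal pytorch sketch under the same assumptions (network size, learning rate, and epoch count are arbitrary):
import torch
import torch.nn as nn
from sklearn import datasets
from sklearn import model_selection
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = model_selection.train_test_split(iris.data, iris.target, train_size=0.7)
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.long)
# a small fully connected network: 4 features in, 3 class scores out
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(200):
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)
    loss.backward()
    optimizer.step()
with torch.no_grad():
    accuracy = (model(X_test).argmax(dim=1) == y_test).float().mean()
print(accuracy.item())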
List of posts that follow this article
Reference