AI02, Regression


Simple linear regression
- Model performance indicators for training dataset
- Diagnosis for regression
Multivariate linear regression
- Model performance indicators for training dataset
- Diagnosis for regression
  - Multicollinearity
Logistic regression model
- Model performance indicators
- Diagnosis for regression
Nonlinear regression
- Linearization
Penalty of regression model
Implementation with a variety of libraries

Simple linear regression

y=\alpha +\beta x,

y_i = \alpha + \beta x_i + \varepsilon_i.

DERIVING

${\widehat {\varepsilon }}_{i}=y_{i}-\alpha -\beta x_{i}.$ ${\text{Find }}\min _{\alpha ,\,\beta }Q(\alpha ,\beta ),\quad {\text{for }}Q(\alpha ,\beta )=\sum _{i=1}^{n}{\widehat {\varepsilon }}_{i}^{\,2}=\sum _{i=1}^{n}(y_{i}-\alpha -\beta x_{i})^{2}\ .$ ${\begin{aligned}{\widehat {\alpha }}&={\bar {y}}-{\widehat {\beta }}\,{\bar {x}},\\[5pt]{\widehat {\beta }}&={\frac {\sum _{i=1}^{n}(x_{i}-{\bar {x}})(y_{i}-{\bar {y}})}{\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}}\\[6pt]&={\frac {s_{x,y}}{s_{x}^{2}}}\\[5pt]&=r_{xy}{\frac {s_{y}}{s_{x}}}.\\[6pt]\end{aligned}}$

RESULTS

${\begin{aligned}{\widehat {\alpha }}&={\bar {y}}-{\widehat {\beta }}\,{\bar {x}},\\[5pt]{\widehat {\beta }}&={\frac {\sum _{i=1}^{n}(x_{i}-{\bar {x}})(y_{i}-{\bar {y}})}{\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}}\\[6pt]&={\frac {s_{x,y}}{s_{x}^{2}}}\\[5pt]&=r_{xy}{\frac {s_{y}}{s_{x}}}.\\[6pt]\end{aligned}}$ $r_{xy}={\frac {{\overline {xy}}-{\bar {x}}{\bar {y}}}{\sqrt {\left({\overline {x^{2}}}-{\bar {x}}^{2}\right)\left({\overline {y^{2}}}-{\bar {y}}^{2}\right)}}}.$ ${\overline {xy}}={\frac {1}{n}}\sum _{i=1}^{n}x_{i}y_{i}.$
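
The closed-form estimators above can be checked numerically. The following is a minimal sketch on synthetic data; the seed, sample size, noise level, and the true values $\alpha = 5$, $\beta = 3$ are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
x = rng.random(200)
y = 5.0 + 3.0 * x + rng.normal(scale=0.1, size=x.size)   # alpha = 5, beta = 3, plus noise

# beta_hat = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
x_bar, y_bar = x.mean(), y.mean()
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# alpha_hat = y_bar - beta_hat * x_bar
alpha_hat = y_bar - beta_hat * x_bar

# Equivalent form: beta_hat = r_xy * s_y / s_x
r_xy = np.corrcoef(x, y)[0, 1]
beta_from_r = r_xy * y.std(ddof=1) / x.std(ddof=1)

print(alpha_hat, beta_hat, beta_from_r)
```

Both expressions for $\widehat{\beta}$ should agree up to floating-point error, and they should match what `np.polyfit(x, y, 1)` returns for a degree-1 fit.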

Model performance indicators for training dataset

Mean squared error (MSE) $\operatorname {MSE} ={\frac {1}{n}}\sum _{i=1}^{n}(Y_{i}-{\hat {Y_{i}}})^{2}.$
Mean absolute percentage error (MAPE) ${\mbox{M}}={\frac {100\%}{n}}\sum _{t=1}^{n}\left|{\frac {A_{t}-F_{t}}{A_{t}}}\right|,$
Adjusted R-squared ${\bar {R}}^{2}=1-(1-R^{2}){n-1 \over n-p-1}$
Akaike information criterion (AIC) $\mathrm {AIC} \,=\,2k-2\ln({\hat {L}})$
Bayesian information criterion (BIC) $\mathrm {BIC} =\ln(n)k-2\ln({\widehat {L}}).$
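
As a rough illustration, the indicators listed above can be computed directly with NumPy. This sketch assumes a Gaussian error model for the likelihood, takes `p` as the number of predictors excluding the intercept, and counts `k = p + 1` parameters; other conventions for `k` exist, so the AIC/BIC values may differ by a constant from those reported by a particular library. The function name `regression_metrics` is hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred, p):
    """Training-set indicators; p = number of predictors excluding the intercept."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    resid = y_true - y_pred
    ssr = np.sum(resid ** 2)

    mse = ssr / n
    mape = 100.0 / n * np.sum(np.abs(resid / y_true))   # undefined if any y_true == 0
    r2 = 1.0 - ssr / np.sum((y_true - y_true.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

    # Gaussian log-likelihood evaluated at the MLE sigma^2 = SSR / n (assumption)
    log_l = -0.5 * n * (np.log(2.0 * np.pi) + np.log(ssr / n) + 1.0)
    k = p + 1                                           # coefficients incl. intercept (assumed convention)
    aic = 2 * k - 2 * log_l
    bic = np.log(n) * k - 2 * log_l
    return {"MSE": mse, "MAPE": mape, "R2": r2, "adj_R2": adj_r2, "AIC": aic, "BIC": bic}

metrics = regression_metrics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], p=1)
print(metrics)
```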

Diagnosis for regression

Residuals Scatter plot
Normal Q-Q Plot
Residual vs Fitted plot

Multivariate linear regression

y_{i}=\beta _{1}x_{i1}+\cdots +\beta _{p}x_{ip}+\varepsilon _{i}=\mathbf {x} _{i}^{\rm {T}}{\boldsymbol {\beta }}+\varepsilon _{i},\qquad i=1,\ldots ,n,

\mathbf {y} =X{\boldsymbol {\beta }}+{\boldsymbol {\varepsilon }},\,

\mathbf {y} ={\begin{pmatrix}y_{1}\\y_{2}\\\vdots \\y_{n}\end{pmatrix}},\quad \mathbf {X} ={\begin{pmatrix}\mathbf {x} _{1}^{\rm {T}}\\\mathbf {x} _{2}^{\rm {T}}\\\vdots \\\mathbf {x} _{n}^{\rm {T}}\end{pmatrix}}={\begin{pmatrix}x_{11}&\cdots &x_{1p}\\x_{21}&\cdots &x_{2p}\\\vdots &\ddots &\vdots \\x_{n1}&\cdots &x_{np}\end{pmatrix}},\quad {\boldsymbol {\beta }}={\begin{pmatrix}\beta _{1}\\\beta _{2}\\\vdots \\\beta _{p}\end{pmatrix}},\quad {\boldsymbol {\varepsilon }}={\begin{pmatrix}\varepsilon _{1}\\\varepsilon _{2}\\\vdots \\\varepsilon _{n}\end{pmatrix}}.

DERIVING

${\hat {\boldsymbol {\beta }}}={\underset {\boldsymbol {\beta }}{\operatorname {arg\,min} }}\,S({\boldsymbol {\beta }}),$ $S({\boldsymbol {\beta }})=\sum _{i=1}^{n}{\bigl |}y_{i}-\sum _{j=1}^{p}X_{ij}\beta _{j}{\bigr |}^{2}={\bigl \|}\mathbf {y} -\mathbf {X} {\boldsymbol {\beta }}{\bigr \|}^{2}.$ $(\mathbf {X} ^{\rm {T}}\mathbf {X} ){\hat {\boldsymbol {\beta }}}=\mathbf {X} ^{\rm {T}}\mathbf {y} .$ ${\hat {\boldsymbol {\beta }}}=(\mathbf {X} ^{\rm {T}}\mathbf {X} )^{-1}\mathbf {X} ^{\rm {T}}\mathbf {y} .$

RESULTS

${\hat {\boldsymbol {\beta }}}=(\mathbf {X} ^{\rm {T}}\mathbf {X} )^{-1}\mathbf {X} ^{\rm {T}}\mathbf {y} .$

Model performance indicators for training dataset

Mean squared error (MSE) $\operatorname {MSE} ={\frac {1}{n}}\sum _{i=1}^{n}(Y_{i}-{\hat {Y_{i}}})^{2}.$
Mean absolute percentage error (MAPE) ${\mbox{M}}={\frac {100\%}{n}}\sum _{t=1}^{n}\left|{\frac {A_{t}-F_{t}}{A_{t}}}\right|,$
Adjusted R-squared ${\bar {R}}^{2}=1-(1-R^{2}){n-1 \over n-p-1}$
Akaike information criterion (AIC) $\mathrm {AIC} \,=\,2k-2\ln({\hat {L}})$
Bayesian information criterion (BIC) $\mathrm {BIC} =\ln(n)k-2\ln({\widehat {L}}).$

Diagnosis for regression

Residuals Scatter plot
Normal Q-Q Plot
Residual vs Fitted plot

Multicollinearity

Multicollinearity refers to a situation in which two or more explanatory variables in a multiple regression model are highly linearly related.

Detection of multicollinearity : Variance inflation factor(VIF) $\mathrm {VIF} _{i}={\frac {1}{1-R_{i}^{2}}}$
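
To make the definition concrete, here is a minimal sketch that computes $\mathrm{VIF}_i$ by regressing each column on the remaining columns and applying $1/(1-R_i^2)$. It is an illustration, not the statsmodels implementation: the helper name `vif` is hypothetical, and this version always adds an intercept to each auxiliary regression, which is an assumption of the sketch.

```python
import numpy as np

def vif(X):
    """VIF_i = 1 / (1 - R_i^2), regressing column i on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for i in range(p):
        target = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[i] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)                 # unrelated to x1
x3 = x1 + 0.1 * rng.normal(size=500)      # nearly a copy of x1
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

In this toy example the nearly duplicated pair `x1`, `x3` should receive large VIF values while the independent column `x2` stays close to 1.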

Way to relieve multicollinearity

With elimination of variables (feature selection)
- Variables selection
  - Forward selection
  - Backward selection
  - Stepwise
- Correlation coefficient
- Lasso
- etc
Without elimination of variables
- AutoEncoder
- PCA
- Ridge
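
As one example from the second group, ridge regression keeps all variables but shrinks their coefficients. The sketch below uses the closed form $(X^{T}X+\lambda I)^{-1}X^{T}y$ on centered data with an unpenalized intercept; the function name `ridge_fit`, the synthetic data, and the value $\lambda = 10$ are assumptions made for illustration only.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge on centered data; the intercept is not penalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return intercept, coef

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)          # almost collinear with x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=200)

_, coef_ols = ridge_fit(X, y, lam=0.0)         # lam = 0 reduces to ordinary least squares
_, coef_ridge = ridge_fit(X, y, lam=10.0)
print(coef_ols, coef_ridge)
```

With two nearly identical inputs, the least-squares coefficients can split the effect between `x1` and `x2` in an unstable way, whereas the ridge solution tends to share it evenly while keeping their sum close to the true effect.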

Logistic regression model

OUTPUT

Model performance indicators

Mean squared error (MSE) $\operatorname {MSE} ={\frac {1}{n}}\sum _{i=1}^{n}(Y_{i}-{\hat {Y_{i}}})^{2}.$
Mean absolute percentage error (MAPE) ${\mbox{M}}={\frac {100\%}{n}}\sum _{t=1}^{n}\left|{\frac {A_{t}-F_{t}}{A_{t}}}\right|,$
Confusion matrix
Receiver operating characteristic curve, or ROC curve
Area under the curve, or AUC
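
The classification-oriented indicators can be sketched with NumPy alone. The helpers `confusion_matrix` and `roc_auc` below are hypothetical illustrations (library versions exist, for example in scikit-learn); the AUC is computed through its rank interpretation, the probability that a randomly chosen positive is scored above a randomly chosen negative. The labels, scores, and the 0.5 threshold are toy assumptions.

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 matrix: rows = actual class (0, 1), columns = predicted class (0, 1)."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(np.asarray(y_true), np.asarray(y_pred)):
        cm[t, p] += 1
    return cm

def roc_auc(y_true, scores):
    """AUC = P(score of a positive > score of a negative), ties counted as 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]          # all positive/negative pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])        # predicted probabilities (toy values)
y_pred = (scores >= 0.5).astype(int)            # threshold of 0.5 is an assumption

cm = confusion_matrix(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(cm)
print(auc)
```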

Diagnosis for regression

Residuals Scatter plot
Normal Q-Q Plot
Residual vs Fitted plot

Nonlinear regression

\mathbf {y} \sim f(\mathbf {x} ,{\boldsymbol {\beta }})

f(x_{i},{\boldsymbol {\beta }})\approx f(x_{i},0)+\sum _{j}J_{ij}\beta _{j}

J_{ij}={\frac {\partial f(x_{i},{\boldsymbol {\beta }})}{\partial \beta _{j}}}

{\hat {\boldsymbol {\beta }}}\approx (\mathbf {J} ^{\rm {T}}\mathbf {J} )^{-1}\mathbf {J} ^{\rm {T}}\mathbf {y} .
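
In practice the linearized least-squares step is usually applied repeatedly to the current residuals (the Gauss-Newton idea). The following is a minimal sketch, assuming the model $f(x;a,b)=a e^{bx}$, synthetic data, and a simple step-halving safeguard that is added here for robustness and is not part of the formula above.

```python
import numpy as np

def model(x, beta):
    a, b = beta
    return a * np.exp(b * x)

def jacobian(x, beta):
    """J_ij = d f(x_i, beta) / d beta_j for f = a * exp(b * x)."""
    a, b = beta
    e = np.exp(b * x)
    return np.column_stack([e, a * x * e])

def gauss_newton(x, y, beta0, n_iter=50):
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        r = y - model(x, beta)                       # current residuals
        J = jacobian(x, beta)
        step = np.linalg.solve(J.T @ J, J.T @ r)     # (J^T J)^{-1} J^T r
        t = 1.0
        # Halve the step while it would increase the sum of squared errors
        while t > 1e-6 and np.sum((y - model(x, beta + t * step)) ** 2) > np.sum(r ** 2):
            t *= 0.5
        beta = beta + t * step
    return beta

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = model(x, (2.0, 1.5)) + rng.normal(scale=0.05, size=x.size)   # true a = 2, b = 1.5
beta_hat = gauss_newton(x, y, beta0=(1.0, 1.0))
print(beta_hat)
```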

Linearization

y=ae^{bx}U\,\!

\ln {(y)}=\ln {(a)}+bx+u,\,\!
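
A short sketch of this linearization, assuming the exponential model $y = a e^{bx} U$ with multiplicative error $U$ and $u = \ln U$: taking logarithms turns the fit into ordinary least squares on $(x, \ln y)$. The true values $a = 2$, $b = 1.5$ and the noise level are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 100)
a_true, b_true = 2.0, 1.5
U = np.exp(rng.normal(scale=0.05, size=x.size))    # multiplicative error, u = ln(U)
y = a_true * np.exp(b_true * x) * U

# ln(y) = ln(a) + b*x + u  ->  straight-line fit on (x, ln y)
b_hat, ln_a_hat = np.polyfit(x, np.log(y), 1)      # returns [slope, intercept]
a_hat = np.exp(ln_a_hat)
print(a_hat, b_hat)
```

Note that least squares on the log scale minimizes errors in $\ln y$ rather than in $y$, so the result can differ from a direct nonlinear fit when the errors are not multiplicative.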

Penalty of regression model

OUTPUT

Implementation with a variety of libraries

Regression with statsmodels

STEP	INPUT	PROCESS	OUTPUT
1	csv file	Data preprocessing	train dataset, test dataset
2	train dataset, test dataset	Regression analysis	full model
3	full model	Modify regression model	forward model, backward model, stepwise model

Simple linear regression on an artificial dataset

Data preprocessing

import numpy as np
import pandas as pd
import statsmodels.api as sm

def f(x,a,b):
    return a*x + b

x = np.random.random(1000)
a = 3
b = 5

target = f(x,a,b)
df_input = pd.DataFrame(x)
df_target = pd.DataFrame(target)
df = pd.concat([df_input, df_target], axis=1)
df.columns = ['input','target']
Input = df['input']
Target = df['target']
constant_input = sm.add_constant(Input, has_constant='add')

Data : Input

Input.head()

OUTPUT

  0.830166
  0.542949
  0.357683
  0.688297
  0.645634
Name: input, dtype: float64

constant_input.head()

OUTPUT

	const	input
1.0	0.830166
1.0	0.542949
1.0	0.357683
1.0	0.688297
1.0	0.645634

Data : Target

Target.head()

OUTPUT

  7.490499
  6.628847
  6.073050
  7.064890
  6.936902
Name: target, dtype: float64

Regression analysis

model = sm.OLS(Target, constant_input)
fitted_model = model.fit()
fitted_model.summary()

OUTPUT : Model results

OUTPUT

# Regression coefficients
fitted_model.params

const    5.0
input    3.0
dtype: float64

Estimated values v.s. Original values for target

Estimated values : $\hat{y} = \hat{a}x + \hat{b} \to A\vec{X}$

np.dot(constant_input, fitted_model.params)

array([7.49049949, 6.62884716, 6.07305033, 7.0648904 , 6.93690197,
       6.04064573, 6.5576149 , 6.74231639, 6.73183572, 7.07796106,
       ...
       5.74719815, 6.58978836, 6.25943715, 5.88547536, 7.40743629,
       5.77773424, 5.99074449, 6.12113732, 6.13392177, 6.92979226])

Original values : $y = ax + b$

f(x,a,b)

array([7.49049949, 6.62884716, 6.07305033, 7.0648904 , 6.93690197,
       6.04064573, 6.5576149 , 6.74231639, 6.73183572, 7.07796106,
       ...
       5.74719815, 6.58978836, 6.25943715, 5.88547536, 7.40743629,
       5.77773424, 5.99074449, 6.12113732, 6.13392177, 6.92979226])

Model diagnosis

Residual

fitted_model.resid

0     -1.776357e-15
1     -2.664535e-15
2     -2.664535e-15
...
...
998   -3.552714e-15
999   -1.776357e-15
Length: 1000, dtype: float64

Residual summation

np.sum(fitted_model.resid)

-2.652988939644274e-12

Visualization of residuals

fitted_model.resid.plot()


Model prediction

Prediction

sample = np.random.random(10)
constant_sample = sm.add_constant(sample, has_constant='add')
fitted_model.predict(constant_sample)

array([5.20371122, 6.07617745, 7.77126507, 5.35615965, 7.44019585,
       5.94592521, 5.94306959, 6.56256376, 6.09420242, 6.39866773])

Verification

f(sample,a,b)

array([5.20371122, 6.07617745, 7.77126507, 5.35615965, 7.44019585,
       5.94592521, 5.94306959, 6.56256376, 6.09420242, 6.39866773])

Curve fitting

import matplotlib.pyplot as plt

plt.plot(x, f(x,a,b), 'x', lw=0, label="data")
plt.plot(x, 3*x + 5, label='result')            # from fitted_model.params
plt.ylim(0,10)
plt.legend()
plt.show()


Multivariate linear regression on an artificial dataset

Data preprocessing

import numpy as np
import pandas as pd
import statsmodels.api as sm

def f(x,y,z,a,b,c,r):
    return a*x + b*y + c*z + r

x = np.random.random(100)
y = np.random.random(100)
z = np.random.random(100)
a = 20
b = 50
c = 7
r = 3

target = f(x,y,z,a,b,c,r)
df_input1 = pd.DataFrame(x)
df_input2 = pd.DataFrame(y)
df_input3 = pd.DataFrame(z)
df_target = pd.DataFrame(target)
df = pd.concat([df_input1, df_input2, df_input3, df_target], axis=1)
df.columns = ['input1', 'input2', 'input3', 'target']
Input = df[['input1', 'input2', 'input3']]
Target = df['target']
constant_input = sm.add_constant(Input, has_constant='add')

Data : Input

Input.head()

OUTPUT

	input1		input2		input3
0.957632	0.276408	0.345041
0.821460	0.653252	0.549964
0.506590	0.261659	0.393543
0.500052	0.056861	0.041176
0.267245	0.639603	0.769945

constant_input.head()

	const	input1		input2		input3
1.0	0.957632	0.276408	0.345041
1.0	0.821460	0.653252	0.549964
1.0	0.506590	0.261659	0.393543
1.0	0.500052	0.056861	0.041176
1.0	0.267245	0.639603	0.769945

Data : Target

Target.head()

  38.388309
  55.941549
  28.969561
  16.132320
  45.714644
Name: target, dtype: float64

Regression analysis

model = sm.OLS(Target, constant_input)
fitted_model = model.fit()
fitted_model.summary()

OUTPUT : Model results

OUTPUT

# Regression coefficients
fitted_model.params

const      3.0
input1    20.0
input2    50.0
input3     7.0
dtype: float64

Verification

${\hat {\boldsymbol {\beta }}}=(\mathbf {X} ^{\rm {T}}\mathbf {X} )^{-1}\mathbf {X} ^{\rm {T}}\mathbf {y} .$

from numpy import linalg

B = linalg.inv(np.dot(constant_input.T, constant_input))
np.dot(np.dot(B, constant_input.T),target)

array([ 3., 20., 50.,  7.])
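
Forming the explicit inverse works here, but solving the normal equations or calling a least-squares routine is generally preferred numerically. A minimal sketch, assuming a synthetic noise-free design matrix with the same coefficients as above (the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.random((100, 3))])   # const + 3 inputs
beta_true = np.array([3.0, 20.0, 50.0, 7.0])
y = X @ beta_true                                            # noise-free target

# Solve (X^T X) beta = X^T y without forming the inverse
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver working on X directly
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_solve, beta_lstsq)
```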

Estimated values v.s. Original values for target

Estimated values : $\hat{s} = \hat{a}x + \hat{b}y + \hat{c}z + \hat{r} \to \hat{S}=\hat{A}X$

np.dot(constant_input, fitted_model.params)

array([38.38830915, 55.94154925, 28.96956111, 16.13232006, 45.71464433,
       35.66915115, 54.48721376, 35.3255576 , 17.57414208, 12.2024595 ,
       40.89621614, 33.05053896, 14.50158372, 38.67065445, 53.48709859,
       42.59911466, 54.53748705, 46.69193071, 18.38867267, 45.87908774,
       40.6693773 , 36.01122162, 11.68815215, 44.31558167, 41.80645497,
       49.37841447, 47.09113841, 53.96541726, 36.77556825, 23.52950327,
       38.64777777, 34.16965497, 50.26840963, 40.02741955, 44.16716928,
       42.3150182 , 25.99497711, 41.40530879, 27.36066677, 47.86915385,
       25.70932186, 24.86294199, 55.0745327 , 22.98417126, 32.50294778,
       17.8420005 , 61.35284467, 36.43911886, 49.76839721, 50.56165004,
       40.71292581, 36.41847389, 23.38460759, 59.30680731, 39.40085223,
       25.87053451, 40.11977913, 24.80379252, 53.38541514, 60.33980335,
       45.01501126, 51.37600515, 48.30658941, 30.00273352, 42.44824437,
       52.17219373, 21.72628098, 74.51174471, 47.41694199, 16.47748332,
       16.18670621, 26.77202999, 67.7470938 , 46.24996358, 41.99306012,
       35.44894821, 28.65531671, 29.65139668, 53.31971577, 22.99141254,
       51.20655459, 50.54080656, 66.4153275 , 39.5569899 , 39.35911854,
       39.014512  , 34.51325153, 35.5253818 , 50.8264082 , 18.76223046,
       66.14916028, 37.23867282, 28.3269569 , 53.50468595, 55.85972521,
       54.48370671, 61.87997791, 24.69145197, 47.79432371, 41.2612825 ])

Original values : $s = ax + by + cz + r$

f(x,y,z,a,b,c,r)

array([38.38830915, 55.94154925, 28.96956111, 16.13232006, 45.71464433,
       35.66915115, 54.48721376, 35.3255576 , 17.57414208, 12.2024595 ,
       40.89621614, 33.05053896, 14.50158372, 38.67065445, 53.48709859,
       42.59911466, 54.53748705, 46.69193071, 18.38867267, 45.87908774,
       40.6693773 , 36.01122162, 11.68815215, 44.31558167, 41.80645497,
       49.37841447, 47.09113841, 53.96541726, 36.77556825, 23.52950327,
       38.64777777, 34.16965497, 50.26840963, 40.02741955, 44.16716928,
       42.3150182 , 25.99497711, 41.40530879, 27.36066677, 47.86915385,
       25.70932186, 24.86294199, 55.0745327 , 22.98417126, 32.50294778,
       17.8420005 , 61.35284467, 36.43911886, 49.76839721, 50.56165004,
       40.71292581, 36.41847389, 23.38460759, 59.30680731, 39.40085223,
       25.87053451, 40.11977913, 24.80379252, 53.38541514, 60.33980335,
       45.01501126, 51.37600515, 48.30658941, 30.00273352, 42.44824437,
       52.17219373, 21.72628098, 74.51174471, 47.41694199, 16.47748332,
       16.18670621, 26.77202999, 67.7470938 , 46.24996358, 41.99306012,
       35.44894821, 28.65531671, 29.65139668, 53.31971577, 22.99141254,
       51.20655459, 50.54080656, 66.4153275 , 39.5569899 , 39.35911854,
       39.014512  , 34.51325153, 35.5253818 , 50.8264082 , 18.76223046,
       66.14916028, 37.23867282, 28.3269569 , 53.50468595, 55.85972521,
       54.48370671, 61.87997791, 24.69145197, 47.79432371, 41.2612825 ])

Model diagnosis

Residual

fitted_model.resid

   0.000000e+00
  -7.105427e-15
   1.776357e-14
   2.486900e-14
   7.105427e-15
   7.105427e-15
          ...     
  0.000000e+00
 -1.421085e-14
  2.486900e-14
  1.421085e-14
  7.105427e-15
Length: 100, dtype: float64

Visualization of residuals

import matplotlib.pyplot as plt

fitted_model.resid.plot()
plt.show()


Model prediction

Prediction

sample = np.random.random((10,3))
constant_sample = sm.add_constant(sample, has_constant='add')
fitted_model.predict(constant_sample)

array([47.18460385, 29.42685672, 45.34542694, 21.18367219, 60.83667819,
       31.51219742, 16.92413439, 31.70573065, 19.8877936 , 47.38519353])

Verification

f(sample[:,0],sample[:,1],sample[:,2],a,b,c,r)

array([47.18460385, 29.42685672, 45.34542694, 21.18367219, 60.83667819,
       31.51219742, 16.92413439, 31.70573065, 19.8877936 , 47.38519353])

Curve fitting

Multivariate linear regression on a real-world dataset

Dataset download ｜ URL

Dataset Description

CRIM - per capita crime rate by town
ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS - proportion of non-retail business acres per town.
CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX - nitric oxides concentration (parts per 10 million)
RM - average number of rooms per dwelling
AGE - proportion of owner-occupied units built prior to 1940
DIS - weighted distances to five Boston employment centres
RAD - index of accessibility to radial highways
TAX - full-value property-tax rate per $10,000
PTRATIO - pupil-teacher ratio by town
B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT - % lower status of the population
MEDV - Median value of owner-occupied homes in $1000’s

Data preprocessing

import pandas as pd
import numpy as np
import statsmodels.api as sm

boston = pd.read_csv(r'C:\Users\userd\Desktop\dataset\boston_house.csv')
Input_s = boston[['CRIM', 'RM', 'LSTAT']]
Input_L = boston[['CRIM', 'RM', 'LSTAT', 'B', 'TAX', 'AGE', 'ZN', 'NOX', 'INDUS']]
Target = boston['MEDV']
constant_Input_s = sm.add_constant(Input_s, has_constant='add')
constant_Input_L = sm.add_constant(Input_L, has_constant='add')

Data : Input

constant_Input_s.head()

	const	CRIM	RM	LSTAT
1.0	0.00632	6.575	4.98
1.0	0.02731	6.421	9.14
1.0	0.02729	7.185	4.03
1.0	0.03237	6.998	2.94
1.0	0.06905	7.147	5.33

constant_Input_L.head()

	const	CRIM	RM	LSTAT	B	TAX	AGE	ZN	NOX	INDUS
1.0	0.00632	6.575	4.98	396.90	296	65.2	18.0	0.538	2.31
1.0	0.02731	6.421	9.14	396.90	242	78.9	0.0	0.469	7.07
1.0	0.02729	7.185	4.03	392.83	242	61.1	0.0	0.469	7.07
1.0	0.03237	6.998	2.94	394.63	222	45.8	0.0	0.458	2.18
1.0	0.06905	7.147	5.33	396.90	222	54.2	0.0	0.458	2.18

Data diagnosis

Multicollinearity : Variance inflation factor(VIF)

from statsmodels.stats.outliers_influence import variance_inflation_factor

vif = pd.DataFrame()
vif['VIF Factor'] = [variance_inflation_factor(Input_L.values, i) for i in range(Input_L.shape[1])]
vif['features'] = Input_L.columns
vif

	VIF Factor	features
1.917332	CRIM
46.535369	RM
8.844137	LSTAT
16.856737	B
19.923044	TAX
18.457503	AGE
2.086502	ZN
72.439753	NOX
12.642137	INDUS

Multicollinearity : Correlation coefficient

Input_L.corr()

	CRIM		RM		LSTAT		B		TAX		AGE		ZN		NOX		INDUS
CRIM	1.000000	-0.219247	0.455621	-0.385064	0.582764	0.352734	-0.200469	0.420972	0.406583
RM	-0.219247	1.000000	-0.613808	0.128069	-0.292048	-0.240265	0.311991	-0.302188	-0.391676
LSTAT	0.455621	-0.613808	1.000000	-0.366087	0.543993	0.602339	-0.412995	0.590879	0.603800
B	-0.385064	0.128069	-0.366087	1.000000	-0.441808	-0.273534	0.175520	-0.380051	-0.356977
TAX	0.582764	-0.292048	0.543993	-0.441808	1.000000	0.506456	-0.314563	0.668023	0.720760
AGE	0.352734	-0.240265	0.602339	-0.273534	0.506456	1.000000	-0.569537	0.731470	0.644779
ZN	-0.200469	0.311991	-0.412995	0.175520	-0.314563	-0.569537	1.000000	-0.516604	-0.533828
NOX	0.420972	-0.302188	0.590879	-0.380051	0.668023	0.731470	-0.516604	1.000000	0.763651
INDUS	0.406583	-0.391676	0.603800	-0.356977	0.720760	0.644779	-0.533828	0.763651	1.000000

import matplotlib.pyplot as plt
import seaborn as sns
cmap = sns.light_palette('darkgray', as_cmap=True)
sns.heatmap(Input_L.corr(), annot=True, cmap=cmap)
plt.show()


sns.pairplot(Input_L)
plt.show()


Regression analysis

model_s = sm.OLS(Target, constant_Input_s)
model_L = sm.OLS(Target, constant_Input_L)
fitted_model_s = model_s.fit()
fitted_model_L = model_L.fit()

fitted_model_s.summary()

OUTPUT : Model results

OUTPUT

fitted_model_s.params

const   -2.562251
CRIM    -0.102941
RM       5.216955
LSTAT   -0.578486
dtype: float64

fitted_model_L.summary()

OUTPUT : Model results

OUTPUT

fitted_model_L.params

const   -7.108827
CRIM    -0.045293
RM       5.092238
LSTAT   -0.565133
B        0.008974
TAX     -0.006025
AGE      0.023619
ZN       0.029377
NOX      3.483832
INDUS    0.029270
dtype: float64

Model diagnosis

Residual analysis

import matplotlib.pyplot as plt

fitted_model_s.resid.plot(label="base")
fitted_model_L.resid.plot(label="full")
plt.legend()
plt.show()


Modify regression model(based on backward elimination)

from sklearn.model_selection import train_test_split

# Data preprocessing
Input1 = Input_L.drop('NOX', axis=1)
Input2 = Input_L.drop(['NOX','RM'], axis=1)
constant_Input1 = sm.add_constant(Input1, has_constant='add')
constant_Input2 = sm.add_constant(Input2, has_constant='add')
X = constant_Input_L
X1 = constant_Input1
X2 = constant_Input2
y = Target

train_x, test_x, train_y, test_y = train_test_split(X, y, train_size=0.7, test_size=0.3, random_state = 1)
train_x1, test_x1, train_y1, test_y1 = train_test_split(X1, y, train_size=0.7, test_size=0.3, random_state = 1)
train_x2, test_x2, train_y2, test_y2 = train_test_split(X2, y, train_size=0.7, test_size=0.3, random_state = 1)

# Regression analysis
model = sm.OLS(train_y, train_x)
model1 = sm.OLS(train_y1, train_x1)
model2 = sm.OLS(train_y2, train_x2)

fitted_model = model.fit()
fitted_model1 = model1.fit()
fitted_model2 = model2.fit()

Data : Input

constant_Input1.head()

	const	CRIM	RM	LSTAT	B	TAX	AGE	ZN	INDUS
1.0	0.00632	6.575	4.98	396.90	296	65.2	18.0	2.31
1.0	0.02731	6.421	9.14	396.90	242	78.9	0.0	7.07
1.0	0.02729	7.185	4.03	392.83	242	61.1	0.0	7.07
1.0	0.03237	6.998	2.94	394.63	222	45.8	0.0	2.18
1.0	0.06905	7.147	5.33	396.90	222	54.2	0.0	2.18

constant_Input2.head()

	const	CRIM	LSTAT	B	TAX	AGE	ZN	INDUS
1.0	0.00632	4.98	396.90	296	65.2	18.0	2.31
1.0	0.02731	9.14	396.90	242	78.9	0.0	7.07
1.0	0.02729	4.03	392.83	242	61.1	0.0	7.07
1.0	0.03237	2.94	394.63	222	45.8	0.0	2.18
1.0	0.06905	5.33	396.90	222	54.2	0.0	2.18

Data diagnosis

Multicollinearity : Variance inflation factor(VIF)

vif1 = pd.DataFrame()
vif2 = pd.DataFrame()
vif1['VIF1 Factor'] = [variance_inflation_factor(Input1.values, i) for i in range(Input1.shape[1])]
vif2['VIF2 Factor'] = [variance_inflation_factor(Input2.values, i) for i in range(Input2.shape[1])]
vif1['features1'] = Input1.columns
vif2['features2'] = Input2.columns
pd.concat([vif,vif1,vif2], axis=1)

	VIF Factor	features	VIF1 Factor	features1	VIF2 Factor	features2
1.917332	CRIM		1.916648	CRIM		1.907517	CRIM
46.535369	RM		30.806301	RM		7.933529	LSTAT
8.844137	LSTAT		8.171214	LSTAT		7.442569	B
16.856737	B		16.735751	B		16.233237	TAX
19.923044	TAX		18.727105	TAX		13.765377	AGE
18.457503	AGE		16.339792	AGE		1.820070	ZN
2.086502	ZN		2.074500	ZN		11.116823	INDUS
72.439753	NOX		11.217461	INDUS		NaN		NaN
12.642137	INDUS		NaN		NaN		NaN		NaN

OUTPUT : Model results

fitted_model.summary()

OUTPUT

fitted_model1.summary()

OUTPUT

fitted_model2.summary()

OUTPUT

Model prediction

plt.plot(np.array(fitted_model.predict(test_x)), label="model with full variables")
plt.plot(np.array(fitted_model1.predict(test_x1)), label="model1 eliminated 1 variable")
plt.plot(np.array(fitted_model2.predict(test_x2)), label="model2 eliminated 2 variables")
plt.plot(np.array(test_y), label="true")
plt.legend()
plt.show()


Model diagnosis

Residual analysis

plt.plot(np.array(test_y.values-fitted_model.predict(test_x)),label='residual of model')
plt.plot(np.array(test_y.values-fitted_model1.predict(test_x1)),label='residual of model1')
plt.plot(np.array(test_y.values-fitted_model2.predict(test_x2)),label='residual of model2')
plt.legend()
plt.show()


Model performance

from sklearn.metrics import mean_squared_error

print(mean_squared_error(y_true=test_y.values, y_pred=fitted_model.predict(test_x)))
print(mean_squared_error(y_true=test_y.values, y_pred=fitted_model1.predict(test_x1)))
print(mean_squared_error(y_true=test_y.values, y_pred=fitted_model2.predict(test_x2)))

148631468819843
14006260984654
788453179128304

Multivariate nonlinear regression on a real-world dataset

Dataset download ｜ URL

Data preprocessing

## [0] : Load libraries
import pandas as pd
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split


## [1] : Load dataset
corolla = pd.read_csv(r'C:\Users\userd\Desktop\dataset\ToyotaCorolla.csv')
nCar = corolla.shape[0]
nVar = corolla.shape[1]


## [2] : categorical data-type > binary data-type
# Create dummy variables
dummy_p = np.repeat(0,nCar)
dummy_d = np.repeat(0,nCar)
dummy_c = np.repeat(0,nCar)

# Save index for 'Fuel_Type'
p_idx = np.array(corolla.Fuel_Type == "Petrol")
d_idx = np.array(corolla.Fuel_Type == "Diesel")
c_idx = np.array(corolla.Fuel_Type == "CNG")

# Substitute binary = 1 after slicing
dummy_p[p_idx] = 1  # Petrol
dummy_d[d_idx] = 1  # Diesel
dummy_c[c_idx] = 1  # CNG


## [3] : Eliminate unnecessary variables and add dummy variables
Fuel = pd.DataFrame({'Petrol': dummy_p, 'Diesel': dummy_d, 'CNG': dummy_c})
corolla_ = corolla.dropna().drop(['Id','Model','Fuel_Type'], axis=1, inplace=False)
mlr_data = pd.concat((corolla_, Fuel), axis=1)  # pass axis by keyword; the positional form is rejected by recent pandas


## [4] : Add bias
constant_mlr_data = sm.add_constant(mlr_data, has_constant='add')


## [5] : Divide into input data and output data
feature_columns = list(constant_mlr_data.columns.difference(['Price']))
X = constant_mlr_data[feature_columns]
y = constant_mlr_data.Price
train_x, test_x, train_y, test_y = train_test_split(X, y, train_size=0.7, test_size=0.3)

[1] Data : Input

corolla.head()

	Id	Model						Price	Age_08_04	Mfg_Month	Mfg_Year	KM		Fuel_Type	HP	Met_Color	...	Central_Lock	Powered_Windows	Power_Steering	Radio	Mistlamps		Sport_Model	Backseat_Divider	Metallic_Rim	Radio_cassette	Tow_Bar
1	TOYOTA Corolla 2.0 D4D HATCHB TERRA 2/3-Doors	13500	23		10		2002		46986		Diesel		90	1		...	1		1		1		0	0			0		1			0		0		0
2	TOYOTA Corolla 2.0 D4D HATCHB TERRA 2/3-Doors	13750	23		10		2002		72937		Diesel		90	1		...	1		0		1		0	0			0		1			0		0		0
3	?TOYOTA Corolla 2.0 D4D HATCHB TERRA 2/3-Doors	13950	24		9		2002		41711		Diesel		90	1		...	0		0		1		0	0			0		1			0		0		0
4	TOYOTA Corolla 2.0 D4D HATCHB TERRA 2/3-Doors	14950	26		7		2002		48000		Diesel		90	0		...	0		0		1		0	0			0		1			0		0		0
5	TOYOTA Corolla 2.0 D4D HATCHB SOL 2/3-Doors	13750	30		3		2002		38500		Diesel		90	0		...	1		1		1		0	1			0		1			0		0		0
rows × 37 columns

print('nCar: %d' % nCar, 'nVar: %d' % nVar )

nCar: 1436 nVar: 37

[2] Data : Input

dummy_p

array([0, 0, 0, ..., 0, 0, 0])

dummy_d

array([0, 0, 0, ..., 0, 0, 0])

dummy_c

array([0, 0, 0, ..., 0, 0, 0])

p_idx

array([False, False, False, ...,  True,  True,  True])

d_idx

array([ True,  True,  True, ..., False, False, False])

c_idx

array([False, False, False, ..., False, False, False])

dummy_p

array([0, 0, 0, ..., 1, 1, 1])

dummy_d

array([1, 1, 1, ..., 0, 0, 0])

dummy_c

array([0, 0, 0, ..., 0, 0, 0])

[3] Data : Input

Fuel.head()

	Petrol	Diesel	CNG
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0

Fuel.shape

(1436, 3)

corolla_.head()

	Price	Age_08_04	Mfg_Month	Mfg_Year	KM	HP	Met_Color	Automatic	cc	Doors	...	Central_Lock	Powered_Windows	Power_Steering	Radio	Mistlamps	Sport_Model	Backseat_Divider	Metallic_Rim	Radio_cassette	Tow_Bar
13500	23		10		2002		46986	90	1		0		2000	3	...	1		1		1		0	0		0		1			0		0		0
13750	23		10		2002		72937	90	1		0		2000	3	...	1		0		1		0	0		0		1			0		0		0
13950	24		9		2002		41711	90	1		0		2000	3	...	0		0		1		0	0		0		1			0		0		0
14950	26		7		2002		48000	90	0		0		2000	3	...	0		0		1		0	0		0		1			0		0		0
13750	30		3		2002		38500	90	0		0		2000	3	...	1		1		1		0	1		0		1			0		0		0
rows × 34 columns

corolla_.shape

(1436, 34)

mlr_data.head()

	Price	Age_08_04	Mfg_Month	Mfg_Year	KM	HP	Met_Color	Automatic	cc	Doors	...	Radio	Mistlamps	Sport_Model	Backseat_Divider	Metallic_Rim	Radio_cassette	Tow_Bar	Petrol	Diesel	CNG
13500	23		10		2002		46986	90	1		0		2000	3	...	0	0		0		1			0		0		0	0	1	0
13750	23		10		2002		72937	90	1		0		2000	3	...	0	0		0		1			0		0		0	0	1	0
13950	24		9		2002		41711	90	1		0		2000	3	...	0	0		0		1			0		0		0	0	1	0
14950	26		7		2002		48000	90	0		0		2000	3	...	0	0		0		1			0		0		0	0	1	0
13750	30		3		2002		38500	90	0		0		2000	3	...	0	1		0		1			0		0		0	0	1	0
rows × 37 columns

mlr_data.shape

(1436, 37)

[4] Data : Input

constant_mlr_data.head()

	const	Price	Age_08_04	Mfg_Month	Mfg_Year	KM	HP	Met_Color	Automatic	cc	...	Radio	Mistlamps	Sport_Model	Backseat_Divider	Metallic_Rim	Radio_cassette	Tow_Bar	Petrol	Diesel	CNG
1.0	13500	23		10		2002		46986	90	1		0		2000	...	0	0		0		1			0		0		0	0	1	0
1.0	13750	23		10		2002		72937	90	1		0		2000	...	0	0		0		1			0		0		0	0	1	0
1.0	13950	24		9		2002		41711	90	1		0		2000	...	0	0		0		1			0		0		0	0	1	0
1.0	14950	26		7		2002		48000	90	0		0		2000	...	0	0		0		1			0		0		0	0	1	0
1.0	13750	30		3		2002		38500	90	0		0		2000	...	0	1		0		1			0		0		0	0	1	0
rows × 38 columns

[5] Data : Input

mlr_data.columns.difference(['Price'])

Index(['ABS', 'Age_08_04', 'Airbag_1', 'Airbag_2', 'Airco', 'Automatic',
       'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider',
       'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders',
       'Diesel', 'Doors', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Met_Color',
       'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Mistlamps',
       'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio',
       'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'cc', 'const'],
      dtype='object')

X.head()

	ABS	Age_08_04	Airbag_1	Airbag_2	Airco	Automatic	Automatic_airco	BOVAG_Guarantee			Backseat_Divider	Boardcomputer	...	Power_Steering	Powered_Windows	Quarterly_Tax	Radio	Radio_cassette		Sport_Model	Tow_Bar	Weight	cc	const
1	23		1		1		0	0		0		1				1			1		...	1		1		210		0	0			0		0	1165	2000	1.0
1	23		1		1		1	0		0		1				1			1		...	1		0		210		0	0			0		0	1165	2000	1.0
1	24		1		1		0	0		0		1				1			1		...	1		0		210		0	0			0		0	1165	2000	1.0
1	26		1		1		0	0		0		1				1			1		...	1		0		210		0	0			0		0	1165	2000	1.0
1	30		1		1		1	0		0		1				1			1		...	1		1		210		0	0			0		0	1170	2000	1.0
rows × 37 columns

y.head()

  13500
  13750
  13950
  14950
  13750
Name: Price, dtype: int64

>>> print(X.shape, y.shape)
(1436, 37) (1436,)

>>> print(train_x.shape, test_x.shape, train_y.shape, test_y.shape)
(1005, 37) (431, 37) (1005,) (431,)

Data diagnosis

Multicollinearity : Variance inflation factor(VIF)

from statsmodels.stats.outliers_influence import variance_inflation_factor

vif = pd.DataFrame()
vif["VIF Factor"] = [variance_inflation_factor(mlr_data.values, i) for i in range(mlr_data.shape[1])]
vif["features"] = mlr_data.columns
vif

	VIF Factor	features
10.953474	Price
inf		Age_08_04
inf		Mfg_Month
inf		Mfg_Year
2.400334	KM
2.621514	HP
1.143778	Met_Color
1.121303	Automatic
1.258641	cc
1.352288	Doors
0.000000	Cylinders
1.271814	Gears
5.496805	Quarterly_Tax
4.487491	Weight
1.210815	Mfr_Guarantee
1.392485	BOVAG_Guarantee
1.573026	Guarantee_Period
2.276617	ABS
1.612758	Airbag_1
3.106933	Airbag_2
1.846429	Airco
2.009866	Automatic_airco
2.647036	Boardcomputer
1.564446	CD_Player
4.593157	Central_Lock
4.676311	Powered_Windows
1.582829	Power_Steering
62.344621	Radio
2.076846	Mistlamps
1.510131	Sport_Model
2.702141	Backseat_Divider
1.349642	Metallic_Rim
62.172860	Radio_cassette
1.153760	Tow_Bar
inf		Petrol
inf		Diesel
inf		CNG
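
A likely explanation for the `inf` entries on Petrol, Diesel and CNG is the so-called dummy variable trap: the three indicator columns always sum to 1, so each one is an exact linear combination of the others and a constant. The sketch below illustrates this with a toy label array (the labels are made up for the example) and shows that dropping one level as the reference category restores full column rank.

```python
import numpy as np

# Toy fuel labels (illustrative only, not taken from the dataset)
fuel = np.array(["Petrol", "Diesel", "CNG", "Petrol", "Diesel", "Petrol"])
petrol = (fuel == "Petrol").astype(int)
diesel = (fuel == "Diesel").astype(int)
cng = (fuel == "CNG").astype(int)
const = np.ones(fuel.size)

full = np.column_stack([const, petrol, diesel, cng])    # const == petrol + diesel + cng
reduced = np.column_stack([const, petrol, diesel])      # CNG used as the reference level

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)
print(full.shape[1], rank_full)        # 4 columns, rank 3: one column is redundant
print(reduced.shape[1], rank_reduced)  # 3 columns, rank 3: full column rank
```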

Regression analysis

# Train the MLR(fitting regression model)
full_model = sm.OLS(train_y, train_x)
fitted_full_model = full_model.fit()
fitted_full_model.summary()

OUTPUT : Model results

R² is high, and most of the variables are statistically significant.

Model diagnosis

Normal Q-Q Plot

import matplotlib.pyplot as plt

# checking residual
res = fitted_full_model.resid  # residual

# q-q plot
fig = sm.qqplot(res, fit=True, line='45')


Residual vs Fitted plot

import matplotlib.pyplot as plt

pred_y=fitted_full_model.predict(train_x)
res = fitted_full_model.resid  # residual

fig = plt.scatter(pred_y,res, s=4)
plt.xlim(4000,30000)
plt.xlabel('Fitted values')
plt.ylabel('Residual')


Model prediction

import matplotlib.pyplot as plt

pred_y = fitted_full_model.predict(test_x)  # predictions on the validation data
plt.plot(np.array(test_y-pred_y),label="pred_full")
plt.legend()
plt.show()


Model performance

from sklearn.metrics import mean_squared_error

pred_y = fitted_full_model.predict(test_x)
mean_squared_error(y_true= test_y, y_pred= pred_y)

1441488.811437499

Modify regression model(Variables selection)

# [0]
import time
import itertools

# [1]
def processSubset(X, y, feature_set):
    model = sm.OLS(y, X[list(feature_set)])  # build the model on the chosen features
    regr = model.fit()                       # fit the model
    AIC = regr.aic                           # AIC of the fitted model
    return {"model": regr, "AIC": AIC}

# [2] getBest: select and return the model with the lowest AIC
def getBest(X, y, k):
    tic = time.time()  # start time
    results = []       # storage for the fitted models
    for combo in itertools.combinations(X.columns.difference(['const']), k):  # every combination of k variables
        combo = list(combo) + ['const']
        results.append(processSubset(X, y, feature_set=combo))  # store each fitted model
    models = pd.DataFrame(results)  # convert to a DataFrame
    # select the model with the lowest AIC
    best_model = models.loc[models['AIC'].argmin()]
    toc = time.time()  # end time
    print("Processed ", models.shape[0], "models on", k, "predictors in", (toc - tic),
          "seconds.")
    return best_model

print(getBest(X=train_x, y=train_y,k=2))

OUTPUT

Processed  630 models on 2 predictors in 1.8201320171356201 seconds.
AIC                                                17516.6
model    <statsmodels.regression.linear_model.Regressio...
Name: 211, dtype: object

SUPPLEMENT [1]

processSubset(X=train_x, y=train_y, feature_set = feature_columns)

{'model': <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x1fbccd16080>,
 'AIC': 16970.52868834004}

processSubset(X=train_x, y=train_y, feature_set = feature_columns[0:5])

{'model': <statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x000001FBCCDEB358>, 'AIC': 19176.91230693121}
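The two calls above differ only in the feature set, and the lower AIC (the full feature set) indicates the preferred model. As a sketch of what the criterion measures, the block below computes AIC = 2k - 2 ln(L) for a linear model with Gaussian errors from its residual sum of squares. The function name `gaussian_aic` and the toy numbers are assumptions for illustration; the formula is the standard Gaussian log-likelihood form and should correspond to the `aic` attribute used above when k counts every estimated coefficient, including the constant.

```python
import math


def gaussian_aic(rss, n, k):
    """AIC = 2k - 2 ln(L) for a linear model with Gaussian errors.

    rss: residual sum of squares, n: number of observations,
    k: number of estimated coefficients (including the constant).
    """
    log_likelihood = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return 2 * k - 2 * log_likelihood


# Toy numbers (made up): the third coefficient barely reduces the RSS.
aic_small = gaussian_aic(rss=10.0, n=100, k=2)
aic_large = gaussian_aic(rss=9.9, n=100, k=3)
print(aic_small, aic_large)
print("prefer smaller model:", aic_small < aic_large)
```

An extra variable lowers the AIC only if it reduces the residual sum of squares by enough to offset the penalty of 2 per coefficient.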

SUPPLEMENT [2]

for combo in itertools.combinations(train_x.columns.difference(['const']), 2):
    print((list(combo)+['const']))

OUTPUT

['ABS', 'Age_08_04', 'const']
['ABS', 'Airbag_1', 'const']
['ABS', 'Airbag_2', 'const']
['ABS', 'Airco', 'const']
['ABS', 'Automatic', 'const']
['ABS', 'Automatic_airco', 'const']
['ABS', 'BOVAG_Guarantee', 'const']
['ABS', 'Backseat_Divider', 'const']
['ABS', 'Boardcomputer', 'const']
['ABS', 'CD_Player', 'const']
['ABS', 'CNG', 'const']
['ABS', 'Central_Lock', 'const']
['ABS', 'Cylinders', 'const']
['ABS', 'Diesel', 'const']
['ABS', 'Doors', 'const']
['ABS', 'Gears', 'const']
['ABS', 'Guarantee_Period', 'const']
['ABS', 'HP', 'const']
['ABS', 'KM', 'const']
['ABS', 'Met_Color', 'const']
['ABS', 'Metallic_Rim', 'const']
['ABS', 'Mfg_Month', 'const']
['ABS', 'Mfg_Year', 'const']
['ABS', 'Mfr_Guarantee', 'const']
['ABS', 'Mistlamps', 'const']
['ABS', 'Petrol', 'const']
['ABS', 'Power_Steering', 'const']
['ABS', 'Powered_Windows', 'const']
['ABS', 'Quarterly_Tax', 'const']
['ABS', 'Radio', 'const']
['ABS', 'Radio_cassette', 'const']
['ABS', 'Sport_Model', 'const']
['ABS', 'Tow_Bar', 'const']
['ABS', 'Weight', 'const']
['ABS', 'cc', 'const']
['Age_08_04', 'Airbag_1', 'const']
['Age_08_04', 'Airbag_2', 'const']
['Age_08_04', 'Airco', 'const']
['Age_08_04', 'Automatic', 'const']
['Age_08_04', 'Automatic_airco', 'const']
['Age_08_04', 'BOVAG_Guarantee', 'const']
['Age_08_04', 'Backseat_Divider', 'const']
['Age_08_04', 'Boardcomputer', 'const']
['Age_08_04', 'CD_Player', 'const']
['Age_08_04', 'CNG', 'const']
['Age_08_04', 'Central_Lock', 'const']
['Age_08_04', 'Cylinders', 'const']
['Age_08_04', 'Diesel', 'const']
['Age_08_04', 'Doors', 'const']
['Age_08_04', 'Gears', 'const']
['Age_08_04', 'Guarantee_Period', 'const']
['Age_08_04', 'HP', 'const']
['Age_08_04', 'KM', 'const']
['Age_08_04', 'Met_Color', 'const']
['Age_08_04', 'Metallic_Rim', 'const']
['Age_08_04', 'Mfg_Month', 'const']
['Age_08_04', 'Mfg_Year', 'const']
['Age_08_04', 'Mfr_Guarantee', 'const']
['Age_08_04', 'Mistlamps', 'const']
['Age_08_04', 'Petrol', 'const']
['Age_08_04', 'Power_Steering', 'const']
['Age_08_04', 'Powered_Windows', 'const']
['Age_08_04', 'Quarterly_Tax', 'const']
['Age_08_04', 'Radio', 'const']
['Age_08_04', 'Radio_cassette', 'const']
['Age_08_04', 'Sport_Model', 'const']
['Age_08_04', 'Tow_Bar', 'const']
['Age_08_04', 'Weight', 'const']
['Age_08_04', 'cc', 'const']
['Airbag_1', 'Airbag_2', 'const']
['Airbag_1', 'Airco', 'const']
['Airbag_1', 'Automatic', 'const']
['Airbag_1', 'Automatic_airco', 'const']
['Airbag_1', 'BOVAG_Guarantee', 'const']
['Airbag_1', 'Backseat_Divider', 'const']
['Airbag_1', 'Boardcomputer', 'const']
['Airbag_1', 'CD_Player', 'const']
['Airbag_1', 'CNG', 'const']
['Airbag_1', 'Central_Lock', 'const']
['Airbag_1', 'Cylinders', 'const']
['Airbag_1', 'Diesel', 'const']
['Airbag_1', 'Doors', 'const']
['Airbag_1', 'Gears', 'const']
['Airbag_1', 'Guarantee_Period', 'const']
['Airbag_1', 'HP', 'const']
['Airbag_1', 'KM', 'const']
['Airbag_1', 'Met_Color', 'const']
['Airbag_1', 'Metallic_Rim', 'const']
['Airbag_1', 'Mfg_Month', 'const']
['Airbag_1', 'Mfg_Year', 'const']
['Airbag_1', 'Mfr_Guarantee', 'const']
['Airbag_1', 'Mistlamps', 'const']
['Airbag_1', 'Petrol', 'const']
['Airbag_1', 'Power_Steering', 'const']
['Airbag_1', 'Powered_Windows', 'const']
['Airbag_1', 'Quarterly_Tax', 'const']
['Airbag_1', 'Radio', 'const']
['Airbag_1', 'Radio_cassette', 'const']
['Airbag_1', 'Sport_Model', 'const']
['Airbag_1', 'Tow_Bar', 'const']
['Airbag_1', 'Weight', 'const']
['Airbag_1', 'cc', 'const']
['Airbag_2', 'Airco', 'const']
['Airbag_2', 'Automatic', 'const']
['Airbag_2', 'Automatic_airco', 'const']
['Airbag_2', 'BOVAG_Guarantee', 'const']
['Airbag_2', 'Backseat_Divider', 'const']
['Airbag_2', 'Boardcomputer', 'const']
['Airbag_2', 'CD_Player', 'const']
['Airbag_2', 'CNG', 'const']
['Airbag_2', 'Central_Lock', 'const']
['Airbag_2', 'Cylinders', 'const']
['Airbag_2', 'Diesel', 'const']
['Airbag_2', 'Doors', 'const']
['Airbag_2', 'Gears', 'const']
['Airbag_2', 'Guarantee_Period', 'const']
['Airbag_2', 'HP', 'const']
['Airbag_2', 'KM', 'const']
['Airbag_2', 'Met_Color', 'const']
['Airbag_2', 'Metallic_Rim', 'const']
['Airbag_2', 'Mfg_Month', 'const']
['Airbag_2', 'Mfg_Year', 'const']
['Airbag_2', 'Mfr_Guarantee', 'const']
['Airbag_2', 'Mistlamps', 'const']
['Airbag_2', 'Petrol', 'const']
['Airbag_2', 'Power_Steering', 'const']
['Airbag_2', 'Powered_Windows', 'const']
['Airbag_2', 'Quarterly_Tax', 'const']
['Airbag_2', 'Radio', 'const']
['Airbag_2', 'Radio_cassette', 'const']
['Airbag_2', 'Sport_Model', 'const']
['Airbag_2', 'Tow_Bar', 'const']
['Airbag_2', 'Weight', 'const']
['Airbag_2', 'cc', 'const']
['Airco', 'Automatic', 'const']
['Airco', 'Automatic_airco', 'const']
['Airco', 'BOVAG_Guarantee', 'const']
['Airco', 'Backseat_Divider', 'const']
['Airco', 'Boardcomputer', 'const']
['Airco', 'CD_Player', 'const']
['Airco', 'CNG', 'const']
['Airco', 'Central_Lock', 'const']
['Airco', 'Cylinders', 'const']
['Airco', 'Diesel', 'const']
['Airco', 'Doors', 'const']
['Airco', 'Gears', 'const']
['Airco', 'Guarantee_Period', 'const']
['Airco', 'HP', 'const']
['Airco', 'KM', 'const']
['Airco', 'Met_Color', 'const']
['Airco', 'Metallic_Rim', 'const']
['Airco', 'Mfg_Month', 'const']
['Airco', 'Mfg_Year', 'const']
['Airco', 'Mfr_Guarantee', 'const']
['Airco', 'Mistlamps', 'const']
['Airco', 'Petrol', 'const']
['Airco', 'Power_Steering', 'const']
['Airco', 'Powered_Windows', 'const']
['Airco', 'Quarterly_Tax', 'const']
['Airco', 'Radio', 'const']
['Airco', 'Radio_cassette', 'const']
['Airco', 'Sport_Model', 'const']
['Airco', 'Tow_Bar', 'const']
['Airco', 'Weight', 'const']
['Airco', 'cc', 'const']
['Automatic', 'Automatic_airco', 'const']
['Automatic', 'BOVAG_Guarantee', 'const']
['Automatic', 'Backseat_Divider', 'const']
['Automatic', 'Boardcomputer', 'const']
['Automatic', 'CD_Player', 'const']
['Automatic', 'CNG', 'const']
['Automatic', 'Central_Lock', 'const']
['Automatic', 'Cylinders', 'const']
['Automatic', 'Diesel', 'const']
['Automatic', 'Doors', 'const']
['Automatic', 'Gears', 'const']
['Automatic', 'Guarantee_Period', 'const']
['Automatic', 'HP', 'const']
['Automatic', 'KM', 'const']
['Automatic', 'Met_Color', 'const']
['Automatic', 'Metallic_Rim', 'const']
['Automatic', 'Mfg_Month', 'const']
['Automatic', 'Mfg_Year', 'const']
['Automatic', 'Mfr_Guarantee', 'const']
['Automatic', 'Mistlamps', 'const']
['Automatic', 'Petrol', 'const']
['Automatic', 'Power_Steering', 'const']
['Automatic', 'Powered_Windows', 'const']
['Automatic', 'Quarterly_Tax', 'const']
['Automatic', 'Radio', 'const']
['Automatic', 'Radio_cassette', 'const']
['Automatic', 'Sport_Model', 'const']
['Automatic', 'Tow_Bar', 'const']
['Automatic', 'Weight', 'const']
['Automatic', 'cc', 'const']
['Automatic_airco', 'BOVAG_Guarantee', 'const']
['Automatic_airco', 'Backseat_Divider', 'const']
['Automatic_airco', 'Boardcomputer', 'const']
['Automatic_airco', 'CD_Player', 'const']
['Automatic_airco', 'CNG', 'const']
['Automatic_airco', 'Central_Lock', 'const']
['Automatic_airco', 'Cylinders', 'const']
['Automatic_airco', 'Diesel', 'const']
['Automatic_airco', 'Doors', 'const']
['Automatic_airco', 'Gears', 'const']
['Automatic_airco', 'Guarantee_Period', 'const']
['Automatic_airco', 'HP', 'const']
['Automatic_airco', 'KM', 'const']
['Automatic_airco', 'Met_Color', 'const']
['Automatic_airco', 'Metallic_Rim', 'const']
['Automatic_airco', 'Mfg_Month', 'const']
['Automatic_airco', 'Mfg_Year', 'const']
['Automatic_airco', 'Mfr_Guarantee', 'const']
['Automatic_airco', 'Mistlamps', 'const']
['Automatic_airco', 'Petrol', 'const']
['Automatic_airco', 'Power_Steering', 'const']
['Automatic_airco', 'Powered_Windows', 'const']
['Automatic_airco', 'Quarterly_Tax', 'const']
['Automatic_airco', 'Radio', 'const']
['Automatic_airco', 'Radio_cassette', 'const']
['Automatic_airco', 'Sport_Model', 'const']
['Automatic_airco', 'Tow_Bar', 'const']
['Automatic_airco', 'Weight', 'const']
['Automatic_airco', 'cc', 'const']
['BOVAG_Guarantee', 'Backseat_Divider', 'const']
['BOVAG_Guarantee', 'Boardcomputer', 'const']
['BOVAG_Guarantee', 'CD_Player', 'const']
['BOVAG_Guarantee', 'CNG', 'const']
['BOVAG_Guarantee', 'Central_Lock', 'const']
['BOVAG_Guarantee', 'Cylinders', 'const']
['BOVAG_Guarantee', 'Diesel', 'const']
['BOVAG_Guarantee', 'Doors', 'const']
['BOVAG_Guarantee', 'Gears', 'const']
['BOVAG_Guarantee', 'Guarantee_Period', 'const']
['BOVAG_Guarantee', 'HP', 'const']
['BOVAG_Guarantee', 'KM', 'const']
['BOVAG_Guarantee', 'Met_Color', 'const']
['BOVAG_Guarantee', 'Metallic_Rim', 'const']
['BOVAG_Guarantee', 'Mfg_Month', 'const']
['BOVAG_Guarantee', 'Mfg_Year', 'const']
['BOVAG_Guarantee', 'Mfr_Guarantee', 'const']
['BOVAG_Guarantee', 'Mistlamps', 'const']
['BOVAG_Guarantee', 'Petrol', 'const']
['BOVAG_Guarantee', 'Power_Steering', 'const']
['BOVAG_Guarantee', 'Powered_Windows', 'const']
['BOVAG_Guarantee', 'Quarterly_Tax', 'const']
['BOVAG_Guarantee', 'Radio', 'const']
['BOVAG_Guarantee', 'Radio_cassette', 'const']
['BOVAG_Guarantee', 'Sport_Model', 'const']
['BOVAG_Guarantee', 'Tow_Bar', 'const']
['BOVAG_Guarantee', 'Weight', 'const']
['BOVAG_Guarantee', 'cc', 'const']
['Backseat_Divider', 'Boardcomputer', 'const']
['Backseat_Divider', 'CD_Player', 'const']
['Backseat_Divider', 'CNG', 'const']
['Backseat_Divider', 'Central_Lock', 'const']
['Backseat_Divider', 'Cylinders', 'const']
['Backseat_Divider', 'Diesel', 'const']
['Backseat_Divider', 'Doors', 'const']
['Backseat_Divider', 'Gears', 'const']
['Backseat_Divider', 'Guarantee_Period', 'const']
['Backseat_Divider', 'HP', 'const']
['Backseat_Divider', 'KM', 'const']
['Backseat_Divider', 'Met_Color', 'const']
['Backseat_Divider', 'Metallic_Rim', 'const']
['Backseat_Divider', 'Mfg_Month', 'const']
['Backseat_Divider', 'Mfg_Year', 'const']
['Backseat_Divider', 'Mfr_Guarantee', 'const']
['Backseat_Divider', 'Mistlamps', 'const']
['Backseat_Divider', 'Petrol', 'const']
['Backseat_Divider', 'Power_Steering', 'const']
['Backseat_Divider', 'Powered_Windows', 'const']
['Backseat_Divider', 'Quarterly_Tax', 'const']
['Backseat_Divider', 'Radio', 'const']
['Backseat_Divider', 'Radio_cassette', 'const']
['Backseat_Divider', 'Sport_Model', 'const']
['Backseat_Divider', 'Tow_Bar', 'const']
['Backseat_Divider', 'Weight', 'const']
['Backseat_Divider', 'cc', 'const']
['Boardcomputer', 'CD_Player', 'const']
['Boardcomputer', 'CNG', 'const']
['Boardcomputer', 'Central_Lock', 'const']
['Boardcomputer', 'Cylinders', 'const']
['Boardcomputer', 'Diesel', 'const']
['Boardcomputer', 'Doors', 'const']
['Boardcomputer', 'Gears', 'const']
['Boardcomputer', 'Guarantee_Period', 'const']
['Boardcomputer', 'HP', 'const']
['Boardcomputer', 'KM', 'const']
['Boardcomputer', 'Met_Color', 'const']
['Boardcomputer', 'Metallic_Rim', 'const']
['Boardcomputer', 'Mfg_Month', 'const']
['Boardcomputer', 'Mfg_Year', 'const']
['Boardcomputer', 'Mfr_Guarantee', 'const']
['Boardcomputer', 'Mistlamps', 'const']
['Boardcomputer', 'Petrol', 'const']
['Boardcomputer', 'Power_Steering', 'const']
['Boardcomputer', 'Powered_Windows', 'const']
['Boardcomputer', 'Quarterly_Tax', 'const']
['Boardcomputer', 'Radio', 'const']
['Boardcomputer', 'Radio_cassette', 'const']
['Boardcomputer', 'Sport_Model', 'const']
['Boardcomputer', 'Tow_Bar', 'const']
['Boardcomputer', 'Weight', 'const']
['Boardcomputer', 'cc', 'const']
['CD_Player', 'CNG', 'const']
['CD_Player', 'Central_Lock', 'const']
['CD_Player', 'Cylinders', 'const']
['CD_Player', 'Diesel', 'const']
['CD_Player', 'Doors', 'const']
['CD_Player', 'Gears', 'const']
['CD_Player', 'Guarantee_Period', 'const']
['CD_Player', 'HP', 'const']
['CD_Player', 'KM', 'const']
['CD_Player', 'Met_Color', 'const']
['CD_Player', 'Metallic_Rim', 'const']
['CD_Player', 'Mfg_Month', 'const']
['CD_Player', 'Mfg_Year', 'const']
['CD_Player', 'Mfr_Guarantee', 'const']
['CD_Player', 'Mistlamps', 'const']
['CD_Player', 'Petrol', 'const']
['CD_Player', 'Power_Steering', 'const']
['CD_Player', 'Powered_Windows', 'const']
['CD_Player', 'Quarterly_Tax', 'const']
['CD_Player', 'Radio', 'const']
['CD_Player', 'Radio_cassette', 'const']
['CD_Player', 'Sport_Model', 'const']
['CD_Player', 'Tow_Bar', 'const']
['CD_Player', 'Weight', 'const']
['CD_Player', 'cc', 'const']
['CNG', 'Central_Lock', 'const']
['CNG', 'Cylinders', 'const']
['CNG', 'Diesel', 'const']
['CNG', 'Doors', 'const']
['CNG', 'Gears', 'const']
['CNG', 'Guarantee_Period', 'const']
['CNG', 'HP', 'const']
['CNG', 'KM', 'const']
['CNG', 'Met_Color', 'const']
['CNG', 'Metallic_Rim', 'const']
['CNG', 'Mfg_Month', 'const']
['CNG', 'Mfg_Year', 'const']
['CNG', 'Mfr_Guarantee', 'const']
['CNG', 'Mistlamps', 'const']
['CNG', 'Petrol', 'const']
['CNG', 'Power_Steering', 'const']
['CNG', 'Powered_Windows', 'const']
['CNG', 'Quarterly_Tax', 'const']
['CNG', 'Radio', 'const']
['CNG', 'Radio_cassette', 'const']
['CNG', 'Sport_Model', 'const']
['CNG', 'Tow_Bar', 'const']
['CNG', 'Weight', 'const']
['CNG', 'cc', 'const']
['Central_Lock', 'Cylinders', 'const']
['Central_Lock', 'Diesel', 'const']
['Central_Lock', 'Doors', 'const']
['Central_Lock', 'Gears', 'const']
['Central_Lock', 'Guarantee_Period', 'const']
['Central_Lock', 'HP', 'const']
['Central_Lock', 'KM', 'const']
['Central_Lock', 'Met_Color', 'const']
['Central_Lock', 'Metallic_Rim', 'const']
['Central_Lock', 'Mfg_Month', 'const']
['Central_Lock', 'Mfg_Year', 'const']
['Central_Lock', 'Mfr_Guarantee', 'const']
['Central_Lock', 'Mistlamps', 'const']
['Central_Lock', 'Petrol', 'const']
['Central_Lock', 'Power_Steering', 'const']
['Central_Lock', 'Powered_Windows', 'const']
['Central_Lock', 'Quarterly_Tax', 'const']
['Central_Lock', 'Radio', 'const']
['Central_Lock', 'Radio_cassette', 'const']
['Central_Lock', 'Sport_Model', 'const']
['Central_Lock', 'Tow_Bar', 'const']
['Central_Lock', 'Weight', 'const']
['Central_Lock', 'cc', 'const']
['Cylinders', 'Diesel', 'const']
['Cylinders', 'Doors', 'const']
['Cylinders', 'Gears', 'const']
['Cylinders', 'Guarantee_Period', 'const']
['Cylinders', 'HP', 'const']
['Cylinders', 'KM', 'const']
['Cylinders', 'Met_Color', 'const']
['Cylinders', 'Metallic_Rim', 'const']
['Cylinders', 'Mfg_Month', 'const']
['Cylinders', 'Mfg_Year', 'const']
['Cylinders', 'Mfr_Guarantee', 'const']
['Cylinders', 'Mistlamps', 'const']
['Cylinders', 'Petrol', 'const']
['Cylinders', 'Power_Steering', 'const']
['Cylinders', 'Powered_Windows', 'const']
['Cylinders', 'Quarterly_Tax', 'const']
['Cylinders', 'Radio', 'const']
['Cylinders', 'Radio_cassette', 'const']
['Cylinders', 'Sport_Model', 'const']
['Cylinders', 'Tow_Bar', 'const']
['Cylinders', 'Weight', 'const']
['Cylinders', 'cc', 'const']
['Diesel', 'Doors', 'const']
['Diesel', 'Gears', 'const']
['Diesel', 'Guarantee_Period', 'const']
['Diesel', 'HP', 'const']
['Diesel', 'KM', 'const']
['Diesel', 'Met_Color', 'const']
['Diesel', 'Metallic_Rim', 'const']
['Diesel', 'Mfg_Month', 'const']
['Diesel', 'Mfg_Year', 'const']
['Diesel', 'Mfr_Guarantee', 'const']
['Diesel', 'Mistlamps', 'const']
['Diesel', 'Petrol', 'const']
['Diesel', 'Power_Steering', 'const']
['Diesel', 'Powered_Windows', 'const']
['Diesel', 'Quarterly_Tax', 'const']
['Diesel', 'Radio', 'const']
['Diesel', 'Radio_cassette', 'const']
['Diesel', 'Sport_Model', 'const']
['Diesel', 'Tow_Bar', 'const']
['Diesel', 'Weight', 'const']
['Diesel', 'cc', 'const']
['Doors', 'Gears', 'const']
['Doors', 'Guarantee_Period', 'const']
['Doors', 'HP', 'const']
['Doors', 'KM', 'const']
['Doors', 'Met_Color', 'const']
['Doors', 'Metallic_Rim', 'const']
['Doors', 'Mfg_Month', 'const']
['Doors', 'Mfg_Year', 'const']
['Doors', 'Mfr_Guarantee', 'const']
['Doors', 'Mistlamps', 'const']
['Doors', 'Petrol', 'const']
['Doors', 'Power_Steering', 'const']
['Doors', 'Powered_Windows', 'const']
['Doors', 'Quarterly_Tax', 'const']
['Doors', 'Radio', 'const']
['Doors', 'Radio_cassette', 'const']
['Doors', 'Sport_Model', 'const']
['Doors', 'Tow_Bar', 'const']
['Doors', 'Weight', 'const']
['Doors', 'cc', 'const']
['Gears', 'Guarantee_Period', 'const']
['Gears', 'HP', 'const']
['Gears', 'KM', 'const']
['Gears', 'Met_Color', 'const']
['Gears', 'Metallic_Rim', 'const']
['Gears', 'Mfg_Month', 'const']
['Gears', 'Mfg_Year', 'const']
['Gears', 'Mfr_Guarantee', 'const']
['Gears', 'Mistlamps', 'const']
['Gears', 'Petrol', 'const']
['Gears', 'Power_Steering', 'const']
['Gears', 'Powered_Windows', 'const']
['Gears', 'Quarterly_Tax', 'const']
['Gears', 'Radio', 'const']
['Gears', 'Radio_cassette', 'const']
['Gears', 'Sport_Model', 'const']
['Gears', 'Tow_Bar', 'const']
['Gears', 'Weight', 'const']
['Gears', 'cc', 'const']
['Guarantee_Period', 'HP', 'const']
['Guarantee_Period', 'KM', 'const']
['Guarantee_Period', 'Met_Color', 'const']
['Guarantee_Period', 'Metallic_Rim', 'const']
['Guarantee_Period', 'Mfg_Month', 'const']
['Guarantee_Period', 'Mfg_Year', 'const']
['Guarantee_Period', 'Mfr_Guarantee', 'const']
['Guarantee_Period', 'Mistlamps', 'const']
['Guarantee_Period', 'Petrol', 'const']
['Guarantee_Period', 'Power_Steering', 'const']
['Guarantee_Period', 'Powered_Windows', 'const']
['Guarantee_Period', 'Quarterly_Tax', 'const']
['Guarantee_Period', 'Radio', 'const']
['Guarantee_Period', 'Radio_cassette', 'const']
['Guarantee_Period', 'Sport_Model', 'const']
['Guarantee_Period', 'Tow_Bar', 'const']
['Guarantee_Period', 'Weight', 'const']
['Guarantee_Period', 'cc', 'const']
['HP', 'KM', 'const']
['HP', 'Met_Color', 'const']
['HP', 'Metallic_Rim', 'const']
['HP', 'Mfg_Month', 'const']
['HP', 'Mfg_Year', 'const']
['HP', 'Mfr_Guarantee', 'const']
['HP', 'Mistlamps', 'const']
['HP', 'Petrol', 'const']
['HP', 'Power_Steering', 'const']
['HP', 'Powered_Windows', 'const']
['HP', 'Quarterly_Tax', 'const']
['HP', 'Radio', 'const']
['HP', 'Radio_cassette', 'const']
['HP', 'Sport_Model', 'const']
['HP', 'Tow_Bar', 'const']
['HP', 'Weight', 'const']
['HP', 'cc', 'const']
['KM', 'Met_Color', 'const']
['KM', 'Metallic_Rim', 'const']
['KM', 'Mfg_Month', 'const']
['KM', 'Mfg_Year', 'const']
['KM', 'Mfr_Guarantee', 'const']
['KM', 'Mistlamps', 'const']
['KM', 'Petrol', 'const']
['KM', 'Power_Steering', 'const']
['KM', 'Powered_Windows', 'const']
['KM', 'Quarterly_Tax', 'const']
['KM', 'Radio', 'const']
['KM', 'Radio_cassette', 'const']
['KM', 'Sport_Model', 'const']
['KM', 'Tow_Bar', 'const']
['KM', 'Weight', 'const']
['KM', 'cc', 'const']
['Met_Color', 'Metallic_Rim', 'const']
['Met_Color', 'Mfg_Month', 'const']
['Met_Color', 'Mfg_Year', 'const']
['Met_Color', 'Mfr_Guarantee', 'const']
['Met_Color', 'Mistlamps', 'const']
['Met_Color', 'Petrol', 'const']
['Met_Color', 'Power_Steering', 'const']
['Met_Color', 'Powered_Windows', 'const']
['Met_Color', 'Quarterly_Tax', 'const']
['Met_Color', 'Radio', 'const']
['Met_Color', 'Radio_cassette', 'const']
['Met_Color', 'Sport_Model', 'const']
['Met_Color', 'Tow_Bar', 'const']
['Met_Color', 'Weight', 'const']
['Met_Color', 'cc', 'const']
['Metallic_Rim', 'Mfg_Month', 'const']
['Metallic_Rim', 'Mfg_Year', 'const']
['Metallic_Rim', 'Mfr_Guarantee', 'const']
['Metallic_Rim', 'Mistlamps', 'const']
['Metallic_Rim', 'Petrol', 'const']
['Metallic_Rim', 'Power_Steering', 'const']
['Metallic_Rim', 'Powered_Windows', 'const']
['Metallic_Rim', 'Quarterly_Tax', 'const']
['Metallic_Rim', 'Radio', 'const']
['Metallic_Rim', 'Radio_cassette', 'const']
['Metallic_Rim', 'Sport_Model', 'const']
['Metallic_Rim', 'Tow_Bar', 'const']
['Metallic_Rim', 'Weight', 'const']
['Metallic_Rim', 'cc', 'const']
['Mfg_Month', 'Mfg_Year', 'const']
['Mfg_Month', 'Mfr_Guarantee', 'const']
['Mfg_Month', 'Mistlamps', 'const']
['Mfg_Month', 'Petrol', 'const']
['Mfg_Month', 'Power_Steering', 'const']
['Mfg_Month', 'Powered_Windows', 'const']
['Mfg_Month', 'Quarterly_Tax', 'const']
['Mfg_Month', 'Radio', 'const']
['Mfg_Month', 'Radio_cassette', 'const']
['Mfg_Month', 'Sport_Model', 'const']
['Mfg_Month', 'Tow_Bar', 'const']
['Mfg_Month', 'Weight', 'const']
['Mfg_Month', 'cc', 'const']
['Mfg_Year', 'Mfr_Guarantee', 'const']
['Mfg_Year', 'Mistlamps', 'const']
['Mfg_Year', 'Petrol', 'const']
['Mfg_Year', 'Power_Steering', 'const']
['Mfg_Year', 'Powered_Windows', 'const']
['Mfg_Year', 'Quarterly_Tax', 'const']
['Mfg_Year', 'Radio', 'const']
['Mfg_Year', 'Radio_cassette', 'const']
['Mfg_Year', 'Sport_Model', 'const']
['Mfg_Year', 'Tow_Bar', 'const']
['Mfg_Year', 'Weight', 'const']
['Mfg_Year', 'cc', 'const']
['Mfr_Guarantee', 'Mistlamps', 'const']
['Mfr_Guarantee', 'Petrol', 'const']
['Mfr_Guarantee', 'Power_Steering', 'const']
['Mfr_Guarantee', 'Powered_Windows', 'const']
['Mfr_Guarantee', 'Quarterly_Tax', 'const']
['Mfr_Guarantee', 'Radio', 'const']
['Mfr_Guarantee', 'Radio_cassette', 'const']
['Mfr_Guarantee', 'Sport_Model', 'const']
['Mfr_Guarantee', 'Tow_Bar', 'const']
['Mfr_Guarantee', 'Weight', 'const']
['Mfr_Guarantee', 'cc', 'const']
['Mistlamps', 'Petrol', 'const']
['Mistlamps', 'Power_Steering', 'const']
['Mistlamps', 'Powered_Windows', 'const']
['Mistlamps', 'Quarterly_Tax', 'const']
['Mistlamps', 'Radio', 'const']
['Mistlamps', 'Radio_cassette', 'const']
['Mistlamps', 'Sport_Model', 'const']
['Mistlamps', 'Tow_Bar', 'const']
['Mistlamps', 'Weight', 'const']
['Mistlamps', 'cc', 'const']
['Petrol', 'Power_Steering', 'const']
['Petrol', 'Powered_Windows', 'const']
['Petrol', 'Quarterly_Tax', 'const']
['Petrol', 'Radio', 'const']
['Petrol', 'Radio_cassette', 'const']
['Petrol', 'Sport_Model', 'const']
['Petrol', 'Tow_Bar', 'const']
['Petrol', 'Weight', 'const']
['Petrol', 'cc', 'const']
['Power_Steering', 'Powered_Windows', 'const']
['Power_Steering', 'Quarterly_Tax', 'const']
['Power_Steering', 'Radio', 'const']
['Power_Steering', 'Radio_cassette', 'const']
['Power_Steering', 'Sport_Model', 'const']
['Power_Steering', 'Tow_Bar', 'const']
['Power_Steering', 'Weight', 'const']
['Power_Steering', 'cc', 'const']
['Powered_Windows', 'Quarterly_Tax', 'const']
['Powered_Windows', 'Radio', 'const']
['Powered_Windows', 'Radio_cassette', 'const']
['Powered_Windows', 'Sport_Model', 'const']
['Powered_Windows', 'Tow_Bar', 'const']
['Powered_Windows', 'Weight', 'const']
['Powered_Windows', 'cc', 'const']
['Quarterly_Tax', 'Radio', 'const']
['Quarterly_Tax', 'Radio_cassette', 'const']
['Quarterly_Tax', 'Sport_Model', 'const']
['Quarterly_Tax', 'Tow_Bar', 'const']
['Quarterly_Tax', 'Weight', 'const']
['Quarterly_Tax', 'cc', 'const']
['Radio', 'Radio_cassette', 'const']
['Radio', 'Sport_Model', 'const']
['Radio', 'Tow_Bar', 'const']
['Radio', 'Weight', 'const']
['Radio', 'cc', 'const']
['Radio_cassette', 'Sport_Model', 'const']
['Radio_cassette', 'Tow_Bar', 'const']
['Radio_cassette', 'Weight', 'const']
['Radio_cassette', 'cc', 'const']
['Sport_Model', 'Tow_Bar', 'const']
['Sport_Model', 'Weight', 'const']
['Sport_Model', 'cc', 'const']
['Tow_Bar', 'Weight', 'const']
['Tow_Bar', 'cc', 'const']
['Weight', 'cc', 'const']

Measure training time

# Measure the training time for each subset size and store the best model
models = pd.DataFrame(columns=["AIC", "model"])
tic = time.time()
for i in range(1,4):
    models.loc[i] = getBest(X=train_x,y=train_y,k=i)
toc = time.time()
print("Total elapsed time:", (toc-tic), "seconds.")

Processed  36 models on 1 predictors in 0.09873557090759277 seconds.
Processed  630 models on 2 predictors in 1.3473966121673584 seconds.
Processed  7140 models on 3 predictors in 17.01948356628418 seconds.
Total elapsed time: 18.805707454681396 seconds.
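The model counts in the log above are binomial coefficients: with 36 candidate predictors, there are C(36, k) subsets of size k. The short check below reproduces those counts with the standard library and shows how many models a search over every subset size would need, which is why exhaustive search is only practical for small k.

```python
import math

n_candidates = 36  # number of non-constant predictors, per the log above

counts = {k: math.comb(n_candidates, k) for k in range(1, 4)}
for k, count in counts.items():
    print(k, count)

# Fitting every possible non-empty subset would require 2**36 - 1 models.
total_subsets = 2 ** n_candidates - 1
print(total_subsets)
```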

models

   AIC           model
1  17824.309811  <statsmodels.regression.linear_model.Regressio...
2  17579.120147  <statsmodels.regression.linear_model.Regressio...
3  17351.640619  <statsmodels.regression.linear_model.Regressio...

models.loc[3, "model"].summary()

OUTPUT

# Compare with the model fitted on all variables
# Note: mse_total is computed from the response alone, so it is identical for both models;
# mse_resid is the attribute that reflects each model's residual error.
print("full model Rsquared: ","{:.5f}".format(fitted_full_model.rsquared))
print("full model AIC: ","{:.5f}".format(fitted_full_model.aic))
print("full model MSE: ","{:.5f}".format(fitted_full_model.mse_total))
print("selected model Rsquared: ","{:.5f}".format(models.loc[3, "model"].rsquared))
print("selected model AIC: ","{:.5f}".format(models.loc[3, "model"].aic))
print("selected model MSE: ","{:.5f}".format(models.loc[3, "model"].mse_total))

full model Rsquared:  0.91141
full model AIC:  16960.68542
full model MSE:  13196639.65991
selected model Rsquared:  0.86124
selected model AIC:  17351.64062
selected model MSE:  13196639.65991
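The two "MSE" lines in the output are identical because the total mean square is built from the response values only, so it cannot distinguish one model from another. The sketch below illustrates the difference between a total mean square and a residual mean square on toy data. The data, the two sets of fitted values, and the helper names are made up for illustration, and the divisors (n - 1 and n - number of parameters) are the usual degrees-of-freedom conventions assumed here.

```python
from statistics import mean

# Toy data (made up): one response, two competing sets of fitted values.
y = [3.0, 5.0, 4.0, 8.0, 10.0]
fitted_a = [3.5, 4.5, 4.5, 7.5, 10.0]  # tighter fit
fitted_b = [4.0, 4.0, 6.0, 6.0, 10.0]  # looser fit


def total_mean_square(y):
    """Centered total sum of squares divided by n - 1; ignores the model."""
    y_bar = mean(y)
    return sum((yi - y_bar) ** 2 for yi in y) / (len(y) - 1)


def residual_mean_square(y, fitted, n_params):
    """Residual sum of squares divided by n - n_params."""
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return rss / (len(y) - n_params)


ms_total = total_mean_square(y)  # same whichever model is used
ms_a = residual_mean_square(y, fitted_a, n_params=2)
ms_b = residual_mean_square(y, fitted_b, n_params=2)
print(ms_total, ms_a, ms_b)
```

Only the residual mean square changes between the two fits, so it is the quantity to compare when judging which model predicts the training data more closely.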

# Plot the result
plt.figure(figsize=(20,10))
plt.rcParams.update({'font.size': 18, 'lines.markersize': 10})

## Mallow's Cp (computed from residual mean squares)
plt.subplot(2, 2, 1)
Cp= models.apply(lambda row: (row["model"].params.shape[0]+(row["model"].mse_resid-
                               fitted_full_model.mse_resid)*(train_x.shape[0]-
                                row["model"].params.shape[0])/fitted_full_model.mse_resid
                               ), axis=1)
plt.plot(Cp)
plt.plot(Cp.idxmin(), Cp.min(), "or")  # idxmin returns the index label (number of predictors)
plt.xlabel('# Predictors')
plt.ylabel('Cp')

# adj-rsquared plot
# adjusted R-squared: R-squared penalized for the number of predictors
adj_rsquared = models.apply(lambda row: row["model"].rsquared_adj, axis=1)
plt.subplot(2, 2, 2)
plt.plot(adj_rsquared)
plt.plot(adj_rsquared.idxmax(), adj_rsquared.max(), "or")
plt.xlabel('# Predictors')
plt.ylabel('adjusted rsquared')

# aic
aic = models.apply(lambda row: row["model"].aic, axis=1)
plt.subplot(2, 2, 3)
plt.plot(aic)
plt.plot(aic.idxmin(), aic.min(), "or")
plt.xlabel('# Predictors')
plt.ylabel('AIC')

# bic
bic = models.apply(lambda row: row["model"].bic, axis=1)
plt.subplot(2, 2, 4)
plt.plot(bic)
plt.plot(bic.idxmin(), bic.min(), "or")
plt.xlabel('# Predictors')
plt.ylabel('BIC')

OUTPUT

(Figure: Mallow's Cp, adjusted R-squared, AIC, and BIC plotted against the number of predictors.)
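The Cp expression in the plotting code has the shape p + (MSE_p - MSE_full)(n - p) / MSE_full. Provided the mean squares are residual mean squares, this is algebraically the same as the textbook form of Mallows' Cp, SSE_p / MSE_full - n + 2p, where p counts the coefficients including the constant. The sketch below checks the equivalence on made-up numbers; both function names are hypothetical.

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Cp = SSE_p / MSE_full - n + 2p, with p counting the constant."""
    return sse_p / mse_full - n + 2 * p


def cp_from_mse(mse_p, mse_full, n, p):
    """Equivalent form using the candidate's residual mean square MSE_p = SSE_p / (n - p)."""
    return p + (mse_p - mse_full) * (n - p) / mse_full


# Toy numbers (made up for illustration).
n, p = 100, 4
sse_p, mse_full = 520.0, 5.0
cp_a = mallows_cp(sse_p, mse_full, n, p)
cp_b = cp_from_mse(sse_p / (n - p), mse_full, n, p)
print(cp_a, cp_b)
```

A candidate whose residual mean square equals that of the full model gets Cp = p, which is why values of Cp close to p are usually read as a sign of little bias.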

Modify regression model(Forward selection)

######## Forward selection (step=1)

def forward(X, y, predictors):
    # Candidate variables: every column that is not already in predictors
    remaining_predictors = [p for p in X.columns.difference(['const']) if p not in predictors]
    tic = time.time()
    results = []
    for p in remaining_predictors:
        results.append(processSubset(X=X, y= y, feature_set=predictors+[p]+['const']))
    # Convert to a DataFrame
    models = pd.DataFrame(results)

    # Select the model with the lowest AIC
    best_model = models.loc[models['AIC'].argmin()] # index
    toc = time.time()
    print("Processed ", models.shape[0], "models on", len(predictors)+1, "predictors in", (toc-tic))
    print('Selected predictors:',best_model['model'].model.exog_names,' AIC:',best_model['AIC'] )
    return best_model


#### Forward selection model

def forward_model(X,y):
    Fmodels = pd.DataFrame(columns=["AIC", "model"])
    tic = time.time()
    # Predictors selected so far
    predictors = []
    # Step i adds the i-th variable (range shifted from 0..n-1 to 1..n)
    for i in range(1, len(X.columns.difference(['const'])) + 1):
        Forward_result = forward(X=X,y=y,predictors=predictors)
        if i > 1:
            # Stop as soon as adding a variable no longer lowers the AIC
            if Forward_result['AIC'] > Fmodel_before:
                break
        Fmodels.loc[i] = Forward_result
        predictors = Fmodels.loc[i]["model"].model.exog_names
        Fmodel_before = Fmodels.loc[i]["AIC"]
        predictors = [ k for k in predictors if k != 'const']
    toc = time.time()
    print("Total elapsed time:", (toc - tic), "seconds.")

    return(Fmodels['model'][len(Fmodels['model'])])

OUTPUT

Forward_best_model = forward_model(X=train_x, y= train_y)

Processed  36 models on 1 predictors in 0.08973240852355957
Selected predictors: ['Mfg_Year', 'const']  AIC: 17755.072760646137
Processed  35 models on 2 predictors in 0.09027957916259766
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'const']  AIC: 17504.57948159159
Processed  34 models on 3 predictors in 0.06283736228942871
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'const']  AIC: 17398.182235131313
Processed  33 models on 4 predictors in 0.06283116340637207
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'const']  AIC: 17150.1641103143
Processed  32 models on 5 predictors in 0.07981634140014648
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'const']  AIC: 17091.096715621316
Processed  31 models on 6 predictors in 0.0840911865234375
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'const']  AIC: 17055.57896394218
Processed  30 models on 7 predictors in 0.0738370418548584
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'const']  AIC: 17033.36951099978
Processed  29 models on 8 predictors in 0.06878113746643066
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'const']  AIC: 17019.85679678918
Processed  28 models on 9 predictors in 0.09375500679016113
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'const']  AIC: 16995.322287055787
Processed  27 models on 10 predictors in 0.10174226760864258
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'const']  AIC: 16983.818299485778
Processed  26 models on 11 predictors in 0.10377311706542969
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'const']  AIC: 16964.290655626864
Processed  25 models on 12 predictors in 0.11771559715270996
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'const']  AIC: 16928.537083027266
Processed  24 models on 13 predictors in 0.12260055541992188
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'const']  AIC: 16921.374043681804
Processed  23 models on 14 predictors in 0.12865686416625977
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'const']  AIC: 16918.48093923768
Processed  22 models on 15 predictors in 0.16057229042053223
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'const']  AIC: 16916.04018485048
Processed  21 models on 16 predictors in 0.18660974502563477
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'const']  AIC: 16912.806529494097
Processed  20 models on 17 predictors in 0.11269783973693848
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'const']  AIC: 16909.805620763276
Processed  19 models on 18 predictors in 0.10549688339233398
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'const']  AIC: 16907.82736115733
Processed  18 models on 19 predictors in 0.10871052742004395
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'const']  AIC: 16907.14151076706
Processed  17 models on 20 predictors in 0.11475992202758789
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Mfg_Month', 'const']  AIC: 16906.91814803349
Processed  16 models on 21 predictors in 0.1306447982788086
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Mfg_Month', 'Gears', 'const']  AIC: 16906.641600994546
Processed  15 models on 22 predictors in 0.11366558074951172
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Mfg_Month', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994557
Total elapsed time: 2.4412221908569336 seconds.

Forward_best_model.aic

16906.641600994546
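
The value above is the Akaike information criterion (AIC) of the selected model. As a minimal sketch of what that number measures, assume that processSubset fits an ordinary least squares model with Gaussian errors. Under that assumption, one common convention writes the AIC as n(ln 2π + ln(SSR/n) + 1) + 2k, where k counts the estimated coefficients including the constant. The helper name gaussian_aic and the numbers below are hypothetical and are not taken from the dataset used here.

```python
import math


def gaussian_aic(ssr: float, n: int, k: int) -> float:
    """AIC of a least-squares fit with Gaussian errors.

    ssr: sum of squared residuals
    n:   number of observations
    k:   number of estimated coefficients (including the constant)
    """
    # Maximised Gaussian log-likelihood with sigma^2 estimated as ssr / n
    log_likelihood = -0.5 * n * (math.log(2 * math.pi) + math.log(ssr / n) + 1)
    return -2 * log_likelihood + 2 * k


# Hypothetical numbers: an extra predictor only pays off if it lowers the
# residual sum of squares enough to offset the +2 penalty.
aic_small = gaussian_aic(ssr=120.0, n=100, k=3)
aic_big_useless = gaussian_aic(ssr=119.9, n=100, k=4)  # tiny gain in fit
aic_big_useful = gaussian_aic(ssr=100.0, n=100, k=4)   # real gain in fit
print(aic_small, aic_big_useless, aic_big_useful)
```

This is the trade-off the selection loops rely on: a step is accepted only when the improvement in fit outweighs the penalty for the added coefficient.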

Modify regression model (Backward selection)

######## Backward selection (step=1): drop one predictor at a time
def backward(X, y, predictors):
    tic = time.time()
    results = []
    # Fit every model obtained by leaving out exactly one of the current predictors
    for combo in itertools.combinations(predictors, len(predictors) - 1):
        results.append(processSubset(X=X, y=y, feature_set=list(combo) + ['const']))
    models = pd.DataFrame(results)
    # Select the model with the lowest AIC
    best_model = models.loc[models['AIC'].argmin()]
    toc = time.time()
    print("Processed ", models.shape[0], "models on", len(predictors) - 1, "predictors in",
          (toc - tic))
    # Look the AIC up by label rather than by position
    print('Selected predictors:', best_model['model'].model.exog_names, ' AIC:', best_model['AIC'])
    return best_model
    

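# Backward elimination model
def backward_model(X, y):
    # One row per model size; rows that are never reached stay NaN
    Bmodels = pd.DataFrame(columns=["AIC", "model"], index=range(1, len(X.columns)))
    tic = time.time()
    predictors = X.columns.difference(['const'])
    # Baseline: the full model, fitted with the constant like every candidate in backward()
    Bmodel_before = processSubset(X, y, list(predictors) + ['const'])['AIC']
    while (len(predictors) > 1):
        # Use the function arguments instead of the global train_x / train_y
        Backward_result = backward(X=X, y=y, predictors=predictors)
        # Stop as soon as dropping a predictor no longer lowers the AIC
        if Backward_result['AIC'] > Bmodel_before:
            break
        Bmodels.loc[len(predictors) - 1] = Backward_result
        predictors = Bmodels.loc[len(predictors) - 1]["model"].model.exog_names
        Bmodel_before = Backward_result['AIC']
        predictors = [k for k in predictors if k != 'const']

    toc = time.time()
    print("Total elapsed time:", (toc - tic), "seconds.")
    # The first non-empty row is the smallest model, i.e. the last one accepted
    return (Bmodels['model'].dropna().iloc[0])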
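
Each call to backward fits one model per predictor that could be dropped, which is why the run in this post reports 36 models on 35 predictors in its first step. The sketch below counts the fits for that setting and compares them with an exhaustive search over all subsets. The predictor names are hypothetical placeholders; only the count of 36 mirrors the run.

```python
import itertools

# Hypothetical predictor names; only the count (36) mirrors the run in this post.
predictors = [f"x{i}" for i in range(36)]

# One backward step: every subset that leaves out exactly one predictor.
drop_one = list(itertools.combinations(predictors, len(predictors) - 1))
fits_first_step = len(drop_one)

# Worst case for backward elimination: steps with p, p-1, ..., 2 candidates,
# each step fitting as many models as there are predictors left.
fits_worst_case = sum(range(2, len(predictors) + 1))

# Exhaustive search over all subsets of the predictors (empty subset included).
fits_exhaustive = 2 ** len(predictors)

print(fits_first_step, fits_worst_case, fits_exhaustive)
```

The gap between a few hundred fits and tens of billions is the practical reason for using a greedy search, at the price of possibly missing the best subset.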

OUTPUT

Backward_best_model = backward_model(X=train_x,y=train_y)

Processed  36 models on 35 predictors in 0.5307836532592773
Selected predictors: ['ABS', 'Age_08_04', 'Airbag_1', 'Airbag_2', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Doors', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Met_Color', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'cc', 'const']  AIC: 16919.554953086037
Processed  35 models on 34 predictors in 0.5086104869842529
Selected predictors: ['ABS', 'Age_08_04', 'Airbag_1', 'Airbag_2', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Doors', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'cc', 'const']  AIC: 16917.56065836032
Processed  34 models on 33 predictors in 0.47121691703796387
Selected predictors: ['ABS', 'Age_08_04', 'Airbag_2', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Doors', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'cc', 'const']  AIC: 16915.573733838028
Processed  33 models on 32 predictors in 0.3795206546783447
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Doors', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'cc', 'const']  AIC: 16913.747808225216
Processed  32 models on 31 predictors in 0.33935022354125977
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'cc', 'const']  AIC: 16912.053646583932
Processed  31 models on 30 predictors in 0.29421567916870117
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'cc', 'const']  AIC: 16910.726801088837
Processed  30 models on 29 predictors in 0.29419445991516113
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16909.60778490872
Processed  29 models on 28 predictors in 0.25033020973205566
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16908.55343667602
Processed  28 models on 27 predictors in 0.2254021167755127
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16907.502655808014
Processed  27 models on 26 predictors in 0.20220327377319336
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16906.70136854976
Processed  26 models on 25 predictors in 0.20789861679077148
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16906.676844492846
Processed  25 models on 24 predictors in 0.18823885917663574
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CNG', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16906.641600994557
Processed  24 models on 23 predictors in 0.1715404987335205
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CNG', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16906.641600994557
Processed  23 models on 22 predictors in 0.15358972549438477
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CNG', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Mfg_Month', 'Mfr_Guarantee', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16906.641600994557
Processed  22 models on 21 predictors in 0.1326441764831543
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CNG', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Mfg_Month', 'Mfr_Guarantee', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16906.64160099456
Total elapsed time: 4.432608604431152 seconds.

Backward_best_model.aic

16906.641600994557
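
The forward and backward results reported above agree to roughly eleven decimal places. A difference of that size is floating-point rounding rather than a real difference in fit, so it is safer to compare AIC values with a tolerance. The sketch below shows one way to do that; the helper name same_aic and the tolerance of 1e-6 are assumptions chosen for illustration.

```python
import math

forward_aic = 16906.641600994546   # Forward_best_model.aic reported above
backward_aic = 16906.641600994557  # Backward_best_model.aic reported above


def same_aic(a: float, b: float, abs_tol: float = 1e-6) -> bool:
    """Treat two AIC values as equal when they differ by less than abs_tol."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol)


print(abs(forward_aic - backward_aic))      # size of the gap
print(same_aic(forward_aic, backward_aic))  # equal within the tolerance?
```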

Modify regression model (Stepwise)

def Stepwise_model(X,y):
    Stepmodels = pd.DataFrame(columns=["AIC", "model"])
    tic = time.time()
    predictors = []
    Smodel_before = processSubset(X,y,predictors+['const'])['AIC']
    # One pass per model size: i runs from 1 to the number of non-constant predictors
    for i in range(1, len(X.columns.difference(['const'])) + 1):
        Forward_result = forward(X=X, y=y, predictors=predictors) # constant added
        print('forward')
        Stepmodels.loc[i] = Forward_result
        predictors = Stepmodels.loc[i]["model"].model.exog_names
        predictors = [ k for k in predictors if k != 'const']
        Backward_result = backward(X=X, y=y, predictors=predictors)
        if Backward_result['AIC']< Forward_result['AIC']:
            Stepmodels.loc[i] = Backward_result
            predictors = Stepmodels.loc[i]["model"].model.exog_names
            Smodel_before = Stepmodels.loc[i]["AIC"]
            predictors = [ k for k in predictors if k != 'const']
            print('backward')
        if Stepmodels.loc[i]['AIC']> Smodel_before:
            break
        else:
            Smodel_before = Stepmodels.loc[i]["AIC"]
    toc = time.time()
    print("Total elapsed time:", (toc - tic), "seconds.")
    return (Stepmodels['model'][len(Stepmodels['model'])])
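
Stepwise_model alternates a forward step with a backward step and keeps whichever result has the lower AIC. The following is a minimal, library-free sketch of that idea and not the code above: stepwise_select, toy_score and the tolerance tol are hypothetical names, and the toy score only stands in for the AIC of a fitted model. A move is accepted only when the score drops by more than tol, which is one possible way to avoid switching back and forth between models whose scores are equal up to rounding.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def stepwise_select(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    tol: float = 1e-6,
) -> Tuple[List[str], float]:
    """Alternate forward and backward steps until neither lowers the score."""
    selected: List[str] = []
    best = score(frozenset())
    while True:
        improved = False
        # Forward step: try adding each predictor that is not selected yet.
        remaining = [c for c in candidates if c not in selected]
        if remaining:
            new_score, var = min((score(frozenset(selected + [c])), c) for c in remaining)
            if new_score < best - tol:
                selected.append(var)
                best, improved = new_score, True
        # Backward step: try dropping each selected predictor.
        if len(selected) > 1:
            new_score, var = min(
                (score(frozenset(s for s in selected if s != c)), c) for c in selected
            )
            if new_score < best - tol:
                selected.remove(var)
                best, improved = new_score, True
        # Stop when neither step produced a real improvement.
        if not improved:
            return selected, best


def toy_score(subset: FrozenSet[str]) -> float:
    """AIC-like toy score: useful predictors lower it by 10, every predictor costs 2."""
    useful = {"a", "b"}
    return 100.0 - 10.0 * len(subset & useful) + 2.0 * len(subset)


selected, best = stepwise_select(["a", "b", "c", "d"], toy_score)
print(selected, best)
```

Because every accepted move lowers the score by more than tol and there are finitely many subsets, the loop in this sketch always terminates.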

OUTPUT

Stepwise_best_model=Stepwise_model(X=train_x,y=train_y)

Processed  36 models on 1 predictors in 0.09873390197753906
Selected predictors: ['Mfg_Year', 'const']  AIC: 17755.072760646137
forward
Processed  1 models on 0 predictors in 0.009046554565429688
Selected predictors: ['const']  AIC: 19355.08856819785
Processed  35 models on 2 predictors in 0.130143404006958
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'const']  AIC: 17504.57948159159
forward
Processed  2 models on 1 predictors in 0.015958309173583984
Selected predictors: ['Mfg_Year', 'const']  AIC: 17755.072760646137
Processed  34 models on 3 predictors in 0.1465761661529541
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'const']  AIC: 17398.182235131313
forward
Processed  3 models on 2 predictors in 0.016946792602539062
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'const']  AIC: 17504.57948159159
Processed  33 models on 4 predictors in 0.1317136287689209
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'const']  AIC: 17150.1641103143
forward
Processed  4 models on 3 predictors in 0.015963077545166016
Selected predictors: ['Mfg_Year', 'Weight', 'KM', 'const']  AIC: 17306.79774531549
Processed  32 models on 5 predictors in 0.08627820014953613
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'const']  AIC: 17091.096715621316
forward
Processed  5 models on 4 predictors in 0.011969327926635742
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'const']  AIC: 17150.1641103143
Processed  31 models on 6 predictors in 0.07229804992675781
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'const']  AIC: 17055.57896394218
forward
Processed  6 models on 5 predictors in 0.016991615295410156
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'const']  AIC: 17091.096715621316
Processed  30 models on 7 predictors in 0.05830645561218262
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'const']  AIC: 17033.36951099978
forward
Processed  7 models on 6 predictors in 0.01599907875061035
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'const']  AIC: 17055.57896394218
Processed  29 models on 8 predictors in 0.06846237182617188
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'const']  AIC: 17019.85679678918
forward
Processed  8 models on 7 predictors in 0.017005205154418945
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'const']  AIC: 17033.36951099978
Processed  28 models on 9 predictors in 0.11175131797790527
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'const']  AIC: 16995.322287055787
forward
Processed  9 models on 8 predictors in 0.01898479461669922
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Guarantee_Period', 'BOVAG_Guarantee', 'const']  AIC: 17012.519514899912
Processed  27 models on 10 predictors in 0.1047210693359375
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'const']  AIC: 16983.818299485778
forward
Processed  10 models on 9 predictors in 0.03191518783569336
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'const']  AIC: 16995.322287055787
Processed  26 models on 11 predictors in 0.10965585708618164
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'const']  AIC: 16964.290655626864
forward
Processed  11 models on 10 predictors in 0.04288458824157715
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'CNG', 'Quarterly_Tax', 'const']  AIC: 16978.68338783714
Processed  25 models on 12 predictors in 0.15957117080688477
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'const']  AIC: 16928.537083027266
forward
Processed  12 models on 11 predictors in 0.08481073379516602
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'Quarterly_Tax', 'Petrol', 'const']  AIC: 16932.104261902947
Processed  24 models on 13 predictors in 0.17156600952148438
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'const']  AIC: 16921.374043681804
forward
Processed  13 models on 12 predictors in 0.09979891777038574
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'const']  AIC: 16924.75355369365
Processed  23 models on 14 predictors in 0.17253684997558594
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'const']  AIC: 16918.48093923768
forward
Processed  14 models on 13 predictors in 0.08875823020935059
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'const']  AIC: 16921.374043681804
Processed  22 models on 15 predictors in 0.15457653999328613
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'const']  AIC: 16916.04018485048
forward
Processed  15 models on 14 predictors in 0.10401105880737305
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'const']  AIC: 16918.48093923768
Processed  21 models on 16 predictors in 0.15857505798339844
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'const']  AIC: 16912.806529494097
forward
Processed  16 models on 15 predictors in 0.11768555641174316
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'const']  AIC: 16916.04018485048
Processed  20 models on 17 predictors in 0.13663506507873535
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'const']  AIC: 16909.805620763276
forward
Processed  17 models on 16 predictors in 0.08477330207824707
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Airco', 'ABS', 'Sport_Model', 'const']  AIC: 16912.187005800086
Processed  19 models on 18 predictors in 0.10272526741027832
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'const']  AIC: 16907.82736115733
forward
Processed  18 models on 17 predictors in 0.1127007007598877
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'const']  AIC: 16908.531987499395
Processed  18 models on 19 predictors in 0.11521244049072266
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'const']  AIC: 16907.14151076706
forward
Processed  19 models on 18 predictors in 0.15088891983032227
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'const']  AIC: 16907.82736115733
Processed  17 models on 20 predictors in 0.16663289070129395
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Mfg_Month', 'const']  AIC: 16906.91814803349
forward
Processed  20 models on 19 predictors in 0.2127993106842041
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'const']  AIC: 16907.14151076706
Processed  16 models on 21 predictors in 0.10770010948181152
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Mfg_Month', 'Gears', 'const']  AIC: 16906.641600994546
forward
Processed  21 models on 20 predictors in 0.1256864070892334
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Mfg_Month', 'const']  AIC: 16906.91814803349
Processed  15 models on 22 predictors in 0.10097765922546387
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Mfg_Month', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994557
forward
Processed  22 models on 21 predictors in 0.17354369163513184
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.15059447288513184
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.19049072265625
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.1495981216430664
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.13814973831176758
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.11270356178283691
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.15808415412902832
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.09469938278198242
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.14464545249938965
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.1326456069946289
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.1595752239227295
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.11668825149536133
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.13965892791748047
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.17457914352416992
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.19448089599609375
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.10567355155944824
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.15602421760559082
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.09773826599121094
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.1266651153564453
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.0937490463256836
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.12469983100891113
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.11266231536865234
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.14760518074035645
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.10172867774963379
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.19098138809204102
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.14887738227844238
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.15437889099121094
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.10134077072143555
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.14920735359191895
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Total elapsed time: 8.44080114364624 seconds.

Stepwise_best_model.aic

16906.641600994506

Model performance

# Values predicted by each model, to be compared against test_y
pred_y_full = fitted_full_model.predict(test_x)
pred_y_forward = Forward_best_model.predict(test_x[Forward_best_model.model.exog_names])
pred_y_backward = Backward_best_model.predict(test_x[Backward_best_model.model.exog_names])
pred_y_stepwise = Stepwise_best_model.predict(test_x[Stepwise_best_model.model.exog_names])

perf_mat = pd.DataFrame(columns=["ALL", "FORWARD", "BACKWARD", "STEPWISE"],
                        index =['MSE', 'RMSE','MAE', 'MAPE'])
			
from sklearn import metrics

def mean_absolute_percentage_error(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Performance metrics
perf_mat.loc['MSE', 'ALL'] = metrics.mean_squared_error(test_y, pred_y_full)
perf_mat.loc['MSE', 'FORWARD'] = metrics.mean_squared_error(test_y, pred_y_forward)
perf_mat.loc['MSE', 'BACKWARD'] = metrics.mean_squared_error(test_y, pred_y_backward)
perf_mat.loc['MSE', 'STEPWISE'] = metrics.mean_squared_error(test_y, pred_y_stepwise)

perf_mat.loc['RMSE', 'ALL'] = np.sqrt(metrics.mean_squared_error(test_y, pred_y_full))
perf_mat.loc['RMSE', 'FORWARD'] = np.sqrt(metrics.mean_squared_error(test_y, pred_y_forward))
perf_mat.loc['RMSE', 'BACKWARD'] = np.sqrt(metrics.mean_squared_error(test_y, pred_y_backward))
perf_mat.loc['RMSE', 'STEPWISE'] = np.sqrt(metrics.mean_squared_error(test_y, pred_y_stepwise))

perf_mat.loc['MAE', 'ALL'] = metrics.mean_absolute_error(test_y, pred_y_full)
perf_mat.loc['MAE', 'FORWARD'] = metrics.mean_absolute_error(test_y, pred_y_forward)
perf_mat.loc['MAE', 'BACKWARD'] = metrics.mean_absolute_error(test_y, pred_y_backward)
perf_mat.loc['MAE', 'STEPWISE'] = metrics.mean_absolute_error(test_y, pred_y_stepwise)

perf_mat.loc['MAPE', 'ALL'] = mean_absolute_percentage_error(test_y, pred_y_full)
perf_mat.loc['MAPE', 'FORWARD'] = mean_absolute_percentage_error(test_y, pred_y_forward)
perf_mat.loc['MAPE', 'BACKWARD'] = mean_absolute_percentage_error(test_y, pred_y_backward)
perf_mat.loc['MAPE', 'STEPWISE'] = mean_absolute_percentage_error(test_y, pred_y_stepwise)

print(perf_mat)

              ALL      FORWARD     BACKWARD     STEPWISE
MSE   1.44149e+06  1.46142e+06  1.46142e+06  1.46142e+06
RMSE      1200.62      1208.89      1208.89      1208.89
MAE       853.494      863.524      863.524      863.524
MAPE      8.48549      8.59054      8.59054      8.59054

The number of params

print(Forward_best_model.params.shape, Backward_best_model.params.shape, Stepwise_best_model.params.shape)

(24,) (24,) (24,)

print(len(fitted_full_model.params))
print(len(Forward_best_model.params))
print(len(Backward_best_model.params))
print(len(Stepwise_best_model.params))

Logistic regression on a real-world dataset

Dataset download

Dataset Description

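Experience: years of work experience
Income: income
Family: family size
CCAvg: average monthly credit card spending
Education: education level (1: Undergraduate; 2: Graduate; 3: Advanced)
Mortgage: household (mortgage) loan
Securities Account: whether the customer has a securities account
CD Account: whether the customer has a certificate of deposit account
Online: whether the customer has an online account
CreditCard: whether the customer has a credit card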

Data preprocessing

# Import the packages needed for the analysis
import os
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve
import statsmodels.api as sm
import matplotlib.pyplot as plt
import itertools
import time

# Drop uninformative variables
ploan = pd.read_csv(r'C:\Users\userd\Desktop\dataset\Personal_Loan.csv')
ploan_processed = ploan.dropna().drop(['ID','ZIP Code'], axis=1, inplace=False)
ploan_processed = sm.add_constant(ploan_processed, has_constant='add')

# split into train and test
feature_columns = list(ploan_processed.columns.difference(["Personal Loan"]))
X = ploan_processed[feature_columns]
y = ploan_processed['Personal Loan'] # whether the loan was accepted: 1 or 0
train_x, test_x, train_y, test_y = train_test_split(X, y, stratify=y,train_size=0.7,test_size=0.3,random_state=42)

Data : Input

ploan

ploan_processed

Age  Experience  Income  Family  CCAvg  Education  Mortgage  Personal Loan  Securities Account  CD Account  Online  CreditCard
 25           1      49       4    1.6          1         0              0                   1           0       0           0
 45          19      34       3    1.5          1         0              0                   1           0       0           0
 39          15      11       1    1.0          1         0              0                   0           0       0           0
 35           9     100       1    2.7          2         0              0                   0           0       0           0
 35           8      45       4    1.0          2         0              0                   0           0       0           1
...         ...     ...     ...    ...        ...       ...            ...                 ...         ...     ...         ...
 46          22      70       4    1.9          1       212              0                   0           0       0           1
 63          37      32       3    0.7          2       141              0                   0           0       0           0
 33           9      14       3    0.9          3       114              0                   0           0       0           0
 38          14     111       2    6.1          1       326              0                   0           0       0           0
 53          27      38       4    2.8          2       144              0                   1           0       1           0
rows × 12 columns

constant_ploan_processed

const  Age  Experience  Income  Family  CCAvg  Education  Mortgage  Personal Loan  Securities Account  CD Account  Online  CreditCard
  1.0   25           1      49       4    1.6          1         0              0                   1           0       0           0
  1.0   45          19      34       3    1.5          1         0              0                   1           0       0           0
  1.0   39          15      11       1    1.0          1         0              0                   0           0       0           0
  1.0   35           9     100       1    2.7          2         0              0                   0           0       0           0
  1.0   35           8      45       4    1.0          2         0              0                   0           0       0           1
  ...  ...         ...     ...     ...    ...        ...       ...            ...                 ...         ...     ...         ...
  1.0   46          22      70       4    1.9          1       212              0                   0           0       0           1
  1.0   63          37      32       3    0.7          2       141              0                   0           0       0           0
  1.0   33           9      14       3    0.9          3       114              0                   0           0       0           0
  1.0   38          14     111       2    6.1          1       326              0                   0           0       0           0
  1.0   53          27      38       4    2.8          2       144              0                   1           0       1           0
rows × 13 columns

print(train_x.shape, test_x.shape, train_y.shape, test_y.shape)

(1750, 12) (750, 12) (1750,) (750,)

Regression analysis

model = sm.Logit(train_y, train_x)
results = model.fit(method='newton')
results.summary()

OUTPUT : Model results

results.params

Age                    0.024471
CCAvg                  0.098468
CD Account             4.372577
CreditCard            -1.237447
Education              1.520329
Experience            -0.007032
Family                 0.757911
Income                 0.054695
Mortgage              -0.000133
Online                -0.440746
Securities Account    -1.852006
const                -13.920298
dtype: float64

np.exp(results.params)

Age                   1.024773e+00
CCAvg                 1.103479e+00
CD Account            7.924761e+01
CreditCard            2.901239e-01
Education             4.573729e+00
Experience            9.929928e-01
Family                2.133814e+00
Income                1.056218e+00
Mortgage              9.998665e-01
Online                6.435563e-01
Securities Account    1.569221e-01
const                 9.005163e-07
dtype: float64
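
The exponentiated coefficients above are odds ratios: with the other predictors held fixed, a one-unit increase in a variable multiplies the odds of accepting a loan by exp(coefficient). The following is a minimal sketch of that reading, using two coefficients copied from the `results.params` output; the 10-unit step for Income is an illustrative assumption, not something taken from the original analysis.

```python
import math

# Coefficients copied from the results.params output above.
coef_income = 0.054695
coef_cd_account = 4.372577

# A one-unit increase multiplies the odds by exp(coef).
or_income = math.exp(coef_income)
or_cd_account = math.exp(coef_cd_account)

# A k-unit increase multiplies the odds by exp(k * coef); k = 10 is illustrative.
or_income_10 = math.exp(10 * coef_income)

print(f"Income, +1 unit:    odds x {or_income:.4f}")
print(f"Income, +10 units:  odds x {or_income_10:.4f}")
print(f"CD Account, 0 -> 1: odds x {or_cd_account:.2f}")
```

An odds ratio below 1 (for example CreditCard or Securities Account in the table above) means the odds shrink as the variable increases.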

Model prediction

pred_y = results.predict(test_x)
pred_y

  0.012968
   0.023841
  0.001210
  0.196245
   0.006610
  0.241812
  0.060656
  0.339803
  0.002238
   0.003269
   0.004334
   0.000976
  0.001064
  0.084981
   0.026756
  0.010442
  0.038788
  0.006997
  0.091474
   0.032079
  0.004988
   0.004391
  0.017692
  0.014201
   0.005766
  0.001604
  0.141404
  0.612456
  0.435395
  0.015946
          ...   
  0.001546
   0.000588
    0.004755
  0.001897
  0.561103
  0.472680
  0.145754
  0.002263
   0.836443
  0.000111
   0.036772
  0.977346
   0.016186
   0.000613
  0.063208
  0.000021
  0.003421
   0.008169
  0.001812
  0.009835
  0.010325
  0.073346
   0.000349
  0.046096
   0.000239
   0.019982
   0.959460
   0.005239
    0.011344
  0.084464
Length: 750, dtype: float64

def cut_off(y,threshold):
    Y = y.copy() # work on a copy so the original y is not modified
    Y[Y>threshold]=1
    Y[Y<=threshold]=0
    return(Y.astype(int))

pred_Y = cut_off(pred_y,0.5)
pred_Y

  0
   0
  0
  0
   0
  0
  0
  0
  0
   0
   0
   0
  0
  0
   0
  0
  0
  0
  0
   0
  0
   0
  0
  0
   0
  0
  0
  1
  0
  0
       ..
  0
   0
    0
  0
  1
  0
  0
  0
   1
  0
   0
  1
   0
   0
  0
  0
  0
   0
  0
  0
  0
  0
   0
  0
   0
   0
   1
   0
    0
  0
Length: 750, dtype: int32

Model diagnosis

print("model AIC: ","{:.5f}".format(results.aic))

model AIC:  482.69329

Model performance(1)

pred_y = results.predict(test_x)

def cut_off(y,threshold):
    Y = y.copy() # work on a copy so the original y is not modified
    Y[Y>threshold]=1
    Y[Y<=threshold]=0
    return(Y.astype(int))

pred_Y = cut_off(pred_y,0.5)

cfmat = confusion_matrix(test_y,pred_Y)

def acc(cfmat) :
    acc=(cfmat[0,0]+cfmat[1,1])/np.sum(cfmat) ## accuracy
    return(acc)

Accuracy

print(cfmat)

[[660  13]
 [ 29  48]]

(cfmat[0,0]+cfmat[1,1])/np.sum(cfmat) ## accuracy

0.944
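
Accuracy alone can hide how the model treats the minority class: in the confusion matrix above only 77 of the 750 test cases are actual positives. A small sketch, assuming the usual scikit-learn layout (rows are actual classes, columns are predicted classes), that derives recall, precision and specificity from the same four counts:

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: actual 0/1, columns: predicted 0/1).
cfmat = np.array([[660, 13],
                  [29, 48]])
tn, fp, fn, tp = cfmat.ravel()

accuracy = (tp + tn) / cfmat.sum()
recall = tp / (tp + fn)        # share of actual positives that were found
precision = tp / (tp + fp)     # share of predicted positives that are correct
specificity = tn / (tn + fp)   # share of actual negatives that were found

print(f"accuracy={accuracy:.3f} recall={recall:.3f} "
      f"precision={precision:.3f} specificity={specificity:.3f}")
```

With these counts the recall is noticeably lower than the accuracy, which is typical when the negative class dominates.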

Performance based on cut-off values

def cut_off(y,threshold):
    Y = y.copy() # work on a copy so the original y is not modified
    Y[Y>threshold]=1
    Y[Y<=threshold]=0
    return(Y.astype(int))

def acc(cfmat) :
    acc=(cfmat[0,0]+cfmat[1,1])/np.sum(cfmat) ## accuracy
    return(acc)
    
pred_y = results.predict(test_x)    
pred_Y = cut_off(pred_y,0.5)
cfmat = confusion_matrix(test_y,pred_Y)

threshold = np.arange(0,1,0.1)
table = pd.DataFrame(columns=['ACC'])
for i in threshold:
    pred_Y = cut_off(pred_y,i)
    cfmat = confusion_matrix(test_y, pred_Y)
    table.loc[i] = acc(cfmat)
table.index.name='threshold'
table.columns.name='performance'
table

performance       ACC
threshold
0.0          0.102667
0.1          0.908000
0.2          0.922667
0.3          0.932000
0.4          0.936000
0.5          0.944000
0.6          0.949333
0.7          0.946667
0.8          0.941333
0.9          0.937333
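
At threshold 0 every test case is labelled 1, so the accuracy in the first row (0.102667) is simply the share of positives, 77 of 750. The sweep can be sketched in vectorized form as follows; the toy labels and probabilities are hypothetical and `accuracy_at` is an illustrative helper, not a function from the original notebook.

```python
import numpy as np

def accuracy_at(y_true, y_prob, threshold):
    """Accuracy when probabilities strictly above `threshold` are labelled 1."""
    y_pred = (y_prob > threshold).astype(int)
    return float(np.mean(y_pred == y_true))

# Toy data (hypothetical, not the loan dataset).
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
y_prob = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90])

thresholds = np.arange(0, 1, 0.1)
accs = [accuracy_at(y_true, y_prob, t) for t in thresholds]

for t, a in zip(thresholds, accs):
    print(f"{t:.1f}  {a:.3f}")
```

Picking the cut-off that maximizes accuracy on the test set makes the reported accuracy optimistic; a separate validation split is the more cautious choice.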

Model performance(2)

# ROC utilities provided by sklearn
pred_y = results.predict(test_x)    
fpr, tpr, thresholds = metrics.roc_curve(test_y, pred_y, pos_label=1)

# Print ROC curve
plt.plot(fpr,tpr)

# Print AUC
auc = np.trapz(tpr,fpr)
print('AUC:', auc)

AUC: 0.9463923891858513
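
The AUC printed above is the area under the ROC curve, obtained with the trapezoid rule. The sketch below, on hypothetical toy data rather than the loan dataset, illustrates why that area can be read as the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one. The helper names are illustrative, and the hand-rolled ROC sweep assumes there are no tied scores.

```python
import numpy as np

def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the (x, y) points."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

def pairwise_auc(y_true, y_score):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Toy labels and scores (hypothetical, no tied scores).
y_true = np.array([0, 0, 1, 0, 1, 1])
y_score = np.array([0.1, 0.3, 0.35, 0.6, 0.7, 0.9])

# ROC points: lower the threshold one observation at a time.
order = np.argsort(-y_score)
tpr = np.concatenate([[0.0], np.cumsum(y_true[order] == 1) / np.sum(y_true == 1)])
fpr = np.concatenate([[0.0], np.cumsum(y_true[order] == 0) / np.sum(y_true == 0)])

print("AUC (trapezoid):", trapezoid_area(fpr, tpr))
print("AUC (pairwise): ", pairwise_auc(y_true, y_score))
```

`roc_auc_score`, which the preprocessing cell already imports from `sklearn.metrics`, should give the same value as `np.trapz(tpr, fpr)` when called as `roc_auc_score(test_y, pred_y)`.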

Modify regression model

feature_columns = list(ploan_processed.columns.difference(["Personal Loan","Experience",  "Mortgage"]))
X = ploan_processed[feature_columns]
y = ploan_processed['Personal Loan'] # whether the loan was accepted: 1 or 0

train_x2, test_x2, train_y, test_y = train_test_split(X, y, stratify=y,train_size=0.7,test_size=0.3,random_state=42)
model = sm.Logit(train_y, train_x2)
results2 = model.fit(method='newton')
results2.summary()

Data : Input

print(train_x2.shape, test_x2.shape, train_y.shape, test_y.shape)

(1750, 10) (750, 10) (1750,) (750,)

OUTPUT : Model results

Model prediction

def cut_off(y,threshold):
    Y = y.copy() # work on a copy so the original y is not modified
    Y[Y>threshold]=1
    Y[Y<=threshold]=0
    return(Y.astype(int))

pred_y = results2.predict(test_x2)
pred_Y = cut_off(pred_y,0.5)
pred_Y

  0
   0
  0
  0
   0
  0
  0
  0
  0
   0
   0
   0
  0
  0
   0
  0
  0
  0
  0
   0
  0
   0
  0
  0
   0
  0
  0
  1
  0
  0
       ..
  0
   0
    0
  0
  1
  0
  0
  0
   1
  0
   0
  1
   0
   0
  0
  0
  0
   0
  0
  0
  0
  0
   0
  0
   0
   0
   1
   0
    0
  0
Length: 750, dtype: int32

Model performance(1)

def cut_off(y,threshold):
    Y = y.copy() # work on a copy so the original y is not modified
    Y[Y>threshold]=1
    Y[Y<=threshold]=0
    return(Y.astype(int))
    
pred_y = results2.predict(test_x2)
pred_Y = cut_off(pred_y,0.5)
cfmat = confusion_matrix(test_y,pred_Y)

def acc(cfmat) :
    acc=(cfmat[0,0]+cfmat[1,1])/np.sum(cfmat) ## accuracy
    return(acc)

acc(cfmat)   ## accuracy

0.944

Confusion matrix

print(cfmat)

[[660  13]
 [ 29  48]]

Performance based on cut-off values

def cut_off(y,threshold):
    Y = y.copy() # work on a copy so the original y is not modified
    Y[Y>threshold]=1
    Y[Y<=threshold]=0
    return(Y.astype(int))
    
def acc(cfmat) :
    acc=(cfmat[0,0]+cfmat[1,1])/np.sum(cfmat) ## accuracy
    return(acc)

pred_y = results2.predict(test_x2)
pred_Y = cut_off(pred_y,0.5)
cfmat = confusion_matrix(test_y,pred_Y)

threshold = np.arange(0,1,0.1)
table = pd.DataFrame(columns=['ACC'])
for i in threshold:
    pred_Y = cut_off(pred_y,i)
    cfmat = confusion_matrix(test_y, pred_Y)
    table.loc[i] = acc(cfmat)
table.index.name='threshold'
table.columns.name='performance'
table

performance       ACC
threshold
0.0          0.102667
0.1          0.908000
0.2          0.922667
0.3          0.932000
0.4          0.936000
0.5          0.944000
0.6          0.949333
0.7          0.946667
0.8          0.941333
0.9          0.937333

Model performance(2)

# ROC utilities provided by sklearn
pred_y = results2.predict(test_x2)
fpr, tpr, thresholds = metrics.roc_curve(test_y, pred_y, pos_label=1)

# Print ROC curve
plt.plot(fpr,tpr)

# Print AUC
auc = np.trapz(tpr,fpr)
print('AUC:', auc)

AUC: 0.9465467667547905

Modify regression model (variable selection)

feature_columns = list(ploan_processed.columns.difference(["Personal Loan"]))
X = ploan_processed[feature_columns]
y = ploan_processed['Personal Loan'] # whether the loan was accepted: 1 or 0

train_x, test_x, train_y, test_y = train_test_split(X, y, stratify=y,train_size=0.7,test_size=0.3,random_state=42)

def processSubset(X,y, feature_set):
    model = sm.Logit(y,X[list(feature_set)])
    regr = model.fit()
    AIC = regr.aic
    return {"model":regr, "AIC":AIC}
        
'''
Forward selection
'''
def forward(X, y, predictors):
    # Candidate variables: every column (except const) not already in predictors
    remaining_predictors = [p for p in X.columns.difference(['const']) if p not in predictors]
    tic = time.time()
    results = []
    for p in remaining_predictors:
        results.append(processSubset(X=X, y= y, feature_set=predictors+[p]+['const']))
    # Convert the results to a DataFrame
    models = pd.DataFrame(results)

    # Select the model with the lowest AIC
    best_model = models.loc[models['AIC'].idxmin()]
    toc = time.time()
    print("Processed ", models.shape[0], "models on", len(predictors)+1, "predictors in", (toc-tic))
    print('Selected predictors:',best_model['model'].model.exog_names,' AIC:',best_model['AIC'])
    return best_model

def forward_model(X,y):
    Fmodels = pd.DataFrame(columns=["AIC", "model"])
    tic = time.time()
    # Predictors selected so far
    predictors = []
    # Add one variable per iteration, from 1 up to the number of candidate variables
    for i in range(1, len(X.columns.difference(['const'])) + 1):
        Forward_result = forward(X=X,y=y,predictors=predictors)
        if i > 1:
            if Forward_result['AIC'] > Fmodel_before:
                break
        Fmodels.loc[i] = Forward_result
        predictors = Fmodels.loc[i]["model"].model.exog_names
        Fmodel_before = Fmodels.loc[i]["AIC"]
        predictors = [ k for k in predictors if k != 'const']
    toc = time.time()
    print("Total elapsed time:", (toc - tic), "seconds.")

    return(Fmodels['model'][len(Fmodels['model'])])


'''
Backward elimination
'''
def backward(X,y,predictors):
    tic = time.time()
    results = []
    
    # Fit every model obtained by dropping exactly one variable from predictors
    for combo in itertools.combinations(predictors, len(predictors) - 1):
        results.append(processSubset(X=X, y= y,feature_set=list(combo)+['const']))
    models = pd.DataFrame(results)
    
    # Select the model with the lowest AIC
    best_model = models.loc[models['AIC'].idxmin()]
    toc = time.time()
    print("Processed ", models.shape[0], "models on", len(predictors) - 1, "predictors in",
          (toc - tic))
    print('Selected predictors:',best_model['model'].model.exog_names,' AIC:',best_model['AIC'])
    return best_model


def backward_model(X, y):
    Bmodels = pd.DataFrame(columns=["AIC", "model"], index = range(1,len(X.columns)))
    tic = time.time()
    predictors = X.columns.difference(['const'])
    # Baseline AIC of the model with all predictors (fitted here without 'const')
    Bmodel_before = processSubset(X,y,predictors)['AIC']
    while (len(predictors) > 1):
        Backward_result = backward(X=X, y=y, predictors = predictors)
        if Backward_result['AIC'] > Bmodel_before:
            break
        Bmodels.loc[len(predictors) - 1] = Backward_result
        predictors = Bmodels.loc[len(predictors) - 1]["model"].model.exog_names
        Bmodel_before = Backward_result['AIC']
        predictors = [ k for k in predictors if k != 'const']

    toc = time.time()
    print("Total elapsed time:", (toc - tic), "seconds.")
    return (Bmodels['model'].dropna().iloc[0])


'''
Stepwise selection
'''
def Stepwise_model(X,y):
    Stepmodels = pd.DataFrame(columns=["AIC", "model"])
    tic = time.time()
    predictors = []
    Smodel_before = processSubset(X,y,predictors+['const'])['AIC']
    # Add one variable per iteration, from 1 up to the number of candidate variables
    for i in range(1, len(X.columns.difference(['const'])) + 1):
        Forward_result = forward(X=X, y=y, predictors=predictors) # constant added
        print('forward')
        Stepmodels.loc[i] = Forward_result
        predictors = Stepmodels.loc[i]["model"].model.exog_names
        predictors = [ k for k in predictors if k != 'const']
        Backward_result = backward(X=X, y=y, predictors=predictors)
        if Backward_result['AIC']< Forward_result['AIC']:
            Stepmodels.loc[i] = Backward_result
            predictors = Stepmodels.loc[i]["model"].model.exog_names
            Smodel_before = Stepmodels.loc[i]["AIC"]
            predictors = [ k for k in predictors if k != 'const']
            print('backward')
        if Stepmodels.loc[i]['AIC']> Smodel_before:
            break
        else:
            Smodel_before = Stepmodels.loc[i]["AIC"]
    toc = time.time()
    print("Total elapsed time:", (toc - tic), "seconds.")
    return (Stepmodels['model'][len(Stepmodels['model'])])
    
    
def cut_off(y,threshold):
    Y = y.copy() # work on a copy so the original y is not modified
    Y[Y>threshold]=1
    Y[Y<=threshold]=0
    return(Y.astype(int))
    
def acc(cfmat) :
    acc=(cfmat[0,0]+cfmat[1,1])/np.sum(cfmat) ## accuracy
    return(acc)
    
    
Forward_best_model = forward_model(X=train_x, y= train_y)
Backward_best_model = backward_model(X=train_x,y=train_y)
Stepwise_best_model = Stepwise_model(X=train_x,y=train_y)

pred_y_full = results2.predict(test_x2) # baseline: results2, the model fitted without Experience and Mortgage
pred_y_forward = Forward_best_model.predict(test_x[Forward_best_model.model.exog_names])
pred_y_backward = Backward_best_model.predict(test_x[Backward_best_model.model.exog_names])
pred_y_stepwise = Stepwise_best_model.predict(test_x[Stepwise_best_model.model.exog_names])

pred_Y_full= cut_off(pred_y_full,0.5)
pred_Y_forward = cut_off(pred_y_forward,0.5)
pred_Y_backward = cut_off(pred_y_backward,0.5)
pred_Y_stepwise = cut_off(pred_y_stepwise,0.5)

cfmat_full = confusion_matrix(test_y, pred_Y_full)
cfmat_forward = confusion_matrix(test_y, pred_Y_forward)
cfmat_backward = confusion_matrix(test_y, pred_Y_backward)
cfmat_stepwise = confusion_matrix(test_y, pred_Y_stepwise)

Selected model

Forward_best_model = forward_model(X=train_x, y= train_y)

OUTPUT

Optimization terminated successfully.
         Current function value: 0.329986
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.284217
         Iterations 7
Optimization terminated successfully.
         Current function value: 0.296731
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.330062
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.322763
         Iterations 7
Optimization terminated successfully.
         Current function value: 0.329995
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.327824
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.205738
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.324953
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.329912
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.330059
         Iterations 6
Processed  11 models on 1 predictors in 0.06183505058288574
Selected predictors: ['Income', 'const']  AIC: 724.0825012461598
Optimization terminated successfully.
         Current function value: 0.205431
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205682
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.185721
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205517
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.169107
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205563
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.182286
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205735
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205561
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205167
         Iterations 8
Processed  10 models on 2 predictors in 0.05884265899658203
Selected predictors: ['Income', 'Education', 'const']  AIC: 597.8752580578658
Optimization terminated successfully.
         Current function value: 0.168881
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.168679
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152041
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.168833
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.168897
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.154924
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.169073
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.169052
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.168642
         Iterations 9
Processed  9 models on 3 predictors in 0.07081055641174316
Selected predictors: ['Income', 'Education', 'CD Account', 'const']  AIC: 540.1423230958794
Optimization terminated successfully.
         Current function value: 0.152028
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.151411
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.148163
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152036
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.139352
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152015
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.151151
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.150075
         Iterations 9
Processed  8 models on 4 predictors in 0.057845115661621094
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'const']  AIC: 497.73316075623126
Optimization terminated successfully.
         Current function value: 0.138887
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.138758
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136599
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.138901
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.139349
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.138959
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.137180
         Iterations 9
Processed  7 models on 5 predictors in 0.053856849670410156
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'const']  AIC: 490.0954047541096
Optimization terminated successfully.
         Current function value: 0.136127
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135996
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136142
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136574
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135928
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.133263
         Iterations 9
Processed  6 models on 6 predictors in 0.056847572326660156
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'const']  AIC: 480.41892123708624
Optimization terminated successfully.
         Current function value: 0.132630
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132650
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132646
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.133238
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132361
         Iterations 9
Processed  5 models on 7 predictors in 0.03989291191101074
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'const']  AIC: 479.2643543252462
Optimization terminated successfully.
         Current function value: 0.131791
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131772
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131803
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132343
         Iterations 9
Processed  4 models on 8 predictors in 0.03989434242248535
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'const']  AIC: 479.2012205305657
Optimization terminated successfully.
         Current function value: 0.131062
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131077
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131771
         Iterations 9
Processed  3 models on 9 predictors in 0.02792525291442871
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'Age', 'const']  AIC: 478.7181848799073
Optimization terminated successfully.
         Current function value: 0.131061
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131057
         Iterations 9
Processed  2 models on 10 predictors in 0.02393651008605957
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'Age', 'Mortgage', 'const']  AIC: 480.6980587902294
Total elapsed time: 0.5485327243804932 seconds.
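
The AIC values in this log can be recovered from the optimizer output. The sketch below assumes that "Current function value" is the average negative log-likelihood per training row (an assumption that is consistent with the reported numbers) and uses the 1750 training rows shown earlier, so that AIC = 2k + 2n * (function value), where k counts the fitted parameters including `const`. The helper name `aic_from_log` is illustrative.

```python
# Values copied from the forward-selection log above.
n_train = 1750  # rows in train_x
steps = [
    # (mean negative log-likelihood, parameters incl. const, reported AIC)
    (0.205738, 2, 724.0825012461598),  # ['Income', 'const']
    (0.169107, 3, 597.8752580578658),  # ['Income', 'Education', 'const']
]

def aic_from_log(mean_nll, k, n):
    """AIC = 2k - 2*logL, with logL = -n * mean_nll."""
    return 2 * k + 2 * n * mean_nll

for mean_nll, k, reported in steps:
    recomputed = aic_from_log(mean_nll, k, n_train)
    print(f"recomputed {recomputed:.3f}  reported {reported:.3f}")
```

The small differences come from the function value being printed to six decimals. In the last step of the log, adding Mortgage raises the AIC from 478.72 to 480.70, so `forward_model` stops and keeps the nine-predictor model.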

Backward_best_model = backward_model(X=train_x,y=train_y)

OUTPUT

Optimization terminated successfully.
         Current function value: 0.137663
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.134821
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131859
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131061
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.214795
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.142500
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131057
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.154241
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135440
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152443
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131753
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131072
         Iterations 9
Processed  11 models on 10 predictors in 0.12366890907287598
Selected predictors: ['Age', 'CCAvg', 'CD Account', 'CreditCard', 'Education', 'Family', 'Income', 'Mortgage', 'Online', 'Securities Account', 'const']  AIC: 480.6980587902294
Optimization terminated successfully.
         Current function value: 0.134824
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131862
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131062
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.215827
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.142665
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.155447
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135443
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152478
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131755
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131771
         Iterations 9
Processed  10 models on 9 predictors in 0.0967409610748291
Selected predictors: ['Age', 'CCAvg', 'CD Account', 'CreditCard', 'Education', 'Family', 'Income', 'Online', 'Securities Account', 'const']  AIC: 478.7181848799073
Optimization terminated successfully.
         Current function value: 0.134831
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131871
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.218281
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.142684
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.155797
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135444
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152482
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131791
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131772
         Iterations 9
Processed  9 models on 8 predictors in 0.07679510116577148
Selected predictors: ['CCAvg', 'CD Account', 'CreditCard', 'Education', 'Family', 'Income', 'Online', 'Securities Account', 'const']  AIC: 479.2012205305657
Total elapsed time: 0.3181488513946533 seconds.
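
The FutureWarning in the log above suggests replacing `argmin` with `idxmin` when picking the model with the smallest AIC. The following is only a minimal sketch of that pattern, not the selection code used in this article: the DataFrame `candidates` and its index are hypothetical, and the AIC values are copied from the three "Selected predictors" lines of the backward run.

```python
import pandas as pd

# Hypothetical table of candidate models; the AIC values are taken from the log,
# the index is the "on N predictors" count printed next to them.
candidates = pd.DataFrame(
    {"AIC": [480.6980587902294, 478.7181848799073, 479.2012205305657]},
    index=[10, 9, 8],
)
candidates.index.name = "n_predictors"

# idxmin returns the index *label* of the smallest value,
# so it can be passed directly to .loc
best_label = candidates["AIC"].idxmin()
best_row = candidates.loc[best_label]

print(best_label, best_row["AIC"])
```

Because `idxmin` returns a label and not a position, it pairs with `.loc`; a positional minimum would pair with `.iloc`.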

Stepwise_best_model = Stepwise_model(X=train_x,y=train_y)

OUTPUT

Optimization terminated successfully.
         Current function value: 0.330076
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.329986
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.284217
         Iterations 7
Optimization terminated successfully.
         Current function value: 0.296731
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.330062
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.322763
         Iterations 7
Optimization terminated successfully.
         Current function value: 0.329995
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.327824
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.205738
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.324953
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.329912
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.330059
         Iterations 6
Processed  11 models on 1 predictors in 0.06789159774780273
Selected predictors: ['Income', 'const']  AIC: 724.0825012461598
forward
Optimization terminated successfully.
         Current function value: 0.330076
         Iterations 6
Processed  1 models on 0 predictors in 0.008976459503173828
Selected predictors: ['const']  AIC: 1157.267296321307
Optimization terminated successfully.
         Current function value: 0.205431
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205682
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.185721
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205517
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.169107
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205563
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.182286
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205735
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205561
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205167
         Iterations 8
Processed  10 models on 2 predictors in 0.07081007957458496
Selected predictors: ['Income', 'Education', 'const']  AIC: 597.8752580578658
forward
Optimization terminated successfully.
         Current function value: 0.205738
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.322763
         Iterations 7
Processed  2 models on 1 predictors in 0.017953157424926758
Selected predictors: ['Income', 'const']  AIC: 724.0825012461598
Optimization terminated successfully.
         Current function value: 0.168881
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.168679
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152041
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.168833
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.168897
         Iterations 9
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:21: FutureWarning: 'argmin' is deprecated, use 'idxmin' instead. The behavior of 'argmin'
will be corrected to return the positional minimum in the future.
Use 'series.values.argmin' to get the position of the minimum now.
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:61: FutureWarning: 'argmin' is deprecated, use 'idxmin' instead. The behavior of 'argmin'
will be corrected to return the positional minimum in the future.
Use 'series.values.argmin' to get the position of the minimum now.
Optimization terminated successfully.
         Current function value: 0.154924
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.169073
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.169052
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.168642
         Iterations 9
Processed  9 models on 3 predictors in 0.06981372833251953
Selected predictors: ['Income', 'Education', 'CD Account', 'const']  AIC: 540.1423230958794
forward
Optimization terminated successfully.
         Current function value: 0.169107
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.185721
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.288940
         Iterations 7
Processed  3 models on 2 predictors in 0.02293872833251953
Selected predictors: ['Income', 'Education', 'const']  AIC: 597.8752580578658
Optimization terminated successfully.
         Current function value: 0.152028
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.151411
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.148163
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152036
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.139352
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152015
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.151151
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.150075
         Iterations 9
Processed  8 models on 4 predictors in 0.06681990623474121
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'const']  AIC: 497.73316075623126
forward
Optimization terminated successfully.
         Current function value: 0.152041
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.154924
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.164270
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.287431
         Iterations 7
Processed  4 models on 3 predictors in 0.04787254333496094
Selected predictors: ['Income', 'Education', 'CD Account', 'const']  AIC: 540.1423230958794
Optimization terminated successfully.
         Current function value: 0.138887
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.138758
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136599
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.138901
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.139349
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.138959
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.137180
         Iterations 9
Processed  7 models on 5 predictors in 0.06382942199707031
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'const']  AIC: 490.0954047541096
forward
Optimization terminated successfully.
         Current function value: 0.139352
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.148163
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.154854
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.160828
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.282426
         Iterations 7
Processed  5 models on 4 predictors in 0.04089093208312988
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'const']  AIC: 497.73316075623126
Optimization terminated successfully.
         Current function value: 0.136127
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135996
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136142
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136574
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135928
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.133263
         Iterations 9
Processed  6 models on 6 predictors in 0.06083846092224121
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'const']  AIC: 480.41892123708624
forward
Optimization terminated successfully.
         Current function value: 0.136599
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.137180
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.144927
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.154299
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.157364
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.273321
         Iterations 7
Processed  6 models on 5 predictors in 0.042886972427368164
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'const']  AIC: 490.0954047541096
Optimization terminated successfully.
         Current function value: 0.132630
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132650
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132646
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.133238
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132361
         Iterations 9
Processed  5 models on 7 predictors in 0.05186176300048828
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'const']  AIC: 479.2643543252462
forward
Optimization terminated successfully.
         Current function value: 0.133263
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135928
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136688
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.143335
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.154141
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.156593
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.271509
         Iterations 7
Processed  7 models on 6 predictors in 0.07081055641174316
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'const']  AIC: 480.41892123708624
Optimization terminated successfully.
         Current function value: 0.131791
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131772
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131803
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132343
         Iterations 9
Processed  4 models on 8 predictors in 0.03690147399902344
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'const']  AIC: 479.2012205305657
forward
Optimization terminated successfully.
         Current function value: 0.132361
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132650
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135373
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136112
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.142716
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.153670
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.156410
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.218291
         Iterations 8
Processed  8 models on 7 predictors in 0.07579731941223145
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'const']  AIC: 479.2643543252462
Optimization terminated successfully.
         Current function value: 0.131062
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131077
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131771
         Iterations 9
Processed  3 models on 9 predictors in 0.029920101165771484
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'Age', 'const']  AIC: 478.7181848799073
forward
Optimization terminated successfully.
         Current function value: 0.131772
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131791
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131871
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.134831
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135444
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.142684
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152482
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.155797
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.218281
         Iterations 8
Processed  9 models on 8 predictors in 0.07579827308654785
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'const']  AIC: 479.2012205305657
Optimization terminated successfully.
         Current function value: 0.131061
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131057
         Iterations 9
Processed  2 models on 10 predictors in 0.03091716766357422
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'Age', 'Mortgage', 'const']  AIC: 480.6980587902294
forward
Optimization terminated successfully.
         Current function value: 0.131062
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131771
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131755
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131862
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.134824
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135443
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.142665
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152478
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.155447
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.215827
         Iterations 8
Processed  10 models on 9 predictors in 0.08178138732910156
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'Age', 'const']  AIC: 478.7181848799073
backward
Optimization terminated successfully.
         Current function value: 0.131061
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131057
         Iterations 9
Processed  2 models on 10 predictors in 0.029919862747192383
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'Age', 'Mortgage', 'const']  AIC: 480.6980587902294
forward
Optimization terminated successfully.
         Current function value: 0.131062
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131771
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131755
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131862
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.134824
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135443
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.142665
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152478
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.155447
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.215827
         Iterations 8
Processed  10 models on 9 predictors in 0.08776473999023438
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'Age', 'const']  AIC: 478.7181848799073
backward
Total elapsed time: 1.2626218795776367 seconds.
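
The "Current function value" and "AIC" lines in these logs are related. For a statsmodels logit fit, the reported function value is the negative log-likelihood averaged over the observations, so AIC = 2k - 2 lnL can be recovered from it. The sketch below checks this against two logged models. The helper name `aic_from_function_value` is hypothetical, and the sample size of 1750 is an assumption inferred from the logged numbers, not a value stated in the article.

```python
def aic_from_function_value(mean_neg_loglik, n_obs, n_params):
    """AIC = 2k - 2*lnL, where lnL = -n_obs * mean_neg_loglik."""
    log_likelihood = -n_obs * mean_neg_loglik
    return 2 * n_params - 2 * log_likelihood

# Assumption: training sample size inferred from the logged values
N_OBS = 1750

# Constant-only model: function value 0.330076, one parameter
aic_const = aic_from_function_value(0.330076, N_OBS, n_params=1)

# Income + const model: function value 0.205738, two parameters
aic_income = aic_from_function_value(0.205738, N_OBS, n_params=2)

print(aic_const, aic_income)
```

Both results agree with the logged AIC values (about 1157.27 and 724.08) up to the rounding of the printed function value.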

Model performance(1)

print(acc(cfmat_full))
print(acc(cfmat_forward))
print(acc(cfmat_backward))
print(acc(cfmat_stepwise))
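
The `acc` helper and the `cfmat_*` confusion matrices are not shown in this part of the article. As a reminder of what an accuracy helper of this kind typically computes, here is a minimal sketch, assuming a square confusion matrix with the correctly classified counts on the diagonal. The name `accuracy_from_confusion` and the counts are made up for illustration.

```python
import numpy as np

def accuracy_from_confusion(cfmat):
    """Share of correct predictions: diagonal counts over all counts."""
    cfmat = np.asarray(cfmat, dtype=float)
    return np.trace(cfmat) / cfmat.sum()

# Made-up counts, for illustration only (rows: actual, columns: predicted)
example_cfmat = np.array([[90, 5],
                          [10, 45]])

example_acc = accuracy_from_confusion(example_cfmat)
print(example_acc)
```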

Model performance(2)

fpr, tpr, thresholds = metrics.roc_curve(test_y, pred_y_full, pos_label=1)
# Plot the ROC curve
plt.plot(fpr,tpr)
# Compute the AUC with the trapezoidal rule
auc = np.trapz(tpr,fpr)
print('AUC:', auc)

AUC: 0.9465467667547905

Figure: ROC curve of the full model
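
The AUC above is the area under the (fpr, tpr) curve, which `np.trapz(tpr, fpr)` approximates with the trapezoidal rule. The following sketch spells that rule out on a tiny hand-made ROC curve. The four points are invented for illustration and are not taken from the models in this article.

```python
import numpy as np

# Invented ROC points, sorted by increasing false positive rate
fpr_demo = np.array([0.0, 0.0, 0.5, 1.0])
tpr_demo = np.array([0.0, 0.5, 1.0, 1.0])

# Trapezoidal rule: width of each step times the mean height of its two ends
widths = np.diff(fpr_demo)
mean_heights = (tpr_demo[1:] + tpr_demo[:-1]) / 2
auc_demo = np.sum(widths * mean_heights)

print('AUC:', auc_demo)
```

The first step has zero width and adds nothing, the second adds 0.5 * 0.75 and the third 0.5 * 1.0, so the area is 0.875.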

fpr, tpr, thresholds = metrics.roc_curve(test_y, pred_y_forward, pos_label=1)
# Plot the ROC curve
plt.plot(fpr,tpr)
# Compute the AUC with the trapezoidal rule
auc = np.trapz(tpr,fpr)
print('AUC:', auc)

AUC: 0.9465467667547905

Figure: ROC curve of the forward selection model

fpr, tpr, thresholds = metrics.roc_curve(test_y, pred_y_backward, pos_label=1)
# Plot the ROC curve
plt.plot(fpr,tpr)
# Compute the AUC with the trapezoidal rule
auc = np.trapz(tpr,fpr)
print('AUC:', auc)

AUC: 0.9465467667547905

Figure: ROC curve of the backward elimination model

fpr, tpr, thresholds = metrics.roc_curve(test_y, pred_y_stepwise, pos_label=1)
# Plot the ROC curve
plt.plot(fpr,tpr)
# Compute the AUC with the trapezoidal rule
auc = np.trapz(tpr,fpr)
print('AUC:', auc)

AUC: 0.9465467667547905

Figure: ROC curve of the stepwise selection model

### In terms of performance, the four models do not differ much
print(len(Forward_best_model.model.exog_names))
print(len(Backward_best_model.model.exog_names))
print(len(Stepwise_best_model.model.exog_names))

10
10
10

Regression with sklearn

from sklearn import datasets
from sklearn import model_selection
from sklearn import linear_model 
import matplotlib.pyplot as plt 
import numpy as np

# 50 samples with 50 features, of which only 10 are informative
X_all, y_all = datasets.make_regression(n_samples=50, n_features=50, n_informative=10)
# Hold out half of the samples for testing
X_train, X_test, y_train, y_test = model_selection.train_test_split(X_all, y_all, train_size=0.5)
model = linear_model.LinearRegression()
model.fit(X_train, y_train)

def sse(resid):
    """Sum of squared errors of a residual vector."""
    return np.sum(resid**2)

resid_train = y_train - model.predict(X_train)
sse_train = sse(resid_train)
print(sse_train)

resid_test = y_test - model.predict(X_test)
sse_test = sse(resid_test)
print(sse_test)

# R-squared score on the training and testing data
print(model.score(X_train, y_train))
print(model.score(X_test, y_test))
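
For a linear regression model, `model.score` returns the coefficient of determination, which ties the R-squared score to the SSE computed above: R^2 = 1 - SSE/SST, where SST is the sum of squared deviations of y from its mean. A minimal sketch of that relation follows. The helper `r_squared` and the toy arrays are hypothetical and use plain NumPy, not the sklearn model.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SSE / SST."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    sst = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - sse / sst

y_toy = np.array([1.0, 2.0, 3.0, 4.0])

# A perfect prediction leaves no residual, so the score is 1
r2_perfect = r_squared(y_toy, y_toy)

# Always predicting the mean explains nothing, so the score is 0
r2_mean = r_squared(y_toy, np.full_like(y_toy, y_toy.mean()))

print(r2_perfect, r2_mean)
```

On held-out data the score can even become negative, which happens when the predictions are worse than always predicting the mean of the test targets.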

def plot_residuals_and_coeff(resid_train, resid_test, coeff): 
    fig, axes = plt.subplots(1, 3, figsize=(12, 3))  
    axes[0].bar(np.arange(len(resid_train)), resid_train) 
    axes[0].set_xlabel("sample number")  
    axes[0].set_ylabel("residual")    
    axes[0].set_title("training data")   
    axes[1].bar(np.arange(len(resid_test)), resid_test) 
    axes[1].set_xlabel("sample number")  
    axes[1].set_ylabel("residual")   
    axes[1].set_title("testing data")  
    axes[2].bar(np.arange(len(coeff)), coeff)  
    axes[2].set_xlabel("coefficient number")
    axes[2].set_ylabel("coefficient")   
    fig.tight_layout()   
    return fig, axes
    
fig, ax = plot_residuals_and_coeff(resid_train, resid_test,  model.coef_)

Figure: residuals on the training data, residuals on the testing data, and coefficient values

Regression with tensorflow

Regression with pytorch

List of posts followed by this article

Reference
