
AI02, Regression



Simple linear regression

DERIVING
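A short sketch of the least-squares derivation, in the notation \(y = ax + b\) used in the examples below: choose \(\hat{a}, \hat{b}\) to minimize the sum of squared errors over the \(n\) samples, and set both partial derivatives to zero.

\[ L(a, b) = \sum_{i=1}^{n} (y_i - a x_i - b)^2 \]
\[ \frac{\partial L}{\partial b} = -2\sum_{i=1}^{n} (y_i - a x_i - b) = 0 \;\Rightarrow\; \hat{b} = \bar{y} - \hat{a}\bar{x} \]
\[ \frac{\partial L}{\partial a} = -2\sum_{i=1}^{n} x_i (y_i - a x_i - b) = 0 \;\Rightarrow\; \hat{a} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]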


RESULTS





Model performance indicators for training dataset

Diagnosis for regression

  • Residuals Scatter plot
  • Normal Q-Q Plot
  • Residual vs Fitted plot





Multivariate linear regression

DERIVING
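For several predictors, stack the samples into a design matrix \(X\) whose first column is all ones (the intercept, as added by sm.add_constant) and write the model as \(\vec{y} = X\beta + \vec{\varepsilon}\). Minimizing the squared error gives the normal equation; this sketch assumes \(X^TX\) is invertible.

\[ L(\beta) = (\vec{y} - X\beta)^T(\vec{y} - X\beta) \]
\[ \nabla_\beta L = -2X^T(\vec{y} - X\beta) = 0 \;\Rightarrow\; X^TX\hat{\beta} = X^T\vec{y} \;\Rightarrow\; \hat{\beta} = (X^TX)^{-1}X^T\vec{y} \]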


RESULTS





Model performance indicators for training dataset




Diagnosis for regression

  • Residuals Scatter plot
  • Normal Q-Q Plot
  • Residual vs Fitted plot




Multicollinearity

Multicollinearity refers to a situation in which two or more explanatory variables in a multiple regression model are highly linearly related.

Detection of multicollinearity : Variance inflation factor(VIF)
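The VIF of the \(j\)-th explanatory variable is obtained by regressing \(x_j\) on all the other explanatory variables (with an intercept) and plugging the \(R_j^2\) of that auxiliary regression into

\[ \mathrm{VIF}_j = \frac{1}{1 - R_j^2} \]

A VIF of 1 means \(x_j\) is uncorrelated with the other predictors; a commonly quoted rule of thumb treats values above 10 as a sign of serious multicollinearity.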


Ways to relieve multicollinearity

  • with elimination of some variables (feature selection)
    • Variable selection
      • Forward selection
      • Backward elimination
      • Stepwise selection
    • Correlation coefficient
    • Lasso
    • etc.
  • without elimination of any variables





Logistic regression model

OUTPUT

Model performance indicators




Diagnosis for regression

  • Residuals Scatter plot
  • Normal Q-Q Plot
  • Residual vs Fitted plot





Nonlinear regression




Linearization





Penalty of regression model
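A penalty term added to the least-squares loss shrinks the coefficients. A sketch with scikit-learn (already used later in this post) comparing plain OLS with Ridge (L2 penalty) and Lasso (L1 penalty, the method listed under multicollinearity above) on synthetic data with two nearly collinear predictors. The alpha values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic data (assumption): x2 is almost a copy of x1, x3 is irrelevant
rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3 * x1 + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # loss: ||y - Xw||^2 + alpha * ||w||_2^2
lasso = Lasso(alpha=0.1).fit(X, y)  # loss: ||y - Xw||^2 / (2n) + alpha * ||w||_1

for name, m in [('OLS', ols), ('Ridge', ridge), ('Lasso', lasso)]:
    print(name, np.round(m.coef_, 3))
```

With nearly collinear columns, the OLS coefficients of x1 and x2 can be large and of opposite sign; Ridge spreads the effect across both, and Lasso tends to set some coefficients exactly to zero.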

OUTPUT





Implementation with a variety of libraries

Regression with statsmodels

  • Step 1, Data preprocessing: csv file → train dataset, test dataset
  • Step 2, Regression analysis: train dataset, test dataset → full model
  • Step 3, Modify regression model: full model → forward model, backward model, stepwise model




Simple linear regression on an artificial dataset

Data preprocessing

import numpy as np
import pandas as pd
import statsmodels.api as sm

def f(x,a,b):
    return a*x + b

x = np.random.random(1000)
a = 3
b = 5

target = f(x,a,b)
df_input = pd.DataFrame(x)
df_target = pd.DataFrame(target)
df = pd.concat([df_input, df_target], axis=1)
df.columns = ['input','target']
Input = df['input']
Target = df['target']
constant_input = sm.add_constant(Input, has_constant='add')
Data : Input
Input.head()

OUTPUT

0    0.830166
1    0.542949
2    0.357683
3    0.688297
4    0.645634
Name: input, dtype: float64
constant_input.head()

OUTPUT

	const	input
0	1.0	0.830166
1	1.0	0.542949
2	1.0	0.357683
3	1.0	0.688297
4	1.0	0.645634

Data : Target
Target.head()

OUTPUT

0    7.490499
1    6.628847
2    6.073050
3    7.064890
4    6.936902
Name: target, dtype: float64


Regression analysis

model = sm.OLS(Target, constant_input)
fitted_model = model.fit()
fitted_model.summary()
OUTPUT : Model results

[Image: OLS regression results from fitted_model.summary()]



# Regression coefficients
fitted_model.params
const    5.0
input    3.0
dtype: float64

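Estimated values vs. original values for target

Estimated values : \(\hat{y} = \hat{a}x + \hat{b}\), or in matrix form \(\hat{\vec{y}} = X\hat{\beta}\) with \(X = [\vec{1}\ \ \vec{x}]\) and \(\hat{\beta} = (\hat{b}, \hat{a})^T\)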

np.dot(constant_input, fitted_model.params)
array([7.49049949, 6.62884716, 6.07305033, 7.0648904 , 6.93690197,
       6.04064573, 6.5576149 , 6.74231639, 6.73183572, 7.07796106,
       ...
       5.74719815, 6.58978836, 6.25943715, 5.88547536, 7.40743629,
       5.77773424, 5.99074449, 6.12113732, 6.13392177, 6.92979226])

Original values : \(y = ax + b\)

f(x,a,b)
array([7.49049949, 6.62884716, 6.07305033, 7.0648904 , 6.93690197,
       6.04064573, 6.5576149 , 6.74231639, 6.73183572, 7.07796106,
       ...
       5.74719815, 6.58978836, 6.25943715, 5.88547536, 7.40743629,
       5.77773424, 5.99074449, 6.12113732, 6.13392177, 6.92979226])

Model diagnosis

Residual

fitted_model.resid
0     -1.776357e-15
1     -2.664535e-15
2     -2.664535e-15
...
...
998   -3.552714e-15
999   -1.776357e-15
Length: 1000, dtype: float64

Residual summation

np.sum(fitted_model.resid)
-2.652988939644274e-12

Visualization of residuals

fitted_model.resid.plot()

[Image: line plot of fitted_model.resid]


Model prediction

Prediction

sample = np.random.random(10)
constant_sample = sm.add_constant(sample, has_constant='add')
fitted_model.predict(constant_sample)
array([5.20371122, 6.07617745, 7.77126507, 5.35615965, 7.44019585,
       5.94592521, 5.94306959, 6.56256376, 6.09420242, 6.39866773])

Verification

f(sample,a,b)
array([5.20371122, 6.07617745, 7.77126507, 5.35615965, 7.44019585,
       5.94592521, 5.94306959, 6.56256376, 6.09420242, 6.39866773])

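Curve fitting
import numpy as np
import matplotlib.pyplot as plt

a_hat = fitted_model.params['input']   # estimated slope
b_hat = fitted_model.params['const']   # estimated intercept
xs = np.sort(x)                        # sort so the fitted line is drawn left to right

plt.plot(x, f(x,a,b), 'x', lw=0, label="data")
plt.plot(xs, a_hat*xs + b_hat, label='result')   # fitted line from fitted_model.params
plt.ylim(0,10)
plt.legend()
plt.show()

[Image: data points and fitted line]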





Multivariate linear regression on an artificial dataset

Data preprocessing

import numpy as np
import pandas as pd
import statsmodels.api as sm

def f(x,y,z,a,b,c,r):
    return a*x + b*y + c*z + r

x = np.random.random(100)
y = np.random.random(100)
z = np.random.random(100)
a = 20
b = 50
c = 7
r = 3

target = f(x,y,z,a,b,c,r)
df_input1 = pd.DataFrame(x)
df_input2 = pd.DataFrame(y)
df_input3 = pd.DataFrame(z)
df_target = pd.DataFrame(target)
df = pd.concat([df_input1, df_input2, df_input3, df_target], axis=1)
df.columns = ['input1', 'input2', 'input3', 'target']
Input = df[['input1', 'input2', 'input3']]
Target = df['target']
constant_input = sm.add_constant(Input, has_constant='add')
Data : Input
Input.head()

OUTPUT

	input1		input2		input3
0	0.957632	0.276408	0.345041
1	0.821460	0.653252	0.549964
2	0.506590	0.261659	0.393543
3	0.500052	0.056861	0.041176
4	0.267245	0.639603	0.769945


constant_input.head()
	const	input1		input2		input3
0	1.0	0.957632	0.276408	0.345041
1	1.0	0.821460	0.653252	0.549964
2	1.0	0.506590	0.261659	0.393543
3	1.0	0.500052	0.056861	0.041176
4	1.0	0.267245	0.639603	0.769945

Data : Target
Target.head()
0    38.388309
1    55.941549
2    28.969561
3    16.132320
4    45.714644
Name: target, dtype: float64


Regression analysis

model = sm.OLS(Target, constant_input)
fitted_model = model.fit()
fitted_model.summary()
OUTPUT : Model results

[Image: OLS regression results from fitted_model.summary()]



# Regression coefficients
fitted_model.params
const      3.0
input1    20.0
input2    50.0
input3     7.0
dtype: float64

Verification with the normal equation \(\hat{\beta} = (X^TX)^{-1}X^T\vec{y}\)

from numpy import linalg

B = linalg.inv(np.dot(constant_input.T, constant_input))
np.dot(np.dot(B, constant_input.T),target)
array([ 3., 20., 50.,  7.])
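Forming \((X^TX)^{-1}\) explicitly works here, but it is usually avoided in practice because it can lose precision when columns are nearly collinear. A sketch of two alternatives in NumPy, on a synthetic design that mirrors the one above (an assumption): solving the normal equation with np.linalg.solve, and calling the least-squares solver np.linalg.lstsq directly on \(X\).

```python
import numpy as np

# Synthetic design (assumption): intercept column + three uniform predictors,
# with the same coefficients as above (r=3, a=20, b=50, c=7)
rng = np.random.default_rng(6)
X = np.column_stack([np.ones(100), rng.random((100, 3))])
beta_true = np.array([3., 20., 50., 7.])
s = X @ beta_true

# Normal equation solved without forming an explicit inverse
beta_solve = np.linalg.solve(X.T @ X, X.T @ s)
# Least-squares solver working on X itself (avoids squaring its condition number)
beta_lstsq, *_ = np.linalg.lstsq(X, s, rcond=None)
print(beta_solve, beta_lstsq)
```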

Estimated values vs. original values for target

Estimated values : \(\hat{s} = \hat{a}x + \hat{b}y + \hat{c}z + \hat{r}\), or in matrix form \(\hat{\vec{s}} = X\hat{\beta}\) with \(\hat{\beta} = (\hat{r}, \hat{a}, \hat{b}, \hat{c})^T\)

np.dot(constant_input, fitted_model.params)
array([38.38830915, 55.94154925, 28.96956111, 16.13232006, 45.71464433,
       35.66915115, 54.48721376, 35.3255576 , 17.57414208, 12.2024595 ,
       40.89621614, 33.05053896, 14.50158372, 38.67065445, 53.48709859,
       42.59911466, 54.53748705, 46.69193071, 18.38867267, 45.87908774,
       40.6693773 , 36.01122162, 11.68815215, 44.31558167, 41.80645497,
       49.37841447, 47.09113841, 53.96541726, 36.77556825, 23.52950327,
       38.64777777, 34.16965497, 50.26840963, 40.02741955, 44.16716928,
       42.3150182 , 25.99497711, 41.40530879, 27.36066677, 47.86915385,
       25.70932186, 24.86294199, 55.0745327 , 22.98417126, 32.50294778,
       17.8420005 , 61.35284467, 36.43911886, 49.76839721, 50.56165004,
       40.71292581, 36.41847389, 23.38460759, 59.30680731, 39.40085223,
       25.87053451, 40.11977913, 24.80379252, 53.38541514, 60.33980335,
       45.01501126, 51.37600515, 48.30658941, 30.00273352, 42.44824437,
       52.17219373, 21.72628098, 74.51174471, 47.41694199, 16.47748332,
       16.18670621, 26.77202999, 67.7470938 , 46.24996358, 41.99306012,
       35.44894821, 28.65531671, 29.65139668, 53.31971577, 22.99141254,
       51.20655459, 50.54080656, 66.4153275 , 39.5569899 , 39.35911854,
       39.014512  , 34.51325153, 35.5253818 , 50.8264082 , 18.76223046,
       66.14916028, 37.23867282, 28.3269569 , 53.50468595, 55.85972521,
       54.48370671, 61.87997791, 24.69145197, 47.79432371, 41.2612825 ])

Original values : \(s = ax + by + cz + r\)

f(x,y,z,a,b,c,r)
array([38.38830915, 55.94154925, 28.96956111, 16.13232006, 45.71464433,
       35.66915115, 54.48721376, 35.3255576 , 17.57414208, 12.2024595 ,
       40.89621614, 33.05053896, 14.50158372, 38.67065445, 53.48709859,
       42.59911466, 54.53748705, 46.69193071, 18.38867267, 45.87908774,
       40.6693773 , 36.01122162, 11.68815215, 44.31558167, 41.80645497,
       49.37841447, 47.09113841, 53.96541726, 36.77556825, 23.52950327,
       38.64777777, 34.16965497, 50.26840963, 40.02741955, 44.16716928,
       42.3150182 , 25.99497711, 41.40530879, 27.36066677, 47.86915385,
       25.70932186, 24.86294199, 55.0745327 , 22.98417126, 32.50294778,
       17.8420005 , 61.35284467, 36.43911886, 49.76839721, 50.56165004,
       40.71292581, 36.41847389, 23.38460759, 59.30680731, 39.40085223,
       25.87053451, 40.11977913, 24.80379252, 53.38541514, 60.33980335,
       45.01501126, 51.37600515, 48.30658941, 30.00273352, 42.44824437,
       52.17219373, 21.72628098, 74.51174471, 47.41694199, 16.47748332,
       16.18670621, 26.77202999, 67.7470938 , 46.24996358, 41.99306012,
       35.44894821, 28.65531671, 29.65139668, 53.31971577, 22.99141254,
       51.20655459, 50.54080656, 66.4153275 , 39.5569899 , 39.35911854,
       39.014512  , 34.51325153, 35.5253818 , 50.8264082 , 18.76223046,
       66.14916028, 37.23867282, 28.3269569 , 53.50468595, 55.85972521,
       54.48370671, 61.87997791, 24.69145197, 47.79432371, 41.2612825 ])

Model diagnosis

Residual

fitted_model.resid
0     0.000000e+00
1    -7.105427e-15
2     1.776357e-14
3     2.486900e-14
4     7.105427e-15
5     7.105427e-15
          ...     
95    0.000000e+00
96   -1.421085e-14
97    2.486900e-14
98    1.421085e-14
99    7.105427e-15
Length: 100, dtype: float64

Visualization of residuals

import matplotlib.pyplot as plt

fitted_model.resid.plot()
plt.show()

[Image: line plot of fitted_model.resid]


Model prediction

Prediction

sample = np.random.random((10,3))
constant_sample = sm.add_constant(sample, has_constant='add')
fitted_model.predict(constant_sample)
array([47.18460385, 29.42685672, 45.34542694, 21.18367219, 60.83667819,
       31.51219742, 16.92413439, 31.70573065, 19.8877936 , 47.38519353])

Verification

f(sample[:,0],sample[:,1],sample[:,2],a,b,c,r)
array([47.18460385, 29.42685672, 45.34542694, 21.18367219, 60.83667819,
       31.51219742, 16.92413439, 31.70573065, 19.8877936 , 47.38519353])

Curve fitting




Multivariate linear regression on a real-world dataset (Boston housing)

Dataset download URL

Dataset Description
  • CRIM - per capita crime rate by town
  • ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
  • INDUS - proportion of non-retail business acres per town.
  • CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
  • NOX - nitric oxides concentration (parts per 10 million)
  • RM - average number of rooms per dwelling
  • AGE - proportion of owner-occupied units built prior to 1940
  • DIS - weighted distances to five Boston employment centres
  • RAD - index of accessibility to radial highways
  • TAX - full-value property-tax rate per $10,000
  • PTRATIO - pupil-teacher ratio by town
  • B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
  • LSTAT - % lower status of the population
  • MEDV - Median value of owner-occupied homes in $1000’s

Data preprocessing

import pandas as pd
import numpy as np
import statsmodels.api as sm

boston = pd.read_csv(r'C:\Users\userd\Desktop\dataset\boston_house.csv')
Input_s = boston[['CRIM', 'RM', 'LSTAT']]
Input_L = boston[['CRIM', 'RM', 'LSTAT', 'B', 'TAX', 'AGE', 'ZN', 'NOX', 'INDUS']]
Target = boston['MEDV']
constant_Input_s = sm.add_constant(Input_s, has_constant='add')
constant_Input_L = sm.add_constant(Input_L, has_constant='add')
Data : Input
constant_Input_s.head()
	const	CRIM	RM	LSTAT
0	1.0	0.00632	6.575	4.98
1	1.0	0.02731	6.421	9.14
2	1.0	0.02729	7.185	4.03
3	1.0	0.03237	6.998	2.94
4	1.0	0.06905	7.147	5.33


constant_Input_L.head()
	const	CRIM	RM	LSTAT	B	TAX	AGE	ZN	NOX	INDUS
0	1.0	0.00632	6.575	4.98	396.90	296	65.2	18.0	0.538	2.31
1	1.0	0.02731	6.421	9.14	396.90	242	78.9	0.0	0.469	7.07
2	1.0	0.02729	7.185	4.03	392.83	242	61.1	0.0	0.469	7.07
3	1.0	0.03237	6.998	2.94	394.63	222	45.8	0.0	0.458	2.18
4	1.0	0.06905	7.147	5.33	396.90	222	54.2	0.0	0.458	2.18

Data diagnosis

Multicollinearity : Variance inflation factor(VIF)

from statsmodels.stats.outliers_influence import variance_inflation_factor

vif = pd.DataFrame()
vif['VIF Factor'] = [variance_inflation_factor(Input_L.values, i) for i in range(Input_L.shape[1])]
vif['features'] = Input_L.columns
vif
	VIF Factor	features
0	1.917332	CRIM
1	46.535369	RM
2	8.844137	LSTAT
3	16.856737	B
4	19.923044	TAX
5	18.457503	AGE
6	2.086502	ZN
7	72.439753	NOX
8	12.642137	INDUS


Multicollinearity : Correlation coefficient

Input_L.corr()
	CRIM		RM		LSTAT		B		TAX		AGE		ZN		NOX		INDUS
CRIM	1.000000	-0.219247	0.455621	-0.385064	0.582764	0.352734	-0.200469	0.420972	0.406583
RM	-0.219247	1.000000	-0.613808	0.128069	-0.292048	-0.240265	0.311991	-0.302188	-0.391676
LSTAT	0.455621	-0.613808	1.000000	-0.366087	0.543993	0.602339	-0.412995	0.590879	0.603800
B	-0.385064	0.128069	-0.366087	1.000000	-0.441808	-0.273534	0.175520	-0.380051	-0.356977
TAX	0.582764	-0.292048	0.543993	-0.441808	1.000000	0.506456	-0.314563	0.668023	0.720760
AGE	0.352734	-0.240265	0.602339	-0.273534	0.506456	1.000000	-0.569537	0.731470	0.644779
ZN	-0.200469	0.311991	-0.412995	0.175520	-0.314563	-0.569537	1.000000	-0.516604	-0.533828
NOX	0.420972	-0.302188	0.590879	-0.380051	0.668023	0.731470	-0.516604	1.000000	0.763651
INDUS	0.406583	-0.391676	0.603800	-0.356977	0.720760	0.644779	-0.533828	0.763651	1.000000
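import matplotlib.pyplot as plt
import seaborn as sns

cmap = sns.light_palette('darkgray', as_cmap=True)
sns.heatmap(Input_L.corr(), annot=True, cmap=cmap)
plt.show()

[Image: heatmap of the correlation matrix of Input_L]

sns.pairplot(Input_L)
plt.show()

[Image: pair plot of the variables in Input_L]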



Regression analysis

model_s = sm.OLS(Target, constant_Input_s)
model_L = sm.OLS(Target, constant_Input_L)
fitted_model_s = model_s.fit()
fitted_model_L = model_L.fit()
fitted_model_s.summary()
OUTPUT : Model results

[Image: OLS regression results from fitted_model_s.summary()]



fitted_model_s.params
const   -2.562251
CRIM    -0.102941
RM       5.216955
LSTAT   -0.578486
dtype: float64


fitted_model_L.summary()
OUTPUT : Model results

[Image: OLS regression results from fitted_model_L.summary()]



fitted_model_L.params
const   -7.108827
CRIM    -0.045293
RM       5.092238
LSTAT   -0.565133
B        0.008974
TAX     -0.006025
AGE      0.023619
ZN       0.029377
NOX      3.483832
INDUS    0.029270
dtype: float64

Model diagnosis

Residual analysis

import matplotlib.pyplot as plt

fitted_model_s.resid.plot(label="base")
fitted_model_L.resid.plot(label="full")
plt.legend()
plt.show()

[Image: residuals of the base model and the full model]



Modify regression model(based on backward elimination)

from sklearn.model_selection import train_test_split

# Data preprocessing
Input1 = Input_L.drop('NOX', axis=1)
Input2 = Input_L.drop(['NOX','RM'], axis=1)
constant_Input1 = sm.add_constant(Input1, has_constant='add')
constant_Input2 = sm.add_constant(Input2, has_constant='add')
X = constant_Input_L
X1 = constant_Input1
X2 = constant_Input2
y = Target

train_x, test_x, train_y, test_y = train_test_split(X, y, train_size=0.7, test_size=0.3, random_state = 1)
train_x1, test_x1, train_y1, test_y1 = train_test_split(X1, y, train_size=0.7, test_size=0.3, random_state = 1)
train_x2, test_x2, train_y2, test_y2 = train_test_split(X2, y, train_size=0.7, test_size=0.3, random_state = 1)

# Regression analysis
model = sm.OLS(train_y, train_x)
model1 = sm.OLS(train_y1, train_x1)
model2 = sm.OLS(train_y2, train_x2)

fitted_model = model.fit()
fitted_model1 = model1.fit()
fitted_model2 = model2.fit()
Data : Input
constant_Input1.head()
	const	CRIM	RM	LSTAT	B	TAX	AGE	ZN	INDUS
0	1.0	0.00632	6.575	4.98	396.90	296	65.2	18.0	2.31
1	1.0	0.02731	6.421	9.14	396.90	242	78.9	0.0	7.07
2	1.0	0.02729	7.185	4.03	392.83	242	61.1	0.0	7.07
3	1.0	0.03237	6.998	2.94	394.63	222	45.8	0.0	2.18
4	1.0	0.06905	7.147	5.33	396.90	222	54.2	0.0	2.18


constant_Input2.head()
	const	CRIM	LSTAT	B	TAX	AGE	ZN	INDUS
0	1.0	0.00632	4.98	396.90	296	65.2	18.0	2.31
1	1.0	0.02731	9.14	396.90	242	78.9	0.0	7.07
2	1.0	0.02729	4.03	392.83	242	61.1	0.0	7.07
3	1.0	0.03237	2.94	394.63	222	45.8	0.0	2.18
4	1.0	0.06905	5.33	396.90	222	54.2	0.0	2.18

Data diagnosis

Multicollinearity : Variance inflation factor(VIF)

vif1 = pd.DataFrame()
vif2 = pd.DataFrame()
vif1['VIF1 Factor'] = [variance_inflation_factor(Input1.values, i) for i in range(Input1.shape[1])]
vif2['VIF2 Factor'] = [variance_inflation_factor(Input2.values, i) for i in range(Input2.shape[1])]
vif1['features1'] = Input1.columns
vif2['features2'] = Input2.columns
pd.concat([vif,vif1,vif2], axis=1)
	VIF Factor	features	VIF1 Factor	features1	VIF2 Factor	features2
0	1.917332	CRIM		1.916648	CRIM		1.907517	CRIM
1	46.535369	RM		30.806301	RM		7.933529	LSTAT
2	8.844137	LSTAT		8.171214	LSTAT		7.442569	B
3	16.856737	B		16.735751	B		16.233237	TAX
4	19.923044	TAX		18.727105	TAX		13.765377	AGE
5	18.457503	AGE		16.339792	AGE		1.820070	ZN
6	2.086502	ZN		2.074500	ZN		11.116823	INDUS
7	72.439753	NOX		11.217461	INDUS		NaN		NaN
8	12.642137	INDUS		NaN		NaN		NaN		NaN

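OUTPUT : Model results
fitted_model.summary()

[Image: OLS regression results for the full model]

fitted_model1.summary()

[Image: OLS regression results for model1 (NOX removed)]

fitted_model2.summary()

[Image: OLS regression results for model2 (NOX and RM removed)]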



Model prediction
plt.plot(np.array(fitted_model.predict(test_x)), label="model with full variables")
plt.plot(np.array(fitted_model1.predict(test_x1)), label="model1 eliminated 1 variable")
plt.plot(np.array(fitted_model2.predict(test_x2)), label="model2 eliminated 2 variables")
plt.plot(np.array(test_y), label="true")
plt.legend()
plt.show()

[Image: test-set predictions of the three models and the true values]


Model diagnosis

Residual analysis

plt.plot(np.array(test_y.values-fitted_model.predict(test_x)),label='residual of model')
plt.plot(np.array(test_y.values-fitted_model1.predict(test_x1)),label='residual of model1')
plt.plot(np.array(test_y.values-fitted_model2.predict(test_x2)),label='residual of model2')
plt.legend()
plt.show()

[Image: test-set residuals of model, model1 and model2]


Model performance
from sklearn.metrics import mean_squared_error

print(mean_squared_error(y_true=test_y.values, y_pred=fitted_model.predict(test_x)))
print(mean_squared_error(y_true=test_y.values, y_pred=fitted_model1.predict(test_x1)))
print(mean_squared_error(y_true=test_y.values, y_pred=fitted_model2.predict(test_x2)))
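26.148631468819843
26.14006260984654
38.788453179128304

Removing NOX leaves the test MSE essentially unchanged (26.15 → 26.14), while additionally removing RM raises it to 38.79, so RM carries predictive information that the remaining variables do not replace.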




Multivariate linear regression on a real-world dataset (Toyota Corolla)

Dataset download URL

Data preprocessing

## [0] : Load libraries
import pandas as pd
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split


## [1] : Load dataset
corolla = pd.read_csv(r'C:\Users\userd\Desktop\dataset\ToyotaCorolla.csv')
nCar = corolla.shape[0]
nVar = corolla.shape[1]


## [2] : categorical data-type > binary data-type
# Create dummy variables
dummy_p = np.repeat(0,nCar)
dummy_d = np.repeat(0,nCar)
dummy_c = np.repeat(0,nCar)

# Save index for 'Fuel_Type'
p_idx = np.array(corolla.Fuel_Type == "Petrol")
d_idx = np.array(corolla.Fuel_Type == "Diesel")
c_idx = np.array(corolla.Fuel_Type == "CNG")

# Substitute binary = 1 after slicing
dummy_p[p_idx] = 1  # Petrol
dummy_d[d_idx] = 1  # Diesel
dummy_c[c_idx] = 1  # CNG


## [3] : Eliminate unnecessary variables and add dummy variables
Fuel = pd.DataFrame({'Petrol': dummy_p, 'Diesel': dummy_d, 'CNG': dummy_c})
corolla_ = corolla.dropna().drop(['Id','Model','Fuel_Type'], axis=1, inplace=False)
mlr_data = pd.concat((corolla_, Fuel), axis=1)  # axis must be passed by keyword in recent pandas


## [4] : Add bias
constant_mlr_data = sm.add_constant(mlr_data, has_constant='add')


## [5] : Divide into input data and output data
feature_columns = list(constant_mlr_data.columns.difference(['Price']))
X = constant_mlr_data[feature_columns]
y = constant_mlr_data.Price
train_x, test_x, train_y, test_y = train_test_split(X, y, train_size=0.7, test_size=0.3)
[1] Data : Input
corolla.head()
	Id	Model						Price	Age_08_04	Mfg_Month	Mfg_Year	KM		Fuel_Type	HP	Met_Color	...	Central_Lock	Powered_Windows	Power_Steering	Radio	Mistlamps		Sport_Model	Backseat_Divider	Metallic_Rim	Radio_cassette	Tow_Bar
0	1	TOYOTA Corolla 2.0 D4D HATCHB TERRA 2/3-Doors	13500	23		10		2002		46986		Diesel		90	1		...	1		1		1		0	0			0		1			0		0		0
1	2	TOYOTA Corolla 2.0 D4D HATCHB TERRA 2/3-Doors	13750	23		10		2002		72937		Diesel		90	1		...	1		0		1		0	0			0		1			0		0		0
2	3	?TOYOTA Corolla 2.0 D4D HATCHB TERRA 2/3-Doors	13950	24		9		2002		41711		Diesel		90	1		...	0		0		1		0	0			0		1			0		0		0
3	4	TOYOTA Corolla 2.0 D4D HATCHB TERRA 2/3-Doors	14950	26		7		2002		48000		Diesel		90	0		...	0		0		1		0	0			0		1			0		0		0
4	5	TOYOTA Corolla 2.0 D4D HATCHB SOL 2/3-Doors	13750	30		3		2002		38500		Diesel		90	0		...	1		1		1		0	1			0		1			0		0		0
5 rows × 37 columns


print('nCar: %d' % nCar, 'nVar: %d' % nVar )
nCar: 1436 nVar: 37

[2] Data : Input
dummy_p
array([0, 0, 0, ..., 0, 0, 0])
dummy_d
array([0, 0, 0, ..., 0, 0, 0])
dummy_c
array([0, 0, 0, ..., 0, 0, 0])


p_idx
array([False, False, False, ...,  True,  True,  True])
d_idx
array([ True,  True,  True, ..., False, False, False])
c_idx
array([False, False, False, ..., False, False, False])


dummy_p
array([0, 0, 0, ..., 1, 1, 1])
dummy_d
array([1, 1, 1, ..., 0, 0, 0])
dummy_c
array([0, 0, 0, ..., 0, 0, 0])

[3] Data : Input
Fuel.head()
	Petrol	Diesel	CNG
0	0	1	0
1	0	1	0
2	0	1	0
3	0	1	0
4	0	1	0
Fuel.shape
(1436, 3)


corolla_.head()
	Price	Age_08_04	Mfg_Month	Mfg_Year	KM	HP	Met_Color	Automatic	cc	Doors	...	Central_Lock	Powered_Windows	Power_Steering	Radio	Mistlamps	Sport_Model	Backseat_Divider	Metallic_Rim	Radio_cassette	Tow_Bar
0	13500	23		10		2002		46986	90	1		0		2000	3	...	1		1		1		0	0		0		1			0		0		0
1	13750	23		10		2002		72937	90	1		0		2000	3	...	1		0		1		0	0		0		1			0		0		0
2	13950	24		9		2002		41711	90	1		0		2000	3	...	0		0		1		0	0		0		1			0		0		0
3	14950	26		7		2002		48000	90	0		0		2000	3	...	0		0		1		0	0		0		1			0		0		0
4	13750	30		3		2002		38500	90	0		0		2000	3	...	1		1		1		0	1		0		1			0		0		0
5 rows × 34 columns
corolla_.shape
(1436, 34)


mlr_data.head()
	Price	Age_08_04	Mfg_Month	Mfg_Year	KM	HP	Met_Color	Automatic	cc	Doors	...	Radio	Mistlamps	Sport_Model	Backseat_Divider	Metallic_Rim	Radio_cassette	Tow_Bar	Petrol	Diesel	CNG
0	13500	23		10		2002		46986	90	1		0		2000	3	...	0	0		0		1			0		0		0	0	1	0
1	13750	23		10		2002		72937	90	1		0		2000	3	...	0	0		0		1			0		0		0	0	1	0
2	13950	24		9		2002		41711	90	1		0		2000	3	...	0	0		0		1			0		0		0	0	1	0
3	14950	26		7		2002		48000	90	0		0		2000	3	...	0	0		0		1			0		0		0	0	1	0
4	13750	30		3		2002		38500	90	0		0		2000	3	...	0	1		0		1			0		0		0	0	1	0
5 rows × 37 columns
mlr_data.shape
(1436, 37)

[4] Data : Input
constant_mlr_data.head()
	const	Price	Age_08_04	Mfg_Month	Mfg_Year	KM	HP	Met_Color	Automatic	cc	...	Radio	Mistlamps	Sport_Model	Backseat_Divider	Metallic_Rim	Radio_cassette	Tow_Bar	Petrol	Diesel	CNG
0	1.0	13500	23		10		2002		46986	90	1		0		2000	...	0	0		0		1			0		0		0	0	1	0
1	1.0	13750	23		10		2002		72937	90	1		0		2000	...	0	0		0		1			0		0		0	0	1	0
2	1.0	13950	24		9		2002		41711	90	1		0		2000	...	0	0		0		1			0		0		0	0	1	0
3	1.0	14950	26		7		2002		48000	90	0		0		2000	...	0	0		0		1			0		0		0	0	1	0
4	1.0	13750	30		3		2002		38500	90	0		0		2000	...	0	1		0		1			0		0		0	0	1	0
5 rows × 38 columns

[5] Data : Input
mlr_data.columns.difference(['Price'])
Index(['ABS', 'Age_08_04', 'Airbag_1', 'Airbag_2', 'Airco', 'Automatic',
       'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider',
       'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders',
       'Diesel', 'Doors', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Met_Color',
       'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Mistlamps',
       'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio',
       'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'cc', 'const'],
      dtype='object')


X.head()
	ABS	Age_08_04	Airbag_1	Airbag_2	Airco	Automatic	Automatic_airco	BOVAG_Guarantee			Backseat_Divider	Boardcomputer	...	Power_Steering	Powered_Windows	Quarterly_Tax	Radio	Radio_cassette		Sport_Model	Tow_Bar	Weight	cc	const
0	1	23		1		1		0	0		0		1				1			1		...	1		1		210		0	0			0		0	1165	2000	1.0
1	1	23		1		1		1	0		0		1				1			1		...	1		0		210		0	0			0		0	1165	2000	1.0
2	1	24		1		1		0	0		0		1				1			1		...	1		0		210		0	0			0		0	1165	2000	1.0
3	1	26		1		1		0	0		0		1				1			1		...	1		0		210		0	0			0		0	1165	2000	1.0
4	1	30		1		1		1	0		0		1				1			1		...	1		1		210		0	0			0		0	1170	2000	1.0
5 rows × 37 columns
y.head()
0    13500
1    13750
2    13950
3    14950
4    13750
Name: Price, dtype: int64


print(X.shape, y.shape)
(1436, 37) (1436,)

print(train_x.shape, test_x.shape, train_y.shape, test_y.shape)
(1005, 37) (431, 37) (1005,) (431,)

Data diagnosis

Multicollinearity : Variance inflation factor(VIF)

from statsmodels.stats.outliers_influence import variance_inflation_factor

vif = pd.DataFrame()
vif["VIF Factor"] = [variance_inflation_factor(mlr_data.values, i) for i in range(mlr_data.shape[1])]
vif["features"] = mlr_data.columns
vif
	VIF Factor	features
0	10.953474	Price
1	inf		Age_08_04
2	inf		Mfg_Month
3	inf		Mfg_Year
4	2.400334	KM
5	2.621514	HP
6	1.143778	Met_Color
7	1.121303	Automatic
8	1.258641	cc
9	1.352288	Doors
10	0.000000	Cylinders
11	1.271814	Gears
12	5.496805	Quarterly_Tax
13	4.487491	Weight
14	1.210815	Mfr_Guarantee
15	1.392485	BOVAG_Guarantee
16	1.573026	Guarantee_Period
17	2.276617	ABS
18	1.612758	Airbag_1
19	3.106933	Airbag_2
20	1.846429	Airco
21	2.009866	Automatic_airco
22	2.647036	Boardcomputer
23	1.564446	CD_Player
24	4.593157	Central_Lock
25	4.676311	Powered_Windows
26	1.582829	Power_Steering
27	62.344621	Radio
28	2.076846	Mistlamps
29	1.510131	Sport_Model
30	2.702141	Backseat_Divider
31	1.349642	Metallic_Rim
32	62.172860	Radio_cassette
33	1.153760	Tow_Bar
34	inf		Petrol
35	inf		Diesel
36	inf		CNG
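The inf values for Petrol, Diesel and CNG are expected: every car has exactly one fuel type, so the three dummies always sum to 1 and reproduce the constant column (the dummy variable trap). The inf values for Age_08_04, Mfg_Month and Mfg_Year suggest that the age column is a linear function of the manufacturing year and month, and the VIF of 0 for Cylinders suggests that column has no variation at all. Note also that the table includes Price, the target; a VIF check is normally run on the predictors only. A minimal sketch of dummy coding that avoids the trap with pandas' get_dummies and drop_first=True, on a hypothetical miniature of the Fuel_Type column:

```python
import pandas as pd

# Hypothetical miniature of the Fuel_Type column
fuel = pd.Series(['Diesel', 'Petrol', 'CNG', 'Petrol', 'Diesel'], name='Fuel_Type')

# drop_first=True keeps k-1 indicator columns; the dropped level ('CNG',
# first in sorted order) becomes the baseline absorbed by the intercept
dummies = pd.get_dummies(fuel, prefix='Fuel', drop_first=True, dtype=int)
print(dummies)
```

Each remaining coefficient is then read as the price difference relative to the baseline fuel type, all else being equal.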


Regression analysis

# Train the MLR(fitting regression model)
full_model = sm.OLS(train_y, train_x)
fitted_full_model = full_model.fit()
fitted_full_model.summary()
OUTPUT : Model results

R² is high, and most of the variables are statistically significant.


Model diagnosis

Normal Q-Q Plot

import matplotlib.pyplot as plt

# checking residual
res = fitted_full_model.resid  # residual

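# Normal Q-Q plot of the standardized residuals with a 45-degree reference line
fig = sm.qqplot(res, fit=True, line='45')
plt.show()

[Image: normal Q-Q plot of the residuals]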


Residual vs Fitted plot

import matplotlib.pyplot as plt

pred_y=fitted_full_model.predict(train_x)
res = fitted_full_model.resid  # residual

fig = plt.scatter(pred_y, res, s=4)
plt.xlim(4000,30000)
plt.xlabel('Fitted values')
plt.ylabel('Residual')
plt.show()

[Image: residuals vs. fitted values]


Model prediction
import matplotlib.pyplot as plt

pred_y = fitted_full_model.predict(test_x)  ## prediction on the test (validation) data
plt.plot(np.array(test_y-pred_y), label="residual of full model")
plt.legend()
plt.show()

[Image: test-set residuals of the full model]


Model performance
from sklearn.metrics import mean_squared_error

pred_y = fitted_full_model.predict(test_x)
mean_squared_error(y_true= test_y, y_pred= pred_y)

1441488.811437499

This corresponds to an RMSE of about 1,200.6 in the units of Price.



Modify regression model(Variables selection)

# [0]
import time
import itertools
import pandas as pd
import statsmodels.api as sm

# [1] processSubset: fit an OLS model on the given feature subset and return it with its AIC
def processSubset(X, y, feature_set):
    model = sm.OLS(y, X[list(feature_set)])  # Modeling
    regr = model.fit()                       # Fit the model
    AIC = regr.aic                           # AIC of the model
    return {"model": regr, "AIC": AIC}

# [2] getBest: select and return the model with the lowest AIC among all k-variable subsets
def getBest(X, y, k):
    tic = time.time()  # Start time
    results = []       # Storage for the fitted models
    for combo in itertools.combinations(X.columns.difference(['const']), k):  # every combination of k variables
        combo = (list(combo) + ['const'])
        results.append(processSubset(X, y, feature_set=combo))  # Store each fitted model
    models = pd.DataFrame(results)  # Convert to a DataFrame
    # Select the model with the lowest AIC
    best_model = models.loc[models['AIC'].idxmin()]  # row label of the minimum AIC
    toc = time.time()  # End time
    print("Processed ", models.shape[0], "models on", k, "predictors in", (toc - tic),
          "seconds.")
    return best_model

print(getBest(X=train_x, y=train_y,k=2))
OUTPUT
Processed  630 models on 2 predictors in 1.8201320171356201 seconds.
AIC                                                17516.6
model    <statsmodels.regression.linear_model.Regressio...
Name: 211, dtype: object
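The count of 630 models follows from the data: train_x has 37 columns, of which 36 are candidates once 'const' is excluded, and getBest with k=2 fits one model per pair, \(\binom{36}{2} = 630\). A quick check with Python's math.comb, which also shows why an exhaustive search over every subset size is impractical and why forward, backward and stepwise heuristics are used instead:

```python
from math import comb

n_candidates = 36  # columns of train_x other than 'const'
pairs = comb(n_candidates, 2)
total = sum(comb(n_candidates, k) for k in range(1, n_candidates + 1))  # all non-empty subsets
print(pairs)  # 630
print(total)  # 2**36 - 1 = 68719476735
```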

SUPPLEMENT [1]
processSubset(X=train_x, y=train_y, feature_set = feature_columns)
{'model': <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x1fbccd16080>,
 'AIC': 16970.52868834004}


processSubset(X=train_x, y=train_y, feature_set = feature_columns[0:5])
{'model': <statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x000001FBCCDEB358>, 'AIC': 19176.91230693121}

SUPPLEMENT [2]
for combo in itertools.combinations(X.columns.difference(['const']), 2):
    print((list(combo)+['const']))
OUTPUT
['ABS', 'Age_08_04', 'const']
['ABS', 'Airbag_1', 'const']
['ABS', 'Airbag_2', 'const']
['ABS', 'Airco', 'const']
['ABS', 'Automatic', 'const']
['ABS', 'Automatic_airco', 'const']
['ABS', 'BOVAG_Guarantee', 'const']
['ABS', 'Backseat_Divider', 'const']
['ABS', 'Boardcomputer', 'const']
['ABS', 'CD_Player', 'const']
['ABS', 'CNG', 'const']
['ABS', 'Central_Lock', 'const']
['ABS', 'Cylinders', 'const']
['ABS', 'Diesel', 'const']
['ABS', 'Doors', 'const']
['ABS', 'Gears', 'const']
['ABS', 'Guarantee_Period', 'const']
['ABS', 'HP', 'const']
['ABS', 'KM', 'const']
['ABS', 'Met_Color', 'const']
['ABS', 'Metallic_Rim', 'const']
['ABS', 'Mfg_Month', 'const']
['ABS', 'Mfg_Year', 'const']
['ABS', 'Mfr_Guarantee', 'const']
['ABS', 'Mistlamps', 'const']
['ABS', 'Petrol', 'const']
['ABS', 'Power_Steering', 'const']
['ABS', 'Powered_Windows', 'const']
['ABS', 'Quarterly_Tax', 'const']
['ABS', 'Radio', 'const']
['ABS', 'Radio_cassette', 'const']
['ABS', 'Sport_Model', 'const']
['ABS', 'Tow_Bar', 'const']
['ABS', 'Weight', 'const']
['ABS', 'cc', 'const']
['Age_08_04', 'Airbag_1', 'const']
['Age_08_04', 'Airbag_2', 'const']
['Age_08_04', 'Airco', 'const']
['Age_08_04', 'Automatic', 'const']
['Age_08_04', 'Automatic_airco', 'const']
['Age_08_04', 'BOVAG_Guarantee', 'const']
['Age_08_04', 'Backseat_Divider', 'const']
['Age_08_04', 'Boardcomputer', 'const']
['Age_08_04', 'CD_Player', 'const']
['Age_08_04', 'CNG', 'const']
['Age_08_04', 'Central_Lock', 'const']
['Age_08_04', 'Cylinders', 'const']
['Age_08_04', 'Diesel', 'const']
['Age_08_04', 'Doors', 'const']
['Age_08_04', 'Gears', 'const']
['Age_08_04', 'Guarantee_Period', 'const']
['Age_08_04', 'HP', 'const']
['Age_08_04', 'KM', 'const']
['Age_08_04', 'Met_Color', 'const']
['Age_08_04', 'Metallic_Rim', 'const']
['Age_08_04', 'Mfg_Month', 'const']
['Age_08_04', 'Mfg_Year', 'const']
['Age_08_04', 'Mfr_Guarantee', 'const']
['Age_08_04', 'Mistlamps', 'const']
['Age_08_04', 'Petrol', 'const']
['Age_08_04', 'Power_Steering', 'const']
['Age_08_04', 'Powered_Windows', 'const']
['Age_08_04', 'Quarterly_Tax', 'const']
['Age_08_04', 'Radio', 'const']
['Age_08_04', 'Radio_cassette', 'const']
['Age_08_04', 'Sport_Model', 'const']
['Age_08_04', 'Tow_Bar', 'const']
['Age_08_04', 'Weight', 'const']
['Age_08_04', 'cc', 'const']
['Airbag_1', 'Airbag_2', 'const']
['Airbag_1', 'Airco', 'const']
['Airbag_1', 'Automatic', 'const']
['Airbag_1', 'Automatic_airco', 'const']
['Airbag_1', 'BOVAG_Guarantee', 'const']
['Airbag_1', 'Backseat_Divider', 'const']
['Airbag_1', 'Boardcomputer', 'const']
['Airbag_1', 'CD_Player', 'const']
['Airbag_1', 'CNG', 'const']
['Airbag_1', 'Central_Lock', 'const']
['Airbag_1', 'Cylinders', 'const']
['Airbag_1', 'Diesel', 'const']
['Airbag_1', 'Doors', 'const']
['Airbag_1', 'Gears', 'const']
['Airbag_1', 'Guarantee_Period', 'const']
['Airbag_1', 'HP', 'const']
['Airbag_1', 'KM', 'const']
['Airbag_1', 'Met_Color', 'const']
['Airbag_1', 'Metallic_Rim', 'const']
['Airbag_1', 'Mfg_Month', 'const']
['Airbag_1', 'Mfg_Year', 'const']
['Airbag_1', 'Mfr_Guarantee', 'const']
['Airbag_1', 'Mistlamps', 'const']
['Airbag_1', 'Petrol', 'const']
['Airbag_1', 'Power_Steering', 'const']
['Airbag_1', 'Powered_Windows', 'const']
['Airbag_1', 'Quarterly_Tax', 'const']
['Airbag_1', 'Radio', 'const']
['Airbag_1', 'Radio_cassette', 'const']
['Airbag_1', 'Sport_Model', 'const']
['Airbag_1', 'Tow_Bar', 'const']
['Airbag_1', 'Weight', 'const']
['Airbag_1', 'cc', 'const']
['Airbag_2', 'Airco', 'const']
['Airbag_2', 'Automatic', 'const']
['Airbag_2', 'Automatic_airco', 'const']
['Airbag_2', 'BOVAG_Guarantee', 'const']
['Airbag_2', 'Backseat_Divider', 'const']
['Airbag_2', 'Boardcomputer', 'const']
['Airbag_2', 'CD_Player', 'const']
['Airbag_2', 'CNG', 'const']
['Airbag_2', 'Central_Lock', 'const']
['Airbag_2', 'Cylinders', 'const']
['Airbag_2', 'Diesel', 'const']
['Airbag_2', 'Doors', 'const']
['Airbag_2', 'Gears', 'const']
['Airbag_2', 'Guarantee_Period', 'const']
['Airbag_2', 'HP', 'const']
['Airbag_2', 'KM', 'const']
['Airbag_2', 'Met_Color', 'const']
['Airbag_2', 'Metallic_Rim', 'const']
['Airbag_2', 'Mfg_Month', 'const']
['Airbag_2', 'Mfg_Year', 'const']
['Airbag_2', 'Mfr_Guarantee', 'const']
['Airbag_2', 'Mistlamps', 'const']
['Airbag_2', 'Petrol', 'const']
['Airbag_2', 'Power_Steering', 'const']
['Airbag_2', 'Powered_Windows', 'const']
['Airbag_2', 'Quarterly_Tax', 'const']
['Airbag_2', 'Radio', 'const']
['Airbag_2', 'Radio_cassette', 'const']
['Airbag_2', 'Sport_Model', 'const']
['Airbag_2', 'Tow_Bar', 'const']
['Airbag_2', 'Weight', 'const']
['Airbag_2', 'cc', 'const']
['Airco', 'Automatic', 'const']
['Airco', 'Automatic_airco', 'const']
['Airco', 'BOVAG_Guarantee', 'const']
['Airco', 'Backseat_Divider', 'const']
['Airco', 'Boardcomputer', 'const']
['Airco', 'CD_Player', 'const']
['Airco', 'CNG', 'const']
['Airco', 'Central_Lock', 'const']
['Airco', 'Cylinders', 'const']
['Airco', 'Diesel', 'const']
['Airco', 'Doors', 'const']
['Airco', 'Gears', 'const']
['Airco', 'Guarantee_Period', 'const']
['Airco', 'HP', 'const']
['Airco', 'KM', 'const']
['Airco', 'Met_Color', 'const']
['Airco', 'Metallic_Rim', 'const']
['Airco', 'Mfg_Month', 'const']
['Airco', 'Mfg_Year', 'const']
['Airco', 'Mfr_Guarantee', 'const']
['Airco', 'Mistlamps', 'const']
['Airco', 'Petrol', 'const']
['Airco', 'Power_Steering', 'const']
['Airco', 'Powered_Windows', 'const']
['Airco', 'Quarterly_Tax', 'const']
['Airco', 'Radio', 'const']
['Airco', 'Radio_cassette', 'const']
['Airco', 'Sport_Model', 'const']
['Airco', 'Tow_Bar', 'const']
['Airco', 'Weight', 'const']
['Airco', 'cc', 'const']
['Automatic', 'Automatic_airco', 'const']
['Automatic', 'BOVAG_Guarantee', 'const']
['Automatic', 'Backseat_Divider', 'const']
['Automatic', 'Boardcomputer', 'const']
['Automatic', 'CD_Player', 'const']
['Automatic', 'CNG', 'const']
['Automatic', 'Central_Lock', 'const']
['Automatic', 'Cylinders', 'const']
['Automatic', 'Diesel', 'const']
['Automatic', 'Doors', 'const']
['Automatic', 'Gears', 'const']
['Automatic', 'Guarantee_Period', 'const']
['Automatic', 'HP', 'const']
['Automatic', 'KM', 'const']
['Automatic', 'Met_Color', 'const']
['Automatic', 'Metallic_Rim', 'const']
['Automatic', 'Mfg_Month', 'const']
['Automatic', 'Mfg_Year', 'const']
['Automatic', 'Mfr_Guarantee', 'const']
['Automatic', 'Mistlamps', 'const']
['Automatic', 'Petrol', 'const']
['Automatic', 'Power_Steering', 'const']
['Automatic', 'Powered_Windows', 'const']
['Automatic', 'Quarterly_Tax', 'const']
['Automatic', 'Radio', 'const']
['Automatic', 'Radio_cassette', 'const']
['Automatic', 'Sport_Model', 'const']
['Automatic', 'Tow_Bar', 'const']
['Automatic', 'Weight', 'const']
['Automatic', 'cc', 'const']
['Automatic_airco', 'BOVAG_Guarantee', 'const']
['Automatic_airco', 'Backseat_Divider', 'const']
['Automatic_airco', 'Boardcomputer', 'const']
['Automatic_airco', 'CD_Player', 'const']
['Automatic_airco', 'CNG', 'const']
['Automatic_airco', 'Central_Lock', 'const']
['Automatic_airco', 'Cylinders', 'const']
['Automatic_airco', 'Diesel', 'const']
['Automatic_airco', 'Doors', 'const']
['Automatic_airco', 'Gears', 'const']
['Automatic_airco', 'Guarantee_Period', 'const']
['Automatic_airco', 'HP', 'const']
['Automatic_airco', 'KM', 'const']
['Automatic_airco', 'Met_Color', 'const']
['Automatic_airco', 'Metallic_Rim', 'const']
['Automatic_airco', 'Mfg_Month', 'const']
['Automatic_airco', 'Mfg_Year', 'const']
['Automatic_airco', 'Mfr_Guarantee', 'const']
['Automatic_airco', 'Mistlamps', 'const']
['Automatic_airco', 'Petrol', 'const']
['Automatic_airco', 'Power_Steering', 'const']
['Automatic_airco', 'Powered_Windows', 'const']
['Automatic_airco', 'Quarterly_Tax', 'const']
['Automatic_airco', 'Radio', 'const']
['Automatic_airco', 'Radio_cassette', 'const']
['Automatic_airco', 'Sport_Model', 'const']
['Automatic_airco', 'Tow_Bar', 'const']
['Automatic_airco', 'Weight', 'const']
['Automatic_airco', 'cc', 'const']
['BOVAG_Guarantee', 'Backseat_Divider', 'const']
['BOVAG_Guarantee', 'Boardcomputer', 'const']
['BOVAG_Guarantee', 'CD_Player', 'const']
['BOVAG_Guarantee', 'CNG', 'const']
['BOVAG_Guarantee', 'Central_Lock', 'const']
['BOVAG_Guarantee', 'Cylinders', 'const']
['BOVAG_Guarantee', 'Diesel', 'const']
['BOVAG_Guarantee', 'Doors', 'const']
['BOVAG_Guarantee', 'Gears', 'const']
['BOVAG_Guarantee', 'Guarantee_Period', 'const']
['BOVAG_Guarantee', 'HP', 'const']
['BOVAG_Guarantee', 'KM', 'const']
['BOVAG_Guarantee', 'Met_Color', 'const']
['BOVAG_Guarantee', 'Metallic_Rim', 'const']
['BOVAG_Guarantee', 'Mfg_Month', 'const']
['BOVAG_Guarantee', 'Mfg_Year', 'const']
['BOVAG_Guarantee', 'Mfr_Guarantee', 'const']
['BOVAG_Guarantee', 'Mistlamps', 'const']
['BOVAG_Guarantee', 'Petrol', 'const']
['BOVAG_Guarantee', 'Power_Steering', 'const']
['BOVAG_Guarantee', 'Powered_Windows', 'const']
['BOVAG_Guarantee', 'Quarterly_Tax', 'const']
['BOVAG_Guarantee', 'Radio', 'const']
['BOVAG_Guarantee', 'Radio_cassette', 'const']
['BOVAG_Guarantee', 'Sport_Model', 'const']
['BOVAG_Guarantee', 'Tow_Bar', 'const']
['BOVAG_Guarantee', 'Weight', 'const']
['BOVAG_Guarantee', 'cc', 'const']
['Backseat_Divider', 'Boardcomputer', 'const']
['Backseat_Divider', 'CD_Player', 'const']
['Backseat_Divider', 'CNG', 'const']
['Backseat_Divider', 'Central_Lock', 'const']
['Backseat_Divider', 'Cylinders', 'const']
['Backseat_Divider', 'Diesel', 'const']
['Backseat_Divider', 'Doors', 'const']
['Backseat_Divider', 'Gears', 'const']
['Backseat_Divider', 'Guarantee_Period', 'const']
['Backseat_Divider', 'HP', 'const']
['Backseat_Divider', 'KM', 'const']
['Backseat_Divider', 'Met_Color', 'const']
['Backseat_Divider', 'Metallic_Rim', 'const']
['Backseat_Divider', 'Mfg_Month', 'const']
['Backseat_Divider', 'Mfg_Year', 'const']
['Backseat_Divider', 'Mfr_Guarantee', 'const']
['Backseat_Divider', 'Mistlamps', 'const']
['Backseat_Divider', 'Petrol', 'const']
['Backseat_Divider', 'Power_Steering', 'const']
['Backseat_Divider', 'Powered_Windows', 'const']
['Backseat_Divider', 'Quarterly_Tax', 'const']
['Backseat_Divider', 'Radio', 'const']
['Backseat_Divider', 'Radio_cassette', 'const']
['Backseat_Divider', 'Sport_Model', 'const']
['Backseat_Divider', 'Tow_Bar', 'const']
['Backseat_Divider', 'Weight', 'const']
['Backseat_Divider', 'cc', 'const']
['Boardcomputer', 'CD_Player', 'const']
['Boardcomputer', 'CNG', 'const']
['Boardcomputer', 'Central_Lock', 'const']
['Boardcomputer', 'Cylinders', 'const']
['Boardcomputer', 'Diesel', 'const']
['Boardcomputer', 'Doors', 'const']
['Boardcomputer', 'Gears', 'const']
['Boardcomputer', 'Guarantee_Period', 'const']
['Boardcomputer', 'HP', 'const']
['Boardcomputer', 'KM', 'const']
['Boardcomputer', 'Met_Color', 'const']
['Boardcomputer', 'Metallic_Rim', 'const']
['Boardcomputer', 'Mfg_Month', 'const']
['Boardcomputer', 'Mfg_Year', 'const']
['Boardcomputer', 'Mfr_Guarantee', 'const']
['Boardcomputer', 'Mistlamps', 'const']
['Boardcomputer', 'Petrol', 'const']
['Boardcomputer', 'Power_Steering', 'const']
['Boardcomputer', 'Powered_Windows', 'const']
['Boardcomputer', 'Quarterly_Tax', 'const']
['Boardcomputer', 'Radio', 'const']
['Boardcomputer', 'Radio_cassette', 'const']
['Boardcomputer', 'Sport_Model', 'const']
['Boardcomputer', 'Tow_Bar', 'const']
['Boardcomputer', 'Weight', 'const']
['Boardcomputer', 'cc', 'const']
['CD_Player', 'CNG', 'const']
['CD_Player', 'Central_Lock', 'const']
['CD_Player', 'Cylinders', 'const']
['CD_Player', 'Diesel', 'const']
['CD_Player', 'Doors', 'const']
['CD_Player', 'Gears', 'const']
['CD_Player', 'Guarantee_Period', 'const']
['CD_Player', 'HP', 'const']
['CD_Player', 'KM', 'const']
['CD_Player', 'Met_Color', 'const']
['CD_Player', 'Metallic_Rim', 'const']
['CD_Player', 'Mfg_Month', 'const']
['CD_Player', 'Mfg_Year', 'const']
['CD_Player', 'Mfr_Guarantee', 'const']
['CD_Player', 'Mistlamps', 'const']
['CD_Player', 'Petrol', 'const']
['CD_Player', 'Power_Steering', 'const']
['CD_Player', 'Powered_Windows', 'const']
['CD_Player', 'Quarterly_Tax', 'const']
['CD_Player', 'Radio', 'const']
['CD_Player', 'Radio_cassette', 'const']
['CD_Player', 'Sport_Model', 'const']
['CD_Player', 'Tow_Bar', 'const']
['CD_Player', 'Weight', 'const']
['CD_Player', 'cc', 'const']
['CNG', 'Central_Lock', 'const']
['CNG', 'Cylinders', 'const']
['CNG', 'Diesel', 'const']
['CNG', 'Doors', 'const']
['CNG', 'Gears', 'const']
['CNG', 'Guarantee_Period', 'const']
['CNG', 'HP', 'const']
['CNG', 'KM', 'const']
['CNG', 'Met_Color', 'const']
['CNG', 'Metallic_Rim', 'const']
['CNG', 'Mfg_Month', 'const']
['CNG', 'Mfg_Year', 'const']
['CNG', 'Mfr_Guarantee', 'const']
['CNG', 'Mistlamps', 'const']
['CNG', 'Petrol', 'const']
['CNG', 'Power_Steering', 'const']
['CNG', 'Powered_Windows', 'const']
['CNG', 'Quarterly_Tax', 'const']
['CNG', 'Radio', 'const']
['CNG', 'Radio_cassette', 'const']
['CNG', 'Sport_Model', 'const']
['CNG', 'Tow_Bar', 'const']
['CNG', 'Weight', 'const']
['CNG', 'cc', 'const']
['Central_Lock', 'Cylinders', 'const']
['Central_Lock', 'Diesel', 'const']
['Central_Lock', 'Doors', 'const']
['Central_Lock', 'Gears', 'const']
['Central_Lock', 'Guarantee_Period', 'const']
['Central_Lock', 'HP', 'const']
['Central_Lock', 'KM', 'const']
['Central_Lock', 'Met_Color', 'const']
['Central_Lock', 'Metallic_Rim', 'const']
['Central_Lock', 'Mfg_Month', 'const']
['Central_Lock', 'Mfg_Year', 'const']
['Central_Lock', 'Mfr_Guarantee', 'const']
['Central_Lock', 'Mistlamps', 'const']
['Central_Lock', 'Petrol', 'const']
['Central_Lock', 'Power_Steering', 'const']
['Central_Lock', 'Powered_Windows', 'const']
['Central_Lock', 'Quarterly_Tax', 'const']
['Central_Lock', 'Radio', 'const']
['Central_Lock', 'Radio_cassette', 'const']
['Central_Lock', 'Sport_Model', 'const']
['Central_Lock', 'Tow_Bar', 'const']
['Central_Lock', 'Weight', 'const']
['Central_Lock', 'cc', 'const']
['Cylinders', 'Diesel', 'const']
['Cylinders', 'Doors', 'const']
['Cylinders', 'Gears', 'const']
['Cylinders', 'Guarantee_Period', 'const']
['Cylinders', 'HP', 'const']
['Cylinders', 'KM', 'const']
['Cylinders', 'Met_Color', 'const']
['Cylinders', 'Metallic_Rim', 'const']
['Cylinders', 'Mfg_Month', 'const']
['Cylinders', 'Mfg_Year', 'const']
['Cylinders', 'Mfr_Guarantee', 'const']
['Cylinders', 'Mistlamps', 'const']
['Cylinders', 'Petrol', 'const']
['Cylinders', 'Power_Steering', 'const']
['Cylinders', 'Powered_Windows', 'const']
['Cylinders', 'Quarterly_Tax', 'const']
['Cylinders', 'Radio', 'const']
['Cylinders', 'Radio_cassette', 'const']
['Cylinders', 'Sport_Model', 'const']
['Cylinders', 'Tow_Bar', 'const']
['Cylinders', 'Weight', 'const']
['Cylinders', 'cc', 'const']
['Diesel', 'Doors', 'const']
['Diesel', 'Gears', 'const']
['Diesel', 'Guarantee_Period', 'const']
['Diesel', 'HP', 'const']
['Diesel', 'KM', 'const']
['Diesel', 'Met_Color', 'const']
['Diesel', 'Metallic_Rim', 'const']
['Diesel', 'Mfg_Month', 'const']
['Diesel', 'Mfg_Year', 'const']
['Diesel', 'Mfr_Guarantee', 'const']
['Diesel', 'Mistlamps', 'const']
['Diesel', 'Petrol', 'const']
['Diesel', 'Power_Steering', 'const']
['Diesel', 'Powered_Windows', 'const']
['Diesel', 'Quarterly_Tax', 'const']
['Diesel', 'Radio', 'const']
['Diesel', 'Radio_cassette', 'const']
['Diesel', 'Sport_Model', 'const']
['Diesel', 'Tow_Bar', 'const']
['Diesel', 'Weight', 'const']
['Diesel', 'cc', 'const']
['Doors', 'Gears', 'const']
['Doors', 'Guarantee_Period', 'const']
['Doors', 'HP', 'const']
['Doors', 'KM', 'const']
['Doors', 'Met_Color', 'const']
['Doors', 'Metallic_Rim', 'const']
['Doors', 'Mfg_Month', 'const']
['Doors', 'Mfg_Year', 'const']
['Doors', 'Mfr_Guarantee', 'const']
['Doors', 'Mistlamps', 'const']
['Doors', 'Petrol', 'const']
['Doors', 'Power_Steering', 'const']
['Doors', 'Powered_Windows', 'const']
['Doors', 'Quarterly_Tax', 'const']
['Doors', 'Radio', 'const']
['Doors', 'Radio_cassette', 'const']
['Doors', 'Sport_Model', 'const']
['Doors', 'Tow_Bar', 'const']
['Doors', 'Weight', 'const']
['Doors', 'cc', 'const']
['Gears', 'Guarantee_Period', 'const']
['Gears', 'HP', 'const']
['Gears', 'KM', 'const']
['Gears', 'Met_Color', 'const']
['Gears', 'Metallic_Rim', 'const']
['Gears', 'Mfg_Month', 'const']
['Gears', 'Mfg_Year', 'const']
['Gears', 'Mfr_Guarantee', 'const']
['Gears', 'Mistlamps', 'const']
['Gears', 'Petrol', 'const']
['Gears', 'Power_Steering', 'const']
['Gears', 'Powered_Windows', 'const']
['Gears', 'Quarterly_Tax', 'const']
['Gears', 'Radio', 'const']
['Gears', 'Radio_cassette', 'const']
['Gears', 'Sport_Model', 'const']
['Gears', 'Tow_Bar', 'const']
['Gears', 'Weight', 'const']
['Gears', 'cc', 'const']
['Guarantee_Period', 'HP', 'const']
['Guarantee_Period', 'KM', 'const']
['Guarantee_Period', 'Met_Color', 'const']
['Guarantee_Period', 'Metallic_Rim', 'const']
['Guarantee_Period', 'Mfg_Month', 'const']
['Guarantee_Period', 'Mfg_Year', 'const']
['Guarantee_Period', 'Mfr_Guarantee', 'const']
['Guarantee_Period', 'Mistlamps', 'const']
['Guarantee_Period', 'Petrol', 'const']
['Guarantee_Period', 'Power_Steering', 'const']
['Guarantee_Period', 'Powered_Windows', 'const']
['Guarantee_Period', 'Quarterly_Tax', 'const']
['Guarantee_Period', 'Radio', 'const']
['Guarantee_Period', 'Radio_cassette', 'const']
['Guarantee_Period', 'Sport_Model', 'const']
['Guarantee_Period', 'Tow_Bar', 'const']
['Guarantee_Period', 'Weight', 'const']
['Guarantee_Period', 'cc', 'const']
['HP', 'KM', 'const']
['HP', 'Met_Color', 'const']
['HP', 'Metallic_Rim', 'const']
['HP', 'Mfg_Month', 'const']
['HP', 'Mfg_Year', 'const']
['HP', 'Mfr_Guarantee', 'const']
['HP', 'Mistlamps', 'const']
['HP', 'Petrol', 'const']
['HP', 'Power_Steering', 'const']
['HP', 'Powered_Windows', 'const']
['HP', 'Quarterly_Tax', 'const']
['HP', 'Radio', 'const']
['HP', 'Radio_cassette', 'const']
['HP', 'Sport_Model', 'const']
['HP', 'Tow_Bar', 'const']
['HP', 'Weight', 'const']
['HP', 'cc', 'const']
['KM', 'Met_Color', 'const']
['KM', 'Metallic_Rim', 'const']
['KM', 'Mfg_Month', 'const']
['KM', 'Mfg_Year', 'const']
['KM', 'Mfr_Guarantee', 'const']
['KM', 'Mistlamps', 'const']
['KM', 'Petrol', 'const']
['KM', 'Power_Steering', 'const']
['KM', 'Powered_Windows', 'const']
['KM', 'Quarterly_Tax', 'const']
['KM', 'Radio', 'const']
['KM', 'Radio_cassette', 'const']
['KM', 'Sport_Model', 'const']
['KM', 'Tow_Bar', 'const']
['KM', 'Weight', 'const']
['KM', 'cc', 'const']
['Met_Color', 'Metallic_Rim', 'const']
['Met_Color', 'Mfg_Month', 'const']
['Met_Color', 'Mfg_Year', 'const']
['Met_Color', 'Mfr_Guarantee', 'const']
['Met_Color', 'Mistlamps', 'const']
['Met_Color', 'Petrol', 'const']
['Met_Color', 'Power_Steering', 'const']
['Met_Color', 'Powered_Windows', 'const']
['Met_Color', 'Quarterly_Tax', 'const']
['Met_Color', 'Radio', 'const']
['Met_Color', 'Radio_cassette', 'const']
['Met_Color', 'Sport_Model', 'const']
['Met_Color', 'Tow_Bar', 'const']
['Met_Color', 'Weight', 'const']
['Met_Color', 'cc', 'const']
['Metallic_Rim', 'Mfg_Month', 'const']
['Metallic_Rim', 'Mfg_Year', 'const']
['Metallic_Rim', 'Mfr_Guarantee', 'const']
['Metallic_Rim', 'Mistlamps', 'const']
['Metallic_Rim', 'Petrol', 'const']
['Metallic_Rim', 'Power_Steering', 'const']
['Metallic_Rim', 'Powered_Windows', 'const']
['Metallic_Rim', 'Quarterly_Tax', 'const']
['Metallic_Rim', 'Radio', 'const']
['Metallic_Rim', 'Radio_cassette', 'const']
['Metallic_Rim', 'Sport_Model', 'const']
['Metallic_Rim', 'Tow_Bar', 'const']
['Metallic_Rim', 'Weight', 'const']
['Metallic_Rim', 'cc', 'const']
['Mfg_Month', 'Mfg_Year', 'const']
['Mfg_Month', 'Mfr_Guarantee', 'const']
['Mfg_Month', 'Mistlamps', 'const']
['Mfg_Month', 'Petrol', 'const']
['Mfg_Month', 'Power_Steering', 'const']
['Mfg_Month', 'Powered_Windows', 'const']
['Mfg_Month', 'Quarterly_Tax', 'const']
['Mfg_Month', 'Radio', 'const']
['Mfg_Month', 'Radio_cassette', 'const']
['Mfg_Month', 'Sport_Model', 'const']
['Mfg_Month', 'Tow_Bar', 'const']
['Mfg_Month', 'Weight', 'const']
['Mfg_Month', 'cc', 'const']
['Mfg_Year', 'Mfr_Guarantee', 'const']
['Mfg_Year', 'Mistlamps', 'const']
['Mfg_Year', 'Petrol', 'const']
['Mfg_Year', 'Power_Steering', 'const']
['Mfg_Year', 'Powered_Windows', 'const']
['Mfg_Year', 'Quarterly_Tax', 'const']
['Mfg_Year', 'Radio', 'const']
['Mfg_Year', 'Radio_cassette', 'const']
['Mfg_Year', 'Sport_Model', 'const']
['Mfg_Year', 'Tow_Bar', 'const']
['Mfg_Year', 'Weight', 'const']
['Mfg_Year', 'cc', 'const']
['Mfr_Guarantee', 'Mistlamps', 'const']
['Mfr_Guarantee', 'Petrol', 'const']
['Mfr_Guarantee', 'Power_Steering', 'const']
['Mfr_Guarantee', 'Powered_Windows', 'const']
['Mfr_Guarantee', 'Quarterly_Tax', 'const']
['Mfr_Guarantee', 'Radio', 'const']
['Mfr_Guarantee', 'Radio_cassette', 'const']
['Mfr_Guarantee', 'Sport_Model', 'const']
['Mfr_Guarantee', 'Tow_Bar', 'const']
['Mfr_Guarantee', 'Weight', 'const']
['Mfr_Guarantee', 'cc', 'const']
['Mistlamps', 'Petrol', 'const']
['Mistlamps', 'Power_Steering', 'const']
['Mistlamps', 'Powered_Windows', 'const']
['Mistlamps', 'Quarterly_Tax', 'const']
['Mistlamps', 'Radio', 'const']
['Mistlamps', 'Radio_cassette', 'const']
['Mistlamps', 'Sport_Model', 'const']
['Mistlamps', 'Tow_Bar', 'const']
['Mistlamps', 'Weight', 'const']
['Mistlamps', 'cc', 'const']
['Petrol', 'Power_Steering', 'const']
['Petrol', 'Powered_Windows', 'const']
['Petrol', 'Quarterly_Tax', 'const']
['Petrol', 'Radio', 'const']
['Petrol', 'Radio_cassette', 'const']
['Petrol', 'Sport_Model', 'const']
['Petrol', 'Tow_Bar', 'const']
['Petrol', 'Weight', 'const']
['Petrol', 'cc', 'const']
['Power_Steering', 'Powered_Windows', 'const']
['Power_Steering', 'Quarterly_Tax', 'const']
['Power_Steering', 'Radio', 'const']
['Power_Steering', 'Radio_cassette', 'const']
['Power_Steering', 'Sport_Model', 'const']
['Power_Steering', 'Tow_Bar', 'const']
['Power_Steering', 'Weight', 'const']
['Power_Steering', 'cc', 'const']
['Powered_Windows', 'Quarterly_Tax', 'const']
['Powered_Windows', 'Radio', 'const']
['Powered_Windows', 'Radio_cassette', 'const']
['Powered_Windows', 'Sport_Model', 'const']
['Powered_Windows', 'Tow_Bar', 'const']
['Powered_Windows', 'Weight', 'const']
['Powered_Windows', 'cc', 'const']
['Quarterly_Tax', 'Radio', 'const']
['Quarterly_Tax', 'Radio_cassette', 'const']
['Quarterly_Tax', 'Sport_Model', 'const']
['Quarterly_Tax', 'Tow_Bar', 'const']
['Quarterly_Tax', 'Weight', 'const']
['Quarterly_Tax', 'cc', 'const']
['Radio', 'Radio_cassette', 'const']
['Radio', 'Sport_Model', 'const']
['Radio', 'Tow_Bar', 'const']
['Radio', 'Weight', 'const']
['Radio', 'cc', 'const']
['Radio_cassette', 'Sport_Model', 'const']
['Radio_cassette', 'Tow_Bar', 'const']
['Radio_cassette', 'Weight', 'const']
['Radio_cassette', 'cc', 'const']
['Sport_Model', 'Tow_Bar', 'const']
['Sport_Model', 'Weight', 'const']
['Sport_Model', 'cc', 'const']
['Tow_Bar', 'Weight', 'const']
['Tow_Bar', 'cc', 'const']
['Weight', 'cc', 'const']
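The lists above are the candidate feature sets for the two-predictor search. Each one is an unordered pair of predictors with the intercept column `const` appended. The sketch below shows one way to produce such lists with `itertools.combinations`. It assumes the names are sorted the way `X.columns.difference(['const'])` sorts them (uppercase before lowercase, which is why `cc` comes last). The short `columns` list and the helper `candidate_sets` are hypothetical stand-ins for illustration. They are not the author's `getBest` code.

```python
import itertools
import math

# Hypothetical subset of the predictor names (the real data has 36 predictors)
columns = sorted(['KM', 'Age_08_04', 'HP', 'cc'])

def candidate_sets(columns, k):
    """Yield every k-predictor feature set with the intercept 'const' appended."""
    for combo in itertools.combinations(columns, k):
        yield list(combo) + ['const']

pairs = list(candidate_sets(columns, 2))
for feature_set in pairs:
    print(feature_set)

# The number of candidate sets is the binomial coefficient C(p, k)
assert len(pairs) == math.comb(len(columns), 2)
```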


Measure training time
# Fit the best subset for each number of predictors (1 to 3), store it, and time the search
models = pd.DataFrame(columns=["AIC", "model"])
tic = time.time()
for i in range(1,4):
    models.loc[i] = getBest(X=train_x,y=train_y,k=i)
toc = time.time()
print("Total elapsed time:", (toc-tic), "seconds.")
Processed  36 models on 1 predictors in 0.09873557090759277 seconds.
Processed  630 models on 2 predictors in 1.3473966121673584 seconds.
Processed  7140 models on 3 predictors in 17.01948356628418 seconds.
Total elapsed time: 18.805707454681396 seconds.
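The model counts in this log are binomial coefficients. With 36 candidate predictors, the exhaustive search fits C(36, k) models for subset size k. The quick check below uses `math.comb` and shows why the search stops at k = 3. The number of fits grows fast, and covering every subset size would mean 2^36 fits in total.

```python
import math

p = 36  # number of candidate predictors in this dataset (taken from the log above)

# Models fitted by the exhaustive search for each subset size k
counts = {k: math.comb(p, k) for k in range(1, 4)}
print(counts)  # {1: 36, 2: 630, 3: 7140}

# Summing over every subset size (including the empty model) gives 2**p fits
total = sum(math.comb(p, k) for k in range(p + 1))
print(total)   # 68719476736
```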


models
	AIC		model
1	17824.309811	<statsmodels.regression.linear_model.Regressio...
2	17579.120147	<statsmodels.regression.linear_model.Regressio...
3	17351.640619	<statsmodels.regression.linear_model.Regressio...


models.loc[3, "model"].summary()
OUTPUT

[Screenshot: OLS summary of the selected 3-predictor model]



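# Compare with the model fitted on all variables
# Note: mse_total is the total mean square of the response (centered TSS / (n - 1)). It does not
# depend on the predictors, so it is identical for both models below; mse_resid is the residual
# mean squared error that actually differs between models.
print("full model Rsquared: ","{:.5f}".format(fitted_full_model.rsquared))
print("full model AIC: ","{:.5f}".format(fitted_full_model.aic))
print("full model MSE: ","{:.5f}".format(fitted_full_model.mse_total))
print("selected model Rsquared: ","{:.5f}".format(models.loc[3, "model"].rsquared))
print("selected model AIC: ","{:.5f}".format(models.loc[3, "model"].aic))
print("selected model MSE: ","{:.5f}".format(models.loc[3, "model"].mse_total))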
full model Rsquared:  0.91141
full model AIC:  16960.68542
full model MSE:  13196639.65991
selected model Rsquared:  0.86124
selected model AIC:  17351.64062
selected model MSE:  13196639.65991
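Both "MSE" lines print the same value for a reason. statsmodels' `mse_total` is the total mean square of the response, sum((y - mean(y))^2) / (n - 1) for a model with an intercept, so it does not depend on which predictors are used. The residual mean square, `mse_resid` (SSR / df_resid), is the quantity that differs between models. The NumPy sketch below computes both by hand on synthetic data to make the difference visible. The data and the helper `ols_mean_squares` are assumptions made for this illustration.

```python
import numpy as np

# Synthetic data (assumption): y depends on x1 and x2 plus unit-variance noise
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 3.0 * x1 + 1.5 * x2 + rng.normal(size=n)

def ols_mean_squares(X, y):
    """Return (mse_total, mse_resid) for an OLS fit whose design X includes an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape                                      # p counts the intercept
    mse_total = np.sum((y - y.mean()) ** 2) / (n - 1)   # depends on y only
    mse_resid = np.sum(resid ** 2) / (n - p)            # depends on the fitted model
    return mse_total, mse_resid

const = np.ones(n)
full = ols_mean_squares(np.column_stack([x1, x2, const]), y)
small = ols_mean_squares(np.column_stack([x1, const]), y)
print("full :", full)
print("small:", small)
```

The first entry is the same for both fits. Only the second entry (the residual MSE) reflects dropping `x2`.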


# Plot the result
plt.figure(figsize=(20,10))
plt.rcParams.update({'font.size': 18, 'lines.markersize': 10})

## Mallows' Cp = p + (MSE_p - sigma^2) * (n - p) / sigma^2,
## where sigma^2 is the residual mean square (mse_resid) of the full model.
## mse_total must not be used here: it is identical for every model, so Cp would collapse to p.
plt.subplot(2, 2, 1)
sigma2_full = fitted_full_model.mse_resid
Cp = models.apply(lambda row: (row["model"].params.shape[0]
                               + (row["model"].mse_resid - sigma2_full)
                               * (train_x.shape[0] - row["model"].params.shape[0])
                               / sigma2_full), axis=1).astype(float)
plt.plot(Cp)
# idxmin/idxmax return the index label (the number of predictors); argmin returns a 0-based position
plt.plot(Cp.idxmin(), Cp.min(), "or")
plt.xlabel('# Predictors')
plt.ylabel('Cp')

# adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p), which penalizes extra predictors
adj_rsquared = models.apply(lambda row: row["model"].rsquared_adj, axis=1).astype(float)
plt.subplot(2, 2, 2)
plt.plot(adj_rsquared)
plt.plot(adj_rsquared.idxmax(), adj_rsquared.max(), "or")
plt.xlabel('# Predictors')
plt.ylabel('adjusted rsquared')

# aic
aic = models.apply(lambda row: row["model"].aic, axis=1).astype(float)
plt.subplot(2, 2, 3)
plt.plot(aic)
plt.plot(aic.idxmin(), aic.min(), "or")
plt.xlabel('# Predictors')
plt.ylabel('AIC')

# bic
bic = models.apply(lambda row: row["model"].bic, axis=1).astype(float)
plt.subplot(2, 2, 4)
plt.plot(bic)
plt.plot(bic.idxmin(), bic.min(), "or")
plt.xlabel('# Predictors')
plt.ylabel('BIC')
OUTPUT

[Figure: Cp, adjusted R², AIC and BIC versus the number of predictors]
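The four panels use different criteria. Take an OLS model with p parameters (intercept included), n observations, residual sum of squares SSR and Gaussian log-likelihood log L. The textbook forms are:

- Mallows' Cp = SSR_p / sigma^2 - n + 2p, where sigma^2 is the residual mean square of the full model.
- Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p).
- AIC = -2 log L + 2p.
- BIC = -2 log L + p log n.

The NumPy sketch below implements these definitions on synthetic data. `selection_criteria` is a hypothetical helper, and it follows the statsmodels convention of counting the intercept in p. One useful sanity check: the Cp of the full model always equals its own p.

```python
import numpy as np

def ols_ssr(X, y):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def selection_criteria(X, y, sigma2_full):
    """Cp, adjusted R^2, AIC and BIC for an OLS model whose design X includes the intercept."""
    n, p = X.shape
    ssr = ols_ssr(X, y)
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ssr / tss
    llf = -n / 2.0 * (np.log(2 * np.pi) + np.log(ssr / n) + 1.0)  # Gaussian log-likelihood
    return {
        "Cp": ssr / sigma2_full - n + 2 * p,
        "adj_R2": 1.0 - (1.0 - r2) * (n - 1) / (n - p),
        "AIC": -2.0 * llf + 2 * p,
        "BIC": -2.0 * llf + p * np.log(n),
    }

# Synthetic data (assumption): only the first two of four predictors matter
rng = np.random.default_rng(1)
n = 300
X_all = rng.normal(size=(n, 4))
y = 2.0 * X_all[:, 0] - 1.0 * X_all[:, 1] + rng.normal(size=n)
const = np.ones((n, 1))

X_full = np.hstack([X_all, const])
sigma2_full = ols_ssr(X_full, y) / (n - X_full.shape[1])  # residual mean square of the full model

for k in range(1, 5):
    X_k = np.hstack([X_all[:, :k], const])  # first k predictors plus the intercept
    print(k, selection_criteria(X_k, y, sigma2_full))
```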




Modify regression model (Forward selection)

######## Forward selection (one step)

def forward(X, y, predictors):
    # Candidate predictors: every column (except the intercept) not already in predictors
    remaining_predictors = [p for p in X.columns.difference(['const']) if p not in predictors]
    tic = time.time()
    results = []
    for p in remaining_predictors:
        results.append(processSubset(X=X, y= y, feature_set=predictors+[p]+['const']))
    # Convert the list of results to a DataFrame
    models = pd.DataFrame(results)

    # Select the model with the lowest AIC (idxmin returns the row label)
    best_model = models.loc[models['AIC'].idxmin()]
    toc = time.time()
    print("Processed ", models.shape[0], "models on", len(predictors)+1, "predictors in", (toc-tic))
    print('Selected predictors:',best_model['model'].model.exog_names,' AIC:',best_model['AIC'] )
    return best_model


#### Forward selection model

def forward_model(X,y):
    Fmodels = pd.DataFrame(columns=["AIC", "model"])
    tic = time.time()
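    # Predictors selected so far (starts empty)
    predictors = []
    # Add one predictor per step, up to the number of candidate predictors (excluding 'const')
    for i in range(1, len(X.columns.difference(['const'])) + 1):
        Forward_result = forward(X=X,y=y,predictors=predictors)
        # Stop as soon as the best addition no longer lowers AIC
        if i > 1:
            if Forward_result['AIC'] > Fmodel_before:
                break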
        Fmodels.loc[i] = Forward_result
        predictors = Fmodels.loc[i]["model"].model.exog_names
        Fmodel_before = Fmodels.loc[i]["AIC"]
        predictors = [ k for k in predictors if k != 'const']
    toc = time.time()
    print("Total elapsed time:", (toc - tic), "seconds.")

    return(Fmodels['model'][len(Fmodels['model'])])
OUTPUT
Forward_best_model = forward_model(X=train_x, y= train_y)
Processed  36 models on 1 predictors in 0.08973240852355957
Selected predictors: ['Mfg_Year', 'const']  AIC: 17755.072760646137
Processed  35 models on 2 predictors in 0.09027957916259766
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'const']  AIC: 17504.57948159159
Processed  34 models on 3 predictors in 0.06283736228942871
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'const']  AIC: 17398.182235131313
Processed  33 models on 4 predictors in 0.06283116340637207
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'const']  AIC: 17150.1641103143
Processed  32 models on 5 predictors in 0.07981634140014648
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'const']  AIC: 17091.096715621316
Processed  31 models on 6 predictors in 0.0840911865234375
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'const']  AIC: 17055.57896394218
Processed  30 models on 7 predictors in 0.0738370418548584
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'const']  AIC: 17033.36951099978
Processed  29 models on 8 predictors in 0.06878113746643066
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'const']  AIC: 17019.85679678918
Processed  28 models on 9 predictors in 0.09375500679016113
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'const']  AIC: 16995.322287055787
Processed  27 models on 10 predictors in 0.10174226760864258
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'const']  AIC: 16983.818299485778
Processed  26 models on 11 predictors in 0.10377311706542969
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'const']  AIC: 16964.290655626864
Processed  25 models on 12 predictors in 0.11771559715270996
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'const']  AIC: 16928.537083027266
Processed  24 models on 13 predictors in 0.12260055541992188
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'const']  AIC: 16921.374043681804
Processed  23 models on 14 predictors in 0.12865686416625977
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'const']  AIC: 16918.48093923768
Processed  22 models on 15 predictors in 0.16057229042053223
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'const']  AIC: 16916.04018485048
Processed  21 models on 16 predictors in 0.18660974502563477
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'const']  AIC: 16912.806529494097
Processed  20 models on 17 predictors in 0.11269783973693848
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'const']  AIC: 16909.805620763276
Processed  19 models on 18 predictors in 0.10549688339233398
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'const']  AIC: 16907.82736115733
Processed  18 models on 19 predictors in 0.10871052742004395
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'const']  AIC: 16907.14151076706
Processed  17 models on 20 predictors in 0.11475992202758789
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Mfg_Month', 'const']  AIC: 16906.91814803349
Processed  16 models on 21 predictors in 0.1306447982788086
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Mfg_Month', 'Gears', 'const']  AIC: 16906.641600994546
Processed  15 models on 22 predictors in 0.11366558074951172
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Mfg_Month', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994557
Total elapsed time: 2.4412221908569336 seconds.


Forward_best_model.aic
16906.641600994546
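Without the bookkeeping, the forward procedure above is a greedy loop. At each step it adds the remaining predictor that gives the lowest AIC, and it stops as soon as the best addition no longer lowers AIC. The NumPy sketch below implements that loop under a few assumptions:

- The predictors are a dictionary of named NumPy columns instead of a DataFrame.
- AIC is the Gaussian OLS AIC.
- The first step is compared against an intercept-only model. The `forward_model` above instead accepts the first predictor unconditionally.

`ols_aic`, `design` and `forward_select` are hypothetical helpers, not part of statsmodels.

```python
import numpy as np

def ols_aic(X, y):
    """Gaussian OLS AIC; X includes the intercept column."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = float(np.sum((y - X @ beta) ** 2))
    llf = -n / 2.0 * (np.log(2 * np.pi) + np.log(ssr / n) + 1.0)
    return -2.0 * llf + 2 * p

def design(cols, names, n):
    """Stack the named columns and append an intercept column."""
    return np.column_stack([cols[c] for c in names] + [np.ones(n)])

def forward_select(cols, y):
    """Greedy forward selection: add the predictor with the lowest AIC until AIC stops improving."""
    n = len(y)
    selected = []
    best_aic = ols_aic(np.ones((n, 1)), y)        # intercept-only baseline
    remaining = sorted(cols)
    while remaining:
        scores = {c: ols_aic(design(cols, selected + [c], n), y) for c in remaining}
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best_aic:         # no addition lowers AIC: stop
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_aic = scores[candidate]
    return selected, best_aic

# Synthetic data (assumption): 'a' and 'b' matter, the noise columns do not
rng = np.random.default_rng(2)
n = 300
cols = {name: rng.normal(size=n) for name in ["a", "b", "noise1", "noise2"]}
y = 4.0 * cols["a"] + 2.0 * cols["b"] + rng.normal(size=n)
print(forward_select(cols, y))
```

A noise column can occasionally slip in when it lowers AIC by chance. AIC only penalizes each extra parameter by 2, so greedy AIC selection tends to keep a few marginal variables, as in the 21-predictor result above.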


Modify regression model (Backward selection)

######## Backward elimination (one step)
def backward(X,y,predictors):
    tic = time.time()
    results = []
    # Try every subset that drops exactly one of the current predictors
    for combo in itertools.combinations(predictors, len(predictors) - 1):
        results.append(processSubset(X=X, y= y,feature_set=list(combo)+['const']))
    models = pd.DataFrame(results)
    # Select the model with the lowest AIC (idxmin returns the row label)
    best_model = models.loc[models['AIC'].idxmin()]
    toc = time.time()
    print("Processed ", models.shape[0], "models on", len(predictors) - 1, "predictors in",
          (toc - tic))
    print('Selected predictors:',best_model['model'].model.exog_names,' AIC:',best_model['AIC'] )
    return best_model
    

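# Backward elimination model
def backward_model(X, y):
    Bmodels = pd.DataFrame(columns=["AIC", "model"], index = range(1,len(X.columns)))
    tic = time.time()
    predictors = X.columns.difference(['const'])
    # Baseline: AIC of the full model, fitted with the intercept like every candidate model
    Bmodel_before = processSubset(X, y, list(predictors) + ['const'])['AIC']
    while (len(predictors) > 1):
        # Use the function arguments X, y (not the globals train_x, train_y)
        Backward_result = backward(X=X, y=y, predictors=predictors)
        # Stop as soon as the best single drop no longer lowers AIC
        if Backward_result['AIC'] > Bmodel_before:
            break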
        Bmodels.loc[len(predictors) - 1] = Backward_result
        predictors = Bmodels.loc[len(predictors) - 1]["model"].model.exog_names
        Bmodel_before = Backward_result['AIC']
        predictors = [ k for k in predictors if k != 'const']

    toc = time.time()
    print("Total elapsed time:", (toc - tic), "seconds.")
    return (Bmodels['model'].dropna().iloc[0])
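Backward elimination runs the same greedy idea in reverse. It starts from all predictors and tries every model that drops exactly one of them. It keeps the drop with the lowest AIC and stops when no drop lowers AIC. The compact NumPy sketch below implements that rule on synthetic data. `ols_aic` and `backward_eliminate` are hypothetical helpers written for this illustration. The baseline is the full model with the intercept, so it is comparable with the candidate models.

```python
import itertools
import numpy as np

def ols_aic(X, y):
    """Gaussian OLS AIC; X includes the intercept column."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = float(np.sum((y - X @ beta) ** 2))
    llf = -n / 2.0 * (np.log(2 * np.pi) + np.log(ssr / n) + 1.0)
    return -2.0 * llf + 2 * p

def backward_eliminate(cols, y):
    """Drop one predictor at a time while dropping lowers AIC."""
    n = len(y)

    def aic_of(names):
        return ols_aic(np.column_stack([cols[c] for c in names] + [np.ones(n)]), y)

    current = sorted(cols)
    best_aic = aic_of(current)                   # full model, intercept included
    while len(current) > 1:
        # Every subset with one predictor removed
        candidates = [list(c) for c in itertools.combinations(current, len(current) - 1)]
        scores = [aic_of(c) for c in candidates]
        i = int(np.argmin(scores))
        if scores[i] >= best_aic:                # no drop lowers AIC: stop
            break
        current, best_aic = candidates[i], scores[i]
    return current, best_aic

# Synthetic data (assumption): 'a' and 'b' matter, the noise columns do not
rng = np.random.default_rng(3)
n = 300
cols = {name: rng.normal(size=n) for name in ["a", "b", "noise1", "noise2"]}
y = 4.0 * cols["a"] + 2.0 * cols["b"] + rng.normal(size=n)
print(backward_eliminate(cols, y))
```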
OUTPUT
Backward_best_model = backward_model(X=train_x,y=train_y)
Processed  36 models on 35 predictors in 0.5307836532592773
Selected predictors: ['ABS', 'Age_08_04', 'Airbag_1', 'Airbag_2', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Doors', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Met_Color', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'cc', 'const']  AIC: 16919.554953086037
Processed  35 models on 34 predictors in 0.5086104869842529
Selected predictors: ['ABS', 'Age_08_04', 'Airbag_1', 'Airbag_2', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Doors', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'cc', 'const']  AIC: 16917.56065836032
Processed  34 models on 33 predictors in 0.47121691703796387
Selected predictors: ['ABS', 'Age_08_04', 'Airbag_2', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Doors', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'cc', 'const']  AIC: 16915.573733838028
Processed  33 models on 32 predictors in 0.3795206546783447
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Doors', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'cc', 'const']  AIC: 16913.747808225216
Processed  32 models on 31 predictors in 0.33935022354125977
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'cc', 'const']  AIC: 16912.053646583932
Processed  31 models on 30 predictors in 0.29421567916870117
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'cc', 'const']  AIC: 16910.726801088837
Processed  30 models on 29 predictors in 0.29419445991516113
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Power_Steering', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16909.60778490872
Processed  29 models on 28 predictors in 0.25033020973205566
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Metallic_Rim', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16908.55343667602
Processed  28 models on 27 predictors in 0.2254021167755127
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16907.502655808014
Processed  27 models on 26 predictors in 0.20220327377319336
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Central_Lock', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16906.70136854976
Processed  26 models on 25 predictors in 0.20789861679077148
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CD_Player', 'CNG', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16906.676844492846
Processed  25 models on 24 predictors in 0.18823885917663574
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CNG', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Petrol', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16906.641600994557
Processed  24 models on 23 predictors in 0.1715404987335205
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CNG', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Mfg_Month', 'Mfg_Year', 'Mfr_Guarantee', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16906.641600994557
Processed  23 models on 22 predictors in 0.15358972549438477
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CNG', 'Cylinders', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Mfg_Month', 'Mfr_Guarantee', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16906.641600994557
Processed  22 models on 21 predictors in 0.1326441764831543
Selected predictors: ['ABS', 'Age_08_04', 'Airco', 'Automatic_airco', 'BOVAG_Guarantee', 'Backseat_Divider', 'Boardcomputer', 'CNG', 'Diesel', 'Gears', 'Guarantee_Period', 'HP', 'KM', 'Mfg_Month', 'Mfr_Guarantee', 'Powered_Windows', 'Quarterly_Tax', 'Radio_cassette', 'Sport_Model', 'Tow_Bar', 'Weight', 'const']  AIC: 16906.64160099456
Total elapsed time: 4.432608604431152 seconds.


Backward_best_model.aic
16906.641600994557
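`backward_model` depends on the helpers `processSubset` and `backward`, which are defined earlier in this post and not repeated here. The following is a minimal, self-contained sketch of the same AIC-driven backward elimination. It is not the post's implementation. The names `ols_aic` and `backward_eliminate` are hypothetical, the fit uses `np.linalg.lstsq` instead of statsmodels, and the data are synthetic. It also stops on ties (`>=`), whereas `backward_model` stops only when AIC strictly increases.

```python
import numpy as np


def ols_aic(X, y):
    """AIC of a Gaussian OLS fit, counting rank(X) parameters (hypothetical helper)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    llf = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)  # maximised log-likelihood
    return -2 * llf + 2 * np.linalg.matrix_rank(X)


def backward_eliminate(X, y, names):
    """Remove one predictor at a time while AIC keeps decreasing.

    Column 0 of X is assumed to be the intercept and is never removed.
    Returns the kept predictor names and the AIC of the final model.
    """
    kept = list(range(1, X.shape[1]))
    best_aic = ols_aic(X[:, [0] + kept], y)
    while len(kept) > 1:
        # AIC of every model with exactly one of the current predictors removed
        trials = [(ols_aic(X[:, [0] + [c for c in kept if c != drop]], y), drop)
                  for drop in kept]
        aic, drop = min(trials)
        if aic >= best_aic:  # no single removal lowers AIC any further
            break
        best_aic = aic
        kept.remove(drop)
    return [names[c] for c in kept], best_aic


# Hypothetical data: only x1 and x2 influence y; x3 and x4 are pure noise
rng = np.random.default_rng(0)
n = 200
Z = rng.normal(size=(n, 4))
y = 2 + 3 * Z[:, 0] - 1.5 * Z[:, 1] + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), Z])
names = ["const", "x1", "x2", "x3", "x4"]

kept, aic = backward_eliminate(X, y, names)
print(kept, round(aic, 2))
```

Removing a strong predictor inflates the residual sum of squares sharply, so `x1` and `x2` survive. Whether the noise columns are dropped depends on the random draw.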


Modify regression model (Stepwise)

def Stepwise_model(X,y):
    Stepmodels = pd.DataFrame(columns=["AIC", "model"])
    tic = time.time()
    predictors = []
    # AIC of the intercept-only model
    Smodel_before = processSubset(X,y,predictors+['const'])['AIC']
    # i runs from 1 to the number of predictors (1-based instead of 0-based)
    for i in range(1, len(X.columns.difference(['const'])) + 1):
        Forward_result = forward(X=X, y=y, predictors=predictors) # constant added
        print('forward')
        Stepmodels.loc[i] = Forward_result
        predictors = Stepmodels.loc[i]["model"].model.exog_names
        predictors = [ k for k in predictors if k != 'const']
        Backward_result = backward(X=X, y=y, predictors=predictors)
        if Backward_result['AIC']< Forward_result['AIC']:
            Stepmodels.loc[i] = Backward_result
            predictors = Stepmodels.loc[i]["model"].model.exog_names
            predictors = [ k for k in predictors if k != 'const']
            print('backward')
        # compare this step (forward plus optional backward) with the previous step's AIC
        # and stop when it brings no improvement
        if Stepmodels.loc[i]['AIC'] >= Smodel_before:
            break
        else:
            Smodel_before = Stepmodels.loc[i]["AIC"]
    toc = time.time()
    print("Total elapsed time:", (toc - tic), "seconds.")
    # return the model with the lowest AIC seen, not simply the last row
    best = pd.to_numeric(Stepmodels['AIC']).idxmin()
    return Stepmodels.loc[best, 'model']
OUTPUT
Stepwise_best_model=Stepwise_model(X=train_x,y=train_y)
Processed  36 models on 1 predictors in 0.09873390197753906
Selected predictors: ['Mfg_Year', 'const']  AIC: 17755.072760646137
forward
Processed  1 models on 0 predictors in 0.009046554565429688
Selected predictors: ['const']  AIC: 19355.08856819785
Processed  35 models on 2 predictors in 0.130143404006958
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'const']  AIC: 17504.57948159159
forward
Processed  2 models on 1 predictors in 0.015958309173583984
Selected predictors: ['Mfg_Year', 'const']  AIC: 17755.072760646137
Processed  34 models on 3 predictors in 0.1465761661529541
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'const']  AIC: 17398.182235131313
forward
Processed  3 models on 2 predictors in 0.016946792602539062
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'const']  AIC: 17504.57948159159
Processed  33 models on 4 predictors in 0.1317136287689209
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'const']  AIC: 17150.1641103143
forward
Processed  4 models on 3 predictors in 0.015963077545166016
Selected predictors: ['Mfg_Year', 'Weight', 'KM', 'const']  AIC: 17306.79774531549
Processed  32 models on 5 predictors in 0.08627820014953613
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'const']  AIC: 17091.096715621316
forward
Processed  5 models on 4 predictors in 0.011969327926635742
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'const']  AIC: 17150.1641103143
Processed  31 models on 6 predictors in 0.07229804992675781
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'const']  AIC: 17055.57896394218
forward
Processed  6 models on 5 predictors in 0.016991615295410156
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'const']  AIC: 17091.096715621316
Processed  30 models on 7 predictors in 0.05830645561218262
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'const']  AIC: 17033.36951099978
forward
Processed  7 models on 6 predictors in 0.01599907875061035
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'const']  AIC: 17055.57896394218
Processed  29 models on 8 predictors in 0.06846237182617188
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'const']  AIC: 17019.85679678918
forward
Processed  8 models on 7 predictors in 0.017005205154418945
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'const']  AIC: 17033.36951099978
Processed  28 models on 9 predictors in 0.11175131797790527
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'const']  AIC: 16995.322287055787
forward
Processed  9 models on 8 predictors in 0.01898479461669922
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Guarantee_Period', 'BOVAG_Guarantee', 'const']  AIC: 17012.519514899912
Processed  27 models on 10 predictors in 0.1047210693359375
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'const']  AIC: 16983.818299485778
forward
Processed  10 models on 9 predictors in 0.03191518783569336
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'const']  AIC: 16995.322287055787
Processed  26 models on 11 predictors in 0.10965585708618164
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'const']  AIC: 16964.290655626864
forward
Processed  11 models on 10 predictors in 0.04288458824157715
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'CNG', 'Quarterly_Tax', 'const']  AIC: 16978.68338783714
Processed  25 models on 12 predictors in 0.15957117080688477
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'const']  AIC: 16928.537083027266
forward
Processed  12 models on 11 predictors in 0.08481073379516602
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'Quarterly_Tax', 'Petrol', 'const']  AIC: 16932.104261902947
Processed  24 models on 13 predictors in 0.17156600952148438
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'const']  AIC: 16921.374043681804
forward
Processed  13 models on 12 predictors in 0.09979891777038574
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'const']  AIC: 16924.75355369365
Processed  23 models on 14 predictors in 0.17253684997558594
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'const']  AIC: 16918.48093923768
forward
Processed  14 models on 13 predictors in 0.08875823020935059
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'const']  AIC: 16921.374043681804
Processed  22 models on 15 predictors in 0.15457653999328613
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'const']  AIC: 16916.04018485048
forward
Processed  15 models on 14 predictors in 0.10401105880737305
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'const']  AIC: 16918.48093923768
Processed  21 models on 16 predictors in 0.15857505798339844
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'const']  AIC: 16912.806529494097
forward
Processed  16 models on 15 predictors in 0.11768555641174316
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'const']  AIC: 16916.04018485048
Processed  20 models on 17 predictors in 0.13663506507873535
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'const']  AIC: 16909.805620763276
forward
Processed  17 models on 16 predictors in 0.08477330207824707
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Airco', 'ABS', 'Sport_Model', 'const']  AIC: 16912.187005800086
Processed  19 models on 18 predictors in 0.10272526741027832
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'const']  AIC: 16907.82736115733
forward
Processed  18 models on 17 predictors in 0.1127007007598877
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'const']  AIC: 16908.531987499395
Processed  18 models on 19 predictors in 0.11521244049072266
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'const']  AIC: 16907.14151076706
forward
Processed  19 models on 18 predictors in 0.15088891983032227
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'const']  AIC: 16907.82736115733
Processed  17 models on 20 predictors in 0.16663289070129395
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Mfg_Month', 'const']  AIC: 16906.91814803349
forward
Processed  20 models on 19 predictors in 0.2127993106842041
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'const']  AIC: 16907.14151076706
Processed  16 models on 21 predictors in 0.10770010948181152
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Mfg_Month', 'Gears', 'const']  AIC: 16906.641600994546
forward
Processed  21 models on 20 predictors in 0.1256864070892334
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Mfg_Month', 'const']  AIC: 16906.91814803349
Processed  15 models on 22 predictors in 0.10097765922546387
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Mfg_Month', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994557
forward
Processed  22 models on 21 predictors in 0.17354369163513184
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.15059447288513184
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.19049072265625
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.1495981216430664
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.13814973831176758
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.11270356178283691
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.15808415412902832
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.09469938278198242
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.14464545249938965
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.1326456069946289
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.1595752239227295
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.11668825149536133
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.13965892791748047
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.17457914352416992
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.19448089599609375
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.10567355155944824
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.15602421760559082
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.09773826599121094
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.1266651153564453
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.0937490463256836
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.12469983100891113
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.11266231536865234
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.14760518074035645
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.10172867774963379
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.19098138809204102
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.14887738227844238
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.15437889099121094
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Processed  15 models on 22 predictors in 0.10134077072143555
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'Diesel', 'const']  AIC: 16906.64160099451
forward
Processed  22 models on 21 predictors in 0.14920735359191895
Selected predictors: ['Mfg_Year', 'Automatic_airco', 'Weight', 'KM', 'Powered_Windows', 'HP', 'Mfr_Guarantee', 'Guarantee_Period', 'BOVAG_Guarantee', 'CNG', 'Quarterly_Tax', 'Petrol', 'Tow_Bar', 'Boardcomputer', 'Airco', 'ABS', 'Sport_Model', 'Backseat_Divider', 'Radio_cassette', 'Gears', 'Age_08_04', 'const']  AIC: 16906.641600994506
backward
Total elapsed time: 8.44080114364624 seconds.


Stepwise_best_model.aic
16906.641600994506
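The same idea can be applied to stepwise selection. Each round adds the best predictor, then checks whether dropping one predictor lowers AIC further. The search ends as soon as a round fails to beat the previous round's AIC. This is a hedged, numpy-only sketch with hypothetical names (`ols_aic`, `stepwise_select`) and synthetic data, not the post's `Stepwise_model`. The key design choice is comparing against the previous round rather than against the value the backward step just produced. That comparison is what prevents the add-then-remove cycle visible in the log above.

```python
import numpy as np


def ols_aic(X, y):
    """AIC of a Gaussian OLS fit, counting rank(X) parameters (hypothetical helper)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    llf = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    return -2 * llf + 2 * np.linalg.matrix_rank(X)


def stepwise_select(X, y, names):
    """One forward step plus an optional backward step per round, until AIC stops falling.

    Column 0 of X is assumed to be the intercept and is always kept.
    """
    kept = []
    best_aic = ols_aic(X[:, [0]], y)  # intercept-only model
    for _ in range(X.shape[1] - 1):
        candidates = [c for c in range(1, X.shape[1]) if c not in kept]
        if not candidates:
            break
        # forward: add the predictor that lowers AIC the most
        aic, add = min((ols_aic(X[:, [0] + kept + [c]], y), c) for c in candidates)
        trial = kept + [add]
        # backward: drop one predictor if that lowers AIC even further
        if len(trial) > 1:
            back_aic, drop = min((ols_aic(X[:, [0] + [c for c in trial if c != d]], y), d)
                                 for d in trial)
            if back_aic < aic:
                aic, trial = back_aic, [c for c in trial if c != drop]
        # stop once a full round no longer improves on the previous round
        if aic >= best_aic:
            break
        kept, best_aic = trial, aic
    return [names[c] for c in kept], best_aic


# Hypothetical data: only x1 and x2 influence y
rng = np.random.default_rng(0)
n = 200
Z = rng.normal(size=(n, 4))
y = 2 + 3 * Z[:, 0] - 1.5 * Z[:, 1] + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), Z])
kept, aic = stepwise_select(X, y, ["const", "x1", "x2", "x3", "x4"])
print(kept, round(aic, 2))
```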


Model performance

# Values predicted (estimated) by each model, to be compared against test_y
pred_y_full = fitted_full_model.predict(test_x)
pred_y_forward = Forward_best_model.predict(test_x[Forward_best_model.model.exog_names])
pred_y_backward = Backward_best_model.predict(test_x[Backward_best_model.model.exog_names])
pred_y_stepwise = Stepwise_best_model.predict(test_x[Stepwise_best_model.model.exog_names])

perf_mat = pd.DataFrame(columns=["ALL", "FORWARD", "BACKWARD", "STEPWISE"],
                        index =['MSE', 'RMSE','MAE', 'MAPE'])

# MAPE in percent (undefined when any true value is 0)
def mean_absolute_percentage_error(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
from sklearn import metrics

# Performance metrics
# (assign with .loc[row, col]; chained .loc[row][col] assignment may write to a copy
#  and leave perf_mat unchanged)
predictions = {"ALL": pred_y_full, "FORWARD": pred_y_forward,
               "BACKWARD": pred_y_backward, "STEPWISE": pred_y_stepwise}
for name, pred_y in predictions.items():
    mse = metrics.mean_squared_error(test_y, pred_y)
    perf_mat.loc['MSE', name] = mse
    perf_mat.loc['RMSE', name] = np.sqrt(mse)
    perf_mat.loc['MAE', name] = metrics.mean_absolute_error(test_y, pred_y)
    perf_mat.loc['MAPE', name] = mean_absolute_percentage_error(test_y, pred_y)

print(perf_mat)
              ALL      FORWARD     BACKWARD     STEPWISE
MSE   1.44149e+06  1.46142e+06  1.46142e+06  1.46142e+06
RMSE      1200.62      1208.89      1208.89      1208.89
MAE       853.494      863.524      863.524      863.524
MAPE      8.48549      8.59054      8.59054      8.59054
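Each metric in the table is a one-line formula. As a minimal NumPy sketch (the function name `regression_metrics` and the four-point example are hypothetical, not taken from the dataset above), the errors 10, -20, 0 and 10 give MSE = 150, RMSE ≈ 12.25, MAE = 10 and MAPE = 5.625%. RMSE is in the same unit as the target, which makes it easier to read than MSE. Recent scikit-learn releases (0.24 and later) also provide `metrics.mean_absolute_percentage_error`, which returns a fraction rather than a percentage.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and MAPE (in percent) for two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),                          # same unit as the target
        "MAE": np.mean(np.abs(err)),
        "MAPE": np.mean(np.abs(err / y_true)) * 100,   # breaks if y_true contains 0
    }

# Hypothetical example: errors are 10, -20, 0, 10
m = regression_metrics([100, 200, 300, 400], [90, 220, 300, 390])
print(m)
```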
The number of params
print(Forward_best_model.params.shape, Backward_best_model.params.shape, Stepwise_best_model.params.shape)
(24,) (24,) (24,)


print(len(fitted_full_model.params))
print(len(Forward_best_model.params))
print(len(Backward_best_model.params))
print(len(Stepwise_best_model.params))
37
24
24
24




Logistic regression on a real-world dataset

Dataset download

Dataset Description
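Experience: years of work experience
Income: income
Family: family size
CCAvg: average monthly credit card spending
Education: education level (1: undergraduate; 2: graduate; 3: advanced)
Mortgage: mortgage (household loan) amount
Securities Account: whether the customer has a securities account
CD Account: whether the customer has a certificate of deposit (CD) account
Online: whether the customer has an online banking account
CreditCard: whether the customer has a credit card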

Data preprocessing

# Load the packages needed for the analysis
import os
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve
import statsmodels.api as sm
import matplotlib.pyplot as plt
import itertools
import time

# Load the data and drop uninformative columns (ID, ZIP Code)
ploan = pd.read_csv(r'C:\Users\userd\Desktop\dataset\Personal_Loan.csv')
ploan_processed = ploan.dropna().drop(['ID','ZIP Code'], axis=1, inplace=False)
ploan_processed = sm.add_constant(ploan_processed, has_constant='add')

# split into train and test
feature_columns = list(ploan_processed.columns.difference(["Personal Loan"]))
X = ploan_processed[feature_columns]
y = ploan_processed['Personal Loan'] # target: loan accepted (1) or not (0)
train_x, test_x, train_y, test_y = train_test_split(X, y, stratify=y,train_size=0.7,test_size=0.3,random_state=42)
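`stratify=y` keeps the share of loan takers (Personal Loan = 1) the same in the training and test sets. That matters here because positives are a small minority (about 10% of the test set, as the confusion matrix below shows), and an unstratified split could leave one side with noticeably fewer of them. A sketch on synthetic labels (an assumption; about 10% positives) that compares the positive rate in both splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical imbalanced target: roughly 10% positives
y = (rng.random(2500) < 0.10).astype(int)
X = np.arange(len(y)).reshape(-1, 1)   # dummy single feature

train_x, test_x, train_y, test_y = train_test_split(
    X, y, stratify=y, train_size=0.7, test_size=0.3, random_state=42)

# With stratify=y the positive rate is (almost) identical in both parts
print(f"overall={y.mean():.4f} train={train_y.mean():.4f} test={test_y.mean():.4f}")
```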
Data : Input


ploan_processed  # displayed before sm.add_constant, so there is no 'const' column yet
	Age	Experience	Income	Family	CCAvg	Education	Mortgage	Personal Loan	Securities Account	CD Account	Online	CreditCard
0	25	1		49	4	1.6	1		0		0		1			0		0	0
1	45	19		34	3	1.5	1		0		0		1			0		0	0
2	39	15		11	1	1.0	1		0		0		0			0		0	0
3	35	9		100	1	2.7	2		0		0		0			0		0	0
4	35	8		45	4	1.0	2		0		0		0			0		0	1
...	...	...	...	...	...	...	...	...	...	...	...	...
2495	46	22		70	4	1.9	1		212		0		0			0		0	1
2496	63	37		32	3	0.7	2		141		0		0			0		0	0
2497	33	9		14	3	0.9	3		114		0		0			0		0	0
2498	38	14		111	2	6.1	1		326		0		0			0		0	0
2499	53	27		38	4	2.8	2		144		0		1			0		1	0
2500 rows × 12 columns


ploan_processed  # after sm.add_constant: a 'const' column of 1.0 is prepended
	const	Age	Experience	Income	Family	CCAvg	Education	Mortgage	Personal Loan	Securities Account	CD Account	Online	CreditCard
0	1.0	25	1		49	4	1.6	1		0		0		1			0		0	0
1	1.0	45	19		34	3	1.5	1		0		0		1			0		0	0
2	1.0	39	15		11	1	1.0	1		0		0		0			0		0	0
3	1.0	35	9		100	1	2.7	2		0		0		0			0		0	0
4	1.0	35	8		45	4	1.0	2		0		0		0			0		0	1
...	...	...	...	...	...	...	...	...	...	...	...	...	...
2495	1.0	46	22		70	4	1.9	1		212		0		0			0		0	1
2496	1.0	63	37		32	3	0.7	2		141		0		0			0		0	0
2497	1.0	33	9		14	3	0.9	3		114		0		0			0		0	0
2498	1.0	38	14		111	2	6.1	1		326		0		0			0		0	0
2499	1.0	53	27		38	4	2.8	2		144		0		1			0		1	0
2500 rows × 13 columns


print(train_x.shape, test_x.shape, train_y.shape, test_y.shape)
(1750, 12) (750, 12) (1750,) (750,)


Regression analysis

model = sm.Logit(train_y, train_x)
results = model.fit(method='newton')
results.summary()
OUTPUT : Model results

[Screenshot: results.summary() output]

results.params
Age                    0.024471
CCAvg                  0.098468
CD Account             4.372577
CreditCard            -1.237447
Education              1.520329
Experience            -0.007032
Family                 0.757911
Income                 0.054695
Mortgage              -0.000133
Online                -0.440746
Securities Account    -1.852006
const                -13.920298
dtype: float64


np.exp(results.params)
Age                   1.024773e+00
CCAvg                 1.103479e+00
CD Account            7.924761e+01
CreditCard            2.901239e-01
Education             4.573729e+00
Experience            9.929928e-01
Family                2.133814e+00
Income                1.056218e+00
Mortgage              9.998665e-01
Online                6.435563e-01
Securities Account    1.569221e-01
const                 9.005163e-07
dtype: float64

Model prediction
pred_y = results.predict(test_x)
pred_y
1065    0.012968
487     0.023841
2157    0.001210
1765    0.196245
525     0.006610
1573    0.241812
2103    0.060656
1601    0.339803
1329    0.002238
970     0.003269
875     0.004334
661     0.000976
1356    0.001064
1454    0.084981
838     0.026756
2042    0.010442
1401    0.038788
2025    0.006997
1475    0.091474
969     0.032079
2268    0.004988
456     0.004391
1685    0.017692
1702    0.014201
102     0.005766
1712    0.001604
1280    0.141404
2470    0.612456
2433    0.435395
2326    0.015946
          ...   
1120    0.001546
689     0.000588
70      0.004755
2483    0.001897
1067    0.561103
1123    0.472680
1166    0.145754
1572    0.002263
227     0.836443
1127    0.000111
812     0.036772
2184    0.977346
998     0.016186
828     0.000613
2104    0.063208
1135    0.000021
2434    0.003421
451     0.008169
1286    0.001812
1364    0.009835
1827    0.010325
2093    0.073346
168     0.000349
2062    0.046096
107     0.000239
277     0.019982
914     0.959460
542     0.005239
32      0.011344
2360    0.084464
Length: 750, dtype: float64


def cut_off(y,threshold):
    Y = y.copy() # copy() so that the original y values are not modified
    Y[Y>threshold]=1
    Y[Y<=threshold]=0
    return(Y.astype(int))

pred_Y = cut_off(pred_y,0.5)
pred_Y
1065    0
487     0
2157    0
1765    0
525     0
1573    0
2103    0
1601    0
1329    0
970     0
875     0
661     0
1356    0
1454    0
838     0
2042    0
1401    0
2025    0
1475    0
969     0
2268    0
456     0
1685    0
1702    0
102     0
1712    0
1280    0
2470    1
2433    0
2326    0
       ..
1120    0
689     0
70      0
2483    0
1067    1
1123    0
1166    0
1572    0
227     1
1127    0
812     0
2184    1
998     0
828     0
2104    0
1135    0
2434    0
451     0
1286    0
1364    0
1827    0
2093    0
168     0
2062    0
107     0
277     0
914     1
542     0
32      0
2360    0
Length: 750, dtype: int32
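`cut_off` turns probabilities into 0/1 labels with the rule "predict 1 if p > threshold". The same rule fits in one vectorized expression that builds a new object instead of modifying its input, so no `copy()` is needed. A sketch (the name `cut_off_vectorized` is hypothetical); the first, second and fourth probabilities are copied from the output above, and the row at index 7 with a value of exactly 0.5 is made up to show that ties go to 0:

```python
import pandas as pd

def cut_off_vectorized(prob, threshold):
    """Hypothetical helper: 1 where prob > threshold, else 0 (same rule as cut_off)."""
    # The comparison creates a new boolean Series; the input is left untouched
    return (prob > threshold).astype(int)

prob = pd.Series([0.012968, 0.612456, 0.5, 0.977346], index=[1065, 2470, 7, 2184])
labels = cut_off_vectorized(prob, 0.5)
print(labels.tolist())   # → [0, 1, 0, 1]
```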

Model diagnosis
print("model AIC: ","{:.5f}".format(results.aic))
model AIC:  482.69329

Model performance(1)
pred_y = results.predict(test_x)

def cut_off(y,threshold):
    Y = y.copy() # copy() so that the original y values are not modified
    Y[Y>threshold]=1
    Y[Y<=threshold]=0
    return(Y.astype(int))

pred_Y = cut_off(pred_y,0.5)

cfmat = confusion_matrix(test_y,pred_Y)

def acc(cfmat) :
    acc=(cfmat[0,0]+cfmat[1,1])/np.sum(cfmat) ## accuracy
    return(acc)
Confusion matrix and accuracy
print(cfmat)
[[660  13]
 [ 29  48]]


(cfmat[0,0]+cfmat[1,1])/np.sum(cfmat) ## accuracy
0.944
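scikit-learn's `confusion_matrix` puts true labels in rows and predicted labels in columns, so for labels 0/1 the layout is [[TN, FP], [FN, TP]]. Accuracy alone hides how many actual borrowers the model misses; the same four counts also give recall, precision, specificity and F1. A sketch using the counts printed above:

```python
import numpy as np

cfmat = np.array([[660, 13],
                  [29, 48]])
tn, fp, fn, tp = cfmat.ravel()   # sklearn layout: rows = true, columns = predicted

accuracy = (tp + tn) / cfmat.sum()
recall = tp / (tp + fn)          # share of actual borrowers that are detected
precision = tp / (tp + fp)       # share of predicted borrowers that are correct
specificity = tn / (tn + fp)     # share of non-borrowers correctly rejected
f1 = 2 * precision * recall / (precision + recall)

print(f"acc={accuracy:.3f} recall={recall:.3f} precision={precision:.3f} "
      f"specificity={specificity:.3f} f1={f1:.3f}")
# → acc=0.944 recall=0.623 precision=0.787 specificity=0.981 f1=0.696
```

So at the 0.5 cut-off the model catches only about 62% of the customers who actually took the loan, even though overall accuracy is 94.4%.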


Performance based on cut-off values

def cut_off(y,threshold):
    Y = y.copy() # copy() so that the original y values are not modified
    Y[Y>threshold]=1
    Y[Y<=threshold]=0
    return(Y.astype(int))

def acc(cfmat) :
    acc=(cfmat[0,0]+cfmat[1,1])/np.sum(cfmat) ## accuracy
    return(acc)
    
pred_y = results.predict(test_x)    
pred_Y = cut_off(pred_y,0.5)
cfmat = confusion_matrix(test_y,pred_Y)

threshold = np.arange(0,1,0.1)
table = pd.DataFrame(columns=['ACC'])
for i in threshold:
    pred_Y = cut_off(pred_y,i)
    cfmat = confusion_matrix(test_y, pred_Y)
    table.loc[i] = acc(cfmat)
table.index.name='threshold'
table.columns.name='performance'
table
performance	ACC
threshold	
0.0	0.102667
0.1	0.908000
0.2	0.922667
0.3	0.932000
0.4	0.936000
0.5	0.944000
0.6	0.949333
0.7	0.946667
0.8	0.941333
0.9	0.937333
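Two things stand out in the table. At threshold 0.0 every predicted probability is above the cut-off, so every customer is labeled a borrower and accuracy falls to the share of actual borrowers in the test set: 77 / 750 = 0.102667 (77 = 29 + 48, the bottom row of the confusion matrix). And because positives are rare, accuracy barely moves between 0.1 and 0.9 (0.908 to 0.949). The loop can also be written without Python iteration by comparing all thresholds at once with NumPy broadcasting; a sketch with made-up probabilities and labels (assumptions), using the same "p > t" rule as `cut_off`:

```python
import numpy as np

def accuracy_by_threshold(prob, y_true, thresholds):
    """Accuracy of the rule 'predict 1 if prob > t' for every t in thresholds."""
    prob = np.asarray(prob, dtype=float)
    y_true = np.asarray(y_true, dtype=int)
    t = np.asarray(thresholds, dtype=float)
    pred = (prob[None, :] > t[:, None]).astype(int)   # shape (n_thresholds, n_samples)
    return (pred == y_true[None, :]).mean(axis=1)

# Hypothetical probabilities and labels (3 of 6 positive)
prob = np.array([0.05, 0.20, 0.35, 0.60, 0.80, 0.95])
y = np.array([0, 0, 1, 0, 1, 1])
thresholds = np.arange(0, 1, 0.1)
for t, a in zip(thresholds, accuracy_by_threshold(prob, y, thresholds)):
    print(f"{t:.1f}  {a:.3f}")
```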

Model performance(2)
# ROC curve via sklearn.metrics.roc_curve
pred_y = results.predict(test_x)    
fpr, tpr, thresholds = metrics.roc_curve(test_y, pred_y, pos_label=1)

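# Plot the ROC curve
plt.plot(fpr, tpr)
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')

# Area under the ROC curve (trapezoidal rule, same value as np.trapz(tpr, fpr))
auc = metrics.auc(fpr, tpr)
print('AUC:', auc)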
AUC: 0.9463923891858513

[Figure: ROC curve of the full logistic regression model]
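The AUC printed above integrates the ROC curve with the trapezoidal rule. scikit-learn computes the same number with `metrics.auc(fpr, tpr)`, or directly from the scores with `roc_auc_score` (already imported at the top); note that `np.trapz` is deprecated in NumPy 2.0 in favour of `np.trapezoid`. The sketch also picks a cut-off with Youden's J statistic (the threshold that maximizes TPR - FPR), a common heuristic that is not part of the original analysis. The scores are synthetic (an assumption).

```python
import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

rng = np.random.default_rng(42)
# Hypothetical scores: 300 negatives, 60 positives that tend to score higher
y = np.r_[np.zeros(300, dtype=int), np.ones(60, dtype=int)]
score = np.r_[rng.normal(0.0, 1.0, 300), rng.normal(1.5, 1.0, 60)]

fpr, tpr, thresholds = roc_curve(y, score, pos_label=1)
auc_curve = auc(fpr, tpr)              # trapezoidal area under (fpr, tpr)
auc_direct = roc_auc_score(y, score)   # same quantity, computed from the scores

# Youden's J: the threshold with the largest gap between TPR and FPR
best = np.argmax(tpr - fpr)
print(f"AUC={auc_curve:.4f} (roc_auc_score={auc_direct:.4f}), "
      f"Youden threshold={thresholds[best]:.3f}")
```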



Modify regression model

feature_columns = list(ploan_processed.columns.difference(["Personal Loan","Experience",  "Mortgage"]))
X = ploan_processed[feature_columns]
y = ploan_processed['Personal Loan'] # target: loan accepted (1) or not (0)

train_x2, test_x2, train_y, test_y = train_test_split(X, y, stratify=y,train_size=0.7,test_size=0.3,random_state=42)
model = sm.Logit(train_y, train_x2)
results2 = model.fit(method='newton')
results2.summary()
Data : Input
print(train_x.shape, test_x.shape, train_y.shape, test_y.shape)
(1750, 12) (750, 12) (1750,) (750,)

OUTPUT : Model results

[Screenshot: results2.summary() output]


Model prediction
def cut_off(y,threshold):
    Y = y.copy() # copy() so that the original y values are not modified
    Y[Y>threshold]=1
    Y[Y<=threshold]=0
    return(Y.astype(int))

pred_y = results2.predict(test_x2)
pred_Y = cut_off(pred_y,0.5)
pred_Y
1065    0
487     0
2157    0
1765    0
525     0
1573    0
2103    0
1601    0
1329    0
970     0
875     0
661     0
1356    0
1454    0
838     0
2042    0
1401    0
2025    0
1475    0
969     0
2268    0
456     0
1685    0
1702    0
102     0
1712    0
1280    0
2470    1
2433    0
2326    0
       ..
1120    0
689     0
70      0
2483    0
1067    1
1123    0
1166    0
1572    0
227     1
1127    0
812     0
2184    1
998     0
828     0
2104    0
1135    0
2434    0
451     0
1286    0
1364    0
1827    0
2093    0
168     0
2062    0
107     0
277     0
914     1
542     0
32      0
2360    0
Length: 750, dtype: int32

Model performance(1)
def cut_off(y,threshold):
    Y = y.copy() # copy() so that the original y values are not modified
    Y[Y>threshold]=1
    Y[Y<=threshold]=0
    return(Y.astype(int))
    
pred_y = results2.predict(test_x2)
pred_Y = cut_off(pred_y,0.5)
cfmat = confusion_matrix(test_y,pred_Y)

def acc(cfmat) :
    acc=(cfmat[0,0]+cfmat[1,1])/np.sum(cfmat) ## accuracy
    return(acc)

acc(cfmat)   ## accuracy
0.944
Confusion matrix
print(cfmat)
[[660  13]
 [ 29  48]]


Performance based on cut-off values

def cut_off(y,threshold):
    Y = y.copy() # copy() so that the original y values are not modified
    Y[Y>threshold]=1
    Y[Y<=threshold]=0
    return(Y.astype(int))
    
def acc(cfmat) :
    acc=(cfmat[0,0]+cfmat[1,1])/np.sum(cfmat) ## accuracy
    return(acc)

pred_y = results2.predict(test_x2)
pred_Y = cut_off(pred_y,0.5)
cfmat = confusion_matrix(test_y,pred_Y)

threshold = np.arange(0,1,0.1)
table = pd.DataFrame(columns=['ACC'])
for i in threshold:
    pred_Y = cut_off(pred_y,i)
    cfmat = confusion_matrix(test_y, pred_Y)
    table.loc[i] = acc(cfmat)
table.index.name='threshold'
table.columns.name='performance'
table
performance	ACC
threshold	
0.0	0.102667
0.1	0.908000
0.2	0.922667
0.3	0.932000
0.4	0.936000
0.5	0.944000
0.6	0.949333
0.7	0.946667
0.8	0.941333
0.9	0.937333

Model performance(2)
# ROC curve via sklearn.metrics.roc_curve
pred_y = results2.predict(test_x2)
fpr, tpr, thresholds = metrics.roc_curve(test_y, pred_y, pos_label=1)

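# Plot the ROC curve
plt.plot(fpr, tpr)
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')

# Area under the ROC curve (trapezoidal rule, same value as np.trapz(tpr, fpr))
auc = metrics.auc(fpr, tpr)
print('AUC:', auc)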
AUC: 0.9465467667547905

[Figure: ROC curve of the modified logistic regression model]



Modify regression model (variable selection)

feature_columns = list(ploan_processed.columns.difference(["Personal Loan"]))
X = ploan_processed[feature_columns]
y = ploan_processed['Personal Loan'] # target: loan accepted (1) or not (0)

train_x, test_x, train_y, test_y = train_test_split(X, y, stratify=y,train_size=0.7,test_size=0.3,random_state=42)

def processSubset(X, y, feature_set):
    # Fit a Logit model on the given columns and return it with its AIC
    model = sm.Logit(y, X[list(feature_set)])
    regr = model.fit()
    AIC = regr.aic
    return {"model": regr, "AIC": AIC}

'''
Forward selection
'''
def forward(X, y, predictors):
    # Candidate variables: every column (except 'const') not already in predictors
    remaining_predictors = [p for p in X.columns.difference(['const']) if p not in predictors]
    tic = time.time()
    results = []
    for p in remaining_predictors:
        results.append(processSubset(X=X, y= y, feature_set=predictors+[p]+['const']))
    # Convert to a DataFrame
    models = pd.DataFrame(results)

    # Select the model with the lowest AIC (idxmin returns the row label)
    best_model = models.loc[models['AIC'].idxmin()]
    toc = time.time()
    print("Processed ", models.shape[0], "models on", len(predictors)+1, "predictors in", (toc-tic))
    print('Selected predictors:', best_model['model'].model.exog_names, ' AIC:', best_model['AIC'])
    return best_model

def forward_model(X,y):
    Fmodels = pd.DataFrame(columns=["AIC", "model"])
    tic = time.time()
    # Predictors selected so far (starts empty)
    predictors = []
    # Add one variable per iteration: i = 1, 2, ..., number of candidate features
    for i in range(1, len(X.columns.difference(['const'])) + 1):
        Forward_result = forward(X=X,y=y,predictors=predictors)
        if i > 1:
            if Forward_result['AIC'] > Fmodel_before:
                break
        Fmodels.loc[i] = Forward_result
        predictors = Fmodels.loc[i]["model"].model.exog_names
        Fmodel_before = Fmodels.loc[i]["AIC"]
        predictors = [ k for k in predictors if k != 'const']
    toc = time.time()
    print("Total elapsed time:", (toc - tic), "seconds.")

    return(Fmodels['model'][len(Fmodels['model'])])


'''
Backward elimination
'''
def backward(X,y,predictors):
    tic = time.time()
    results = []
    
    # Try every subset of the current predictors that drops exactly one variable
    for combo in itertools.combinations(predictors, len(predictors) - 1):
        results.append(processSubset(X=X, y= y,feature_set=list(combo)+['const']))
    models = pd.DataFrame(results)
    
    # Select the model with the lowest AIC (idxmin returns the row label)
    best_model = models.loc[models['AIC'].idxmin()]
    toc = time.time()
    print("Processed ", models.shape[0], "models on", len(predictors) - 1, "predictors in",
          (toc - tic))
    print('Selected predictors:', best_model['model'].model.exog_names, ' AIC:', best_model['AIC'])
    return best_model


def backward_model(X, y):
    Bmodels = pd.DataFrame(columns=["AIC", "model"], index = range(1,len(X.columns)))
    tic = time.time()
    predictors = X.columns.difference(['const'])
    # Starting point: AIC of the full model (all predictors plus the constant)
    Bmodel_before = processSubset(X, y, list(predictors) + ['const'])['AIC']
    while (len(predictors) > 1):
        Backward_result = backward(X=X, y=y, predictors=predictors)
        if Backward_result['AIC'] > Bmodel_before:
            break
        Bmodels.loc[len(predictors) - 1] = Backward_result
        predictors = Bmodels.loc[len(predictors) - 1]["model"].model.exog_names
        Bmodel_before = Backward_result['AIC']
        predictors = [ k for k in predictors if k != 'const']

    toc = time.time()
    print("Total elapsed time:", (toc - tic), "seconds.")
    return (Bmodels['model'].dropna().iloc[0])


'''
Stepwise selection
'''
def Stepwise_model(X,y):
    Stepmodels = pd.DataFrame(columns=["AIC", "model"])
    tic = time.time()
    predictors = []
    Smodel_before = processSubset(X,y,predictors+['const'])['AIC']
    # Add one variable per iteration: i = 1, 2, ..., number of candidate features
    for i in range(1, len(X.columns.difference(['const'])) + 1):
        Forward_result = forward(X=X, y=y, predictors=predictors) # constant added
        print('forward')
        Stepmodels.loc[i] = Forward_result
        predictors = Stepmodels.loc[i]["model"].model.exog_names
        predictors = [ k for k in predictors if k != 'const']
        Backward_result = backward(X=X, y=y, predictors=predictors)
        if Backward_result['AIC']< Forward_result['AIC']:
            Stepmodels.loc[i] = Backward_result
            predictors = Stepmodels.loc[i]["model"].model.exog_names
            Smodel_before = Stepmodels.loc[i]["AIC"]
            predictors = [ k for k in predictors if k != 'const']
            print('backward')
        if Stepmodels.loc[i]['AIC']> Smodel_before:
            break
        else:
            Smodel_before = Stepmodels.loc[i]["AIC"]
    toc = time.time()
    print("Total elapsed time:", (toc - tic), "seconds.")
    return (Stepmodels['model'][len(Stepmodels['model'])])
    
    
def cut_off(y,threshold):
    Y = y.copy() # copy() so that the original y values are not modified
    Y[Y>threshold]=1
    Y[Y<=threshold]=0
    return(Y.astype(int))
    
def acc(cfmat) :
    acc=(cfmat[0,0]+cfmat[1,1])/np.sum(cfmat) ## accuracy
    return(acc)
    
    
Forward_best_model = forward_model(X=train_x, y= train_y)
Backward_best_model = backward_model(X=train_x,y=train_y)
Stepwise_best_model = Stepwise_model(X=train_x,y=train_y)

pred_y_full = results.predict(test_x) # full model (all features; results2 is the reduced model)
pred_y_forward = Forward_best_model.predict(test_x[Forward_best_model.model.exog_names])
pred_y_backward = Backward_best_model.predict(test_x[Backward_best_model.model.exog_names])
pred_y_stepwise = Stepwise_best_model.predict(test_x[Stepwise_best_model.model.exog_names])

pred_Y_full= cut_off(pred_y_full,0.5)
pred_Y_forward = cut_off(pred_y_forward,0.5)
pred_Y_backward = cut_off(pred_y_backward,0.5)
pred_Y_stepwise = cut_off(pred_y_stepwise,0.5)

cfmat_full = confusion_matrix(test_y, pred_Y_full)
cfmat_forward = confusion_matrix(test_y, pred_Y_forward)
cfmat_backward = confusion_matrix(test_y, pred_Y_backward)
cfmat_stepwise = confusion_matrix(test_y, pred_Y_stepwise)
Selected model
Forward_best_model = forward_model(X=train_x, y= train_y)
OUTPUT
Optimization terminated successfully.
         Current function value: 0.329986
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.284217
         Iterations 7
Optimization terminated successfully.
         Current function value: 0.296731
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.330062
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.322763
         Iterations 7
Optimization terminated successfully.
         Current function value: 0.329995
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.327824
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.205738
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.324953
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.329912
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.330059
         Iterations 6
Processed  11 models on 1 predictors in 0.06183505058288574
Selected predictors: ['Income', 'const']  AIC: 724.0825012461598
Optimization terminated successfully.
         Current function value: 0.205431
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205682
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.185721
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205517
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.169107
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205563
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.182286
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205735
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205561
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205167
         Iterations 8
Processed  10 models on 2 predictors in 0.05884265899658203
Selected predictors: ['Income', 'Education', 'const']  AIC: 597.8752580578658
Optimization terminated successfully.
         Current function value: 0.168881
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.168679
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152041
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.168833
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.168897
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.154924
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.169073
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.169052
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.168642
         Iterations 9
Processed  9 models on 3 predictors in 0.07081055641174316
Selected predictors: ['Income', 'Education', 'CD Account', 'const']  AIC: 540.1423230958794
Optimization terminated successfully.
         Current function value: 0.152028
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.151411
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.148163
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152036
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.139352
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152015
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.151151
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.150075
         Iterations 9
Processed  8 models on 4 predictors in 0.057845115661621094
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'const']  AIC: 497.73316075623126
Optimization terminated successfully.
         Current function value: 0.138887
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.138758
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136599
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.138901
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.139349
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.138959
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.137180
         Iterations 9
Processed  7 models on 5 predictors in 0.053856849670410156
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'const']  AIC: 490.0954047541096
Optimization terminated successfully.
         Current function value: 0.136127
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135996
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136142
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136574
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135928
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.133263
         Iterations 9
Processed  6 models on 6 predictors in 0.056847572326660156
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'const']  AIC: 480.41892123708624
Optimization terminated successfully.
         Current function value: 0.132630
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132650
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132646
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.133238
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132361
         Iterations 9
Processed  5 models on 7 predictors in 0.03989291191101074
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'const']  AIC: 479.2643543252462
Optimization terminated successfully.
         Current function value: 0.131791
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131772
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131803
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132343
         Iterations 9
Processed  4 models on 8 predictors in 0.03989434242248535
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'const']  AIC: 479.2012205305657
Optimization terminated successfully.
         Current function value: 0.131062
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131077
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131771
         Iterations 9
Processed  3 models on 9 predictors in 0.02792525291442871
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'Age', 'const']  AIC: 478.7181848799073
Optimization terminated successfully.
         Current function value: 0.131061
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131057
         Iterations 9
Processed  2 models on 10 predictors in 0.02393651008605957
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'Age', 'Mortgage', 'const']  AIC: 480.6980587902294
Total elapsed time: 0.5485327243804932 seconds.


Backward_best_model = backward_model(X=train_x,y=train_y)
OUTPUT
Optimization terminated successfully.
         Current function value: 0.137663
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.134821
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131859
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131061
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.214795
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.142500
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131057
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.154241
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135440
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152443
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131753
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131072
         Iterations 9
Processed  11 models on 10 predictors in 0.12366890907287598
Selected predictors: ['Age', 'CCAvg', 'CD Account', 'CreditCard', 'Education', 'Family', 'Income', 'Mortgage', 'Online', 'Securities Account', 'const']  AIC: 480.6980587902294
Optimization terminated successfully.
         Current function value: 0.134824
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131862
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131062
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.215827
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.142665
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.155447
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135443
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152478
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131755
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131771
         Iterations 9
Processed  10 models on 9 predictors in 0.0967409610748291
Selected predictors: ['Age', 'CCAvg', 'CD Account', 'CreditCard', 'Education', 'Family', 'Income', 'Online', 'Securities Account', 'const']  AIC: 478.7181848799073
Optimization terminated successfully.
         Current function value: 0.134831
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131871
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.218281
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.142684
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.155797
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135444
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152482
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131791
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131772
         Iterations 9
Processed  9 models on 8 predictors in 0.07679510116577148
Selected predictors: ['CCAvg', 'CD Account', 'CreditCard', 'Education', 'Family', 'Income', 'Online', 'Securities Account', 'const']  AIC: 479.2012205305657
Total elapsed time: 0.3181488513946533 seconds.


Stepwise_best_model = Stepwise_model(X=train_x,y=train_y)
OUTPUT
Optimization terminated successfully.
         Current function value: 0.330076
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.329986
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.284217
         Iterations 7
Optimization terminated successfully.
         Current function value: 0.296731
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.330062
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.322763
         Iterations 7
Optimization terminated successfully.
         Current function value: 0.329995
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.327824
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.205738
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.324953
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.329912
         Iterations 6
Optimization terminated successfully.
         Current function value: 0.330059
         Iterations 6
Processed  11 models on 1 predictors in 0.06789159774780273
Selected predictors: ['Income', 'const']  AIC: 724.0825012461598
forward
Optimization terminated successfully.
         Current function value: 0.330076
         Iterations 6
Processed  1 models on 0 predictors in 0.008976459503173828
Selected predictors: ['const']  AIC: 1157.267296321307
Optimization terminated successfully.
         Current function value: 0.205431
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205682
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.185721
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205517
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.169107
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205563
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.182286
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205735
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205561
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.205167
         Iterations 8
Processed  10 models on 2 predictors in 0.07081007957458496
Selected predictors: ['Income', 'Education', 'const']  AIC: 597.8752580578658
forward
Optimization terminated successfully.
         Current function value: 0.205738
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.322763
         Iterations 7
Processed  2 models on 1 predictors in 0.017953157424926758
Selected predictors: ['Income', 'const']  AIC: 724.0825012461598
Optimization terminated successfully.
         Current function value: 0.168881
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.168679
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152041
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.168833
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.168897
         Iterations 9
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:21: FutureWarning: 'argmin' is deprecated, use 'idxmin' instead. The behavior of 'argmin' will be corrected to return the positional minimum in the future.
Use 'series.values.argmin' to get the position of the minimum now.
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:61: FutureWarning: 'argmin' is deprecated, use 'idxmin' instead. The behavior of 'argmin' will be corrected to return the positional minimum in the future.
Use 'series.values.argmin' to get the position of the minimum now.
Optimization terminated successfully.
         Current function value: 0.154924
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.169073
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.169052
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.168642
         Iterations 9
Processed  9 models on 3 predictors in 0.06981372833251953
Selected predictors: ['Income', 'Education', 'CD Account', 'const']  AIC: 540.1423230958794
forward
Optimization terminated successfully.
         Current function value: 0.169107
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.185721
         Iterations 8
Optimization terminated successfully.
         Current function value: 0.288940
         Iterations 7
Processed  3 models on 2 predictors in 0.02293872833251953
Selected predictors: ['Income', 'Education', 'const']  AIC: 597.8752580578658
Optimization terminated successfully.
         Current function value: 0.152028
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.151411
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.148163
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152036
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.139352
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152015
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.151151
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.150075
         Iterations 9
Processed  8 models on 4 predictors in 0.06681990623474121
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'const']  AIC: 497.73316075623126
forward
Optimization terminated successfully.
         Current function value: 0.152041
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.154924
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.164270
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.287431
         Iterations 7
Processed  4 models on 3 predictors in 0.04787254333496094
Selected predictors: ['Income', 'Education', 'CD Account', 'const']  AIC: 540.1423230958794
Optimization terminated successfully.
         Current function value: 0.138887
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.138758
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136599
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.138901
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.139349
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.138959
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.137180
         Iterations 9
Processed  7 models on 5 predictors in 0.06382942199707031
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'const']  AIC: 490.0954047541096
forward
Optimization terminated successfully.
         Current function value: 0.139352
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.148163
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.154854
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.160828
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.282426
         Iterations 7
Processed  5 models on 4 predictors in 0.04089093208312988
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'const']  AIC: 497.73316075623126
Optimization terminated successfully.
         Current function value: 0.136127
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135996
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136142
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136574
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135928
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.133263
         Iterations 9
Processed  6 models on 6 predictors in 0.06083846092224121
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'const']  AIC: 480.41892123708624
forward
Optimization terminated successfully.
         Current function value: 0.136599
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.137180
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.144927
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.154299
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.157364
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.273321
         Iterations 7
Processed  6 models on 5 predictors in 0.042886972427368164
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'const']  AIC: 490.0954047541096
Optimization terminated successfully.
         Current function value: 0.132630
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132650
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132646
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.133238
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132361
         Iterations 9
Processed  5 models on 7 predictors in 0.05186176300048828
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'const']  AIC: 479.2643543252462
forward
Optimization terminated successfully.
         Current function value: 0.133263
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135928
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136688
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.143335
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.154141
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.156593
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.271509
         Iterations 7
Processed  7 models on 6 predictors in 0.07081055641174316
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'const']  AIC: 480.41892123708624
Optimization terminated successfully.
         Current function value: 0.131791
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131772
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131803
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132343
         Iterations 9
Processed  4 models on 8 predictors in 0.03690147399902344
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'const']  AIC: 479.2012205305657
forward
Optimization terminated successfully.
         Current function value: 0.132361
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.132650
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135373
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.136112
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.142716
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.153670
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.156410
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.218291
         Iterations 8
Processed  8 models on 7 predictors in 0.07579731941223145
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'const']  AIC: 479.2643543252462
Optimization terminated successfully.
         Current function value: 0.131062
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131077
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131771
         Iterations 9
Processed  3 models on 9 predictors in 0.029920101165771484
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'Age', 'const']  AIC: 478.7181848799073
forward
Optimization terminated successfully.
         Current function value: 0.131772
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131791
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131871
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.134831
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135444
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.142684
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152482
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.155797
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.218281
         Iterations 8
Processed  9 models on 8 predictors in 0.07579827308654785
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'const']  AIC: 479.2012205305657
Optimization terminated successfully.
         Current function value: 0.131061
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131057
         Iterations 9
Processed  2 models on 10 predictors in 0.03091716766357422
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'Age', 'Mortgage', 'const']  AIC: 480.6980587902294
forward
Optimization terminated successfully.
         Current function value: 0.131062
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131771
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131755
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131862
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.134824
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135443
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.142665
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152478
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.155447
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.215827
         Iterations 8
Processed  10 models on 9 predictors in 0.08178138732910156
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'Age', 'const']  AIC: 478.7181848799073
backward
Optimization terminated successfully.
         Current function value: 0.131061
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131057
         Iterations 9
Processed  2 models on 10 predictors in 0.029919862747192383
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'Age', 'Mortgage', 'const']  AIC: 480.6980587902294
forward
Optimization terminated successfully.
         Current function value: 0.131062
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131771
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131755
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.131862
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.134824
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.135443
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.142665
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.152478
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.155447
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.215827
         Iterations 8
Processed  10 models on 9 predictors in 0.08776473999023438
Selected predictors: ['Income', 'Education', 'CD Account', 'Family', 'CreditCard', 'Securities Account', 'Online', 'CCAvg', 'Age', 'const']  AIC: 478.7181848799073
backward
Total elapsed time: 1.2626218795776367 seconds.
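The "Current function value" lines can be tied to the logged AIC. As far as I understand statsmodels, `Logit.fit` prints the objective it minimizes, which is the average negative log-likelihood f = -ln L / n. If so, AIC = 2k - 2 ln L = 2k + 2nf, where k counts the fitted parameters including `const`. The check below copies three (f, k, AIC) triples from the stepwise log and solves for n. The values are consistent with n = 1750 training rows, but that figure is inferred from the log and is not stated in the post. The helper name `aic_from_logit_log` is hypothetical.

```python
# (function value f, parameters k incl. 'const', logged AIC), copied from the stepwise log
logged = [
    (0.330076, 1, 1157.267296321307),   # ['const']
    (0.205738, 2, 724.0825012461598),   # ['Income', 'const']
    (0.131062, 10, 478.7181848799073),  # final 9-predictor model + 'const'
]


def aic_from_logit_log(f, n, k):
    """AIC = 2k - 2 ln L, with ln L = -n * f when f is the mean negative log-likelihood."""
    return 2 * k + 2 * n * f


# Solve AIC = 2k + 2nf for n using each logged triple
implied_n = [(aic - 2 * k) / (2 * f) for f, k, aic in logged]
print([round(n) for n in implied_n])

# Recompute each AIC from f with n = 1750 and compare with the logged value
for f, k, aic in logged:
    print(k, round(aic_from_logit_log(f, 1750, k), 3), round(aic, 3))
```

The tiny mismatches (about 0.001) come from f being printed with only six decimals.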


Model performance (1)
print(acc(cfmat_full))
print(acc(cfmat_forward))
print(acc(cfmat_backward))
print(acc(cfmat_stepwise))
0.944
0.944
0.944
0.944
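The `acc` helper and the `cfmat_*` confusion matrices are defined earlier in the post. Assuming `acc` is ordinary accuracy, meaning correct predictions (the diagonal of the confusion matrix) divided by all predictions, a minimal NumPy equivalent looks like the code below. `confusion_2x2` and its 0.5 cut-off are hypothetical stand-ins for however the post builds `cfmat_*`. The toy labels and probabilities are made up.

```python
import numpy as np


def confusion_2x2(y_true, prob, threshold=0.5):
    """Hypothetical helper: rows = actual class (0, 1), columns = predicted class (0, 1)."""
    y_pred = (np.asarray(prob) >= threshold).astype(int)
    y_true = np.asarray(y_true).astype(int)
    cfmat = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cfmat[t, p] += 1
    return cfmat


def acc(cfmat):
    """Accuracy = correctly classified / all samples = trace / total."""
    return np.trace(cfmat) / np.sum(cfmat)


# Toy example (made-up numbers)
y_true = [0, 0, 0, 1, 1]
prob = [0.1, 0.6, 0.2, 0.7, 0.4]
cfmat = confusion_2x2(y_true, prob)
print(cfmat)
print(acc(cfmat))
```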

Model performance (2)
fpr, tpr, thresholds = metrics.roc_curve(test_y, pred_y_full, pos_label=1)
# Plot the ROC curve
plt.plot(fpr,tpr)
# AUC: area under the ROC curve (trapezoidal rule; same value as np.trapz(tpr, fpr))
auc = metrics.auc(fpr, tpr)
print('AUC:', auc)
AUC: 0.9465467667547905

[Figure: ROC curve of the full model]


fpr, tpr, thresholds = metrics.roc_curve(test_y, pred_y_forward, pos_label=1)
# Plot the ROC curve
plt.plot(fpr,tpr)
# AUC: area under the ROC curve (trapezoidal rule; same value as np.trapz(tpr, fpr))
auc = metrics.auc(fpr, tpr)
print('AUC:', auc)
AUC: 0.9465467667547905

[Figure: ROC curve of the forward-selection model]


fpr, tpr, thresholds = metrics.roc_curve(test_y, pred_y_backward, pos_label=1)
# Plot the ROC curve
plt.plot(fpr,tpr)
# AUC: area under the ROC curve (trapezoidal rule; same value as np.trapz(tpr, fpr))
auc = metrics.auc(fpr, tpr)
print('AUC:', auc)
AUC: 0.9465467667547905

[Figure: ROC curve of the backward-selection model]


fpr, tpr, thresholds = metrics.roc_curve(test_y, pred_y_stepwise, pos_label=1)
# Plot the ROC curve
plt.plot(fpr,tpr)
# AUC: area under the ROC curve (trapezoidal rule; same value as np.trapz(tpr, fpr))
auc = metrics.auc(fpr, tpr)
print('AUC:', auc)
AUC: 0.9465467667547905

[Figure: ROC curve of the stepwise model]
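`metrics.roc_curve` returns the false-positive rate and true-positive rate at each distinct score threshold. The AUC is the area under that curve, computed with the trapezoidal rule. The sketch below rebuilds both by hand with NumPy. The helper names `roc_points`, `trapezoid_auc` and `rank_auc` are hypothetical, and the four-sample input is a toy example. It then cross-checks the area against the rank-based reading of AUC: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
import numpy as np


def roc_points(y_true, scores):
    """FPR/TPR at every distinct threshold, from the highest score down, starting at (0, 0)."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")
    y_sorted, s_sorted = y_true[order], scores[order]
    # Last index of each group of tied scores
    last = np.r_[np.flatnonzero(np.diff(s_sorted)), y_sorted.size - 1]
    tps = np.cumsum(y_sorted)[last]   # true positives above each threshold
    fps = (last + 1) - tps            # false positives above each threshold
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the ROC curve with the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def rank_auc(y_true, scores):
    """P(score of a positive > score of a negative), ties counted as 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y_true == 1][:, None] - scores[y_true == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
print(fpr, tpr)
print(trapezoid_auc(fpr, tpr), rank_auc(y_true, scores))
```

Assuming `pred_y_full` holds predicted probabilities, `trapezoid_auc(*roc_points(test_y, pred_y_full))` should match the AUC printed above up to floating-point rounding. `roc_curve` only drops collinear intermediate points, which does not change the area.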


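# In terms of performance, the four models show no meaningful difference.
# Compare how many terms (including 'const') each selected model keeps.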
print(len(Forward_best_model.model.exog_names))
print(len(Backward_best_model.model.exog_names))
print(len(Stepwise_best_model.model.exog_names))
10
10
10





Regression with sklearn

from sklearn import datasets
from sklearn import model_selection
from sklearn import linear_model
import matplotlib.pyplot as plt
import numpy as np

# 50 samples with 50 features, of which only 10 are informative
X_all, y_all = datasets.make_regression(n_samples=50, n_features=50, n_informative=10)
# Half for training: 25 samples for 50 features, so least squares can fit the training data exactly
X_train, X_test, y_train, y_test = model_selection.train_test_split(X_all, y_all, train_size=0.5)
model = linear_model.LinearRegression()
model.fit(X_train, y_train)

def sse(resid):
    # Sum of squared errors
    return np.sum(resid**2)

resid_train = y_train - model.predict(X_train)
sse_train = sse(resid_train)
print(sse_train)

resid_test = y_test - model.predict(X_test)
sse_test = sse(resid_test)
print(sse_test)

# R-squared score (coefficient of determination) on training and test data
print(model.score(X_train, y_train))
print(model.score(X_test, y_test))

def plot_residuals_and_coeff(resid_train, resid_test, coeff):
    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    # Residual of each training sample
    axes[0].bar(np.arange(len(resid_train)), resid_train)
    axes[0].set_xlabel("sample number")
    axes[0].set_ylabel("residual")
    axes[0].set_title("training data")
    # Residual of each test sample
    axes[1].bar(np.arange(len(resid_test)), resid_test)
    axes[1].set_xlabel("sample number")
    axes[1].set_ylabel("residual")
    axes[1].set_title("testing data")
    # Fitted coefficient of each feature
    axes[2].bar(np.arange(len(coeff)), coeff)
    axes[2].set_xlabel("coefficient number")
    axes[2].set_ylabel("coefficient")
    fig.tight_layout()
    return fig, axes

fig, ax = plot_residuals_and_coeff(resid_train, resid_test, model.coef_)

[Figure: residuals for the training data and testing data, and the fitted coefficients]
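`model.score` returns the coefficient of determination, R² = 1 - SSE/SST, where SST is the total sum of squares of y around its mean. The training split has only 25 samples for 50 features plus an intercept, so least squares can interpolate the training data. The training SSE is then essentially zero and the training R² essentially 1, while the test R² can be much lower. The NumPy-only sketch below reproduces this effect with `np.linalg.lstsq`. Its random data are only loosely modeled on the `make_regression` call, and the helpers `design` and `r2` are hypothetical. It is not the post's code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setup: 50 samples, 50 features, only the first 10 carry signal
n_samples, n_features, n_informative = 50, 50, 10
X = rng.normal(size=(n_samples, n_features))
coef = np.zeros(n_features)
coef[:n_informative] = rng.uniform(10, 100, size=n_informative)
y = X @ coef + rng.normal(scale=1.0, size=n_samples)

# 50/50 split: 25 training rows for 51 parameters (50 slopes + intercept)
X_train, X_test = X[:25], X[25:]
y_train, y_test = y[:25], y[25:]


def design(X):
    """Prepend a column of ones for the intercept."""
    return np.column_stack([np.ones(len(X)), X])


# Minimum-norm least-squares solution (the training system is underdetermined)
beta, *_ = np.linalg.lstsq(design(X_train), y_train, rcond=None)


def r2(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst


r2_train = r2(y_train, design(X_train) @ beta)
r2_test = r2(y_test, design(X_test) @ beta)
print(r2_train, r2_test)
```

A near-perfect training score paired with a weak test score is the usual sign of overfitting when features outnumber training samples.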





Regression with TensorFlow





Regression with PyTorch


List of posts followed by this article


Reference