Import Data: Bank
## age job marital education default housing loan contact month
## 1 56 housemaid married basic.4y no no no telephone may
## 2 57 services married high.school unknown no no telephone may
## 3 37 services married high.school no yes no telephone may
## 4 40 admin. married basic.6y no no no telephone may
## 5 56 services married high.school no no yes telephone may
## 6 45 services married basic.9y unknown no no telephone may
## day_of_week duration campaign pdays previous poutcome emp.var.rate
## 1 mon 261 1 999 0 nonexistent 1.1
## 2 mon 149 1 999 0 nonexistent 1.1
## 3 mon 226 1 999 0 nonexistent 1.1
## 4 mon 151 1 999 0 nonexistent 1.1
## 5 mon 307 1 999 0 nonexistent 1.1
## 6 mon 198 1 999 0 nonexistent 1.1
## cons.price.idx cons.conf.idx euribor3m nr.employed y
## 1 93.994 -36.4 4.857 5191 no
## 2 93.994 -36.4 4.857 5191 no
## 3 93.994 -36.4 4.857 5191 no
## 4 93.994 -36.4 4.857 5191 no
## 5 93.994 -36.4 4.857 5191 no
## 6 93.994 -36.4 4.857 5191 no
Attribute Information:
The data is related with direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required, in order to access if the product (bank term deposit) would be (‘yes’) or not (‘no’) subscribed.
##Input variables:
bank client data:
1 - age (numeric)
2 - job : type of job (categorical:‘admin.’,‘blue-collar’,‘entrepreneur’,‘housemaid’,‘management’,‘retired’,‘self-employed’,‘services’,‘student’,‘technician’,‘unemployed’,‘unknown’)
3 - marital : marital status (categorical: ‘divorced’,‘married’,‘single’,‘unknown’; note: ‘divorced’ means divorced or widowed)
4 - education (categorical:‘basic.4y’,‘basic.6y’,‘basic.9y’,‘high.school’,‘illiterate’,‘professional.course’,‘university.degree’,‘unknown’)
5 - default: has credit in default? (categorical: ‘no’,‘yes’,‘unknown’)
6 - housing: has housing loan? (categorical: ‘no’,‘yes’,‘unknown’)
7 - loan: has personal loan? (categorical: ‘no’,‘yes’,‘unknown’)
other attributes:
12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
14 - previous: number of contacts performed before this campaign and for this client (numeric)
15 - poutcome: outcome of the previous marketing campaign (categorical: ‘failure’,‘nonexistent’,‘success’)
Logistic regression model
In the present logistic regression model, we model y with the following explanatory variables: age, euribor3m, and job. We want to see that whether these variables are significant in the logistic regression model for client subscription for a term deposit.
##
## Call:
## glm(formula = y ~ age + euribor3m + job, family = binomial, data = bank)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1164 -0.4270 -0.3166 -0.2510 2.7058
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.567098 0.077203 -7.346 2.05e-13 ***
## age 0.004517 0.001777 2.542 0.01103 *
## euribor3m -0.513701 0.009676 -53.092 < 2e-16 ***
## jobblue-collar -0.612790 0.052270 -11.724 < 2e-16 ***
## jobentrepreneur -0.349384 0.102342 -3.414 0.00064 ***
## jobhousemaid -0.094884 0.113859 -0.833 0.40465
## jobmanagement -0.139853 0.069032 -2.026 0.04277 *
## jobretired 0.357183 0.083333 4.286 1.82e-05 ***
## jobself-employed -0.172366 0.095540 -1.804 0.07121 .
## jobservices -0.459570 0.067427 -6.816 9.37e-12 ***
## jobstudent 0.501106 0.085254 5.878 4.16e-09 ***
## jobtechnician -0.062267 0.051398 -1.211 0.22572
## jobunemployed 0.062872 0.100059 0.628 0.52977
## jobunknown -0.002085 0.188139 -0.011 0.99116
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 28999 on 41187 degrees of freedom
## Residual deviance: 24983 on 41174 degrees of freedom
## AIC: 25011
##
## Number of Fisher Scoring iterations: 5
As can be seen, euribor3m and some types of jobs are very significant. Blue-collar, entrepreneur, housemaid, retired, services, and students are the most significant jobs.Only the self-employed one is not significant that means that they would not subscribe for bank term deposit.
%# Confusion matrix
%{r message = FALSE, echo = FALSE, warning = FALSE} %actual_response <- bank$y %predicted_response <- round(fitted(model)) %outcomes <- table(predicted_response, actual_response) %confusion <- conf_mat(outcomes) %autoplot(confusion) %summary(confusion, event_level = "second") %
#Prediction
In this section, we provide the prediction model. According to the results, 4640 out of 41188 would subscribe for the bank term deposit. Also, the mean is almost about 0.89 which is perfect. Lets make training and test subsets to evaluate the model.
## y
## prediction no yes
## no 36548 4640
## [1] 0.8873458
Make training, test set and refit the model
We splits the data to duration < 300 for train, and duration > 300 for test subset. Running the training model, we can see that the effect of age variable is more significant that the previous one. Only the job retired is not significant in this training data. in the following, you can find the prediction model.
##
## Call:
## glm(formula = y ~ age + euribor3m + job, family = binomial, data = bank,
## subset = train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0924 -0.3086 -0.0603 -0.0400 3.9232
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.33408 0.13619 -2.453 0.01417 *
## age 0.00795 0.00302 2.633 0.00847 **
## euribor3m -1.27073 0.04344 -29.250 < 2e-16 ***
## jobblue-collar -1.26040 0.11278 -11.176 < 2e-16 ***
## jobentrepreneur -0.88925 0.22441 -3.963 7.41e-05 ***
## jobhousemaid -0.07000 0.20162 -0.347 0.72844
## jobmanagement -0.17706 0.11774 -1.504 0.13261
## jobretired 0.25417 0.13737 1.850 0.06428 .
## jobself-employed -0.20221 0.16456 -1.229 0.21915
## jobservices -0.93433 0.13886 -6.728 1.71e-11 ***
## jobstudent 0.56269 0.12175 4.622 3.80e-06 ***
## jobtechnician -0.09595 0.08891 -1.079 0.28051
## jobunemployed 0.17392 0.15801 1.101 0.27102
## jobunknown 0.15243 0.29950 0.509 0.61079
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 11957.5 on 29937 degrees of freedom
## Residual deviance: 8061.3 on 29924 degrees of freedom
## AIC: 8089.3
##
## Number of Fisher Scoring iterations: 8
Predict
Results of prediction model show that 3131 of clients would subscribe for the bank term deposit which is lower than the previous one. Although the mean of the model is less than the previous model, 0.72 < 0.89, it still satisfying and large enough.
## y.300
## prediction no yes
## no 8119 3131
## [1] 0.7216889
social and economic context attributes
16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
17 - cons.price.idx: consumer price index - monthly indicator (numeric)
18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)
19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
20 - nr.employed: number of employees - quarterly indicator (numeric)
Output variable (desired target): 21 - y - has the client subscribed a term deposit? (binary: ‘yes’,‘no’)