Assignment 02_Business Analytics

Import Data: Bank

##   age       job marital   education default housing loan   contact month
## 1  56 housemaid married    basic.4y      no      no   no telephone   may
## 2  57  services married high.school unknown      no   no telephone   may
## 3  37  services married high.school      no     yes   no telephone   may
## 4  40    admin. married    basic.6y      no      no   no telephone   may
## 5  56  services married high.school      no      no  yes telephone   may
## 6  45  services married    basic.9y unknown      no   no telephone   may
##   day_of_week duration campaign pdays previous    poutcome emp.var.rate
## 1         mon      261        1   999        0 nonexistent          1.1
## 2         mon      149        1   999        0 nonexistent          1.1
## 3         mon      226        1   999        0 nonexistent          1.1
## 4         mon      151        1   999        0 nonexistent          1.1
## 5         mon      307        1   999        0 nonexistent          1.1
## 6         mon      198        1   999        0 nonexistent          1.1
##   cons.price.idx cons.conf.idx euribor3m nr.employed  y
## 1         93.994         -36.4     4.857        5191 no
## 2         93.994         -36.4     4.857        5191 no
## 3         93.994         -36.4     4.857        5191 no
## 4         93.994         -36.4     4.857        5191 no
## 5         93.994         -36.4     4.857        5191 no
## 6         93.994         -36.4     4.857        5191 no
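
The import call itself is not shown in the output above. A minimal sketch of how such a table might be read, assuming a semicolon-separated file with a header row. The temporary file, its three rows, and the ";" delimiter are assumptions for illustration, not taken from the source:

```r
# Stand-in for the real data file: three rows and a few of the columns shown above.
# The path and the ";" delimiter are assumptions for illustration.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "age;job;marital;duration;euribor3m;y",
  "56;housemaid;married;261;4.857;no",
  "57;services;married;149;4.857;no",
  "37;services;married;226;4.857;no"
), path)

# stringsAsFactors = TRUE so that categorical columns such as job become factors
bank <- read.csv(path, sep = ";", stringsAsFactors = TRUE)

head(bank)  # first rows of the table
str(bank)   # check that numeric and categorical columns were parsed as expected
```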

Attribute Information:

The data relate to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (bank term deposit) would be subscribed (‘yes’) or not (‘no’).

Input variables:

bank client data:

1 - age (numeric)

2 - job : type of job (categorical:‘admin.’,‘blue-collar’,‘entrepreneur’,‘housemaid’,‘management’,‘retired’,‘self-employed’,‘services’,‘student’,‘technician’,‘unemployed’,‘unknown’)

3 - marital : marital status (categorical: ‘divorced’,‘married’,‘single’,‘unknown’; note: ‘divorced’ means divorced or widowed)

4 - education (categorical:‘basic.4y’,‘basic.6y’,‘basic.9y’,‘high.school’,‘illiterate’,‘professional.course’,‘university.degree’,‘unknown’)

5 - default: has credit in default? (categorical: ‘no’,‘yes’,‘unknown’)

6 - housing: has housing loan? (categorical: ‘no’,‘yes’,‘unknown’)

7 - loan: has personal loan? (categorical: ‘no’,‘yes’,‘unknown’)

other attributes:

12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)

13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)

14 - previous: number of contacts performed before this campaign and for this client (numeric)

15 - poutcome: outcome of the previous marketing campaign (categorical: ‘failure’,‘nonexistent’,‘success’)

social and economic context attributes

16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)

17 - cons.price.idx: consumer price index - monthly indicator (numeric)

18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)

19 - euribor3m: euribor 3 month rate - daily indicator (numeric)

20 - nr.employed: number of employees - quarterly indicator (numeric)

Output variable (desired target): 21 - y - has the client subscribed to a term deposit? (binary: ‘yes’,‘no’)

Logistic regression model

In this logistic regression model, we model y with the following explanatory variables: age, euribor3m, and job. We want to see whether these variables are significant predictors of whether a client subscribes to a term deposit.

## 
## Call:
## glm(formula = y ~ age + euribor3m + job, family = binomial, data = bank)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1164  -0.4270  -0.3166  -0.2510   2.7058  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -0.567098   0.077203  -7.346 2.05e-13 ***
## age               0.004517   0.001777   2.542  0.01103 *  
## euribor3m        -0.513701   0.009676 -53.092  < 2e-16 ***
## jobblue-collar   -0.612790   0.052270 -11.724  < 2e-16 ***
## jobentrepreneur  -0.349384   0.102342  -3.414  0.00064 ***
## jobhousemaid     -0.094884   0.113859  -0.833  0.40465    
## jobmanagement    -0.139853   0.069032  -2.026  0.04277 *  
## jobretired        0.357183   0.083333   4.286 1.82e-05 ***
## jobself-employed -0.172366   0.095540  -1.804  0.07121 .  
## jobservices      -0.459570   0.067427  -6.816 9.37e-12 ***
## jobstudent        0.501106   0.085254   5.878 4.16e-09 ***
## jobtechnician    -0.062267   0.051398  -1.211  0.22572    
## jobunemployed     0.062872   0.100059   0.628  0.52977    
## jobunknown       -0.002085   0.188139  -0.011  0.99116    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 28999  on 41187  degrees of freedom
## Residual deviance: 24983  on 41174  degrees of freedom
## AIC: 25011
## 
## Number of Fisher Scoring iterations: 5

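As can be seen, euribor3m is highly significant, and its negative coefficient means that a higher rate goes with lower odds of subscribing. Age is significant at the 5% level. The job coefficients are contrasts against the reference category, admin. Blue-collar, entrepreneur, retired, services, and student differ from it at the 0.1% level and management at the 5% level; students and retired clients have higher odds of subscribing, while the other four have lower odds. Housemaid, self-employed, technician, unemployed, and unknown are not significant at the 5% level (self-employed only at the 10% level). A non-significant coefficient means that the model detects no difference from admin. clients, not that these clients would not subscribe to a bank term deposit.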
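
To make the coefficients easier to read, they can be moved from the log-odds scale to the odds scale with `exp()`. The sketch below copies a handful of estimates from the summary output rather than refitting the model; the object names are illustrative. It also recomputes the Wald z statistic and p-value for age from its estimate and standard error, as a check on how the `z value` and `Pr(>|z|)` columns are formed:

```r
# Estimates copied from the summary output (log-odds scale).
coefs <- c(
  "age"            =  0.004517,
  "euribor3m"      = -0.513701,
  "jobblue-collar" = -0.612790,
  "jobretired"     =  0.357183,
  "jobstudent"     =  0.501106
)

# exp() turns a log-odds coefficient into an odds ratio:
# below 1 lowers the odds of y = "yes", above 1 raises them.
# Job odds ratios are relative to the reference category admin.
odds_ratios <- exp(coefs)
round(odds_ratios, 3)

# Wald test for age: z = estimate / std. error, two-sided normal p-value.
z_age <- 0.004517 / 0.001777
p_age <- 2 * pnorm(-abs(z_age))
round(c(z = z_age, p = p_age), 4)
```

With a fitted model object, `exp(coef(model))` gives the same odds ratios for all terms at once.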

Prediction

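In this section, we use the fitted model for prediction. The table has only a ‘no’ row: the model predicts ‘no’ for all 41188 clients, although 4640 of them did subscribe to the bank term deposit. The accuracy (the mean of correct predictions) is about 0.89, but this equals the share of ‘no’ clients in the data (36548 of 41188), so the model does no better than always predicting ‘no’. Let's make training and test subsets to evaluate the model.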

##           y
## prediction    no   yes
##         no 36548  4640
## [1] 0.8873458
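
As a quick check on the 0.887 figure, the accuracy can be recomputed from the counts in the table and compared with the share of ‘no’ clients. This is a small sketch using only the numbers printed above; the object names are illustrative:

```r
# Counts copied from the table: the only prediction row is "no".
conf <- matrix(c(36548, 4640), nrow = 1,
               dimnames = list(prediction = "no", y = c("no", "yes")))

accuracy  <- conf["no", "no"] / sum(conf)    # share of correct predictions
base_rate <- sum(conf[, "no"]) / sum(conf)   # share of clients who did not subscribe

# No client is predicted "yes", so none of the 4640 subscribers is identified.
caught_yes <- 0 / sum(conf[, "yes"])

round(c(accuracy = accuracy, base_rate = base_rate, caught_yes = caught_yes), 4)
```

Accuracy and base rate coincide because every prediction is ‘no’; a measure such as the share of subscribers identified separates the two.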

Make training, test set and refit the model

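We split the data by call duration: rows with duration < 300 form the training subset (29938 rows) and the remaining rows form the test subset (11250 rows). In the refitted model, age is more significant than before (p = 0.008 versus 0.011). Among the jobs, blue-collar, entrepreneur, services, and student remain significant. Management and retired are no longer significant at the 5% level (retired only at the 10% level), and housemaid, self-employed, technician, unemployed, and unknown remain non-significant. The predictions follow below.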

## 
## Call:
## glm(formula = y ~ age + euribor3m + job, family = binomial, data = bank, 
##     subset = train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0924  -0.3086  -0.0603  -0.0400   3.9232  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -0.33408    0.13619  -2.453  0.01417 *  
## age               0.00795    0.00302   2.633  0.00847 ** 
## euribor3m        -1.27073    0.04344 -29.250  < 2e-16 ***
## jobblue-collar   -1.26040    0.11278 -11.176  < 2e-16 ***
## jobentrepreneur  -0.88925    0.22441  -3.963 7.41e-05 ***
## jobhousemaid     -0.07000    0.20162  -0.347  0.72844    
## jobmanagement    -0.17706    0.11774  -1.504  0.13261    
## jobretired        0.25417    0.13737   1.850  0.06428 .  
## jobself-employed -0.20221    0.16456  -1.229  0.21915    
## jobservices      -0.93433    0.13886  -6.728 1.71e-11 ***
## jobstudent        0.56269    0.12175   4.622 3.80e-06 ***
## jobtechnician    -0.09595    0.08891  -1.079  0.28051    
## jobunemployed     0.17392    0.15801   1.101  0.27102    
## jobunknown        0.15243    0.29950   0.509  0.61079    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 11957.5  on 29937  degrees of freedom
## Residual deviance:  8061.3  on 29924  degrees of freedom
## AIC: 8089.3
## 
## Number of Fisher Scoring iterations: 8
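
The split-and-refit step can be sketched as follows. The source does not show its code, so this is an illustration under assumptions: the data frame `toy` is simulated (its values are made up), job is left out for brevity, and a 0.5 probability cutoff is assumed for turning probabilities into ‘yes’/‘no’ predictions:

```r
set.seed(1)

# Simulated stand-in data; column names follow the source, values are made up.
n <- 2000
toy <- data.frame(
  age       = sample(18:90, n, replace = TRUE),
  euribor3m = runif(n, 0.6, 5),
  duration  = rexp(n, rate = 1 / 250)
)
toy$y <- factor(ifelse(runif(n) < plogis(1 - 0.8 * toy$euribor3m), "yes", "no"),
                levels = c("no", "yes"))

# Logical index: TRUE for rows used in fitting, FALSE for held-out rows.
train <- toy$duration < 300

fit <- glm(y ~ age + euribor3m, family = binomial, data = toy, subset = train)

# Predicted probabilities for the held-out rows, cut at 0.5 (assumed cutoff).
prob       <- predict(fit, newdata = toy[!train, ], type = "response")
prediction <- factor(ifelse(prob > 0.5, "yes", "no"), levels = c("no", "yes"))

table(prediction, y = toy$y[!train])
mean(prediction == toy$y[!train])   # accuracy on the held-out rows
```

Fixing the factor levels to `c("no", "yes")` keeps both rows in the table even when one class is never predicted.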

Predict

The model again predicts ‘no’ for all 11250 clients in the test subset, while 3131 of them did subscribe to the bank term deposit. The accuracy is lower than before, 0.72 < 0.89, and it again equals the share of ‘no’ clients (8119 of 11250), so it reflects the class balance of the test subset rather than predictive skill.

##           y.300
## prediction   no  yes
##         no 8119 3131
## [1] 0.7216889
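
The drop from 0.89 to 0.72 can be traced to the class balance of the subsets. The arithmetic below uses only counts printed in the two tables and assumes that the training subset is everything outside the test subset, which agrees with the 29938 rows implied by the degrees of freedom of the refitted model:

```r
yes_all  <- 4640; n_all  <- 41188          # full data, from the first table
yes_test <- 3131; n_test <- 8119 + 3131    # test subset, from the second table

# Assumption: the training subset is whatever remains.
yes_train <- yes_all - yes_test
n_train   <- n_all - n_test

# Share of clients who subscribed in each subset.
round(c(all   = yes_all / n_all,
        train = yes_train / n_train,
        test  = yes_test / n_test), 3)
```

Splitting on duration therefore puts most subscribers in the test subset, so the two subsets are not comparable samples of the same population.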