- What are the Business Objective(s) and Data Mining Objective(s) for the case?
Business Objective
To develop a proactive retention program (including incentive plans) to reduce customer churn
Data Mining Objectives
- To predict churn accurately
- To identify key factors that drive customer churn
Run 3 models in SPSS: Logistic Regression, Discriminant Analysis and Cluster Analysis
Initial Data Preparation
- Partitioning the data
The data needs to be partitioned into a training set and a validation set. To enable this, the data set includes a binary “Calibrate” variable, which takes the same value for 41,000 data points and the opposite value for the remaining data points. We therefore set Calibrate to “segment” in the data set so that it drives the partitioning.
The Partitioning result
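The split described above can be sketched in plain Python as follows. This is a minimal illustration, not the tool's own partitioning step: it assumes each record is a dictionary with a "Calibrate" key and that the value 1 marks training rows (the actual coding should be checked against the case's data dictionary). The function name `partition_by_flag` is hypothetical.

```python
# Hypothetical sketch: split records on a binary "Calibrate" flag.
# Assumption: Calibrate == 1 marks training (calibration) rows and any other
# value marks validation rows; verify the real coding in the data dictionary.

def partition_by_flag(records, flag="Calibrate", train_value=1):
    """Return (training, validation) lists based on a binary flag field."""
    training, validation = [], []
    for record in records:
        if record[flag] == train_value:
            training.append(record)
        else:
            validation.append(record)
    return training, validation


# Tiny usage example with made-up records.
sample = [
    {"id": 1, "Calibrate": 1},
    {"id": 2, "Calibrate": 0},
    {"id": 3, "Calibrate": 1},
]
train, valid = partition_by_flag(sample)
```

Because the flag is already in the data, the split is deterministic and reproducible, unlike a random split.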
Running Different Predictive Models
After partitioning the data, we can apply different predictive models, such as decision trees, logistic regression and neural networks, to predict churn. Logistic regression and neural networks may require variable transformations to model the data well.
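As an illustration of the kind of transformation meant here, the sketch below applies log(1 + x) to a non-negative, right-skewed variable (usage variables such as MOU are a typical example) and then standardizes it. Heavily skewed inputs can distort a logistic regression or slow neural-network training, and a log transform is one common remedy. The function names are hypothetical, and the specific transformations actually tried in the case are not recorded in this section.

```python
import math


def log_transform(values):
    """Apply log(1 + x) to compress a right-skewed, non-negative variable."""
    return [math.log1p(v) for v in values]


def standardize(values):
    """Rescale to mean 0 and population standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        # A constant column carries no information; return zeros.
        return [0.0] * n
    return [(v - mean) / sd for v in values]


# Hypothetical minutes-of-use values, for illustration only.
mou = [0, 10, 100, 1000]
mou_model_input = standardize(log_transform(mou))
```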
3. Interpret and explain the output of the various models.
What should be the assessment criteria for the models? What transformations of variables did you try, and what was their impact on the results/predictive performance of the model(s)?
We used cumulative lift as the assessment criterion to identify the best model. The transformations tried and the interpretation of each model's output are given in the section above (below each model), i.e. in the answer to question 3.
5. Demonstrate the predictive performance of the model and describe your predictive churn model. What was your final model/Data Mining technique and why?
We used the Model Comparison module to select the best of the 3 models, with lift as the selection statistic, and chose the best model based on its results. When the selection criterion is set to lift on the Validation data, Logistic Regression with Transformation is selected as the best model.
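Cumulative lift at a given depth compares the churn rate among the customers a model ranks as most likely to churn with the overall churn rate. The sketch below computes it from predicted scores and actual outcomes (1 = churned). It is a generic illustration of the metric, not the Model Comparison module's implementation; the function name and the sample data are hypothetical.

```python
def cumulative_lift(scores, actuals, depth):
    """Cumulative lift at `depth`, the fraction of customers targeted (e.g. 0.1 = top decile).

    Customers are ranked by predicted churn score, highest first. Lift is the
    churn rate in the top `depth` share divided by the overall churn rate.
    Raises ZeroDivisionError if there are no churners at all.
    """
    if not 0 < depth <= 1:
        raise ValueError("depth must be in (0, 1]")
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(depth * len(ranked)))
    top_rate = sum(actual for _, actual in ranked[:k]) / k
    base_rate = sum(actuals) / len(actuals)
    return top_rate / base_rate


# Hypothetical scores and outcomes for 10 customers (3 churners).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
top_20_lift = cumulative_lift(scores, actuals, 0.2)
```

In this toy example both customers in the top 20% churned against a base rate of 30%, so the lift is about 3.33. Computing the same statistic on the validation partition, rather than the training partition, guards against picking a model that merely overfits.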
From the Logistic Regression models, identify the significant variables/key factors that predict customer churn. Calculate the LVC and Max Incentive cost for customers with monthly revenues of $30, $50, $70, $90, $110, $130 and $150. (Hint: profit = 0)
The significant variables identified using the decision tree model and the logistic regression model are as follows:
EQPDAYS: Number of days the customer has had the current equipment (mobile phone)
When customers are about to change their equipment, they are more likely to look at other service operators for attractive offers. Therefore, EQPDAYS is an important variable for predicting churn.
MOU: Minutes of Usage
Customers who use the service infrequently are more likely to churn than those who use it extensively.
MONTHS: Months in Service
Logically, customers who have used the service for longer should be more loyal than those who have used it for a shorter time.
CHANGER: Percentage change in revenues
The data shows that when the percentage change in revenues is greater than 4.15%, the churn probability increases.
RETCALL: Whether a call has been made to the retention team
CHANGEM: Percentage change in minutes of use
Churn is negatively related to CHANGEM. Therefore, if the percentage change in minutes of use is negative, the churn probability increases.
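The reasoning "negatively related, therefore a drop in usage raises churn" follows from the sign of the coefficient in a logistic model. The sketch below makes that concrete with a one-variable logistic function. The intercept and coefficient are hypothetical values chosen only to show the direction of the effect; they are not the fitted coefficients from the case's model.

```python
import math

# Hypothetical coefficients for illustration only; NOT the fitted values
# from the case's logistic regression.
INTERCEPT = -0.5
BETA_CHANGEM = -0.01  # negative sign: churn probability falls as CHANGEM rises


def churn_probability(changem, intercept=INTERCEPT, beta=BETA_CHANGEM):
    """Logistic model: P(churn) = 1 / (1 + exp(-(b0 + b1 * CHANGEM)))."""
    z = intercept + beta * changem
    return 1.0 / (1.0 + math.exp(-z))


# A 20% drop in minutes of use versus a 20% increase.
p_drop = churn_probability(-20)
p_rise = churn_probability(20)
```

With a negative coefficient, the odds of churn are multiplied by exp(beta), which is below 1, for each one-point increase in CHANGEM, so a negative CHANGEM yields a higher churn probability than a positive one.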
Monthly Revenue ($)| 30| 50| 70| 90| 110| 130| 150|
LVC ($)| 38.82353| 64.70588| 90.58824| 116.4706| 142.3529| 168.2353| 194.1176|
Max Incentive cost ($)| 10.05529| 16.75882| 23.46235| 30.16588| 36.86941| 43.57294| 50.27647|
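The values in the table above are internally consistent with two fixed ratios: LVC equals 22/17 (about 1.2941) times monthly revenue, and Max Incentive cost equals 25.9% of LVC. The sketch below reproduces the table from those ratios as an arithmetic check. Note that both multipliers are back-calculated from the table itself; the underlying LVC formula and its inputs (margin, retention, discount rate) are not shown in this section, so the constants are assumptions rather than the case's stated parameters.

```python
from fractions import Fraction

# Assumption: multipliers back-calculated from the table, not taken from the
# case's stated LVC formula (which is not shown in this section).
LVC_MULTIPLIER = Fraction(22, 17)      # LVC / monthly revenue, about 1.2941
INCENTIVE_SHARE = Fraction(259, 1000)  # Max incentive cost / LVC


def lvc(monthly_revenue):
    """Lifetime value of a customer implied by the table's ratio."""
    return float(LVC_MULTIPLIER * monthly_revenue)


def max_incentive(monthly_revenue):
    """Break-even (profit = 0) incentive implied by the table's ratio."""
    return float(LVC_MULTIPLIER * INCENTIVE_SHARE * monthly_revenue)


for revenue in (30, 50, 70, 90, 110, 130, 150):
    print(f"{revenue:>4} {lvc(revenue):10.5f} {max_incentive(revenue):10.5f}")
```

Using `Fraction` keeps the ratios exact until the final conversion, so the printed values match the table to its stated precision.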
4. What are the key factors that predict customer churn? Do these factors make business sense? Why or why not? Which of the significant variables/factors are ACTIONABLE (hint: from the point of view of conversion into incentives) and why? What would you call the variables which were not actionable? How would you use them in a churn management/CRM program?
The important factors that affect churn, as identified by the models, are listed in the previous question.
EQPDAYS: