Oleh Sekirkin

Business Professional & Data Enthusiast

Philadelphia, Pennsylvania, United States

Deep Marketing Data Analysis

Python, Marketing Data, Data Analysis

1. The Basics
2. Client Behavior Analysis
3. Stratified Sampling
4. ROC Curve Analysis
5. Conclusion and Recommendations

1. The Basics

Let's begin by explaining what marketing is. To quote the marketing genius Dr. Philip Kotler, marketing is "the science and art of exploring, creating, and delivering value to satisfy the needs of a target market at a profit. Marketing identifies unfulfilled needs and desires. It defines, measures, and quantifies the size of the identified market and the profit potential."

Marketing campaigns, on the other hand, consist of a set of actions and efforts aimed at reaching customers and achieving specific objectives. They are useful for achieving goals such as launching a product, increasing sales volume, enhancing brand exposure, and more.

For this project, we will use a dataset from the UCI Machine Learning Repository.

Before diving in, let's briefly define what a "term deposit" is. A term deposit is "a fixed-term investment that includes the deposit of money into an account at a financial institution. Term deposit investments usually carry short-term maturities ranging from one month to a few years and will have varying levels of required minimum deposits." (defined by Investopedia.com)

[Figure: Basic Statistics Overview]
[Figure: Distribution Analysis]

Continuing with the analysis by occupation:

Number of Occupations: Management is the most prevalent occupation in this dataset.

Age by Occupation: As expected, retirees have the highest median age, while students have the lowest.

Balance by Occupation: Management and retirees have the highest account balances.

[Figure: Occupation Analysis. In what industries do our clients work, depending on their age?]
[Figure: Balance Distribution. How high is their balance, depending on occupation?]
[Figure: Age Distribution. How do loans impact people, depending on their marital status and education?]
[Figure: Occupation Distribution. Some people have a deposit with us, others don't. What are the jobs of our clients and non-clients?]
[Figure: Campaign Analysis]

2. Client Behavior Analysis

An analysis of the campaign duration data reveals significant insights into client behavior and term deposit acquisition patterns. The data demonstrates a strong positive correlation between the duration of client interactions and the likelihood of term deposit openings.

The average duration across all interactions is 374.76 seconds. This benchmark provides a useful threshold for segmenting client interactions and analyzing their outcomes. Splitting the data at this threshold reveals a clear pattern: clients whose interactions exceeded the average duration were markedly more likely to open term deposits.

Specifically, within the cohort of clients whose interaction durations surpassed the mean, 78% proceeded to open term deposit accounts. In contrast, among those with below-average interaction durations, only 32% opted for term deposits.
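
The comparison above can be sketched in a few lines of Python. This is a minimal illustration, not the original analysis code: the records below are made-up values, and the function name is hypothetical. It splits the records at the mean duration and reports the share of subscribers on each side.

```python
from statistics import mean

# Hypothetical records: (call duration in seconds, opened a term deposit?).
# These values are illustrative only and are not taken from the dataset.
calls = [
    (120, False), (200, False), (250, True), (300, False),
    (420, True), (510, True), (640, False), (900, True),
]


def subscription_rate_by_mean_duration(records):
    """Split records at the mean duration and return the mean together
    with the subscription rate at/below it and above it."""
    avg = mean(duration for duration, _ in records)
    above = [subscribed for duration, subscribed in records if duration > avg]
    below = [subscribed for duration, subscribed in records if duration <= avg]

    def rate(group):
        # Share of True values; NaN if the group is empty.
        return sum(group) / len(group) if group else float("nan")

    return avg, rate(below), rate(above)


avg_duration, rate_below, rate_above = subscription_rate_by_mean_duration(calls)
print(f"mean duration: {avg_duration:.2f}s")
print(f"at or below mean: {rate_below:.0%}, above mean: {rate_above:.0%}")
```

On the real dataset, the same split would be applied to the duration column and the term deposit outcome column.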

3. Stratified Sampling

Stratified sampling is a crucial yet often overlooked technique in model development, applicable to both regression and classification tasks. This method ensures that key features influencing the target variable are proportionally represented in both training and test datasets, thereby enhancing model reliability and generalizability.

Before implementing stratified sampling, a thorough analysis of the data distribution is essential. In this case, we've observed that the 'loan' column shows a distinct imbalance:

  • 87% of clients do not have personal loans
  • 13% do have personal loans

The process of implementing stratified sampling involves three key steps:

  1. Analyze the distribution of the influential feature (personal loans) in the entire dataset.
  2. Identify the proportions of each category within this feature (87% without loans, 13% with loans).
  3. Ensure that these proportions are maintained when splitting the data into training and test sets.
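
The three steps above can be sketched with the standard library alone. This is an illustration under stated assumptions: the `clients` rows are synthetic and only mirror the 87% / 13% loan split, the 20% test fraction is arbitrary, and `stratified_split` is a hypothetical helper. In practice, scikit-learn's `train_test_split` accepts a `stratify` argument that serves the same purpose.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_fraction=0.2, seed=42):
    """Split rows into train/test so that each value of `key` keeps
    approximately the same share in both parts."""
    rng = random.Random(seed)

    # Steps 1 and 2: group rows by category to learn the proportions.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # Step 3: take the same fraction from every category.
    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def share(rows, key, value):
    """Fraction of rows whose `key` equals `value`."""
    return sum(1 for row in rows if row[key] == value) / len(rows)


# Synthetic data: 130 of 1000 clients (13%) have a personal loan.
clients = [{"id": i, "loan": "yes" if i < 130 else "no"} for i in range(1000)]
train, test = stratified_split(clients, key="loan", test_fraction=0.2)

print(f"loan share in train: {share(train, 'loan', 'yes'):.2%}")
print(f"loan share in test:  {share(test, 'loan', 'yes'):.2%}")
```

Both parts keep the 13% share of clients with loans, which a plain random split does not guarantee.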

[Figure: Confusion Matrix Analysis]

To guard against overfitting, three-fold cross-validation is used:

  • Training set: two-thirds (about 67%) of the data in each round
  • Testing set: the remaining one-third (about 33%) of the data
  • The process is repeated three times
  • Each subset is used for both training and testing
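
The scheme described above (two-thirds for training, one-third for testing, repeated three times) matches what is usually called 3-fold cross-validation. The sketch below shows only how the indices are partitioned. The helper name `k_fold_indices` and the sample count are assumptions for illustration, and no model is trained here.

```python
import random


def k_fold_indices(n_samples, k=3, seed=0):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]

    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(9, k=3))
for round_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"round {round_number}: train={len(train_idx)} test={len(test_idx)}")
```

Each sample lands in the test set exactly once and in the training set in the other two rounds.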

Gradient Boost Classifier accuracy: 0.85

Precision score: 0.8244135732179458

Recall score: 0.8553875236294896
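
For reference, accuracy, precision, and recall all derive from the four cells of a confusion matrix. The sketch below uses hypothetical counts chosen for easy arithmetic. They are not the counts behind the scores reported above.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    tp/fp: true/false positives, fn/tn: false/true negatives.
    Assumes tp + fp > 0 and tp + fn > 0, so no division by zero occurs.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "precision": tp / (tp + fp),     # share of predicted positives that are real
        "recall": tp / (tp + fn),        # share of real positives that were found
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```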

4. ROC Curve Analysis

[Figure: ROC Curve Comparison]
[Figure: Model Performance Metrics]

Model Performance Scores:

  • Gradient Boost Classifier Score: 0.9173128596743366
  • Neural Classifier Score: 0.9167698643666292
  • Naive Bayes Classifier Score: 0.803363959942255
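
The section title suggests that the scores above are areas under the ROC curve (AUC), although the text does not say so explicitly. Under that assumption, the sketch below shows what such a score measures: the probability that a randomly chosen positive example receives a higher predicted score than a randomly chosen negative one. The labels and scores are toy values, and `roc_auc` is a hypothetical helper rather than the code used to produce the numbers above.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Counts how often a positive example outscores a negative one;
    ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = subscribed, 0 = did not subscribe.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC: {auc:.2f}")
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The pairwise loop is quadratic, so it suits small examples rather than full datasets.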

[Figure: Feature Importance Analysis]

5. Conclusion and Recommendations

1. Months of Marketing Activity

The highest level of marketing activity occurred in May, but this month also had the lowest effective subscription rate (-34.49%). For future campaigns:

  • Focus efforts on March, September, October, and December
  • Investigate December's low marketing activity as a potential opportunity

2. Seasonality Patterns

Potential clients show a higher tendency to subscribe during the fall and winter seasons. Align future marketing campaigns with these seasonal preferences.

3. Campaign Calls Strategy

Implement a three-call limit policy per potential client:

  • Excessive calls increase rejection likelihood
  • Focus on quality of interaction over quantity
  • Streamline contact strategies for better efficiency

4. Age Category Targeting

20s and younger: 60% subscription rate
60 and older: 76% subscription rate

5. Occupation Focus

Primary target segments:

  • Students: High subscription potential
  • Retired individuals: Interested in earning interest on savings

6. Financial Profile Analysis

Lower Balances: More likely to have house loans, limited term deposit capacity
Average/High Balances: Less likely to have house loans, higher term deposit potential

7. Call Strategy Enhancement

Develop an engaging questionnaire to extend conversation duration:

  • Positive correlation between call duration and subscription likelihood
  • Focus on meaningful engagement
  • Design questions that encourage detailed responses

8. Duration-Based Targeting

Key Finding: Calls exceeding the average duration (about 375 seconds) show a 78% term deposit subscription rate

  • Prioritize longer conversations
  • Train staff in engagement techniques
  • Monitor and optimize call duration metrics