Learn how Naive Bayes handles features like Age or Salary using the Bell Curve.
In Part 1, we saw how the Naive Bayes classifier uses probabilities based on feature frequencies (like counting words) to classify data. But what happens when our input features aren't categories, but continuous numbers like 'Age', 'Salary', or 'Temperature'?
We can't simply count frequencies for every possible number! We need a different way to estimate the likelihood P(Feature | Class). This is where Gaussian Naive Bayes (GNB) comes in. It's a specific type of Naive Bayes designed to work directly with continuous numerical features.
Main Technical Concept: Gaussian Naive Bayes is an extension of Naive Bayes that handles continuous features by assuming that the values of each feature, *for each class*, follow a Gaussian (Normal, or "bell curve") distribution.
The core idea behind GNB is simple but powerful:
For a given class (e.g., Class 'Yes'), it assumes that the continuous values of a specific feature (e.g., 'Age') are distributed according to a Gaussian (Normal) distribution. It makes the same assumption for Class 'No', but potentially with a different mean and standard deviation.
Image Credit: Inductiveload on Wikimedia Commons, CC BY-SA 3.0
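For intuition, these per-class parameters are easy to estimate directly from the data. Here is a quick sketch, assuming the 'Age', 'EstimatedSalary', and 'Purchased' columns of the Social_Network_Ads.csv dataset used in the worked example later in this post:

```python
import pandas as pd

# Same Social_Network_Ads.csv used in the worked example below.
dataset = pd.read_csv('Social_Network_Ads.csv')

# Mean and standard deviation of each continuous feature, computed separately per class.
# These (mean, std) pairs are exactly the parameters Gaussian Naive Bayes estimates.
print(dataset.groupby('Purchased')[['Age', 'EstimatedSalary']].agg(['mean', 'std']))
```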
Instead of counting frequencies, GNB calculates the likelihood using the Gaussian Probability Density Function (PDF). Here's the idea:
1. For each class, calculate the mean (μ) and variance (σ²) of the feature from the training samples belonging to that class.
2. For a new data point's feature value (x), plug this value, along with the calculated mean (μ) and variance (σ²) *for a given class*, into the Gaussian PDF formula to get the likelihood density P(x | Class):

P(x | Class) = (1 / √(2πσ²)) · e^(−(x − μ)² / (2σ²))

This formula gives the likelihood density of observing value x, given that the data for this class follows a Normal distribution with mean μ and variance σ² (where π ≈ 3.14159 and e ≈ 2.71828).
Note: This gives a *density*, not a direct probability (it can be > 1), but it works correctly within Bayes' theorem for comparison.
The algorithm calculates this likelihood density for every feature and every class.
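To make the calculation concrete, here is a minimal hand-rolled sketch of the Gaussian PDF. The μ and σ² values are made up for illustration, not taken from any real dataset:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Gaussian probability density of x for a class with mean mu and variance var."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Made-up per-class parameters for an 'Age' feature, just to show the mechanics.
age = 42
print(gaussian_pdf(age, mu=46.0, var=64.0))  # density of Age=42 under the 'Yes' class
print(gaussian_pdf(age, mu=33.0, var=49.0))  # density of Age=42 under the 'No' class
```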
The overall process for classifying a new data point `X = {x₁, x₂, ..., xₙ}` using Gaussian Naive Bayes is: for each class C, combine the prior with the per-feature likelihood densities,

Score(C) = P(C) * P(x₁|C) * P(x₂|C) * ... * P(xₙ|C)

and predict the class with the highest score.
Essentially, it asks: "Based on the typical 'Age' and 'Salary' distributions we saw for people who *did* purchase (Class 1), and the distributions for those who *didn't* (Class 0), which class does this new person's 'Age' and 'Salary' fit better with, considering the overall likelihood of purchase?"
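A minimal sketch of that comparison, using scipy's `norm.pdf` for the Gaussian density. The priors and per-class parameters are hypothetical, not values estimated from the dataset below:

```python
from scipy.stats import norm

# Hypothetical priors and per-class (mean, std) parameters for two features.
# In practice these numbers come from the training data.
priors = {0: 0.64, 1: 0.36}  # P(C) for class 0 (no purchase) and class 1 (purchase)
params = {
    0: {'Age': (33.0, 7.0), 'Salary': (60000.0, 24000.0)},
    1: {'Age': (46.0, 8.0), 'Salary': (87000.0, 30000.0)},
}

new_point = {'Age': 42.0, 'Salary': 90000.0}

scores = {}
for c, prior in priors.items():
    score = prior
    for feature, x in new_point.items():
        mu, sigma = params[c][feature]
        score *= norm.pdf(x, loc=mu, scale=sigma)  # Gaussian likelihood density for this feature
    scores[c] = score

print(scores)                                      # unnormalized scores, proportional to P(C | X)
print('Predicted class:', max(scores, key=scores.get))
```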
Scikit-learn makes using Gaussian Naive Bayes very easy with the `GaussianNB` classifier.
Let's predict whether a user purchased a product based on 'Age' and 'EstimatedSalary'.
# 1. Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
# 2. Load the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
# Select 'Age' and 'EstimatedSalary' as features, 'Purchased' as target
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
# 3. Split data into Training and Test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
# 4. Feature Scaling (Important for visualization and sometimes GNB)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# 5. Fit Gaussian Naive Bayes model to the Training set
classifier = GaussianNB()
classifier.fit(X_train_scaled, y_train) # Learns mean & variance per feature per class
# 6. Predict Test set results
y_pred = classifier.predict(X_test_scaled)
# 7. Evaluate the results
# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print('Confusion Matrix:\n', cm)
# Accuracy Score
acc = accuracy_score(y_test, y_pred)
print(f'\nAccuracy: {acc:.4f}')
# Classification Report (Precision, Recall, F1-Score)
report = classification_report(y_test, y_pred)
print('\nClassification Report:\n', report)
# 8. Visualize Confusion Matrix (Optional)
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix - Gaussian Naive Bayes')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
The code trains the GNB model, makes predictions, and then shows the confusion matrix and other metrics to evaluate how well it performed on the unseen test data.
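If you want to inspect what the model learned, the fitted `GaussianNB` object exposes the per-class priors, means, and variances (attribute names as in recent scikit-learn releases; older versions exposed `sigma_` instead of `var_`). Continuing from the `classifier` fitted above:

```python
# Continuing from the fitted `classifier` above: per-class priors, means, and variances.
print('Class priors:', classifier.class_prior_)
print('Per-class feature means:\n', classifier.theta_)
print('Per-class feature variances:\n', classifier.var_)  # exposed as sigma_ in scikit-learn < 1.0
```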
| Issue / Observation | Potential Cause & Solution | Best Practice |
|---|---|---|
| Model accuracy is low. | The Gaussian assumption might be strongly violated for some features; features might be highly correlated; there may be insufficient data. Solution: Check feature distributions (histograms per class; see the sketch after this table). Try transforming non-normal features (e.g., log transform). Consider other algorithms if the assumptions don't hold. Check for feature correlation. | Analyze feature distributions per class. Validate assumptions where possible. |
| Why might feature scaling still be useful if GNB calculates mean/std per feature? | While GNB handles different scales mathematically via separate means/stds, scaling can sometimes help the numerical stability of the calculations, especially if ranges are vastly different. It also ensures visualizations (like decision boundaries) are not distorted. | Scaling continuous features is generally good practice, though its direct impact on GNB accuracy is smaller than for distance-based algorithms. |
| Getting probability densities > 1 from the PDF. | This is mathematically possible and correct for a Probability *Density* Function (PDF), especially if the variance (σ²) is very small. The *area* under the PDF curve always integrates to 1. Solution: No fix needed; just understand that it represents a density, not a direct probability for a single point. It works correctly within the relative comparisons of Bayes' theorem. | Distinguish between probability density (PDF value) and probability (area under the curve). |
| Model performs poorly due to correlated features. | The core "naive" independence assumption is violated. Solution: Consider feature selection to remove highly correlated features. Use dimensionality reduction (like PCA) *before* GNB (but interpretation becomes harder). Try models that handle correlations better (e.g., Logistic Regression, SVM, Trees). | Check feature correlations during EDA. Be aware of the algorithm's assumptions. |
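As referenced in the first row of the table, here is a quick way to eyeball whether each feature looks roughly bell-shaped within each class, assuming the same Social_Network_Ads.csv columns as in the example above:

```python
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Social_Network_Ads.csv')

# Histogram of each continuous feature, split by class, to eyeball how "bell-shaped" it looks.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, feature in zip(axes, ['Age', 'EstimatedSalary']):
    for label in sorted(dataset['Purchased'].unique()):
        ax.hist(dataset.loc[dataset['Purchased'] == label, feature],
                bins=20, alpha=0.5, label=f'Purchased={label}')
    ax.set_title(feature)
    ax.legend()
plt.show()
```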
Interview Question
Question 1: What is the core assumption that Gaussian Naive Bayes makes about continuous features?
It assumes that the values of each continuous feature, *within each class*, are distributed according to a Gaussian (Normal) distribution.
Question 2: How does Gaussian Naive Bayes calculate the likelihood term P(feature | Class) for a continuous feature?
It calculates the mean (μ) and variance (σ²) of that feature for all training samples belonging to the specific class. Then, it plugs the new data point's feature value (x), along with the calculated μ and σ² for that class, into the Gaussian Probability Density Function (PDF) formula.
Interview Question
Question 3: Why is the "naive" independence assumption still relevant even when using Gaussian Naive Bayes with continuous features?
Because even after calculating the individual likelihoods P(xᵢ|C) for each feature using the Gaussian PDF, the algorithm still combines these likelihoods by *multiplying* them together (along with the prior P(C)) to get the overall score for a class. This multiplication step relies on the assumption that the features x₁, x₂, etc., are conditionally independent given the class C.
Question 4: Is feature scaling (like Standardization) strictly necessary for Gaussian Naive Bayes to work? Why might it still be beneficial?
Strictly speaking, GNB can handle features on different scales because it calculates separate means and variances for each. However, scaling is still often beneficial for:
1. Numerical Stability: It can prevent issues with very large or very small numbers during PDF calculations, especially if variances are tiny.
2. Comparability: Standardizing rescales each feature to mean 0 and standard deviation 1. It does not change a feature's shape (a skewed feature stays skewed), but it keeps the learned means and variances on comparable scales across features.
3. Visualization: Helps when plotting decision boundaries or comparing feature influences.
Interview Question
Question 5: If your continuous features are clearly not normally distributed (e.g., very skewed), what might happen if you apply Gaussian Naive Bayes, and what could you do?
If the Gaussian assumption is strongly violated, the likelihood estimates calculated using the Gaussian PDF will be inaccurate, potentially leading to poor classification performance.
What to do:
1. Try transforming the skewed features to make them more bell-shaped (e.g., using a log transform or Box-Cox transform) *before* applying GNB (a minimal sketch follows this list).
2. Consider discretizing the continuous features into bins and using Multinomial Naive Bayes instead.
3. Try a different classification algorithm that doesn't make the Gaussian assumption (e.g., Decision Trees, KNN, SVM).
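A minimal sketch of option 1, applying a log transform to a hypothetical skewed, non-negative feature (the column name and values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed, non-negative feature.
df = pd.DataFrame({'Income': [20000, 25000, 30000, 45000, 60000, 250000, 900000]})

# log1p compresses the long right tail, often bringing the shape closer to a bell curve.
df['Income_log'] = np.log1p(df['Income'])
print(df.describe())
```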