
Day 8: Correlation Analysis: Theory to Practice

Understanding Statistical Relationships Through Real-World Applications

January 7, 2025

1. Pearson Correlation Coefficient

Mathematical Foundation

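r = Σ((x - μx)(y - μy)) / (n σx σy)

where μx and μy are the means of x and y, σx and σy are their population standard deviations, and n is the number of paired observations.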

Key Properties:

  • Range: -1 to +1
  • -1: Perfect negative correlation
  • 0: No linear correlation
  • +1: Perfect positive correlation
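
To make the definition concrete, here is a minimal sketch that computes r directly from deviations about the mean and checks it against NumPy. The helper name `pearson_r` and the sample values are illustrative assumptions. The denominator is written as the square root of the product of the summed squared deviations, which is algebraically the same as n times the two population standard deviations.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r from deviations about the mean (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()          # deviations of x from its mean
    dy = y - y.mean()          # deviations of y from its mean
    # Covariance term divided by the product of the spread terms
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# Made-up sample data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]   # off-diagonal entry of the 2x2 matrix
print(f"manual: {r_manual:.4f}, numpy: {r_numpy:.4f}")
```

Both values should agree to floating-point precision, and reversing the sign of one variable flips the sign of r.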

Real-World Applications

1. Economic Indicators

GDP vs. Employment Rate

  • Strong positive correlation (+0.82)
  • As GDP increases, employment typically rises
  • Used in economic forecasting

2. Medical Research

Blood Pressure vs. Age

  • Moderate positive correlation (+0.65)
  • Blood pressure tends to increase with age
  • Helps in preventive healthcare

3. Environmental Studies

Temperature vs. Ice Cream Sales

  • Strong positive correlation (+0.95)
  • Sales tend to rise as temperatures rise
  • Used in inventory management

2. Spearman Rank Correlation

Mathematical Foundation

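ρ = 1 - (6Σd²) / (n(n² - 1))

where d is the difference between the two ranks of each observation and n is the number of observations. This shortcut is exact only when there are no tied ranks; with ties, ρ is computed as the Pearson correlation of the ranks.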

Advantages:

  • Does not assume normally distributed data
  • Captures monotonic relationships, including non-linear ones
  • Less sensitive to outliers than Pearson
  • Suitable for ordinal data
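
As a rough illustration of the rank-difference formula and of the monotonic, non-linear case, the sketch below ranks two made-up arrays, applies the formula, and compares the result with SciPy. The helper name `spearman_rho_no_ties` is an assumption for this example, and it is only valid when there are no tied values.

```python
import numpy as np
from scipy import stats

def spearman_rho_no_ties(x, y):
    """Spearman rho via the rank-difference shortcut (assumes no tied values)."""
    rx = stats.rankdata(x)     # ranks of x, 1 = smallest
    ry = stats.rankdata(y)     # ranks of y, 1 = smallest
    d = rx - ry                # per-observation rank differences
    n = len(rx)
    return 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

# Made-up data: y grows with x, but not along a straight line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3

rho_manual = spearman_rho_no_ties(x, y)
rho_scipy, _ = stats.spearmanr(x, y)
r_pearson, _ = stats.pearsonr(x, y)

print(f"Spearman (manual): {rho_manual:.3f}")
print(f"Spearman (SciPy):  {rho_scipy:.3f}")
print(f"Pearson:           {r_pearson:.3f}")
```

Because the ranks of x and y match exactly, Spearman reports a perfect relationship, while Pearson comes out below 1 since the points do not lie on a line.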

Real-World Applications

1. Education

Study Time vs. Test Rankings

  • Strong positive correlation (+0.78)
  • More study time is generally associated with better rankings
  • Used in academic counseling

2. Sports Analytics

Player Ranking vs. Salary

  • Moderate positive correlation (+0.72)
  • Higher-ranked players tend to earn more
  • Used in contract negotiations

3. Customer Satisfaction

Service Quality vs. Customer Loyalty

  • Strong positive correlation (+0.85)
  • Better service is associated with higher loyalty
  • Used in business strategy

When to Use Which?

Scenario                        Best Choice
Linear relationship expected    Pearson
Ranked data                     Spearman
Outliers present                Spearman
Non-normal distribution         Spearman
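
The outlier row can be illustrated with a small sketch. The data below is invented for the example: nine points on a perfect line plus one extreme value. The exact numbers depend entirely on this made-up data, but the pattern is typical: a single extreme point can swing Pearson far more than Spearman.

```python
import numpy as np
from scipy import stats

# Made-up data: a perfect increasing line, then one extreme point
x = np.arange(1, 11, dtype=float)   # 1, 2, ..., 10
y = x.copy()
y[-1] = -50.0                       # hypothetical outlier replacing the last value

r_pearson, _ = stats.pearsonr(x, y)
rho_spearman, _ = stats.spearmanr(x, y)

print(f"Pearson with outlier:  {r_pearson:.2f}")
print(f"Spearman with outlier: {rho_spearman:.2f}")
```

Here Pearson turns negative even though nine of the ten points rise together, while Spearman stays positive because the outlier only shifts each rank by a bounded amount.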

Code Implementation

Sample implementation:

import numpy as np
from scipy import stats

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Pearson correlation
pearson_corr, _ = stats.pearsonr(x, y)
print(f"Pearson correlation: {pearson_corr:.2f}")

# Spearman correlation
spearman_corr, _ = stats.spearmanr(x, y)
print(f"Spearman correlation: {spearman_corr:.2f}")
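
When the data already lives in a pandas DataFrame, the same two coefficients can be obtained with `DataFrame.corr` and its `method` argument. The sketch below reuses the sample values from above; the column names `x` and `y` are assumptions for this example.

```python
import pandas as pd

# Same sample values as above, stored in a DataFrame
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5]})

# corr() returns a full correlation matrix; pick the x-vs-y entry
pearson_xy = df.corr(method="pearson").loc["x", "y"]
spearman_xy = df.corr(method="spearman").loc["x", "y"]

print(f"Pearson correlation:  {pearson_xy:.2f}")
print(f"Spearman correlation: {spearman_xy:.2f}")
```

Unlike `stats.pearsonr` and `stats.spearmanr`, this route returns only the coefficients, not p-values, so the SciPy functions remain the better choice when a significance test is needed.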