Project Overview

As sustainability and green transportation gain momentum globally, the demand for bicycles is surging. This project focuses on analysing bike sales data to uncover key trends, customer behaviour patterns, and geographic insights that could inform business strategy and marketing decisions.

Using real-world data from Capital Bikeshare (Washington, D.C., USA), the analysis explores how bicycle companies can leverage data science to optimise their operations and adapt to shifting consumer and environmental trends.

🎯 Project Objectives

Analyse historical bike sales data to identify temporal and regional trends.
Understand how sales vary by product category, location, and customer demographics.
Investigate the statistical significance of differences in sales between key markets.
Translate findings into actionable insights for improved sales strategy and market targeting.

🧰 Tools & Technologies

Python (Pandas, NumPy, Seaborn, Matplotlib) – Data analysis and visualisation
Jupyter Notebook – Analysis workflow and reporting
SciPy (t-test) – Statistical validation

🔍 My Approach

1. Data Acquisition & Preparation

Sourced bike sales data from the Capital Bikeshare system.
Loaded data into a Pandas DataFrame.
Performed data cleaning:

Removed duplicates
Handled missing values
Standardised column formats
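
A minimal sketch of the three cleaning steps above, run on a toy DataFrame. The column names (Date, Country, Customer Age, Revenue) and the missing-value policy (drop rows without a country, fill missing revenue with the median) are assumptions for illustration, not the dataset's confirmed schema or the exact rules used in the notebook.

```python
import pandas as pd

# Hypothetical toy data; the real dataset's columns may differ.
raw = pd.DataFrame({
    "Date": ["2015-01-03", "2015-01-03", "2016-02-10", "2016-03-15"],
    "Country": ["United States", "United States", "France", None],
    " Customer Age ": [34, 34, 41, 29],
    "Revenue": [950.0, 950.0, None, 400.0],
})


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Standardise column names, drop duplicates, and handle missing values."""
    out = df.copy()
    # Standardise column names: trim whitespace, lower-case, snake_case.
    out.columns = (
        out.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    )
    # Remove exact duplicate rows.
    out = out.drop_duplicates()
    # Assumed policy: a row without a country cannot be used for regional analysis.
    out = out.dropna(subset=["country"])
    # Assumed policy: fill missing revenue with the median of the remaining rows.
    out["revenue"] = out["revenue"].fillna(out["revenue"].median())
    # Standardise the date column to a proper datetime dtype.
    out["date"] = pd.to_datetime(out["date"])
    return out.reset_index(drop=True)


cleaned = clean_sales(raw)
print(cleaned)
```

Which rows to drop and which values to impute is a judgment call that depends on the downstream analysis, so the policy is kept in one function where it is easy to review and change.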

2. Exploratory Data Analysis (EDA)

Conducted an in-depth analysis to identify trends and patterns:

Sales by Year: Tracked total sales year-over-year with bar and pie charts.
Sales by Geography: Explored sales across countries and U.S. states to detect regional demand spikes.
Product Category Trends: Analysed the most popular bike types (e.g., road bikes, mountain bikes).
Customer Demographics: Investigated how factors like age and location influence buying patterns.
Profit & Sales Correlation: Explored how profit relates to sales and how it varies across regions and demographic segments.
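
The EDA breakdowns listed above mostly reduce to group-and-aggregate operations in Pandas. The sketch below shows one way to compute them, assuming hypothetical columns named year, country, product_category, revenue, and profit; the figures are invented purely to make the example runnable.

```python
import pandas as pd

# Hypothetical sample rows; values are illustrative, not taken from the dataset.
sales = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016, 2016],
    "country": ["United States", "France", "United States", "France", "United States"],
    "product_category": ["Road Bikes", "Mountain Bikes", "Road Bikes", "Road Bikes", "Mountain Bikes"],
    "revenue": [1000.0, 600.0, 1200.0, 700.0, 500.0],
    "profit": [300.0, 150.0, 400.0, 200.0, 100.0],
})

# Sales by year: total revenue per calendar year.
sales_by_year = sales.groupby("year")["revenue"].sum()

# Sales by geography: total revenue per country, largest first.
sales_by_country = (
    sales.groupby("country")["revenue"].sum().sort_values(ascending=False)
)

# Product category trends: number of sales records per bike type.
category_counts = sales["product_category"].value_counts()

# Profit by region: totals per country plus a derived profit margin.
profit_by_country = sales.groupby("country")[["revenue", "profit"]].sum()
profit_by_country["margin"] = (
    profit_by_country["profit"] / profit_by_country["revenue"]
)

print(sales_by_year)
print(sales_by_country)
print(category_counts)
print(profit_by_country)
```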

3. Statistical Analysis

Performed hypothesis testing to support findings:

T-Test: Compared sales between two major markets, the United States and France, to determine whether the observed difference was statistically significant.
Result: The test's p-value indicated a statistically significant difference in sales between the two markets, consistent with a regional disparity in product demand.
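
For reference, a two-sample t-test of this kind can be sketched with scipy.stats.ttest_ind. The samples below are synthetic stand-ins generated with NumPy, not the project's data. The 0.05 significance level and the choice of Welch's variant (equal_var=False, which does not assume equal variances) are assumptions made for the example.

```python
import numpy as np
from scipy import stats

# Synthetic per-order sales for two markets (illustrative only).
rng = np.random.default_rng(0)
us_sales = rng.normal(loc=800, scale=150, size=200)
fr_sales = rng.normal(loc=700, scale=150, size=200)

# Welch's t-test: compares the two means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(us_sales, fr_sales, equal_var=False)

alpha = 0.05  # assumed significance level
significant = p_value < alpha

print(f"t = {t_stat:.2f}, p = {p_value:.3g}, significant at {alpha}: {significant}")
```

A small p-value only says the difference in means is unlikely under the null hypothesis of equal means; it does not measure how large or commercially important the difference is, so it is worth reporting the group means alongside it.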

📊 Key Visualisations

Time-series plots showing monthly and yearly sales trends.
Geographic heat maps highlighting top-performing regions.
Box plots comparing profit margins by country.
Scatter plots showing customer age vs. purchase frequency.
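
As an illustration of one of these charts, the sketch below draws a box plot of profit margin by country using Matplotlib only. The column names and values are hypothetical, and the non-interactive Agg backend is selected so the script also runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical order-level data; values are illustrative only.
orders = pd.DataFrame({
    "country": ["United States"] * 4 + ["France"] * 4,
    "revenue": [1000.0, 1200.0, 500.0, 900.0, 600.0, 700.0, 650.0, 800.0],
    "profit": [300.0, 400.0, 100.0, 250.0, 150.0, 200.0, 160.0, 240.0],
})
orders["margin"] = orders["profit"] / orders["revenue"]

# Collect one array of margins per country (groupby sorts keys alphabetically).
groups = {
    country: frame["margin"].to_numpy()
    for country, frame in orders.groupby("country")
}

fig, ax = plt.subplots(figsize=(6, 4))
ax.boxplot(list(groups.values()))
# Boxes are drawn at positions 1..N, so label those tick positions.
ax.set_xticks(range(1, len(groups) + 1))
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel("Profit margin")
ax.set_title("Profit margin by country")
fig.savefig("profit_margin_by_country.png", dpi=150)
```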

📌 Key Deliverables

Cleaned and well-documented dataset
Jupyter notebook detailing the full analysis workflow
EDA and statistical visualisations
Summary report outlining key insights and strategic recommendations

🌟 Project Highlights

Identified key growth markets for bike sales, enabling targeted marketing.
Provided evidence-based recommendations on which product lines to focus on in different regions.
Demonstrated how statistical testing can validate business hypotheses in retail analytics.

💡 What I Learned

This project helped me strengthen my Python-based data analysis workflow, from data wrangling through to statistical testing. I also gained practical experience in:

Communicating technical insights to a business audience
Using visual storytelling to drive understanding
Applying hypothesis testing in a real-world business context
