Project Overview
As sustainability and green transportation gain momentum globally, the demand for bicycles is surging. This project focuses on analysing bike sales data to uncover key trends, customer behaviour patterns, and geographic insights that could inform business strategy and marketing decisions.
Using real-world data from Capital Bikeshare (Washington, D.C., USA), the analysis explores how bicycle companies can leverage data science to optimise their operations and adapt to shifting consumer and environmental trends.
🎯 Project Objectives
- Analyse historical bike sales data to identify temporal and regional trends.
- Understand how sales vary by product category, location, and customer demographics.
- Investigate the statistical significance of differences in sales between key markets.
- Translate findings into actionable insights for improved sales strategy and market targeting.
🧰 Tools & Technologies
- Python (Pandas, NumPy, Seaborn, Matplotlib) – Data analysis and visualisation
- Jupyter Notebook – Analysis workflow and reporting
- SciPy (t-test) – Statistical validation
🔍 My Approach
1. Data Acquisition & Preparation
- Sourced bike sales data from the Capital Bikeshare system.
- Loaded data into a Pandas DataFrame.
- Performed data cleaning:
  - Removed duplicates
  - Handled missing values
  - Standardised column formats
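The cleaning steps above might look roughly like the sketch below. The sample rows, the column names (`Date`, `Country`, `Revenue`), and the `clean_sales` helper are all hypothetical stand-ins, and the choices of dropping undated rows and filling missing revenue with the median are assumptions rather than the notebook's actual rules.

```python
import pandas as pd

# Hypothetical sample standing in for the raw export; real column names may differ.
raw = pd.DataFrame({
    "Date": ["2015-01-03", "2015-01-03", "2016-02-10", None, "2016-05-01"],
    " Country ": ["United States", "United States", "france", "France", "France"],
    "Revenue": [1200.0, 1200.0, None, 850.0, 800.0],
})


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Remove duplicates, standardise formats, and handle missing values."""
    out = df.copy()

    # Standardise column names: trim whitespace, lower-case, use underscores.
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")

    # Remove exact duplicate rows.
    out = out.drop_duplicates()

    # Standardise column formats: parse dates, normalise country spelling.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    out["country"] = out["country"].str.strip().str.title()

    # Handle missing values (assumed policy): drop rows with no date,
    # then fill missing revenue with the median of the remaining rows.
    out = out.dropna(subset=["date"])
    out["revenue"] = out["revenue"].fillna(out["revenue"].median())

    return out.reset_index(drop=True)


cleaned = clean_sales(raw)
print(cleaned)
```

Working on a copy keeps the raw DataFrame intact, which makes it easy to compare row counts before and after cleaning.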
2. Exploratory Data Analysis (EDA)
Conducted an in-depth analysis to identify trends and patterns:
- Sales by Year: Tracked total sales year-over-year with bar and pie charts.
- Sales by Geography: Explored sales across countries and U.S. states to detect regional demand spikes.
- Product Category Trends: Analysed the most popular bike types (e.g., road bikes, mountain bikes).
- Customer Demographics: Investigated how factors like age and location influence buying patterns.
- Profit & Sales Correlation: Explored how profits vary across different regions and demographic segments.
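Most of the EDA views listed above reduce to a handful of Pandas aggregations. The sketch below uses a small made-up table; the column names (`year`, `country`, `product_category`, `customer_age`, `revenue`, `profit`) and the age-band boundaries are illustrative assumptions, not the project's actual schema.

```python
import pandas as pd

# Hypothetical cleaned sales table; values are invented for illustration.
sales = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016, 2016],
    "country": ["United States", "France", "United States", "France", "United States"],
    "product_category": ["Road Bikes", "Mountain Bikes", "Road Bikes",
                         "Road Bikes", "Mountain Bikes"],
    "customer_age": [34, 45, 29, 52, 41],
    "revenue": [1200.0, 800.0, 1500.0, 900.0, 700.0],
    "profit": [400.0, 250.0, 520.0, 300.0, 210.0],
})

# Sales by year: total revenue per calendar year.
revenue_by_year = sales.groupby("year")["revenue"].sum()

# Sales by geography: total revenue per country, largest first.
revenue_by_country = (
    sales.groupby("country")["revenue"].sum().sort_values(ascending=False)
)

# Product category trends: revenue per category within each country.
category_by_country = sales.pivot_table(
    index="country", columns="product_category",
    values="revenue", aggfunc="sum", fill_value=0,
)

# Customer demographics: mean profit per age band.
# pd.cut intervals are right-inclusive: (0, 35], (35, 50], (50, 120].
sales["age_band"] = pd.cut(
    sales["customer_age"], bins=[0, 35, 50, 120],
    labels=["35 and under", "36-50", "51+"],
)
profit_by_age_band = sales.groupby("age_band", observed=True)["profit"].mean()

# Profit and sales correlation (Pearson by default).
profit_revenue_corr = sales["profit"].corr(sales["revenue"])

print(revenue_by_year, revenue_by_country, category_by_country, sep="\n\n")
```

Each result is a Series or DataFrame that can be passed straight to a plotting call for the bar, pie, or heat-map views.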
3. Statistical Analysis
Performed hypothesis testing to support findings:
- T-Test: Compared sales between two major markets, the United States and France, to determine whether the observed difference was statistically significant.
- Result: The test's p-value indicated a statistically significant difference in sales between the two markets, consistent with a regional disparity in product demand.
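A two-sample comparison like the one described above can be run with `scipy.stats.ttest_ind`. This is a minimal sketch on synthetic data: the sample sizes, means, and spreads are invented, the 0.05 threshold is an assumed convention, and `equal_var=False` selects Welch's variant, which is an assumption since the write-up does not say which variant was used.

```python
import numpy as np
from scipy import stats

# Synthetic per-order sales figures; NOT the project's data.
rng = np.random.default_rng(42)
us_sales = rng.normal(loc=1000, scale=200, size=200)
france_sales = rng.normal(loc=850, scale=200, size=200)

# Welch's t-test: does not assume the two markets share a variance.
result = stats.ttest_ind(us_sales, france_sales, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue

alpha = 0.05  # assumed significance level
significant = p_value < alpha

print(f"t = {t_stat:.2f}, p = {p_value:.3g}, significant at {alpha}: {significant}")
```

A small p-value only says the difference in means is unlikely under the null hypothesis; reporting the mean difference alongside it shows whether the gap matters commercially.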
📊 Key Visualisations
- Time-series plots showing monthly and yearly sales trends.
- Geographic heat maps highlighting top-performing regions.
- Box plots comparing profit margins by country.
- Scatter plots showing customer age vs. purchase frequency.
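As one example of the charts listed above, a box plot of profit margin by country can be drawn with Matplotlib alone. The data, the column names, and the definition of profit margin as profit divided by revenue are assumptions made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sales rows; values are invented for illustration.
sales = pd.DataFrame({
    "country": ["United States", "France", "United States", "France", "United States"],
    "revenue": [1200.0, 800.0, 1500.0, 900.0, 700.0],
    "profit": [400.0, 250.0, 520.0, 300.0, 210.0],
})

# Assumed definition: margin = profit / revenue.
sales["profit_margin"] = sales["profit"] / sales["revenue"]

# One array of margins per country, in alphabetical order.
countries = sorted(sales["country"].unique())
groups = [
    sales.loc[sales["country"] == c, "profit_margin"].to_numpy()
    for c in countries
]

fig, ax = plt.subplots(figsize=(6, 4))
ax.boxplot(groups)  # boxes are drawn at x positions 1..N
ax.set_xticks(range(1, len(countries) + 1))
ax.set_xticklabels(countries)
ax.set_ylabel("Profit margin")
ax.set_title("Profit margin by country")
fig.tight_layout()
```

Calling `fig.savefig(...)` with a file name of your choice writes the chart to disk; in a Jupyter notebook the figure renders inline without the `Agg` backend line.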
📌 Key Deliverables
- Cleaned and well-documented dataset
- Jupyter notebook detailing the full analysis workflow
- EDA and statistical visualisations
- Summary report outlining key insights and strategic recommendations
🌟 Project Highlights
- Identified key growth markets for bike sales, enabling targeted marketing.
- Provided evidence-based recommendations on which product lines to focus on in different regions.
- Demonstrated how statistical testing can validate business hypotheses in retail analytics.
💡 What I Learned
This project helped me strengthen my Python-based data analysis workflow, from data wrangling through to statistical testing. I also gained practical experience in:
- Communicating technical insights to a business audience
- Using visual storytelling to drive understanding
- Applying hypothesis testing in a real-world business context