Bug Classification Machine Learning Project

Machine learning system that predicts bug report type with 94.34% accuracy

10,000

Bug Reports

94.34%

Best Accuracy

10+

ML Models

500+

Features

Bug Type Distribution

Project Overview

Project Goal

Develop a bug classification system that automatically categorizes bug reports as Defect, Task, or Enhancement using supervised machine learning.

Key Achievements

94.34% accuracy with Stacking Classifier
Advanced preprocessing with TF-IDF and SMOTE
10+ ML algorithms comprehensively evaluated
Professional dashboard for visualization

Innovation

Combines a stacked ensemble of classifiers, TF-IDF text features, and SMOTE class balancing to reach 94.34% accuracy in automated bug classification.

Dataset Analysis

Dataset Overview

10,000

Total Records

Original Columns

Missing Values

High

Data Quality

Dataset Columns

Bug ID

Unique identifier for each bug report

Type

Bug classification (Defect/Task/Enhancement)

Summary

Short text summary of the bug report

Product

Software product (Firefox, Core, etc.)

Component

Specific component within the product

Assignee

Developer assigned to fix the bug

Status

Current status (RESOLVED, OPEN, etc.)

Resolution

How the bug was resolved (FIXED, etc.)

Updated

Last update timestamp

Sample Data Preview

Bug ID	Type	Summary	Product	Component
1949668	defect	Race condition between clearing mIsDeferredPurgePending...	Core	Memory Allocator
1948993	task	Remove Nightly condition for vertical tabs checkboxes...	Firefox	Sidebar
1947536	defect	QR code image is not exposed to assistive technology...	Firefox	Messaging System
1947606	task	Add Nimbus support for calculator	Firefox	Address Bar

Showing 4 of 10,000 total records

Temporal Distribution

Product Analysis

Data Preprocessing Pipeline

Data Cleaning

Removed irrelevant columns and handled missing values
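
A minimal sketch of what this cleaning step might look like, using only the Python standard library. The source does not say which columns were dropped or how missing values were handled, so the dropped columns (Bug ID, Assignee), the "unknown" placeholder, and the sample rows below are all assumptions made for illustration.

```python
# Assumption: the source does not name the dropped columns;
# Bug ID and Assignee are used here purely for illustration.
DROP_COLUMNS = {"Bug ID", "Assignee"}
# Assumption: a row is unusable without a label and a summary.
REQUIRED = ("Type", "Summary")


def clean(rows):
    """Drop unused columns, skip unlabeled rows, fill other gaps."""
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
        if any(not kept.get(col) for col in REQUIRED):
            continue  # no label or no text: nothing to learn from
        # Replace remaining empty values with a placeholder category.
        cleaned.append({k: (v if v else "unknown") for k, v in kept.items()})
    return cleaned


# Hypothetical rows shaped like the dataset columns (not real records).
rows = [
    {"Bug ID": "1", "Type": "defect", "Summary": "Crash on startup",
     "Product": "Core", "Assignee": "a@example.com", "Resolution": "FIXED"},
    {"Bug ID": "2", "Type": "task", "Summary": "Update build docs",
     "Product": "Firefox", "Assignee": "", "Resolution": ""},
    {"Bug ID": "3", "Type": "", "Summary": "Unlabeled report",
     "Product": "Core", "Assignee": "", "Resolution": ""},
]
cleaned = clean(rows)
```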

Feature Engineering

Extracted temporal features and grouped categories
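
The two ideas in this step can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the timestamp format (ISO 8601), the chosen temporal fields, and the `min_count` threshold for grouping rare categories are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def temporal_features(timestamp):
    """Split a timestamp into numeric features.

    Assumption: the Updated column holds ISO 8601 strings.
    """
    dt = datetime.fromisoformat(timestamp)
    return {
        "year": dt.year,
        "month": dt.month,
        "weekday": dt.weekday(),  # Monday is 0, Sunday is 6
        "hour": dt.hour,
    }


def group_rare(values, min_count=2, other="Other"):
    """Replace categories seen fewer than min_count times with one bucket."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


features = temporal_features("2025-02-21T14:30:00")
grouped = group_rare(["Firefox", "Core", "Firefox", "Toolkit"])
```

Grouping rare categories keeps the number of one-hot columns small when a field such as Product or Component has a long tail of infrequent values.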

Text Processing

Applied TF-IDF vectorization to bug summaries
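
To show what TF-IDF vectorization does to a summary, here is a small from-scratch sketch. It uses one common smoothed IDF formula with L2 normalization; the source does not state which library or settings were used, and the whitespace tokenizer and example summaries are assumptions for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real pipeline
    # would likely also strip punctuation and stop words.
    return text.lower().split()


def tfidf(documents):
    """Return (vocabulary, L2-normalized TF-IDF vectors)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    vocab = sorted(doc_freq)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        vectors.append([w / norm for w in weights])
    return vocab, vectors


# Hypothetical summaries, not taken from the dataset.
summaries = ["crash on startup", "add dark mode", "crash in dark mode"]
vocab, vectors = tfidf(summaries)
```

Terms that appear in few summaries (here "startup") receive a higher weight than terms shared across summaries (here "crash"), which is why distinctive wording becomes a strong signal for the classifier.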

Class Balancing

Used SMOTE to balance class distribution

Before SMOTE

After SMOTE
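
The class balancing step can be illustrated with a simplified from-scratch version of the SMOTE idea: each synthetic sample is a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function name, the parameters `k` and `seed`, and the toy points are assumptions for illustration; the source does not state the settings it used.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Simplified sketch; requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen sample.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_new=4)
```

Oversampling of this kind is normally applied to the training split only, so that the test set still reflects the original class distribution.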

Machine Learning Models

🥇

Stacking Classifier

94.34%

Ensemble Champion

🥈

Random Forest

91.73%

Tree Ensemble

🥉

Decision Tree

83.81%

Interpretable Model

Complete Model Comparison

Performance Analysis

Model Metrics

Model	Accuracy	Precision	Recall	F1-Score
Stacking Classifier	94.34%	94.2%	94.1%	94.15%
Random Forest	91.73%	91.5%	91.8%	91.65%
Decision Tree	83.81%	83.5%	83.9%	83.70%
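
As a quick consistency check on the table above, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall values; the figures are copied from the table, and the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from the table.
reported = {
    "Stacking Classifier": (94.2, 94.1, 94.15),
    "Random Forest": (91.5, 91.8, 91.65),
    "Decision Tree": (83.5, 83.9, 83.70),
}

for model, (precision, recall, f1) in reported.items():
    computed = f1_score(precision, recall)
    print(f"{model}: computed F1 = {computed:.2f}%, reported = {f1:.2f}%")
```

All three reported F1 values agree with the recomputed ones to two decimal places. The source does not say how per-class scores were averaged (for example macro or weighted), so this check only confirms that the columns are mutually consistent.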

Confusion Matrix

Key Insights & Impact

Ensemble Superiority

The Stacking Classifier reached 94.34% accuracy by combining the predictions of multiple base models, 2.61 percentage points above Random Forest (91.73%), the next-best model.
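
For readers unfamiliar with stacking, the sketch below shows the mechanism on toy data: base models are trained on part of the data, their out-of-fold predictions become the inputs of a meta-learner, and the meta-learner makes the final decision. Everything here is an assumption for illustration: the source does not name its base models or meta-learner, and the two base learners (nearest centroid, 1-nearest-neighbour) and the lookup-table meta-learner were chosen only to keep the example short.

```python
from collections import Counter, defaultdict


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroid(X, y):
    """Base learner 1: predict the class with the closest mean point."""
    groups = defaultdict(list)
    for point, label in zip(X, y):
        groups[label].append(point)
    centroids = {
        label: [sum(col) / len(pts) for col in zip(*pts)]
        for label, pts in groups.items()
    }
    return lambda p: min(centroids, key=lambda c: sq_dist(p, centroids[c]))


def fit_nearest_neighbour(X, y):
    """Base learner 2: predict the label of the closest training point."""
    data = list(zip(X, y))
    return lambda p: min(data, key=lambda item: sq_dist(p, item[0]))[1]


BASE_LEARNERS = [fit_centroid, fit_nearest_neighbour]


def fit_stacking(X, y, n_folds=3):
    # 1. Collect out-of-fold predictions from every base learner.
    meta_rows = []
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        held_out = [i for i in range(len(X)) if i % n_folds == fold]
        models = [
            fit([X[i] for i in train], [y[i] for i in train])
            for fit in BASE_LEARNERS
        ]
        for i in held_out:
            meta_rows.append((tuple(m(X[i]) for m in models), y[i]))
    # 2. Meta-learner: most frequent true label per combination of
    #    base predictions (a deliberately simple stand-in).
    votes = defaultdict(Counter)
    for preds, label in meta_rows:
        votes[preds][label] += 1
    table = {preds: c.most_common(1)[0][0] for preds, c in votes.items()}
    # 3. Refit the base learners on all training data.
    final_models = [fit(X, y) for fit in BASE_LEARNERS]

    def predict(p):
        preds = tuple(m(p) for m in final_models)
        # Unseen combination: fall back to the first base learner.
        return table.get(preds, preds[0])

    return predict


# Hypothetical 2-D feature vectors with two of the three bug types.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = ["defect", "defect", "defect", "task", "task", "task"]
predict = fit_stacking(X, y)
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for data the base models have already seen, is what keeps it from simply trusting overfitted base models.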

Text Features Critical

TF-IDF vectorization of bug summaries provided the most predictive power, because each bug type tends to use distinctive wording.

Balancing Essential

SMOTE improved performance by oversampling the minority classes, which corrected the severe class imbalance in the dataset.

Engineering Impact

Feature engineering and preprocessing (data cleaning, temporal features, and category grouping) improved both model performance and generalization.