Smelly Code Dataset for LLM-Based Code Analysis

Overview

This repository contains a collection of intentionally smelly code in four programming languages: Java, Python, JavaScript, and C++. The dataset is intended for evaluating how well Large Language Models (LLMs) detect and refactor common code smells.

Code smells included in this dataset belong to the following categories:

Bloaters (Large Classes, Long Methods, Primitive Obsession)
Couplers (Feature Envy, Inappropriate Intimacy)
Change Preventers (Divergent Change, Shotgun Surgery)
Dispensables (Duplicate Code, Lazy Class, Data Class)
Object-Oriented Abusers (Refused Bequest, Alternative Classes with Different Interfaces)
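
To make one of these categories concrete, here is a minimal sketch of a Coupler (Feature Envy) next to one possible refactoring. It is not taken from the dataset; the class names `Address`, `ShippingLabelSmelly`, `AddressRefactored`, and `ShippingLabel` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str
    postal_code: str


class ShippingLabelSmelly:
    """Feature Envy: this method reads only Address's fields, none of its own."""

    def format(self, address: Address) -> str:
        return f"{address.street}, {address.postal_code} {address.city}"


@dataclass
class AddressRefactored:
    street: str
    city: str
    postal_code: str

    def one_line(self) -> str:
        # Move Method: the formatting now lives next to the data it reads.
        return f"{self.street}, {self.postal_code} {self.city}"


class ShippingLabel:
    """After refactoring, the label simply delegates to the address."""

    def format(self, address: AddressRefactored) -> str:
        return address.one_line()
```

Both versions produce the same string; the refactored one keeps knowledge of the address layout in a single class, which is the kind of change a refactoring suggestion for this smell would be expected to propose.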

Purpose

The primary goal of this dataset is to assess how well LLMs such as GPT-4 and Code Llama can:

Detect various types of code smells.
Suggest effective refactoring strategies.
Improve code maintainability based on automated analysis.
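
One way the detection goal could be quantified is to compare the smells a model reports for a file against the smells that file was written to contain. The sketch below assumes such ground-truth labels are available as plain strings; the function name `score_detection` and the label format are hypothetical and do not describe this repository's actual evaluation procedure.

```python
def score_detection(expected: set[str], predicted: set[str]) -> dict[str, float]:
    """Compute precision, recall, and F1 for one file's predicted smell labels."""
    true_positives = len(expected & predicted)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for a single file, for illustration only.
expected = {"Long Method", "Feature Envy", "Data Class"}
predicted = {"Long Method", "Data Class", "Lazy Class"}
scores = score_detection(expected, predicted)
```

Here the model finds two of the three intended smells and reports one that was not intended, so precision and recall are both two thirds. Judging refactoring quality or maintainability gains would need a different measure and is not covered by this sketch.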

Project Structure

.vscode
Analysis
C++
Docs
Java
JavaScript
PlantUML
Prompts
Python
LICENSE
README.md
