HRI-EU/SmellyCodeDataset

Smelly Code Dataset for LLM-Based Code Analysis

Overview

This repository contains a collection of intentionally smelly code in four programming languages: Java, Python, JavaScript, and C++. The dataset is intended for evaluating how well Large Language Models (LLMs) detect and refactor common code smells.

Code smells included in this dataset belong to the following categories:

  • Bloaters (Large Class, Long Method, Primitive Obsession)
  • Couplers (Feature Envy, Inappropriate Intimacy)
  • Change Preventers (Divergent Change, Shotgun Surgery)
  • Dispensables (Duplicate Code, Lazy Class, Data Class)
  • Object-Oriented Abusers (Refused Bequest, Alternative Classes with Different Interfaces)
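To make one of these categories concrete, here is a minimal sketch of a Coupler smell (Feature Envy) and one possible refactoring of it. It is an illustration only and is not taken from the dataset files; the class names `Address`, `SmellyInvoice`, `RefactoredAddress`, and `RefactoredInvoice` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Address:
    """Plain data holder (hypothetical example class)."""
    street: str
    city: str
    postal_code: str


class SmellyInvoice:
    """Feature Envy: format_address works only with Address's fields."""

    def __init__(self, address: Address) -> None:
        self.address = address

    def format_address(self) -> str:
        # This method reaches into another object's data and uses
        # none of the invoice's own state.
        a = self.address
        return f"{a.street}, {a.postal_code} {a.city}"


@dataclass
class RefactoredAddress:
    """Same data, but the formatting behavior now lives with it."""
    street: str
    city: str
    postal_code: str

    def format(self) -> str:
        # Move Method: the logic sits next to the fields it reads.
        return f"{self.street}, {self.postal_code} {self.city}"


class RefactoredInvoice:
    def __init__(self, address: RefactoredAddress) -> None:
        self.address = address

    def format_address(self) -> str:
        # The invoice delegates instead of inspecting address fields.
        return self.address.format()
```

Both versions produce the same string; the refactored one simply keeps the formatting rule in the class that owns the data, so a change to the address layout touches one place.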

Purpose

The primary goal of this dataset is to assess how well LLMs such as GPT-4 and Code Llama can:

  • Detect various types of code smells.
  • Suggest effective refactoring strategies.
  • Improve code maintainability based on automated analysis.
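One way the detection goal above could be quantified is by comparing the smells a model reports for a file against the smells that file is known to contain. The sketch below assumes such labels are available as sets of smell names; the function `score_detection` and the label format are hypothetical and not part of this repository.

```python
def score_detection(expected: set[str], predicted: set[str]) -> dict[str, float]:
    """Compare predicted smell labels against expected ones.

    Both arguments are sets of smell names (an assumed label format).
    Returns precision, recall, and F1, each in the range 0.0 to 1.0.
    """
    true_positives = len(expected & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for a single file.
expected_smells = {"Feature Envy", "Long Method", "Data Class"}
model_output = {"Feature Envy", "Long Method", "Duplicate Code", "Lazy Class"}

scores = score_detection(expected_smells, model_output)
```

Set-based scoring like this ignores where in the file a smell occurs; a stricter evaluation would also need to match locations, which this sketch does not attempt.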

Project Structure
