A small duplicate-code metric for Ruby.
ruby-duplicates parses Ruby with the standard library Ripper, normalizes syntax trees so names and literal values do not dominate the comparison, fingerprints method subtrees, and reports methods with high Jaccard similarity.
It is inspired by Uncle Bob's dry4clj, which applies the same broad idea to Clojure code: compare normalized structure instead of doing plain text clone detection.
This is a metric tool, not a refactoring engine. It points at suspiciously similar methods so a human or coding agent can decide whether the duplication is accidental, intentional symmetry, or data-shaped boilerplate.
From RubyGems:
gem install ruby-duplicates
ruby-duplicates app lib test

From this repo:
bundle install
exe/ruby-duplicates app lib test

ruby-duplicates [options] [file-or-directory ...]

Examples:
ruby-duplicates app lib test
ruby-duplicates --threshold 0.9 --min-lines 5 --min-nodes 30 app
ruby-duplicates --json app/models app/controllers

Options:
--threshold N     Minimum similarity score, default 0.82
--min-lines N     Minimum method source lines, default 4
--min-nodes N     Minimum normalized syntax nodes, default 20
--max-results N   Maximum matches to print, default 50
--format F        text or json, default text
--json            Same as --format json
--ignore-dir N    Directory basename or path to skip; may be repeated

Example output:
ruby_duplicates candidates=3 matches=1 threshold=0.82
DUPLICATE score=1.00 shared=21
examples/duplicate_sample.rb:1-4 alpha nodes=64
examples/duplicate_sample.rb:7-10 beta nodes=64
For each Ruby method, the scanner:
- Parses the file with Ripper.sexp.
- Extracts def and defs method nodes.
- Normalizes identifiers, constants, instance variables, globals, labels, strings, and numbers into token classes.
- Normalizes most non-head symbols so tiny operator/name differences do not hide repeated shape.
- Fingerprints every normalized subtree with SHA1.
- Compares method fingerprint sets with Jaccard similarity.
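
The steps above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the gem's actual code: the helper names (method_nodes, normalize, fingerprints, jaccard), the TOKEN_CLASSES mapping, and the choice to replace every non-head symbol (the list above says "most") are all hypothetical.

```ruby
require "ripper"
require "digest/sha1"
require "set"

# Hypothetical mapping from Ripper scanner events to token classes.
TOKEN_CLASSES = {
  :@ident => :ID, :@const => :CONST, :@ivar => :IVAR, :@gvar => :GVAR,
  :@label => :LABEL, :@tstring_content => :STR, :@int => :NUM, :@float => :NUM
}.freeze

# Collect every :def and :defs node in a Ripper s-expression.
def method_nodes(node, found = [])
  return found unless node.is_a?(Array)
  found << node if node[0] == :def || node[0] == :defs
  node.each { |child| method_nodes(child, found) }
  found
end

# Replace scanner tokens such as [:@ident, "name", [line, col]] with a bare
# token class, and replace non-head symbols (operators and the like) with :SYM.
def normalize(node)
  return node unless node.is_a?(Array)
  if node[0].is_a?(Symbol) && node[0].to_s.start_with?("@")
    return [TOKEN_CLASSES.fetch(node[0], node[0])]
  end
  node.each_with_index.map do |child, index|
    child.is_a?(Symbol) && index > 0 ? :SYM : normalize(child)
  end
end

# Hash every array subtree of a normalized tree with SHA1.
def fingerprints(node, set = Set.new)
  return set unless node.is_a?(Array)
  set << Digest::SHA1.hexdigest(node.inspect)
  node.each { |child| fingerprints(child, set) }
  set
end

# Jaccard similarity: size of the intersection over size of the union.
def jaccard(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

SOURCE = <<~RUBY
  def alpha(items)
    total = 0
    items.each { |item| total += item.price }
    total
  end

  def beta(rows)
    sum = 1
    rows.each { |row| sum += row.cost }
    sum
  end

  def gamma(name)
    "hello " + name.to_s
  end
RUBY

sets = method_nodes(Ripper.sexp(SOURCE)).map { |m| fingerprints(normalize(m)) }
score = jaccard(sets[0], sets[1]) # alpha vs beta: same shape, different names
other = jaccard(sets[0], sets[2]) # alpha vs gamma: different shape
puts format("alpha/beta=%.2f alpha/gamma=%.2f", score, other)
```

In this sketch alpha and beta differ only in names and literal values, so their fingerprint sets coincide, while gamma shares only a few small subtrees with alpha and scores lower.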
The defaults intentionally favor high-signal matches. Lower --threshold, --min-lines, or --min-nodes when exploring.
- It only scans Ruby methods, not arbitrary repeated blocks.
- It is structural, not semantic.
- Metaprogrammed code can look sparse because the useful behavior is hidden in data.
- Rails controllers and tests can produce intentional symmetry. Treat those as review candidates, not automatic refactors.
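
To illustrate the "structural, not semantic" limitation, here is a small sketch using only Ripper. The shape helper is hypothetical and much cruder than the gem's normalization; it blanks every scanner token and every non-head symbol. Under that assumption, two methods that do opposite things reduce to the same tree.

```ruby
require "ripper"

# Hypothetical helper: keep only the shape of a Ripper tree.
# Scanner tokens become [:TOKEN]; non-head symbols (e.g. :+ or :-) become :SYM.
def shape(node)
  return node unless node.is_a?(Array)
  return [:TOKEN] if node[0].is_a?(Symbol) && node[0].to_s.start_with?("@")
  node.each_with_index.map do |child, index|
    child.is_a?(Symbol) && index > 0 ? :SYM : shape(child)
  end
end

credit = Ripper.sexp(<<~RUBY)
  def credit(account, amount)
    account.balance + amount
  end
RUBY

debit = Ripper.sexp(<<~RUBY)
  def debit(account, amount)
    account.balance - amount
  end
RUBY

# The raw trees differ, but their shapes are equal.
puts credit == debit
puts shape(credit) == shape(debit)
```

A match like this is a prompt for review: the two methods share a shape, but merging them would still need a human decision about whether the shared structure reflects a shared concept.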
ruby -Ilib test/ruby_duplicates_test.rb
gem build ruby-duplicates.gemspec

- Uncle Bob's dry4clj: https://github.com/unclebob/dry4clj