
EvilBit-Labs/libmagic-rs


libmagic-rs


A pure-Rust implementation of libmagic, the library that powers the file command for identifying file types. This project provides a memory-safe, efficient alternative to the C-based libmagic library.

Note: This is a clean-room implementation inspired by the original libmagic project. We respect and acknowledge the original work by Ian Darwin and the current maintainers led by Christos Zoulas.

Project Status

v0.5.0: The core file identification pipeline is functional, and common file types can already be identified using text magic files.

Warning: Pre-1.0 API. libmagic-rs is a pre-1.0 crate, and the public API may change between minor versions until v1.0.0 is released. Pin an exact version in Cargo.toml if you need reproducible builds, and read CHANGELOG.md before upgrading. See issue #52 for the v1.0 stability roadmap.

  • 1,200+ tests with >94% line coverage
  • Zero unsafe code (unsafe_code = "forbid" enforced project-wide)
  • Zero warnings with strict clippy linting
  • Published on crates.io

Features

  • Parse and evaluate text magic files (the stable, documented format)
  • Identify files via CLI (rmagic) or as a library dependency
  • Text and JSON output formats
  • Built-in fallback rules for 10 common formats (ELF, PE, ZIP, TAR, GZIP, JPEG, PNG, GIF, BMP, PDF)
  • Custom magic files via --magic-file
  • Memory-mapped I/O with bounds checking
  • Hierarchical rule evaluation with confidence scoring
  • Stdin support (rmagic -)

Supported Magic File Syntax

  • Types: byte, short, long, quad, float, double, string, pstring (with big/little-endian variants), unsigned variants (ubyte, ushort/ubeshort/uleshort, ulong/ubelong/ulelong, uquad/ubequad/ulequad), 32-bit dates (date/ldate/bedate/beldate/ledate/leldate), 64-bit dates (qdate/qldate/beqdate/beqldate/leqdate/leqldate), regex, and search/N
  • Regex: Binary-safe via regex::bytes::Regex. Flags: /c (case-insensitive), /s (anchor advances to the match start), /l (line-based scan window). Counts: regex/N (N bytes), regex/Nl (N lines). All variants are capped at 8192 bytes (FILE_REGEX_MAX). Compiled size is clamped to 1 MiB (size_limit + dfa_size_limit) to bound compile-time DoS exposure from adversarial patterns.
  • Search: Bounded literal scan via memchr::memmem::find. search/N scans the first N bytes from the offset; the range is mandatory (NonZeroUsize). After a match, relative-offset children are anchored at the match end (matching GNU file semantics).
  • Operators: =, !=, <, >, <=, >=, & (bitwise AND with optional mask), ^ (bitwise XOR), ~ (bitwise NOT), x (any value)
  • Offsets: Absolute, from-end, indirect, and relative (all fully evaluated; parsing the magic-file &+N/&-N syntax for relative offsets is still pending)
  • Directives: !:strength (parsed; !:mime, !:ext, and !:apple are planned)
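The bounded scans described for search/N and regex/N / regex/Nl can be sketched with the standard library alone. This is a hypothetical illustration, not libmagic-rs code: `search_bounded`, `regex_window`, and this `RegexCount` enum are names invented here, a naive `windows().position()` scan stands in for `memchr::memmem::find`, and the sketch assumes a search match must lie entirely inside the first N bytes. For regex it only computes the window a pattern would be allowed to see; the matching itself (via `regex::bytes::Regex`) is left out so the example needs no external crates.

```rust
use std::num::NonZeroUsize;

/// Every regex scan window is capped at this many bytes (FILE_REGEX_MAX above).
const FILE_REGEX_MAX: usize = 8192;

/// Hypothetical stand-in for a regex count: a byte budget or a line budget.
enum RegexCount {
    Bytes(usize),
    Lines(usize),
}

/// `search/N` sketch: look for `needle` within the first `range` bytes at `offset`.
/// Returns the offset just past the match end, where relative children would anchor.
fn search_bounded(buf: &[u8], offset: usize, range: NonZeroUsize, needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return None; // slice::windows(0) would panic
    }
    let rest = buf.get(offset..)?; // offset past the end of the buffer: no match
    let window = &rest[..range.get().min(rest.len())];
    window
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|start| offset + start + needle.len())
}

/// The slice of `buf` that a `regex/N` or `regex/Nl` rule at `offset` may scan.
fn regex_window(buf: &[u8], offset: usize, count: RegexCount) -> &[u8] {
    let rest = buf.get(offset..).unwrap_or(&[]);
    // Apply the global cap first, whichever count form was used.
    let capped = &rest[..rest.len().min(FILE_REGEX_MAX)];
    match count {
        RegexCount::Bytes(n) => &capped[..n.min(capped.len())],
        RegexCount::Lines(n) => {
            // Stop just after the n-th newline; otherwise fall back to the byte cap.
            let mut seen = 0;
            for (i, &b) in capped.iter().enumerate() {
                if b == b'\n' {
                    seen += 1;
                    if seen == n {
                        return &capped[..=i];
                    }
                }
            }
            capped
        }
    }
}
```

Clamping the window before any matching runs keeps the work proportional to N (and never above 8192 bytes for regex), no matter how large the input file is.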

Quick Start

Installation

cargo install libmagic-rs

CLI Usage

# Basic file identification
rmagic file.bin

# JSON output
rmagic file.bin --json

# Use built-in rules (no external magic file needed)
rmagic --use-builtin file.bin

# Custom magic file
rmagic --magic-file custom.magic file.bin

# Multiple files
rmagic file1.bin file2.bin file3.bin

# Read from stdin
cat file.bin | rmagic -

Library Usage

use libmagic_rs::MagicDatabase;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load magic rules from a text magic file
    let db = MagicDatabase::load_from_file("/usr/share/misc/magic")?;

    // Identify a file on disk
    let result = db.evaluate_file("example.bin")?;
    println!("File type: {}", result.description);
    println!("Confidence: {:.0}%", result.confidence * 100.0);

    // Or evaluate an in-memory buffer
    let buffer = std::fs::read("example.bin")?;
    let result = db.evaluate_buffer(&buffer)?;
    if let Some(mime) = result.mime_type {
        println!("MIME type: {}", mime);
    }

    // Or use built-in rules (no external magic file needed)
    let db = MagicDatabase::with_builtin_rules();
    let result = db.evaluate_file("example.bin")?;
    println!("File type: {}", result.description);

    Ok(())
}

Architecture

Magic File --> Parser --> AST --> Evaluator --> Match Results --> Output Formatter
                                      |
Target File --> Memory Mapper --> File Buffer

  • parser/: Magic file DSL parsing into an AST (nom-based)
  • evaluator/: Rule evaluation with offset resolution, type interpretation, and operator matching
  • output/: Text (GNU file compatible) and JSON formatting
  • io/: Memory-mapped file buffers with safe bounds checking
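A toy version of the evaluator's hierarchical walk and the text formatter's message joining might look like the following. `Rule`, `identify`, `join_messages`, and the fixed `MAX_DEPTH` are hypothetical simplifications (literal byte matches at absolute offsets only, no typed values or operators), and the sketch assumes two GNU file conventions: the first matching top-level rule wins, and a message that starts with `\b` (a backspace) is appended without a separating space.

```rust
/// Hypothetical fixed nesting limit; the README describes the real one as configurable.
const MAX_DEPTH: usize = 16;

/// Hypothetical toy rule: match literal bytes at an absolute offset.
struct Rule {
    offset: usize,
    expected: &'static [u8],
    message: &'static str,
    children: Vec<Rule>,
}

fn rule_matches(rule: &Rule, buf: &[u8]) -> bool {
    // `get` returns None for out-of-range slices, so a short file simply fails to match.
    rule.offset
        .checked_add(rule.expected.len())
        .and_then(|end| buf.get(rule.offset..end))
        == Some(rule.expected)
}

fn evaluate_children(rules: &[Rule], buf: &[u8], depth: usize, out: &mut Vec<&'static str>) {
    if depth > MAX_DEPTH {
        return; // resource limit: stop descending instead of recursing without bound
    }
    for rule in rules {
        // Children are only consulted when their parent matched, and every
        // matching sibling contributes its message.
        if rule_matches(rule, buf) {
            out.push(rule.message);
            evaluate_children(&rule.children, buf, depth + 1, out);
        }
    }
}

/// Join messages with spaces; a leading backspace suppresses the separator.
fn join_messages(parts: &[&str]) -> String {
    let mut s = String::new();
    for part in parts {
        match part.strip_prefix('\u{8}') {
            Some(rest) => s.push_str(rest),
            None => {
                if !s.is_empty() {
                    s.push(' ');
                }
                s.push_str(part);
            }
        }
    }
    s
}

fn identify(rules: &[Rule], buf: &[u8]) -> Option<String> {
    // At the top level, the first matching rule wins.
    let top = rules.iter().find(|r| rule_matches(r, buf))?;
    let mut parts = vec![top.message];
    evaluate_children(&top.children, buf, 1, &mut parts);
    Some(join_messages(&parts))
}
```

With a top-level `\x7fELF` rule whose children test the class byte at offset 4 and the data-encoding byte at offset 5, a buffer beginning `\x7fELF\x02\x01` would produce "ELF 64-bit LSB".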

Key Types

pub struct MagicRule {
    pub offset: OffsetSpec,     // Where to look in the file
    pub typ: TypeKind,          // How to interpret the bytes
    pub op: Operator,           // How to compare
    pub value: Value,           // What to compare against
    pub message: String,        // Output on match
    pub children: Vec<MagicRule>, // Nested sub-rules
    pub level: u32,             // Nesting depth
    pub strength_modifier: Option<StrengthModifier>,
}

pub enum TypeKind {
    Byte { signed: bool },
    Short { endian: Endianness, signed: bool },
    Long { endian: Endianness, signed: bool },
    Quad { endian: Endianness, signed: bool },
    Float { endian: Endianness },
    Double { endian: Endianness },
    Date { endian: Endianness, utc: bool },
    QDate { endian: Endianness, utc: bool },
    String { max_length: Option<usize> },
    PString { max_length: Option<usize>, length_width: PStringLengthWidth, length_includes_itself: bool },
    Regex { flags: RegexFlags, count: RegexCount },
    Search { range: NonZeroUsize },
    // See src/parser/ast.rs for the authoritative definition.
}

pub enum OffsetSpec {
    Absolute(i64),
    FromEnd(i64),
    Indirect { base_offset, pointer_type, adjustment, endian }, // field types omitted here
    Relative(i64),
}
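To illustrate how the four kinds of `OffsetSpec` might resolve to a concrete position, here is a simplified sketch. The `Offset` enum and `resolve` function are hypothetical: the sketch fixes the indirect pointer to a little-endian `u32`, treats `FromEnd(n)` as n bytes back from the end of the buffer, and anchors `Relative` at the end of the previous match, as GNU file does for `&N` offsets. Every step uses checked arithmetic and `slice::get`, so malformed or adversarial offsets yield `None` instead of panicking.

```rust
/// Hypothetical, simplified offset spec (the real OffsetSpec is shown above).
enum Offset {
    Absolute(usize),
    /// Sketch convention: n bytes back from the end of the buffer.
    FromEnd(usize),
    /// Read a little-endian u32 pointer at `base`, then add `adjustment`.
    IndirectLeU32 { base: usize, adjustment: i64 },
    /// Signed distance from the end of the previous match.
    Relative(i64),
}

fn read_u32_le(buf: &[u8], at: usize) -> Option<u32> {
    let b = buf.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Add a signed delta to an unsigned position, failing on under/overflow.
fn add_signed(base: usize, delta: i64) -> Option<usize> {
    if delta >= 0 {
        base.checked_add(delta as usize)
    } else {
        base.checked_sub(delta.unsigned_abs() as usize)
    }
}

fn resolve(spec: &Offset, buf: &[u8], last_match_end: usize) -> Option<usize> {
    let len = buf.len();
    let off = match *spec {
        Offset::Absolute(n) => n,
        Offset::FromEnd(n) => len.checked_sub(n)?,
        Offset::IndirectLeU32 { base, adjustment } => {
            // The pointer value comes from the file, so it is untrusted input.
            let ptr = read_u32_le(buf, base)? as usize;
            add_signed(ptr, adjustment)?
        }
        Offset::Relative(delta) => add_signed(last_match_end, delta)?,
    };
    // Positions past the end of the buffer are rejected rather than clamped.
    if off <= len {
        Some(off)
    } else {
        None
    }
}
```

The design choice worth noting is that an indirect offset reads a pointer out of the file being identified, so both the pointer read and the arithmetic on it must be checked before anything downstream uses the result.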

Compatibility

libmagic-rs follows the OpenBSD approach: parse text magic files directly, prioritizing simplicity and correctness. Text magic files are stable across libmagic versions and work unchanged from system installations (/usr/share/misc/magic).

Compatibility is validated against the original file project test suite.

Security

  • Memory Safety: unsafe_code = "forbid" enforced project-wide
  • Bounds Checking: All buffer accesses are bounds-checked
  • Resource Limits: Configurable recursion depth, string length, and per-file timeout
  • Fuzzing: Robustness testing with malformed inputs

Verifying Releases

All release artifacts are signed via Sigstore using GitHub Attestations:

gh attestation verify <artifact> --repo EvilBit-Labs/libmagic-rs

See the release verification guide for details.

Roadmap

See ROADMAP.md for the full roadmap, or GitHub Milestones for issue tracking.

  • v0.2.0 (shipped): Comparison operators, bitwise XOR/NOT, indirect/relative offsets, 64-bit integers
  • v0.3.0 (shipped): Regex, float/double, date/timestamp, pascal strings, meta-types
  • v0.4.0 (shipped): Evaluator submodule split, JSON metadata, parse warnings, improved errors
  • v0.5.x (current, in flight): TOCTOU/search-path hardening, regex compile cache, validated constructors
  • v0.6.0 (planned): Value pattern refactor, MagicDatabase builder, Directive extension point
  • v1.0.0 (planned): 95%+ GNU file compatibility, stable API, fuzzing harness, full non_exhaustive

Contributing

See CONTRIBUTING.md for development setup, coding guidelines, and submission process.

License

Licensed under the Apache License 2.0 - see LICENSE for details.

Acknowledgments

  • Ian Darwin for the original file command and libmagic
  • Christos Zoulas and the current libmagic maintainers
  • The Rust community for excellent tooling and ecosystem
