A pure-Rust implementation of libmagic, the library that powers the file command for identifying file types. This project provides a memory-safe, efficient alternative to the C-based libmagic library.
Note
This is a clean-room implementation inspired by the original libmagic project. We respect and acknowledge the original work by Ian Darwin and the current maintainers led by Christos Zoulas.
v0.5.0 -- The core file identification pipeline is functional. Common file types can be identified using text magic files today.
Warning
Pre-1.0 API. libmagic-rs is a pre-1.0 crate and the public API may change between minor versions until v1.0.0 is cut. Pin an exact version in Cargo.toml if you need reproducible builds, and read CHANGELOG.md before upgrading. See issue #52 for the v1.0 stability roadmap.
- 1,200+ tests with >94% line coverage
- Zero unsafe code (unsafe_code = "forbid" enforced project-wide)
- Zero warnings with strict clippy linting
- Published on crates.io
- Parse and evaluate text magic files (the stable, documented format)
- Identify files via CLI (rmagic) or as a library dependency
- Text and JSON output formats
- Built-in fallback rules for 10 common formats (ELF, PE, ZIP, TAR, GZIP, JPEG, PNG, GIF, BMP, PDF)
- Custom magic files via --magic-file
- Memory-mapped I/O with bounds checking
- Hierarchical rule evaluation with confidence scoring
- Stdin support (rmagic -)
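To make the idea of built-in fallback rules concrete, here is a minimal, std-only sketch of fixed-offset signature matching for the ten formats listed above. The `Signature` type and `identify` function are hypothetical names invented for this illustration; they are not the crate's built-in rule representation. Real rules check more than these leading bytes, and the two-byte signatures for PE and BMP in particular are deliberately simplified.

```rust
/// One fixed-offset byte signature.
/// Hypothetical type for illustration; not the crate's rule format.
struct Signature {
    offset: usize,
    magic: &'static [u8],
    name: &'static str,
}

/// Widely documented leading bytes for the ten formats named above.
const SIGNATURES: &[Signature] = &[
    Signature { offset: 0, magic: b"\x7fELF", name: "ELF" },
    Signature { offset: 0, magic: b"MZ", name: "PE" }, // DOS stub header
    Signature { offset: 0, magic: b"PK\x03\x04", name: "ZIP" },
    Signature { offset: 257, magic: b"ustar", name: "TAR" },
    Signature { offset: 0, magic: b"\x1f\x8b", name: "GZIP" },
    Signature { offset: 0, magic: b"\xff\xd8\xff", name: "JPEG" },
    Signature { offset: 0, magic: b"\x89PNG\r\n\x1a\n", name: "PNG" },
    Signature { offset: 0, magic: b"GIF8", name: "GIF" },
    Signature { offset: 0, magic: b"BM", name: "BMP" },
    Signature { offset: 0, magic: b"%PDF-", name: "PDF" },
];

/// Return the name of the first signature that matches `buf`, if any.
fn identify(buf: &[u8]) -> Option<&'static str> {
    SIGNATURES
        .iter()
        .find(|sig| {
            // checked_add and get() rule out overflow and out-of-bounds reads.
            sig.offset
                .checked_add(sig.magic.len())
                .and_then(|end| buf.get(sig.offset..end))
                .map_or(false, |window| window == sig.magic)
        })
        .map(|sig| sig.name)
}
```

Calling `identify(b"%PDF-1.7")` yields `Some("PDF")`, while a buffer that is too short or matches nothing yields `None` instead of panicking.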
| Category | Supported |
|---|---|
| Types | byte, short, long, quad, float, double, string, pstring (with big/little-endian variants), unsigned variants (ubyte, ushort/ubeshort/uleshort, ulong/ubelong/ulelong, uquad/ubequad/ulequad), 32-bit dates (date/ldate/bedate/beldate/ledate/leldate), 64-bit dates (qdate/qldate/beqdate/beqldate/leqdate/leqldate), regex, and search/N |
| Regex | Binary-safe via regex::bytes::Regex. Flags: /c (case-insensitive), /s (match-start anchor advance), /l (line-based scan window). Counts: regex/N (N bytes), regex/Nl (N lines). All variants capped at 8192 bytes (FILE_REGEX_MAX). Compile size is clamped to 1 MiB (size_limit + dfa_size_limit) to bound compile-time DoS exposure from adversarial patterns. |
| Search | Bounded literal scan via memchr::memmem::find. search/N scans the first N bytes from the offset; the range is mandatory (NonZeroUsize). Match-end anchor advance for relative-offset children (matches GNU file semantics). |
| Operators | =, !=, <, >, <=, >=, & (bitwise AND with optional mask), ^ (bitwise XOR), ~ (bitwise NOT), x (any value) |
| Offsets | Absolute, from-end, indirect, and relative (all fully evaluated; magic-file &+N/&-N parsing for relative is pending) |
| Directives | !:strength (parsed; !:mime, !:ext, !:apple planned) |
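As a rough illustration of how the endian-aware integer types and the comparison operators in the table fit together, the sketch below reads an unsigned 32-bit value at an offset and applies an operator to it. The `Endian` and `Op` enums and all three functions are hypothetical stand-ins, not the crate's API. The `&` behaviour shown (every bit set in the expected value must also be set in the file value) follows the magic(5) description and is an assumption about how this crate evaluates it; `^` and `~` are omitted for brevity.

```rust
/// Byte order of the value being read (hypothetical stand-in type).
#[derive(Clone, Copy)]
enum Endian {
    Big,
    Little,
}

/// A subset of the operators in the table (hypothetical stand-in type).
#[derive(Clone, Copy)]
enum Op {
    Eq,
    Ne,
    Lt,
    Gt,
    Le,
    Ge,
    BitAnd,
    Any,
}

/// Read an unsigned 32-bit value at `offset`, or None if it would run
/// past the end of the buffer.
fn read_u32(buf: &[u8], offset: usize, endian: Endian) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let b = buf.get(offset..end)?;
    let bytes = [b[0], b[1], b[2], b[3]];
    Some(match endian {
        Endian::Big => u32::from_be_bytes(bytes),
        Endian::Little => u32::from_le_bytes(bytes),
    })
}

/// Apply one operator to the value read from the file.
fn compare(actual: u32, op: Op, expected: u32) -> bool {
    match op {
        Op::Eq => actual == expected,
        Op::Ne => actual != expected,
        Op::Lt => actual < expected,
        Op::Gt => actual > expected,
        Op::Le => actual <= expected,
        Op::Ge => actual >= expected,
        // Assumption (magic(5) semantics): all bits of `expected` must be set.
        Op::BitAnd => (actual & expected) == expected,
        // `x` matches any value.
        Op::Any => true,
    }
}

/// A rule matches only when the read succeeds and the comparison holds.
fn rule_matches(buf: &[u8], offset: usize, endian: Endian, op: Op, expected: u32) -> bool {
    read_u32(buf, offset, endian).map_or(false, |v| compare(v, op, expected))
}
```

For example, `rule_matches(b"\x7fELF", 0, Endian::Big, Op::Eq, 0x7f45_4c46)` is true, and any read that would cross the end of the buffer makes the rule fail rather than panic.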
cargo install libmagic-rs

# Basic file identification
rmagic file.bin
# JSON output
rmagic file.bin --json
# Use built-in rules (no external magic file needed)
rmagic --use-builtin file.bin
# Custom magic file
rmagic --magic-file custom.magic file.bin
# Multiple files
rmagic file1.bin file2.bin file3.bin
# Read from stdin
cat file.bin | rmagic -

use libmagic_rs::MagicDatabase;
// Load magic rules from a text magic file
let db = MagicDatabase::load_from_file("/usr/share/misc/magic")?;
// Identify file type
let result = db.evaluate_file("example.bin")?;
println!("File type: {}", result.description);
println!("Confidence: {:.0}%", result.confidence * 100.0);
// Or evaluate an in-memory buffer
let buffer = std::fs::read("example.bin")?;
let result = db.evaluate_buffer(&buffer)?;
if let Some(mime) = result.mime_type {
println!("MIME type: {}", mime);
}
// Or use built-in rules (no external files needed)
let db = MagicDatabase::with_builtin_rules();
let result = db.evaluate_file("example.bin")?;

Magic File --> Parser --> AST --> Evaluator --> Match Results --> Output Formatter
                                      |
Target File --> Memory Mapper --> File Buffer
| Module | Purpose |
|---|---|
| parser/ | Magic file DSL parsing into AST (nom-based) |
| evaluator/ | Rule evaluation with offset resolution, type interpretation, operator matching |
| output/ | Text (GNU file compatible) and JSON formatting |
| io/ | Memory-mapped file buffers with safe bounds checking |
pub struct MagicRule {
    pub offset: OffsetSpec,       // Where to look in the file
    pub typ: TypeKind,            // How to interpret the bytes
    pub op: Operator,             // How to compare
    pub value: Value,             // What to compare against
    pub message: String,          // Output on match
    pub children: Vec<MagicRule>, // Nested sub-rules
    pub level: u32,               // Nesting depth
    pub strength_modifier: Option<StrengthModifier>,
}

pub enum TypeKind {
    Byte { signed: bool },
    Short { endian: Endianness, signed: bool },
    Long { endian: Endianness, signed: bool },
    Quad { endian: Endianness, signed: bool },
    Float { endian: Endianness },
    Double { endian: Endianness },
    Date { endian: Endianness, utc: bool },
    QDate { endian: Endianness, utc: bool },
    String { max_length: Option<usize> },
    PString {
        max_length: Option<usize>,
        length_width: PStringLengthWidth,
        length_includes_itself: bool,
    },
    Regex { flags: RegexFlags, count: RegexCount },
    Search { range: NonZeroUsize },
    // See src/parser/ast.rs for the authoritative definition.
}

pub enum OffsetSpec {
    Absolute(i64),
    FromEnd(i64),
    Indirect { base_offset, pointer_type, adjustment, endian }, // field types omitted here
    Relative(i64),
}

libmagic-rs follows the OpenBSD approach: parse text magic files directly, prioritizing simplicity and correctness. Text magic files are stable across libmagic versions and work unchanged from system installations (/usr/share/misc/magic).
Compatibility is validated against the original file project test suite.
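To show how a rule tree like `MagicRule` and offsets like `OffsetSpec` might be walked, here is a small self-contained sketch of offset resolution and hierarchical evaluation. `SketchOffset`, `SketchRule`, `resolve`, and `evaluate` are hypothetical, simplified stand-ins (literal byte comparison only, no indirect offsets), not the crate's evaluator. Two assumptions are baked in: a from-end offset is added to the buffer length, so negative values count backwards, and a relative offset is measured from where the parent match ended.

```rust
/// Simplified stand-in for the crate's OffsetSpec (Indirect is omitted).
enum SketchOffset {
    Absolute(i64),
    FromEnd(i64),
    Relative(i64),
}

/// Simplified stand-in for MagicRule: compares literal bytes only.
struct SketchRule {
    offset: SketchOffset,
    expected: &'static [u8],
    message: &'static str,
    children: Vec<SketchRule>,
}

/// Turn an offset spec into a concrete index inside a buffer of `len` bytes.
/// `anchor` is the position where the parent match ended.
fn resolve(spec: &SketchOffset, len: usize, anchor: usize) -> Option<usize> {
    // i128 is wide enough that these sums cannot overflow.
    let pos: i128 = match *spec {
        SketchOffset::Absolute(n) => n as i128,
        // Assumption: negative values count back from the end of the buffer.
        SketchOffset::FromEnd(n) => (len as i128) + (n as i128),
        SketchOffset::Relative(n) => (anchor as i128) + (n as i128),
    };
    if pos >= 0 && pos < (len as i128) {
        Some(pos as usize)
    } else {
        None
    }
}

/// Evaluate rules top-down; children run only when their parent matched.
fn evaluate(rules: &[SketchRule], buf: &[u8], anchor: usize, out: &mut Vec<&'static str>) {
    for rule in rules {
        let start = match resolve(&rule.offset, buf.len(), anchor) {
            Some(s) => s,
            None => continue,
        };
        let end = match start.checked_add(rule.expected.len()) {
            Some(e) => e,
            None => continue,
        };
        // get() returns None instead of panicking on a short buffer.
        if buf.get(start..end) == Some(rule.expected) {
            out.push(rule.message);
            // The match end becomes the anchor for relative-offset children.
            evaluate(&rule.children, buf, end, out);
        }
    }
}
```

With a parent rule matching `PK\x03\x04` at absolute offset 0 and a child at `Relative(0)`, the child is tested at byte 4, immediately after the parent's match, which mirrors the match-end anchor behaviour described for search/N.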
- Memory Safety: unsafe_code = "forbid" enforced project-wide
- Bounds Checking: All buffer access protected
- Resource Limits: Configurable recursion depth, string length, and per-file timeout
- Fuzzing: Robustness testing with malformed inputs
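The bounds-checking and resource-limit points above can be sketched with the standard library alone. The `Limits` struct, its field names, and both functions below are hypothetical and do not mirror the crate's configuration type; they only show the general shape of capping string reads and checking a depth and time budget.

```rust
use std::time::{Duration, Instant};

/// Hypothetical evaluation limits; field names are illustrative only.
struct Limits {
    max_depth: usize,
    max_string_len: usize,
    timeout: Duration,
}

/// Read a NUL-terminated string at `offset`, capped at `max_string_len`
/// bytes and never reading past the end of the buffer.
fn read_string<'a>(buf: &'a [u8], offset: usize, limits: &Limits) -> Option<&'a [u8]> {
    // get() yields None when `offset` is beyond the buffer.
    let tail = buf.get(offset..)?;
    let capped = &tail[..tail.len().min(limits.max_string_len)];
    let end = capped.iter().position(|&b| b == 0).unwrap_or(capped.len());
    Some(&capped[..end])
}

/// Fail when the nesting depth or the per-file time budget is exhausted.
fn check_budget(depth: usize, started: Instant, limits: &Limits) -> Result<(), &'static str> {
    if depth > limits.max_depth {
        return Err("recursion depth exceeded");
    }
    if started.elapsed() > limits.timeout {
        return Err("timeout exceeded");
    }
    Ok(())
}
```

An evaluator built this way would call `check_budget` before descending into child rules, so a deeply nested or adversarial magic file produces an error rather than unbounded work.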
All release artifacts are signed via Sigstore using GitHub Attestations:
gh attestation verify <artifact> --repo EvilBit-Labs/libmagic-rs

See the release verification guide for details.
See ROADMAP.md for the full roadmap, or GitHub Milestones for issue tracking.
| Milestone | Status | Focus |
|---|---|---|
| v0.2.0 | shipped | Comparison operators, bitwise XOR/NOT, indirect/relative offsets, 64-bit integers |
| v0.3.0 | shipped | Regex, float/double, date/timestamp, pascal strings, meta-types |
| v0.4.0 | shipped | Evaluator submodule split, JSON metadata, parse warnings, improved errors |
| v0.5.x (current) | in flight | TOCTOU/search-path hardening, regex compile cache, validated constructors |
| v0.6.0 | planned | Value pattern refactor, MagicDatabase builder, Directive extension point |
| v1.0.0 | planned | 95%+ GNU file compatibility, stable API, fuzzing harness, full non_exhaustive |
See CONTRIBUTING.md for development setup, coding guidelines, and submission process.
Licensed under the Apache License 2.0 - see LICENSE for details.
- Ian Darwin for the original file command and libmagic
- Christos Zoulas and the current libmagic maintainers
- The Rust community for excellent tooling and ecosystem