Structurally faithful development surrogates for tabular data.
masque turns a confidential tabular dataset -- a single table, a folder of
files, or a multi-sheet workbook -- into a structurally faithful synthetic
clone whose experimental design, NA pattern, and global covariance are close
enough to the original that pipeline code runs unchanged. It returns a private
recipe that round-trips: a pipeline written against the synthetic re-targets
to the original data with no source changes.
The custodian holds the data and the recipe; the analyst gets only the
synthetic. masque bridges that gap.
Version 0.6.0.9000 (development). Pre-CRAN; tagged releases on the GitHub repository.
From GitHub:
# install.packages("pak")
pak::pak("max578/masque")

A companion r-universe distribution will provide pre-built binaries once the registry is live:

install.packages("masque", repos = "https://max578.r-universe.dev")

CRAN submission is in preparation.
library(masque)
# Read a small public fixture (alpha-design field trial; John & Williams, 1995).
f <- system.file("extdata", "john_alpha.csv", package = "masque")
df <- read.csv(f, stringsAsFactors = TRUE)
# One guided call: read -> propose roles -> (review) -> mask -> audit.
# In an interactive session it pauses to let you review the plan.
m <- masque(df, mode = "collaborate", seed = 1L)
synth <- synthetic(m) # hand this to the analyst
rec <- recipe(m) # keep this private
# Analyst builds a pipeline against the synthetic namespace ...
fit <- lm(yield ~ gen + rep, data = synth)
# ... and the custodian re-targets it to the original data.
preds <- predict(fit, newdata = apply_recipe(df, rec))

A folder of files or a multi-sheet workbook works the same way -- pass the
path to masque() and it masks every table at once, aliasing shared keys
consistently so the synthetic tables still join.
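
To make the idea of consistent key aliasing concrete, here is a minimal base R sketch. It does not use masque and is not a description of its internals; the toy tables, the `alias_key()` helper, and the `k001`-style alias format are all hypothetical. It shows only why one shared alias map, applied to every table, keeps joins intact.

```r
# Illustrative only: base R, not masque's implementation.
# Two toy tables that share the key column `plot_id`.
plots  <- data.frame(plot_id = c("P1", "P2", "P3"),
                     site    = c("A", "A", "B"))
yields <- data.frame(plot_id = c("P1", "P2", "P2", "P3"),
                     yield   = c(4.1, 3.8, 4.0, 5.2))

# Build ONE alias map from the union of key values seen in any table.
key_values <- sort(unique(c(plots$plot_id, yields$plot_id)))
set.seed(1)
alias_map <- setNames(sprintf("k%03d", sample(length(key_values))), key_values)

# Hypothetical helper: replace a key column using the shared map.
alias_key <- function(tbl, key, map) {
  tbl[[key]] <- unname(map[as.character(tbl[[key]])])
  tbl
}

plots_syn  <- alias_key(plots,  "plot_id", alias_map)
yields_syn <- alias_key(yields, "plot_id", alias_map)

# Because both tables used the same map, the aliased tables still join.
joined_orig <- merge(plots, yields, by = "plot_id")
joined_syn  <- merge(plots_syn, yields_syn, by = "plot_id")
stopifnot(nrow(joined_orig) == nrow(joined_syn))
```

If each table were aliased with its own independent map, the same original key would receive different aliases in different tables and the join would silently drop rows.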
See vignette("getting_started", package = "masque") for the full
walk-through.
masque is not a privacy-preserving or differential-privacy tool. It is a
structurally faithful development surrogate with explicit confidentiality
guardrails. Read vignette("confidentiality", package = "masque") before
using.
What masque does
- Preserves enough structure for pipelines to run unchanged.
- Provides two explicit modes: local for owner-only realistic surrogates, and collaborate for controlled sharing with opaque aliasing, numeric jitter, and an automatic leakage audit.
- Records every translation (column names, factor levels) in a private recipe object that is, at minimum, as sensitive as the original data.
- Audits its own output (audit_mask()) and flags realistic leakage risks before sharing.
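
The following base R sketch illustrates what recording translations in a recipe can mean in the simplest case. The structure of `toy_recipe` and the helpers `to_alias()` and `to_original()` are hypothetical and are not masque's recipe format (see vignette("recipe_anatomy") for that). The point is that a lookup from aliases to original column names and factor levels is enough to translate in both directions, which is also why such an object is as sensitive as the data it describes.

```r
# Illustrative only: a toy "recipe" as two lookup tables, in base R.
orig <- data.frame(
  gen   = factor(c("G1", "G2", "G1", "G3")),
  yield = c(4.1, 3.8, 4.4, 5.0)
)

# Hypothetical recipe: alias -> original, for column names and factor levels.
toy_recipe <- list(
  columns = c(v1 = "gen", v2 = "yield"),
  levels  = list(v1 = c(a = "G1", b = "G2", c = "G3"))
)

# Forward translation: original namespace -> aliased namespace.
to_alias <- function(df, recipe) {
  out <- df
  for (alias in names(recipe$levels)) {
    col <- recipe$columns[[alias]]
    lv  <- recipe$levels[[alias]]
    out[[col]] <- factor(names(lv)[match(as.character(out[[col]]), lv)],
                         levels = names(lv))
  }
  names(out) <- names(recipe$columns)[match(names(out), recipe$columns)]
  out
}

# Reverse translation: aliased namespace -> original namespace.
to_original <- function(df, recipe) {
  out <- df
  for (alias in names(recipe$levels)) {
    lv <- recipe$levels[[alias]]
    out[[alias]] <- factor(unname(lv[as.character(out[[alias]])]),
                           levels = unname(lv))
  }
  names(out) <- unname(recipe$columns[names(out)])
  out
}

aliased  <- to_alias(orig, toy_recipe)
restored <- to_original(aliased, toy_recipe)

# The round trip recovers the original names and levels.
stopifnot(isTRUE(all.equal(restored, orig)))
```

Anyone holding both `aliased` and `toy_recipe` can reconstruct the original labels, so in this toy setting the two must never travel together.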
What masque does not do
- It does not provide differential-privacy guarantees.
- It does not make outputs safe for public release.
- It does not anonymise rare strata, small designs, or operational metadata (small site-by-year combinations, contact names, geolocations).
- It does not rewrite arbitrary pipeline source code.
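
Because rare strata are not anonymised, a custodian may want to count cell sizes before sharing anything. The sketch below is a plain base R check, not a masque function. The toy `trial` data, the `site` and `year` columns, and the `min_n` threshold are assumptions chosen for illustration.

```r
# Illustrative only: base R check for small strata; not part of masque.
trial <- data.frame(
  site = c("A", "A", "A", "B", "B", "C"),
  year = c(2020, 2020, 2021, 2020, 2020, 2021)
)

# Count rows in each site-by-year cell.
cells <- as.data.frame(table(site = trial$site, year = trial$year),
                       responseName = "n")

# Flag non-empty cells below a threshold (min_n is an arbitrary assumption).
min_n <- 2
small <- cells[cells$n > 0 & cells$n < min_n, ]
small
```

Any rows left in `small` identify combinations that could point to a single real unit even after column names and levels are aliased.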
Bottom line. The recipe is at least as sensitive as the original. Never share the recipe and the synthetic together. The collaborate workflow assumes only the synthetic crosses the trust boundary.
- vignette("getting_started"): the one-call path on a public fixture.
- vignette("confidentiality"): full threat model, the two modes, and the depth controls.
- vignette("recipe_anatomy"): what a recipe holds and how the round-trip re-targets a pipeline onto the original.
Reference index: https://max578.github.io/masque/ — full per-function
docs + rendered vignettes, deployed from the gh-pages branch.
API stability policy: see API_STABILITY.md.
citation("masque")

The package also ships a CITATION.cff file; GitHub renders a "Cite this
repository" widget on the repo landing page.
MIT. See LICENSE and LICENSE.md.