diff --git a/README.md b/README.md
index 929a2ce..3b2a584 100644
--- a/README.md
+++ b/README.md
@@ -5,6 +5,8 @@ Training materials for learning the [ChipFlow platform](https://build.chipflow.c
 ## What's included
 
 - **[Creating a Design for Use with the ChipFlow Platform](getting-started-design.md)** — how to create a new design from scratch for the ChipFlow platform
+- **[Hard-Macro Builds (`package = "block"`)](block-package.md)** — produce a LEF + Liberty + GDS + blackbox stub for a parent chip, instead of a packaged chip
+- **[Using Hard Macros (`load_blackbox_wrapper`)](using-hard-macros.md)** — instantiate NDA / third-party hard macros (or your own block builds) inside an Amaranth design
 - **[Wrapping External RTL](wrapping-external-rtl.md)** — integrating existing Verilog or SystemVerilog modules into an Amaranth design (manual `Instance`)
 - **[`RTLWrapper` (TOML-based wrapping)](rtl-wrapper.md)** — higher-level wrapper with auto-mapping and sv2v preprocessing
 - **[Wrapping CV32E40P](cv32e40p-example.md)** — worked example: wrapping the OpenHW Group RISC-V core with sv2v and `RTLWrapper`
@@ -37,6 +39,8 @@ chipflow-training/
 ├── Makefile                   # Build commands
 ├── pyproject.toml             # Python dependencies
 ├── getting-started-design.md  # How to write a new design
+├── block-package.md           # Hard-macro / IP block builds
+├── using-hard-macros.md       # Instantiating hard macros via load_blackbox_wrapper
 ├── wrapping-external-rtl.md   # Wrapping Verilog / SystemVerilog IP (manual Instance)
 ├── rtl-wrapper.md             # TOML-based RTLWrapper reference
 ├── cv32e40p-example.md        # CV32E40P worked example
diff --git a/block-package.md b/block-package.md
new file mode 100644
index 0000000..abf1037
--- /dev/null
+++ b/block-package.md
@@ -0,0 +1,93 @@
+# Hard-Macro Builds (`package = "block"`)
+
+A "block" build produces a **hard-macro deliverable** instead of a packaged chip — LEF, Liberty (`.lib`), GDS, and a blackbox Verilog stub that a parent chip design can instantiate.
+Use this when you're delivering reusable IP to be integrated into someone else's chip rather than building a standalone chip yourself.
+
+For a normal packaged-chip build, see [Creating a Design for Use with the ChipFlow Platform](getting-started-design.md). For instantiating a block (this build's output, or any third-party hard macro) inside another design, see [Using Hard Macros](using-hard-macros.md).
+
+---
+
+## When to use it
+
+- You want to **ship an IP block** (analog macro, hardened CPU/peripheral, NDA IP) that another team will instantiate in their chip.
+- You need a **physical implementation** (placed/routed, with timing models) rather than just RTL.
+- You don't want pads, package bringup, or fixed clock/reset/JTAG slot reservations — block builds skip all of that.
+
+If you just want to manufacture a chip, use a chip package (`pga144` etc.) instead.
+
+---
+
+## chipflow.toml
+
+```toml
+[chipflow]
+project_name = "my_block"
+
+[chipflow.top]
+soc = "design.design:MySoC"
+
+[chipflow.silicon]
+process = "ihp_sg13g2"
+package = "block"
+
+[chipflow.silicon.block]
+width = 50   # pin slots on the N and S edges
+height = 80  # pin slots on the W and E edges
+```
+
+`width` and `height` are **pin-slot counts** — how many signal pins fit along each edge. The backend converts them to physical microns using the process's pin pitch. Make them generous enough for every signal in your design (the build fails if you run out of slots).
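Before locking pins it can be worth estimating whether the slot counts cover your signal count. A minimal sketch, not part of the ChipFlow tooling, assuming one signal per slot and no reserved slots:

```python
def block_has_capacity(width: int, height: int, n_signals: int) -> bool:
    """Pre-flight estimate only: `width` pin slots on each of the N and S
    edges, `height` on each of the W and E edges, one signal per slot.
    The build itself enforces the real constraint, so treat a pass here
    as necessary, not sufficient."""
    total_slots = 2 * width + 2 * height
    return n_signals <= total_slots


# The chipflow.toml above (width=50, height=80) gives 260 perimeter slots,
# so a design with 200 signal pins fits comfortably.
assert block_has_capacity(50, 80, 200)
```

If this check fails, grow `width`/`height` before running `chipflow pin lock` rather than waiting for the build to fail.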
+
+## What's different vs a chip build
+
+| | Chip (`pga144`, …) | Block (`block`) |
+|---|---|---|
+| Pin numbering | Anti-clockwise from top-left | Per-edge `(side, index)` |
+| Bringup pins (clock/reset/JTAG/power at fixed slots) | Yes | No — clock and reset go through regular pins; power comes via straps from the parent |
+| Pad cells | Yes | No |
+| Floorplan | Fixed package size | Sized from `width`/`height` × pin pitch (or auto-promoted to fixed size if the perimeter dominates) |
+| Outputs | GDS | GDS + LEF + Liberty `.lib` + blackbox `.bb.v` |
+
+---
+
+## Build outputs
+
+After a successful block submission you can download:
+
+| File | What it is |
+|---|---|
+| `<project>.gds` | Final layout |
+| `<project>.lef` | Abstract view for the parent's place-and-route — pin locations + obstructions, with `USE POWER`/`USE GROUND` PINs at the boundary so the parent connects power by abutment |
+| `<project>_typ.lib` | Liberty timing model (typ corner) for the parent's STA |
+| `<project>.bb.v` | Blackbox Verilog stub — module declaration + ports + `(* blackbox *)` attribute, no implementation; what the parent's RTL imports |
+
+The stub looks like:
+
+```verilog
+(* blackbox *)
+module my_block (clk, rst_n, soc_pins_count_0, ...);
+  input clk;
+  input rst_n;
+  output soc_pins_count_0;
+  // ...
+endmodule
+```
+
+---
+
+## Submitting
+
+Identical to a chip build:
+
+```bash
+CHIPFLOW_ROOT=my_block uv run chipflow pin lock
+CHIPFLOW_ROOT=my_block uv run chipflow silicon submit
+```
+
+The platform detects `package = "block"` from the lockfile and runs the macro build flow (synth → floorplan → PDN → place → CTS → route → fill → GDS → abstract).
+
+---
+
+## Caveats
+
+- **Clock and reset go through ordinary pins** — there's no fixed `clk`/`rst_n` slot. They're declared like any other I/O in your design and end up on the perimeter wherever pin allocation places them.
+- **Power is by abutment.** The block emits M1 followpin stubs at the boundary as VDD/VSS LEF PINs; the parent must abut to those (or run straps over the macro). There are no dedicated power pads.
+- **No bringup harness.** JTAG, scan, and other bringup logic that a chip flow inserts automatically is **not** added. If your block needs them, instantiate them in your design.
diff --git a/chipflow-toml-reference.md b/chipflow-toml-reference.md
index 04e3512..f7051cd 100644
--- a/chipflow-toml-reference.md
+++ b/chipflow-toml-reference.md
@@ -28,9 +28,10 @@ Required for silicon builds.
 | Field | Type | Required | Default | Description |
 |-------|------|----------|---------|-------------|
 | `process` | enum | **Yes** | — | Target manufacturing process |
-| `package` | string | **Yes** | — | Package identifier (e.g. `"pga144"`) |
+| `package` | string | **Yes** | — | Package identifier (e.g. `"pga144"`, `"block"`) |
 | `power` | dict of voltages | No | `{}` | Power domain voltages |
 | `debug` | dict of booleans | No | `None` | Debug configuration flags |
+| `block` | table | No (Yes when `package = "block"`) | `None` | Per-project block dimensions for hard-macro builds — see [`[chipflow.silicon.block]`](#chipflowsiliconblock) |
 
 **Allowed `process` values:**
@@ -42,6 +43,51 @@ Required for silicon builds.
 
 ---
 
+### `[chipflow.silicon.block]`
+
+Required when `package = "block"`, ignored otherwise. Used for hard-macro deliverables — see [Hard-Macro Builds](block-package.md).
+
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+| `width` | integer | **Yes** | — | Pin slots on the N and S edges |
+| `height` | integer | **Yes** | — | Pin slots on the W and E edges |
+
+```toml
+[chipflow.silicon]
+process = "ihp_sg13g2"
+package = "block"
+
+[chipflow.silicon.block]
+width = 50
+height = 80
+```
+
+`width`/`height` are pin-slot counts, not microns — the backend translates them using the process's pin pitch.
+
+---
+
+### `[chipflow.silicon.macros]`
+
+Optional. Declares hard macros (NDA SRAMs, vendor IP, PLLs, blocks produced by an earlier `package = "block"` build) for inclusion in the build. See [Using Hard Macros](using-hard-macros.md).
+
+Each entry is keyed by a **logical name** (used from Python as `load_blackbox_wrapper("<name>", ...)`) and points at a `*.blackbox.json` produced by [`macrostrip`](https://github.com/ChipFlow/macrostrip):
+
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+| `blackbox` | path | **Yes** | — | Path to a `*.blackbox.json` describing the macro. Relative paths resolve against `CHIPFLOW_ROOT`. |
+
+```toml
+[chipflow.silicon.macros.sram_64x64]
+blackbox = "vendor/ihp/sram_64x64.blackbox.json"
+
+[chipflow.silicon.macros.pll_core]
+blackbox = "vendor/pll/pll_core.blackbox.json"
+```
+
+The blackbox JSON itself carries paths to companion artifacts (LEF, Liberty, frame-view or real GDS, Verilog stub), interpreted relative to the JSON's own directory. At submit time those artifacts are packed into `bundle.zip` under `macros/<name>/`.
+
+---
+
 ### `[chipflow.simulation]`
 
 | Field | Type | Required | Default | Description |
diff --git a/training-commands.md b/training-commands.md
index cfbe435..dc07072 100644
--- a/training-commands.md
+++ b/training-commands.md
@@ -92,11 +92,13 @@
 gh auth login
 ```
 
 Follow the prompts to authenticate with your GitHub account.
 
-**Important:** After logging in, add the `user` scope (required for ChipFlow to read your email):
+**Important:** After logging in, add the `user:email` scope (required for ChipFlow to read your verified email):
 
 ```bash
-gh auth refresh -h github.com -s user
+gh auth refresh -h github.com -s user:email
 ```
 
+> A later `gh auth refresh` (or VS Code re-auth) without `-s user:email` can silently drop this scope. If a previously working `chipflow auth login` starts complaining about authentication, re-run the command above.
+
 ---
 
 ## Part 1: Clone and Set Up
@@ -248,9 +250,26 @@ rm upcounter/pins.lock
 CHIPFLOW_ROOT=upcounter uv run chipflow pin lock
 ```
 
+### Inspecting and rearranging the allocation
+
+After `chipflow pin lock` you can inspect or tweak the result without hand-editing JSON:
+
+```bash
+# Show the allocation as a text table (works for any package type)
+CHIPFLOW_ROOT=upcounter uv run chipflow pin show
+
+# Render an SVG layout (Quad/Block packages only) and write it to a file
+CHIPFLOW_ROOT=upcounter uv run chipflow pin show -f svg -o upcounter-pinout.svg
+
+# Swap two pin assignments — useful for matching a board layout
+CHIPFLOW_ROOT=upcounter uv run chipflow pin swap 17 42
+```
+
+`pin swap` operates on integer-pin packages (Quad, Block) and refuses to move bringup pins (clock, reset, JTAG, power) — those stay at fixed slots so board bringup remains predictable.
+
 ### Notes
 
-- Pin assignment is **automatic** — you cannot manually assign specific signals to specific pins.
+- Pin assignment is **automatic** — you cannot manually assign specific signals to specific pins, but you can rearrange the result with `chipflow pin swap`.
 - The default package is `pga144`, which has 144 total pins. Some are reserved for system use (clock, reset, JTAG, power rails), leaving roughly 120 pins available for your design's I/O. Other packages can be added by request — contact the ChipFlow team.
 - Pins are numbered anti-clockwise starting from pin 1 at the top-left corner.
 - The `pins.lock` file should be **committed to version control** so that everyone on the team works with the same pinout.
@@ -311,6 +330,8 @@ After `chipflow silicon prepare`, local build outputs are in:
 ls upcounter/build/
 ```
 
+`chipflow silicon submit` packs the design RTLIL, `pins.lock`, and a `manifest.json` into a single `bundle.zip` next to the RTLIL — that's what gets uploaded to the platform. Useful to know if you're inspecting the build folder or want a self-contained artifact you can replay later.
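Since the bundle is a plain zip archive, you can peek inside it with nothing but the Python standard library. A small sketch (the bundle path in the comment is an assumption; it sits wherever your build placed the RTLIL):

```python
import zipfile


def list_bundle(path: str) -> list[str]:
    """Return the sorted member names of a submission bundle.

    The bundle is an ordinary zip archive, so no ChipFlow tooling is
    needed just to look inside it.
    """
    with zipfile.ZipFile(path) as zf:
        return sorted(zf.namelist())


# e.g. list_bundle("upcounter/build/bundle.zip") should show manifest.json,
# the RTLIL, and pins.lock.
```

This is handy for confirming exactly what will be uploaded before you run `chipflow silicon submit`, or for archiving a build input you may want to replay.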
+
 ---
 
 ## Part 6: Clean Up
@@ -345,6 +366,8 @@ make clean
 
 | Command | What it does |
 |---------|-------------|
 | `chipflow pin lock` | Generate deterministic pin assignments (pins.lock) |
+| `chipflow pin show [-f text\|svg] [-o FILE]` | Display the current allocation (SVG for Quad/Block packages) |
+| `chipflow pin swap <A> <B>` | Exchange two pin assignments in pins.lock (Quad/Block packages, non-bringup pins) |
 | `chipflow silicon prepare` | Synthesise design to RTLIL locally |
 | `chipflow silicon submit` | Submit RTLIL to cloud platform for backend build |
 | `chipflow silicon submit --wait` | Submit and stream build logs until complete |
@@ -405,13 +428,24 @@ Other processes may be available on request. If your target process is not yet s
 
 ---
 
+## Build modes
+
+| Mode | `package` value | Output | Doc |
+|------|----------------|--------|-----|
+| Packaged chip | `pga144`, … | GDS for fab | [Creating a Design](getting-started-design.md) |
+| Hard macro / IP block | `block` | GDS + LEF + Liberty `.lib` + blackbox `.bb.v` | [Hard-Macro Builds](block-package.md) |
+
+---
+
 ## Troubleshooting
 
 ### "Authentication failed" or "Could not retrieve email from GitHub"
 
+If chipflow-lib prints `Your gh CLI token is missing the user:email scope`, your `gh` token has lost the scope (commonly after `gh auth refresh` without `-s user:email`, or a VS Code re-auth):
+
 ```bash
-# Ensure gh has the user scope (required for ChipFlow to read your email)
-gh auth refresh -h github.com -s user
+# Add the user:email scope back
+gh auth refresh -h github.com -s user:email
 
 # Then re-login with ChipFlow
 CHIPFLOW_ROOT=upcounter uv run chipflow auth login
diff --git a/using-hard-macros.md b/using-hard-macros.md
new file mode 100644
index 0000000..4258d90
--- /dev/null
+++ b/using-hard-macros.md
@@ -0,0 +1,191 @@
+# Using Hard Macros (`load_blackbox_wrapper`)
+
+`chipflow.rtl.load_blackbox_wrapper` instantiates a hard macro (SRAM, PLL, vendor IP, an NDA cell, or a block produced by a previous
+[`package = "block"`](block-package.md) build) inside an Amaranth design. The macro's physical artifacts — LEF, Liberty, frame-view or real GDS, Verilog stub — travel with the submission so the platform's place-and-route can integrate them without exposing layout to anyone who shouldn't see it.
+
+For producing a hard macro, see **[Hard-Macro Builds](block-package.md)**. For wrapping plain Verilog/SystemVerilog RTL, see **[Wrapping External RTL](wrapping-external-rtl.md)** or **[`RTLWrapper`](rtl-wrapper.md)**.
+
+---
+
+## How it fits together
+
+Hard-macro integration uses two tools:
+
+1. **[`macrostrip`](https://github.com/ChipFlow/macrostrip)** — runs once per macro to produce a `*.blackbox.json` describing the macro (pin list, physical dimensions, paths to LEF/Liberty/GDS/stub). For NDA macros, `macrostrip frame` first replaces the real GDS with a frame view that has the same boundary and pin geometry but no internal layout.
+2. **chipflow-lib** — `load_blackbox_wrapper` reads the JSON and gives you an Amaranth `wiring.Component` you instantiate like any other submodule. At submit time, the macro's companion files are packed into the submission `bundle.zip` under `macros/<name>/`.
+
+The `*.blackbox.json` is the single contract between the two tools — once it exists, the chipflow side doesn't care where it came from.
+
+---
+
+## Declaring a macro in `chipflow.toml`
+
+Each macro is given a **logical name** (the key you'll use from Python) and pointed at its blackbox JSON:
+
+```toml
+[chipflow.silicon.macros.sram_64x64]
+blackbox = "vendor/ihp/sram_64x64.blackbox.json"
+
+[chipflow.silicon.macros.pll_core]
+blackbox = "vendor/pll/pll_core.blackbox.json"
+```
+
+`blackbox` is a path relative to `CHIPFLOW_ROOT`.
+Companion-file paths inside the JSON are interpreted relative to the JSON's own directory, so a typical layout looks like:
+
+```
+my_design/
+├── chipflow.toml
+├── design/design.py
+└── vendor/ihp/
+    ├── sram_64x64.blackbox.json
+    ├── sram_64x64.lef
+    ├── sram_64x64.lib
+    ├── sram_64x64.gds   # frame-view for NDA, real GDS otherwise
+    └── sram_64x64.v     # blackbox Verilog stub
+```
+
+---
+
+## Instantiating from Python
+
+```python
+from amaranth import Module
+from amaranth.lib import wiring
+from chipflow.rtl import load_blackbox_wrapper
+
+
+class MyDesign(wiring.Component):
+    # ... signature omitted
+
+    def elaborate(self, platform):
+        m = Module()
+
+        m.submodules.sram = sram = load_blackbox_wrapper(
+            "sram_64x64",
+            clocks={"sys": "CLK"},
+            resets={"sys": "RST_N"},
+        )
+
+        # sram.signature has one member per signal pin (In(width) / Out(width)).
+        # Power and ground pins are handled by the platform — not visible here.
+        m.d.sync += sram.A.eq(addr)
+        m.d.sync += sram.D.eq(write_data)
+        m.d.comb += read_data.eq(sram.Q)
+
+        return m
+```
+
+The returned `BlackboxWrapper` is a `wiring.Component` whose signature mirrors the macro's **signal pins**:
+
+- Direction `in` → `In(width)`
+- Direction `out` → `Out(width)`
+- Power, ground, clock, and reset pins are **omitted from the signature** — clocks and resets are wired via the `clocks=` / `resets=` arguments, and power/ground are connected at the platform/PDN level.
+- `inout` signal pins are not auto-wrapped; declare them explicitly if needed.
+
+`clocks` / `resets` map an Amaranth clock-domain name to the macro's pin name (the LEF pin, verbatim — typically uppercase). `RST_N` is wired with active-low semantics, matching `RTLWrapper`.
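To make the contract concrete, a `*.blackbox.json` along these lines would drive the wrapper. Note this is a hedged sketch: `frame_gds` is the only field name this training material confirms, and every other key, plus the exact pin encoding, is an illustrative assumption. Consult the JSON that `macrostrip` actually emits for the real schema.

```json
{
  "name": "sram_64x64",
  "frame_gds": "sram_64x64.gds",
  "lef": "sram_64x64.lef",
  "liberty": "sram_64x64.lib",
  "verilog_stub": "sram_64x64.v",
  "pins": [
    {"name": "CLK",   "direction": "in",  "use": "clock"},
    {"name": "RST_N", "direction": "in",  "use": "reset"},
    {"name": "A",     "direction": "in",  "width": 6},
    {"name": "D",     "direction": "in",  "width": 64},
    {"name": "Q",     "direction": "out", "width": 64}
  ]
}
```

Whatever the precise spelling, the shape matters: companion-file paths resolve relative to the JSON, and the pin list is what the wrapper turns into a `wiring.Signature`.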
+
+---
+
+## What gets uploaded
+
+When you run `chipflow silicon submit`, every macro you instantiated is packed into `bundle.zip` alongside the RTLIL and `pins.lock`:
+
+```
+bundle.zip
+├── manifest.json
+├── top.il
+├── pins.lock
+└── macros/
+    └── sram_64x64/
+        ├── sram_64x64.lef
+        ├── sram_64x64.lib
+        ├── sram_64x64.gds
+        ├── sram_64x64.v
+        └── sram_64x64.blackbox.json
+```
+
+The manifest carries a `"macros"` dict keyed by logical name with `_file`-suffixed paths to each artifact, so the backend can locate them without parsing the JSON itself. The platform feeds them to OpenROAD as `ADDITIONAL_LEFS` / `ADDITIONAL_LIBS` / `ADDITIONAL_GDS_FILES`.
+
+---
+
+## Security model
+
+There are two tiers of protection for the macro layout, depending on how strict your IP-handling requirements are. Both use the same `load_blackbox_wrapper` API — the difference is only what gets bundled into the submission.
+
+### Tier 1 — Real layout never leaves your premises
+
+For NDA macros (vendor IP, foundry SRAMs, anything where the agreement bars *any* external transmission of the GDS):
+
+- Use `macrostrip frame` to produce a **frame-view GDS** — same boundary and pin geometry, **no internal layout**. Roughly the GDS analogue of a LEF abstract.
+- The submission carries the frame view + LEF + Liberty + Verilog stub — the artifacts vendor NDAs typically permit sharing with foundries and EDA tools anyway.
+- After the build returns, `macrostrip swap` substitutes the **real** GDS back in locally before tape-out.
+- The cloud sees, stores, and processes only abstracts. The real layout never reaches it.
+
+This is the right tier when the customer's NDA explicitly forbids external transmission of the macro GDS, or when "real GDS in a third-party cloud" is itself the concern.
+
+### Tier 2 — Real layout goes to the cloud, isolated within ChipFlow's tenancy
+
+For internal IP, your own block builds, open-source macros, or vendor IP whose NDA permits use in cloud EDA services:
+
+- Skip `macrostrip frame` and point `macrostrip blackbox` at the real GDS directly.
+- The real layout travels in `bundle.zip` and is processed inside ChipFlow's cloud environment. It's isolated per customer (auth-scoped storage, internal access controls), not shared with other customers, and not transmitted to third parties.
+- Common case: the layout shouldn't end up in competitors' hands, but trusting ChipFlow as a vendor is fine — the same trust posture you'd extend to a hosted EDA tool.
+
+For ChipFlow's specific data-handling commitments (tenancy boundaries, retention, internal access, sub-processors, audit), see ChipFlow's published policy or contact the team.
+
+### Choosing a tier
+
+| You want… | Use |
+|---|---|
+| Real GDS never on a third-party server | Tier 1 (frame workflow) |
+| Real GDS not exposed to other customers or third parties, but ChipFlow as vendor is OK | Tier 2 (real GDS) |
+| Macro is open-source / public IP | Tier 2 (or skip the security framing entirely) |
+
+Tier 2 is the simpler workflow (no frame/swap step). Tier 1 is the right answer whenever the contract requires it — when in doubt, ask the IP owner what artifacts they permit you to send to a cloud EDA service.
+
+---
+
+## NDA vs non-NDA workflows
+
+The same `load_blackbox_wrapper` path serves both tiers above. The difference is purely how you produce the JSON.
+
+**NDA macros** (you've signed something — vendor IP, foundry SRAMs you can't redistribute):
+
+```bash
+# 1. Strip the real GDS down to a frame view (same boundary + pin geometry, no internal layout).
+macrostrip frame --gds real.gds --top SRAM_64X64 -o sram_64x64.gds
+
+# 2. Build the blackbox JSON pointing at the frame GDS.
+macrostrip blackbox \
+  --lef sram_64x64.lef --top SRAM_64X64 \
+  --frame-gds sram_64x64.gds \
+  --liberty sram_64x64.lib \
+  --verilog-stub sram_64x64.v \
+  -o sram_64x64.blackbox.json
+
+# 3. After the build returns, swap the real GDS back into the result before tape-out.
+macrostrip swap --result build.gds --real real.gds --top SRAM_64X64 -o build.final.gds
+```
+
+The frame view is what travels to ChipFlow; the real layout never leaves your premises.
+
+**Non-NDA macros** (anything you're free to ship in full — your own block from a previous `package = "block"` build, open-source IP, etc.):
+
+```bash
+# Skip the frame step — point macrostrip at the real GDS directly.
+macrostrip blackbox \
+  --lef macro.lef --top MY_MACRO \
+  --frame-gds macro.real.gds \
+  --liberty macro.lib \
+  --verilog-stub macro.v \
+  -o macro.blackbox.json
+```
+
+The schema field is named `frame_gds` for historical reasons but chipflow-lib treats it as "the GDS to include" — frame-view or real, the submission path is identical. Skip `macrostrip swap` on return: there's nothing to substitute back.
+
+---
+
+## See also
+
+- **[Hard-Macro Builds](block-package.md)** — produce a macro using ChipFlow (the inverse operation).
+- **[macrostrip](https://github.com/ChipFlow/macrostrip)** — the tool that produces the `*.blackbox.json`.
+- **[`RTLWrapper`](rtl-wrapper.md)** — wrapping ordinary Verilog/SystemVerilog RTL (no physical artifacts).