36 changes: 21 additions & 15 deletions skills/openrouter-images/SKILL.md
@@ -36,37 +36,47 @@ Create a new image from a text prompt:
```bash
cd <skill-path>/scripts && npx tsx generate.ts "a red panda wearing sunglasses"
cd <skill-path>/scripts && npx tsx generate.ts "a futuristic cityscape at night" --aspect-ratio 16:9
cd <skill-path>/scripts && npx tsx generate.ts "pixel art of a dragon" --output dragon.png
cd <skill-path>/scripts && npx tsx generate.ts "pixel art of a dragon" --output dragon
cd <skill-path>/scripts && npx tsx generate.ts "a watercolor painting" --model google/gemini-2.5-flash-image
cd <skill-path>/scripts && npx tsx generate.ts "red logo on white" --model recraft-ai/recraft-v3-svg --rgb-colors "220,30,30"
```

### Options

| Flag | Description | Default |
|---|---|---|
| `--model <id>` | OpenRouter model ID | `google/gemini-3.1-flash-image-preview` |
| `--output <path>` | Output file path | `image-YYYYMMDD-HHmmss.png` |
| `--output <stem>` | Output path stem (extension auto-derived from MIME type) | `image-YYYYMMDD-HHmmss` |
| `--aspect-ratio <r>` | Aspect ratio (e.g. `16:9`, `1:1`, `4:3`) | Model default |
| `--image-size <s>` | Image size (e.g. `1K`, `2K`) | Model default |
| `--rgb-colors <list>` | Semicolon-separated RGB palette, e.g. `255,0,0;0,128,0` (Recraft) | — |
| `--background-rgb-color <rgb>` | Background color as `r,g,b` (Recraft) | — |
| `--strength <0-1>` | Influence strength for style/color transfer (Recraft) | — |
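
The palette flags in the table take `r,g,b` triplets, with `--rgb-colors` joining several triplets by semicolons. A minimal sketch of how such a string could be validated and split is shown here. The helper names `parseTripletSketch` and `parsePaletteSketch` are hypothetical, and the 0-255 integer check is an assumption; the skill's own `parseRgbColors` and `parseRgbTriplet` in `lib.ts` may behave differently.

```typescript
// Hypothetical helpers for illustration; the real parseRgbColors and
// parseRgbTriplet in lib.ts are not shown here and may behave differently.
type Rgb = [number, number, number];

function parseTripletSketch(raw: string): Rgb {
  const parts = raw.split(",").map((p) => p.trim());
  if (parts.length !== 3) {
    throw new Error(`Expected "r,g,b" but got "${raw}"`);
  }
  const channels = parts.map((p) => {
    // Assumption: accept only plain decimal integers in the 0-255 range.
    if (!/^\d{1,3}$/.test(p) || Number(p) > 255) {
      throw new Error(`Invalid colour channel "${p}" in "${raw}"`);
    }
    return Number(p);
  });
  return [channels[0], channels[1], channels[2]];
}

function parsePaletteSketch(raw: string): Rgb[] {
  return raw
    .split(";")
    .map((t) => t.trim())
    .filter((t) => t.length > 0) // tolerate a trailing semicolon
    .map(parseTripletSketch);
}

// Yields two triplets: [255, 0, 0] and [0, 128, 0].
console.log(parsePaletteSketch("255,0,0;0,128,0"));
```

Rejecting malformed input before the request is sent keeps errors local to the command line instead of surfacing as an API failure.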

**Output extension:** The file extension (`.png`, `.jpg`, `.webp`, `.svg`, etc.) is derived automatically from the MIME type returned by the model. If you pass `--output dragon`, the saved file might be `dragon.png` or `dragon.svg` depending on the model.
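
As a rough illustration of the extension lookup described above, the sketch below maps a MIME type to a file extension and reads the MIME type out of a `data:` URL. The names `MIME_TO_EXT`, `extensionForMime`, and `mimeFromDataUrl` are hypothetical, and the `.png` fallback for unknown types is an assumption rather than documented behavior of `saveImage`.

```typescript
// Hypothetical lookup table; the set of types the skill actually handles
// lives in lib.ts and may differ.
const MIME_TO_EXT: Record<string, string> = {
  "image/png": ".png",
  "image/jpeg": ".jpg",
  "image/webp": ".webp",
  "image/gif": ".gif",
  "image/svg+xml": ".svg",
};

// Map a MIME type (possibly with parameters such as "; charset=utf-8")
// to an extension. Assumption: unknown or missing types fall back to ".png".
function extensionForMime(mime: string | null | undefined): string {
  if (!mime) return ".png";
  const base = mime.split(";")[0].trim().toLowerCase();
  return MIME_TO_EXT[base] ?? ".png";
}

// Pull the MIME type out of a data: URL prefix, e.g. "data:image/webp;base64,...".
function mimeFromDataUrl(dataUrl: string): string | undefined {
  const match = /^data:([^;,]+)[;,]/.exec(dataUrl);
  return match ? match[1] : undefined;
}

const stem = "dragon";
const dataUrl = "data:image/svg+xml;base64,PHN2Zy8+";
console.log(stem + extensionForMime(mimeFromDataUrl(dataUrl))); // dragon.svg
```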

## Edit Image

Modify an existing image with a text prompt:

```bash
cd <skill-path>/scripts && npx tsx edit.ts photo.png "make the sky purple"
cd <skill-path>/scripts && npx tsx edit.ts avatar.jpg "add a party hat" --output avatar-hat.png
cd <skill-path>/scripts && npx tsx edit.ts avatar.jpg "add a party hat" --output avatar-hat
cd <skill-path>/scripts && npx tsx edit.ts scene.png "convert to watercolor style" --model google/gemini-2.5-flash-image
cd <skill-path>/scripts && npx tsx edit.ts logo.png "recolor in red palette" --rgb-colors "220,30,30;180,20,20"
```

### Options

| Flag | Description | Default |
|---|---|---|
| `--model <id>` | OpenRouter model ID | `google/gemini-3.1-flash-image-preview` |
| `--output <path>` | Output file path | `image-YYYYMMDD-HHmmss.png` |
| `--output <stem>` | Output path stem (extension auto-derived from MIME type) | `image-YYYYMMDD-HHmmss` |
| `--aspect-ratio <r>` | Aspect ratio (e.g. `16:9`, `1:1`, `4:3`) | Model default |
| `--image-size <s>` | Image size (e.g. `1K`, `2K`) | Model default |
| `--rgb-colors <list>` | Semicolon-separated RGB palette, e.g. `255,0,0;0,128,0` (Recraft) | — |
| `--background-rgb-color <rgb>` | Background color as `r,g,b` (Recraft) | — |
| `--strength <0-1>` | Influence strength for style/color transfer (Recraft) | — |

Supported input formats: `.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`

@@ -97,20 +107,16 @@ Supported input formats: `.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`

## API Response Shapes

Image generation uses `POST /api/v1/responses` with `modalities: ["image", "text"]`. See the [Responses API reference](https://openrouter.ai/docs/api/reference/responses/overview) and [image generation guide](https://openrouter.ai/docs/guides/overview/multimodal/image-generation) for full request details.
Image generation uses `POST /api/v1/chat/completions`. Google models require `modalities: ["image", "text"]`; other models (Recraft, DALL-E, etc.) must omit `modalities` to avoid a 404.

The image-specific output item type is `image_generation_call` — this is not obvious from the general Responses API docs:
Images are extracted from four possible response shapes, tried in order:

```json
{
"type": "image_generation_call",
"id": "imagegen-abc123",
"status": "completed",
"result": "<base64-encoded image data>"
}
```
1. **OpenRouter extension** — `choices[0].message.images[]` (string array)
2. **Responses API items** — `output[].type == "image_generation_call"` with `status == "completed"`
3. **DALL-E / native** — `data[].url` or `data[].b64_json`
4. **Content array** — `choices[0].message.content[].type == "image_url"`

This appears alongside standard `message` output items in the `output` array. Text and image outputs may each be absent depending on the model and prompt.
The saved file extension (`.png`, `.jpg`, `.webp`, `.svg`, etc.) is derived from the MIME type in the response — either the `content-type` header (for HTTP URL images) or the `data:` URL prefix (for base64 images).
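
The four response shapes listed above can be tried in order with a small fallback chain. The sketch below is illustrative only: `extractImagesSketch` is a hypothetical name, the field paths follow the list in this section, and two details are assumptions: that `image_url` content parts carry the URL at `image_url.url`, and that entries in `images[]` may be either strings or objects with the same `image_url.url` field. The real `extractImages` in `lib.ts` may differ.

```typescript
// Keep only non-empty strings from a list of unknown values.
function onlyStrings(values: unknown[]): string[] {
  return values.filter((v): v is string => typeof v === "string" && v.length > 0);
}

// Hypothetical sketch: return images from the first response shape that yields any.
function extractImagesSketch(json: any): string[] {
  // 1. OpenRouter extension: choices[0].message.images[]
  const ext = json?.choices?.[0]?.message?.images;
  if (Array.isArray(ext)) {
    const found = onlyStrings(
      ext.map((img: any) => (typeof img === "string" ? img : img?.image_url?.url)),
    );
    if (found.length > 0) return found;
  }

  // 2. Responses API items: completed image_generation_call entries in output[]
  const output = json?.output;
  if (Array.isArray(output)) {
    const found = onlyStrings(
      output
        .filter((item: any) => item?.type === "image_generation_call" && item?.status === "completed")
        .map((item: any) => item?.result),
    );
    if (found.length > 0) return found;
  }

  // 3. DALL-E / native: data[].url or data[].b64_json
  const data = json?.data;
  if (Array.isArray(data)) {
    const found = onlyStrings(data.map((d: any) => d?.url ?? d?.b64_json));
    if (found.length > 0) return found;
  }

  // 4. Content array: image_url parts inside choices[0].message.content[]
  const content = json?.choices?.[0]?.message?.content;
  if (Array.isArray(content)) {
    const found = onlyStrings(
      content
        .filter((part: any) => part?.type === "image_url")
        .map((part: any) => part?.image_url?.url),
    );
    if (found.length > 0) return found;
  }

  return [];
}

console.log(extractImagesSketch({ data: [{ b64_json: "QUJD" }] })); // one entry: "QUJD"
```

Returning an empty array instead of throwing leaves the caller free to print the raw response for debugging, as the scripts do when no image is found.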

## Using a Different Model

61 changes: 43 additions & 18 deletions skills/openrouter-images/scripts/edit.ts
@@ -1,9 +1,13 @@
import { extname } from "node:path";
import {
DEFAULT_MODEL,
requireApiKey,
parseArgs,
parseRgbColors,
parseRgbTriplet,
postChatCompletion,
readImageAsDataUrl,
extractImages,
saveImage,
defaultOutputPath,
} from "./lib.js";
@@ -15,20 +19,38 @@ const imagePath = args.get("_0") as string | undefined;
const prompt = args.get("_1") as string | undefined;

if (!imagePath || !prompt) {
console.error("Usage: npx tsx edit.ts <image-path> \"prompt\" [--model <id>] [--output <path>] [--aspect-ratio <r>] [--image-size <s>]");
console.error(
"Usage: npx tsx edit.ts <image-path> \"prompt\" [--model <id>] [--output <stem>]\n" +
" [--aspect-ratio <r>] [--image-size <s>]\n" +
" [--rgb-colors \"r,g,b[;r,g,b...]\"] [--background-rgb-color \"r,g,b\"]\n" +
" [--strength <0-1>]"
);
process.exit(1);
}

const model = (args.get("model") as string) || DEFAULT_MODEL;
const outputBase = (args.get("output") as string) || defaultOutputPath();
const aspectRatio = args.get("aspect-ratio") as string | undefined;
const imageSize = args.get("image-size") as string | undefined;
const rgbColorsRaw = args.get("rgb-colors") as string | undefined;
const bgColorRaw = args.get("background-rgb-color") as string | undefined;
const strengthRaw = args.get("strength") as string | undefined;

const dataUrl = readImageAsDataUrl(imagePath as string);

const imageConfig: Record<string, string> = {};
const imageConfig: Record<string, unknown> = {};
if (aspectRatio) imageConfig.aspect_ratio = aspectRatio;
if (imageSize) imageConfig.image_size = imageSize;
if (rgbColorsRaw) imageConfig.rgb_colors = parseRgbColors(rgbColorsRaw);
if (bgColorRaw) imageConfig.background_rgb_color = parseRgbTriplet(bgColorRaw);
if (strengthRaw) {
const s = parseFloat(strengthRaw);
if (isNaN(s) || s < 0 || s > 1) {
console.error("Error: --strength must be a number between 0 and 1.");
process.exit(1);
}
imageConfig.strength = s;
}

const body: any = {
model,
@@ -41,41 +63,44 @@ const body: any = {
],
},
],
modalities: ["image", "text"],
// Recraft and other non-Google models reject modalities: ["image", "text"] with a 404.
// Google models require it for image output.
...(model.startsWith("google/") ? { modalities: ["image", "text"] } : {}),
...(Object.keys(imageConfig).length > 0 ? { image_config: imageConfig } : {}),
};

const json = await postChatCompletion(apiKey, body);
const message = json.choices?.[0]?.message;

if (!message) {
console.error("Error: No response from model.");
process.exit(1);
}

if (message.content) {
console.error(`Model: ${message.content}`);
const textContent = json.choices?.[0]?.message?.content;
if (textContent) {
console.error(`Model: ${textContent}`);
}

const images: string[] = message.images ?? [];
const images = extractImages(json);
if (images.length === 0) {
console.error("Error: No images returned by model.");
console.error("Response:", JSON.stringify(json, null, 2));
process.exit(1);
}

const saved: string[] = [];
for (let i = 0; i < images.length; i++) {
const img = images[i].startsWith("data:") ? images[i] : `data:image/png;base64,${images[i]}`;
const raw = images[i];
// Normalise: pass data:, http:, and https: URLs as-is; wrap bare base64 as a PNG data URL.
const imgData =
raw.startsWith("data:") || raw.startsWith("http://") || raw.startsWith("https://")
? raw
: `data:image/png;base64,${raw}`;

let outPath: string;
if (images.length === 1) {
outPath = outputBase;
} else {
const dotIdx = outputBase.lastIndexOf(".");
const base = dotIdx > 0 ? outputBase.slice(0, dotIdx) : outputBase;
const ext = dotIdx > 0 ? outputBase.slice(dotIdx) : ".png";
outPath = `${base}-${i + 1}${ext}`;
const currentExt = extname(outputBase);
const base = currentExt ? outputBase.slice(0, -currentExt.length) : outputBase;
outPath = `${base}-${i + 1}`;
}
const abs = saveImage(img, outPath);
const abs = await saveImage(imgData, outPath);
saved.push(abs);
}

61 changes: 43 additions & 18 deletions skills/openrouter-images/scripts/generate.ts
@@ -1,8 +1,12 @@
import { extname } from "node:path";
import {
DEFAULT_MODEL,
requireApiKey,
parseArgs,
parseRgbColors,
parseRgbTriplet,
postChatCompletion,
extractImages,
saveImage,
defaultOutputPath,
} from "./lib.js";
@@ -12,57 +16,78 @@ const args = parseArgs(process.argv.slice(2));

const prompt = args.get("_0") as string | undefined;
if (!prompt) {
console.error("Usage: npx tsx generate.ts \"prompt\" [--model <id>] [--output <path>] [--aspect-ratio <r>] [--image-size <s>]");
console.error(
"Usage: npx tsx generate.ts \"prompt\" [--model <id>] [--output <stem>]\n" +
" [--aspect-ratio <r>] [--image-size <s>]\n" +
" [--rgb-colors \"r,g,b[;r,g,b...]\"] [--background-rgb-color \"r,g,b\"]\n" +
" [--strength <0-1>]"
);
process.exit(1);
}

const model = (args.get("model") as string) || DEFAULT_MODEL;
const outputBase = (args.get("output") as string) || defaultOutputPath();
const aspectRatio = args.get("aspect-ratio") as string | undefined;
const imageSize = args.get("image-size") as string | undefined;
const rgbColorsRaw = args.get("rgb-colors") as string | undefined;
const bgColorRaw = args.get("background-rgb-color") as string | undefined;
const strengthRaw = args.get("strength") as string | undefined;

const imageConfig: Record<string, string> = {};
const imageConfig: Record<string, unknown> = {};
if (aspectRatio) imageConfig.aspect_ratio = aspectRatio;
if (imageSize) imageConfig.image_size = imageSize;
if (rgbColorsRaw) imageConfig.rgb_colors = parseRgbColors(rgbColorsRaw);
if (bgColorRaw) imageConfig.background_rgb_color = parseRgbTriplet(bgColorRaw);
if (strengthRaw) {
const s = parseFloat(strengthRaw);
if (isNaN(s) || s < 0 || s > 1) {
console.error("Error: --strength must be a number between 0 and 1.");
process.exit(1);
}
imageConfig.strength = s;
}

const body: any = {
model,
messages: [{ role: "user", content: prompt }],
modalities: ["image", "text"],
// Recraft and other non-Google models reject modalities: ["image", "text"] with a 404.
// Google models require it for image output.
...(model.startsWith("google/") ? { modalities: ["image", "text"] } : {}),
...(Object.keys(imageConfig).length > 0 ? { image_config: imageConfig } : {}),
};

const json = await postChatCompletion(apiKey, body);
const message = json.choices?.[0]?.message;

if (!message) {
console.error("Error: No response from model.");
process.exit(1);
}

if (message.content) {
console.error(`Model: ${message.content}`);
const textContent = json.choices?.[0]?.message?.content;
if (textContent) {
console.error(`Model: ${textContent}`);
}

const images: string[] = message.images ?? [];
const images = extractImages(json);
if (images.length === 0) {
console.error("Error: No images returned by model.");
console.error("Response:", JSON.stringify(json, null, 2));
process.exit(1);
}

const saved: string[] = [];
for (let i = 0; i < images.length; i++) {
const dataUrl = images[i].startsWith("data:") ? images[i] : `data:image/png;base64,${images[i]}`;
const raw = images[i];
// Normalise: pass data:, http:, and https: URLs as-is; wrap bare base64 as a PNG data URL.
const imgData =
raw.startsWith("data:") || raw.startsWith("http://") || raw.startsWith("https://")
? raw
: `data:image/png;base64,${raw}`;

let outPath: string;
if (images.length === 1) {
outPath = outputBase;
} else {
const dotIdx = outputBase.lastIndexOf(".");
const base = dotIdx > 0 ? outputBase.slice(0, dotIdx) : outputBase;
const ext = dotIdx > 0 ? outputBase.slice(dotIdx) : ".png";
outPath = `${base}-${i + 1}${ext}`;
const currentExt = extname(outputBase);
const base = currentExt ? outputBase.slice(0, -currentExt.length) : outputBase;
outPath = `${base}-${i + 1}`;
}
const abs = saveImage(dataUrl, outPath);
const abs = await saveImage(imgData, outPath);
saved.push(abs);
}
