ControlNet: Fine-Grained Control Over Image Generation
ControlNet is an advanced feature within the Scenario API that provides fine-grained control over the image generation process. It allows you to guide the AI with specific structural or compositional information extracted from an input image. This is particularly useful when you need to maintain the pose of a character, the depth of a scene, or the edges of an object while generating new content. This guide explains the available ControlNet modalities, shows how to use them, and provides code examples.
What is ControlNet?
ControlNet is a neural network architecture that enables you to add spatial conditioning controls to large, pre-trained text-to-image diffusion models. Essentially, it allows you to provide an additional input image (a "control image") that dictates certain structural aspects of the generated output, ensuring that the AI adheres to specific visual constraints while still generating new content based on your textual prompt.
ControlNet Modalities
The Scenario API supports various ControlNet modalities, each designed to extract and utilize different types of structural information from your control image:
Flux Specific Modalities
Modality | Description | Primary Use Case |
---|---|---|
blur | Unblur an image. It enhances the sharpness and clarity of a given image. | Useful for recovering details in blurred photos or refining image quality. |
tile | Designed to generate an image that closely matches the structure and style of a reference tile. | Ideal for creating high-resolution or detailed repetitive patterns. |
gray | Colorizes grayscale images. | Adds color to a black-and-white photo, making it an excellent tool for restoring old images or adding a creative touch to monochrome artwork. |
low-quality | Enhances low-resolution or poorly detailed images by generating a high-quality output. | Can transform a rough or pixelated source image into a refined, high-quality version. |
sketch | Designed specifically for hand-drawn sketches or line art references. | Use it to transform your sketches into rendered images with your own style models in a streamlined "Sketch-to-Render" workflow. |
Stable Diffusion Specific Modalities
Modality | Description | Primary Use Case |
---|---|---|
seg | Breaks down an image into semantic parts like background, clothing, etc. | Great for precise control over regions and content types. |
illusion | Creates a warped or distorted version of the input image. | Used to inspire abstract, surreal, or imaginative interpretations. |
scribble | Controls and guides image generation from simple sketches or outlines. | Designed to interpret the basic shapes and patterns from your scribbles, using them as structural guides for the AI to create a more detailed, refined image. |
Common Modalities
Modality | Description | Primary Use Case |
---|---|---|
canny | Guides generation based on the edges detected in the input image. Useful for maintaining outlines and shapes. | Recreating line art, transferring outlines from a sketch to a photorealistic image. |
pose | Controls the pose of human figures based on a skeleton extracted from the input image. | Generating characters in specific poses, animating characters consistently across frames. |
depth | Influences the depth perception and spatial arrangement of objects in the generated image. | Recreating scenes with accurate spatial relationships, generating images with consistent perspective. |
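In the request payload, the modality is passed as a string. The code examples later in this guide append an optional influence weight after a colon (for example pose:0.5); the plain modality name is used when no weight is given. The helper name below is hypothetical, and this is only a minimal sketch of that colon syntax:

```python
from typing import Optional

# Minimal sketch (not an official helper): build the "modality" field value.
# The "modality:influence" form mirrors the request examples later in this
# guide (e.g. "pose:0.5"); the plain modality name is used otherwise.
def modality_value(modality: str, influence: Optional[float] = None) -> str:
    """Return the value to place in the "modality" request field."""
    return f"{modality}:{influence}" if influence is not None else modality

print(modality_value("canny"))      # -> "canny"
print(modality_value("pose", 0.5))  # -> "pose:0.5"
```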
Endpoint
ControlNet functionality is integrated with the txt2img and img2img endpoints. The specific endpoint depends on whether you are starting from a text prompt alone or from an existing image.

For txt2img with ControlNet:

POST https://api.cloud.scenario.com/v1/generate/controlnet (API Reference)

For img2img with ControlNet:

POST https://api.cloud.scenario.com/v1/generate/controlnet-img2img (API Reference)
Request Body
The request body for ControlNet generation includes standard image generation parameters along with specific ControlNet parameters:
Parameter | Type | Description | Default |
---|---|---|---|
prompt | string | Required. A textual description of the desired image content. | |
controlImage | string | Required unless controlImageId is provided. The ControlNet input image as a data URL. | |
controlImageId | string | Required unless controlImage is provided. The ControlNet input image as an asset ID. Ignored if controlImage is also provided. | |
image | string | (For controlnet-img2img) The input image as a data URL or an asset ID (example: "asset_GTrL3mq4SXWyMxkOHRxlpw"). | |
modality | string | Required. The ControlNet modality to use (e.g., canny, pose, depth). An influence weight may be appended after a colon (e.g., pose:0.5). | |
strength | number | (For controlnet-img2img) Determines how much the generated image deviates from the original input image. Range: 0.0 to 1.0. | 0.8 |
numSamples | integer | The number of images to generate. | 1 |
guidance | number | The guidance scale. | 7.5 |
numInferenceSteps | integer | The number of denoising steps. | 30 |
width | integer | The width of the generated image in pixels. | 512 |
height | integer | The height of the generated image in pixels. | 512 |
scheduler | string | The scheduler to use for the denoising process. Stable Diffusion only. | "EulerAncestralDiscrete" |
modelId | string | Required. The ID of the model to use for generation. | |
seed | integer | A seed value for the random number generator. | |
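For reference, here is a minimal sketch of the two payload shapes, using the parameter names from the table above and values drawn from the examples below. The asset IDs marked as placeholders are hypothetical; treat this as a sketch rather than the definitive minimum set of parameters.

```python
# Minimal txt2img-style ControlNet payload (POST /v1/generate/controlnet):
# prompt, modality, modelId, plus a control image supplied by asset ID.
txt2img_payload = {
    "prompt": "a superhero in a dynamic pose, comic book style",
    "controlImageId": "asset_GTrL3mq4SXWyMxkOHRxlpw",
    "modality": "pose:0.5",
    "modelId": "flux.1-dev",
}

# Minimal img2img-style payload (POST /v1/generate/controlnet-img2img):
# adds the base image ("image") and a strength value controlling how far
# the result may deviate from it. The asset IDs below are placeholders.
img2img_payload = {
    "prompt": "a futuristic cityscape, neon lights",
    "image": "asset_YOUR_BASE_IMAGE_ID",
    "controlImageId": "asset_YOUR_CONTROL_IMAGE_ID",
    "modality": "canny",
    "strength": 0.8,
    "modelId": "YOUR_MODEL_ID",
}
```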
Code Examples
Here are some examples of how to use ControlNet with the Scenario API.
Example: Maintaining a Character Pose (ControlNet with txt2img)
Suppose you have an image of a person in a specific pose, and you want to generate a new image of a different character in the exact same pose. You would pass the original image as controlImageId and use the pose modality with an influence of 0.5 (written as "pose:0.5" in the modality field).
cURL
curl -X POST \
-u "YOUR_API_KEY:YOUR_API_SECRET" \
-H "Content-Type: application/json" \
-d '{
"prompt": "a superhero in a dynamic pose, comic book style",
"controlImageId": "asset_GTrL3mq4SXWyMxkOHRxlpw",
"modality": "pose:0.5",
"numSamples": 1,
"guidance": 3.5,
"numInferenceSteps": 28,
"modelId": "flux.1-dev"
}' \
https://api.cloud.scenario.com/v1/generate/controlnet
Python
import requests
import time
api_key = "YOUR_API_KEY"
api_secret = "YOUR_API_SECRET"
# Step 1: Initiate ControlNet txt2img Generation
url = "https://api.cloud.scenario.com/v1/generate/controlnet"
headers = {"Content-Type": "application/json"}
payload = {
"prompt": "a superhero in a dynamic pose, comic book style",
"controlImageId": "asset_GTrL3mq4SXWyMxkOHRxlpw",
"modality": "pose:0.5",
"numSamples": 1,
"guidance": 3.5,
"numInferenceSteps": 28,
"modelId": "flux.1-dev"
}
print("Initiating ControlNet txt2img generation...")
initial_response = requests.post(url, headers=headers, json=payload, auth=(api_key, api_secret))
if initial_response.status_code == 200:
    initial_data = initial_response.json()
    job_id = initial_data.get("job").get("jobId")
    if job_id:
        print(f"ControlNet txt2img generation job initiated. Job ID: {job_id}")
        # Step 2: Poll for Job Status
        polling_url = f"https://api.cloud.scenario.com/v1/jobs/{job_id}"
        status = "queued"
        while status not in ["success", "failure", "canceled"]:
            print(f"Polling job {job_id}... Current status: {status}")
            time.sleep(3)  # Wait for 3 seconds before polling again
            polling_response = requests.get(polling_url, auth=(api_key, api_secret))
            if polling_response.status_code == 200:
                polling_data = polling_response.json()
                status = polling_data.get("job").get("status")
                progress = polling_data.get("job").get("progress", 0) * 100
                print(f"Progress: {progress:.2f}%")
                if status == "success":
                    asset_ids = polling_data.get("job").get("metadata").get("assetIds", [])
                    print(f"ControlNet txt2img generation completed! Asset IDs: {asset_ids}")
                elif status in ["failure", "canceled"]:
                    print(f"ControlNet txt2img generation failed or canceled: {polling_data.get('job').get('error')}")
            else:
                print(f"Error polling job status: {polling_response.status_code} - {polling_response.text}")
                break
    else:
        print("Error: No jobId returned in the initial response.")
else:
    print(f"Error initiating ControlNet txt2img generation: {initial_response.status_code} - {initial_response.text}")
Node.js
const fetch = require("node-fetch");
const apiKey = "YOUR_API_KEY";
const apiSecret = "YOUR_API_SECRET";
const credentials = Buffer.from(`${apiKey}:${apiSecret}`).toString("base64");
async function generateControlNetTxt2img() {
const initialUrl = "https://api.cloud.scenario.com/v1/generate/controlnet";
const headers = {
"Content-Type": "application/json",
Authorization: `Basic ${credentials}`,
};
const payload = {
prompt: "a superhero in a dynamic pose, comic book style",
controlImageId: "asset_GTrL3mq4SXWyMxkOHRxlpw",
modality: "pose:0.5",
numSamples: 1,
guidance: 3.5,
numInferenceSteps: 28,
modelId: "flux.1-dev",
};
console.log("Initiating ControlNet txt2img generation...");
try {
const initialResponse = await fetch(initialUrl, {
method: "POST",
headers: headers,
body: JSON.stringify(payload),
});
const initialData = await initialResponse.json();
if (initialResponse.ok) {
const jobId = initialData.job.jobId;
if (jobId) {
console.log(`ControlNet txt2img generation job initiated. Job ID: ${jobId}`);
const pollingUrl = `https://api.cloud.scenario.com/v1/jobs/${jobId}`;
let status = "queued";
while (!["success", "failure", "canceled"].includes(status)) {
console.log(`Polling job ${jobId}... Current status: ${status}`);
await new Promise(resolve => setTimeout(resolve, 3000)); // Wait for 3 seconds
const pollingResponse = await fetch(pollingUrl, {
headers: { Authorization: `Basic ${credentials}` },
});
const pollingData = await pollingResponse.json();
if (pollingResponse.ok) {
status = pollingData.job.status;
const progress = (pollingData.job.progress || 0) * 100;
console.log(`Progress: ${progress.toFixed(2)}%`);
if (status === "success") {
const assetIds = pollingData.job.metadata.assetIds || [];
console.log("ControlNet txt2img generation completed! Asset IDs:", assetIds);
} else if (["failure", "canceled"].includes(status)) {
console.error(`ControlNet txt2img generation failed or canceled: ${pollingData.job.error}`);
}
} else {
console.error(`Error polling job status: ${pollingResponse.status} - ${JSON.stringify(pollingData)}`);
break;
}
}
} else {
console.error("Error: No jobId returned in the initial response.");
}
} else {
console.error(`Error initiating ControlNet txt2img generation: ${initialResponse.status} - ${JSON.stringify(initialData)}`);
}
} catch (error) {
console.error("Network or other error:", error);
}
}
generateControlNetTxt2img();
Example: Transferring Canny Edges (ControlNet with img2img)
If you want to apply the edge structure from one image to another, you can use the canny modality with the img2img endpoint.
cURL
curl -X POST \
-u "YOUR_API_KEY:YOUR_API_SECRET" \
-H "Content-Type: application/json" \
-d '{
"prompt": "a futuristic cityscape, neon lights",
"imageUrl": "https://example.com/your-base-image.jpg",
"controlImage": "https://example.com/your-canny-edge-image.jpg",
"modality": "canny",
"strength": 0.8,
"numSamples": 1,
"guidance": 7.5,
"numInferenceSteps": 30,
"modelId": "YOUR_MODEL_ID"
}' \
https://api.cloud.scenario.com/v1/generate/controlnet-img2img
Python
import requests
import time
api_key = "YOUR_API_KEY"
api_secret = "YOUR_API_SECRET"
# Step 1: Initiate ControlNet img2img Generation
url = "https://api.cloud.scenario.com/v1/generate/controlnet-img2img"
headers = {"Content-Type": "application/json"}
payload = {
"prompt": "a futuristic cityscape, neon lights",
"imageUrl": "https://example.com/your-base-image.jpg",
"controlImage": "https://example.com/your-canny-edge-image.jpg",
"modality": "canny",
"strength": 0.8,
"numSamples": 1,
"guidance": 7.5,
"numInferenceSteps": 30,
"modelId": "YOUR_MODEL_ID"
}
print("Initiating ControlNet img2img generation...")
initial_response = requests.post(url, headers=headers, json=payload, auth=(api_key, api_secret))
if initial_response.status_code == 200:
    initial_data = initial_response.json()
    job_id = initial_data.get("job").get("jobId")
    if job_id:
        print(f"ControlNet img2img generation job initiated. Job ID: {job_id}")
        # Step 2: Poll for Job Status
        polling_url = f"https://api.cloud.scenario.com/v1/jobs/{job_id}"
        status = "queued"
        while status not in ["success", "failure", "canceled"]:
            print(f"Polling job {job_id}... Current status: {status}")
            time.sleep(3)  # Wait for 3 seconds before polling again
            polling_response = requests.get(polling_url, auth=(api_key, api_secret))
            if polling_response.status_code == 200:
                polling_data = polling_response.json()
                status = polling_data.get("job").get("status")
                progress = polling_data.get("job").get("progress", 0) * 100
                print(f"Progress: {progress:.2f}%")
                if status == "success":
                    asset_ids = polling_data.get("job").get("metadata").get("assetIds", [])
                    print(f"ControlNet img2img generation completed! Asset IDs: {asset_ids}")
                elif status in ["failure", "canceled"]:
                    print(f"ControlNet img2img generation failed or canceled: {polling_data.get('job').get('error')}")
            else:
                print(f"Error polling job status: {polling_response.status_code} - {polling_response.text}")
                break
    else:
        print("Error: No jobId returned in the initial response.")
else:
    print(f"Error initiating ControlNet img2img generation: {initial_response.status_code} - {initial_response.text}")
Node.js
const fetch = require("node-fetch");
const apiKey = "YOUR_API_KEY";
const apiSecret = "YOUR_API_SECRET";
const credentials = Buffer.from(`${apiKey}:${apiSecret}`).toString("base64");
async function generateControlNetImg2img() {
const initialUrl = "https://api.cloud.scenario.com/v1/generate/controlnet-img2img";
const headers = {
"Content-Type": "application/json",
Authorization: `Basic ${credentials}`,
};
const payload = {
prompt: "a futuristic cityscape, neon lights",
imageUrl: "https://example.com/your-base-image.jpg",
controlImage: "https://example.com/your-canny-edge-image.jpg",
modality: "canny",
strength: 0.8,
numSamples: 1,
guidance: 7.5,
numInferenceSteps: 30,
modelId: "YOUR_MODEL_ID",
};
console.log("Initiating ControlNet img2img generation...");
try {
const initialResponse = await fetch(initialUrl, {
method: "POST",
headers: headers,
body: JSON.stringify(payload),
});
const initialData = await initialResponse.json();
if (initialResponse.ok) {
const jobId = initialData.job.jobId;
if (jobId) {
console.log(`ControlNet img2img generation job initiated. Job ID: ${jobId}`);
const pollingUrl = `https://api.cloud.scenario.com/v1/jobs/${jobId}`;
let status = "queued";
while (!["success", "failure", "canceled"].includes(status)) {
console.log(`Polling job ${jobId}... Current status: ${status}`);
await new Promise(resolve => setTimeout(resolve, 3000)); // Wait for 3 seconds
const pollingResponse = await fetch(pollingUrl, {
headers: { Authorization: `Basic ${credentials}` },
});
const pollingData = await pollingResponse.json();
if (pollingResponse.ok) {
status = pollingData.job.status;
const progress = (pollingData.job.progress || 0) * 100;
console.log(`Progress: ${progress.toFixed(2)}%`);
if (status === "success") {
const assetIds = pollingData.job.metadata.assetIds || [];
console.log("ControlNet img2img generation completed! Asset IDs:", assetIds);
} else if (["failure", "canceled"].includes(status)) {
console.error(`ControlNet img2img generation failed or canceled: ${pollingData.job.error}`);
}
} else {
console.error(`Error polling job status: ${pollingResponse.status} - ${JSON.stringify(pollingData)}`);
break;
}
}
} else {
console.error("Error: No jobId returned in the initial response.");
}
} else {
console.error(`Error initiating ControlNet img2img generation: ${initialResponse.status} - ${JSON.stringify(initialData)}`);
}
} catch (error) {
console.error("Network or other error:", error);
}
}
generateControlNetImg2img();
Response
An initial successful response will return a JSON object containing your job details:
{
"job": {
"jobId": "job_controlnet_123",
"jobType": "controlnet",
"metadata": {
"input": {
"prompt": "a superhero in a dynamic pose, comic book style",
"controlImageId": "asset_GTrL3mq4SXWyMxkOHRxlpw",
"modality": "pose:0.5",
"numSamples": 1,
"guidance": 3.5,
"numInferenceSteps": 28,
"modelId": "flux.1-dev"
},
"assetIds": []
},
"status": "queued",
"progress": 0
},
"creativeUnitsCost": 5
}
Then you can poll the GET /jobs/{jobId} endpoint every 3 seconds until the job is complete (status: "success"), and then extract the assetIds from the metadata field. (API Reference)
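As a condensed sketch of the polling loop used in the full examples above (same jobs endpoint and Basic authentication; the function name and the 3-second default interval are this guide's own choices):

```python
import time
import requests

def wait_for_job(job_id: str, api_key: str, api_secret: str, interval: int = 3) -> dict:
    """Poll GET /v1/jobs/{jobId} until the job reaches a terminal status."""
    url = f"https://api.cloud.scenario.com/v1/jobs/{job_id}"
    while True:
        job = requests.get(url, auth=(api_key, api_secret)).json()["job"]
        if job["status"] in ("success", "failure", "canceled"):
            return job
        time.sleep(interval)

# Example usage:
# job = wait_for_job("job_controlnet_123", "YOUR_API_KEY", "YOUR_API_SECRET")
# asset_ids = job["metadata"]["assetIds"] if job["status"] == "success" else []
```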
Upon successful completion, the polling response will include the assetIds:
{
"job": {
"jobId": "job_controlnet_123",
"jobType": "flux",
"metadata": {
"input": {
"prompt": "a superhero in a dynamic pose, comic book style",
"controlImageId": "asset_GTrL3mq4SXWyMxkOHRxlpw",
"type": "controlnet",
"modality": "pose:0.5",
"numSamples": 1,
"guidance": 3.5,
"numInferenceSteps": 28,
"modelId": "flux.1-dev"
},
"assetIds": [
"asset_controlnet_abc"
]
},
"status": "success",
"progress": 1
},
"creativeUnitsCost": 5
}