ControlNet: Fine-Grained Control Over Image Generation

ControlNet is an advanced feature within the Scenario API that provides fine-grained control over the image generation process. It allows you to guide the AI with specific structural or compositional information extracted from an input image. This is particularly useful when you need to maintain the pose of a character, the depth of a scene, or the edges of an object while generating new content. This guide explains the ControlNet modalities, shows how to use them, and provides code examples.

What is ControlNet?

ControlNet is a neural network architecture that enables you to add spatial conditioning controls to large, pre-trained text-to-image diffusion models. Essentially, it allows you to provide an additional input image (a "control image") that dictates certain structural aspects of the generated output, ensuring that the AI adheres to specific visual constraints while still generating new content based on your textual prompt.

ControlNet Modalities

The Scenario API supports various ControlNet modalities, each designed to extract and utilize different types of structural information from your control image:

Flux Specific Modalities

Modality	Description	Primary Use Case
`blur`	Unblurs an image, enhancing its sharpness and clarity.	Useful for recovering details in blurred photos or refining image quality.
`tile`	Generates an image that closely matches the structure and style of a reference tile.	Ideal for creating high-resolution or detailed repetitive patterns.
`gray`	Colorizes grayscale images.	Adds color to a black-and-white photo, which is useful for restoring old images or adding a creative touch to monochrome artwork.
`low-quality`	Enhances low-resolution or poorly detailed images by generating a high-quality output.	Can transform a rough or pixelated source image into a refined, high-quality version.
`sketch`	Designed specifically for hand-drawn sketches or line art references.	Use it to transform your sketches into rendered images with your own style models in a streamlined "Sketch-to-Render" workflow.

Stable Diffusion Specific Modalities

Modality	Description	Primary Use Case
`seg`	Breaks down an image into semantic parts like background, clothing, etc.	Great for precise control over regions and content types.
`illusion`	Creates a warped or distorted version of the input image.	Used to inspire abstract, surreal, or imaginative interpretations.
`scribble`	Guides image generation from simple sketches or outlines that you draw.	Interprets the basic shapes and patterns in your scribbles and uses them as structural guides to create a more detailed, refined image.

Common Modalities

Modality	Description	Primary Use Case
`canny`	Guides generation based on the edges detected in the input image. Useful for maintaining outlines and shapes.	Recreating line art, transferring outlines from a sketch to a photorealistic image.
`pose`	Controls the pose of human figures based on a skeleton extracted from the input image.	Generating characters in specific poses, animating characters consistently across frames.
`depth`	Influences the depth perception and spatial arrangement of objects in the generated image.	Recreating scenes with accurate spatial relationships, generating images with consistent perspective.
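
The three tables above can be captured in a small lookup that rejects a modality not listed for the chosen model family before a request is sent. This is a minimal sketch: the family labels `flux` and `sd` and the helper names are assumptions made for illustration, not identifiers from the Scenario API.

```python
# Modality names are copied from the tables above.
# The family keys "flux" and "sd" are illustrative labels, not API values.
COMMON = {"canny", "pose", "depth"}
FAMILY_SPECIFIC = {
    "flux": {"blur", "tile", "gray", "low-quality", "sketch"},
    "sd": {"seg", "illusion", "scribble"},
}


def supported_modalities(family):
    """Return the common modalities plus those specific to the given family."""
    if family not in FAMILY_SPECIFIC:
        raise ValueError(f"unknown model family: {family!r}")
    return COMMON | FAMILY_SPECIFIC[family]


def check_modality(family, modality):
    """Raise ValueError if the modality is not listed for this family."""
    if modality not in supported_modalities(family):
        raise ValueError(f"{modality!r} is not listed for {family!r} models")
```

Calling `check_modality("flux", "seg")` raises, because `seg` appears only in the Stable Diffusion table.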

Endpoint

ControlNet is exposed through two generation endpoints, one for txt2img and one for img2img. Which one you call depends on whether you are starting with just a text prompt or with an existing image.

For txt2img with ControlNet:
POST https://api.cloud.scenario.com/v1/generate/controlnet - API Reference

For img2img with ControlNet:
POST https://api.cloud.scenario.com/v1/generate/controlnet-img2img - API Reference
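
The choice between the two endpoints can be reduced to one question: is a base image supplied along with the prompt? The sketch below encodes that rule. The function name `controlnet_endpoint` and its boolean argument are hypothetical; only the two URLs come from this guide.

```python
BASE_URL = "https://api.cloud.scenario.com/v1"


def controlnet_endpoint(has_input_image):
    """Pick the img2img variant when a base image accompanies the prompt."""
    if has_input_image:
        path = "generate/controlnet-img2img"
    else:
        path = "generate/controlnet"
    return f"{BASE_URL}/{path}"
```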

Request Body

The request body for ControlNet generation includes standard image generation parameters along with specific ControlNet parameters:

Parameter	Type	Description	Default
`prompt`	string	Required. A textual description of the desired image content.
`controlImage`	string	The ControlNet input image as a data URL. Required unless `controlImageId` is provided.
`controlImageId`	string	The ControlNet input image as an asset ID. Required unless `controlImage` is provided; ignored when `controlImage` is provided.
`image`	string	The input image as a data URL or the asset ID (example: "`asset_GTrL3mq4SXWyMxkOHRxlpw`")
`modality`	string	Required. The ControlNet modality to use (e.g., `canny`, `pose`, `depth`). An influence value can be appended after a colon, as in `pose:0.5`.
`strength`	number	(For `controlnet-img2img`) Determines how much the generated image deviates from the original input image. Range: 0.0 to 1.0.	0.8
`numSamples`	integer	The number of images to generate.	1
`guidance`	number	The guidance scale.	7.5
`numInferenceSteps`	integer	The number of denoising steps.	30
`width`	integer	The width of the generated image in pixels.	512
`height`	integer	The height of the generated image in pixels.	512
`scheduler`	string	The scheduler to use for the denoising process. Only for Stable Diffusion.	"EulerAncestralDiscrete"
`modelId`	string	Required. The ID of the model to use for generation.
`seed`	integer	A seed value for the random number generator.
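
The table above implies a few rules that are easy to check client-side: `prompt`, `modelId`, and `modality` must be present, and a control image must be given either as a data URL or as an asset ID. The following sketch shows one way to encode a local file as a data URL and assemble a request body. The helper names `to_data_url` and `build_payload` are hypothetical, and the sketch assumes the API accepts standard base64 data URLs.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path):
    """Encode a local image file as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"cannot infer MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(prompt, model_id, modality,
                  control_image=None, control_image_id=None, **options):
    """Assemble a request body, enforcing the required fields from the table."""
    if not prompt or not model_id or not modality:
        raise ValueError("prompt, modelId and modality are required")
    if control_image is None and control_image_id is None:
        raise ValueError("provide controlImage or controlImageId")
    payload = {"prompt": prompt, "modelId": model_id, "modality": modality}
    # The table states that controlImageId is ignored when controlImage is
    # present, so only one of the two is sent.
    if control_image is not None:
        payload["controlImage"] = control_image
    else:
        payload["controlImageId"] = control_image_id
    payload.update(options)  # optional fields such as numSamples, guidance, seed
    return payload
```

Optional parameters are passed through as keyword arguments using the API's own camelCase names, for example `build_payload(..., numSamples=1, guidance=3.5)`.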

Code Examples

Here are some examples of how to use ControlNet with the Scenario API.

Example: Maintaining a Character Pose (ControlNet with txt2img)

Suppose you have an image of a person in a specific pose, and you want to generate a new image of a different character in the exact same pose. Pass the original image's asset ID as controlImageId and set modality to "pose:0.5", which selects the pose modality with an influence of 0.5.
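
The examples in this guide write the modality and its influence as a single string such as "pose:0.5". A pair of small helpers can build and split that string. This is a sketch only: the helper names are hypothetical, and the 0.0 to 1.0 bound on influence is an assumption that this guide does not state.

```python
def modality_with_influence(name, influence=None):
    """Format a modality string such as "pose:0.5"; omit the suffix if no influence."""
    if influence is None:
        return name
    # Assumed range; the accepted bounds are not given in this guide.
    if not 0.0 <= influence <= 1.0:
        raise ValueError("influence is assumed to lie between 0.0 and 1.0")
    return f"{name}:{influence:g}"


def parse_modality(value):
    """Split "pose:0.5" into ("pose", 0.5); a bare name yields (name, None)."""
    name, sep, raw = value.partition(":")
    return name, (float(raw) if sep else None)
```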

cURL

curl -X POST \
  -u "YOUR_API_KEY:YOUR_API_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a superhero in a dynamic pose, comic book style",
    "controlImageId": "asset_GTrL3mq4SXWyMxkOHRxlpw",
    "modality": "pose:0.5",
    "numSamples": 1,
    "guidance": 3.5,
    "numInferenceSteps": 28,
    "modelId": "flux.1-dev"
}' \
  https://api.cloud.scenario.com/v1/generate/controlnet

Python

import requests
import time

api_key = "YOUR_API_KEY"
api_secret = "YOUR_API_SECRET"

# Step 1: Initiate ControlNet txt2img Generation
url = "https://api.cloud.scenario.com/v1/generate/controlnet"
headers = {"Content-Type": "application/json"}

payload = {
    "prompt": "a superhero in a dynamic pose, comic book style",
    "controlImageId": "asset_GTrL3mq4SXWyMxkOHRxlpw",
    "modality": "pose:0.5",
    "numSamples": 1,
    "guidance": 3.5,
    "numInferenceSteps": 28,
    "modelId": "flux.1-dev"
}

print("Initiating ControlNet txt2img generation...")
initial_response = requests.post(url, headers=headers, json=payload, auth=(api_key, api_secret))

if initial_response.status_code == 200:
    initial_data = initial_response.json()
    # Default to an empty dict so a missing "job" key reaches the error branch below
    job_id = initial_data.get("job", {}).get("jobId")
    if job_id:
        print(f"ControlNet txt2img generation job initiated. Job ID: {job_id}")
        
        # Step 2: Poll for Job Status
        polling_url = f"https://api.cloud.scenario.com/v1/jobs/{job_id}"
        status = "queued"
        while status not in ["success", "failure", "canceled"]:
            print(f"Polling job {job_id}... Current status: {status}")
            time.sleep(3) # Wait for 3 seconds before polling again
            
            polling_response = requests.get(polling_url, auth=(api_key, api_secret))
            if polling_response.status_code == 200:
                polling_data = polling_response.json()
                status = polling_data.get("job").get("status")
                progress = polling_data.get("job").get("progress", 0) * 100
                print(f"Progress: {progress:.2f}%")
                
                if status == "success":
                    asset_ids = polling_data.get("job").get("metadata").get("assetIds", [])
                    print(f"ControlNet txt2img generation completed! Asset IDs: {asset_ids}")
                elif status in ["failure", "canceled"]:
                    # Single quotes inside the f-string keep this valid on Python versions before 3.12
                    print(f"ControlNet txt2img generation failed or canceled: {polling_data.get('job').get('error')}")
            else:
                print(f"Error polling job status: {polling_response.status_code} - {polling_response.text}")
                break
    else:
        print("Error: No jobId returned in the initial response.")
else:
    print(f"Error initiating ControlNet txt2img generation: {initial_response.status_code} - {initial_response.text}")

Node.js

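// Requires node-fetch v2 (CommonJS). On Node.js 18+, fetch is available globally and this line can be removed.
const fetch = require("node-fetch");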

const apiKey = "YOUR_API_KEY";
const apiSecret = "YOUR_API_SECRET";

const credentials = Buffer.from(`${apiKey}:${apiSecret}`).toString("base64");

async function generateControlNetTxt2img() {
  const initialUrl = "https://api.cloud.scenario.com/v1/generate/controlnet";
  const headers = {
    "Content-Type": "application/json",
    Authorization: `Basic ${credentials}`,
  };

  const payload = {
    prompt: "a superhero in a dynamic pose, comic book style",
    controlImageId: "asset_GTrL3mq4SXWyMxkOHRxlpw",
    modality: "pose:0.5",
    numSamples: 1,
    guidance: 3.5,
    numInferenceSteps: 28,
    modelId: "flux.1-dev",
  };

  console.log("Initiating ControlNet txt2img generation...");
  try {
    const initialResponse = await fetch(initialUrl, {
      method: "POST",
      headers: headers,
      body: JSON.stringify(payload),
    });

    const initialData = await initialResponse.json();

    if (initialResponse.ok) {
      const jobId = initialData.job.jobId;
      if (jobId) {
        console.log(`ControlNet txt2img generation job initiated. Job ID: ${jobId}`);

        const pollingUrl = `https://api.cloud.scenario.com/v1/jobs/${jobId}`;
        let status = "queued";

        while (!["success", "failure", "canceled"].includes(status)) {
          console.log(`Polling job ${jobId}... Current status: ${status}`);
          await new Promise(resolve => setTimeout(resolve, 3000)); // Wait for 3 seconds

          const pollingResponse = await fetch(pollingUrl, {
            headers: { Authorization: `Basic ${credentials}` },
          });
          const pollingData = await pollingResponse.json();

          if (pollingResponse.ok) {
            status = pollingData.job.status;
            const progress = (pollingData.job.progress || 0) * 100;
            console.log(`Progress: ${progress.toFixed(2)}%`);

            if (status === "success") {
              const assetIds = pollingData.job.metadata.assetIds || [];
              console.log("ControlNet txt2img generation completed! Asset IDs:", assetIds);
            } else if (["failure", "canceled"].includes(status)) {
              console.error(`ControlNet txt2img generation failed or canceled: ${pollingData.job.error}`);
            }
          } else {
            console.error(`Error polling job status: ${pollingResponse.status} - ${JSON.stringify(pollingData)}`);
            break;
          }
        }
      } else {
        console.error("Error: No jobId returned in the initial response.");
      }
    } else {
      console.error(`Error initiating ControlNet txt2img generation: ${initialResponse.status} - ${JSON.stringify(initialData)}`);
    }
  } catch (error) {
    console.error("Network or other error:", error);
  }
}

generateControlNetTxt2img();

Example: Transferring Canny Edges (ControlNet with img2img)

If you want to apply the edge structure from one image to another, you can use the canny modality with the img2img endpoint.

cURL

curl -X POST \
  -u "YOUR_API_KEY:YOUR_API_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a futuristic cityscape, neon lights",
    "imageUrl": "https://example.com/your-base-image.jpg",
    "controlImage": "https://example.com/your-canny-edge-image.jpg",
    "modality": "canny",
    "strength": 0.8,
    "numSamples": 1,
    "guidance": 7.5,
    "numInferenceSteps": 30,
    "modelId": "YOUR_MODEL_ID"
}' \
  https://api.cloud.scenario.com/v1/generate/controlnet-img2img

Python

import requests
import time

api_key = "YOUR_API_KEY"
api_secret = "YOUR_API_SECRET"

# Step 1: Initiate ControlNet img2img Generation
url = "https://api.cloud.scenario.com/v1/generate/controlnet-img2img"
headers = {"Content-Type": "application/json"}

payload = {
    "prompt": "a futuristic cityscape, neon lights",
    "imageUrl": "https://example.com/your-base-image.jpg",
    "controlImage": "https://example.com/your-canny-edge-image.jpg",
    "modality": "canny",
    "strength": 0.8,
    "numSamples": 1,
    "guidance": 7.5,
    "numInferenceSteps": 30,
    "modelId": "YOUR_MODEL_ID"
}

print("Initiating ControlNet img2img generation...")
initial_response = requests.post(url, headers=headers, json=payload, auth=(api_key, api_secret))

if initial_response.status_code == 200:
    initial_data = initial_response.json()
    # Default to an empty dict so a missing "job" key reaches the error branch below
    job_id = initial_data.get("job", {}).get("jobId")
    if job_id:
        print(f"ControlNet img2img generation job initiated. Job ID: {job_id}")
        
        # Step 2: Poll for Job Status
        polling_url = f"https://api.cloud.scenario.com/v1/jobs/{job_id}"
        status = "queued"
        while status not in ["success", "failure", "canceled"]:
            print(f"Polling job {job_id}... Current status: {status}")
            time.sleep(3) # Wait for 3 seconds before polling again
            
            polling_response = requests.get(polling_url, auth=(api_key, api_secret))
            if polling_response.status_code == 200:
                polling_data = polling_response.json()
                status = polling_data.get("job").get("status")
                progress = polling_data.get("job").get("progress", 0) * 100
                print(f"Progress: {progress:.2f}%")
                
                if status == "success":
                    asset_ids = polling_data.get("job").get("metadata").get("assetIds", [])
                    print(f"ControlNet img2img generation completed! Asset IDs: {asset_ids}")
                elif status in ["failure", "canceled"]:
                    # Single quotes inside the f-string keep this valid on Python versions before 3.12
                    print(f"ControlNet img2img generation failed or canceled: {polling_data.get('job').get('error')}")
            else:
                print(f"Error polling job status: {polling_response.status_code} - {polling_response.text}")
                break
    else:
        print("Error: No jobId returned in the initial response.")
else:
    print(f"Error initiating ControlNet img2img generation: {initial_response.status_code} - {initial_response.text}")

Node.js

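// Requires node-fetch v2 (CommonJS). On Node.js 18+, fetch is available globally and this line can be removed.
const fetch = require("node-fetch");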

const apiKey = "YOUR_API_KEY";
const apiSecret = "YOUR_API_SECRET";

const credentials = Buffer.from(`${apiKey}:${apiSecret}`).toString("base64");

async function generateControlNetImg2img() {
  const initialUrl = "https://api.cloud.scenario.com/v1/generate/controlnet-img2img";
  const headers = {
    "Content-Type": "application/json",
    Authorization: `Basic ${credentials}`,
  };

  const payload = {
    prompt: "a futuristic cityscape, neon lights",
    imageUrl: "https://example.com/your-base-image.jpg",
    controlImage: "https://example.com/your-canny-edge-image.jpg",
    modality: "canny",
    strength: 0.8,
    numSamples: 1,
    guidance: 7.5,
    numInferenceSteps: 30,
    modelId: "YOUR_MODEL_ID",
  };

  console.log("Initiating ControlNet img2img generation...");
  try {
    const initialResponse = await fetch(initialUrl, {
      method: "POST",
      headers: headers,
      body: JSON.stringify(payload),
    });

    const initialData = await initialResponse.json();

    if (initialResponse.ok) {
      const jobId = initialData.job.jobId;
      if (jobId) {
        console.log(`ControlNet img2img generation job initiated. Job ID: ${jobId}`);

        const pollingUrl = `https://api.cloud.scenario.com/v1/jobs/${jobId}`;
        let status = "queued";

        while (!["success", "failure", "canceled"].includes(status)) {
          console.log(`Polling job ${jobId}... Current status: ${status}`);
          await new Promise(resolve => setTimeout(resolve, 3000)); // Wait for 3 seconds

          const pollingResponse = await fetch(pollingUrl, {
            headers: { Authorization: `Basic ${credentials}` },
          });
          const pollingData = await pollingResponse.json();

          if (pollingResponse.ok) {
            status = pollingData.job.status;
            const progress = (pollingData.job.progress || 0) * 100;
            console.log(`Progress: ${progress.toFixed(2)}%`);

            if (status === "success") {
              const assetIds = pollingData.job.metadata.assetIds || [];
              console.log("ControlNet img2img generation completed! Asset IDs:", assetIds);
            } else if (["failure", "canceled"].includes(status)) {
              console.error(`ControlNet img2img generation failed or canceled: ${pollingData.job.error}`);
            }
          } else {
            console.error(`Error polling job status: ${pollingResponse.status} - ${JSON.stringify(pollingData)}`);
            break;
          }
        }
      } else {
        console.error("Error: No jobId returned in the initial response.");
      }
    } else {
      console.error(`Error initiating ControlNet img2img generation: ${initialResponse.status} - ${JSON.stringify(initialData)}`);
    }
  } catch (error) {
    console.error("Network or other error:", error);
  }
}

generateControlNetImg2img();

Response

An initial successful response will return a JSON object containing your job details:

{
  "job": {
    "jobId": "job_controlnet_123",
    "jobType": "controlnet",
    "metadata": {
      "input": {
        "prompt": "a superhero in a dynamic pose, comic book style",
        "controlImageId": "asset_GTrL3mq4SXWyMxkOHRxlpw",
        "modality": "pose:0.5",
        "numSamples": 1,
        "guidance": 3.5,
        "numInferenceSteps": 28,
        "modelId": "flux.1-dev"
      },
      "assetIds": []
    },
    "status": "queued",
    "progress": 0
  },
  "creativeUnitsCost": 5
}
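
The two fields a client usually needs from this initial response are the job ID, which is used for polling, and the reported cost. A minimal sketch of pulling them out defensively follows. The function name `parse_submission` is hypothetical, and the field names are taken from the sample response above.

```python
def parse_submission(body):
    """Return (job_id, cost) from the initial response; raise if no job ID is present."""
    job = body.get("job") or {}
    job_id = job.get("jobId")
    if not job_id:
        raise ValueError("response does not contain job.jobId")
    # creativeUnitsCost is returned as-is; it is None if the field is absent.
    return job_id, body.get("creativeUnitsCost")
```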

You can then poll the GET /jobs/{jobId} endpoint every 3 seconds until the job reaches a final status: "success", "failure", or "canceled". On "success", extract the assetIds from the metadata field. (API Reference)
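
The polling step can be factored into a reusable function with an upper bound on waiting time, so that a stuck job does not block the caller forever. This is a sketch under stated assumptions: `wait_for_job` and its parameters are hypothetical names, the 300-second default timeout is arbitrary, and the set of final statuses is the one used by the code examples in this guide. The HTTP call is passed in as `fetch_job` so the loop can be exercised without network access.

```python
import time

# Final statuses as used by the polling loops in this guide's examples.
TERMINAL_STATUSES = {"success", "failure", "canceled"}


def wait_for_job(fetch_job, job_id, interval=3.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a final status or the timeout elapses.

    fetch_job(job_id) must return the decoded JSON body of GET /jobs/{jobId}.
    Returns the "job" object from the last response.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)["job"]
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']!r} after {timeout}s")
        sleep(interval)
```

In practice `fetch_job` would wrap an authenticated GET request to the jobs endpoint and return its parsed JSON. The caller still has to inspect the returned status, since "failure" and "canceled" also end the loop.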

Upon successful completion, the polling response will include the assetIds:

{
  "job": {
    "jobId": "job_controlnet_123",
    "jobType": "flux",
    "metadata": {
      "input": {
        "prompt": "a superhero in a dynamic pose, comic book style",
        "controlImageId": "asset_GTrL3mq4SXWyMxkOHRxlpw",
        "type": "controlnet",
        "modality": "pose:0.5",
        "numSamples": 1,
        "guidance": 3.5,
        "numInferenceSteps": 28,
        "modelId": "flux.1-dev"
      },
      "assetIds": [
        "asset_controlnet_abc"
      ]
    },
    "status": "success",
    "progress": 1
  },
  "creativeUnitsCost": 5
}
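
Once a job has finished, the generated asset IDs sit under job.metadata.assetIds, as in the response above. The sketch below reads them and refuses to return a list for a job that did not succeed. The function name `extract_asset_ids` is hypothetical, and the `error` field it reports is the one read by the code examples in this guide.

```python
def extract_asset_ids(body):
    """Return the list of asset IDs from a completed job response."""
    job = body.get("job") or {}
    status = job.get("status")
    if status != "success":
        # Surface the job's error field, if any, instead of returning an empty list.
        raise RuntimeError(f"job ended with status {status!r}: {job.get('error')}")
    metadata = job.get("metadata") or {}
    return list(metadata.get("assetIds") or [])
```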