ControlNet: Fine-Grained Control Over Image Generation

ControlNet is an advanced feature of the Scenario API that provides fine-grained control over the image generation process. It lets you guide the AI with structural or compositional information extracted from an input image. This is particularly useful when you need to maintain the pose of a character, the depth of a scene, or the edges of an object while generating new content. This guide explains the ControlNet modalities, shows how to use them, and provides code examples.

What is ControlNet?

ControlNet is a neural network architecture that enables you to add spatial conditioning controls to large, pre-trained text-to-image diffusion models. Essentially, it allows you to provide an additional input image (a "control image") that dictates certain structural aspects of the generated output, ensuring that the AI adheres to specific visual constraints while still generating new content based on your textual prompt.

ControlNet Modalities

The Scenario API supports various ControlNet modalities, each designed to extract and utilize different types of structural information from your control image:

Flux Specific Modalities

| Modality | Description | Primary Use Case |
| --- | --- | --- |
| blur | Unblur an image. It enhances the sharpness and clarity of a given image. | Useful for recovering details in blurred photos or refining image quality. |
| tile | Designed to generate an image that closely matches the structure and style of a reference tile. | Ideal for creating high-resolution or detailed repetitive patterns. |
| gray | Colorize grayscale images. | Adds color to a black-and-white photo, making it an excellent tool for restoring old images or adding a creative touch to monochrome artwork. |
| low-quality | Enhances low-resolution or poorly detailed images by generating a high-quality output. | Can transform a rough or pixelated source image into a refined, high-quality version. |
| sketch | Designed specifically for hand-drawn sketches or line art references. | Use it to transform your sketches into rendered images with your own style models in a streamlined "Sketch-to-Render" workflow. |

Stable Diffusion Specific Modalities

| Modality | Description | Primary Use Case |
| --- | --- | --- |
| seg | Breaks down an image into semantic parts like background, clothing, etc. | Great for precise control over regions and content types. |
| illusion | Creates a warped or distorted version of the input image. | Used to inspire abstract, surreal, or imaginative interpretations. |
| scribble | Control and guide image generation by drawing simple sketches or outlines. | Designed to interpret the basic shapes and patterns from your scribbles, using them as structural guides for the AI to create a more detailed, refined image. |

Common Modalities

| Modality | Description | Primary Use Case |
| --- | --- | --- |
| canny | Guides generation based on the edges detected in the input image. Useful for maintaining outlines and shapes. | Recreating line art, transferring outlines from a sketch to a photorealistic image. |
| pose | Controls the pose of human figures based on a skeleton extracted from the input image. | Generating characters in specific poses, animating characters consistently across frames. |
| depth | Influences the depth perception and spatial arrangement of objects in the generated image. | Recreating scenes with accurate spatial relationships, generating images with consistent perspective. |
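The modality tables above can be captured as a small lookup, so that a modality name can be checked against a model family before a request is sent. This is only an illustrative sketch: the family keys "flux" and "stable-diffusion" and both helper names are assumptions made for this example, not identifiers from the Scenario API.

```python
from typing import Set

# Modalities as listed in the tables above. The family keys and function names
# are hypothetical labels for this sketch, not Scenario API identifiers.
COMMON_MODALITIES = {"canny", "pose", "depth"}
FAMILY_MODALITIES = {
    "flux": {"blur", "tile", "gray", "low-quality", "sketch"},
    "stable-diffusion": {"seg", "illusion", "scribble"},
}


def supported_modalities(family: str) -> Set[str]:
    """Return every modality the tables list for the given model family."""
    if family not in FAMILY_MODALITIES:
        raise ValueError(f"unknown model family: {family!r}")
    return COMMON_MODALITIES | FAMILY_MODALITIES[family]


def is_supported(modality: str, family: str) -> bool:
    """Check a modality name, ignoring any ':influence' suffix such as 'pose:0.5'."""
    name = modality.split(":", 1)[0]
    return name in supported_modalities(family)


if __name__ == "__main__":
    print(sorted(supported_modalities("flux")))
    print(is_supported("scribble", "flux"))
```

A check like this only reflects the tables in this guide; the API itself remains the authority on which modalities a given model accepts.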

Endpoint

ControlNet is available through dedicated txt2img and img2img endpoints. Which one to call depends on whether you are starting from a text prompt alone or from an existing image.

For txt2img with ControlNet:
POST https://api.cloud.scenario.com/v1/generate/controlnet - API Reference

For img2img with ControlNet:
POST https://api.cloud.scenario.com/v1/generate/controlnet-img2img - API Reference
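As a minimal sketch of the choice between the two endpoints, the helper below returns the txt2img URL when no base image is supplied and the img2img URL otherwise. The function name and the `BASE_URL` constant are assumptions for illustration; only the two URLs come from this guide.

```python
BASE_URL = "https://api.cloud.scenario.com/v1"


def controlnet_endpoint(has_input_image: bool) -> str:
    """Return the img2img URL when a base image is supplied, the txt2img URL otherwise."""
    path = "generate/controlnet-img2img" if has_input_image else "generate/controlnet"
    return f"{BASE_URL}/{path}"


if __name__ == "__main__":
    print(controlnet_endpoint(has_input_image=False))
    print(controlnet_endpoint(has_input_image=True))
```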

Request Body

The request body for ControlNet generation includes standard image generation parameters along with specific ControlNet parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| prompt | string | Required. A textual description of the desired image content. | |
| controlImage | string | The ControlNet input image as a data URL. Required unless controlImageId is provided. | |
| controlImageId | string | The ControlNet input image as an asset ID. Required unless controlImage is provided; ignored if controlImage is provided. | |
| image | string | The input image as a data URL or an asset ID (example: "asset_GTrL3mq4SXWyMxkOHRxlpw"). | |
| modality | string | Required. The ControlNet modality to use (e.g., canny, pose, depth). | |
| strength | number | (For controlnet-img2img) Determines how much the generated image deviates from the original input image. Range: 0.0 to 1.0. | 0.8 |
| numSamples | integer | The number of images to generate. | 1 |
| guidance | number | The guidance scale. | 7.5 |
| numInferenceSteps | integer | The number of denoising steps. | 30 |
| width | integer | The width of the generated image in pixels. | 512 |
| height | integer | The height of the generated image in pixels. | 512 |
| scheduler | string | The scheduler to use for the denoising process. Only for Stable Diffusion. | "EulerAncestralDiscrete" |
| modelId | string | Required. The ID of the model to use for generation. | |
| seed | integer | A seed value for the random number generator. | |
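To make the parameter rules above concrete, here is a hedged sketch of two helpers: one that encodes local image bytes as a data URL for controlImage, and one that assembles a request body. Both function names are hypothetical, and the client-side checks reflect only what the table states (the required fields, the controlImage / controlImageId precedence, and the 0.0 to 1.0 strength range); the API's own validation may differ.

```python
import base64
import mimetypes
from typing import Optional


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a base64 data URL; the MIME type is guessed from the filename."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_controlnet_payload(
    prompt: str,
    model_id: str,
    modality: str,
    control_image: Optional[str] = None,
    control_image_id: Optional[str] = None,
    image: Optional[str] = None,
    strength: Optional[float] = None,
    **extra,
) -> dict:
    """Assemble a request body using the parameter names from the table above."""
    if not prompt or not model_id or not modality:
        raise ValueError("prompt, modelId and modality are required")
    if control_image is None and control_image_id is None:
        raise ValueError("provide controlImage or controlImageId")
    if strength is not None and not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")

    payload = {"prompt": prompt, "modelId": model_id, "modality": modality}
    # The table says controlImageId is ignored when controlImage is present,
    # so only one of the two is sent.
    if control_image is not None:
        payload["controlImage"] = control_image
    else:
        payload["controlImageId"] = control_image_id
    if image is not None:
        payload["image"] = image
    if strength is not None:
        payload["strength"] = strength
    payload.update(extra)  # e.g. numSamples, guidance, seed
    return payload
```

Optional parameters such as numSamples or seed can be passed as keyword arguments and are copied into the body unchanged, so anything omitted falls back to the defaults listed in the table.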

Code Examples

Here are some examples of how to use ControlNet with the Scenario API.

Example: Maintaining a Character Pose (ControlNet with txt2img)

Suppose you have an image of a person in a specific pose, and you want to generate a new image of a different character in the exact same pose. Pass the original image as controlImageId and set modality to pose:0.5, which selects the pose modality with an influence of 0.5.
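The "pose:0.5" value used in the requests below combines a modality name and an influence in one string. As a small sketch, the helpers here build and split that form; the function names are hypothetical, and the accepted range of the influence is not stated in this guide, so no range check is applied.

```python
from typing import Optional, Tuple


def modality_with_influence(name: str, influence: float) -> str:
    """Format a modality and influence as 'name:influence', e.g. 'pose:0.5'."""
    return f"{name}:{influence:g}"


def split_modality(value: str) -> Tuple[str, Optional[float]]:
    """Split 'name:influence' into its parts; the influence is None when absent."""
    name, sep, influence = value.partition(":")
    return name, float(influence) if sep else None


if __name__ == "__main__":
    print(modality_with_influence("pose", 0.5))
    print(split_modality("canny"))
```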

cURL

curl -X POST \
  -u "YOUR_API_KEY:YOUR_API_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a superhero in a dynamic pose, comic book style",
    "controlImageId": "asset_GTrL3mq4SXWyMxkOHRxlpw",
    "modality": "pose:0.5",
    "numSamples": 1,
    "guidance": 3.5,
    "numInferenceSteps": 28,
    "modelId": "flux.1-dev"
}' \
  https://api.cloud.scenario.com/v1/generate/controlnet

Python

import requests
import time

api_key = "YOUR_API_KEY"
api_secret = "YOUR_API_SECRET"

# Step 1: Initiate ControlNet txt2img Generation
url = "https://api.cloud.scenario.com/v1/generate/controlnet"
headers = {"Content-Type": "application/json"}

payload = {
    "prompt": "a superhero in a dynamic pose, comic book style",
    "controlImageId": "asset_GTrL3mq4SXWyMxkOHRxlpw",
    "modality": "pose:0.5",
    "numSamples": 1,
    "guidance": 3.5,
    "numInferenceSteps": 28,
    "modelId": "flux.1-dev"
}

print("Initiating ControlNet txt2img generation...")
initial_response = requests.post(url, headers=headers, json=payload, auth=(api_key, api_secret))

if initial_response.status_code == 200:
    initial_data = initial_response.json()
    job_id = initial_data.get("job").get("jobId")
    if job_id:
        print(f"ControlNet txt2img generation job initiated. Job ID: {job_id}")
        
        # Step 2: Poll for Job Status
        polling_url = f"https://api.cloud.scenario.com/v1/jobs/{job_id}"
        status = "queued"
        while status not in ["success", "failure", "canceled"]:
            print(f"Polling job {job_id}... Current status: {status}")
            time.sleep(3) # Wait for 3 seconds before polling again
            
            polling_response = requests.get(polling_url, auth=(api_key, api_secret))
            if polling_response.status_code == 200:
                polling_data = polling_response.json()
                status = polling_data.get("job").get("status")
                progress = polling_data.get("job").get("progress", 0) * 100
                print(f"Progress: {progress:.2f}%")
                
                if status == "success":
                    asset_ids = polling_data.get("job").get("metadata").get("assetIds", [])
                    print(f"ControlNet txt2img generation completed! Asset IDs: {asset_ids}")
                elif status in ["failure", "canceled"]:
                    print(f"ControlNet txt2img generation failed or canceled: {polling_data.get('job').get('error')}")
            else:
                print(f"Error polling job status: {polling_response.status_code} - {polling_response.text}")
                break
    else:
        print("Error: No jobId returned in the initial response.")
else:
    print(f"Error initiating ControlNet txt2img generation: {initial_response.status_code} - {initial_response.text}")

Node.js

// node-fetch v2 (CommonJS). On Node 18+, fetch is built in and this line can be removed.
const fetch = require("node-fetch");

const apiKey = "YOUR_API_KEY";
const apiSecret = "YOUR_API_SECRET";

const credentials = Buffer.from(`${apiKey}:${apiSecret}`).toString("base64");

async function generateControlNetTxt2img() {
  const initialUrl = "https://api.cloud.scenario.com/v1/generate/controlnet";
  const headers = {
    "Content-Type": "application/json",
    Authorization: `Basic ${credentials}`,
  };

  const payload = {
    prompt: "a superhero in a dynamic pose, comic book style",
    controlImageId: "asset_GTrL3mq4SXWyMxkOHRxlpw",
    modality: "pose:0.5",
    numSamples: 1,
    guidance: 3.5,
    numInferenceSteps: 28,
    modelId: "flux.1-dev",
  };

  console.log("Initiating ControlNet txt2img generation...");
  try {
    const initialResponse = await fetch(initialUrl, {
      method: "POST",
      headers: headers,
      body: JSON.stringify(payload),
    });

    const initialData = await initialResponse.json();

    if (initialResponse.ok) {
      const jobId = initialData.job.jobId;
      if (jobId) {
        console.log(`ControlNet txt2img generation job initiated. Job ID: ${jobId}`);

        const pollingUrl = `https://api.cloud.scenario.com/v1/jobs/${jobId}`;
        let status = "queued";

        while (!["success", "failure", "canceled"].includes(status)) {
          console.log(`Polling job ${jobId}... Current status: ${status}`);
          await new Promise(resolve => setTimeout(resolve, 3000)); // Wait for 3 seconds

          const pollingResponse = await fetch(pollingUrl, {
            headers: { Authorization: `Basic ${credentials}` },
          });
          const pollingData = await pollingResponse.json();

          if (pollingResponse.ok) {
            status = pollingData.job.status;
            const progress = (pollingData.job.progress || 0) * 100;
            console.log(`Progress: ${progress.toFixed(2)}%`);

            if (status === "success") {
              const assetIds = pollingData.job.metadata.assetIds || [];
              console.log("ControlNet txt2img generation completed! Asset IDs:", assetIds);
            } else if (["failure", "canceled"].includes(status)) {
              console.error(`ControlNet txt2img generation failed or canceled: ${pollingData.job.error}`);
            }
          } else {
            console.error(`Error polling job status: ${pollingResponse.status} - ${JSON.stringify(pollingData)}`);
            break;
          }
        }
      } else {
        console.error("Error: No jobId returned in the initial response.");
      }
    } else {
      console.error(`Error initiating ControlNet txt2img generation: ${initialResponse.status} - ${JSON.stringify(initialData)}`);
    }
  } catch (error) {
    console.error("Network or other error:", error);
  }
}

generateControlNetTxt2img();

Example: Transferring Canny Edges (ControlNet with img2img)

If you want to apply the edge structure from one image to another, you can use canny modality with img2img.

cURL

curl -X POST \
  -u "YOUR_API_KEY:YOUR_API_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a futuristic cityscape, neon lights",
    "image": "YOUR_BASE_IMAGE_ASSET_ID",
    "controlImageId": "YOUR_CONTROL_IMAGE_ASSET_ID",
    "modality": "canny",
    "strength": 0.8,
    "numSamples": 1,
    "guidance": 7.5,
    "numInferenceSteps": 30,
    "modelId": "YOUR_MODEL_ID"
}' \
  https://api.cloud.scenario.com/v1/generate/controlnet-img2img

Python

import requests
import time

api_key = "YOUR_API_KEY"
api_secret = "YOUR_API_SECRET"

# Step 1: Initiate ControlNet img2img Generation
url = "https://api.cloud.scenario.com/v1/generate/controlnet-img2img"
headers = {"Content-Type": "application/json"}

payload = {
    "prompt": "a futuristic cityscape, neon lights",
    "image": "YOUR_BASE_IMAGE_ASSET_ID",
    "controlImageId": "YOUR_CONTROL_IMAGE_ASSET_ID",
    "modality": "canny",
    "strength": 0.8,
    "numSamples": 1,
    "guidance": 7.5,
    "numInferenceSteps": 30,
    "modelId": "YOUR_MODEL_ID"
}

print("Initiating ControlNet img2img generation...")
initial_response = requests.post(url, headers=headers, json=payload, auth=(api_key, api_secret))

if initial_response.status_code == 200:
    initial_data = initial_response.json()
    job_id = initial_data.get("job").get("jobId")
    if job_id:
        print(f"ControlNet img2img generation job initiated. Job ID: {job_id}")
        
        # Step 2: Poll for Job Status
        polling_url = f"https://api.cloud.scenario.com/v1/jobs/{job_id}"
        status = "queued"
        while status not in ["success", "failure", "canceled"]:
            print(f"Polling job {job_id}... Current status: {status}")
            time.sleep(3) # Wait for 3 seconds before polling again
            
            polling_response = requests.get(polling_url, auth=(api_key, api_secret))
            if polling_response.status_code == 200:
                polling_data = polling_response.json()
                status = polling_data.get("job").get("status")
                progress = polling_data.get("job").get("progress", 0) * 100
                print(f"Progress: {progress:.2f}%")
                
                if status == "success":
                    asset_ids = polling_data.get("job").get("metadata").get("assetIds", [])
                    print(f"ControlNet img2img generation completed! Asset IDs: {asset_ids}")
                elif status in ["failure", "canceled"]:
                    print(f"ControlNet img2img generation failed or canceled: {polling_data.get('job').get('error')}")
            else:
                print(f"Error polling job status: {polling_response.status_code} - {polling_response.text}")
                break
    else:
        print("Error: No jobId returned in the initial response.")
else:
    print(f"Error initiating ControlNet img2img generation: {initial_response.status_code} - {initial_response.text}")

Node.js

// node-fetch v2 (CommonJS). On Node 18+, fetch is built in and this line can be removed.
const fetch = require("node-fetch");

const apiKey = "YOUR_API_KEY";
const apiSecret = "YOUR_API_SECRET";

const credentials = Buffer.from(`${apiKey}:${apiSecret}`).toString("base64");

async function generateControlNetImg2img() {
  const initialUrl = "https://api.cloud.scenario.com/v1/generate/controlnet-img2img";
  const headers = {
    "Content-Type": "application/json",
    Authorization: `Basic ${credentials}`,
  };

  const payload = {
    prompt: "a futuristic cityscape, neon lights",
    image: "YOUR_BASE_IMAGE_ASSET_ID",
    controlImageId: "YOUR_CONTROL_IMAGE_ASSET_ID",
    modality: "canny",
    strength: 0.8,
    numSamples: 1,
    guidance: 7.5,
    numInferenceSteps: 30,
    modelId: "YOUR_MODEL_ID",
  };

  console.log("Initiating ControlNet img2img generation...");
  try {
    const initialResponse = await fetch(initialUrl, {
      method: "POST",
      headers: headers,
      body: JSON.stringify(payload),
    });

    const initialData = await initialResponse.json();

    if (initialResponse.ok) {
      const jobId = initialData.job.jobId;
      if (jobId) {
        console.log(`ControlNet img2img generation job initiated. Job ID: ${jobId}`);

        const pollingUrl = `https://api.cloud.scenario.com/v1/jobs/${jobId}`;
        let status = "queued";

        while (!["success", "failure", "canceled"].includes(status)) {
          console.log(`Polling job ${jobId}... Current status: ${status}`);
          await new Promise(resolve => setTimeout(resolve, 3000)); // Wait for 3 seconds

          const pollingResponse = await fetch(pollingUrl, {
            headers: { Authorization: `Basic ${credentials}` },
          });
          const pollingData = await pollingResponse.json();

          if (pollingResponse.ok) {
            status = pollingData.job.status;
            const progress = (pollingData.job.progress || 0) * 100;
            console.log(`Progress: ${progress.toFixed(2)}%`);

            if (status === "success") {
              const assetIds = pollingData.job.metadata.assetIds || [];
              console.log("ControlNet img2img generation completed! Asset IDs:", assetIds);
            } else if (["failure", "canceled"].includes(status)) {
              console.error(`ControlNet img2img generation failed or canceled: ${pollingData.job.error}`);
            }
          } else {
            console.error(`Error polling job status: ${pollingResponse.status} - ${JSON.stringify(pollingData)}`);
            break;
          }
        }
      } else {
        console.error("Error: No jobId returned in the initial response.");
      }
    } else {
      console.error(`Error initiating ControlNet img2img generation: ${initialResponse.status} - ${JSON.stringify(initialData)}`);
    }
  } catch (error) {
    console.error("Network or other error:", error);
  }
}

generateControlNetImg2img();

Response

A successful initial request returns a JSON object containing the job details:

{
  "job": {
    "jobId": "job_controlnet_123",
    "jobType": "controlnet",
    "metadata": {
      "input": {
        "prompt": "a superhero in a dynamic pose, comic book style",
        "controlImageId": "asset_GTrL3mq4SXWyMxkOHRxlpw",
        "modality": "pose:0.5",
        "numSamples": 1,
        "guidance": 3.5,
        "numInferenceSteps": 28,
        "modelId": "flux.1-dev"
      },
      "assetIds": []
    },
    "status": "queued",
    "progress": 0
  },
  "creativeUnitsCost": 5
}

Poll the GET /jobs/{jobId} endpoint every 3 seconds until the job is complete (status: "success"), then extract the assetIds from the metadata field. Stop polling as well if the status becomes "failure" or "canceled". (API Reference)

Upon successful completion, the polling response will include the assetIds:

{
  "job": {
    "jobId": "job_controlnet_123",
    "jobType": "flux",
    "metadata": {
      "input": {
        "prompt": "a superhero in a dynamic pose, comic book style",
        "controlImageId": "asset_GTrL3mq4SXWyMxkOHRxlpw",
        "type": "controlnet",
        "modality": "pose:0.5",
        "numSamples": 1,
        "guidance": 3.5,
        "numInferenceSteps": 28,
        "modelId": "flux.1-dev"
      },
      "assetIds": [
        "asset_controlnet_abc"
      ]
    },
    "status": "success",
    "progress": 1
  },
  "creativeUnitsCost": 5
}
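The two response shapes above suggest a simple way to interpret each polling result. The sketch below is one possible helper, with a hypothetical name, that returns the asset IDs once the job succeeds, returns None while it is still in progress, and raises when the job ends in "failure" or "canceled". It assumes the response has the structure shown in the examples.

```python
from typing import List, Optional

TERMINAL_FAILURES = {"failure", "canceled"}


def asset_ids_if_done(response: dict) -> Optional[List[str]]:
    """Return asset IDs for a successful job, or None while it is still running.

    Raises RuntimeError when the job ended with 'failure' or 'canceled'.
    """
    job = response.get("job") or {}
    status = job.get("status")
    if status == "success":
        return list((job.get("metadata") or {}).get("assetIds") or [])
    if status in TERMINAL_FAILURES:
        raise RuntimeError(f"job {job.get('jobId')} ended with status {status!r}")
    return None


if __name__ == "__main__":
    queued = {"job": {"jobId": "job_controlnet_123", "status": "queued", "metadata": {"assetIds": []}}}
    done = {"job": {"jobId": "job_controlnet_123", "status": "success",
                    "metadata": {"assetIds": ["asset_controlnet_abc"]}}}
    print(asset_ids_if_done(queued))
    print(asset_ids_if_done(done))
```

Called inside a polling loop, a None result means "wait and poll again", which keeps the status handling in one place.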