Edit Images with Prompts
The "Edit with Prompts" feature in the Scenario API empowers you to modify images using natural language instructions. This advanced capability leverages state-of-the-art AI models to interpret your textual descriptions and apply the desired changes directly to your images. This guide will walk you through the process of using the "Edit with Prompts" API, detailing its capabilities, the underlying AI models, necessary parameters, and providing code examples to help you integrate this powerful editing tool into your applications.
🚀 Key Concepts
- Prompt-Based Editing: The core idea is to describe the desired image modifications using clear, concise text prompts rather than traditional graphical editing tools.
- AI Models for Editing: The Scenario API utilizes various specialized AI models, such as GPT-Image, Gemini 2.0 Flash, Flux.1 Kontext, and Runway Gen-4. Each model has unique strengths and is suited for different types of edits:
  - Gemini 2.0 Flash: Ideal for precise edits, adjusting small details, or swapping specific elements while largely preserving the original image's integrity.
  - GPT-Image: Excellent for more creative or broad transformations, such as changing character proportions or restyling an entire scene. Note that it might sometimes cause the original style to drift.
  - Flux.1 Kontext: Offers a good balance, with less tendency for style drift compared to GPT-Image. It's faster and more cost-effective, though it might not handle highly complex transformations as effectively.
  - Runway Gen-4: Generally superior in quality and consistency, especially when combining multiple input images. It excels at producing coherent and polished results with minimal style drift.
- Reference Image: The primary image you wish to modify. This can be an existing asset in your Scenario workspace or a new image provided as a Data URL.
- Additional Reference Images: Supplementary images that guide the AI model, influencing aspects like style, composition, or content. The number of additional images supported varies by the chosen AI model.
- Masking: For highly controlled edits, you can provide a mask that specifies the exact areas of the image to modify. Black areas of the mask indicate regions to be replaced, while filled areas are preserved. Masks are only available for the gpt-image-1 model and are ignored by other models (see the mask sketch just after this list).
- Asynchronous Processing: Image editing with prompts is an asynchronous operation. When you initiate an edit, the API returns a jobId, which you then use to poll for the status and results of your editing task.
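As a concrete illustration of the masking convention, the sketch below builds a simple mask with Pillow and encodes it as a Data URL. The image size and rectangle coordinates are placeholders for your own values; adapt them to your source image.

```python
import base64
from io import BytesIO

from PIL import Image, ImageDraw  # pip install Pillow

# Start from a fully "filled" (preserved) white mask, sized to match the source image.
mask = Image.new("RGB", (1024, 1024), "white")

# Paint the region to be replaced in black, per the gpt-image-1 mask convention.
draw = ImageDraw.Draw(mask)
draw.rectangle((256, 256, 768, 768), fill="black")

# Encode the mask as a Data URL so it can be sent in the "mask" request field.
buffer = BytesIO()
mask.save(buffer, format="PNG")
mask_data_url = "data:image/png;base64," + base64.b64encode(buffer.getvalue()).decode()
```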
⚡️ Editing Workflow
The general workflow for editing an image with prompts involves the following steps:
- Prepare Your Image and Prompt: Identify the image you want to edit and formulate a clear text prompt describing the desired changes.
- Initiate Prompt-Based Editing: Make an API request to the dedicated endpoint, providing your image, prompt, and any additional parameters.
- Monitor Job Status: Periodically check the status of your editing job until it is complete.
- Retrieve Edited Image: Once the job is successful, retrieve the modified image.
Let's explore each step in detail.
1. Initiate Prompt-Based Editing
To begin editing an image with a prompt, make a POST request to the /v1/generate/prompt-editing endpoint. In the request body, specify the image to be edited and your text prompt, plus optional reference images and parameters to fine-tune the editing process.
Endpoint:
POST https://api.cloud.scenario.com/v1/generate/prompt-editing
Request Body Parameters:
| Parameter | Type | Description | Required |
|---|---|---|---|
| image | string | The image to edit. This can be an existing AssetId (e.g., "asset_GTrL3mq4SXWyMxkOHRxlpw") or a Data URL (e.g., "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVQYV2NgYAAAAAMAAWgmWQ0AAAAASUVORK5CYII="). | Yes |
| prompt | string | The natural language instruction describing the desired edits to the image. Be as specific and clear as possible. | Yes |
| modelId | string | The AI model to use for editing. Options include "gemini-2.0-flash", "gpt-image-1", "flux-kontext", or "runway-gen4-image". Defaults to "gemini-2.0-flash". | No |
| referenceImages | array of strings | A list of additional reference images. These can be Data URLs or AssetIds. The number of allowed images depends on the modelId selected (e.g., 5 for Gemini/GPT-Image, 3 for Flux-Kontext, 2 for Runway Gen-4). | No |
| mask | string | A mask image (as an AssetId or Data URL) indicating the specific areas to edit. Black areas will be replaced, while filled areas are kept. Only available for the gpt-image-1 model. | No |
| numSamples | number | The number of variations of the edited image to generate. The maximum depends on your subscription tier. | No |
| aspectRatio | string | The aspect ratio of the generated image(s). Options include "auto", "1:1", "3:2", "2:3", "4:3", "3:4", "9:16", "16:9". Defaults to "auto". Availability varies by modelId. | No |
| quality | string | The quality of the generated image(s). Options include "high" and "standard". Defaults to "high". Availability varies by modelId. | No |
| inputFidelity | string | When set to "high", preserves image details, which is ideal for faces or logos. The first image receives the finest textures, so place key elements there. Only available for the gpt-image-1 model. | No |
| seed | number | A seed value for the random number generator to ensure reproducibility. Only available for the flux-kontext model. | No |
| guidanceScale | number | Controls how closely the generated image adheres to the prompt. Only available for the flux-kontext model. | No |
| format | string | The output format of the generated image(s). Options include "png", "jpeg", "webp". Defaults to "png". Only available for the gpt-image-1 model. | No |
| compression | number | The compression level (0-100%) for the generated images. Only available for the gpt-image-1 model with webp or jpeg output formats. Defaults to 100. | No |
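Several of these parameters are model-specific. As a hedged illustration, a flux-kontext request that pins the seed and guidance scale for reproducible edits might use a payload like the one below; the asset ID, prompt, and numeric values are placeholders.

```python
# Hypothetical payload illustrating the flux-kontext-only parameters.
payload = {
    "image": "asset_your_image_id",  # placeholder AssetId or Data URL
    "prompt": "Replace the wooden door with a stained-glass door",
    "modelId": "flux-kontext",
    "seed": 42,           # placeholder seed for reproducible output (flux-kontext only)
    "guidanceScale": 5,   # placeholder value for prompt adherence (flux-kontext only)
}
```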
Example Request (Python):
import base64

import requests

api_key = "YOUR_API_KEY"
api_secret = "YOUR_API_SECRET"

url = "https://api.cloud.scenario.com/v1/generate/prompt-editing"
# Scenario uses HTTP Basic authentication with your API key and secret.
auth_token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
headers = {"Authorization": f"Basic {auth_token}"}

payload = {
    "image": "asset_your_image_id",  # Replace with your image AssetId or Data URL
    "prompt": "Change the background to a futuristic cityscape at night",
    "modelId": "gpt-image-1",  # Or "gemini-2.0-flash", "flux-kontext", "runway-gen4-image"
    "numSamples": 1,
    "aspectRatio": "16:9"
}

response = requests.post(url, headers=headers, json=payload)
if response.status_code == 200:
    job_data = response.json()
    job_id = job_data["job"]["jobId"]
    print(f"Image editing job launched successfully! Job ID: {job_id}")
else:
    print(f"Error launching image editing job: {response.status_code} - {response.text}")
Upon successful initiation, the API will return a jobId. You will use this jobId to monitor the progress of your image editing task.
2. Monitor Job Status
Similar to other asynchronous operations in the Scenario API, you need to poll the API to check the status of your image editing job. Make GET requests to the /v1/jobs/{jobId} endpoint until the job reaches a final status (e.g., success, failure, or canceled).
Endpoint:
GET https://api.cloud.scenario.com/v1/jobs/{jobId}
Path Parameters:
Parameter | Type | Description | Required |
---|---|---|---|
jobId | string | The ID of the image editing job. | Yes |
Example Request (Python):
import base64
import time

import requests

# Assuming job_id is obtained from the editing initiation step
# job_id = "your_job_id"
api_key = "YOUR_API_KEY"
api_secret = "YOUR_API_SECRET"

url = f"https://api.cloud.scenario.com/v1/jobs/{job_id}"
auth_token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
headers = {"Authorization": f"Basic {auth_token}"}

while True:
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # Raise an exception for HTTP errors
    data = response.json()
    status = data["job"]["status"]
    print(f"Job status: {status}")
    if status == "success":
        asset_ids = data["job"]["metadata"].get("assetIds", [])
        print(f"Image editing complete. Edited Asset IDs: {asset_ids}")
        break
    elif status in ["failure", "canceled"]:
        raise Exception(f"Image editing job ended with status: {status}")
    time.sleep(3)  # Poll every 3 seconds
3. Retrieve Edited Image
Once the job status is success, the metadata field of the job response will contain the assetIds of the newly generated (edited) images. You can then use these assetIds to retrieve the actual image data, for example by calling the GET /v1/assets/{assetId} endpoint if available, or by constructing a direct download URL if the API provides one.
Example of successful job response (relevant part):
{
  "job": {
    "jobId": "job_editing_example",
    "status": "success",
    "metadata": {
      "assetIds": [
        "asset_edited_image_1",
        "asset_edited_image_2"
      ]
    }
    // ... other job details
  }
}
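To download the edited images, a hedged sketch is shown below. It assumes the GET /v1/assets/{assetId} endpoint returns asset details including a downloadable url field; check the Assets API reference for the exact response shape before relying on it.

```python
import base64

import requests

api_key = "YOUR_API_KEY"
api_secret = "YOUR_API_SECRET"
auth_token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
headers = {"Authorization": f"Basic {auth_token}"}

for asset_id in asset_ids:  # asset_ids collected during the polling step
    # Fetch asset details; the "url" field below is an assumption about the response shape.
    detail = requests.get(
        f"https://api.cloud.scenario.com/v1/assets/{asset_id}", headers=headers
    )
    detail.raise_for_status()
    download_url = detail.json()["asset"]["url"]  # hypothetical field

    # Download the image bytes and save them locally.
    image = requests.get(download_url)
    image.raise_for_status()
    with open(f"{asset_id}.png", "wb") as f:
        f.write(image.content)
```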