Generating Custom Images with Hugging Face’s FLUX.1-dev Model
In this tutorial, we’ll walk through a Python script that uses the FLUX.1-dev model from Hugging Face to generate custom images from text prompts. We’ll analyze the code step by step and explain what each part of the process does.
What is the FLUX.1-dev Model?
The FLUX.1 suite of models defines a new state of the art in image detail, prompt adherence, style diversity, and scene complexity for text-to-image synthesis.
To strike a balance between accessibility and model capability, FLUX.1 comes in three variants: FLUX.1 [pro], FLUX.1 [dev], and FLUX.1 [schnell]. In this tutorial we focus on the FLUX.1 [dev] model.
- FLUX.1 [dev]: an open-weight, guidance-distilled model for non-commercial applications. Directly distilled from FLUX.1 [pro], FLUX.1 [dev] obtains similar quality and prompt-adherence capabilities while being more efficient than a standard model of the same size. FLUX.1 [dev] weights are available on Hugging Face and can be tried out directly on Replicate or Fal.ai.

FLUX.1 [dev] is a 12-billion-parameter rectified flow transformer capable of generating images from text descriptions. For more information, see the Black Forest Labs announcement blog post.
Key Features
- Cutting-edge output quality, second only to the state-of-the-art FLUX.1 [pro].
- Competitive prompt following, matching the performance of closed-source alternatives.
- Trained using guidance distillation, making FLUX.1 [dev] more efficient.
- Open weights to drive new scientific research and empower artists to develop innovative workflows.
- Generated outputs can be used for personal, scientific, and commercial purposes as described in the FLUX.1 [dev] Non-Commercial License.
Usage
Black Forest Labs provides a reference implementation of FLUX.1 [dev], as well as sampling code, in a dedicated GitHub repository. Developers and creatives looking to build on top of FLUX.1 [dev] are encouraged to use this as a starting point.
API Endpoints
The FLUX.1 models are also available via API from the following sources:
- bfl.ml (currently FLUX.1 [pro])
- replicate.com
- fal.ai
- mystic.ai
ComfyUI
FLUX.1 [dev] is also available in ComfyUI for local inference with a node-based workflow.
Implementation
Step 1: Import Required Libraries
```python
import requests
import os
from dotenv import load_dotenv, find_dotenv
from PIL import Image
import io
```
We start by importing the necessary libraries:
- `requests`: for making HTTP requests to the Hugging Face API
- `os`: for accessing environment variables
- `dotenv`: to load environment variables from a `.env` file
- `PIL` (Python Imaging Library): for handling image processing
- `io`: for working with byte streams
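If these packages aren't installed yet, they can be pulled from PyPI; note that `dotenv` is published as `python-dotenv` and `PIL` as `Pillow`:

```bash
pip install requests python-dotenv pillow
```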
Step 2: Load Environment Variables
```python
load_dotenv(find_dotenv())
```
This line uses the `dotenv` library to find and load the `.env` file in our project directory. This file should contain our Hugging Face API key.
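For reference, a `.env` file is a plain-text key-value file. Assuming the variable name used in the next step, it would contain a single line like this (the token value is a placeholder):

```
HUGGINGFACE_API_KEY=hf_your_token_here
```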
Step 3: Set Up API Configuration
```python
API_URL = "https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev"
headers = {
    "Authorization": f"Bearer {os.getenv('HUGGINGFACE_API_KEY')}"
}
```
Here, we define two important variables:
- `API_URL`: the endpoint for the FLUX.1-dev model on Hugging Face
- `headers`: a dictionary containing the authorization header with our API key

Note: make sure `HUGGINGFACE_API_KEY` matches the name of the environment variable holding your Hugging Face API key in your `.env` file.
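As an optional hardening step (not part of the original script), you can fail fast when the variable is missing instead of sending a request with an invalid header:

```python
# Optional sanity check: os.getenv() returns None for a missing variable,
# which would silently produce an invalid "Bearer None" header.
api_key = os.getenv("HUGGINGFACE_API_KEY")
if not api_key:
    raise RuntimeError("HUGGINGFACE_API_KEY is not set - check your .env file")

headers = {"Authorization": f"Bearer {api_key}"}
```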
Step 4: Define the Query Function
```python
def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.content
```
This function sends a POST request to the Hugging Face API:
- It takes a `payload` parameter, which will contain our image generation prompt.
- It sends the request to the `API_URL` with our authorization headers.
- It returns the raw content of the response, which will be our generated image in bytes.
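One caveat: on failures (an invalid token, or the model still loading) the Inference API returns a JSON error body rather than image bytes, which would make the later `Image.open()` call fail. A hedged variant with basic error handling might look like this:

```python
def query(payload):
    # A timeout guards against a hung connection; image generation can take a while.
    response = requests.post(API_URL, headers=headers, json=payload, timeout=300)
    if response.status_code != 200:
        # Error responses carry a JSON body, e.g. {"error": "... is currently loading"}.
        raise RuntimeError(f"Request failed ({response.status_code}): {response.text}")
    return response.content
```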
Step 5: Generate the Image
```python
image_bytes = query({
    "inputs": "A detailed cross-section of a fantastical multi-level treehouse, showing the daily life of its whimsical inhabitants",
})
```
Here, we call our `query` function with a dictionary containing the `"inputs"` key. The value is a detailed prompt describing the image we want to generate: in this case, a cross-section of a fantastical multi-level treehouse showing the daily life of its whimsical inhabitants.
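Diffusers-backed text-to-image endpoints on the Inference API also accept an optional `parameters` object alongside `inputs`. The exact keys supported by the hosted FLUX.1-dev endpoint may differ, so treat the names below as assumptions to verify against the API documentation:

```python
# Hypothetical tuning example; verify these parameter names for your endpoint.
image_bytes = query({
    "inputs": "A detailed cross-section of a fantastical multi-level treehouse, "
              "showing the daily life of its whimsical inhabitants",
    "parameters": {
        "num_inference_steps": 30,  # more denoising steps: slower, often more detail
        "guidance_scale": 3.5,      # how strongly generation follows the prompt
    },
})
```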
Step 6: Process and Display the Image
```python
image = Image.open(io.BytesIO(image_bytes))
image
```
Finally, we process the returned image bytes:
- We use `io.BytesIO()` to create a byte stream from our `image_bytes`.
- We open this byte stream with `PIL.Image.open()` to create an `Image` object.
- By writing `image` on the last line, we implicitly display the image (this works in Jupyter notebooks and similar environments).
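Outside a notebook, the bare `image` expression won't render anything. A minimal way to keep the result is to write it to disk (the filename here is arbitrary):

```python
# Save the generated image; PIL infers the format from the file extension.
image.save("treehouse.png")
```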
Limitations
- This model is not intended or able to provide factual information.
- As a statistical model, this checkpoint might amplify existing societal biases.
- The model may fail to generate output that matches the prompts.
- Prompt following is heavily influenced by the prompting style.