Generating Custom Images with Hugging Face’s FLUX.1-dev Model
In this tutorial, we’ll walk through a Python script that uses the FLUX.1-dev model from Hugging Face to generate custom images from text prompts. We’ll analyze the code step by step and explain what each part of the process does.
What is the FLUX.1-dev Model?
The FLUX.1 suite of models defines a new state of the art in image detail, prompt adherence, style diversity, and scene complexity for text-to-image synthesis.
To strike a balance between accessibility and model capability, FLUX.1 comes in three variants: FLUX.1 [pro], FLUX.1 [dev], and FLUX.1 [schnell]. In this tutorial we focus on the FLUX.1 [dev] model.
- FLUX.1 [dev]: an open-weight, guidance-distilled model for non-commercial applications. Directly distilled from FLUX.1 [pro], FLUX.1 [dev] obtains similar quality and prompt-adherence capabilities while being more efficient than a standard model of the same size. FLUX.1 [dev] weights are available on Hugging Face and can be tried out directly on Replicate or Fal.ai.

FLUX.1 [dev] is a 12-billion-parameter rectified flow transformer capable of generating images from text descriptions. For more information, see the Black Forest Labs announcement blog post.
Key Features
- Cutting-edge output quality, second only to the state-of-the-art FLUX.1 [pro].
- Competitive prompt following, matching the performance of closed-source alternatives.
- Trained using guidance distillation, making FLUX.1 [dev] more efficient.
- Open weights to drive new scientific research and empower artists to develop innovative workflows.
- Generated outputs can be used for personal, scientific, and commercial purposes as described in the FLUX.1 [dev] Non-Commercial License.
Usage
Black Forest Labs provides a reference implementation of FLUX.1 [dev], as well as sampling code, in a dedicated GitHub repository. Developers and creatives looking to build on top of FLUX.1 [dev] are encouraged to use this as a starting point.
API Endpoints
The FLUX.1 models are also available via API from the following sources:
- bfl.ml (currently FLUX.1 [pro])
- replicate.com
- fal.ai
- mystic.ai
ComfyUI
FLUX.1 [dev] is also available in ComfyUI for local inference with a node-based workflow.
Implementation
Step 1: Import Required Libraries
```python
import requests
import os
from dotenv import load_dotenv, find_dotenv
from PIL import Image
import io
```
We start by importing the necessary libraries:
- `requests`: for making HTTP requests to the Hugging Face API
- `os`: for accessing environment variables
- `dotenv`: to load environment variables from a `.env` file
- `PIL` (Python Imaging Library): for handling image processing
- `io`: for working with byte streams
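If these packages aren't installed yet, they can be pulled from PyPI; note that `dotenv` is published as `python-dotenv` and `PIL` as `Pillow`:

```bash
pip install requests python-dotenv pillow
```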
Step 2: Load Environment Variables
```python
load_dotenv(find_dotenv())
```
This line uses the `dotenv` library to find and load the `.env` file in our project directory. This file should contain our Hugging Face API key.
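For reference, a `.env` file is a plain-text key-value file. Assuming the variable name used in the next step, it would contain a single line like this (the token value is a placeholder):

```
HUGGINGFACE_API_KEY=hf_your_token_here
```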
Step 3: Set Up API Configuration
```python
API_URL = "https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-dev"
headers = {
    "Authorization": f"Bearer {os.getenv('HUGGINGFACE_API_KEY')}"
}
```
Here, we define two important variables:
- `API_URL`: the endpoint for the FLUX.1-dev model on Hugging Face
- `headers`: a dictionary containing the authorization header with our API key

Note: make sure `HUGGINGFACE_API_KEY` matches the name of the environment variable holding your Hugging Face API key in your `.env` file.
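As an optional hardening step (not part of the original script), you can fail fast when the variable is missing instead of sending a request with an invalid header:

```python
# Optional sanity check: os.getenv() returns None for a missing variable,
# which would silently produce an invalid "Bearer None" header.
api_key = os.getenv("HUGGINGFACE_API_KEY")
if not api_key:
    raise RuntimeError("HUGGINGFACE_API_KEY is not set - check your .env file")

headers = {"Authorization": f"Bearer {api_key}"}
```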
Step 4: Define the Query Function
```python
def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.content
```
This function sends a POST request to the Hugging Face API:
- It takes a `payload` parameter, which will contain our image generation prompt.
- It sends the request to the `API_URL` with our authorization headers.
- It returns the raw content of the response, which will be our generated image in bytes.
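One caveat: on failures (an invalid token, or the model still loading) the Inference API returns a JSON error body rather than image bytes, which would make the later `Image.open()` call fail. A hedged variant with basic error handling might look like this:

```python
def query(payload):
    # A timeout guards against a hung connection; image generation can take a while.
    response = requests.post(API_URL, headers=headers, json=payload, timeout=300)
    if response.status_code != 200:
        # Error responses carry a JSON body, e.g. {"error": "... is currently loading"}.
        raise RuntimeError(f"Request failed ({response.status_code}): {response.text}")
    return response.content
```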
Step 5: Generate the Image
```python
image_bytes = query({
    "inputs": "A detailed cross-section of a fantastical multi-level treehouse, showing the daily life of its whimsical inhabitants",
})
```
Here, we call our `query` function with a dictionary containing the `"inputs"` key. The value is a detailed prompt describing the image we want to generate: in this case, a cross-section of a fantastical multi-level treehouse showing the daily life of its whimsical inhabitants.
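Diffusers-backed text-to-image endpoints on the Inference API also accept an optional `parameters` object alongside `inputs`. The exact keys supported by the hosted FLUX.1-dev endpoint may differ, so treat the names below as assumptions to verify against the API documentation:

```python
# Hypothetical tuning example; verify these parameter names for your endpoint.
image_bytes = query({
    "inputs": "A detailed cross-section of a fantastical multi-level treehouse, "
              "showing the daily life of its whimsical inhabitants",
    "parameters": {
        "num_inference_steps": 30,  # more denoising steps: slower, often more detail
        "guidance_scale": 3.5,      # how strongly generation follows the prompt
    },
})
```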
Step 6: Process and Display the Image
```python
image = Image.open(io.BytesIO(image_bytes))
image
```
Finally, we process the returned image bytes:
- We use `io.BytesIO()` to create a byte stream from our `image_bytes`.
- We open this byte stream with `PIL.Image.open()` to create an `Image` object.
- By writing `image` on the last line, we implicitly display the image (this works in Jupyter notebooks and similar environments).
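Outside a notebook, the bare `image` expression won't render anything. A minimal way to keep the result is to write it to disk (the filename here is arbitrary):

```python
# Save the generated image; PIL infers the format from the file extension.
image.save("treehouse.png")
```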
Limitations
- This model is not intended or able to provide factual information.
- As a statistical model, this checkpoint might amplify existing societal biases.
- The model may fail to generate output that matches the prompts.
- Prompt following is heavily influenced by the prompting style.