
Harnessing GPT-Image-1 to Create High-Quality Illustrated Texts

We present a lightweight integration of GPT-Image-1, OpenAI’s new native multimodal model, into C-LARA, a platform for generation of multimodal learner texts. A three-stage prompt pipeline—style definition, reusable element creation, page-level composition—yields coherent, culturally appropriate illustrations for one page of text at a time while preserving global style and character consistency. The recipe can be invoked by users with a few button-presses and allows optional hand-tuning. To gauge quality we used 11 C-LARA texts—seven AI-generated pedagogical English passages and four challenging classical literary texts—and evaluated generated output using a visual questionnaire. Results show good scores for image-text correspondence and cross-page coherence, though some input from humans is still often needed to fine-tune the generated illustrations. We also present our initial observations on the use of this functionality within a low-resource Indigenous language context.

27 pages, ebook

Published August 25, 2025

5 people want to read


Ratings & Reviews



Community Reviews

5 stars: 0 (0%)
4 stars: 1 (100%)
3 stars: 0 (0%)
2 stars: 0 (0%)
1 star: 0 (0%)
Displaying 1 of 1 review
Manny
Author · 45 books · 16k followers
August 26, 2025
The whole world has been talking about the recent release of GPT-5, which for some mysterious reason many people appear to hate, but the slightly earlier release of OpenAI's new multimodal model GPT-Image-1 received less attention. In the C-LARA project, we are equally impressed with both models. It seems to us that GPT-5 is a lot smarter, and GPT-Image-1 does far better at illustrating texts in a sensible way.

A central challenge, which we've been grappling with for the past year, is to produce illustrations that are coherent. Usually, you want them all to have the same style, and if the same element (character, object, location) turns up in two images you want it to be depicted in more or less the same way. DALL-E-3 didn't do well here, but GPT-Image-1 is at a different level. We found it easy to add an intuitive pipeline to C-LARA, where we first create an image exemplifying the style, then images exemplifying the elements, and finally images which combine the style and the elements to illustrate the text.
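To make the three-stage idea concrete, here is a minimal sketch of how such a prompt pipeline could be structured. This is not the actual C-LARA code: the function names, prompt wording, and element handling are all hypothetical, invented for illustration; only the style → elements → page structure comes from the paper.

```python
# Hypothetical sketch of a three-stage prompt pipeline (style, elements,
# page composition). Names and prompt wording are illustrative only and
# do not come from the C-LARA codebase.

def style_prompt(description: str) -> str:
    """Stage 1: establish a global illustration style for the whole text."""
    return (f"Create a sample illustration that establishes this style: "
            f"{description}. Do not depict any specific characters yet.")

def element_prompt(element: str, style: str) -> str:
    """Stage 2: one reusable reference image per recurring element."""
    return (f"In the style '{style}', depict '{element}' on a neutral "
            f"background, as a reference for reuse across pages.")

def page_prompt(page_text: str, style: str, elements: list[str]) -> str:
    """Stage 3: compose the style and element references into a page image."""
    refs = ", ".join(elements)
    return (f"Illustrate this page in the style '{style}', keeping {refs} "
            f"consistent with their reference images. Page text: {page_text}")

# Example usage, echoing the ink-brush experiment described below.
style = "classical Chinese inkbrush"
elements = ["a wooden gate", "a low fence"]
prompt = page_prompt("A gate is sort of like a half door...", style, elements)
```

In a real integration, each generated prompt would be sent to the image model (for instance via the OpenAI Images API with `model="gpt-image-1"`), with the stage-2 reference images supplied alongside the stage-3 prompt so the model can keep elements visually consistent.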

If you're curious to know more, we just presented this paper at the recent SLaTE workshop in the Netherlands. You'll probably want to start by looking at the presentation created by my gifted colleague Sophie Rendina; it's posted here and contains many examples of images created by the C-LARA/GPT-Image-1 integration. The paper itself (here) gives further details. C-LARA is available for use; there is more information on the C-LARA site.

We're pretty sure that the process of creating illustrated pedagogical texts for language learners will soon be fully automated; we don't know just when this point will be reached, but it's hard to believe that it will take more than another couple of years. We're currently writing a follow-on paper about this.

_________________________________

Dean wrote: "23
A gate is sort of like a half door
Or a door outside leading outside
I suppose then a fence is just a half wall
The sky a half roof"


I used C-LARA to create an illustrated multimodal version which you'll find posted here.

I wondered whether the poem had been translated from Chinese, so I asked the AI to gloss it in Chinese and base the style of the illustrations on classical Chinese inkbrush art. Apart from that, it did everything.

