
Harnessing GPT-Image-1 to Create High-Quality Illustrated Texts

We present a lightweight integration of GPT-Image-1, OpenAI’s new native multimodal model, into C-LARA, a platform for generation of multimodal learner texts. A three-stage prompt pipeline—style definition, reusable element creation, page-level composition—yields coherent, culturally appropriate illustrations for one page of text at a time while preserving global style and character consistency. The recipe can be invoked by users with a few button-presses and allows optional hand-tuning. To gauge quality we used 11 C-LARA texts—seven AI-generated pedagogical English passages and four challenging classical literary texts—and evaluated generated output using a visual questionnaire. Results show good scores for image-text correspondence and cross-page coherence, though some input from humans is still often needed to fine-tune the generated illustrations. We also present our initial observations on the use of this functionality within a low-resource Indigenous language context.

27 pages, ebook

Published August 25, 2025

5 people want to read


Ratings & Reviews



Community Reviews

5 stars: 0 (0%)
4 stars: 1 (100%)
3 stars: 0 (0%)
2 stars: 0 (0%)
1 star: 0 (0%)
Displaying 1 of 1 review
Manny
Author · 45 books · 16k followers
August 26, 2025
The whole world has been talking about the recent release of GPT-5, which for some mysterious reason many people appear to hate, but the slightly earlier release of OpenAI's new multimodal model GPT-Image-1 received less attention. In the C-LARA project, we are equally impressed with both models. It seems to us that GPT-5 is a lot smarter, and GPT-Image-1 does far better at illustrating texts in a sensible way.

A central challenge, which we've been grappling with for the past year, is to produce illustrations that are coherent. Usually, you want them all to have the same style, and if the same element (character, object, location) turns up in two images you want it to be depicted in more or less the same way. DALL-E-3 didn't do well here, but GPT-Image-1 is at a different level. We found it easy to add an intuitive pipeline to C-LARA, where we first create an image exemplifying the style, then images exemplifying the elements, and finally images which combine the style and the elements to illustrate the text.
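To make the three-stage idea concrete, here is a minimal sketch of how such a prompt pipeline could be structured. This is not the actual C-LARA code: the function names, prompt wording, and element handling are all hypothetical, invented for illustration; only the style → elements → page structure comes from the paper.

```python
# Hypothetical sketch of a three-stage prompt pipeline (style, elements,
# page composition). Names and prompt wording are illustrative only and
# do not come from the C-LARA codebase.

def style_prompt(description: str) -> str:
    """Stage 1: establish a global illustration style for the whole text."""
    return (f"Create a sample illustration that establishes this style: "
            f"{description}. Do not depict any specific characters yet.")

def element_prompt(element: str, style: str) -> str:
    """Stage 2: one reusable reference image per recurring element."""
    return (f"In the style '{style}', depict '{element}' on a neutral "
            f"background, as a reference for reuse across pages.")

def page_prompt(page_text: str, style: str, elements: list[str]) -> str:
    """Stage 3: compose the style and element references into a page image."""
    refs = ", ".join(elements)
    return (f"Illustrate this page in the style '{style}', keeping {refs} "
            f"consistent with their reference images. Page text: {page_text}")

# Example usage, echoing the ink-brush experiment described below.
style = "classical Chinese inkbrush"
elements = ["a wooden gate", "a low fence"]
prompt = page_prompt("A gate is sort of like a half door...", style, elements)
```

In a real integration, each generated prompt would be sent to the image model (for instance via the OpenAI Images API with `model="gpt-image-1"`), with the stage-2 reference images supplied alongside the stage-3 prompt so the model can keep elements visually consistent.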

If you're curious to know more, we just presented this paper at the recent SLaTE workshop in the Netherlands. You'll probably want to start by looking at the presentation created by my gifted colleague Sophie Rendina; it's posted here and contains many examples of images created by the C-LARA/GPT-Image-1 integration. The paper itself (here) gives further details. C-LARA is available for use; there is more information on the C-LARA site.

We're pretty sure that the process of creating illustrated pedagogical texts for language learners will soon be fully automated; we don't know just when this point will be reached, but it's hard to believe that it will take more than another couple of years. We're currently writing a follow-on paper about this.

_________________________________

Dean wrote: "23
A gate is sort of like a half door
Or a door outside leading outside
I suppose then a fence is just a half wall
The sky a half roof"


I used C-LARA to create an illustrated multimodal version which you'll find posted here.

I wondered whether the poem had been translated from Chinese, so I asked the AI to gloss it in Chinese and base the style of the illustrations on classical Chinese inkbrush art. Apart from that, it did everything.

