Using AI to Generate Images for Articles
Ask anyone who works in marketing or maintains a blog: creating meta images for articles and other pages is a pain. With AI being the new hot thing, I decided to tackle this problem with some help from OpenAI.
Meta Images
Let's start with the basics. You might be asking yourself, "what is a meta image?" It's the image that shows up on LinkedIn, Facebook, and other sites when you post a URL, usually accompanied by the title and a summary of the page being shared. These images are defined in the code of a webpage using "meta tags". This might be an Open Graph tag (Facebook's standard, also read by LinkedIn) or a Twitter card tag.
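For the curious, here's what those tags typically look like in a page's head element (the URLs are placeholders):

```html
<!-- Open Graph tags, read by Facebook, LinkedIn, and most other sites -->
<meta property="og:title" content="Using AI to Generate Images for Articles" />
<meta property="og:image" content="https://example.com/images/meta/article.png" />

<!-- Twitter card tags -->
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:image" content="https://example.com/images/meta/article.png" />
```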
Now imagine you had to create engaging images for every article or page on your website. It would take a lot of time in Photoshop! So let's try automating that process.
The Idea
My idea was to create a small program that takes an article's title and produces a Facebook/LinkedIn/Twitter-ready meta image. Furthermore, I wanted the program to have three different approaches to producing images:
- Basic: Program simply draws a gradient using a random color palette and overlays the article's title.
- AI Hybrid: Program submits the title to GPT-3.5, asking for a list of keywords. These keywords are used to find a relevant stock photo on Unsplash, which the article title is then overlaid on.
- Full AI: Program submits article's title to Dall-E which returns a fully AI generated image. Program then overlays the article's title on the image.
Option 1 is the free, fast fallback: no API calls to OpenAI needed. Option 2 is the safe choice: you're not going to get any crazy AI-generated people with arms sticking out of their heads from Unsplash. Option 3 is the exciting option: all in with AI.
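At the top level, the program is really just a dispatcher over those three modes. Here's a minimal sketch of that shape; the helper names are stand-ins for illustration, not the actual code:

```typescript
// Hypothetical per-mode helpers; only their signatures are sketched here.
declare function drawGradientImage(title: string): Buffer;
declare function drawUnsplashImage(title: string): Promise<Buffer>;
declare function drawDallEImage(title: string): Promise<Buffer>;

type GenerationMode = "basic" | "unsplash" | "dalle";

async function generateMetaImage(title: string, mode: GenerationMode): Promise<Buffer> {
  switch (mode) {
    case "basic":
      // Random-palette gradient with the title overlaid; no API calls.
      return drawGradientImage(title);
    case "unsplash":
      // GPT-3.5 keywords -> relevant Unsplash photo -> title overlay.
      return drawUnsplashImage(title);
    case "dalle":
      // Dall-E paints the whole background from the title.
      return drawDallEImage(title);
  }
}
```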
The Code
I coded up the program using Node.js and TypeScript. The "node-canvas" library (which uses Cairo under the hood) made image manipulation easy. OpenAI's Node library was also a breeze to work with, allowing me to submit requests to Dall-E and GPT in just a few lines of code.
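To give a flavor of the GPT side, here's roughly what keyword extraction for the hybrid approach looks like. This sketch assumes OpenAI's current v4-style Node SDK, and the prompt wording and function name are illustrative:

```typescript
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Ask GPT-3.5 for stock-photo search keywords describing the article.
async function titleToKeywords(title: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      {
        role: "user",
        content: `Give me a short comma-separated list of stock photo search keywords for an article titled "${title}". Respond with only the keywords.`,
      },
    ],
  });
  return completion.choices[0].message.content ?? title;
}
```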
Surprisingly, the hardest piece to work with was Unsplash. The library I found was untyped, and required "node-fetch" (a library for making HTTP calls), which was broken and had to be pinned to an older version. Once I finally made a successful request to Unsplash, I then had to retrieve the image and load it as an object "node-canvas" understands, which was thankfully fairly trivial using helper functions included in "node-canvas".
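Here's a sketch of roughly what that step looks like, shown here with the official "unsplash-js" client for illustration (the exact library and names may differ from what I actually used):

```typescript
import { createApi } from "unsplash-js";
import fetch from "node-fetch"; // pinned to 2.x; 3.x is ESM-only
import { loadImage, Image } from "canvas";

// unsplash-js needs a fetch implementation supplied when running in Node.
const unsplash = createApi({
  accessKey: process.env.UNSPLASH_ACCESS_KEY!,
  fetch: fetch as unknown as typeof globalThis.fetch,
});

// Search Unsplash for the GPT-generated keywords and load the top hit
// into an object node-canvas understands via its loadImage() helper.
async function fetchBackground(keywords: string): Promise<Image> {
  const result = await unsplash.search.getPhotos({ query: keywords, perPage: 1 });
  const photo = result.response?.results[0];
  if (!photo) throw new Error(`No Unsplash results for "${keywords}"`);
  return loadImage(photo.urls.regular);
}
```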
The Results
Now for the exciting part: how did it perform? I'll break down the results by method and include a number of sample images for each. Each sample is captioned with the article title used to generate the image (and of course, the article titles themselves were generated by ChatGPT).
Basic Generation
This is the least surprising result: the gradient+text option worked just as you'd expect.
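The core of this mode is only a few "node-canvas" calls. Here's a minimal sketch, with fixed colors standing in for the random palette (a real version would also wrap long titles):

```typescript
import { createCanvas } from "canvas";
import { writeFileSync } from "node:fs";

// Draw a two-stop gradient at the standard 1200x630 Open Graph size
// and overlay the article title.
function drawGradientImage(title: string): Buffer {
  const canvas = createCanvas(1200, 630);
  const ctx = canvas.getContext("2d");

  const gradient = ctx.createLinearGradient(0, 0, 1200, 630);
  gradient.addColorStop(0, "#4f46e5");
  gradient.addColorStop(1, "#db2777");
  ctx.fillStyle = gradient;
  ctx.fillRect(0, 0, 1200, 630);

  ctx.fillStyle = "#ffffff";
  ctx.font = "bold 64px sans-serif";
  ctx.textAlign = "center";
  ctx.fillText(title, 600, 330);

  return canvas.toBuffer("image/png");
}

writeFileSync("meta.png", drawGradientImage("Hello, Meta Images"));
```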
Here are a few samples from basic meta image generation:
Unsplash Generation
I was pleasantly surprised by the results from the GPT-3.5 + Unsplash method. Almost every image was relevant, and of course the images were high quality and engaging.
Here are the samples from Unsplash + GPT-3.5:
Dall-E Generation
Finally, we have the all-AI method. This method was by far the worst: Dall-E kept wanting to put text into the images, despite numerous variations of my prompt format. It's also the most expensive method, at $0.020 per 1024x1024 image.
Here are the samples from Dall-E:
Given how terrible this first pass with Dall-E was, I came up with an idea: first, pass the article title to GPT-3.5 and ask it to generate a prompt for Dall-E; then, pass GPT's prompt to Dall-E. Here's where that got me:
Wow, what a huge improvement (except number 2, which is terrifying)! By pulling keywords out of the article title, GPT does a better job of writing prompt requirements than I can, and it comes up with alternative, related keywords that the original title didn't contain. Maybe with an extra requirement of "no people" we would be good to go!
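For completeness, here's a sketch of that two-step chain, again assuming the v4-style SDK; the prompt text is illustrative, not my exact wording:

```typescript
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Step 1: ask GPT-3.5 to write a Dall-E prompt for the article title.
// Step 2: hand that prompt to Dall-E and return the generated image's URL.
async function titleToDallEImage(title: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      {
        role: "user",
        content: `Write a short prompt for an image generation model to create background art for an article titled "${title}". No text or words in the image. No people.`,
      },
    ],
  });
  const prompt = completion.choices[0].message.content ?? title;

  const image = await openai.images.generate({
    prompt,
    n: 1,
    size: "1024x1024",
  });
  const url = image.data?.[0]?.url;
  if (!url) throw new Error("Dall-E returned no image");
  return url;
}
```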
Improvements
With a few changes to fonts, palette generation, and added support for a logo (or in this case, my face), we can improve our results even more. By exposing all of these configuration pieces to the end user, the application should be able to support just about anybody who would want to use it.
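A config object along these lines would cover it; the field names here are hypothetical, not the project's actual API:

```typescript
// Hypothetical shape of the user-facing configuration described above.
interface MetaImageConfig {
  mode: "basic" | "unsplash" | "dalle";
  fontFamily: string;          // e.g. a registered custom font
  palette?: [string, string];  // fixed gradient stops; random if omitted
  logoPath?: string;           // optional logo (or face) to stamp on
  width: number;               // 1200 is the usual Open Graph width
  height: number;              // 630 is the usual Open Graph height
}

const myConfig: MetaImageConfig = {
  mode: "basic",
  fontFamily: "Inter",
  logoPath: "./assets/avatar.png",
  width: 1200,
  height: 630,
};
```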
Conclusion
Phew, that was a lot! Now to look at my OpenAI bill...
In all seriousness, this was an interesting test. I don't claim to be the best (or even a good) prompt writer, but I was consistently underwhelmed by Dall-E. I realize there are probably better-suited image generation models out there, and maybe that's an article for another day.
That said, GPT + Unsplash performed very well, and I could see it being an actual, usable tool. Using GPT to generate search queries, then relying on the high-quality repository Unsplash offers is a perfect combination. You could probably get even better results by using a paid stock photo service (more options to match on) and by passing the entire article content (or at least a summary) to GPT.
With all that, what am I going to use for my personal blog? Probably just the basic method: it's the easiest on the wallet, and it's "good enough". Additionally, although this was a fun experiment, I have a lot of concerns around data privacy and copyright infringement when it comes to AI, and I don't see myself incorporating it into my personal projects.
Source Code
The code written for this article is open source and available on GitHub. There's a detailed guide on getting up and running in the README.md.