Friday, July 28, 2023

Is DALL-E 3 About to Be Released?


Back in the fall of last year the buzz in AI imaging, then only beginning to take hold of the popular imagination, was all about OpenAI's DALL-E 2.  How long ago that now seems.  Since then, Midjourney has deservedly been recognized by most as the premier text-to-image generative AI app while DALL-E 2 has been largely forgotten.  It's no surprise.  Even at the height of its popularity its imagery seemed, to me at least, markedly inferior to that which I obtained from its main rival, Stability AI's Stable Diffusion.

In spite of the above, it seems OpenAI has not yet given up on DALL-E and, at least according to a YouTube video I recently viewed, is preparing to launch version 3.  The original video seems to have been removed since then, or I'm simply unable to locate it, but I remember thinking when I viewed it that the "leaked" images shown were not much better than those I'd seen generated by version 2.  At any rate, there is no official word as of this writing from OpenAI on any plans to update DALL-E.  I'll of course post here if I hear anything definitive.

Tuesday, July 25, 2023

ChatGPT Formula for Generating Prompts

 

I recently came across a YouTube video by a creator named Monice in which she gave a formula that would enable one to use ChatGPT as a prompt generator for Midjourney.  The given formula to be entered into ChatGPT reads as follows:

"Here’s a Midjourney formula for you to understand.

‘/imagine prompt: [Image we are prompting], [5 descriptive keywords], [Camera type], [Camera lens type], [time of the day], [focal distance], [style of photograph], [by photographer name] –ar x:y Reply ‘yes’ if you understood the formula."

I decided to give it a try and ChatGPT quickly furnished me with the following parameters:

"/imagine prompt: [Image we are prompting], [5 descriptive keywords], [Camera type], [Camera lens type], [time of the day], [focal distance], [style of photograph], [by photographer name] –ar x:y

[Image we are prompting]: This is the main subject or content of the photograph that will be used as a basis for the prompt.

[5 descriptive keywords]: Five words that describe the main elements or characteristics of the image.

[Camera type]: The type of camera used to capture the photograph.

[Camera lens type]: The specific lens used on the camera to capture the image.

[time of the day]: The specific time of day when the photograph was taken (e.g., morning, afternoon, evening).

[focal distance]: The distance at which the camera's lens is focused when capturing the image.

[style of photograph]: The artistic style or approach used in the photograph (e.g., abstract, portrait, landscape).

[by photographer name]: The name of the photographer who took the picture.

–ar x:y: The aspect ratio (ar) of the photograph, represented by 'x:y'"

I in turn provided the following information:

"Here are the specifics for each parameter as numbered above: 1. image we are prompting = A depressed young woman sits on wet floor in an unfurnished darkened basement and stares forlornly at a tiny skylight high above her head; 2. five descriptive keywords = depressed, melancholy, darkened, forlorn, muted; 3. camera type = nikon df dslr; 4. lens type = nikon 23mm wide angle lens; time of day = 4pm EST; focal distance = 5 feet; style of photograph - artistic vintage color; name of photographer = paola m franqui; aspect ratio = 1620 px wide by 1080 px high"

ChatGPT then issued the following prompt:

"A depressed young woman sits on the wet floor in an unfurnished darkened basement and stares forlornly at a tiny skylight high above her head. Keywords: depressed, melancholy, darkened, forlorn, muted. Camera type: Nikon DF DSLR. Lens type: Nikon 23mm wide-angle lens. Time of day: 4 PM EST. Focal distance: 5 feet. Style of photograph: Artistic vintage color. By photographer Paola M Franqui. –ar 1620:1080"

Since I do not currently have an account on Midjourney, I instead entered the prompt into Clipdrop Stable Diffusion XL, which generated four images from it, two of which are shown here.

I thought the images were very much in line with what I had been looking for, though I don't know if it really had been necessary to go through ChatGPT to get them.
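
Indeed, the formula is mechanical enough that it can be filled in without a chatbot at all.  What follows is only a minimal Python sketch of that idea, not anything taken from the video.  The function names (aspect_ratio, build_prompt) are my own inventions for illustration.  The sketch also assumes that the aspect ratio should be reduced to its smallest whole numbers and written with two hyphens as --ar, which is how Midjourney's documentation gives the parameter, rather than with the dash that appears in the quoted text above.

```python
from math import gcd


def aspect_ratio(width_px: int, height_px: int) -> str:
    """Reduce pixel dimensions to the smallest whole-number ratio."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("width and height must be positive")
    divisor = gcd(width_px, height_px)
    return f"{width_px // divisor}:{height_px // divisor}"


def build_prompt(subject, keywords, camera, lens, time_of_day,
                 focal_distance, style, photographer, width_px, height_px):
    """Fill in the formula's slots in order (hypothetical helper)."""
    parts = [
        subject,
        ", ".join(keywords),
        camera,
        lens,
        time_of_day,
        focal_distance,
        style,
        f"by {photographer}",
    ]
    # Assumption: the parameter is written "--ar" with two hyphens.
    return f"/imagine prompt: {', '.join(parts)} --ar {aspect_ratio(width_px, height_px)}"


# The same specifics I gave ChatGPT above.
prompt = build_prompt(
    subject=("A depressed young woman sits on wet floor in an unfurnished "
             "darkened basement and stares forlornly at a tiny skylight "
             "high above her head"),
    keywords=["depressed", "melancholy", "darkened", "forlorn", "muted"],
    camera="Nikon Df DSLR",
    lens="Nikon 23mm wide-angle lens",
    time_of_day="4 PM EST",
    focal_distance="5 feet",
    style="artistic vintage color",
    photographer="Paola M Franqui",
    width_px=1620,
    height_px=1080,
)
print(prompt)
```

With my dimensions of 1620 by 1080 pixels the ratio reduces to 3:2, so the prompt ends in "--ar 3:2" rather than the "1620:1080" that ChatGPT passed along unchanged.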

Monday, July 24, 2023

Using Poetry for AI Prompts

 

As I'm always in search of new ideas for AI imaging prompts, it recently occurred to me to simply enter random lines of poetry and then wait to see what a particular AI model would make of them.  Rather than work from masterpieces of literature, I instead focused on poetry that contained the most vivid imagery.  Accordingly, the two images shown here were generated by Clipdrop Stable Diffusion XL from the opening lines of the 1912 poem Medusa by Clark Ashton Smith, long one of my favorite Weird Tales authors, of whom no less a figure than H.P. Lovecraft once wrote: "In sheer demonic strangeness and fertility of conception, Clark Ashton Smith is perhaps unexcelled."  And on that score who would know better than Lovecraft?

Medusa itself is contained in The Star-Treader and other Poems, available as a free download from Project Gutenberg, and the lines in question read as follows:

"As drear and barren as the glooms of Death,

It lies, a windless land of livid dawns,

Nude to a desolate firmament, with hills

That seem the fleshless earth's outjutting ribs,

And plains whose face is crossed and rivelled deep

With gullies twisting like a serpent's track.

The leprous touch of Death is on its stones,

Where for his token visible, the Head

Is throned upon a heap of monstrous rocks,

Grotesque in everlasting ugliness,

Within a hill-ravine, that splits athwart

Like some old, hideous and unhealing scar."

Pretty strong stuff, isn't it?  And perfect for the use to which I put it.

My college degree was in English lit, and so I have, even many years later, a wide familiarity with world poetry from which to draw future inspiration.  It will be interesting, to me at least, to see what I can come up with.

Friday, July 14, 2023

Is Lexica the Best AI Imaging App?


While seemingly every time I sign on to YouTube I'm bombarded with buzz on the latest "hot" AI imaging apps - Midjourney, Leonardo AI, and Stable Diffusion XL, for example - one that I almost never hear mentioned is Lexica, and that's rather strange considering the superior image quality this online app invariably provides.

Though prompting takes a bit more effort than is required with Leonardo AI, Lexica is still extremely easy to use, especially as users are not tied to the Discord app as they are with Midjourney but can generate images directly from Lexica's website in an amazingly short time.  And the pricing, similar to Midjourney's, is definitely reasonable.  Lexica may be stricter about forbidding questionable material, but since I never seek to create any NSFW or even suggestive images that's not really a problem for me.  Most importantly, the images generated by Lexica are, if anything, of even higher quality than those that can be acquired from Midjourney.  Just look at the two examples I've posted here and judge for yourself.

The bottom line, for me at least, is that the AI imaging app that's most worth a paid subscription, at this point in time at any rate, is Lexica.

Monday, July 10, 2023

How Good Is Leonardo AI?


I recently discovered a new text-to-image AI generator named Leonardo that's free to use, though there are paid pricing tiers available.  I don't think most users will need to go the paid route, however, since everyone is given 150 credits each day.  Even though several credits are used for each image generation and more for features such as upscaling, I still found I could get a good number of images without exhausting my daily quota of credits.  But those who feel the need for a larger number of credits, the ability to keep generated images private, and more control using such features as Alchemy, now in beta, which offers a number of style variations and generally results in higher quality images, may well want to consider upgrading to a paid plan.

The great advantage to Leonardo for those who do not have extensive experience with AI generative models is its ease of use, especially when compared to such apps as Midjourney, which require a certain level of expertise.  At the most basic level all one need do is choose a model - there are several featured platform models as well as an extensive library of community-created models - that will determine the general look of the generated image(s), choose between Leonardo Style (which will usually result in greater realism) and No Style, and enter a prompt.  Leonardo will do the rest.  It even has a prompt generator that will improve on a submitted prompt within the image generation window itself.

As mentioned, there are a number of features available to those who want more control over their images.  In regard to the image generation itself, one can first of all (at a cost of extra credits) choose the output size and the number of variations.  For myself, I've found two variations on each image and the default 768x768 size to be sufficient.  If I need a larger size I can always upscale later in Topaz Labs Gigapixel.  

Even without the aforementioned Alchemy, one can then choose between Leonardo Style and no style at all.  Prompt Magic is turned on by default and its guidance set to 0.4.  I haven't had a chance to fully explore this feature myself but it is defined quite simply as follows: "Prompt Magic enables higher prompt adherence and greater deviance from training data."  High Contrast is also turned on by default.  A nice touch is the ability to enter negative prompts and thereby exclude from generated images items one does not wish to see in them.

Beyond this, there are several features one would not expect to find in a free app.  A Guidance Scale, for example, determines how closely a given prompt should be adhered to when generating images.  One can also upload images for use in image-to-image prompting while ControlNet "allows you to influence your generations with input images."

One can also edit within the generated image canvas itself, allowing one "to visually correct and refine details, modify styles, seamlessly extend content, create composite artwork and striking photomontages through inpainting and outpainting."

Finally, and perhaps most importantly, Leonardo offers users the ability to train their own models in only a few clicks.

As far as image quality is concerned, Leonardo provides excellent results, though I believe that Midjourney, Lexica, and the new Stable Diffusion XL may hold a slight advantage unless one has access to Alchemy.  I did try using Crisp HD Upscale on a few photos, but it didn't really seem to appreciably improve the overall image quality.

In summary, I think that once Alchemy comes out of beta, Leonardo will be a viable alternative to Midjourney as far as quality is concerned.  It's certainly easier to use - one is no longer tied to the Discord server for one thing - and the pricing on the paid tiers is comparable to what Midjourney charges.

Thursday, July 6, 2023

Midjourney 6.0 Arriving Soon?

According to an article in Forbes, which in general is a fairly reliable source of information, AI imaging app Midjourney will release version 6.0 before the end of July.  The article unfortunately did not contain any information on what features the new version may offer.  It did, however, describe in a fair amount of detail two new features in the current version 5.2 and also posted examples to illustrate the results obtained with them.

The first of these new features is the ability to create panoramic images.  According to Forbes, it works as follows:

"When a regular image is upscaled to full resolution, the user gets the option to pan the image left, right, up or down. This effectively extends the image in the chosen direction, with the AI filling in the empty space.

Images can continue to be extended beyond one iteration, although the wider and wider aspect ratio becomes problematic.

Users can also enter Midjourney’s Remix mode and change the prompt at the same time as extending the image, effectively letting them create panoramic photo stories."

The second new feature is Weirdness.  As the name would imply, this feature allows users to dial in various degrees of "weirdness" into generated images on a scale of 0 to 3,000.  How useful Midjourney subscribers will find this feature will obviously depend on their individual styles of creativity.  Those who favor a straightforward approach to image generation will most likely opt not to work with it that often while those seeking to create the unusual and fanciful may find it quite helpful in achieving their goals.
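
As I understand it, Weirdness is dialed in by appending a parameter to the end of the prompt.  The short Python sketch below shows the idea under stated assumptions.  The parameter spelling --weird comes from Midjourney's own documentation rather than from the Forbes article, and the helper name add_weirdness and the sample prompt are purely hypothetical.

```python
def add_weirdness(prompt: str, weird: int) -> str:
    """Append a weirdness setting to a prompt (hypothetical helper).

    Assumption: the setting is passed as "--weird N", with N limited
    to the 0 to 3,000 scale described above.
    """
    if not 0 <= weird <= 3000:
        raise ValueError("weirdness must be between 0 and 3000")
    return f"{prompt} --weird {weird}"


# A made-up prompt, used only to show the shape of the result.
print(add_weirdness("a lighthouse at dusk", 500))
```

Checking the range before sending the prompt is simply a convenience, since a value outside the scale is not something the feature is described as accepting.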

Wednesday, July 5, 2023

Clipdrop Uncrop

 

In a previous post I mentioned that Stability AI had acquired a suite of online apps, the Clipdrop series, as a result of its acquisition in March of Init ML.  Recently I decided to experiment with one of those apps, Clipdrop Uncrop, which as far as I can make out is very similar in purpose to Adobe Photoshop's Generative Fill and is, in contrast to the latter, free to use.

The above image was generated by Clipdrop Stable Diffusion XL in response to a two-word text prompt, "hyperrealistic insanity."  The other three variations given by XL to this prompt were the expected closeups of faces so distorted by rage and anger their owners might very well have been rejected by Bedlam for having been too psychotic.  The above image was an anomaly.  It appears to show a devastated street in some war-torn country.  (Could it be Ukraine?)  I don't know what it was doing among the others unless the AI was far more perceptive than it is usually given credit for and was making a statement on the insanity of war.

At any rate, I decided to see what I would get if the image were "uncropped."  It has to be understood, of course, that in determining what should be put in the extended canvas of any image the AI must always be limited by the contents of the original.  Bearing that in mind, I thought the AI did a satisfactory job in generating the image shown below, although both the right and left sides of the extended image are lacking in detail.

I don't know how much use I will make of Uncrop in the future since it seems to offer an advantage only when, for one reason or another, a photographer needs a panoramic image in place of that actually shot.  If I do need a panoramic photo I would much prefer to simply shoot with my Sirui 50mm anamorphic lens rather than be bothered creating unreal images in post-processing.

Tuesday, July 4, 2023

Picsart AI GIF Creator


The online photo editor Picsart has introduced a new AI feature, a GIF creator, that is, at the moment at least, free to use and that produces some truly offbeat imagery.

As Picsart states on its website:

“A world of untapped GIF potential awaits at your fingertips. No longer will you have to rely solely on recycled content or endlessly search for the perfect GIF to convey your emotions or thoughts. Our AI GIF Generator empowers you to express yourself with unprecedented originality, opening up a realm of endless creativity. The GIF game will never be the same again.”

All one need do to use it is to enter a short text prompt.  It's not art, of course, but it might nevertheless prove useful to those looking to spice up their social media posts with unusual content.  It even offers a choice of styles to apply to the GIFs.  These include various moods, colors, and styles of famous artists.  I chose Steampunk for this example of a crowded NYC subway train moving down the track.

Monday, July 3, 2023

Text-to-Image AI Generated Prompts: ChatGPT vs. Google Bard

 

I recently came across a YouTube video in which its author compared images generated in Stable Diffusion XL with those generated in Midjourney.  There was nothing new there - such comparisons have been popping up all over the web during the past month.  What I found intriguing, however, was that the YouTuber used as prompts texts generated by ChatGPT based on only a single word.  I couldn't help wondering why the user had chosen GPT and not Google Bard, which has lately been getting an equal amount of buzz.  I therefore decided to do a comparison of my own between the effectiveness of prompts generated by GPT from a single word as opposed to those generated in the same manner by Bard.  For this purpose, I limited myself to using SDXL as the sole image generator.

Going first to GPT, I asked the app to generate a prompt based on my own choice of word, in this case "beauty."  GPT immediately replied:
"Prompt: Create an AI-generated image that captures the essence of 'beauty' in nature.

Description: Imagine a serene landscape where beauty abounds. The image should showcase a harmonious blend of colors, shapes, and textures that epitomize the concept of beauty. Incorporate elements such as a stunning sunset or sunrise, vibrant flowers, majestic mountains, a serene lake, or any other natural elements that evoke a sense of awe and wonder. The goal is to convey the breathtaking beauty found in the natural world through an AI-generated image. Let your creativity and the power of AI bring this beauty to life!"
When I plugged this prompt into SDXL I got back the image shown below.  Just as GPT's prompt had been perfectly appropriate so was the image itself.  I thought it somewhat generic, though, and not particularly inspiring.


When I moved on to Bard I had a completely different experience.  Not only did the app provide five alternative prompts for the word "beauty" (each of them illustrated with watermarked images from, respectively, Open Art, Smithsonian, Unsplash, EarthSky, and Daily Hive), but before even beginning with those it offered two unsolicited example prompts of its own devising, as follows:
"Sure, I can do that. Just give me a word and I will generate a prompt for you. For example, if you give me the word 'cat', I might generate a prompt like this:

A realistic 4k image of a cat sitting on a windowsill, looking out at the city skyline. The cat is wearing a red bow tie and has a content expression on its face."
"I can also generate prompts that are more abstract or creative. For example, if you give me the word 'dream', I might generate a prompt like this:

A surreal image of a dreamscape. The landscape is made up of strange and colorful shapes, and there are floating objects in the air. A figure is walking through the dreamscape, their face obscured by a mist."

I thought it pretty amazing that Bard could display such creativity.  As for the word "beauty," Bard suggested the following prompts:
"A realistic image of a field of flowers in full bloom. The flowers are a variety of colors, and they are swaying in the breeze. The sun is shining brightly, and the sky is a clear blue."
"A detailed image of a butterfly's wing. The wing is covered in intricate patterns, and it is reflecting the light in a beautiful way. The butterfly is perched on a flower, and its wings are slowly opening and closing."
"A stunning image of a sunset over the ocean. The sky is ablaze with color, and the clouds are reflecting the light in a magical way. The waves are crashing on the shore, and the seagulls are soaring overhead."
"A mesmerizing image of a nebula. The nebula is made up of gas and dust, and it is glowing with a beautiful light. The nebula is located in a distant galaxy, and it is millions of light-years away from Earth."
"A serene image of a mountain lake. The lake is surrounded by mountains, and the water is a clear blue color. The sun is shining brightly, and the trees are reflected in the water."

I was really floored by the versatility Bard showed in providing me with the above prompts and was in general very pleased with the images shown above that SDXL generated in response to them.  I intend to keep experimenting with Bard as a prompt generator and will continue to post here the images I obtain.