Friday, June 30, 2023

Dueling Versions of Stable Diffusion XL

 

In a previous post I mentioned a number of new offerings from Stability AI.  By far, the most interesting of these is a new version of Stable Diffusion, the company's text-to-image AI generative app.

Somewhat confusingly, the new upgrade, Stable Diffusion XL, is available in two versions, both of which are free of charge to use.  First, there is Stability AI's own unofficial beta version on Hugging Face; and then there is a version offered by Clipdrop, a subsidiary acquired by Stability AI when it recently purchased Init ML.  Adding to the confusion, the app's previous version, on a page simply entitled Stable Diffusion Online, is also still available for use.

In my last post I quoted Google Bard on the differences between the two versions of XL that are now offered.
"(1) Model Size - the full version images are larger (1.37GB) than the Clipdrop version images (.54 GB); (2) Image Quality - Clipdrop version images are of lower quality; and (3) Features - the full version offers more features, such as the ability to edit generated images."
In actual practice I did not find this to be accurate, perhaps because the full version is still available only in beta form.

Above are images I obtained from the Clipdrop version after entering the following text prompt:
"Elegant nighttime depiction in muted colors of extremely beautiful female model, face and figure shown in realistic detail, posing in futuristic designer clothing on rooftop of Tokyo apartment with glowing city lights and brightly lit neon billboards in background, a highly detailed epic cinematic concept art, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, arthouse."
I was extremely pleased with these results and found them to be aesthetically much more pleasing than anything I had obtained in prior versions of Stable Diffusion.  In particular, they were far more photo realistic.

Shown below are the results I obtained using the same prompt on the Hugging Face beta version page.  Not only are these images visually less appealing, but they are also smaller in size than the Clipdrop images.  That was quite a surprise to me, and as a result I will in the future work exclusively with Clipdrop until such time as Stability AI has released a final version.  It's extremely puzzling that Clipdrop seems to have left beta behind while its parent company is apparently still stuck there.  It may simply be that Hugging Face has not acquired access to the final version, though I'm unable to locate it anywhere else either.  According to Stability AI, the version available on Dream Studio is also beta, but the company does go on to say that the final version of the new app "will be released as open source for optimal accessibility in the near future."


Tuesday, June 27, 2023

Stability AI: A Huge Array of Apps to Choose From


When I first began experimenting with AI imaging apps well over a year ago, there were two companies that were getting the most buzz in the media - OpenAI and Stability AI.  At the time, it was OpenAI's DALL-E 2 that was the center of attention and I was curious to see what results I could obtain with it.  Unfortunately, I did not find the results as aesthetically pleasing as I would have liked, not to mention the drawback of being limited to 15 free credits per month, and I soon stopped using it.  In the meantime, OpenAI appeared to shift its focus to ChatGPT, the wildly successful alternative search engine (though that may be too limited a definition of its functionality).

When I moved on to Stability AI I found the company had at the time two apps to choose from - Stable Diffusion and Dream Studio.  Both were AI imaging apps, but there were notable differences between the two which were outlined fairly succinctly by Google Bard.  Very basically, "Stable Diffusion is an open-source generative AI model that can be used to generate images from text descriptions.  Dream Studio is a web app that uses Stable Diffusion to create images."  I soon found that Stable Diffusion would best suit my needs, most especially as it was free to use while Dream Studio was a paid subscription service.

Working with Stable Diffusion, I found that the images, while more pleasing than those I had obtained with DALL-E 2, were still lacking.  Aside from the inconvenience of generating characters with three arms or legs and hands whose fingers resembled strands of spaghetti, Stable Diffusion images were too lacking in detail to be fully acceptable.  They certainly could never be termed photo realistic.  Subsequent upgrades to the app failed to sufficiently correct these problems.

Meanwhile, Stability AI was moving on.  For one thing, it introduced a new generative AI model, Deep Floyd, which, at least according to the article in TechCrunch, is more correctly the name of a research group backed by Stability AI.  According to Bard, there are three main differences between Stable Diffusion and Deep Floyd: (1) Model Architecture - Stable Diffusion is a latent diffusion model, while Deep Floyd is a pixel-based diffusion model; (2) Image Quality - Stable Diffusion is generally considered to produce more realistic images than Deep Floyd; and (3) Speed - Deep Floyd is generally faster than Stable Diffusion.  In spite of Bard's explanation, I found in actual practice there was not that much noticeable difference between images generated by Stable Diffusion and those generated by Deep Floyd.  In other words, I did not find Deep Floyd to be that great an improvement on Stable Diffusion.

In another turn of events, in March 2023 Stability AI acquired Init ML, makers of Clipdrop, which then became a wholly owned subsidiary of Stability AI.  This, of course, gave Stability AI users free access to the whole range of Clipdrop apps which at last count I numbered at nine: Reimagine XL, Uncrop, Relight, Image Upscaler, Text Remover, Replace Background, Remove Background, Cleanup, and Stable Diffusion XL.

It's the last of the nine, Stable Diffusion XL, that I wish to call attention to since Stability AI has also released a full beta version of this same app.  While both versions are free of charge to use, there are differences between them which, again according to Bard, are as follows: (1) Model Size - the full version images are larger (1.37GB) than the Clipdrop version images (.54 GB); (2) Image Quality - Clipdrop version images are of lower quality; and (3) Features - the full version offers more features, such as the ability to edit generated images.

So far, for convenience's sake, I've been experimenting only with the Clipdrop version of Stable Diffusion XL and have been happy to discover the app is a significant improvement over both the traditional Stable Diffusion and Deep Floyd.  The images I've obtained to date are far more realistic, even in the lower quality Clipdrop versions, than those I've been able to obtain with the earlier apps when using the same text prompts.  As a bonus, one nice feature in XL is the ability to apply styles, such as Cinematic, Digital Art, and Fantasy Art, when regenerating images.

As I work more with XL, I will post sample images here.

Monday, June 26, 2023

Adobe Express Beta - Animations


I recently upgraded to the new beta version of Adobe Express.  The app has a wealth of exciting new features, but the one I wanted to discuss here is that labeled Make Animations for YouTube.

Using this feature, I was actually able on the very first try to make a short animated video in which I announced my upcoming visit to Japan.  To accomplish this I first asked ChatGPT to create an article, told in the first person, in which I listed all the sights I hoped to see when in Tokyo and Kyoto.  ChatGPT did such a good job that it ended up mentioning several places of interest that hadn't been on my own radar, and I'm grateful to it for adding to my list.  I next cut & pasted the article into a free online app called Readloud.  (Unfortunately, Readloud, though legit itself, hosts large advertisements for Wave browsers, which are really nothing more than malware.  DO NOT click on the big banners that read "Start" or "Download.")  After selecting an appropriate voice from among a large number available, I clicked on "Voice It" to create an audio file which I then downloaded to my hard drive and then uploaded in turn to Adobe Express.  Important: note that Adobe Express will not upload audio files longer than two minutes.
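For anyone who wants to check an audio file against that two-minute limit before uploading, here is a minimal sketch in Python.  It rests on assumptions of my own: that the narration has been saved as an uncompressed WAV file (a text-to-speech download may well be an MP3, which this will not read), and the function names are purely illustrative and have nothing to do with Adobe Express or Readloud.

```python
import wave

# Two-minute ceiling mentioned above, expressed in seconds.
MAX_SECONDS = 120


def wav_duration_seconds(path):
    """Return the playing time of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration = total audio frames / frames played per second.
        return wav_file.getnframes() / wav_file.getframerate()


def fits_upload_limit(path, max_seconds=MAX_SECONDS):
    """Return True if the file is no longer than the given limit."""
    return wav_duration_seconds(path) <= max_seconds
```

Calling fits_upload_limit on a saved narration file returns False when the recording runs long, in which case the article needs trimming before the audio is generated again.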

With the audio uploaded, all that remained to be done in Adobe Express was to select the output size (e.g., YouTube, Instagram, etc.), an appropriate background, and finally the character, or avatar, who would narrate the animation.  After that it was a simple matter to download the animated video.

If one desires to create a traditional video rather than an animation, the beta version of Adobe Express contains a huge library of templates from which to choose.  I found an extremely helpful YouTube video by one Claudio Zavala Jr. that guides first time users through the process.

Friday, June 23, 2023

Midjourney v5.2


It's only fair to start this post by mentioning I'm not yet a subscriber to Midjourney, even though I've been experimenting with generative ai-imaging apps for well over a year now and in fact several months ago completed a novel, And What If, that's fully illustrated with ai imagery and is currently available for free download.

If I've chosen up to now to work with apps - principally Stable Diffusion, DALL-E 2, and Lexica - other than Midjourney, it's primarily because these, unlike Midjourney, offer at least some free tiers of use, and I thought it prudent to keep my costs down while gaining some proficiency in ai imaging.  That being said, it's become obvious to me as I've studied any number of ai-generated creations that no other app I've come across can compare with Midjourney in terms of artistry and photo realism.  It really is in a class by itself.

I was very excited then to learn today that Midjourney has released a new version of its app, v5.2.  From what I've read, the new version contains several new features, but the most interesting by far is "zoom out," which promises to widen an image's field of view and in this sense seems somewhat similar to Adobe Photoshop's new Generative Fill feature as well as Stability AI's Uncrop.

V5.2 contains other enhancements which, if not quite as exciting as "zoom out," are nevertheless well worth mentioning.  One is a "make square" command that allows a user to turn a rectangular image into one in square format.  Another is a "shorten" command that will analyze a text prompt and determine which words therein are given most weight in generating the final image.

I had intended to spend the summer in working only with photography, my stock in trade for almost fifty years now, but the new improvements to Midjourney are striking enough that I believe I will soon put my camera aside for the time being and begin subscribing to the app in order to see exactly what types of ai imagery I can create with it.

Thursday, June 22, 2023

DPreview Is Back

Actually, it might be more correct to say that the online publication never really went away in the first place.  To paraphrase Mark Twain, reports of its death had been greatly exaggerated.

The drama began in March when Scott Everett, DPR's General Manager, announced the site would be closing after twenty-five years of operation.  It turned out that the publication had been fully owned all along by Amazon, and when that behemoth began a series of cost-cutting moves DPR turned out to be sadly expendable.  But the site continued to maintain a presence on the web even after its announced shutdown date, ostensibly to work on archiving its contents.  More recently, it has continued to publish articles and reviews.

The latest news, announced on Tuesday, is that DPR has now been acquired by Gear Patrol, a print and digital publication founded in 2007 that describes itself on its YouTube channel as "a team of creators, users and enthusiasts, hell-bent on building the definitive resource for discovering products and exploring the stories that surround them."

From what's been announced so far, it seems DPR will remain unchanged and continue exactly as it did under Amazon.  That's welcome news for all photographers.

Saturday, June 17, 2023

Wave.video

I recently came across a site named Wave.video that promises text-to-video generative AI.  Since the site has several tiers, the most basic of which is free to use, I decided to give it a try.  For a lengthy text prompt I chose a short story, roughly five pages in length, I'd written many years ago and then did a cut & paste into the app's text prompt.  (I believe the limit, at least at the free level, is 50,000 characters.)  

I found the UI to be fairly intuitive and the app quickly generated a video on the first try.  The problem I encountered on this first attempt was that I had instructed the app to use free stock images to illustrate the story and, unfortunately, these were almost entirely out of character with the content (a ghost tale set in ancient Japan).  The video was very easy to edit, though, and I was able to quickly replace the stock images with copies of Japanese artworks I had accumulated over the years in my personal library.  This done, the roughly ten-minute video was entirely presentable and I was able to quickly download it to my hard drive in 720p MP4 format.  (I would have needed to upgrade to a higher tier to save it at a higher resolution.)

As for the video itself, it was really not so much a video as one of those animations which move quickly from one still frame to the next.  To me, this wasn't really true video at all but more an animated PowerPoint presentation.  Still, the app does offer a new means of displaying one's writing and images to audiences who might otherwise be reluctant to read through a text presentation, even if illustrated, and for that reason should be a useful alternative for creators to keep in mind.

Friday, June 16, 2023

Nikon Strikes Back at AI


There's no question that the world, and image creators in particular, have grown obsessed with AI imaging.  One has only to look at the fantastic popularity of such apps as DALL-E 2, Stable Diffusion, and Midjourney to see that the world of imaging has changed forever.  Seemingly overnight the ability to generate complex images from simple text prompts has taken hold of the public imagination as have very few technical advances before it.

And exactly what is AI-generative imaging?  If one were to search for a brief functional definition, one of my own that might do quite well is "digital photography without the camera."  However oversimplified, I believe this cuts right to the heart of what AI-generative imaging apps are offering their users.  It's no surprise then that camera manufacturers are taking note and responding, perhaps a bit fearfully, to this latest challenge to their dominance.

For one example, one need look no further than Nikon, the world's preeminent camera manufacturer ever since it took hold of the SLR market at the end of the 1950's with the introduction of the redoubtable Nikon F that quickly became the camera of choice for virtually every professional photographer the world over.  If Nikon's reputation were ever in doubt in the mirrorless age, any misgivings have been laid to rest once and for all with the recent release first of the Z9 and now the Z8.

So how is Nikon responding to the threat to its business posed by AI imaging?  A quote from a brief video makes its answer quite clear:
"This obsession with the artificial is making us forget that our world is full of amazing natural places that are often stranger than fiction."
And this puts in a nutshell the choice now facing content creators, Nikon users or not, and that is whether to capture the real world around them with camera in hand, or to create new worlds limited only by the extent of their imaginations.  There is no right choice here.  The decision depends on a given individual's unique proclivities.  For myself, though I began my career as a film photographer almost a half century ago, I've been drawn increasingly to the possibilities offered by the AI-generative apps listed above.  They bring to mind the famous Aldous Huxley quote that became the mantra of the psychedelic 1960's: "There are things known and there are things unknown, and in between are the doors of perception."

Tuesday, June 13, 2023

Photo AI

Over the past several years, as I've watched the incredible advances occurring in AI-generative imaging apps such as Midjourney, it's become increasingly obvious to me that the livelihood of all but the top commercial photographers is in grave danger of disappearing in the not too distant future.  As evidence of this, one need look no further than the appearance of the new startup company Photo AI.  As the company's website succinctly puts it:

"Save money and use AI to do a photo shoot from your laptop or phone instead of hiring an expensive photographer.  Train AI models and do photo shoots in different poses, places and styles.  Turn around time of just 30 minutes."

That pretty much sums up the allure of the concept as well as the threat to professional photographers.  Though the wording on the site seems more geared to personal projects in its suggestion that one train oneself as an AI model, the pricing options contain a Business plan for only $299/mo.  It's an offer I imagine many small and mid-sized businesses will jump at as a sure way to drastically cut their advertising budgets.  Even if Photo AI should fail for one reason or another, I'm certain many similar enterprises will immediately pop up in its place as the idea behind it is just one more practical application of AI technology.  As such, its eventual success seems inevitable.

Friday, June 9, 2023

Zonerama

I recently came across a gallery hosting site named Zonerama that offers free unlimited storage of photos that are displayed in album format.  (One can choose whether one wants a given album to be Public, Hidden, or Shared.)  Attracted mainly by the words "free" and "unlimited," I decided to give it a try.  I found it fairly easy to register (the site sends a link to the user's email address for verification purposes) and set up what few controls were needed to get started using my own name for both the gallery name and its URL.  I then uploaded roughly 20 photos I had shot last month at the NYC Japan Day Parade to create my first album.  The upload went fairly quickly - these were all web-sized images - but I noted that the photos did not appear in the same order as in the folder on my hard drive from which I had taken them.  True, there is a gear icon for each image that allows the user to, among other things, move or delete given photos; but I didn't feel a pressing need to establish a particular order, at least not on a non-pro site.

When one registers for Zonerama, one is informed that photo uploads will proceed more quickly when using another Zoner product, Zoner Photo Studio X; but it was only after installing it that I learned this app, unlike Zonerama, is definitely not free after the expiration of its 90-day trial period.  (Yes, my fault no doubt for not having read the fine print more carefully.)  As a quick look at the app showed it to be, as far as I could tell, little more than a poor man's Photoshop, and having no desire to spend $5.99 per month (or $59 per year), I quickly uninstalled it.

Getting back to Zonerama, I would suggest before using it one check out the Frequently Asked Questions, though these need to be updated.  From reading them I learned that if one wishes to prohibit viewer downloads from a given album it's necessary to click on the gear icon on the title bar above an opened album, go to "Show Controls," and then select "Hide Downloads" before choosing the selected settings as the default for future albums.

It's also possible to upload and store videos on Zonerama in a variety of formats, though length is limited to five minutes.  I haven't yet had a chance to use this feature but intend to start uploading certain of my videos in MP4 format very soon.

All in all, I'm fairly well satisfied with Zonerama as another venue on which to display my photography work, especially as albums can be shared on Facebook and Twitter as well as via email.  Certainly the price is right.

Thursday, June 8, 2023

Adobe Express Beta

 
I received an email from Adobe today inviting me to try the new Beta version of Adobe Express.  As far as I can make out from Adobe's announcement, the thrust of the improvements to the free web-based app has been to integrate it more fully with other of the company's Cloud apps (e.g., a file updated in Photoshop CC will automatically be updated at the same time in Express), thus facilitating ease of use and communication among members on group projects.  To be honest, this isn't of much use to me since I do not subscribe to any of Adobe's Cloud apps and do not work with teams of creators.  More interesting is the incorporation into Express of Firefly, Adobe's AI generative app, to facilitate the creation of content.  At least at present, though, I've found Firefly to be extremely limited when attempting to create realistic AI imagery.  Many results I've obtained using it appear almost cartoonish in quality.  This, however, may in part at least be due to Adobe's laudable attempt to ethically train Firefly on non-copyrighted material.

While Express will undoubtedly prove useful to professionals subscribing to Adobe Cloud, it's evident that the app has been targeted to social media content creators.  The email I received is quite explicit on this point.  Its brief message reads:
"Level Up
"Design Reels, Tik Toks, flyers, and more with the new, all-in-one Adobe Express.
"Create videos fast with easy, drag-and-drop editing.  Add amazing artwork and text effects generated by Adobe Firefly AI.  Start inspired with thousands of professionally-designed templates and Adobe Stock videos, photos, and music.  Let's go!"
Note that at present Adobe Express beta is available only on desktop and must be accessed via one's computer.