Wednesday, March 15, 2023

How DALL-E 2 Creates Images


In an interview with VentureBeat, Aditya Ramesh, the OpenAI researcher who created DALL-E 2, was emphatic that the AI app no longer scrapes images from the web in order to generate its output. He stated in no uncertain terms:

"Diffusion models start with a blurry approximation of what they're trying to generate, and then over many steps, progressively add details to it, like how an artist would start off with a rough sketch and then slowly flesh it out over time."

That all sounds fine, and I would never want to question Mr. Ramesh's word, but his description unfortunately does not quite explain the image above, which DALL-E 2 created in response to a fairly straightforward text prompt that ran as follows:

"Late 19th century elevated subway line running past tenement buildings on Manhattan's Upper West Side as seen from the street below.  Gloomy cloudy sky overhead.  Digital art in the style of John Sloan."

As you can see, the image fits the prompt admirably. My problem is that although I did not name any specific tenement building in my prompt, DALL-E 2 nevertheless came up with a remarkably accurate rendition of the Endicott Hotel on Manhattan's Columbus Avenue between 81st and 82nd Streets, a building I know well if only because it sits almost directly across the avenue from my own apartment building. In light of Mr. Ramesh's explanation, it simply doesn't seem possible that the Endicott's facade could be rendered so exactly, all the more so because an elevated subway line, the Ninth Avenue IRT, once did in fact run opposite the Endicott's third-floor windows until it was finally torn down in 1940.

All this came to mind because I'm in the process of drafting a vlog entry on the Endicott, which I plan to preface with the still image shown above. It's a near-perfect representation.
