Illustration to Realism with AI: Integrating Stable Diffusion into a Workflow

As I delve deeper into the world of generative AI, I've been experimenting with different ways of incorporating Stable Diffusion into my workflow. Below is a summary of how I used a panel from a comic I illustrated to create a realistic rendering of it. Interestingly enough, it ended up fairly time-consuming, and it was a good discovery project for examining the various advantages and limitations of using AI within the creative process.

Here is the original comic panel. Old-fashioned pencil and ink, then digitally painted in Photoshop.

To create an accurate rendering of the image, I needed to use the ControlNet extension in Stable Diffusion. I experimented with various preprocessors, but ultimately settled on three: Canny, OpenPose, and Depth. For Canny, I uploaded my original line art, since this preprocessor uses the line art as the guideline for the overall composition and placement of objects. For the depth map, ControlNet can usually generate one automatically from your image, but with my heavily detailed illustration it had a hard time guessing the depth, so I had to paint one from scratch. That wasn't too difficult, as long as I had the general depth approximations right. Finally, I used the OpenPose preprocessor to have ControlNet copy the pose of the subject of the illustration.
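For anyone who prefers scripting this instead of clicking through the web UI, here is roughly what the same multi-ControlNet setup looks like with the Hugging Face diffusers ecosystem. This is a minimal sketch rather than my actual settings: the file paths are hypothetical, and it assumes the controlnet_aux package for pose extraction.

```python
import cv2
import numpy as np
from PIL import Image
from controlnet_aux import OpenposeDetector

# Canny: derive an edge map from the original line art to lock down
# composition and object placement.
line_art = cv2.imread("line_art.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
edges = cv2.Canny(line_art, 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

# Depth: load the hand-painted depth map, since automatic depth estimation
# struggled with the heavily detailed illustration.
depth_image = Image.open("hand_painted_depth.png").convert("RGB")  # hypothetical path

# OpenPose: extract the subject's pose from the original panel.
pose_detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_image = pose_detector(Image.open("comic_panel.png"))  # hypothetical path
```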

The next step was to generate the images. This was quite a time-consuming process: after every few iterations I would make slight changes to the positive and negative text prompts, step count, and CFG scale, as well as experiment with different model checkpoints (ultimately settling on my go-to, "Analog Madness").
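In the diffusers sketch above, those same knobs map onto the pipeline call roughly as shown below. The model IDs and prompts are placeholders (a checkpoint like Analog Madness would be loaded from its own weights file rather than the base SD 1.5 repo), so treat this as an illustration of where the parameters live, not my actual values.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# One ControlNet per conditioning signal, in the same order as the images below.
controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16),
]

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # stand-in for a photorealistic checkpoint
    controlnet=controlnets,
    torch_dtype=torch.float16,
).to("cuda")

result = pipe(
    prompt="photograph of a woman in a workshop, realistic lighting",  # placeholder positive prompt
    negative_prompt="illustration, cartoon, deformed anatomy",         # placeholder negative prompt
    image=[canny_image, pose_image, depth_image],
    num_inference_steps=30,                         # step count, tweaked between iterations
    guidance_scale=7.0,                             # CFG scale, tweaked between iterations
    controlnet_conditioning_scale=[1.0, 1.0, 0.8],  # per-ControlNet weights
).images[0]
result.save("scene_render.png")
```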

Out of all the iterations, the one below turned out the best. It still wasn't close to usable, but it provided a good enough base to build on in Photoshop. At this stage, some AI users would continue making adjustments within Stable Diffusion using inpainting, but I felt I would have more comfort and control finishing it off in Photoshop.

One part of the AI render I didn't like was the pose of the subject. Although it was following the pose from the OpenPose preprocessor, the Canny preprocessor was also forcing the AI to use my line art as a guideline for the pose, and the anatomy of how I drew the girl was far from perfect and wouldn't translate to real life very well. So I tried again, this time using only the OpenPose preprocessor and limiting my text prompts to describing just the subject of the image. Once again, I got to a "good enough" stage, as I planned to do the remainder of the work in Photoshop.
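Continuing the diffusers sketch from earlier (reusing pose_image from the first snippet), this second pass simply drops the Canny and Depth ControlNets and narrows the prompt to the subject. Again, the prompt and model IDs are placeholders.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Second pass: pose conditioning only, so the anatomy isn't tied to the drawn line art.
pose_only_pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # stand-in for the same photorealistic checkpoint
    controlnet=ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
    ),
    torch_dtype=torch.float16,
).to("cuda")

subject = pose_only_pipe(
    prompt="photograph of a young woman, natural standing pose",  # placeholder, subject only
    negative_prompt="illustration, cartoon, deformed anatomy",
    image=pose_image,
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
subject.save("subject_render.png")
```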

With the new render of the subject, I incorporated her into the previous render of the full scene. This is where I found Photoshop's Generative Fill to be quite useful, as I was able to spot-fix areas here and there to better align with my original illustration. However, I still had to resort to quite a bit of digital painting: creating the subject's shirt, editing the mannequin and armor, adding the belt she is holding and the extension cord, and generally painting in color corrections here and there.

All in all, I'm pretty happy with the result. But it made me think about the whole Art vs. AI discussion; or more precisely, Artist vs. Machine. How much of this piece can I say I created? Or would we label this as "AI art"? Although it was my original illustration that guided the concept and the near pixel-accurate composition, the realism was AI generated. However, I did edit the piece in Photoshop in the same manner I do with stock photos and digital painting. So in this case, am I the artist and the AI the assistant? Or is it the other way around? Was it an equal-effort partnership, or was Stable Diffusion simply a tool I used? Do my text prompts count for anything? Is it even "art"?

All questions and no answers – but I guess only time will tell if we will ever answer them!