Using ControlNet
What is ControlNet in Stable Diffusion?
ControlNet is a powerful set of image-to-image tools. It works similarly to Remix, but is more stencil-like: it reuses the overall shape, depth, or pose of the input image to create an entirely new image.
Video – ControlNet in PirateDiffusion
If you’re already familiar with ControlNet, you’re probably curious how it works over chat commands. It’s fast and useful once you get the hang of it. We recommend starting with this video.
- 00:00 – Intro feat. Cat Grooming Masters
- 01:00 – $200 AI prompt challenge – “The Feast”
- 05:15 – How we made the intro graphics: ControlNet
- 10:00 – Two ways to start a Control
- 14:00 – Edges mode
- 17:50 – Masks and debugging
- 18:20 – You can type /cg instead of /controlguidance
- 22:50 – New Feature – The LCM Sampler
- 27:00 – Depth mode
- 30:00 – Contours mode
- 32:00 – Showprompt recap
- 35:00 – Using /mask to create a skeleton
- 38:00 – Recap: use a debug ID as a ControlNet
- 40:00 – Poses mode
- 41:00 – To use downloaded skeletons, facelift first to create the ID
- 42:00 – Skeleton mode
- 45:00 – ControlNet quirks and tips, show wrap
The many ControlNet Modes
On the upper left, the “control image” is a real, original photograph. This is the input.
On the right are the transformations of the control image produced by the various ControlNet modes. The naming should be self-evident for most: Segment, Edges, Contours and Depth use the physical aspects of the image as the basis for the new image, so the general shape is transferred. In the case of Reference, you can see that the sofa was sampled and moved to a different part of the room. Not shown here is Skeleton, which is covered in the video above: you can upload a special kind of pose mask called a Skeleton and take full control of the poses of your characters.
Try it out
As explained in the video above, you can use any rendered image or facelifted image as a ControlNet starting point, or you can store an image as a preset for reuse at any time.
For beginners, we recommend the preset method. Let’s do one together.
Step 1: Download this picture and paste it into Telegram
Step 2: Give it a name you’ll remember
Use the command /control /new: to save your image as a ControlNet template. In this case let’s call it bruce. So the command is:
/control /new:bruce
Just like this:
Tip: You can do this at the time of the upload, or “reply” to a photo that was already uploaded or rendered, like a result from a render command. Both ways do the same thing: they save your image as a preset. We’ll teach you how to use ControlNet without saving presets after this lesson.
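If you save many presets, it can help to generate the command text rather than type it each time. The sketch below is a hypothetical helper, not part of PirateDiffusion: the function name and the rule that a preset name is a single word of letters, digits, or underscores are our assumptions. Only the `/control /new:` syntax comes from this guide.

```python
import re


def save_preset_command(name: str) -> str:
    """Build the message text that stores an image as a ControlNet preset."""
    # Assumption: names are one word (letters, digits, underscores).
    # The bot's real naming rules may be looser or stricter.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError(f"unsupported preset name: {name!r}")
    return f"/control /new:{name}"


# Paste the result as the caption of an upload, or as a reply to an image.
print(save_preset_command("bruce"))
```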
Step 3: Choose a mode and try a prompt
After your template is saved, the system will remind you of all the modes that are available.
Here we can see the many modes available. We tried Edges first, and an edge detection of Bruce is perhaps too strict. If we want to copy the general pose, but not so many of Bruce’s facial features, a different mode might work better.
Let’s try the third mode, called Depth. It will not trace the character with lines like Edges does; instead it gives us a depth map. Copy the prompt below or write your own.
/render /depth:bruce A strange colorful cartoon Muppet <level4>
The concept Level4, written as <level4> in the prompt, is one of our models. Adding a concept tag is how we choose the art style. This result came out a little better, because the AI was given more freedom to fill in the blanks. Compare the Edges mask and the Depth mask and you’ll understand why.
Tip: End prompts with /masks to see debug information, and to download the mask for use in Inpaint.
Tada! It’s that easy.
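The render command from Step 3 has a regular shape: `/render`, then the mode and preset, then the prompt, then an optional concept tag. As a rough sketch, the helper below assembles that text. The function and its argument names are hypothetical, and placing `/masks` after the concept tag is our assumption; the guide only says to end the prompt with it.

```python
def render_command(mode: str, preset: str, prompt: str,
                   concept: str = "", show_masks: bool = False) -> str:
    """Assemble a ControlNet render command as chat text."""
    parts = ["/render", f"/{mode}:{preset}", prompt.strip()]
    if concept:
        # Concepts (models) are written in angle brackets, e.g. <level4>.
        parts.append(f"<{concept}>")
    if show_masks:
        # Assumption: /masks goes last, after the concept tag.
        parts.append("/masks")
    return " ".join(parts)


print(render_command("depth", "bruce",
                     "A strange colorful cartoon Muppet", concept="level4"))
```

With these arguments the helper reproduces the Depth command from Step 3.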
FINE TUNE WITH CONTROLGUIDANCE
You can control how strongly the effect is applied by adding the /cg (ControlGuidance) parameter. ControlGuidance takes a value from 0.1 (lowest) to 2 (max). Use it like this:
A strange colorful cartoon Muppet /cg:0.5
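If you generate prompts from a script, it is worth checking the guidance value before sending it. This minimal sketch assumes the 0.1 to 2 range stated above and appends `/cg` to the end of the prompt. The helper name is hypothetical, and how the bot itself handles out-of-range values is not covered here.

```python
CG_MIN, CG_MAX = 0.1, 2.0  # range given in this guide


def with_control_guidance(prompt: str, cg: float) -> str:
    """Append a /cg parameter after checking it is inside the allowed range."""
    if not CG_MIN <= cg <= CG_MAX:
        raise ValueError(f"ControlGuidance must be between {CG_MIN} and {CG_MAX}")
    # The :g format drops trailing zeros, so 2.0 is written as 2.
    return f"{prompt.strip()} /cg:{cg:g}"


print(with_control_guidance("A strange colorful cartoon Muppet", 0.5))
```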
What each mode does
- Edges (Canny) – best for objects and obscured poses. It creates a line drawing of the subject, like a coloring book, and fills that in
- Contours (HED) – an alternative, fine-focused version of edges. This one and Edges retain the most resemblance to the preset image
- Depth – as the name implies, creates a 3D depth mask to render into
- Segment – detects standalone objects in the image
- Reference – attempts to copy the abstract visual style from a reference image into the final image
- Pose – best for people whose joints are clearly defined, when you want to discard the original photo’s finer details completely and keep just the pose.
One of these modes is very different from the others:
- Skeleton – Upload the ControlNet-extracted mask from a pose, and render from that skeleton’s pose. Can only be used as an input here.
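A practical way to learn the modes is to run the same prompt through each one and compare the results. The sketch below generates one command per mode for a saved preset. Note the assumption: only `/depth:` appears as a command in this guide, so the other spellings are guesses based on the mode names above and should be checked against the mode list the bot shows after you save a preset.

```python
# Only "depth" is confirmed by this guide; the other spellings are guesses.
MODE_COMMANDS = ["edges", "contours", "depth", "segment", "reference", "pose"]


def comparison_commands(preset: str, prompt: str) -> list[str]:
    """Return one render command per mode, all using the same preset and prompt."""
    return [f"/render /{mode}:{preset} {prompt}" for mode in MODE_COMMANDS]


for command in comparison_commands("bruce", "A strange colorful cartoon Muppet <level4>"):
    print(command)
```

Skeleton is left out on purpose, because it takes an extracted pose mask as its input rather than an ordinary photo.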
If you have time for one more lesson, try the Skeleton tutorial in the video at 42:00. You can do cool stuff like this:
2d, kicking a tiger, redhead, bangs, pigtails, absurdres, 1girl, angel girl, garter belts, training clothes, checkered legwear, white skin, cute halo, cross-shaped mark, colored skin, (monster girl:1.3), angelic, innocent, shiny, reflective, intricate details, detailed, dark flower dojo, thorns, [lowres, horns, blurry, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, (low quality, worst quality:1.4), normal quality, jpeg artifacts, signature, watermark, username, blurry, monochrome, error, simple background,] <breakanime>
EXPERIMENTAL
You can also use ControlNet-style reply commands to swap faces and replace backgrounds. This is how it works:
FACE SWAP
- Upload an image or render one
- Reply to it with /control /new:BillyBob (or whatever you want to call that face)
- Upload or render the “target” image, a second picture that will receive the swap
- Reply to the target with /faceswap BillyBob
There aren’t any additional parameters to Face Swap, and it only works on very realistic faces.
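The four steps above come down to two reply commands that share one preset name. As a small illustration, this hypothetical helper pairs each command with the image it should be sent in reply to. It only builds text and does not talk to Telegram.

```python
def face_swap_steps(face_name: str) -> list[tuple[str, str]]:
    """Return (which image to reply to, command text) pairs for a face swap."""
    return [
        ("source image containing the face", f"/control /new:{face_name}"),
        ("target image that receives the face", f"/faceswap {face_name}"),
    ]


for reply_to, command in face_swap_steps("BillyBob"):
    print(f"Reply to the {reply_to} with: {command}")
```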
BACKGROUND REPLACE
- Upload a background or render one
- Reply to the background photo with /control /new:Bedroom (or whatever room/area)
- Upload or render the target image, the second image that will receive the stored background
- Reply to the target with /bg /replace:Bedroom /blur:10
The blur parameter takes a value from 0 to 255 and controls the feathering between the subject and the background. Background Replace works best when the whole subject is in view, meaning that no part of the body or object is obstructed by another object. This keeps the subject from appearing to float and avoids an unrealistic background wrap.
You can also completely remove the background and save it as a mask for prompting.
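For scripted use, the Background Replace reply can be built with a range check on the blur value. This is a sketch under the assumptions that blur is a whole number and that 0 to 255 is inclusive. The helper name is hypothetical, and the default of 10 simply mirrors the example command above.

```python
def background_replace_command(preset: str, blur: int = 10) -> str:
    """Build the reply text that swaps in a stored background."""
    # Assumption: blur is an integer and both ends of 0-255 are allowed.
    if not isinstance(blur, int) or not 0 <= blur <= 255:
        raise ValueError("blur must be a whole number from 0 to 255")
    return f"/bg /replace:{preset} /blur:{blur}"


print(background_replace_command("Bedroom"))
```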