There’s a way to do this in Auto1111 (sort of).
This feels pretty janky, though. I think you could do it better (and in one shot) in ComfyUI by processing the partially generated latent, feeding that result to a ControlNet preprocessor node, then adding the resulting ControlNet conditioning plus the original half-finished latent to a new KSampler node. You’d then finish generation (continuing from the original latent) at whatever step you split off.
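A rough sketch of that node chain, assuming ComfyUI’s built-in KSamplerAdvanced, VAEDecode, Canny, ControlNetLoader, and ControlNetApply nodes (the step counts and the choice of Canny as the preprocessor are arbitrary examples):

```
CheckpointLoader ──► model / CLIP / VAE
CLIPTextEncode  ──► positive, negative conditioning

KSamplerAdvanced #1: start_at_step 0, end_at_step 12,
                     return_with_leftover_noise = enable
        │  (half-finished latent)
        ├──► VAEDecode ──► Canny ──► edge image of the partial result
        │                              │
        │        ControlNetApply (positive conditioning +
        │                         ControlNetLoader + edge image)
        │                              │
        └──► KSamplerAdvanced #2: start_at_step 12, end_at_step 20,
             add_noise = disable,
             latent_image = latent from #1,
             positive = ControlNet conditioning from above
```

The key bits are `return_with_leftover_noise` on the first sampler and `add_noise = disable` on the second, so the second sampler genuinely continues the first one’s schedule instead of starting over.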
Agreed on the Auto1111 UI; I like the idea of ComfyUI but making quick changes + testing rapidly feels like a pain. I always feel like I must be doing something wrong. I do appreciate how easy it is to replicate a workflow, though.
What are you running SDXL in? I tried it in ComfyUI yesterday and it seems really powerful, but it always takes a long time to mess around with images. I haven’t tried it in SD.Next or Auto1111 yet.
Thanks for reporting on that! It’s honestly rare to hear from anyone using one, so real-world info is sparse haha. I was seriously considering an RX 7900 series card, but skipped it after reading a few scattered experiences like yours. Maybe someday I’ll switch to Linux haha.
This is what I’m aware of for ROCm: AMD: Partial RDNA 3 Video Card Support Coming to Future ROCm Releases. TL;DR is that it’s still not clearly committed with a date, and consumer GPU support is pretty weak.
There’s DirectML, which is what SD.Next (Vlad Diffusion) and some others use on Windows. I think it works OK, but it can be slow, and from perusing the issue trackers it seemed to have a lot of bugs and limited support (though I could be wrong there). I haven’t tried it myself, so others may know better. For perspective, I analyzed the public Vladmandic SD benchmark data and saw zero 7900 XT(X) results using Windows. It seems like almost nobody uses Windows + AMD.
Is anyone running SD on AMD GPUs in Windows? The AMD benches all seem to be from Linux, presumably because of ROCm, but I’d be curious to know how much performance loss comes from using DirectML on, say, a 7900 XT in Windows.
It kind of lends it a “steam on water” vibe, which works pretty well for me
I don’t recognize that fork of it - what are the differences there? I’ve been using vladmandic’s fork for a while and found it quite good. I still have the original kicking around as well, but don’t use it much.
This is great haha
Yeah, I glanced over it and couldn’t immediately see why the laptop one was benchmarking faster. There were only 7 samples or something for the laptop one, though, so it could just be a fluke. Maybe the laptop folks are using only the best optimization or something. I’ll keep playing with it when I get some spare time.
Yeah, the data is definitely not perfect. If I get a chance, I’ll poke around and see if maybe it’s one person throwing off the results. Maybe next time I’ll toss “n=##” or something on top of the bars to show just how many samples exist for each card. I also eventually want to filter by optimization, etc. for the next visualization, though I’m not sure of the best way to do that beyond maybe just showing the best result for each card.
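For the “best for each card” idea plus the sample counts, a pandas `groupby` gets both in one pass. This is just a sketch on made-up rows (the GPU names and it/s numbers here are invented placeholders, not real benchmark values):

```python
import pandas as pd

# Hypothetical benchmark rows standing in for the real Vladmandic SD dump.
df = pd.DataFrame({
    "gpu": ["4090", "4090", "7900 XTX", "7900 XTX", "7900 XTX"],
    "its": [38.2, 41.5, 17.0, 19.3, 18.1],  # iterations/second, higher is better
})

# "Best for each card": top result per GPU, with n recorded so that
# low-sample cards (like the laptop outlier) are easy to spot.
summary = (
    df.groupby("gpu")["its"]
      .agg(best="max", n="count")
      .reset_index()
)
print(summary)
```

The `n` column is exactly what you’d feed into the “n=##” labels on top of the bars, so flukes from tiny sample sizes are visible at a glance.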
This is so cool to see. Thank you for sharing!