77% Of Employees Report AI Has Increased Workloads And Hampered Productivity, Study Finds

Stopthatgirl7@lemmy.world · 4 months ago

Hackworth@lemmy.world · edit-2 4 months ago

Voiceover recording, noise reduction, rotoscoping, motion tracking, matte painting, transcription - and there’s a clear path forward to automate rough cuts and integrate all that with digital asset management. I used to do all of those things manually/practically.

e: I imagine the downvotes coming from the same people that 20 years ago told me digital video would never match the artistry of film.

WalnutLum@lemmy.ml · 4 months ago

All the models I’ve used that do TTS/RVC and rotoscoping have definitely not produced professional results.

Hackworth@lemmy.world · edit-2 4 months ago

What are you using? Cause if you’re a professional, and this is your experience, I’d think you’d want to ask me what I’m using.

WalnutLum@lemmy.ml · 4 months ago

Coqui for TTS, RVC UI for matching the TTS to the actor’s intonation, and DWPose -> controlnet applied to SDXL for rotoscoping

Hackworth@lemmy.world · 3 months ago

Full open source, nice! I respect the effort that went into that implementation. I pretty much exclusively use 11 Labs for TTS/RVC, turn up the style, turn down the stability, generate a few, and pick the best. I do find that longer generations tend to lose the thread, so it’s better to batch smaller script segments.
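For anyone wanting to try the "batch smaller script segments" approach, here is a minimal sketch of the splitting step only. It assumes plain-text scripts and breaks at sentence-ending punctuation. The function name `split_script` and the 300-character default are hypothetical choices for illustration, not anything from a particular TTS tool. Each segment would then be sent to whatever TTS service you use.

```python
import re


def split_script(script: str, max_chars: int = 300) -> list[str]:
    """Group sentences into segments of roughly max_chars or fewer.

    max_chars is an illustrative default (an assumption), not a limit
    taken from any specific TTS service.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the segment.
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    # Note: a single sentence longer than max_chars is kept whole.
    return segments


if __name__ == "__main__":
    demo = "First line of the script. Second line follows! Is this the third? Yes."
    for i, segment in enumerate(split_script(demo, max_chars=40), start=1):
        print(i, segment)
```

Breaking at sentence boundaries rather than at a fixed character count keeps each generation a complete thought, which should make it easier to generate a few takes per segment and pick the best.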

Unless I misunderstand ya, your controlnet setup is for what would be rigging and animation rather than roto. I do agree that while I enjoy the outputs of pretty much all the automated animators, they’re not ready for prime time yet. Although I’m about to dive into KREA’s new key framing feature and see if that’s any better for that use case.

WalnutLum@lemmy.ml · 3 months ago

I was never able to get appreciably better results from 11 labs than using some (minorly) trained RVC model :/ The long scripts problem is something pretty much any text-to-something model suffers from. The longer the context the lower the cohesion ends up.

I do rotoscoping with SDXL i2i and controlnet posing together. Without the posing, I found it tends to smear. Do you just do image2image?

Hackworth@lemmy.world · 3 months ago

The voice library 11labs added includes some really reliable and expressive models. I’ve only trained a few voice clones, but I find them totally usable for swapping out short lines to avoid having to bring a subject back in to record. I’ll fabricate a sentence or two, but for longer form stuff, I only use AI for the rough cuts. Then I’ll practically record as a last step, once everything’s gone through revision cycles. The “generate a few and chop em together” method is fine for short clips, but becomes tedious for longer stuff.

Funnily enough, when I say roto, I really just mean tracing the subject to remove it from the background. Background removal’s so baked into things now, I dunno if people even think of it as roto. But I mostly still prefer the Adobe solutions on this - roto brush in After Effects, for the AI/manual collaboration. As for roto in the A Scanner Darkly sense, I’ve played with a few of the video-to-video models, but mostly as a lark for fluff B-roll.

aesthelete@lemmy.world · edit-2 4 months ago

“imagine the downvotes coming from the same people that 20 years ago told me digital video would never match the artistry of film.”

They’re right IMO. Practical effects still look and age better than (IMO very obvious) digital effects. Oh and digital deaging IMO looks like crap.

But, this will always remain an opinion battle anyway, because quantifying “artistry” is in and of itself a fool’s errand.

Hackworth@lemmy.world · 4 months ago

Digital video, not digital effects - I mean the guys I went to film school with that refused to touch digital videography.