
GENERATED VIDEO WORK

VELOCIUM

First place title prize winner for Project Odyssey Season 2, 2025.

Connection and isolation: each can be a prison or an escape, and both are needed to understand one another.

When I wrote this in 2023, it was a journal entry I had no intention of making into a film; it was an emotionally charged vent. When I began to discover the possibilities of creating animation with machine learning tools, I thought this would be a good piece of writing to turn into an animated film.

 

CREDITS

Written, Animated, Edited, and Directed by Calvin Herbst @calvin_herbst

Sound Studio: ADY.AUDIO @ady.audio

Sound Design and Mix by Daryn Ady @ydanyrad

Audio Producer Kamilla Azh @kamillazh

Song "Falaise" by Floating Points

Velocium.png

PROCESS

-Images were generated using Text to Image with the Black Forest Labs Flux Dev base model via ComfyUI, with tweaks in Adobe Photoshop.

-Images were animated using various Image to Video tools, including Kling 1.6, Kling 1.5 start/end frame, Hailuo, Haiper, Runway, and Luma Dream Machine.

-Additional assets were generated from Gaussian splats using Luma and processed in ComfyUI with AnimateDiff, IPAdapters, and ControlNets.

-Texture, grain, and color were added with Dehancer Pro in DaVinci Resolve and Adobe After Effects.

-Voiceover was created with ElevenLabs.

-The image generation model pipeline used a mix of open-source and custom-trained LoRAs (a code sketch of a comparable pipeline follows this list).

-The LoRAs in the pipeline included training data from past and modern animation, live-action, photography, concept art, and experimental digital imagery.

-Some of the LoRAs I trained were distilled and trained on generated images via a separate model pipeline.

-The software, workflows, and custom nodes used in the generative image process were built entirely on open-source tools, created by people willing to give away their work for free through repositories like Civitai, Hugging Face, and GitHub.
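
For readers who want a concrete starting point, here is a minimal sketch of what a Flux-plus-stacked-LoRAs text-to-image pipeline can look like in code, using the Hugging Face diffusers library. My actual pipeline lives in ComfyUI graphs; the model path is the public Flux Dev repo, and the LoRA files, adapter names, and weights below are placeholders, not my production values.

```python
# Minimal sketch: Flux Dev text-to-image with stacked LoRAs via diffusers.
# LoRA paths, adapter names, and weights are placeholders.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Stack LoRAs: load each under an adapter name, then weight the mix.
pipe.load_lora_weights("path/to/style_lora.safetensors", adapter_name="style")
pipe.load_lora_weights("path/to/texture_lora.safetensors", adapter_name="texture")
pipe.set_adapters(["style", "texture"], adapter_weights=[0.8, 0.45])

image = pipe(
    prompt="...",  # prompt elided
    num_inference_steps=28,
    guidance_scale=3.5,
    generator=torch.Generator("cuda").manual_seed(42),  # lock the seed
).images[0]
image.save("still.png")
```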

l2.jpg

start frame

z2b.jpg

start frame

s2.jpg

start frame

zb2.jpg

start frame

m.jpg

end frame (edited prompt, same seed)

z2b2.jpg

end frame 

t.jpg

end frame (same prompt, new seed)

zc.jpg

end frame (same prompt, new seed)

Velocium portal sfef for gif_1.gif

final shot

Velocium flower sfef for gif.gif

final shot

Velocium floating sfef for gif.gif

final shot

Velocium turn sfef for gif_1.gif

final shot

Start and End Frame Image to Video

The above shots were created in Flux. When I found a good start frame, I would either rerun the workflow with the same parameters but a new seed to find a similar end frame, or rerun the same seed with an altered prompt.

Changing only the seed, or altering only the prompt slightly, yields an end frame that is similar enough for a smooth transition while still being different enough to create a dynamic shot.
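
In code, the two variations come down to which argument you hold fixed. A self-contained sketch in the same hypothetical diffusers setup as above (the prompts here are illustrative, not from the film):

```python
# Start/end-frame pairs: hold everything fixed except one knob.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

base_prompt = "a lone figure in a vast hallway, volumetric light"  # illustrative
seed = 1234

def render(prompt, seed):
    g = torch.Generator("cuda").manual_seed(seed)
    return pipe(prompt=prompt, generator=g, num_inference_steps=28).images[0]

start = render(base_prompt, seed)

# Same prompt, new seed -> similar composition, different details.
end_new_seed = render(base_prompt, seed + 1)

# Same seed, edited prompt -> same starting noise, changed content.
end_new_prompt = render(base_prompt + ", the figure dissolving into light", seed)
```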

velocium 3d camera bts 2 for gif.gif

1. 3D camera moves in splat

velocium 3d camera bts 3 for gif.gif

2. 3D camera moves in splat

traingle for gif.gif

3. Generated Asset (T2I > I2V)

velocium ae tracking timelapse for gif.gif

4. After Effects Comping

velociumportalcompforgif-ezgif.com-optimize.gif

5. Comp for V2V

velociumfinalportalforgif-ezgif.com-optimize.gif

6. Final Output Run Through SD1.5 AnimateDiff V2V

3D Video to Video Process

I created 3 different Gaussian splats (1, 2), then animated a 3D camera inside the scenes so that they would flow together.

I used a Flux Text to Image workflow to create a portal asset against a green backdrop (3) and animated it with a separate Image to Video workflow. I took the animated asset and comped it into the POV 3D camera videos in After Effects (4, 5), then ran the comp through an AnimateDiff V2V workflow (6).
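
For anyone curious what that comp step amounts to, here is a rough chroma-key sketch in Python with OpenCV. The real comp was done in After Effects; the file names and key thresholds below are arbitrary.

```python
# Rough chroma-key comp: drop the green backdrop from the portal asset and
# layer it over a splat-camera frame. Thresholds and paths are placeholders.
import cv2
import numpy as np

portal = cv2.imread("portal_frame.png")      # asset on green backdrop
background = cv2.imread("splat_frame.png")   # POV 3D-camera frame

hsv = cv2.cvtColor(portal, cv2.COLOR_BGR2HSV)
# Key out green: hue roughly 35-85 in OpenCV's 0-179 hue range.
mask = cv2.inRange(hsv, (35, 80, 80), (85, 255, 255))
mask = cv2.GaussianBlur(mask, (5, 5), 0)     # soften the matte edge

alpha = (255 - mask).astype(np.float32) / 255.0
alpha = alpha[..., None]                     # broadcast over BGR channels
comp = (portal * alpha + background * (1 - alpha)).astype(np.uint8)
cv2.imwrite("comp_frame.png", comp)
```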

VIDEO CONCEPT ART

A collection of my favorite shots out of 50,000+ images generated across various projects in 2025

 

Creating quality AI videos relies primarily on generating exceptional still images, which later get used in a separate image-to-video process. I found that using popular paid image generation services, or even untuned open-source models, always yielded outputs that looked more like a stock photo than a still from a movie. It doesn't matter if the words "cinematic" and "shallow depth of field" are used in the prompt; the base models sample from vast datasets of hundreds of millions of images - more than a single person could ever see in a lifetime - resulting in outputs that feel too much like a "collective average" of every image ever made. It's like mixing all the colors of paint on a palette and ending up with some form of dull brown or grey.

The lighting, characters, compositions, and fine-grain textures in this reel were only possible with a combination of LoRAs I trained thanks to open-source repositories like Hugging Face, GitHub, and Civitai. I conducted extensive A/B tests in ComfyUI, locking seeds and testing model strength combinations down to the hundredth of a percent, sometimes stacking 10 LoRAs at a time, then making X/Y grids of every scheduling + sampling combination across multiple guidance scales.
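
As an illustration of that testing loop, here is a hedged sketch of a locked-seed X/Y sweep written against diffusers. ComfyUI's X/Y plotting did the real work; the LoRA paths, adapter names, strength steps, and guidance values below are placeholders.

```python
# Locked-seed X/Y grid: sweep LoRA strength x guidance scale, one image per
# cell, then tile the outputs into a contact sheet for side-by-side review.
from itertools import product

import torch
from PIL import Image
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("path/to/style_lora.safetensors", adapter_name="style")
pipe.load_lora_weights("path/to/texture_lora.safetensors", adapter_name="texture")

seed = 7
lora_weights = [0.60, 0.61, 0.62]   # strength steps down to the hundredth
guidance_scales = [2.5, 3.5, 4.5]

tiles = []
for w, cfg in product(lora_weights, guidance_scales):
    pipe.set_adapters(["style", "texture"], adapter_weights=[w, 0.45])
    g = torch.Generator("cuda").manual_seed(seed)  # same seed in every cell
    tiles.append(pipe(prompt="...", guidance_scale=cfg, generator=g).images[0])

# Tile the nine outputs into a 3x3 grid.
w_px, h_px = tiles[0].size
grid = Image.new("RGB", (w_px * 3, h_px * 3))
for i, tile in enumerate(tiles):
    grid.paste(tile, ((i % 3) * w_px, (i // 3) * h_px))
grid.save("xy_grid.png")
```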

PENCIL SKETCH

A meditation on solitude, created with a three-base-model workflow utilizing trained LoRAs.

 

Trained a Flux LoRA on an impressionistic pencil sketch style and combined it with 6 other LoRAs in Flux (1-1.5).

Used the Flux image outputs with WAN2.1_I2V_14B to create videos (2-2.5). Ran the WAN video outputs through an experimental Flux V2V workflow that uses AnimateDiff to re-sample each frame with a different seed back inside the original image model pipeline (3-3.5). This lets every frame be slightly different, as if hand drawn, and regains the fine-grain, papery textures of the training data that were lost in the WAN video model.
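
A minimal sketch of the re-sampling idea, using diffusers' Flux img2img pipeline in place of my actual AnimateDiff-based ComfyUI workflow (the paths, prompt, and denoise strength are illustrative):

```python
# Frame-wise re-sampling: run each WAN video frame back through the Flux
# image pipeline as img2img with a fresh seed per frame, so every frame
# "re-draws" slightly, like hand-drawn animation.
import glob

import torch
from diffusers import FluxImg2ImgPipeline
from diffusers.utils import load_image

pipe = FluxImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("path/to/pencil_sketch_lora.safetensors")  # placeholder

for i, path in enumerate(sorted(glob.glob("wan_frames/*.png"))):
    out = pipe(
        prompt="impressionistic pencil sketch",
        image=load_image(path),
        strength=0.35,                                     # low denoise: keep the motion
        generator=torch.Generator("cuda").manual_seed(i),  # new seed per frame
    ).images[0]
    out.save(f"resampled/{i:05d}.png")
```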

Pencil Flux.png

1. Flux T2I workflow

Pencil Sketch stills_01113_.png

1.1. Flux still

Pencil Sketch stills_01207_.png

1.2. Flux still

Pencil Sketch stills_00894_.png

1.3. Flux still

Pencil Sketch_00210.png

1.4. Flux still

Pencil Sketch_00231.png

1.5. Flux still

Pencil Wan.png

2. WAN2.1_14B I2V Workflow

Pencil train for gif.gif

2.1. WAN video from Flux still

Pencil face for gif 2.gif

2.2. WAN video from Flux still

Pencil Fish for gif.gif

2.3. WAN video from Flux still

Pencil coffee for gif.gif

2.4. WAN video from Flux still

Pencil doorway for gif.gif

2.5. WAN video from Flux still

Pencil Sketch FLux Final.png

3. Flux V2V ReSampling Workflow

Pencil train 2 for gif.gif

3.1. Flux ReSample from WAN video

Pencil face2  for gif 2.gif

3.2. Flux ReSample from WAN video

Pencil Fish 2 for gif.gif

3.3. Flux ReSample from WAN video

Pencil coffee for gif 2.gif

3.4. Flux ReSample from WAN video

Pencil doorway 2 for gif.gif

3.5. Flux ReSample from WAN video

GETHSEMANE

A collection of my favorite shots created for the music video "Gethsemane" by Car Seat Headrest

Directed by Andrew Wonder

 

Trained Flux LoRAs on the film's raw camera footage to generate visuals matching its style and characters, then used a separate closed-source Image to Video process to turn the stills into video.
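
A sketch of the dataset-prep step - pulling evenly spaced stills out of graded footage with ffmpeg. The clip name, sampling rate, and output pattern are placeholders; the curated frames were then fed to a standard LoRA trainer.

```python
# Extract evenly spaced stills from camera footage to seed a LoRA dataset.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "02_selects.mov",   # placeholder clip name
    "-vf", "fps=1/2",                   # one frame every two seconds
    "-q:v", "2",                        # high-quality JPEG output
    "dataset/still_%04d.jpg",
], check=True)
```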

02_selects.00_58_31_19.Still101.jpg
02_selects.00_58_41_15.Still104.jpg
02_selects.00_57_53_00.Still094.jpg
02_selects.00_57_33_13.Still088.jpg

1. Cast training data - Alexa Mini Camera Footage - (4 of 30)

G3RTHS3M4N3 - Face Morphs _00103_.png
G3RTHS3M4N3_00559_.png
G3RTHS3M4N3_00594_.png
G3RTHS3M4N3_00248_.png

Generated Images using trained Flux LoRA (1)

02_selects.00_59_50_17.Still114.jpg
02_selects.01_00_52_10.Still115.jpg
02_selects.00_59_23_00.Still110.jpg
02_selects.00_59_19_02.Still108.jpg

2. Cast training data - Alexa Mini Camera Footage - (4 of 30)

G3RTHS3M4N3 - Face Morphs _00078_.png
G3RTHS3M4N3 - Face Morphs _00294_.png
G3RTHS3M4N3_00355_.png
G3RTHS3M4N3_00344_.png

Generated Images using trained Flux LoRA (2)

02_selects.00_16_56_22.Still037.jpg
02_selects.00_17_28_18.Still040.jpg
02_selects.00_42_23_11.Still066.jpg
02_selects.00_42_39_09.Still069.jpg

3. Style training data - Alexa Mini Camera Footage - (4 of 30)

G3RTHS3M4N3_01404_.png
G3RTHS3M4N3_00231_.png
G3RTHS3M4N3_00306_.png
G3RTHS3M4N3_01272_.png

Generated Images using trained Flux LoRA (3)

CLOUDS

ComfyUI workflow with LoRAs, IPAdapter, AnimateDiff, Motion LoRA, Prompt Scheduling, Depth Map, and OpenPose

CLouds.png


cloudsfallingforgif-ezgif.com-optimize.gif

1. Falling asset for comp

cloudssittingupforgif-ezgif.com-optimize.gif

2. Rising asset for comp

cloudsbgforgif-ezgif.com-optimize.gif

3. Background asset for comp

Cloudsmaincompforgif-ezgif.com-optimize.gif

Final comp for AnimateDiff V2V (1+2+3)

MAGIC CONCH

Video to video workflow in ComfyUI with LoRAs, IPAdapter Embeds, AnimateDiff, ControlNet, and latent upscaling.

conchhandsforgif-ezgif.com-optimize.gif

1. Hands asset for comp

conchshellforgif-ezgif.com-optimize.gif

2. Conch asset for comp

conchbgforgif-ezgif.com-optimize.gif

3. Background asset for comp

Conchcompforgif-ezgif.com-optimize (1).gif

Final comp for AnimateDiff V2V (1+2+3)

TIMESCAPE

An escape from reality through an endless loop, created with images generated using a custom-trained Flux LoRA.

BUTTERFLIES

A feeling visualized as a texture, created in ComfyUI with AnimateDiff, IPAdapter Embeds, and Prompt Scheduling.

GOODBYE FANTASY

A poem about letting go of delusions of love, created with animated Gaussian splats, V2V, and ElevenLabs.

WATERCOLOR DANCE

A choreographed performance about growth into womanhood, created in ComfyUI with AnimateDiff and ControlNets.

WatercolorDanceforgif-ezgif.com-optimize.gif

Dancing asset for comp

Watercolorbgforgif-ezgif.com-optimize.gif

Background asset for comp

Watercolorcompforgifs-ezgif.com-optimize.gif

Final comp for AnimateDiff V2V

VIDEO STYLE TRANSFERS

Various explorations of AnimateDiff with ControlNets and IPAdapters, using Gaussian splats as input video

AUDIO REACTIVE ART

Melting patterns of the abyss, created in ComfyUI with Yvann Audio Reactive Nodes, IPAdapter Embeds, and ControlNets

Screenshot 2025-06-05 132029.png

FALLEN ANGEL

A reflection on the cycle of birth and decay, created in ComfyUI with IPAdapter Embeds and AnimateDiff

Fallen Angel.png

IN AND OUT OF THIS WORLD

A depiction of what it feels like to be with somebody special

This was one of my early AI creations. I used Gaussian splats of two oil bottles from my kitchen to block out a camera move between two figures, then used Luma Gen 2 with a reference image from Midjourney to style-transfer the rudimentary scene into two human bodies. I interpolated two different outputs and did a simple comp and retime in Premiere. To soften the textures from the early video models, I filmed the output on a screen with a 2005 Kodak CCD digital camera.
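
For the interpolation step, a rough command-line equivalent is ffmpeg's motion-compensated minterpolate filter (the actual interpolation and retime were done in Premiere; the file names below are placeholders):

```python
# Motion-compensated frame interpolation to smooth and retime a clip.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "luma_output.mp4",          # placeholder input
    "-vf", "minterpolate=fps=60:mi_mode=mci",   # interpolate up to 60 fps
    "interpolated.mp4",
], check=True)
```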

FLORA LOGO ANIMATION

Logo Animation created in ComfyUI with LoRAs, AnimateDiff, IPAdapter, ControlNets, and a simple motion graphic vector as the source video input.

Flora motion gfx for gifs.gif
FLora.png