Creating a 10MB video file

by AJ "Tyron" Martinez @ worldsbe.st • May 2 2026

2-ish years ago, Discord reduced their file size limit from 25MB to 10MB. Kinda tight. Pain in the ass for sharing terrible fighting game clips with my friends. But probably no big deal.

get hype for tri-lumina

How hard could it be to compress a video to 10MB?

prior work

Ground floor: 8mb.video, a web service I see people using occasionally.1

Throwing Ring Racers footage into a video encoder is a bit like throwing a cinderblock into a running dryer, but let’s give it a try:

8mb.video - looks like shit guys keep it up

This looks like ass, but that’s normal—this is a free web service, probably trying to keep hardware utilization to a minimum, being asked to compress 2 minutes of repeating high-contrast 3D patterns into a postage stamp. It’s probably a technological miracle it doesn’t look worse.

However, this video is 6.6MB, even though our target size was 8MB. Why? Shouldn’t the encoder have used the other 1.4MB, especially if the result is so obviously in need of help?

Whatever, weird web service. Let’s try VidCoder—open source, uses Handbrake for encoding (everyone seems to swear by this one), explicitly provides a “target size” setting when you’re making an encoding preset. We’ll use a more modern codec (AV1 instead of H2642) and let it churn for a while.

VidCoder - hey there’s pixels here

That definitely looks a lot better. However, it’s 14.6 MB out of a requested 10MB, so it’s useless as a Discord upload. Is this just…not something that software supports? Surely there’s some combination of FFmpeg options that can get us there?

not really

The usual solution you see shared online, and the one in FFmpeg’s docs, is to divide target size by duration to get a target bitrate, then have FFmpeg encode with that bitrate as an average.

This seems reasonable on paper, but it’s sort of annoying. Your audio comes out of the same budget, which means that your actual practical overhead differs based on whether you’re reencoding it or not (and at what bitrate), and then you have to leave a little room for miscellaneous overhead,3 and by the way you have to multiply by 8388.608 to get from mebibytes to kilobits what the fuck ever…
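For the curious, the arithmetic looks something like this. This is a sketch of the folk formula, not anything DipshitCut actually ships; the function name, the 64kbps audio default, and the 2% overhead fudge factor are all made up for illustration:

```python
# Back-of-envelope math for the "divide size by duration" approach.
# The overhead fudge factor and audio default here are illustrative guesses.

def target_video_kbps(size_mib: float, duration_s: float,
                      audio_kbps: float = 64.0,
                      overhead: float = 0.02) -> float:
    """Total bit budget, minus audio, minus a fudge for container overhead."""
    total_kbits = size_mib * 8388.608   # 1 MiB = 8,388,608 bits = 8388.608 kbits
    usable_kbits = total_kbits * (1 - overhead)
    return usable_kbits / duration_s - audio_kbps

# A 10 MiB budget for a 120-second clip with 64 kbps audio:
print(round(target_video_kbps(10, 120)))  # ~621 kbps for video
```

Note that every term in there is a guess you have to make before the encoder even starts, which is exactly the annoying part.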

(If I need my computer to do a task and I’m the one doing math for it, I feel intuitively like something’s gone wrong.)

It also still isn’t guaranteed to work, at least for this case. I gave it a try myself; I disabled audio to make the math easy, rounded the video duration up and the bitrate down, and still went over budget by 300KB. Given the same target bitrate, HandBrake (at 720p30 AV1) came in at 8.5MB, whereas Boram’s VP8 settings produced a 22MB 720p60 encode, not even close.

My bullshit test output is, like, robustly and comprehensively unusable as a watchable copy of its input, but that’s because I didn’t tune any of the parameters on it besides target bitrate. FIREBLU type shit

This is for a bunch of overlapping technical reasons, like “container overhead” and “codec internals” and “average bitrate is not a mathematical average”…

…mostly, it seems like encoders are just not really designed to do this? Like, it’s not naturally well-suited to how modern video codecs work, and it’s not something that most people care about; many encoders are designed to break bitrate suggestions if it’ll preserve visual quality.4

In practice, every one-stop solution involves trading overshoots for undershoots; nobody seems to care much about doing this. But if you have to use automation to calculate the bitrate anyway, then…maybe an iterative approach could work? What if the answer is just trial-and-error with multiple full encodes until something uses more or less the right amount of data? What if we just guess-and-check the whole process?

What if I did something so stupidly inefficient that no one had bothered to try it yet?

interlude: what works good enough?

You are about to enter the part of this article where I talk about vibecoding a personal solution to this problem for fun. This is your bail point, and I’ll leave you with some concrete software recommendations before I introduce my Horrible Fucking Thing that you probably shouldn’t use.

Golden rule: If you’re using off-the-shelf software and want it to correctly enforce a filesize limit, downscale your output to a reasonable size for the target bitrate. If you only have 600kbps of headroom for video, but you tell your encoder to output a 1440p video, it will tell you to kick rocks and bullshit the bitrate.

enter the lying machine

My previous solution to this problem was, like, six or seven batch scripts, and writing batch makes me feel like I should be in third grade every time I see a quotation mark.

Maybe I should do literally anything else?

Codex’s ability to shift from lying to The Sycophancy Machine when presented with one (1) point of verifiable data makes me feel sort of uncomfortable.

LLMs are somewhat reliably terrible at almost everything, so I almost never use them, but there’s a narrow window of software development where I think they can save a lot of time; when project requirements are narrow and completely set in stone, when you know the exact steps of a procedure but not how to express it with the language or API you’re using, and when failure is easily verifiable and has zero consequences.5

That’s only one dimension of the forever war around LLMs, but “stupid dipshit tool script for encoding fighting game clips” seems like a pretty low-stakes playground for this kind of thing—could make for a fun afternoon, especially if I can freeload off a housemate’s Codex subscription and learn about The Future Of Programming™6 in the meantime.

Introducing DipshitCut.

Those vector icons were defined as raw sequences of points in the code, which strikes me as sort of funny but also huh. Maybe in another year or two these things will do competent ASCII art

DipshitCut’s procedure is pretty simple: pick a target bitrate (mildly undershooting on purpose), pick a reasonable maximum resolution and FPS for that bitrate, try an encode, and try again if the size overshoots or significantly undershoots. Codex configured this to loop a maximum of eight times—I’m pretty sure I said “lol” out loud, then didn’t change it—but in practice it usually only takes one, and rarely takes more than three, since it can kinda binary-search after the first two attempts.
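Here’s my reading of that loop as a sketch—this is not DipshitCut’s actual code, and the `encode` callback, the 90% stop threshold, and the bracketing logic are all my reconstruction of the idea:

```python
# Sketch of the guess-and-check loop (my reading of it, not DipshitCut's code).
# `encode` stands in for a real FFmpeg run: it takes a requested average
# bitrate in kbps and returns the size of the resulting file in MB.

def fit_to_size(target_mb: float, duration_s: float, encode,
                max_attempts: int = 8):
    """Returns the best passing bitrate, or None if every attempt overshot."""
    lo, hi = 0.0, None                                # bitrate bracket, refined per attempt
    kbps = target_mb * 8388.608 / duration_s * 0.9    # deliberate initial undershoot
    best = None
    for _ in range(max_attempts):
        size_mb = encode(kbps)
        if size_mb <= target_mb:
            best = kbps
            if size_mb >= 0.9 * target_mb:            # close enough, stop
                break
            lo = kbps                                 # undershot: raise the floor
        else:
            hi = kbps                                 # overshot: lower the ceiling
        # bisect once we have a bracket; otherwise scale proportionally
        kbps = (lo + hi) / 2 if hi else kbps * target_mb / size_mb
    return best

# Fake encoder: real output runs ~8% over the requested average bitrate.
fake = lambda kbps: kbps * 120 / 8388.608 * 1.08
print(fit_to_size(10.0, 120, fake))
```

With a mildly dishonest encoder like the fake one above, the deliberate 10% undershoot means the very first attempt usually lands in the acceptable window—which lines up with “it usually only takes one.”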

I think that “pick your constraints” step is the one missing from most tools, and for good reason; most tools are concerned with maximizing output quality given a user-selected set of constraints,7 but basically everything about the design of DipshitCut is intended to make a good enough video relatively quickly, so it makes somewhat-sane constraints up. This approach produces generally worse results on low-motion video; it’s definitely not the type of thing you should be using for screencasts, and if you’re sharing a music video, you’re probably unhappy about getting downgraded to 64kbps mono audio.8

I wanted to see how far this could go, so it’s also got a codec selector (including one-pass encodes for NVENC stuff), the ability to switch or disable audio tracks, aspect-ratio cropping for my wackass ultrawide recordings, and the ability to crop to a start and end time, copying the video instead of reencoding if it can. (Theory borrowed from LosslessCut, which is excellent software.)

9.59MB AV1. This is definitely not the best quality you can achieve, but if I insist on keeping 60fps and keeping encoding time relatively quick, it seems like this is a pretty reasonable set of tradeoffs.

I feel obligated to point out that, like, this kinda works. DipshitCut, a project name that accurately reflects my investment in the project, will probably help me post terrible videos in the future. It has all of the features I could think of at the time, set up to work exactly for my workflow and no one else’s, and for the most part I goofed off and made music and talked to my friends while it was being made.

I also definitely felt like Indiana Jones outrunning the technical debt boulder the entire time.

Working with Codex is like directing a two-headed junior dev; every Slack message has an 80% chance to be answered by the right head, who always tells the truth and never lies, and a 20% chance to be answered by the left head, who types exactly like the other guy but is so dumb and confused that it can’t even lie correctly. The left head misdiagnoses bugs, removes working features in order to get credit for fixing their bugs, makes batshit structural changes that you have to go back and rip up later, and is generally committed to misunderstanding everything in the most destructive way possible. If the right head fixes a bug, the left head doesn’t know, and will happily reintroduce that bug because the code Seems Cleaner This Way.

I assume the left head is also the one who insists on getting “Reddit cute” whenever I give it an instruction that is even 1% less dry and technical than it could be.

Neither head ever remembers to give itself log output, instead writing weird inline PowerShell scripts to recreate events, or inventing a dogshit unattended test mode that failed to catch even 100% repro-rate crashes. Every bugfix got faster and smoother when I reminded it to “add more logging around this problem area, then read the logfile”. It would then immediately forget that it could do this, and I would have to remind it again: fix root causes instead of removing features. Check your inputs and control flow instead of assuming you know what’s running and when. Look at the fucking output instead of guessing.

Codex is lazy. As a result of that (and my “okay let’s see what you’ve got hotshot” style of oversight—I gave it zero help), this tool is probably kind of ass, and I expect to incrementally hack on it when I discover new and unknown ways that it’s ass. It’s definitely not cross-platform without some work, encoding speed-versus-quality tradeoffs are randomly selected based on how irritated I was with waiting for test encodes, Codex decided whether to copy code or create bad abstractions by flipping a coin, I blindly pulled in LibVLCSharp for the preview pane and its playback isn’t perfect,9 there’s zero error handling, the “precise cut”10 checkbox has created internal behavior branches that look like some shit from Stranger Things, it makes arbitrary choices like switching to mono audio under bitrate pressure (based entirely on my vibes), and Codex randomly put in Intel/AMD hardware-encoding stuff that I cannot possibly test.

Like, it’s a 200MB single-package executable because I want it to live in my Videos folder and take up as little space in the file list as possible, and apparently that self-extracting process is slow enough that sometimes it takes five seconds to show up. I didn’t bundle FFmpeg because it was already in my PATH. Whatever. Maybe a later problem, maybe a never problem.

9.82MB H264. The Parking Garage Rally Circuit DLC came out, by the way, it’s really good.

I don’t really like programming. I like designing procedures and debugging logic and identifying Things That Need Doing, and I accept (begrudgingly) that programming—continually getting slapped in the dick by other peoples’ software design, needing to format exacting instructions for a machine even more inflexible and literal than I am, needing to place those instructions in the exact right place and perform the arcane rituals of The Build System so that all the Supporting Software lets me interact with the fun parts—is the cost of doing this.

In that way, it seems like I’m the target audience for LLM software tooling, and I have to admit, it sure did…uh, produce an output. I learned a handful of incidental things while “working on” it, useful observations about video codecs and the software responsible for working with them, and I ended up with a tool that I’ll probably use in the future; I’m mostly glad to have done this, and I like the idea that what I’ve learned might help people navigate similar problems.

But I think I might like programming more than I expected. The result of this strange, feverish process doesn’t really feel like My Software, and I didn’t really get any of the feel-good neurotransmitters that I associate with building useful things. Even after spending an afternoon and change on it, it still feels like something that teleported half-formed into my Videos folder and lives there as a brutalist solution to a problem; I think I skipped a lot of decisions in the process of “making” it. It makes me feel a little strange.

Also, when I encode a long test file with NVENC H264 targeting 10MB, it does this:

attempt 1: 860x360, 523k video, audio 64 kbps mono, source fps, 10.23 MB (102.3% of target)
attempt 2: 860x360, 502k video, audio 64 kbps mono, source fps, 9.05 MB (90.5% of target)
attempt 3: 860x360, 517k video, audio 64 kbps mono, source fps, 9.04 MB (90.4% of target)
attempt 4: 860x360, 521k video, audio 64 kbps mono, source fps, 9.04 MB (90.4% of target)
attempt 5: 860x360, 522k video, audio 64 kbps mono, source fps, 10.23 MB (102.3% of target)
attempt 6: 860x360, 521k video, audio 64 kbps mono, source fps, 9.04 MB (90.4% of target)

Huh? Why does changing the target bitrate from 522,000 to 521,000 shave 10% off the file size? Why does this only happen with NVENC H264? What is the codec quantizing and why?

takeaways

(This is the way I end articles when I have no fucking ending.)

don’t download

DipshitCut is vibecoded ass that could delete your computer. Actually, it could do literally anything. You probably shouldn’t use it, and I’m only providing it because it would feel a bit like dangling chocolate in front of my readers to say “btw I have a tool that solves a problem you might have” and then refuse to give it up.

YOU ACCEPT EVERYTHING THAT WILL HAPPEN FROM NOW ON.

Download DipshitCut + source (~200MB).

Also includes the narrower console tool that I prototyped first, fitvideo, which is literally only for turning 21:9 Steam Game Recording clips into Discord uploads. (It’ll probably work fine on 16:9 video, too.) fitvideo might be easier to adapt to your use case or port to other platforms. (I don’t know anything about .NET on Linux, either of these might randomly work.)

ffmpeg and ffprobe must be in your PATH or next to DipshitCut.exe. On Windows, I recommend ffmpeg-git-full from gyan.dev.

Right-clicking the “start” button will produce a log about the encode. Right-clicking the “go to start” button will play your selected clip start-to-end, then stop. Codec selection doesn’t do anything if you don’t select an operation that requires the video to be reencoded (“precise cut”, size limit, aspect ratio crop).

literally me when i produce software without writing it(?)

  1. 8MB was the “old” limit, before it was raised to 25MB and then dropped back down to 10MB. ↩︎

  2. I’m not a visuals guy, so it sort of surprised me that modern codecs really are noticeably better than H264. I’ll still probably end up using H264 for Advent Calendar writeups (compatibility reasons), but…now it’ll make me slightly sadder? ↩︎

  3. Ring Racers has an in-game video recorder that actually tries to cover for some of this; if you set a limit of 10MB, it stops at 9.8 or 9.9 to allow the encoder to finish up without going over the line. But it does that by stopping the recording. ↩︎

  4. Boram’s VP8 parameters shit the bed at 720p, but came in at a comfy 9.8MB in 240p—because the encoder saw “240p” and went “okay, less detail seems fine”. Also, Boram is cool software and I wish it were maintained; its AV1 support uses libaom and takes approximately the age of the universe to encode one frame. ↩︎

  5. That last condition feels like the important one for all LLM use, honestly. Fundamentally, these are probabilistic pattern noticers. ↩︎

  6. To be clear, I don’t actually think LLMs are the future. I’m not really convinced they’re the present, either; they’re a tool that can observably save a lot of time in specific circumstances, but I think we’re all going to have a really Exciting™ couple of years once they stop getting propped up by infinite-growth freakazoid VC hype and get priced in correctly.

    I am also not a superforecaster, economist, or AI expert. I am Some Guy. Please don’t mindlessly import my opinions, I put my pants on backwards sometimes. ↩︎

  7. Boram actually uses the slowest possible mode for every codec it supports, which produces nice-looking files but is definitely not compatible with a try-multiple-times approach. A VP9 encode of that test file processed at 0.1x speed. ↩︎

  8. During the writing of this article, I patched overrides for some of these behaviors in, but they’re still worse than a well-thought-through process for that specific purpose. ↩︎

  9. I wanted to just use default Windows stuff, but Windows doesn’t speak WebM. I think most projects end up using mpv in some way for this. ↩︎

  10. Without reencoding the head and tail of a video clip, FFmpeg has to crop at keyframes, which means that the duration of your clip might be different from the selected in/out points. ↩︎