When techies and philosophers discuss the implications of AI, it’s often about limits and eventualities (e.g., the paperclip maximizer, The Matrix).
In some cases, it's a debate about “when, not if.”
Bill Gates famously said, “Most people overestimate what they can achieve in a year and underestimate what they can achieve in ten years.” AI is certainly throwing a wrench into this tidy heuristic – shortening the one year and making the ten years even more uncertain – but I generally think it's a useful reminder that a lot of stuff can happen in a decade, often thanks to the counterintuitiveness[1] of exponential growth.
Text- and image-generating models like GPT-3 (a large language model, or LLM) and DALL-E exploded onto the scene over the last couple of years. Culturally, it felt like a turning point in accessibility: AI went from being something that researchers and coders at Google do to something that anyone on Earth can pick up and play with.
While I may be overestimating AI’s present-day cultural influence, it’s equally likely that this whole essay will feel stale in six months. (Thanks for being an early subscriber.)
In trying to predict the future a bit, it seems reasonable to extrapolate from the consumer-friendly, text-based GPT-3 and image-based DALL-E to a video equivalent: a program that spins up video that could pass as something humans made. Sure, video is way more complex than static images or text, but the jump from image+text to video seems less earth-shattering than the jump from nothing to what we have today.
So what's the simplest form of video content an AI can reliably produce?
Yep, you guessed it – reality TV!
Take a show like Indian Matchmaking or Love is Blind (sorry Netflix, you're just really good at producing filler content). There's a clearly defined template, or overarching data structure. On top of that, all you need is AI-generated human participants (that's easy), a script (also easy: just scrape Instagram comments to learn how Millennials talk), maybe a couple of recurring characters (Sima Aunty, Nick Lachey, etc.), and some medium-sized decision trees covering all possible character interactions. Oh, and that trademark nausea-inducing transition music, which may as well be AI-produced too. That's all you need for the show.
(Obviously, I'm being facetious and downplaying what would be an orders-of-magnitude lift from the current image and text AI programs. That said, I fully believe some version of it is doable within ten years.)
This brings us to an optimist-pessimist fork in the road. On one hand, producing truly engaging video content at the click of a button is incredibly inspiring – imagine hyperlocalized versions of shows or movies tailored to small, cinematically underrepresented segments of the population. And it could be applied to fields beyond entertainment, such as education, where we sorely need more tooling and capital.
On the other hand, we are already pretty addicted to a somewhat limited supply of filler content. If we could consume infinite Love is Blinds, what would happen to our minds? That might be a dark outcome.
Now, you may say we effectively have infinite filler content at our disposal today, so introducing more of it wouldn't really change our consumption patterns. You'd be arguing for individual choice and market efficiency. After all, we don't eat only sugary foods and ignore our physical health entirely; people make decisions and indulge as they see fit.
Where you fall on the optimist-pessimist interpretation of a world with AI video is not really the important thing here. The important thing is that this is one silly, low-hanging example of what lies ahead, and even with this one silly example, there’s a lot to think about.
And keep in mind, it's a fairly linear progression from what we have today, in the real world[2].
[1] One of the best visualizations I've ever heard for a counterintuitive concept is the Infinite Monkey Theorem, which states that a monkey hitting keys at random on a keyboard *over an infinite amount of time* will eventually reproduce the works of Shakespeare. It's the kind of thing that falls into the "limits and eventualities" bucket (because I am not good at math and couldn't come up with better jargon), but it crystallizes a concept that's otherwise pretty hard to stomach.
[2] Sorry, I had to. I was really struggling with the conclusion; perhaps it’s an appropriate nod to the open-endedness of this topic. Also, this is the most I’ve thought about reality TV in quite a while*, so forgive any mental fuzziness.
*That’s a lie. I formulated predictions for what happens at the altar for the couples in Love is Blind Season 3. Final score: predicted 4/5 correctly.
It’s here! https://makeavideo.studio/
...but it can only make tiny clips in a reasonable amount of time right now
...and I don’t think I’d watch a show that had this in it:
https://makeavideo.studio/assets/A_knight_riding_on_a_horse_through_the_countryside_second_upsample__12.webp
unless it had Mystery Science Theater 3000-style commentary over it.