科技博主讲一下
博主帽子上印着AI
中英文双语字幕
传入的视频是卡帕西的推文录屏
发布在抖音,前两秒要迅速抓住观众注意力
讲AI大神卡帕西Andrej Karpathy 最新的一篇推文
Very impressed with Veo 3 and all the things people are finding on r/aivideo etc. Makes a big difference qualitatively when you add audio.
There are a few macro aspects to video generation that may not be fully appreciated:
1. Video is the highest bandwidth input to brain. Not just for entertainment but also for work/learning - think diagrams, charts, animations, etc.
2. Video is the most easy/fun. The average person doesn't like reading/writing, it's very effortful. Anyone can (and wants to) engage with video.
3. The barrier to creating videos is -> 0.
4. For the first time, video is directly optimizable.
I have to emphasize/explain the gravity of (4) a bit more. Until now, video has been all about indexing, ranking and serving a finite set of candidates that are (expensively) created by humans. If you are TikTok and you want to keep the attention of a person, the name of the game is to get creators to make videos, and then figure out which video to serve to which person. Collectively, the system of "human creators learning what people like and then ranking algorithms learning how to best show a video to a person" is a very, very poor optimizer. Ok, people are already addicted to TikTok so clearly it's pretty decent, but it's imo nowhere near what is possible in principle.
The videos coming from Veo 3 and friends are the output of a neural network. This is a differentiable process. So you can now take arbitrary objectives, and crush them with gradient descent. I expect that this optimizer will turn out to be significantly, significantly more powerful than what we've seen so far. Even just the iterative, discrete process of optimizing prompts alone via both humans or AIs (and leaving parameters unchanged) may be a strong enough optimizer. So now we can take e.g. engagement (or pupil dilations or etc.) and optimize generated videos directly against that. Or we take ad click conversion and directly optimize against that.
Why index a finite set of videos when you can generate them infinitely and optimize them directly.
I think video has the potential to be an incredible surface for AI -> human communication, future AI GUIs etc. Think about how much easier it is to grok something from a really great diagram or an animation instead of a wall of text. And an incredible medium for human creativity. But this native, high bandwidth medium is also becoming directly optimizable. Imo, TikTok is nothing compared to what is possible. And I'm not so sure that we will like what "optimal" looks like.