Video Creation in the AI Era

I have changed a fair amount in recent years. I used to be a book blogger (lol), and the public channels I ran were mainly text-based. Last year I began making videos, both at work and in my own time. This post records my experience and some half-formed ideas about the field.

Elements of Video Creation

The production of a video mainly involves the following elements:

Element	Description	Corresponding Tools
Storyline	A clear, complete, and coherent story, including rhythm, narration, camera work, etc.	Large language models like ChatGPT
Visual Assets	Images and videos that meet the aesthetic requirements of the story	Midjourney for images; Runway, Pika, and Stable Video for video
Audio Assets	Background music and sound effects that fit the story and rhythm	Suno for music; Optimizer AI for sound effects
Editing Skills	Editing, transitions, and special effects that effectively convey the theme of the story	Mainly conventional editing software. Some AI tools can arrange a story automatically, but their usability is mediocre.

（Product Comparison Chart）

The AI boom has produced many video creation tools, and several elements of a video can now be completed with their help.

With such powerful tools, making a video may look simple, but making a good one still takes a lot of effort.

Video Creation in the Age of AI

AI tools help a great deal, but the ability to tell a good story well remains a creator's core strength.

First and foremost, a good story is the most important part of a video. ChatGPT can generate story scripts, but the result still depends on how well the person writing the prompt steers it and how well they judge the quality of the output. Machines have no emotions, while humans have emotions and desires. A story that moves people still needs a human to refine it.

（ChatGPT Assisted Story Script Generation）

For experienced creators, materials and editing are relatively easy, but coming up with a story can be time-consuming.

Even with a good story in hand, telling it well is still costly, despite the powerful AI tools available today. Here are a few examples:

In terms of visual assets, achieving the desired visual style and keeping characters consistent across a story can be a challenge. Midjourney strikes a reasonable balance between cost and usability. Previously, style and character consistency had to be maintained through image prompts (image URL + prompt) and style references (prompt + --sref reference image URL), and the results were not very accurate. Producing ten shots therefore often meant constant adjustment, hours of work, and hundreds of generated images.

Midjourney's recently released Character Reference feature (prompt + --cref reference image URL) has greatly improved character consistency and reduced the production cost.

（Midjourney Character Design）
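
To make the "prompt + --sref / --cref" pattern concrete, here is a minimal sketch that composes the prompt text for several shots so they share one style reference and one character reference. It only builds strings to paste into Midjourney and does not call any API. The function name, the example.com URLs, and the shot descriptions are assumptions made for illustration.

```python
from typing import Optional


def build_midjourney_prompt(
    description: str,
    style_ref_url: Optional[str] = None,
    character_ref_url: Optional[str] = None,
) -> str:
    """Compose prompt text with optional --sref / --cref parameters."""
    parts = [description.strip()]
    if style_ref_url:
        parts.append(f"--sref {style_ref_url}")
    if character_ref_url:
        parts.append(f"--cref {character_ref_url}")
    return " ".join(parts)


# Hypothetical reference images; replace with your own hosted URLs.
STYLE_URL = "https://example.com/style.png"
CHARACTER_URL = "https://example.com/hero.png"

# Hypothetical shot descriptions for a short story.
shots = [
    "a boy watching sheep on a hillside at dawn",
    "the same boy shouting toward the village",
]
prompts = [build_midjourney_prompt(s, STYLE_URL, CHARACTER_URL) for s in shots]
for p in prompts:
    print(p)
```

Reusing the same two reference URLs for every shot is what keeps the style and the character aligned; only the description changes from shot to shot.
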

Video creation tools like Runway, Pika, and Stable Video offer text-to-video and image-to-video generation, and the usability of both is mediocre. Currently these tools mostly produce short clips that are then stitched together into a video. Text-to-video offers little control, making it difficult to steer the characters and visual style of a clip. Image-to-video keeps the visual style more consistent but is generally weak on motion. It works well for explosions, flames, linear movements, and the like, but tends to struggle with complex animation.

（Alien Creature via Stable Video）
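
Since these tools produce short clips that have to be joined afterwards, the joining step can be sketched with ffmpeg's concat demuxer. This is not the workflow described in this post, which uses a regular editing tool; it is an alternative, and it assumes ffmpeg is installed and that all clips share the same codec and resolution so they can be copied without re-encoding. The file names are hypothetical, and the sketch only builds the list file text and the command rather than running anything.

```python
from pathlib import Path
from typing import List, Sequence


def concat_list_text(clips: Sequence[str]) -> str:
    """Return the body of an ffmpeg concat list file, one clip per line.

    File names containing single quotes are not escaped in this sketch.
    """
    return "".join(f"file '{Path(c).as_posix()}'\n" for c in clips)


def concat_command(list_file: str, output: str) -> List[str]:
    """Build the ffmpeg command that joins the listed clips without re-encoding."""
    return [
        "ffmpeg", "-f", "concat", "-safe", "0",
        "-i", list_file, "-c", "copy", output,
    ]


clips = ["shot_01.mp4", "shot_02.mp4", "shot_03.mp4"]  # hypothetical file names
print(concat_list_text(clips), end="")
print(" ".join(concat_command("clips.txt", "story.mp4")))
```

To use it, write the list text to clips.txt and run the printed command, for example through subprocess.run.
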

Music and sound effects are often overlooked but essential: they make a video more enjoyable and engaging and help it resonate with viewers. Few products focus on this area, because most editing tools already ship with large asset libraries that meet creators' needs. Still, there is demand in this niche, and some teams are investing in it. Suno generates music from text, Pika has introduced sound effects generation[^2], and Optimizer AI is working on text-to-sound-effects. AI fills the gaps left by asset libraries and makes adding sound effects much easier.

（Pika's Sound Effects Generation Example）

Video editing involves assembling the materials according to the story, matching the rhythm to the music, and adding transitions and special effects, and each of these is a significant amount of work. It is basically manual labor, with limited help from AI tools. Once you are proficient, certain tools can improve efficiency, such as batch tools written by developers to add keyframes to similar pieces.

Telling a good story and turning it into a good video is not cheap.

My Thoughts on AI-Generated Videos

Tools like Midjourney for image generation and Runway for animation have accelerated the creative process and lowered the barrier to entry, allowing ordinary people to create decent promotional videos without the need for a professional team. The once dazzling and unattainable gem is now within reach for ordinary individuals. The abundance of cyberpunk, outer space, and futuristic videos on the internet is largely thanks to these types of tools.

However, most of these videos earn a "wow, that's cool" and nothing more. Fragmented segments, shaky footage, and repetitive styles make them hard to watch and quickly cause aesthetic fatigue. Many of them, in my opinion, do not even qualify as aesthetically pleasing. Producing work with real aesthetic quality takes significant effort to fine-tune the generated results, which is even harder than generating images with Midjourney.

（Inconsistent Character Movements via Pika）

Results that are hard to control do not resonate with content consumers. On social platforms, such videos generally draw little interaction and little attention, with a few thousand views at most. This suggests that consumers are not buying into this type of content. Such hasty production mostly brings the author the "pleasure" of having made something.

However, as mentioned earlier, the strengths of AI-generated content are efficiency and creative range. Creators who make full use of these strengths can achieve significant results.

That is the current situation. The future is still unknown, because AI technology is advancing rapidly. Take Sora, showcased by OpenAI, as an example. It may kill off products like Runway or push them to become other kinds of tools. Still, Sora is only an intermediate product, and its impact on video creation tools may be comparable to the impact ChatGPT had on translation tools when it was released.

Although Sora can further reduce production costs, one thing remains unchanged: how to create a compelling story will always be the most important challenge for creators.

What I'm Doing with AI

Since December 1, 2022, when I posted "ChatGPT YES!" on my social media, various AI tools have indeed helped me a lot. Regarding AI-assisted video generation, I only started to get involved in it last year through my work and daily life. As mentioned above, I also create some cool but useless videos purely for my own entertainment and to explore the new capabilities of these tools. But what I truly intend to do and am currently doing is creating content for children. Here are the reasons:

Simple stories, short duration, and visual styles that suit AI generation. Familiar stories such as "The Boy Who Cried Wolf" and "Little Red Riding Hood" can be turned into picture books. AI can assist with the scripts, and Midjourney can generate the visuals.
Valuable, meaningful content with a wide audience. Compared to the "self-pleasing" cool videos I make, this type of content is more meaningful. For example, bedtime stories and early English learning can help parents spend time with and educate their children.
Personal reasons: recording something for my future children and immersing myself in the role.

The content I create cannot match the excitement of stories made by professional teams, and I have no intention of comparing. The visuals are not even animated. First, animating with current technology is costly in several respects. Second, the shaky footage and unrealistic movements it generates do not meet my aesthetic standards.

It is also fun to turn interesting experiences around me into videos. For example, my nephew was trying to put the baby to sleep, but he fell asleep first and started snoring, which made the baby frown and open his eyes wide, as if saying, "Brother, are you being polite? I haven't fallen asleep yet." I adapted this into a short film.

From planning to exporting the video, it took a total of two to three hours. The general process is as follows:

Conceptualize the theme and discuss the script and narration with ChatGPT.
Use Midjourney to generate and select the character images for the mother, brother, and little baby.
Generate visual scenes based on the storyboard and characters.
Generate TTS narration and import all the materials into the editing tool for editing.
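
The steps above can be tracked with a simple shot list before anything is imported into the editing tool. This is a rough sketch rather than the tooling actually used for the short film: the Shot fields, the missing_assets helper, and the sample storyboard are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    narration: str     # line to be read by the TTS voice
    image_prompt: str  # text used to generate the still image
    character: str     # which designed character the shot features


def missing_assets(shots: List[Shot], characters: List[str]) -> List[str]:
    """List problems to fix before importing materials into the editor."""
    problems = []
    for i, shot in enumerate(shots, start=1):
        if not shot.narration.strip():
            problems.append(f"shot {i}: no narration")
        if not shot.image_prompt.strip():
            problems.append(f"shot {i}: no image prompt")
        if shot.character not in characters:
            problems.append(f"shot {i}: unknown character '{shot.character}'")
    return problems


# Characters designed in the second step.
characters = ["mother", "brother", "baby"]

# Hypothetical storyboard; the third shot still lacks narration.
storyboard = [
    Shot("Brother tries to put the baby to sleep.", "a boy rocking a cradle", "brother"),
    Shot("But he falls asleep first and starts snoring.", "a boy asleep beside a cradle", "brother"),
    Shot("", "a baby frowning with wide-open eyes", "baby"),
]

for problem in missing_assets(storyboard, characters):
    print(problem)
```

Keeping narration, image prompt, and character together per shot makes it easy to see what still has to be generated before editing starts.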

Although it is not easy, I enjoy the process and put real effort into creating these videos, including the previous story collection.

Conclusion

With the support of AI, the barrier to video production has been lowered, allowing ordinary people to reap the rewards. However, creating a truly outstanding video will always be the domain of a few. These individuals possess unique talents and can write compelling stories and express them well.

So, go ahead and write, go ahead and create! You won't know what you can achieve until you try.

P.S.: I welcome interested friends to exchange ideas, as I am also learning and striving to improve...