Tech tools for business: best text-based video editing apps

AI is a game-changer in content creation for podcasters and filmmakers, not least through time-saving text-based video (and audio) editing apps.
5 July 2023

Looking for a less complicated content creation workflow? Text-based video (and audio) editing is proving to be popular with training video producers, podcasters, and other corporate users.

In May 2023, Adobe released a tutorial on how to use text-based video editing in Premiere Pro, having added the feature to its flagship software. The official arrival of this way of editing video in leading content creation software (previously, there were workarounds using XML imports) shows how popular text-based video (and audio) editing is becoming.

The amount of time required to turn raw footage into a finished video depends on a number of factors, such as the complexity of the project and the talents of the production crew. But, as a rule of thumb, editors can expect to spend anywhere from 30 to 60 minutes on a project for every minute of the finished video. Or at least that used to be the case.

What is text-based video editing?

As content creators are discovering, text-based video editing – which allows users to edit a video as if they were editing a document (and it works for audio too, if you are making a podcast) – dramatically speeds up both film production and the splicing of clips for use on social media.

Videographers (or podcasters) can have a rough cut of their work in minutes rather than hours. And for those who are new to editing video or spoken audio, being able to use familiar document editing skills makes it easy to jump up the multimedia learning curve.

And how does text-based video editing work?

AI speech-to-text transcription services have been around for some time (and applied to various industrial use cases), and their success has paved the way for text-based video editing. Adobe’s Premiere Pro integration gives a good overview of how the process works. Firstly, AI speech-to-text algorithms transcribe the source footage. When that is complete, the text appears in a transcript window.

Cleverly, because the text has been matched to the video, users can edit the video timeline by simply moving words or phrases rather than having to cut and paste clips. Typically, users find text-based video editing a much more intuitive process and can focus their efforts on the content rather than having to worry about the technicalities of splicing footage.
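
To make the idea concrete, here is a minimal Python sketch of how a time-matched transcript can drive a timeline. It is an illustration only, not how Premiere Pro or any other product is implemented: the `Word` class, the `timeline_from_transcript` function, and the sample timestamps are all hypothetical. The assumption is that the speech-to-text step returns a start and end time for every word, and that the edited transcript can be described as a list of indices into the original words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word with its position in the source footage (hypothetical shape)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def timeline_from_transcript(words, edited_order):
    """Turn an edited transcript into a list of (start, end) source ranges.

    `edited_order` holds indices into `words` in the order they appear after
    the edit. Words that were neighbours in the original stay in one clip;
    any deletion or move starts a new clip.
    """
    segments = []
    previous = None
    for index in edited_order:
        word = words[index]
        if previous is not None and index == previous + 1:
            # Still contiguous in the source: extend the current clip.
            segments[-1] = (segments[-1][0], word.end)
        else:
            # A word was removed or moved: start a new clip here.
            segments.append((word.start, word.end))
        previous = index
    return segments


# Sample data (invented for illustration).
words = [
    Word("Hello", 0.0, 0.4),
    Word("um", 0.5, 0.8),
    Word("world", 0.9, 1.3),
    Word("today", 1.4, 1.9),
]

# Deleting "um" from the transcript leaves two clips.
print(timeline_from_transcript(words, [0, 2, 3]))  # [(0.0, 0.4), (0.9, 1.9)]
```

Moving a phrase works the same way: passing `[2, 3, 0]` would put "world today" ahead of "Hello" in the cut. A real editor would also need to handle padding around cuts, crossfades, and silence, which this sketch ignores.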

Even seasoned video editors, writing on user forums, talk about how creating text-based video edits has changed their lives. Some even go as far as to say that being able to perform text-based video editing natively within a non-linear editing (NLE) system is one of the most significant advancements in digital editing. And the concept has been gathering pace for a while.

In fact, the idea of editing film and audio as if you were editing a text document has been around for over a decade. And tools have existed for a similar period of time – for example, prEdit has long allowed users to make subclips from transcribed media files and send the story to Final Cut Pro or Premiere Pro as a cut sequence ready for further editing.

What’s changed is the growth and capability of cloud-based speech-to-text transcription services that fully automate spoken audio processing and support a variety of languages. And the availability of a wide range of apps means that you don’t need to be a professional filmmaker, or even own a copy of Adobe Premiere Pro, Final Cut Pro, or DaVinci Resolve, to benefit from text-based video (and audio) editing.

And there are a ton of ways that having auto-generated, time-matched transcriptions helps content creators. Users can very quickly search for topics rather than having to scroll through and listen to the audio. Text-based video editing makes it easy to remove any so-called disfluencies or hesitations in the dialogue, such as mentions of ‘um’ or ‘er’, vocal mistakes, and any overused filler phrases – like, you know, that kind of thing.
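
Both tasks are simple once every word carries a timestamp. The Python sketch below is a rough illustration under stated assumptions, not any app's actual feature: the `FILLERS` set, the helper names, and the sample transcript are invented, and each word is assumed to be a `(text, start, end)` tuple with times in seconds.

```python
# Single-word fillers to strip; this list is an assumption for the example.
FILLERS = {"um", "er", "uh"}


def normalise(token):
    """Lower-case a token and trim surrounding punctuation for comparison."""
    return token.lower().strip(".,!?;:")


def find_topic(words, term):
    """Return the start time of every word that matches `term`."""
    target = normalise(term)
    return [start for text, start, _end in words if normalise(text) == target]


def drop_fillers(words):
    """Return the transcript without single-word fillers, timestamps intact."""
    return [w for w in words if normalise(w[0]) not in FILLERS]


# Sample transcript (invented for illustration).
transcript = [
    ("So", 0.0, 0.2),
    ("um,", 0.3, 0.6),
    ("pricing", 0.7, 1.2),
    ("er", 1.3, 1.5),
    ("matters", 1.6, 2.1),
]

print(find_topic(transcript, "pricing"))                   # [0.7]
print([text for text, _s, _e in drop_fillers(transcript)])  # ['So', 'pricing', 'matters']
```

Multi-word filler phrases such as "you know" would need phrase matching across neighbouring words, which this sketch leaves out.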

Solutions can be air-gapped too – for example, if you’re working on sensitive interviews, confidential recordings, or a blockbuster movie that you want to keep under wraps until its official launch date.

Synthetic voice: no need for microphone overdubs

Also, app users don’t have to give up on pro features. And there are some interesting software additions too – for example, Descript and Simon Says allow users to collaborate remotely on text-based multimedia editing.

Descript’s podcast studio tool moves the needle further by integrating synthetic voice capabilities. Users can delete an unwanted word or phrase in the time-synchronized transcript, type new dialogue, and the text-based editor will synthesize and insert the audio content to match the existing voice track.

On the video image side of things, AI algorithms can even digitally rotate a speaker’s eyeballs so that the subject can (in real life) be looking down, reading from a script, but (in video) appear to be looking at the camera. And this really only begins to touch on what’s possible.

Opinions vary, but text-based video editing can be anywhere from 5 to 12 times faster than using conventional methods. In principle, as Milk Video’s creative team points out, trimming scenes using a transcript allows users to edit video at the speed of reading. And for organizations that produce a lot of video content, training materials, podcasts, and other multimedia, that’s a big time saving.
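
For a sense of scale, the short Python calculation below combines that speed-up range with the rule of thumb quoted earlier (30 to 60 minutes of editing per finished minute). It is a back-of-the-envelope sketch only: the function name is made up, and it assumes the speed-up applies to the whole editing job, which will not hold for every project.

```python
def editing_minutes(finished_minutes, per_minute=(30, 60), speedup=(5, 12)):
    """Estimate editing time ranges, in minutes, for a finished video.

    Returns (conventional_range, text_based_range). The best text-based case
    pairs the low conventional estimate with the highest speed-up, and the
    worst case pairs the high estimate with the lowest speed-up.
    """
    low, high = per_minute
    slowest_gain, fastest_gain = speedup
    conventional = (finished_minutes * low, finished_minutes * high)
    text_based = (conventional[0] / fastest_gain, conventional[1] / slowest_gain)
    return conventional, text_based


conventional, text_based = editing_minutes(10)
print(conventional)  # (300, 600)
print(text_based)    # (25.0, 120.0)
```

On these assumptions, a 10-minute training video that once took 5 to 10 hours to edit could take between 25 minutes and 2 hours.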

12 best text-based video (and audio) editing apps

  1. Descript
  2. Streamlabs Podcast Editor
  3. Riverside
  4. Trint
  5. Simon Says
  6. Camtasia Audiate
  7. Sonix AudioText Editor
  8. Kapwing
  9. Milk Video
  10. Imvidu
  11. Pictory
  12. Reduct Video

There are some great community projects too. One example is the text-based video editing proof of concept built by Radamés Ajna and shared on Hugging Face.

When big players such as Adobe make text-based video editing a standard feature, you can be sure that the landscape has changed. And once content makers start using these tools, they won’t want to give those time savings back.