AI-based transcription in the browser

OpenAI Whisper is an automatic speech recognition system that converts speech in an audio or video file into written text. Thanks to transformers.js, ONNX.js, and WebAssembly, it can run entirely in your browser, so you can transcribe audio and video files without uploading them to an external server. This page shows some of the things you can do with this browser-based version of Whisper, such as:

How to transcribe audio and video files without an external server. (This is a statically generated website, so no server-side processing is involved.)
How to interactively search through an HTML5-based video/audio element using the generated transcripts.
How to export the generated transcripts to a CSV file.
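
To make the search and export items more concrete, here is a minimal sketch in TypeScript. It is not this page's actual code. It assumes the transcript is available as an array of timestamped segments; the Segment shape and the function names searchTranscript, csvField, and toCsv are hypothetical and chosen only for illustration.

```typescript
// One timestamped piece of transcript.
// Assumption: this shape is illustrative, not the page's real data model.
interface Segment {
  start: number; // seconds from the beginning of the media
  end: number; // seconds from the beginning of the media
  text: string;
}

// Case-insensitive substring search over the transcript segments.
function searchTranscript(segments: Segment[], query: string): Segment[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [];
  return segments.filter((s) => s.text.toLowerCase().includes(needle));
}

// Quote a field when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes (RFC 4180 style).
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Build a CSV document with one row per segment.
function toCsv(segments: Segment[]): string {
  const rows = segments.map((s) =>
    [s.start.toFixed(2), s.end.toFixed(2), csvField(s.text.trim())].join(","),
  );
  return ["start,end,text", ...rows].join("\r\n");
}

// Browser-only wiring, shown as comments because it needs a DOM:
//   const video = document.querySelector("video")!;
//   const hit = searchTranscript(segments, "whisper")[0];
//   if (hit) video.currentTime = hit.start; // jump the HTML5 player to the match
//
//   const blob = new Blob([toCsv(segments)], { type: "text/csv" });
//   const link = document.createElement("a");
//   link.href = URL.createObjectURL(blob);
//   link.download = "transcript.csv";
//   link.click();
```

Keeping the search and CSV logic as plain functions means they can be tried outside the browser, while seeking and downloading stay a few lines of standard DOM calls.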

Currently, ONNX.js has no GPU support, so transcription runs on the CPU and can be noticeably slower with the larger models. The tiny and small models, however, still transcribe quickly. Want to try it out yourself? All you need to do is provide an audio or video file in the form below.
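
If you are curious how the choice between the tiny and small models might look in code, here is a rough sketch in TypeScript. It is written under assumptions and is not this page's implementation: it assumes the pipeline function from the @xenova/transformers package, the model ids Xenova/whisper-tiny and Xenova/whisper-small, and the chunking options shown. The names WHISPER_MODELS, PipelineFactory, and transcribe are hypothetical. The pipeline factory is passed in as a parameter so the function can be exercised without downloading a model.

```typescript
// Result shape when timestamps are requested.
// Assumption: taken from the transformers.js documentation, simplified here.
interface AsrChunk {
  timestamp: [number, number | null];
  text: string;
}
interface AsrResult {
  text: string;
  chunks?: AsrChunk[];
}

type Transcriber = (
  audio: string,
  options?: Record<string, unknown>,
) => Promise<AsrResult>;
type PipelineFactory = (task: string, model: string) => Promise<Transcriber>;

// Hypothetical lookup table. The ids are assumed to be Whisper checkpoints
// converted for transformers.js; smaller models trade accuracy for speed.
const WHISPER_MODELS = {
  tiny: "Xenova/whisper-tiny",
  small: "Xenova/whisper-small",
} as const;

type ModelSize = keyof typeof WHISPER_MODELS;

// Load the chosen model and transcribe the audio at the given URL.
async function transcribe(
  audioUrl: string,
  size: ModelSize,
  makePipeline: PipelineFactory,
): Promise<AsrResult> {
  const transcriber = await makePipeline(
    "automatic-speech-recognition",
    WHISPER_MODELS[size],
  );
  return transcriber(audioUrl, {
    chunk_length_s: 30, // process long files in 30-second windows
    stride_length_s: 5, // overlap between neighbouring windows
    return_timestamps: true, // ask for per-chunk start and end times
  });
}

// Browser usage (assumption: the @xenova/transformers package is installed):
//   import { pipeline } from "@xenova/transformers";
//   const url = URL.createObjectURL(fileInput.files[0]); // local object URL
//   const result = await transcribe(url, "tiny", pipeline as unknown as PipelineFactory);
```

Switching from tiny to small is then a one-word change, which makes it easy to compare speed and accuracy on your own files.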

If you find this application useful and would like to support its development, you can buy me a coffee.

Input

Transcription