AI-based transcription in the browser

OpenAI Whisper is an automatic speech recognition (ASR) system that turns the speech in an audio or video file into written text. Thanks to Transformers.js, ONNX.js and WebAssembly, it can run entirely in your browser, so you can transcribe audio and video files without uploading them to an external server. This page shows some of the things you can do with this browser-based version of Whisper, such as:

  • How to transcribe audio and video files without an external server. (This is a statically generated website, so no server-side processing is involved.)
  • How to search the generated transcript and jump to the matching moment in an HTML5 video or audio element.
  • How to export the generated transcripts to a CSV file.
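A minimal TypeScript sketch of the search and CSV-export features, assuming each transcript chunk carries its text and a `[start, end]` timestamp in seconds (a shape similar to what Transformers.js returns when timestamps are enabled). The `TranscriptChunk` type and the helper names are hypothetical and are not this site's actual code.

```typescript
// Hypothetical chunk shape: text plus [start, end] in seconds.
// The end time may be null for the final chunk.
interface TranscriptChunk {
  text: string;
  timestamp: [number, number | null];
}

// Return every chunk whose text contains the query, ignoring case.
function searchTranscript(chunks: TranscriptChunk[], query: string): TranscriptChunk[] {
  const q = query.trim().toLowerCase();
  if (q === "") return [];
  return chunks.filter((c) => c.text.toLowerCase().includes(q));
}

// Jump a media element to the start of a chunk. In the browser, pass an
// HTMLVideoElement or HTMLAudioElement; any object with currentTime works.
function seekTo(media: { currentTime: number }, chunk: TranscriptChunk): void {
  media.currentTime = chunk.timestamp[0];
}

// Quote a CSV field (RFC 4180 style) when it contains a comma, quote or line break,
// doubling any embedded quotes.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Build a CSV string with one row per chunk: start, end, text.
function transcriptToCsv(chunks: TranscriptChunk[]): string {
  const rows: string[][] = [["start", "end", "text"]];
  for (const c of chunks) {
    const [start, end] = c.timestamp;
    rows.push([String(start), end === null ? "" : String(end), c.text.trim()]);
  }
  return rows.map((r) => r.map(csvField).join(",")).join("\r\n");
}
```

In the browser, the CSV string can be wrapped in a `Blob` with type `text/csv` and offered for download through `URL.createObjectURL`; keeping the conversion as a pure function makes it easy to test without a DOM.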

Currently, ONNX.js has no GPU support, so transcription with the larger models can be slow. The tiny and small models, however, are still fast. Want to try it yourself? Just provide an audio or video file in the form below.

We only support audio and video files.
Larger models are more accurate but slower. The models are listed in order of size. Models whose names end in .en support only English but are slightly more accurate.
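The naming rule above can be captured in a small lookup table. This TypeScript sketch assumes the model sizes and approximate parameter counts from OpenAI's Whisper release (tiny, base, small, medium, large), where only the four smaller sizes have a `.en` variant; the `MODELS` table and `modelName` helper are hypothetical, not the form's actual code.

```typescript
type WhisperSize = "tiny" | "base" | "small" | "medium" | "large";

// Ordered smallest to largest; paramsMillions is the approximate parameter count.
const MODELS: { size: WhisperSize; paramsMillions: number; hasEnglishOnly: boolean }[] = [
  { size: "tiny", paramsMillions: 39, hasEnglishOnly: true },
  { size: "base", paramsMillions: 74, hasEnglishOnly: true },
  { size: "small", paramsMillions: 244, hasEnglishOnly: true },
  { size: "medium", paramsMillions: 769, hasEnglishOnly: true },
  { size: "large", paramsMillions: 1550, hasEnglishOnly: false },
];

// Compose a model name, appending ".en" for the English-only variant.
// Throws if that size has no English-only variant.
function modelName(size: WhisperSize, englishOnly: boolean): string {
  const entry = MODELS.find((m) => m.size === size);
  if (!entry) throw new Error(`Unknown model size: ${size}`);
  if (englishOnly && !entry.hasEnglishOnly) {
    throw new Error(`No English-only variant of ${size}`);
  }
  return englishOnly ? `${size}.en` : size;
}
```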



Transcription progress:
The transcribed text may change while generation continues. This is expected: the model picks the most likely sentence (beam search), and which sentence is most likely can change once more audio has been processed.
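To see why the leading hypothesis can change, here is a toy beam search in TypeScript. It is not Whisper's decoder: the tiny "model" below is an invented table of next-word probabilities, used only to show the effect. After one step the best hypothesis is "I", but after a second step "Eye drops" (0.4 × 0.9 = 0.36) overtakes "I see" (0.6 × 0.55 = 0.33). With a beam width of 1 (greedy decoding), the early choice of "I" is never reconsidered.

```typescript
// Invented next-word distributions keyed by the previous word ("" = sentence start).
type Dist = Record<string, number>;
const TOY_MODEL: Record<string, Dist> = {
  "": { I: 0.6, Eye: 0.4 },
  I: { see: 0.55, sea: 0.45 },
  Eye: { drops: 0.9, lid: 0.1 },
};

interface Hypothesis {
  tokens: string[];
  logProb: number; // sum of log-probabilities, so higher is more likely
}

// Keep the beamWidth most likely partial sentences at each step.
function beamSearch(steps: number, beamWidth: number): Hypothesis[] {
  let beams: Hypothesis[] = [{ tokens: [], logProb: 0 }];
  for (let s = 0; s < steps; s++) {
    const candidates: Hypothesis[] = [];
    for (const beam of beams) {
      const prev = beam.tokens.length > 0 ? beam.tokens[beam.tokens.length - 1] : "";
      const dist = TOY_MODEL[prev] ?? {};
      // Extend each surviving hypothesis with every possible next word.
      for (const [word, p] of Object.entries(dist)) {
        candidates.push({ tokens: [...beam.tokens, word], logProb: beam.logProb + Math.log(p) });
      }
    }
    // Sort by likelihood and prune to the beam width.
    candidates.sort((a, b) => b.logProb - a.logProb);
    beams = candidates.slice(0, beamWidth);
  }
  return beams;
}
```

Calling `beamSearch(1, 2)` puts `["I"]` on top, while `beamSearch(2, 2)` puts `["Eye", "drops"]` on top: the displayed "best" text changed as more input was considered.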