High-quality transcription is a big job, but there are times when your project can make do with a quick-and-dirty transcription. For example, I am writing this post while developing a system that uses Generative AI to write digital content summaries for websites. This job does not require the high level of transcription quality that we would expect in an empirical research project. We are basically condensing an hour of discussion into a fifty-word summary, which does not require that kind of granular accuracy.
Currently, my experience is that ChatGPT-4o does not do a good job of transcribing MP3 files, and OpenAI’s API entails out-of-pocket costs. Fortunately, CUNY does offer an excellent tool for low-quality transcription: Adobe Premiere Pro, which is part of the Adobe Creative Suite. You can access this software in campus labs. Faculty may wish to contact their campus software administrator about a personal license.
Here are the steps to generate a transcript. Start by creating your workspace: create a temporary folder and put a copy of your episode MP3 into it. We are going to erase this folder at the end, so do not put the original in there.
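If you would rather script the workspace setup than do it by hand, here is a minimal Python sketch. The function names `make_workspace` and `remove_workspace` are hypothetical, made up for this illustration, and the sketch assumes you are happy to let your operating system choose where the temporary folder lives.

```python
import shutil
import tempfile
from pathlib import Path


def make_workspace(original_mp3: Path) -> Path:
    """Create a throwaway folder and put a copy of the episode MP3 in it."""
    # mkdtemp creates a new, uniquely named folder in the system temp location.
    workspace = Path(tempfile.mkdtemp(prefix="transcribe_"))
    # copy2 makes a copy; the original file stays where it is.
    shutil.copy2(original_mp3, workspace / original_mp3.name)
    return workspace


def remove_workspace(workspace: Path) -> None:
    """Erase the temporary folder and everything inside it."""
    shutil.rmtree(workspace)
```

You would call `make_workspace(Path("episode.mp3"))` (with your own file path) before opening Premiere Pro, save the project into the folder it returns, and call `remove_workspace` on that folder once you have exported the transcript somewhere safe.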
Then, open Premiere Pro and create a new project in that temporary folder. You should see a screen that asks whether you want to import the MP3 file in that folder, like the image below: (1) choose the file, (2) ensure that automatic transcription is enabled and the other settings are correct, and (3) click Create.
If the software changes or looks different on your machine, then your task is to find another way to import the MP3 file and then apply Speech to Text in Premiere Pro to that media file.
The results should look something like this:
If you click the three horizontally aligned dots at the top right corner of the transcript panel (#1 in the image above) and navigate to Export > Export to text file… (#2 in the image), you can get a text file that looks like this. You can parse out the speaker labels and time stamps using R, or feed the transcript to a Generative AI model:
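For readers who prefer Python to R, the parsing step might be sketched as follows. This is only an illustration under stated assumptions: it assumes each block of the exported file starts with a timecode line such as `00:00:04:10 - 00:00:09:02`, followed by a single speaker-label line, followed by the spoken text. Check your own export before relying on it. The function name `strip_labels` is hypothetical.

```python
import re

# Assumed timecode line, e.g. "00:00:04:10 - 00:00:09:02".
# Both ":" and ";" are accepted as separators.
TIMECODE = re.compile(r"^\d{2}([:;]\d{2}){3}\s*-\s*\d{2}([:;]\d{2}){3}$")


def strip_labels(transcript: str) -> str:
    """Return only the spoken text, dropping time stamps and speaker labels."""
    spoken = []
    expect_speaker = False
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank separator lines
        if TIMECODE.match(line):
            # Assumption: the next non-empty line is the speaker label.
            expect_speaker = True
            continue
        if expect_speaker:
            expect_speaker = False
            continue  # drop the speaker label
        spoken.append(line)
    return "\n".join(spoken)
```

To use it, read the exported file into a string (for example with `Path("transcript.txt").read_text(encoding="utf-8")`, where the filename is a placeholder) and pass that string to `strip_labels`. The result is plain text that you can hand to a Generative AI model for summarizing.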