Could you confirm whether you need the text to remain available throughout the call, so users can look back at what was said, or only while someone is speaking?
Depending on the user experience you're after, there are two options. We currently offer Closed captions, a client SDK implementation that lets users in calls and meetings see captions of what is being spoken. We are also working on releasing real-time transcription, a server-side implementation where your application receives a real-time transcript of the call and you choose what to do with it, for example surfacing it to your users in your application.
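To make the difference between the two experiences concrete, here is a minimal sketch of how an application might handle transcript text once it has it. Everything in it is an assumption for illustration: `Segment`, `TranscriptStore`, and the `is_final` flag are hypothetical names, not part of any SDK, and the real payload you receive will look different.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    # Hypothetical shape of one transcript event; a real payload will differ.
    speaker: str
    text: str
    is_final: bool  # False while the utterance is still being recognized


@dataclass
class TranscriptStore:
    # Finalized lines, kept for the whole call so users can scroll back.
    history: List[Segment] = field(default_factory=list)
    # The utterance currently being spoken, if any.
    live: Optional[Segment] = None

    def add(self, segment: Segment) -> None:
        """Record an incoming segment."""
        if segment.is_final:
            self.history.append(segment)
            self.live = None
        else:
            self.live = segment

    def caption(self) -> str:
        """Caption-style view: text only while someone is speaking."""
        if self.live is None:
            return ""
        return f"{self.live.speaker}: {self.live.text}"

    def scrollback(self) -> List[str]:
        """Transcript-style view: everything said so far in the call."""
        return [f"{s.speaker}: {s.text}" for s in self.history]
```

If you only need the caption-style view, the client-side option may be enough. If you need the scrollback view, you would want to keep the text yourself, along the lines of `history` above.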