It looks like this link mentions that voice styles are possible with Custom Neural Voice. I guess the main remaining question is whether different styles count as separate models for endpoint hosting and pricing : )
--
"Work with your voice talent to develop a "persona" that defines the overall sound and emotional tone of the custom neural voice. In the process, you'll pinpoint what "neutral" sounds like for that persona. Using the Custom Neural Voice capability, you can train a model that speaks with emotions. Define the "speaking styles" and ask your voice talent to read the script in a way that resonates the styles you want"