Use the semantics of an auxiliary text to guide speech generation (keep it in the same language as the main text).
Note: Avoid using command-style text (e.g., 'Happy'). Use emotionally rich text (e.g., 'I'm so happy!!!')
Leave blank to disable the auxiliary text.
If mispronunciations occur, try replacing the problem characters in the main text and entering the original text here with the weight set to 1.0, so that the original semantics are retained.
Mixing ratio between the main-text and auxiliary-text BERT embeddings
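
As a rough illustration of what such a mixing ratio could mean, here is a minimal sketch. It assumes a plain linear interpolation in which the weight is the share given to the auxiliary embedding, which is what the "weight set to 1.0" tip suggests but the text does not confirm. The function name `blend_embeddings`, the parameter `aux_weight`, and the toy vectors are hypothetical and are not taken from the actual implementation.

```python
def blend_embeddings(main, aux, aux_weight):
    """Linearly interpolate two equal-length embedding vectors.

    Assumption: aux_weight = 0.0 keeps only the main embedding,
    and aux_weight = 1.0 keeps only the auxiliary embedding.
    """
    if not 0.0 <= aux_weight <= 1.0:
        raise ValueError("aux_weight must be between 0.0 and 1.0")
    if len(main) != len(aux):
        raise ValueError("embeddings must have the same length")
    return [(1.0 - aux_weight) * m + aux_weight * a for m, a in zip(main, aux)]


# Toy stand-ins for real BERT features (real ones are much larger).
main_vec = [1.0, 0.0, 2.0]
aux_vec = [0.0, 4.0, 2.0]

print(blend_embeddings(main_vec, aux_vec, 0.0))  # main text only
print(blend_embeddings(main_vec, aux_vec, 0.5))  # even mix
print(blend_embeddings(main_vec, aux_vec, 1.0))  # auxiliary text only
```

Under this assumption, a weight of 1.0 makes the result identical to the auxiliary embedding, which would explain why the mispronunciation tip pairs the original text in the auxiliary field with a weight of 1.0.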