Thanks for the work. This is the first purely speech-modality conversational model I have read about, but I have a question.
Currently, conversational speech synthesis systems are bridged through text so that they can use a text-based LLM as the brain. Much of the safety and controllability is handled in the LLM.
But in pure audio speech-to-speech there is no such convenient intermediate representation: the 8 Hz speech codes are unreadable and not semantically centered. So how can the response content of the hertz model be controlled?