AI LLMs make virtual anchors no longer out of reach
Release time:2023/11/22
The year 2021 marked the beginning of the virtual anchor era. ChatGPT was launched at the end of 2022, and 2023 saw a surge in the development of LLMs for voice, text, and multimodal inputs. So, has the virtual anchor, riding the wave of LLM technology, taken off?
The answer is a resounding yes!
Recently, MIT Technology Review published an article titled 'Deepfakes of Chinese Influencers are Livestreaming 24/7'. The article focuses on the advantages of AI virtual anchor technology in the e-commerce industry, notably lower costs and higher efficiency. With just a few minutes of training video and a cost of around 1,000 USD, brands can sell products via livestream around the clock.
Deep Integration of LLMs and Virtual Anchors
The classic technical route for building a virtual anchor deeply integrated with AI models runs as follows. Text and voice inputs are first recognized by the corresponding LLM (a text model or a voice model), which analyzes the user's intention. The AI model then decides how to respond, and the result is returned as voice feedback. This process involves many customized elements, such as the virtual anchor's appearance, motion design, and application scenario scripts. These customizations require further fine-tuning on top of the LLM, adapting general-purpose high-performance models, such as voice synthesis models, to fit the current scenario, tone, or business needs.
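The route described above (recognition, intent analysis, decision-making, voice feedback) can be sketched in a few lines of Python. This is only a minimal illustration, not an actual product architecture: the names UserInput, AnchorReply, and run_turn, along with the four toy stand-in functions, are assumptions made for this example. In a real system each stand-in would be replaced by a speech recognition model, an LLM, and a voice synthesis model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UserInput:
    """One viewer message; either text or audio is expected (hypothetical type)."""
    text: Optional[str] = None
    audio: Optional[bytes] = None


@dataclass
class AnchorReply:
    """What the virtual anchor sends back for one turn (hypothetical type)."""
    intent: str
    text: str
    audio: bytes


def run_turn(
    user_input: UserInput,
    recognize_speech: Callable[[bytes], str],
    analyze_intent: Callable[[str], str],
    decide_reply: Callable[[str, str], str],
    synthesize_voice: Callable[[str], bytes],
) -> AnchorReply:
    """Run one interaction turn through the four stages of the pipeline."""
    # Stage 1: recognition. Text passes through; voice is transcribed.
    if user_input.text is not None:
        text = user_input.text
    elif user_input.audio is not None:
        text = recognize_speech(user_input.audio)
    else:
        raise ValueError("input needs either text or audio")

    # Stage 2: analyze the user's intention.
    intent = analyze_intent(text)

    # Stage 3: decide what to say, given the intent and the original message.
    reply_text = decide_reply(intent, text)

    # Stage 4: voice feedback.
    return AnchorReply(intent=intent, text=reply_text, audio=synthesize_voice(reply_text))


# Toy stand-ins so the sketch runs without any model (assumptions, not real APIs).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_intent(text: str) -> str:
    return "ask_price" if "price" in text.lower() else "chitchat"


def fake_decide(intent: str, text: str) -> str:
    if intent == "ask_price":
        return "The price is shown on the product card."
    return "Thanks for joining the stream!"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    reply = run_turn(UserInput(text="What is the price?"),
                     fake_asr, fake_intent, fake_decide, fake_tts)
    print(reply.intent, "|", reply.text)
```

Passing the four stages in as functions keeps the flow independent of any particular model, which mirrors the point that each stage can be customized or fine-tuned separately.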
Compared with virtual anchors driven by predefined scripts, those enhanced by LLMs have numerous advantages and greater development potential, most notably improved interactive skills.
Using advanced LLM technology, virtual anchors keep improving in their ability to express themselves, interact, and be customized. They can converse and interact with people in real time, understanding and responding to their needs. This degree of personalization enables them to provide tailored services and support.
Thanks to the strong capabilities of LLMs, their broad data coverage, and their robustness, these models can significantly extend the commercial applications of virtual anchors. Virtual anchors now span various sectors, including livestreaming, advertising, marketing, customer service, education, social media, gaming, and entertainment.
As technology advances, their range of applications is set to expand even further. Entering a new field requires only light fine-tuning or adaptation with domain-specific data, so virtual anchors can be customized rapidly to user preferences, saving production costs. AI-driven virtual anchors can then offer services and recommendations tailored to individual users, which not only enhances user satisfaction but also fosters brand loyalty.
Domain-specific Data is Crucial
For instance, to customize a large voice synthesis model for e-commerce live streaming, domain-specific data from e-commerce live streams is needed to fine-tune the entire model so that it better matches the application scenario and user preferences. Additionally, customizing a personalized voice tone requires a small amount of targeted voice data to guide the model toward the desired tone. Therefore, when adapting LLMs to virtual anchors, domain-specific voice data is essential.
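As a rough illustration of the two data needs described above, the sketch below splits a corpus manifest into a larger domain set for fine-tuning and a small target-speaker set for voice tone customization. Everything here is an assumption made for the example: the Utterance fields, the "ecommerce_live" domain label, the speaker IDs, and the seconds-based budget are hypothetical and do not describe any real dataset format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Utterance:
    """One manifest row (hypothetical schema)."""
    path: str
    speaker: str
    domain: str
    seconds: float


def split_for_adaptation(
    utterances: List[Utterance],
    domain: str,
    target_speaker: str,
    voice_budget_seconds: float,
) -> Tuple[List[Utterance], List[Utterance]]:
    """Return (domain_set, voice_set).

    domain_set: every utterance from the target domain, for fine-tuning.
    voice_set:  utterances from the target speaker, added in manifest order
                while they still fit in a small duration budget.
    """
    domain_set = [u for u in utterances if u.domain == domain]

    voice_set: List[Utterance] = []
    total = 0.0
    for u in utterances:
        if u.speaker != target_speaker:
            continue
        if total + u.seconds <= voice_budget_seconds:
            voice_set.append(u)
            total += u.seconds
    return domain_set, voice_set


if __name__ == "__main__":
    manifest = [
        Utterance("a.wav", "spk1", "ecommerce_live", 4.0),
        Utterance("b.wav", "spk2", "ecommerce_live", 5.0),
        Utterance("c.wav", "spk1", "audiobook", 6.0),
        Utterance("d.wav", "spk1", "ecommerce_live", 3.0),
    ]
    domain_set, voice_set = split_for_adaptation(
        manifest, "ecommerce_live", "spk1", voice_budget_seconds=10.0
    )
    print([u.path for u in domain_set])
    print([u.path for u in voice_set])
```

The two sets are allowed to overlap, since the same recording can serve both domain adaptation and voice tone customization.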
Because collecting and annotating e-commerce data is costly, professional data companies like DataOceanAI play an important role. DataOceanAI is dedicated to producing large-scale voice, text, image, and multimodal data, striving to contribute to the development and implementation of LLMs. Our company has a vast amount of live-streaming voice data. This data can be used on the one hand to adaptively fine-tune LLMs, and on the other hand to independently build or study voice synthesis technologies. These datasets include:
- Russian Female Speech Synthesis Corpus (Multi Style), King-TTS-163
- Korea Korean Male Speech Synthesis Corpus (Multi Style), King-TTS-028
- Thai Male Voice Synthesis Library - Light-hearted Style, King-TTS-154
- Chinese Female Speech Synthesis Corpus (Live Streaming Style), King-TTS-179
These datasets cover diverse styles and include Thai, a less widely covered language that is essential for applications in Southeast Asia and across multiple scenarios. With the enhancement of LLMs, perhaps one day each of us could have a virtual anchor based on ourselves in a two-dimensional world. This echoes the prediction of Kevin Kelly, author of 'Out of Control', about the technological trends in the thirty years following the advent of the internet: 'In the future world, everything in the real world will have a chip, the entire world will be digitized, and everything will have a counterpart in the virtual digital world, like a mirror to the real world.'