Caryn.AI is a virtual persona built with AI virtual character technology and modeled on the internet celebrity Caryn Marjorie, an influencer with 2 million followers on Snapchat. She launched Caryn AI as a chatbot built on the GPT-4 API that reproduces her voice, speech, and personality.
Caryn AI does not have a standalone app and can only be accessed through a Telegram group. Chatting costs 1 USD per minute, dozens of times the price of an international long-distance call. Even so, the product proved surprisingly popular with fans: within a week of launch its revenue exceeded 100,000 USD, and Caryn estimates that monthly income could reach 5 million USD.
Having an AI-generated girlfriend is no longer confined to science fiction movies like 'Her'. In recent years, the number of companion apps that can be tailored to individual tastes has grown rapidly, and the companions they offer have become more realistic. With the advances in generative AI chatbots such as ChatGPT and Bard, it is not surprising that conversations with machines have become part of people's relationships. There are numerous options, including Replika, Eva AI, Intimate, DreamGF, and RomanticAI, all offering similar features.
Create an AI Companion
Creating a virtual girlfriend or boyfriend is actually quite simple. The website Eva AI can design and produce a virtual companion based on your personal preferences.
The first step is to choose an avatar, which can be male or female, although some apps are designed exclusively for heterosexual male audiences, offering only female companions. To enable unlimited interaction, including sending written messages, voice notes, and accessing photos and videos of the girlfriend, you must pay a fee. The most advanced apps offer the possibility to choose all physical features of the future companion, from eye color to hairstyle, from body type to race.
The slogan of one app perfectly summarizes the degree of creative freedom and control over the virtual girlfriend: 'Immerse yourself in your desires with Eva AI. Control it as you wish.'
The login page of the website states: 'Create and connect with a virtual AI companion that will listen, respond, and value you. Build relationships and intimacy in your own way.' A fully customized companion is impossible to find in real life, but AI makes it possible in the virtual world.
Technical Challenges
The biggest technical challenge lies in emotionally charged interactions. The preferred mode of interaction is speech.
Since virtual companions exist primarily to provide company and conversation, they need to understand the user's emotions and respond to them appropriately. When you are sad, the virtual person needs to comfort you; when you are happy, the virtual person needs to share your joy. Keeping someone company through both sorrow and joy requires emotional conversation skills.
This presents significant technical challenges in both speech recognition and speech synthesis within voice interactions. To endow a virtual companion with emotions, a framework must be designed that is capable of both accurate speech recognition and precise emotional detection. This falls under the category of multi-task learning, with a general framework as follows:
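As a minimal sketch of the multi-task idea, and not the specific framework referred to above, the training objective can be written as a recognition loss plus a weighted emotion-classification loss computed for the same utterance. The function names, the `emotion_weight` parameter, and the use of per-token cross-entropy for the recognition term are all assumptions made for illustration; production speech recognizers typically use CTC or attention-based losses instead.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability of the correct class."""
    return -math.log(softmax(logits)[target_index])


def multitask_loss(token_logits, token_targets,
                   emotion_logits, emotion_target,
                   emotion_weight=0.5):
    """Combine a recognition loss and an emotion loss for one utterance.

    token_logits:   one score list per transcript token (hypothetical head 1)
    emotion_logits: one score list per utterance (hypothetical head 2)
    emotion_weight: assumed hyperparameter balancing the two tasks
    """
    # Recognition term: mean per-token cross-entropy (a simplification).
    asr_loss = sum(
        cross_entropy(scores, target)
        for scores, target in zip(token_logits, token_targets)
    ) / len(token_targets)
    # Emotion term: a single classification over the whole utterance.
    emotion_loss = cross_entropy(emotion_logits, emotion_target)
    total = asr_loss + emotion_weight * emotion_loss
    return total, asr_loss, emotion_loss
```

Both terms depend on the same utterance, which is why each training example needs a transcript and an emotion label at the same time.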
The most challenging part of this framework is acquiring speech data that carries both text annotation and emotional annotation. Collecting such data is costly and requires professional guidance and annotation, so it remains relatively scarce. This scarcity is the main bottleneck limiting the development of emotional communication in virtual humans.
Emotional Speech Recognition Data
In response to the challenges mentioned above, DataOceanAI has launched a new professionally recorded speech recognition dataset with emotional annotation. This dataset can be used to train virtual companions or other conversational robots, and it is designed to match everyday use scenarios and emotional needs.
American Spanish Recognition Voice Corpus – Conversation
The corpus contains 25 groups of spontaneous daily conversational speech from 50 speakers. The pure recording time is about 30 hours, including reasonable leading and trailing silence. The total size of the database is 8.36 GB.
The annotation is meticulous, including text annotation, topic annotation, and two-layer emotional annotation. The first emotional layer uses the labels neutral, happy, sad, disgusted, angry, fear, positive surprise, negative surprise, and NA; the second layer uses the labels neutral, positive, negative, and NA.
The range of topics is very broad, covering life, entertainment, health, and geography, with subjects such as work, travel, education, movies, music, hobbies, sports, food, pets, countries, and more.
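To make the annotation layers concrete, here is a small sketch of how one annotated utterance could be represented and validated in code. The label sets come from the description above; the class name, the field names, and the sample values are hypothetical and do not reflect the actual file format of the corpus.

```python
from dataclasses import dataclass

# Label sets as listed in the corpus description.
FINE_EMOTIONS = {
    "neutral", "happy", "sad", "disgusted", "angry", "fear",
    "positive surprise", "negative surprise", "NA",
}
COARSE_EMOTIONS = {"neutral", "positive", "negative", "NA"}


@dataclass(frozen=True)
class AnnotatedUtterance:
    """One utterance with text, topic, and two emotion layers.

    All field names are illustrative assumptions.
    """
    speaker_id: str
    text: str
    topic: str
    fine_emotion: str
    coarse_emotion: str

    def __post_init__(self):
        # Reject labels that fall outside the documented label sets.
        if self.fine_emotion not in FINE_EMOTIONS:
            raise ValueError(f"unknown fine emotion: {self.fine_emotion!r}")
        if self.coarse_emotion not in COARSE_EMOTIONS:
            raise ValueError(f"unknown coarse emotion: {self.coarse_emotion!r}")


# Hypothetical sample record, not taken from the corpus.
sample = AnnotatedUtterance(
    speaker_id="spk01",
    text="Me encanta viajar.",
    topic="travel",
    fine_emotion="happy",
    coarse_emotion="positive",
)
```

Validating labels at load time catches annotation typos before they reach training, where they would otherwise create spurious extra classes.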
Contact us for data samples: contact@dataoceanai.com