News

Llama 2 Global Partner DATAOCEAN AI Announces LLM Dataset DOTS-NLP-216
Release time: 2023/08/08

DATAOCEAN AI is proud to be a Llama 2 Launch Partner, empowering large models with high-quality training datasets. As supporters of the Statement of Support for Meta’s Open Approach to Today’s AI, DATAOCEAN AI's Chief Operating Officer, Ke Li, and Chief Technology Officer, Yukai Huang, endorse this open approach: “We support an open innovation approach to AI. Responsible and open innovation gives us all a stake in the AI development process, bringing visibility, scrutiny and trust to these technologies. Opening today's Llama models will let everyone benefit from this technology.”

https://about.fb.com/news/2023/07/llama-2-statement-of-support/

Meanwhile, DATAOCEAN AI officially announced the "Chinese 10-Million-Rounds Conversation Corpus DOTS-NLP-216" for LLM research.


Dataset Introduction:
The dataset consists of natural conversations collected in real-world scenarios that reflect how Chinese speakers actually talk, and it aims to bring new momentum to Chinese large language models (LLMs). Built on a foundation of security compliance, the dataset is intended to improve the performance and robustness of large models, helping enterprises build high-quality generative AI applications. It covers multiple scenarios, such as work, daily life, and campus life, as well as the finance, education, entertainment, sports, automotive, and technology fields.

Dataset Advantages:
· Multi-round conversations in Chinese: natural conversations collected in real-world scenarios, consistent with natural Chinese usage
· Ultra-large scale: hundreds of millions of tokens
· Easy to train: a finished, complete dataset
· Commercial use: can be licensed for commercial use

Samples: [sample images not reproduced]

Contact us for more information about DOTS-NLP-216:

https://en.speechocean.com/datacenter/details/3243.html

contact@dataoceanai.com
