Diversified multi-modal training data covering various scenarios helps household appliances sense and interact with one another accurately, and contributes to the development of a fully intelligent home. Image data of general s, pets, and people of different ages and genders, together with daily content and speech data for voiceprint recognition, enables recognition of different family members for intelligent feedback and interaction.