NVIDIA filed two patents, both in AI & Machine Learning.
One filing covers conversational AI technology that lets chatbots and NPCs insert relevant images into text responses without requiring expensive multimodal models. The other describes an auto-regressive auto-encoder system that generates detailed 3D meshes with thousands of faces in artist-specific styles for games, VR environments, and digital content creation.
NVIDIA's two AI and machine learning patents tackle distinct challenges in conversational AI and 3D content generation. The first addresses the computational expense of multimodal chatbots by separating image retrieval from language processing. Instead of training a costly model that handles text and images simultaneously, the system pre-indexes associations between text and images drawn from document collections, then uses vector similarity matching to find relevant visuals for each generated response. The second patent describes an auto-regressive auto-encoder that produces complex 3D meshes containing over 8,000 faces while maintaining artist-quality topology. A more efficient face tokenization algorithm works alongside the auto-encoder architecture to compress meshes into fixed-length representations. According to the filing, this lets the system generate meshes roughly five times more detailed than previous approaches while supporting multiple conditioning inputs for better performance across different domains.
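
The retrieval idea in the first patent (index text-image associations once, then match a generated response against that index by vector similarity) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the filing: the function names `embed`, `build_index`, and `retrieve_image` are hypothetical, the bag-of-words vector stands in for whatever learned text encoder a real system would use, and the similarity threshold of 0.3 is arbitrary.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a text encoder: a sparse bag-of-words count vector.
    (Assumption: a real system would use a learned embedding model.)"""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Offline step: embed the text associated with each image once.
    `documents` is a list of (associated_text, image_id) pairs."""
    return [(embed(text), image) for text, image in documents]


def retrieve_image(index, response, threshold=0.3):
    """Online step: return the image whose associated text is most similar
    to the generated response, or None if nothing clears the threshold."""
    query = embed(response)
    best_score, best_image = max(
        ((cosine(query, vec), image) for vec, image in index),
        default=(0.0, None),
    )
    return best_image if best_score >= threshold else None


# Hypothetical document collection: text that appeared alongside each image.
index = build_index([
    ("A red dragon breathing fire over a castle", "dragon.png"),
    ("A wooden ship sailing on a stormy sea", "ship.png"),
])

on_topic = retrieve_image(index, "The dragon circles the castle, breathing fire.")
off_topic = retrieve_image(index, "Hello, how can I help you today?")
```

In this sketch the language model only ever produces text; image selection is a separate lookup, which is the cost-saving separation the summary describes. The threshold lets the chatbot or NPC return no image when no indexed text is close enough to the response.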