Nvidia has showcased a generative AI model named Fugatto. The model is designed as a versatile tool for creating and modifying sounds from text and audio prompts. Fugatto can generate and transform any mix of music, voices, and soundscapes, giving musicians, developers, and content creators a single tool for tasks that previously required several.
Fugatto, short for Foundational Generative Audio Transformer Opus 1, supports multiple tasks, such as generating new music, altering accents or emotions in voices, and crafting novel soundscapes. Taken together, these features mark a notable step forward for audio AI.
Fugatto lets users create audio by combining several instructions and prompts in a single request. For example, it can produce a trumpet sound that mimics a barking dog or generate a voice with a specific accent and tone.
Beyond music, Fugatto opens possibilities for advertising, education, and gaming. Advertisers can adjust campaign voiceovers for regional audiences, while educators can personalize content with voices familiar to learners. Game developers can modify audio assets or generate them dynamically based on gameplay.
Fugatto, powered by a 2.5-billion-parameter generative transformer, was trained on Nvidia DGX systems with 32 H100 Tensor Core GPUs. Its development involved a diverse team spanning several countries, which enhanced its multilingual and multi-accent capabilities. The model’s training relied on millions of audio samples, carefully curated to enable complex and diverse tasks.