Fun Audio Chat
🎁 FREE · GitHub
Fun-Audio-Chat is an open-source GitHub repository from FunAudioLLM featuring an advanced 8B-parameter Large Audio Language Model (Fun-Audio-Chat-8B) designed for natural, low-latency, multilingual voice interactions. It supports real-time spoken conversations, question answering, audio understanding, function calling, instruction following, and emotional empathy, with efficient dual-resolution speech processing and a Gradio/web demo for interactive testing.
✨Key Features
- ▸Efficient Architecture: Dual-Resolution Speech Representations (5Hz backbone + 25Hz refined head) reduce GPU compute by ~50% while maintaining high-quality speech understanding and generation.
- ▸Core-Cocktail Training: Preserves strong text LLM capabilities alongside advanced audio processing for balanced multimodal performance.
- ▸Top Benchmark Performance: Leads ~8B models on major audio/voice benchmarks (OpenAudioBench, VoiceBench, UltraEval-Audio, MMAU, etc.) for spoken QA, empathy, and function calling.
- ▸Supported Capabilities: Speech-to-Text (S2T), Speech-to-Speech (S2S), multi-turn conversations, and speech function calling; integrates CosyVoice for TTS synthesis.
- ▸Multilingual Support: Broad language coverage via underlying FunAudioLLM components such as SenseVoice and CosyVoice.
- ▸Easy Setup & Demo: Includes installation scripts, pretrained model downloads (Hugging Face/ModelScope), inference examples, and a full web/Gradio interface with server-client setup.
- ▸Open-Source & Active: Apache-2.0 licensed, ~723 stars, active contributions (e.g., vLLM integration for a 20-50x speedup), with a technical report and evaluation scripts provided.
- ▸Hardware Needs: Inference runs on a single GPU with ~24GB of VRAM; full training requires more (e.g., 4×80GB GPUs).
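
To put the frame-rate and hardware figures above in perspective, here is a rough back-of-the-envelope sketch. It is not taken from the repository: the helper names `frames` and `weight_memory_gib` are hypothetical, the 10-second clip is an arbitrary example, and 2 bytes per parameter assumes fp16/bf16 weights. It only counts frames and weight memory; it does not reproduce the ~50% compute figure, which is the project's own claim.

```python
def frames(duration_s: float, rate_hz: float) -> int:
    """Number of speech frames for a clip at a given frame rate."""
    return round(duration_s * rate_hz)


def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights alone, in GiB.

    Assumption: 2 bytes per parameter (fp16/bf16). Activations, KV cache,
    and any auxiliary models are NOT included.
    """
    return n_params * bytes_per_param / 2**30


clip_s = 10.0                           # arbitrary example clip length
backbone_frames = frames(clip_s, 5)     # frames seen by the 5Hz backbone
head_frames = frames(clip_s, 25)        # frames handled by the 25Hz head
ratio = head_frames / backbone_frames   # how much shorter the backbone sequence is

weights_gib = weight_memory_gib(8e9)    # 8B parameters at 2 bytes each

print(f"5Hz backbone: {backbone_frames} frames, 25Hz head: {head_frames} frames")
print(f"Backbone sequence is {ratio:.0f}x shorter")
print(f"Weights alone: ~{weights_gib:.1f} GiB")
```

Under these assumptions the backbone processes a sequence five times shorter than a 25Hz model would, and the weights alone take roughly 15 GiB, which is consistent with the ~24GB VRAM guidance once runtime overhead is added.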