Fun Audio Chat

🎁FREE

GitHub

Fun-Audio-Chat is an open-source GitHub repository from FunAudioLLM featuring an advanced 8B-parameter Large Audio Language Model (Fun-Audio-Chat-8B) designed for natural, low-latency, multilingual voice interactions. It supports real-time spoken conversations, question answering, audio understanding, function calling, instruction following, and emotional empathy, with efficient dual-resolution speech processing and a Gradio/web demo for interactive testing.

✨Key Features

▸Efficient Architecture: Dual-Resolution Speech Representations (5Hz backbone + 25Hz refined head) reduce GPU compute by ~50% while maintaining high-quality speech understanding and generation.
▸Core-Cocktail Training: Preserves strong text LLM capabilities alongside advanced audio processing for balanced multimodal performance.
▸Top Benchmark Performance: Ranks at or near the top among models of roughly 8B parameters on major audio/voice benchmarks (OpenAudioBench, VoiceBench, UltraEval-Audio, MMAU, etc.) covering spoken QA, empathy, and function calling.
▸Supported Capabilities: Speech-to-Text (S2T), Speech-to-Speech (S2S), multi-turn conversations, and speech function calling; integrates CosyVoice for TTS synthesis.
▸Multilingual Support: Broad language coverage via underlying components such as SenseVoice and CosyVoice.
▸Easy Setup & Demo: Includes installation scripts, pretrained model downloads (Hugging Face/ModelScope), inference examples, and a full web/Gradio interface with server-client setup.
▸Open-Source & Active: Apache-2.0 licensed, ~723 stars, active contributions (e.g., vLLM integration for a 20-50x speedup), with a technical report and evaluation scripts provided.
▸Hardware Needs: Inference runs on a single GPU with ~24GB of VRAM; full training requires more (e.g., 4×80GB).
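
To make the 5Hz/25Hz figures in the feature list more concrete, here is a minimal arithmetic sketch of how many frames each rate implies for a clip of a given length. The function names are hypothetical and this is not code from the repository; it only illustrates why a 5Hz backbone sees a much shorter sequence than a 25Hz head. The ~50% compute saving is the project's own figure and is not derived here.

```python
import math

# Frame rates taken from the feature list above.
BACKBONE_HZ = 5   # shared backbone
HEAD_HZ = 25      # refined head

def frames(duration_s: float, rate_hz: int) -> int:
    """Frames needed to cover duration_s seconds at rate_hz frames per second."""
    return math.ceil(duration_s * rate_hz)

def sequence_lengths(duration_s: float) -> dict:
    """Hypothetical helper: compare sequence lengths at the two rates."""
    backbone = frames(duration_s, BACKBONE_HZ)
    head = frames(duration_s, HEAD_HZ)
    return {
        "backbone_frames": backbone,
        "head_frames": head,
        "ratio": head / backbone,
    }

if __name__ == "__main__":
    # A 10-second utterance: 50 backbone frames vs. 250 head frames.
    print(sequence_lengths(10.0))
```

For a 10-second clip the backbone processes 50 frames where a 25Hz representation would need 250, a 5x shorter sequence for the most expensive part of the model.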

🔗Similar Tools (GitHub)

Awesome design MD

CoPaw

OpenSandbox

ZeroClaw

Clawdbot

AionUi