Baichuan-7B

Large language model achieving state-of-the-art performance on Chinese and English language benchmarks.

Baichuan-7B is a member of the Baichuan family of LLMs. It achieves state-of-the-art performance for its size on standard, authoritative Chinese and English benchmarks (C-EVAL/MMLU). The model is quantized to 4-bit weights and 16-bit activations, making it suitable for on-device deployment. For the prompt and output lengths specified below, the time to first token is the latency of Baichuan-PromptProcessor-Quantized, and the average time per additional token is the latency of Baichuan-TokenGenerator-KVCache-Quantized.
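The overview ties the model's footprint to 4-bit weights and splits latency into a time to first token (TTFT) and an average per-token time. Both can be estimated on the back of an envelope. Raw weight storage is parameters × bits ÷ 8 bytes. The time to produce N tokens is roughly TTFT + (N − 1) × per-token latency, assuming the per-token latency stays near its reported average. The helper names below are hypothetical, and the latency inputs in the demo are placeholders, not measurements. For 7B parameters at 4 bits, the estimate is 3.5 GB, which is below the 3.9 GB listed under Technical Details. This sketch does not model what accounts for that gap.

```python
def weight_storage_gb(num_params: float, bits_per_weight: int) -> float:
    """Raw storage for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes).

    Ignores any tensors kept at higher precision, metadata, or runtime buffers.
    """
    return num_params * bits_per_weight / 8 / 1e9


def estimated_generation_seconds(ttft_s: float, per_token_s: float,
                                 num_output_tokens: int) -> float:
    """Rough end-to-end time for num_output_tokens generated tokens.

    The first token costs one prompt-processor pass (the TTFT); every later
    token costs one token-generator step, assumed here to take the average time.
    """
    if num_output_tokens < 1:
        raise ValueError("num_output_tokens must be at least 1")
    return ttft_s + (num_output_tokens - 1) * per_token_s


if __name__ == "__main__":
    # 7B parameters at 4 bits per weight versus an unquantized 16-bit baseline.
    print(f"4-bit weights:  {weight_storage_gb(7e9, 4):.1f} GB")
    print(f"16-bit weights: {weight_storage_gb(7e9, 16):.1f} GB")
    # Placeholder latencies (hypothetical, for illustration only).
    print(f"128 tokens: {estimated_generation_seconds(1.5, 0.1, 128):.1f} s")
```

Because the per-token term is multiplied by N − 1, the token-generator latency dominates long outputs, while the prompt-processor latency dominates short ones.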


Technical Details

Number of parameters: 7B

Model size: 3.9 GB

Model-1 (Prompt Processor): Baichuan-PromptProcessor-Quantized

Max context length: 1024

Prompt processor input: 1024 tokens

Prompt processor output: 1024 output tokens + KVCache for token generator

Model-2 (Token Generator): Baichuan-TokenGenerator-KVCache-Quantized

Token generator input: 1 input token + past KVCache

Token generator output: 1 output token + KVCache for next iteration

Decoding length: 1024 (1 output token + 1023 from KVCache)

Use: Initiate the conversation with the prompt processor, then use the token generator for each subsequent iteration.

QNN-SDK: 2.19
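The two-model setup above works as a loop. The prompt processor runs once over the prompt and hands a KV cache to the token generator. The token generator then consumes one token plus the past cache per step. Below is a minimal sketch of that control flow. It assumes hypothetical `prompt_processor` and `token_generator` callables, each of which returns the next token id and the updated cache. The toy stand-ins fake the cache as a list of past token ids and do no real inference, and token selection from the model's outputs is folded into the callables.

The stopping check mirrors the listed decoding length: one token-generator step takes 1 new token plus at most 1023 cached positions. The real prompt processor takes a fixed 1024-token input, so a real pipeline presumably needs padding or similar handling for shorter prompts. The sketch leaves this out.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: each call returns (next_token_id, updated_kv_cache).
KVCache = List[int]
PromptProcessor = Callable[[List[int]], Tuple[int, KVCache]]
TokenGenerator = Callable[[int, KVCache], Tuple[int, KVCache]]

MAX_CONTEXT = 1024  # max context length / decoding length from the card


def generate(prompt: List[int], prompt_processor: PromptProcessor,
             token_generator: TokenGenerator, max_new_tokens: int,
             eos_id: int) -> List[int]:
    if not 0 < len(prompt) <= MAX_CONTEXT:
        raise ValueError("prompt must be non-empty and fit in the context")
    # Stage 1: a single pass over the whole prompt yields the first token
    # (this pass is what the time to first token measures) and the cache.
    token, cache = prompt_processor(prompt)
    output = [token]
    # Stage 2: one token plus the past cache per step. A step is allowed only
    # while the cached positions plus the new token fit in the context.
    while (token != eos_id and len(output) < max_new_tokens
           and len(cache) + 1 <= MAX_CONTEXT):
        token, cache = token_generator(token, cache)
        output.append(token)
    return output


# Toy stand-ins (hypothetical): the "cache" is the list of tokens seen so far,
# and the "model" always predicts the previous token id plus one, modulo 100.
def toy_prompt_processor(prompt: List[int]) -> Tuple[int, KVCache]:
    return (prompt[-1] + 1) % 100, list(prompt)


def toy_token_generator(token: int, cache: KVCache) -> Tuple[int, KVCache]:
    return (token + 1) % 100, cache + [token]


if __name__ == "__main__":
    print(generate([1, 2, 3], toy_prompt_processor, toy_token_generator,
                   max_new_tokens=10, eos_id=7))
```

Keeping the cache outside the model calls makes the handoff explicit. The expensive full-prompt pass happens once, and every later step pays only the single-token cost, which matches how the card reports the two latencies separately.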

Applicable Scenarios

Dialogue
Content Generation
Customer Support

Supported Mobile Form Factors

Phone
Tablet

Licenses

Source Model: APACHE-2.0

Deployable Model: APACHE-2.0

Supported Mobile Devices

Samsung Galaxy S24 Ultra

Supported Mobile Chipsets