Llama-v2-7B-Chat
State-of-the-art large language model useful for a variety of language understanding and generation tasks.
Llama 2 is a family of LLMs. The "Chat" suffix indicates that the model is optimized for chatbot-like dialogue. The model is quantized to 4-bit weights and 16-bit activations, making it suitable for on-device deployment. For the prompt and output lengths specified below, the time to first token is the latency of Llama-PromptProcessor-Quantized, and the average time per additional token is the latency of Llama-TokenGenerator-KVCache-Quantized.
Snapdragon® 8 Gen 2 Mobile
Samsung Galaxy S23 Ultra
TorchScript, Qualcomm® AI Engine Direct
8.48 tokens/s
34,841 NPU layers
Snapdragon® 8 Gen 2 Mobile
Samsung Galaxy S23 Ultra
TorchScript, Qualcomm® AI Engine Direct
397 tokens/s
31,769 NPU layers
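The two throughput figures above can be turned into rough response-time estimates. The sketch below assumes that the 397 tokens/s figure belongs to the prompt processor and the 8.48 tokens/s figure to the token generator; the listing does not label them, so this mapping is an inference from the description, and the function names are illustrative only.

```python
# Assumption: 397 tokens/s is the prompt processor and 8.48 tokens/s is the
# token generator. The listing above does not label the two results.
PROMPT_TOKENS_PER_S = 397.0
GEN_TOKENS_PER_S = 8.48
PROMPT_LENGTH = 1024  # fixed prompt processor input size (see Technical Details)


def time_to_first_token(prompt_length=PROMPT_LENGTH, rate=PROMPT_TOKENS_PER_S):
    """Estimated seconds for one prompt processor pass over the prompt window."""
    return prompt_length / rate


def total_response_time(new_tokens, rate=GEN_TOKENS_PER_S):
    """Estimated seconds until `new_tokens` output tokens exist.

    The first token comes out of the prompt processor pass; every further
    token costs one token generator step.
    """
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    return time_to_first_token() + (new_tokens - 1) / rate


if __name__ == "__main__":
    print(f"time to first token: {time_to_first_token():.2f} s")
    print(f"time per additional token: {1 / GEN_TOKENS_PER_S:.3f} s")
    print(f"100-token reply: {total_response_time(100):.1f} s")
```

Under these assumptions the first token arrives after about 2.6 s and each additional token takes about 0.12 s. Real latency also depends on tokenization, model loading, and thermal state, none of which is modeled here.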
Technical Details
Number of parameters: 7B
Model size: 3.6GB
Model-1 (Prompt Processor): Llama-PromptProcessor-Quantized
Max context length: 1024
Prompt processor input: 1024 tokens
Prompt processor output: 1024 output tokens + KVCache for token generator
Model-2 (Token Generator): Llama-TokenGenerator-KVCache-Quantized
Token generator input: 1 input token + past KVCache
Token generator output: 1 output token + KVCache for next iteration
Decoding length: 1024 (1 output token + 1023 from KVCache)
Use: Start a conversation with the prompt processor, then run the token generator for each subsequent iteration.
QNN-SDK: 2.19
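The two-model flow described above (one prompt processor pass, then repeated token generator steps that reuse the KV cache) can be sketched as a plain Python loop. This is a minimal illustration, not the Qualcomm runtime API: `generate`, the two callables, and the toy stand-in models are all hypothetical, and padding the prompt to the fixed 1024-token input as well as trimming the cache to the decoding length are left out.

```python
from typing import Callable, List, Tuple

# Stand-in type: a real KV cache holds per-layer key/value tensors.
KVCache = List[int]
PromptProcessor = Callable[[List[int]], Tuple[int, KVCache]]
TokenGenerator = Callable[[int, KVCache], Tuple[int, KVCache]]

MAX_CONTEXT = 1024  # max context length from the technical details


def generate(
    prompt_ids: List[int],
    prompt_processor: PromptProcessor,
    token_generator: TokenGenerator,
    max_new_tokens: int,
    eos_id: int,
) -> List[int]:
    """Run the prompt processor once, then the token generator per new token."""
    if not 0 < len(prompt_ids) <= MAX_CONTEXT:
        raise ValueError("prompt must contain 1 to 1024 tokens")
    if max_new_tokens < 1:
        raise ValueError("max_new_tokens must be at least 1")

    # Step 1: the prompt processor yields the first token and the KV cache.
    token, cache = prompt_processor(prompt_ids)
    output = [token]

    # Step 2: each iteration feeds 1 token plus the past cache and gets back
    # 1 token plus the updated cache for the next iteration.
    while len(output) < max_new_tokens and token != eos_id:
        token, cache = token_generator(token, cache)
        output.append(token)
    return output


# Toy stand-ins (hypothetical) so the loop can be exercised without a device.
def toy_prompt_processor(ids: List[int]) -> Tuple[int, KVCache]:
    return sum(ids) % 50, list(ids)


def toy_token_generator(token: int, cache: KVCache) -> Tuple[int, KVCache]:
    return (token + 1) % 50, cache + [token]


if __name__ == "__main__":
    print(generate([1, 2, 3], toy_prompt_processor, toy_token_generator, 4, eos_id=-1))
```

Keeping the two models behind plain callables mirrors the split in the listing: the expensive full-context pass happens once, and every later step only processes a single token against the cached state.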
Applicable Scenarios
- Dialogue
- Content Generation
- Customer Support
Supported Form Factors
- Phone
- Tablet
Licenses
Source Model: LLAMA2
Deployable Model: LLAMA2
Terms of Use: Qualcomm® Generative AI usage and limitations
Tags
- llm: Large language models. Useful for a variety of tasks including language generation, optical character recognition, information retrieval, and more.
- generative-ai: Models capable of generating text, images, or other data using generative models, often in response to prompts.
- quantized: A “quantized” model can run in low or mixed precision, which can substantially reduce inference latency.
Supported Devices
- Samsung Galaxy S23 Ultra
Supported Chipsets
- Snapdragon® 8 Gen 2 Mobile