
    Llama-v2-7B-Chat

    State-of-the-art large language model useful on a variety of language understanding and generation tasks.

    Llama 2 is a family of LLMs. The "Chat" suffix indicates that the model is optimized for chatbot-style dialogue. The model is quantized to 4-bit weights and 16-bit activations, making it suitable for on-device deployment. For the prompt and output lengths specified below, the time to first token is the latency of Llama-PromptProcessor-Quantized, and the average time per additional token is the latency of Llama-TokenGenerator-KVCache-Quantized.
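    As a rough check of why 4-bit weights matter on a phone, the weight storage of a 7B-parameter model can be estimated from the bit width alone. This sketch assumes every parameter uses the same bit width and ignores quantization scales, metadata, and any tensors kept at higher precision, so it is a lower bound rather than the exact file size.

```python
# Back-of-the-envelope weight storage for a quantized model.
# Assumption: all parameters share one bit width; scales/metadata are ignored.

def weight_bytes(num_params: float, bits_per_weight: int) -> float:
    """Return the approximate storage needed for the weights, in bytes."""
    return num_params * bits_per_weight / 8

params = 7e9  # "7B" from the Technical Details
for bits in (16, 8, 4):
    gb = weight_bytes(params, bits) / 1e9  # decimal gigabytes
    print(f"{bits:>2}-bit weights: ~{gb:.1f} GB")
```

    At 4 bits this gives about 3.5 GB (decimal), close to the 3.6 GB model size listed under Technical Details; the small gap is presumably scales, metadata, or higher-precision tensors, though this page does not break it down.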

    Snapdragon® 8 Gen 2 Mobile (Samsung Galaxy S23 Ultra)
    Source framework: TorchScript; target runtime: Qualcomm® AI Engine Direct
    Throughput: 8.48 tokens/s
    Layers on NPU: 34,841
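    Using the definitions in the description (time to first token = one prompt-processor pass, each additional token = one token-generator pass), the end-to-end time for a reply can be estimated. This sketch assumes the 8.48 tokens/s figure above is the token generator's steady-state rate; the time-to-first-token value is a hypothetical placeholder, since this page does not list it.

```python
def response_time_s(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Estimate wall-clock time to produce `output_tokens` tokens.

    The first token costs one prompt-processor pass (time to first token);
    every additional token costs one token-generator pass.
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    time_per_token_s = 1.0 / tokens_per_s
    return ttft_s + (output_tokens - 1) * time_per_token_s

TOKENS_PER_S = 8.48        # reported throughput (assumed: token generator rate)
HYPOTHETICAL_TTFT_S = 1.0  # placeholder, not a measured value

print(f"per-token latency: {1000 / TOKENS_PER_S:.1f} ms")
print(f"100-token reply: {response_time_s(HYPOTHETICAL_TTFT_S, TOKENS_PER_S, 100):.1f} s")
```

    At 8.48 tokens/s each additional token takes roughly 118 ms, so for long replies the token generator, not the prompt processor, dominates total latency.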

    Technical Details

    Number of parameters: 7B
    Model size: 3.6 GB
    Model-1 (Prompt Processor): Llama-PromptProcessor-Quantized
    Max context length: 1024
    Prompt processor input: 256 tokens
    Prompt processor output: 256 output tokens + KVCache for token generator
    Model-2 (Token Generator): Llama-TokenGenerator-KVCache-Quantized
    Token generator input: 1 input token + past KVCache
    Token generator output: 1 output token + KVCache for next iteration
    Decoding length: 1024 (1 output token + 1023 from KVCache)
    Use: Initiate the conversation with the prompt processor, then use the token generator for subsequent iterations.
    QNN-SDK: 2.19
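    The two-model flow described above (one prompt-processor pass over up to 256 tokens, then one token-generator call per new token while reusing the KV cache) can be sketched as a generation loop. The functions `prompt_processor` and `token_generator` are hypothetical stubs, not the deployed models or an AI Hub API, and the KV cache is modeled as a plain list instead of per-layer key/value tensors. The loop also assumes a conservative budget that keeps prompt plus output within the 1024-token max context.

```python
from typing import List, Tuple

MAX_CONTEXT = 1024   # "Max context length" above
PROMPT_CHUNK = 256   # "Prompt processor input" above
EOS_TOKEN = 2        # hypothetical end-of-sequence id

KVCache = List[int]  # stand-in: real caches hold per-layer key/value tensors

def prompt_processor(prompt: List[int]) -> Tuple[int, KVCache]:
    """Hypothetical stub for Model-1: one pass over the whole prompt.

    Returns the first generated token and a KV cache covering the prompt.
    """
    cache = list(prompt)
    next_token = (sum(prompt) % 1000) + 3  # dummy prediction, never EOS
    return next_token, cache

def token_generator(token: int, cache: KVCache) -> Tuple[int, KVCache]:
    """Hypothetical stub for Model-2: one input token + past KV cache."""
    new_cache = cache + [token]                # add this token's cache entry
    next_token = (token * 31 + 7) % 1000 + 3   # dummy prediction, never EOS
    return next_token, new_cache

def generate(prompt: List[int], max_new_tokens: int) -> List[int]:
    if not 0 < len(prompt) <= PROMPT_CHUNK:
        raise ValueError(f"prompt must be 1..{PROMPT_CHUNK} tokens")
    if max_new_tokens < 1:
        raise ValueError("max_new_tokens must be at least 1")
    # Step 1: a single prompt-processor pass yields the first token.
    token, cache = prompt_processor(prompt)
    output = [token]
    # Step 2: the token generator produces one token per call, reusing the cache.
    while (len(output) < max_new_tokens
           and token != EOS_TOKEN
           and len(prompt) + len(output) < MAX_CONTEXT):
        token, cache = token_generator(token, cache)
        output.append(token)
    return output
```

    With a full 256-token prompt, `generate(list(range(256)), max_new_tokens=2000)` returns 768 tokens, because the context budget runs out before `max_new_tokens` does. Splitting the work this way lets the whole prompt be processed in one pass, while each later step only processes a single new token.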

    Applicable Scenarios

    • Dialogue
    • Content Generation
    • Customer Support

    Supported Mobile Form Factors

    • Phone
    • Tablet

    Licenses

    Source Model: LLAMA2
    Deployable Model: LLAMA2

    Tags

    • llm
      Large language models. Useful for a variety of tasks including language generation, optical character recognition, information retrieval, and more.
    • generative-ai
      Models capable of generating text, images, or other data using generative models, often in response to prompts.
    • quantized
      A “quantized” model can run in low or mixed precision, which can substantially reduce inference latency.
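    To make the "quantized" tag concrete, the sketch below shows a generic symmetric per-tensor scheme that maps float weights to signed 4-bit integers and back. It is an illustrative assumption only; this page does not describe the quantization scheme actually used for Llama-v2-7B-Chat, which may use per-channel or per-block scales.

```python
from typing import List, Tuple

def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to signed 4-bit integers in [-8, 7].

    Assumes `weights` is non-empty.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0  # map largest magnitude to 7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from 4-bit codes."""
    return [v * scale for v in q]

weights = [0.82, -0.41, 0.13, 0.0, -0.6]
q, scale = quantize_int4(weights)
print(q, scale)
print(dequantize(q, scale))
```

    Each dequantized weight differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy cost traded for a 4x smaller footprint than 16-bit storage.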

    Supported Mobile Devices

    • Samsung Galaxy S23 Ultra

    Supported Mobile Chipsets

    • Snapdragon® 8 Gen 2 Mobile