Mobile
Enabling intelligent connections and personalized applications across devices
Sample Ready Apps
Super Resolution
Sample application to deploy an optimized super resolution solution on device
Image Classification
Sample application to deploy an optimized image classification solution on device
Semantic Segmentation
Sample application to deploy an optimized semantic segmentation solution on device
Deploy optimized models on real devices in minutes
Qualcomm® AI Hub simplifies deploying AI models for vision, audio, and speech applications to edge devices within minutes. This example shows how you can deploy your own PyTorch model on a real hosted device. See the documentation for more details. If you hit any issues with your model (performance, accuracy, or otherwise), please file an issue.
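Before running the example, install and authenticate the client with the API token from your AI Hub account settings, as described in the documentation:

pip install qai-hub
qai-hub configure --api_token <your API token>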
import qai_hub as hub
import torch
from torchvision.models import mobilenet_v2
import requests
import numpy as np
from PIL import Image
# Using pre-trained MobileNet
torch_model = mobilenet_v2(pretrained=True)
torch_model.eval()
# Step 1: Trace model
input_shape = (1, 3, 224, 224)
example_input = torch.rand(input_shape)
traced_torch_model = torch.jit.trace(torch_model, example_input)
# Step 2: Compile model
compile_job = hub.submit_compile_job(
    model=traced_torch_model,
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    input_specs=dict(image=input_shape),
)
# Step 3: Profile on cloud-hosted device
target_model = compile_job.get_target_model()
profile_job = hub.submit_profile_job(
    model=target_model,
    device=hub.Device("Samsung Galaxy S24 (Family)"),
)
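# The completed profile job reports latency and memory metrics in the AI Hub
# web dashboard. (Assumption, not part of the original example: the ProfileJob
# object may also expose a download_profile() helper for fetching these
# metrics programmatically.)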
# Step 4: Run inference on cloud-hosted device
sample_image_url = (
    "https://qaihub-public-assets.s3.us-west-2.amazonaws.com/apidoc/input_image1.jpg"
)
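# Stream the sample image; decode_content=True lets urllib3 undo any
# Content-Encoding (e.g., gzip) so PIL can read the raw bytes directly.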
response = requests.get(sample_image_url, stream=True)
response.raw.decode_content = True
image = Image.open(response.raw).resize((224, 224))
input_array = np.expand_dims(
    np.transpose(np.array(image, dtype=np.float32) / 255.0, (2, 0, 1)), axis=0
)
# Run inference using the on-device model on the input image
inference_job = hub.submit_inference_job(
    model=target_model,
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    inputs=dict(image=[input_array]),
)
on_device_output = inference_job.download_output_data()
# Step 5: Post-processing the on-device output
output_name = list(on_device_output.keys())[0]
out = on_device_output[output_name][0]
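# Convert the raw logits to class probabilities with a softmax over the class axis.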
on_device_probabilities = np.exp(out) / np.sum(np.exp(out), axis=1)
# Read the class labels for imagenet
sample_classes = "https://qaihub-public-assets.s3.us-west-2.amazonaws.com/apidoc/imagenet_classes.txt"
response = requests.get(sample_classes, stream=True)
response.raw.decode_content = True
categories = [s.decode("utf-8").strip() for s in response.raw]
# Print top five predictions for the on-device model
print("Top-5 On-Device predictions:")
top5_classes = np.argsort(on_device_probabilities[0], axis=0)[-5:]
for c in reversed(top5_classes):
    print(f"{c} {categories[c]:20s} {on_device_probabilities[0][c]:>6.1%}")
# Step 6: Download model
target_model = compile_job.get_target_model()
target_model.download("mobilenet_v2.tflite")
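The downloaded mobilenet_v2.tflite is a standard TensorFlow Lite artifact, so it can be sanity-checked locally before being bundled into an app. The sketch below is a minimal check, not part of the AI Hub example, and assumes the tensorflow package is installed; it only loads the model and prints its tensor metadata, since details such as the input layout after conversion should be read from get_input_details() rather than assumed.

import tensorflow as tf

# Load the compiled model with the TFLite interpreter and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="mobilenet_v2.tflite")
interpreter.allocate_tensors()

# Print the input/output tensor metadata (name, shape, dtype) to confirm the
# expected layout and any quantization parameters before wiring the model
# into an Android or iOS application.
for detail in interpreter.get_input_details():
    print("input:", detail["name"], detail["shape"], detail["dtype"])
for detail in interpreter.get_output_details():
    print("output:", detail["name"], detail["shape"], detail["dtype"])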
Tags
- A “backbone” model is designed to extract task-agnostic representations from specific data modalities (e.g., images, text, speech). This representation can then be fine-tuned for specialized tasks; a minimal feature-extraction sketch follows this list.
- A “foundation” model is versatile and designed for multi-task capabilities, without the need for fine-tuning.
- A “generative” model can produce text, images, or other data, often in response to prompts.
- Large language models. Useful for a variety of tasks including language generation, optical character recognition, information retrieval, and more.
- A “quantized” model can run in low or mixed precision, which can substantially reduce inference latency; a minimal quantization sketch also follows this list.
- A “real-time” model can typically achieve 5-60 predictions per second, i.e., a latency of roughly 17-200 ms per prediction.
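To illustrate the “backbone” idea with the same network used in the example above, the sketch below (an illustration assumed here, not taken from AI Hub) keeps only the convolutional feature extractor of torchvision's MobileNetV2 and discards the ImageNet classification head, producing a task-agnostic embedding that a new, task-specific head could be fine-tuned on.

import torch
from torchvision.models import mobilenet_v2

# Keep only the convolutional backbone of the pre-trained network.
model = mobilenet_v2(pretrained=True)
backbone = model.features
backbone.eval()

# Global-average-pool the final feature map into a 1280-dimensional embedding.
with torch.no_grad():
    feature_map = backbone(torch.rand(1, 3, 224, 224))  # shape (1, 1280, 7, 7)
    embedding = feature_map.mean(dim=(2, 3))             # shape (1, 1280)
print(embedding.shape)

# A small task-specific head (e.g., a single Linear layer) could now be trained
# on these embeddings while the backbone stays frozen, or the whole network
# could be fine-tuned end to end.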
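To make the “quantized” tag concrete, the sketch below (again an assumed illustration, not an AI Hub workflow) applies PyTorch dynamic post-training quantization to the same MobileNetV2. Only the layer types listed in the call are converted to int8, so the network runs in mixed precision; for MobileNetV2 that is just the final Linear classifier, which keeps the example small rather than representative of the full speed-up a quantized model can deliver.

import torch
from torchvision.models import mobilenet_v2

model = mobilenet_v2(pretrained=True)
model.eval()

# Convert the weights of the listed layer types to int8 while the rest of the
# network stays in float32 (mixed precision).
quantized_model = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    logits = quantized_model(torch.rand(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])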