IoT
Deploy real-time AI to a wide range of devices to power next-generation user experiences
Qualcomm® AI Hub simplifies deploying AI models for vision, audio, and speech applications to edge devices within minutes. This example shows how to deploy your own PyTorch model on a real cloud-hosted device. See the documentation for more details. If you hit any issues with your model (performance, accuracy, or otherwise), please file an issue.
import qai_hub as hub
import torch
from torchvision.models import mobilenet_v2
import requests
import numpy as np
from PIL import Image
# Use a pre-trained MobileNetV2 (the weights argument replaces the
# deprecated pretrained=True in recent torchvision releases)
torch_model = mobilenet_v2(weights="IMAGENET1K_V1")
torch_model.eval()
# Step 1: Trace model
input_shape = (1, 3, 224, 224)
example_input = torch.rand(input_shape)
traced_torch_model = torch.jit.trace(torch_model, example_input)
# Step 2: Compile model
compile_job = hub.submit_compile_job(
    model=traced_torch_model,
    device=hub.Device("RB3 Gen 2 (Proxy)"),
    input_specs=dict(image=input_shape),
)
# Step 3: Profile on cloud-hosted device
target_model = compile_job.get_target_model()
profile_job = hub.submit_profile_job(
    model=target_model,
    device=hub.Device("RB3 Gen 2 (Proxy)"),
)
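# (Optional, not in the original walkthrough.) Once the profile job finishes,
# its measurements (latency, memory, runtime breakdown) can be retrieved;
# per the qai_hub docs, download_profile() returns them as a dict.
profile_results = profile_job.download_profile()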
# Step 4: Run inference on cloud-hosted device
sample_image_url = (
    "https://qaihub-public-assets.s3.us-west-2.amazonaws.com/apidoc/input_image1.jpg"
)
response = requests.get(sample_image_url, stream=True)
response.raw.decode_content = True
image = Image.open(response.raw).resize((224, 224))
# Normalize to [0, 1] and convert HWC -> NCHW with a batch dimension
input_array = np.expand_dims(
    np.transpose(np.array(image, dtype=np.float32) / 255.0, (2, 0, 1)), axis=0
)
# Run inference using the on-device model on the input image
inference_job = hub.submit_inference_job(
    model=target_model,
    device=hub.Device("RB3 Gen 2 (Proxy)"),
    inputs=dict(image=[input_array]),
)
on_device_output = inference_job.download_output_data()
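# download_output_data() returns a dict mapping each output name to a list of
# arrays, one entry per submitted input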
# Step 5: Post-process the on-device output (softmax over the class logits)
output_name = list(on_device_output.keys())[0]
out = on_device_output[output_name][0]
on_device_probabilities = np.exp(out) / np.sum(np.exp(out), axis=1)
# Read the class labels for ImageNet
sample_classes = "https://qaihub-public-assets.s3.us-west-2.amazonaws.com/apidoc/imagenet_classes.txt"
response = requests.get(sample_classes, stream=True)
response.raw.decode_content = True
categories = [s.decode("utf-8").strip() for s in response.raw]
# Print top five predictions for the on-device model
print("Top-5 On-Device predictions:")
top5_classes = np.argsort(on_device_probabilities[0], axis=0)[-5:]
for c in reversed(top5_classes):
    print(f"{c} {categories[c]:20s} {on_device_probabilities[0][c]:>6.1%}")
# Step 6: Download the compiled model (target_model was fetched in Step 3)
target_model.download("mobilenet_v2.tflite")
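As an optional local sanity check, not part of the original example, the downloaded .tflite file can be run with the TensorFlow Lite interpreter and its top-1 prediction compared against the on-device result. This is a minimal sketch that assumes TensorFlow is installed and reuses input_array and categories from above; the NHWC transpose is an assumption about the exported input layout.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mobilenet_v2.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Assumption: the compiled TFLite model expects NHWC input, so transpose the
# NCHW array if the shapes disagree
local_input = input_array
if tuple(input_details["shape"]) != local_input.shape:
    local_input = np.transpose(local_input, (0, 2, 3, 1))

interpreter.set_tensor(input_details["index"], local_input.astype(np.float32))
interpreter.invoke()
local_logits = interpreter.get_tensor(output_details["index"])
print("Local top-1 prediction:", categories[int(np.argmax(local_logits[0]))])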
Tags
- A “backbone” model is designed to extract task-agnostic representations from specific data modalities (e.g., images, text, speech). This representation can then be fine-tuned for specialized tasks.
- A “foundation” model is designed to handle many tasks out of the box, without the need for fine-tuning.
- A “generative” model produces text, images, or other data, often in response to prompts.
- A “quantized” model can run in low or mixed precision, which can substantially reduce inference latency (see the sketch after this list).
- A “real-time” model can typically achieve 5-60 predictions per second, i.e., a latency of roughly 17-200 ms per prediction.
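To make the “quantized” tag concrete, here is a minimal sketch of low-precision inference using PyTorch dynamic quantization. This is a generic illustration, not the Qualcomm AI Hub quantization workflow; MobileNetV2 and the Linear-layer choice are illustrative assumptions.
import torch
from torchvision.models import mobilenet_v2

model = mobilenet_v2(weights="IMAGENET1K_V1").eval()

# Dynamic quantization stores the weights of the selected layer types as int8
# and dequantizes them on the fly; activations remain in float
quantized_model = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    ref_out = model(x)
    quant_out = quantized_model(x)

# A small numerical delta is the usual trade-off for smaller weights and,
# on supported backends, lower latency
print("Max abs difference:", (ref_out - quant_out).abs().max().item())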