Instructions to deploy YOLOv7 as a TensorRT engine to [Triton Inference Server](https://github.com/NVIDIA/triton-inference-server).
Triton Inference Server takes care of model deployment and comes with many benefits out of the box, such as GRPC and HTTP interfaces, automatic scheduling across multiple GPUs, shared memory (even on GPU), dynamic server-side batching, health metrics, and memory resource management.
There are no additional dependencies needed to run this deployment, other than a working Docker daemon with GPU support.
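To verify that the Docker daemon can reach the GPU, a quick sanity check (the CUDA image tag here is only an example):

```bash
# Should print the GPU table from inside a container; any CUDA base image works
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```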
## Export TensorRT
See https://github.com/WongKinYiu/yolov7#export for more info.
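A typical export flow looks roughly like the following sketch. Take the authoritative `export.py` flags from the link above; the container tag, the engine filename, and the ONNX input tensor name `images` are assumptions here:

```bash
# PyTorch -> ONNX with NMS plugin and dynamic batch size (sketch; see export docs)
python export.py --weights yolov7.pt --grid --end2end --dynamic-batch \
    --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640

# ONNX -> TensorRT engine with trtexec inside the TensorRT container.
# If trtexec is not on PATH in the container, it is usually under /usr/src/tensorrt/bin.
docker run -it --rm --gpus=all -v$(pwd):/workspace nvcr.io/nvidia/tensorrt:22.06-py3 \
    trtexec --onnx=yolov7.onnx --fp16 \
            --minShapes=images:1x3x640x640 \
            --optShapes=images:8x3x640x640 \
            --maxShapes=images:8x3x640x640 \
            --saveEngine=yolov7.engine
```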
## Model Repository

See [Triton Model Repository Documentation](https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md#model-repository) for more info.
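Triton expects one directory per model with numbered version subdirectories, and a TensorRT engine must be stored as `model.plan`. Using the `triton-deploy/models/yolov7` path referenced below (the engine filename matches the export sketch above):

```bash
# Create the repository layout and place the engine as version 1
mkdir -p triton-deploy/models/yolov7/1
mv yolov7.engine triton-deploy/models/yolov7/1/model.plan
```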
## Model Configuration

See [Triton Model Configuration Documentation](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#model-configuration) for more info.
Minimal configuration for `triton-deploy/models/yolov7/config.pbtxt`:
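```
name: "yolov7"
platform: "tensorrt_plan"
# Must not exceed the maximum batch size the engine was built with;
# 8 matches the export sketch above.
max_batch_size: 8
# Enable server-side dynamic batching with default settings
dynamic_batching { }
```

## Start Triton Inference Server

With the engine and configuration in place, the server can be started from the Triton container. The image tag below is an example; it should ship the same TensorRT version that was used to build the engine:

```bash
docker run --rm --gpus all --ipc=host \
    -p8000:8000 -p8001:8001 -p8002:8002 \
    -v$(pwd)/triton-deploy/models:/models \
    nvcr.io/nvidia/tritonserver:22.06-py3 \
    tritonserver --model-repository=/models
```

Ports 8000, 8001 and 8002 expose the HTTP, GRPC and metrics endpoints respectively.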
## Performance

See [Triton Model Analyzer Documentation](https://github.com/triton-inference-server/server/blob/main/docs/model_analyzer.md#model-analyzer) for more info.
Throughput for 16 clients with batch size 1 matches that of a single local thread running the engine at batch size 16, thanks to Triton's [Dynamic Batching Strategy](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#dynamic-batcher). The result without dynamic batching (disabled in the model configuration) is considerably worse.
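Numbers like the above can be measured with `perf_analyzer` from the Triton SDK container (the image tag and concurrency value below are examples):

```bash
# Measure throughput and latency over GRPC with 16 concurrent client streams
docker run --rm --net=host nvcr.io/nvidia/tritonserver:22.06-py3-sdk \
    perf_analyzer -m yolov7 -u localhost:8001 -i grpc --concurrency-range 16
```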
## Example Client

The example client accepts the following modes and options:

```
positional arguments:
  {dummy,image,video}   Run mode. 'dummy' will send an empty buffer to the
                        server to test if inference works. 'image' will
                        process an image. 'video' will process a video.
  input                 Input file to load from in image or video mode

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Inference model name, default yolov7
  --width WIDTH         Inference model input width, default 640
  --height HEIGHT       Inference model input height, default 640
  -u URL, --url URL     Inference server URL, default localhost:8001
  -o OUT, --out OUT     Write output into file instead of displaying it
  -f FPS, --fps FPS     Video output fps, default 24.0 FPS
  -i, --model-info      Print model status, configuration and statistics
```
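Assuming the client script is named `client.py` (the script and image filenames here are placeholders), a smoke test and an image run would look like:

```bash
# Smoke test: send an empty buffer to verify the deployment end to end
python3 client.py dummy

# Run inference on an image and write the annotated result to a file
python3 client.py -o result.jpg image input.jpg
```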