  1. INT8 and INT4 performance on ORIN AGX - NVIDIA Developer Forums

    Jan 29, 2025 · My ORIN AGX developer kit has the following specs: JetPack 6.0, L4T 36.3.0, CUDA 12.2, PyTorch 2.3.0. While running some LLM inference code locally using the Transformers library and …

  2. YOLOv5 Model INT8 Quantization based on OpenVINO™ 2022.1 POT API

    Sep 20, 2022 · Fig.2 shows the general workflow of INT8 quantization based on the POT API. YOLOv5 uses custom pre- and post-processing …

  3. Requesting INT8 data type but platform has no support, ignored

    Nov 25, 2022 · Requesting INT8 data type but platform has no support, ignored (Accelerated Computing › Intelligent Video Analytics › DeepStream SDK)

  4. Quantizing ONNX Models using Intel® Neural Compressor

    Feb 1, 2022 · By leveraging Intel® Neural Compressor, we achieved less than 1% accuracy loss and gained significant speedup in INT8 model performance compared to the FP32 model. We continue …

  5. How to confirm whether my CPU supports VNNI or not?

    Apr 28, 2020 · Hi experts, I have one Cascade Lake server and run AI inference (INT8 precision) tasks with intel-tensorflow. According to Introduction to Intel® Deep Learning Boost on Second Generation …
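
    On Linux, AVX512-VNNI support shows up as the `avx512_vnni` flag in `/proc/cpuinfo` (or in `lscpu` output). A minimal sketch of the check — the helper name `has_vnni` is mine, and the sample flags line is a trimmed illustration, not real output:

    ```python
    def has_vnni(cpuinfo_text: str) -> bool:
        """Return True if any 'flags' line advertises AVX512-VNNI."""
        for line in cpuinfo_text.splitlines():
            if line.lower().startswith("flags"):
                if "avx512_vnni" in line.split():
                    return True
        return False

    # Trimmed, Cascade Lake-style flags line for illustration:
    sample = "flags\t\t: fpu sse2 avx2 avx512f avx512_vnni"
    print(has_vnni(sample))  # True: this CPU advertises VNNI
    ```

    On a live system you would feed it `open("/proc/cpuinfo").read()`, or simply run `grep avx512_vnni /proc/cpuinfo` in a shell.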

  6. Does TensorRT 8.6.1 support INT8 quantization for HardSwish?

    Oct 20, 2023 · TensorRT supports the following ONNX data types: DOUBLE, FLOAT32, FLOAT16, INT8, and BOOL. Note: there is limited support for INT32, INT64, and DOUBLE types. TensorRT …

  7. Effective Weight-Only Quantization for Large Language Models with …

    Oct 2, 2023 · As large language models (LLMs) become more prevalent, there is a growing need for quantization methods to maintain accuracy while reducing computational costs. Compared to …
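
    The core idea behind weight-only quantization is to store weights in INT8 (or INT4) with a per-channel scale while keeping activations in floating point. A minimal sketch of symmetric per-row quantization — function names are mine, and real schemes (group-wise INT4, GPTQ-style error compensation) are considerably more involved:

    ```python
    def quantize_row(row, num_bits=8):
        """Symmetric per-row quantization: returns (int weights, float scale)."""
        qmax = 2 ** (num_bits - 1) - 1              # 127 for INT8
        scale = max(abs(w) for w in row) / qmax or 1.0  # avoid div-by-zero on all-zero rows
        q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in row]
        return q, scale

    def dequantize_row(q, scale):
        """Recover approximate float weights from int weights and their scale."""
        return [w * scale for w in q]

    row = [0.5, -1.27, 0.03]
    q, s = quantize_row(row)
    approx = dequantize_row(q, s)   # close to the original row, within one scale step
    ```

    The per-row (per-output-channel) scale is what keeps accuracy loss small: a single tensor-wide scale would let one large outlier weight crush the resolution of every other row.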

  8. INT8 vs FP16 results - Jetson AGX Xavier - NVIDIA Developer Forums

    Oct 28, 2020 · Hi, I’ve been using the method described in the article below to run our network in INT8 instead of FP16. The speedup is really cool, and the visual results (i.e. after I process the …

  9. Converting ONNX model to INT8 - NVIDIA Developer Forums

    Apr 8, 2021 · Description I am trying to convert an FP32 ONNX model to INT8. One technique for conversion is to have a file with the dynamic range of each tensor (used for building the engine). I am …
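
    The per-tensor dynamic range mentioned above maps directly to a symmetric INT8 scale: scale = range / 127. A sketch of the simplest way such a range is obtained (max-abs calibration over sample data) and then applied — function names are mine, and TensorRT's actual calibrators (e.g. `IInt8EntropyCalibrator2`) use histogram-based methods rather than this naive maximum:

    ```python
    def calibrate_dynamic_range(batches):
        """Max-abs calibration: per-tensor dynamic range = max |x| over sample data."""
        return max(abs(x) for batch in batches for x in batch)

    def int8_quantize(x, dyn_range):
        """Map a float to INT8 using the symmetric scale dyn_range / 127."""
        q = round(x * 127.0 / dyn_range)
        return max(-128, min(127, q))   # clamp values outside the calibrated range

    # Calibrate on two small "batches" of activations, then quantize a value:
    r = calibrate_dynamic_range([[0.1, -2.0, 0.7], [1.5, -0.4]])
    print(r, int8_quantize(1.0, r))
    ```

    Writing these calibrated ranges out per tensor is exactly what the dynamic-range file supplies to the engine builder, so no calibration dataset is needed at build time.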

  10. TensorRT 5 Int8 Calibration Example - NVIDIA Developer Forums

    Mar 21, 2019 · If possible, can the TensorRT team please share an Int8 calibration sample using the Python API? I have been following this link: but I have run into several problems. I checked the …