AI model quantization is a technique that intentionally reduces the precision of the numbers used by a neural network. Below are some pros and cons of quantizing AI models.
| Pros | Cons |
| --- | --- |
| Smaller model size (less memory required) | Possible accuracy degradation |
| Faster inference | May require calibration data or retraining |
In a neural network, there are nodes (neurons), which typically have multiple inputs and one output. The output is called the activation. Each input has a weight that signifies its importance. Each neuron also has a bias, which is added to the weighted sum of the inputs before the activation function is applied. A neuron's output can serve as an input to other neurons.
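To make this concrete, here is a minimal Python sketch of a single neuron, assuming a ReLU activation function; the input, weight, and bias values are made up for illustration:

```python
import numpy as np

def relu(z):
    # ReLU activation function: max(0, z)
    return np.maximum(0.0, z)

# Hypothetical values for a neuron with three inputs.
x = np.array([0.5, -1.2, 3.0])   # inputs (activations from previous neurons)
w = np.array([0.8,  0.1, -0.4])  # one weight per input, signifying its importance
b = 0.25                         # bias, added before the activation function

output = relu(np.dot(w, x) + b)  # the neuron's output (its activation)
print(output)                    # 0.0 here, since the weighted sum plus bias is negative
```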
When FP32 is used for an AI model, the weights, biases, and activations are represented by 32-bit floating-point values. Typically, the weights and activations are quantized, whereas the biases are sometimes kept in full precision.
When the number of parameters is used to describe the model size, it primarily refers to the number of weights and biases. So, when an AI model has 30 million parameters and each parameter is FP32, each parameter needs 4 bytes of memory, which means about 120 MB of memory can be required for AI inference with the model. In general, arithmetic on floating-point values also takes more time than arithmetic on integer values.
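The NumPy sketch below illustrates this trade-off by quantizing a made-up FP32 weight array to INT8 with a simple affine (scale and zero-point) scheme; real toolkits may use a different scheme, so treat this as a sketch rather than any particular SDK's method:

```python
import numpy as np

weights_fp32 = np.random.randn(1000).astype(np.float32)

# Affine quantization to INT8: q = round(x / scale) + zero_point
lo, hi = weights_fp32.min(), weights_fp32.max()
scale = (hi - lo) / 255.0                       # map the FP32 range onto 256 INT8 levels
zero_point = np.round(-lo / scale) - 128        # INT8 value that represents FP32 zero
q = np.clip(np.round(weights_fp32 / scale) + zero_point, -128, 127).astype(np.int8)

# Dequantize to see the approximation error quantization introduces.
dequant = (q.astype(np.float32) - zero_point) * scale
print("max error:", np.abs(weights_fp32 - dequant).max())  # bounded by ~scale/2
print("FP32 bytes:", weights_fp32.nbytes, "-> INT8 bytes:", q.nbytes)  # 4000 -> 1000
```

The 4x memory reduction (4 bytes down to 1 byte per parameter) comes at the cost of a small, bounded rounding error per value, which is the source of the accuracy degradation mentioned above.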
| Type | Description | Pros | Cons |
| --- | --- | --- | --- |
| Static quantization | Quantizes both weights and activations statically | Fast inference; less accuracy degradation than the dynamic method | Requires calibration data |
| Dynamic quantization | Quantizes the weights statically, while activations are quantized dynamically during inference (see the sketch below) | No calibration data required; suitable for models for which calibration is difficult to perform | Inference time may not improve as much as desired |
| Quantization-aware training | Integrates quantization into the training process | Best inference accuracy | Most complex to adopt |
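As one concrete illustration of the dynamic method, PyTorch provides a one-call dynamic quantization API; the toy two-layer model below is made up for demonstration:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

# Weights are converted to INT8 once, up front; activations are quantized
# on the fly during inference, so no calibration data is needed.
model_int8 = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = model_int8(torch.randn(1, 128))
```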
Static quantization is supported by a development kit from Silex; our development kit for AI application development is based on Qualcomm's SDK.
The required calibration data can be a subset of the training data set that was used to generate the original AI model, as the sketch below illustrates.
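For illustration, here is an eager-mode static quantization flow in PyTorch. This is not the Silex/Qualcomm workflow, but the calibration step plays the same role: representative samples are fed through the prepared model so that observers can record activation ranges. The TinyNet model and the random calibration inputs are stand-ins:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()
        self.fc = nn.Linear(16, 4)
        self.relu = nn.ReLU()
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)       # FP32 -> INT8 at the model input
        x = self.relu(self.fc(x))
        return self.dequant(x)  # INT8 -> FP32 at the model output

model = TinyNet().eval()
model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
prepared = torch.ao.quantization.prepare(model)

# Calibration: run a representative subset of the training data through the
# prepared model (random tensors stand in for real samples here).
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(32, 16))

quantized = torch.ao.quantization.convert(prepared)  # weights and activations now INT8
```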
Contact us to learn more.