Model Quantization with PyTorch
PyTorch Native Quantization: FX Graph Mode Quantization?
- FX Graph Mode Quantization is a newer automated quantization framework in PyTorch, currently a prototype feature. It improves on Eager Mode Quantization by adding support for functionals and by automating the quantization workflow, as the sketch below shows.
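A minimal sketch of the FX flow, assuming the torch.ao.quantization entry points available in PyTorch 1.13+ (the earlier prototype exposed the same prepare_fx/convert_fx under torch.quantization.quantize_fx and took a qconfig_dict instead of a QConfigMapping). Unlike the Eager Mode example below, no stubs or manual fusion are needed:

```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

# a small floating-point model; FX mode needs no QuantStub/DeQuantStub
model_fp32 = torch.nn.Sequential(
    torch.nn.Conv2d(1, 1, 1),
    torch.nn.ReLU(),
).eval()

qconfig_mapping = get_default_qconfig_mapping("fbgemm")
example_inputs = (torch.randn(4, 1, 4, 4),)

# symbolic tracing inserts observers and fuses conv+relu automatically
model_prepared = prepare_fx(model_fp32, qconfig_mapping, example_inputs)
model_prepared(example_inputs[0])        # calibration pass
model_int8 = convert_fx(model_prepared)  # convert to a quantized model
res = model_int8(example_inputs[0])
```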
PyTorch Native Quantization: Eager Mode Quantization?
- Eager Mode Quantization is a beta feature. The user has to perform operator fusion manually and specify where quantization and dequantization happen, and it supports only modules, not functionals, as the example below illustrates.
```python
import torch

# define a floating point model where some layers could be statically quantized
class M(torch.nn.Module):
    def __init__(self):
        super(M, self).__init__()
        # QuantStub converts tensors from floating point to quantized
        self.quant = torch.quantization.QuantStub()
        self.conv = torch.nn.Conv2d(1, 1, 1)
        self.relu = torch.nn.ReLU()
        # DeQuantStub converts tensors from quantized to floating point
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        # manually mark where quantization starts
        x = self.quant(x)
        x = self.conv(x)
        x = self.relu(x)
        # manually mark where quantization ends
        x = self.dequant(x)
        return x

# create a model instance
model_fp32 = M()

# model must be set to eval mode for static quantization logic to work
model_fp32.eval()

# attach a quantization configuration (fbgemm targets x86 servers)
model_fp32.qconfig = torch.quantization.get_default_qconfig('fbgemm')

# manually specify which modules to fuse
model_fp32_fused = torch.quantization.fuse_modules(model_fp32, [['conv', 'relu']])

# insert observers to record activation statistics
model_fp32_prepared = torch.quantization.prepare(model_fp32_fused)

# calibrate with representative data (random here, for illustration)
input_fp32 = torch.randn(4, 1, 4, 4)
model_fp32_prepared(input_fp32)

# convert observed modules to their quantized counterparts
model_int8 = torch.quantization.convert(model_fp32_prepared)

# run the quantized model
res = model_int8(input_fp32)
```
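After convert, the fused conv+relu pair becomes a single quantized module whose weights are stored as 8-bit integers. A quick sanity check (illustrative; exact printed class names vary across PyTorch versions):

```python
# the fused conv+relu is now a single quantized module
print(type(model_int8.conv))           # e.g. QuantizedConvReLU2d
# quantized weights are stored as 8-bit signed integers
print(model_int8.conv.weight().dtype)  # torch.qint8
```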