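# Data Augmentation in Practice

Augmentation for image, text, and tabular data

## 1. Image Augmentation: Albumentations

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

transform = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
    A.Rotate(limit=15, p=0.5),
    A.GaussNoise(p=0.2),
    A.Blur(blur_limit=3, p=0.1),
    # ImageNet channel means and standard deviations
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2(),
])

# Usage: `image` is an H x W x 3 NumPy array (RGB), at least 256 x 256
augmented = transform(image=image)
image_aug = augmented["image"]
```

## 2. Text Augmentation

```python
import nlpaug.augmenter.word as naw
import nlpaug.augmenter.char as nac

text = "Machine learning is a branch of artificial intelligence"

# Synonym replacement (WordNet; English by default)
aug = naw.SynonymAug(aug_src="wordnet")
augmented = aug.augment(text)

# Random insertion, done at character level here:
# RandomWordAug does not implement action="insert"
aug = nac.RandomCharAug(action="insert")
augmented = aug.augment(text)

# Back-translation (English -> German -> English)
aug = naw.BackTranslationAug(
    from_model_name="facebook/wmt19-en-de",
    to_model_name="facebook/wmt19-de-en",
)
augmented = aug.augment(text)
```

## 3. MixUp / CutMix

Both functions expect `y1` and `y2` to be one-hot or soft label vectors, so that the labels can be blended with the same weight as the inputs. `cutmix` expects PyTorch tensors of shape (N, C, H, W). The `rand_bbox` helper is not defined in this post; it should return the corners of a random box covering about a `1 - lam` fraction of the image.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    # Blend two samples and their labels with a Beta-distributed weight
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2
    return x, y

def cutmix(x1, y1, x2, y2, alpha=1.0):
    lam = np.random.beta(alpha, alpha)
    bbx1, bby1, bbx2, bby2 = rand_bbox(x1.size(), lam)
    x = x1.clone()  # avoid overwriting the input batch in place
    # Paste the box region of x2 into the copy of x1
    x[:, :, bbx1:bbx2, bby1:bby2] = x2[:, :, bbx1:bbx2, bby1:bby2]
    # Recompute lam from the actual box area, since the box may be clipped at the border
    lam = 1 - (bbx2 - bbx1) * (bby2 - bby1) / (x.size(-1) * x.size(-2))
    return x, lam * y1 + (1 - lam) * y2
```

## Summary

| Method | Data type | Effect |
| --- | --- | --- |
| Albumentations | Image | Most commonly used |
| Synonym replacement | Text | Increases diversity |
| MixUp | Image / tabular | Improves generalization |
| CutMix | Image | Region replacement |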
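The CutMix code in section 3 calls `rand_bbox`, which this post never defines. The block below is a minimal sketch of one common way to write it, not necessarily the helper the original code relied on. It assumes batches laid out as (N, C, H, W) and picks a box whose area is about `1 - lam` of the image before clipping. The name `cutmix_numpy` and the `rng` parameter are introduced here for illustration only, and the sketch uses NumPy arrays instead of tensors so that it runs without PyTorch.

```python
import numpy as np


def rand_bbox(size, lam, rng=None):
    """Return (bbx1, bby1, bbx2, bby2) for a random box in an (N, C, H, W) batch.

    bbx* index axis 2 and bby* index axis 3, matching the slicing used in cutmix.
    """
    rng = np.random.default_rng() if rng is None else rng
    dim2, dim3 = size[2], size[3]
    # Side ratio chosen so that the unclipped box area ratio is 1 - lam
    cut_ratio = np.sqrt(1.0 - lam)
    cut2, cut3 = int(dim2 * cut_ratio), int(dim3 * cut_ratio)
    # Uniformly sampled box centre
    c2, c3 = int(rng.integers(dim2)), int(rng.integers(dim3))
    # Clip the corners so the box stays inside the image
    bbx1 = int(np.clip(c2 - cut2 // 2, 0, dim2))
    bbx2 = int(np.clip(c2 + cut2 // 2, 0, dim2))
    bby1 = int(np.clip(c3 - cut3 // 2, 0, dim3))
    bby2 = int(np.clip(c3 + cut3 // 2, 0, dim3))
    return bbx1, bby1, bbx2, bby2


def cutmix_numpy(x1, y1, x2, y2, alpha=1.0, rng=None):
    """CutMix on NumPy batches (hypothetical helper for illustration)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    bbx1, bby1, bbx2, bby2 = rand_bbox(x1.shape, lam, rng)
    mixed = x1.copy()  # leave the input batch untouched
    mixed[:, :, bbx1:bbx2, bby1:bby2] = x2[:, :, bbx1:bbx2, bby1:bby2]
    # Label weight follows the area that actually came from x1
    lam = 1.0 - (bbx2 - bbx1) * (bby2 - bby1) / (x1.shape[2] * x1.shape[3])
    return mixed, lam * y1 + (1.0 - lam) * y2, lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = np.zeros((2, 3, 8, 8))
    b = np.ones((2, 3, 8, 8))
    ya = np.array([[1.0, 0.0]] * 2)
    yb = np.array([[0.0, 1.0]] * 2)
    mixed, y, lam = cutmix_numpy(a, ya, b, yb, rng=rng)
    print(mixed.shape, y[0], lam)
```

With PyTorch tensors, `x1.size()` can be passed directly as `size`, since only `size[2]` and `size[3]` are read. Recomputing `lam` after clipping keeps the label weights consistent with the pixels that were actually replaced.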