Compression of Neural Machine Translation Models via Pruning
Authors: Abigail See, Minh-Thang Luong, Christopher D. Manning (Stanford)
Venue: arXiv
This paper applies pruning techniques to an encoder-decoder neural machine translation model: a deep, multi-layer recurrent architecture with LSTM hidden units. The paper tries several pruning schemes, but finds the most effective to be simply pruning the weights of least magnitude across the whole model. While the techniques are mostly carried over from pruning work on CNNs and other networks, the paper does note some interesting artifacts of pruning.
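As a concrete illustration, here is a minimal sketch of that class-blind magnitude pruning, written in PyTorch (which is my choice, not the paper's; the function name magnitude_prune is my own). All weight matrices are pooled together and the smallest-magnitude fraction of weights is zeroed, regardless of which layer (embedding, LSTM, or softmax) they belong to.

```python
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float) -> dict:
    """Zero out the `sparsity` fraction of smallest-magnitude weights.

    Returns a dict of binary masks so the zeros can be kept fixed
    during any subsequent retraining.
    """
    # Collect every weight tensor (biases are typically left untouched).
    weight_params = [(name, p) for name, p in model.named_parameters()
                     if "weight" in name]

    # Class-blind pruning: one global threshold over all weights combined.
    all_magnitudes = torch.cat([p.detach().abs().flatten()
                                for _, p in weight_params])
    k = int(sparsity * all_magnitudes.numel())
    if k > 0:
        threshold = all_magnitudes.kthvalue(k).values
    else:
        threshold = all_magnitudes.new_tensor(float("-inf"))

    masks = {}
    with torch.no_grad():
        for name, p in weight_params:
            mask = (p.abs() > threshold).float()
            p.mul_(mask)          # zero the pruned weights in place
            masks[name] = mask    # keep the mask to freeze zeros later
    return masks
```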
Firstly, deeper layers are more sensitive to pruning than earlier ones. In other words, the deeper units are of more importance, and even their low-magnitude weights matter. Additionally, the authors find that the pruned-and-retrained sparse models can even outperform the originals, and attribute this to a regularizing, "generalizing" effect of pruning: training-set performance decreases while validation-set performance actually increases. Finally, they note that the sparsity structure discovered by pruning cannot simply be applied from the start. In other words, results are far better when you begin with the original dense model, train it completely, and then iteratively prune and retrain, as in the sketch below.
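To make that last point concrete, here is a hedged sketch of the prune-then-retrain recipe, reusing the magnitude_prune helper sketched above. The names train_one_epoch, train_data, and the sparsity schedule are hypothetical placeholders for whatever training loop, data, and schedule the model was originally trained with.

```python
import torch

def prune_and_retrain(model, train_one_epoch, train_data,
                      sparsity_schedule=(0.5, 0.7, 0.8, 0.9),
                      retrain_epochs=1):
    """Iteratively prune a fully trained model and retrain after each cut."""
    for sparsity in sparsity_schedule:
        # Prune the fully trained model, not a randomly initialized one.
        masks = magnitude_prune(model, sparsity)

        # Retrain while keeping the pruned weights pinned at exactly zero.
        for _ in range(retrain_epochs):
            train_one_epoch(model, train_data)
            with torch.no_grad():
                for name, p in model.named_parameters():
                    if name in masks:
                        p.mul_(masks[name])  # re-zero any revived weights
    return model
```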