Deep Learning - GPU memory limitation, How to overcome it?
Shakeratto
2018. 3. 26. 22:36
Limited GPU Memory
- A GPU usually has far less device memory than the host has system memory
- The latest high-end GPUs (such as the NVIDIA Tesla P100)
- 12–16 GB device memory
- Host system memory
- 256 GB
- The trend in deep learning models is toward "deeper and wider" architectures
- RNNs in particular need a lot of memory, because the activations of every timestep must be kept for the backward pass (see the back-of-envelope sketch below)
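To see why RNNs are memory-hungry, here is a back-of-envelope calculation in Python. All the numbers (batch size, sequence length, hidden size, layer count) are illustrative assumptions, not measurements:

```python
# Rough activation-memory estimate for an RNN trained with BPTT:
# the hidden state of every timestep must be kept for the backward
# pass, so activation memory grows linearly with sequence length.
batch, seq_len, hidden, layers = 64, 1000, 2048, 4
bytes_per_float = 4  # fp32

activation_bytes = batch * seq_len * hidden * layers * bytes_per_float
print(f"{activation_bytes / 2**30:.2f} GiB of activations")  # ~1.95 GiB
```

Doubling the sequence length or the hidden size doubles this figure, which is how a single model can outgrow a 12-16 GB GPU while fitting comfortably in 256 GB of host memory.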
1. First Solution: Distributed Deep Learning
Source: M. Cho et al., "PowerAI DDL", 2017
- PowerAI DDL provides a unified infrastructure for distributed Deep Learning over multiple GPUs for a single node, multiple nodes and a cloud environment
- PowerAI DDL leverages an innovative multi-ring communication algorithm that balances communication latency against communication overhead (a toy all-reduce illustrating the ring pattern is sketched after this list)
- The PowerAI DDL library provides functionality for high-performance distributed Deep Learning that can be employed in multiple frameworks
- Currently there are PowerAI DDL-enabled versions of:
- Caffe
- TensorFlow
- Torch
- However, DDL requires multiple servers
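To make the ring idea concrete, here is a toy single-process simulation of ring all-reduce in NumPy. It is a sketch of the general communication pattern that DDL-style libraries build on, not PowerAI DDL's actual multi-ring algorithm:

```python
import numpy as np

def ring_allreduce(grads):
    """Simulate ring all-reduce: every worker ends with sum(grads)."""
    P = len(grads)
    # Each worker splits its local gradient into P chunks.
    chunks = [np.array_split(g.astype(float), P) for g in grads]

    # Phase 1 (reduce-scatter): in step s, worker r sends chunk
    # (r - s) mod P to worker (r + 1) mod P, which accumulates it.
    # After P-1 steps, worker r holds the fully reduced chunk (r+1) mod P.
    for s in range(P - 1):
        for r in range(P):
            c = (r - s) % P
            chunks[(r + 1) % P][c] += chunks[r][c]

    # Phase 2 (all-gather): in step s, worker r forwards its complete
    # chunk (r + 1 - s) mod P to worker (r + 1) mod P.
    for s in range(P - 1):
        for r in range(P):
            c = (r + 1 - s) % P
            chunks[(r + 1) % P][c][:] = chunks[r][c]

    return [np.concatenate(ch) for ch in chunks]

grads = [np.random.default_rng(i).standard_normal(10) for i in range(4)]
result = ring_allreduce(grads)
assert all(np.allclose(r, sum(grads)) for r in result)
```

Each worker transmits only 2*(P-1)/P of the gradient per all-reduce regardless of the worker count P, which is why ring-based schemes scale well in bandwidth; latency, however, grows with P, and that is the trade-off multi-ring variants target.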
2. Second Solution: Unified Memory
- Currently, only Caffe supports 'Unified Memory' (the CUDA mechanism behind it is sketched after these commands)
- /opt/DL/caffe-ibm/bin/caffe time --model=/opt/DL/caffe-ibm/models/bvlc_googlenet/deploy.prototxt -gpu=0 -iterations=1
- result: out of memory
- /opt/DL/caffe-ibm/bin/caffe time --model=/opt/DL/caffe-ibm/models/bvlc_googlenet/deploy.prototxt -gpu=0 -lms 8192 -iterations=1
- result: 1477.19 ms (lms: Large Model Support); the model that ran out of memory above now completes
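Large Model Support builds on CUDA Unified (managed) Memory: a single allocation is addressable from both CPU and GPU, and the driver pages data between host and device on demand, so allocations can exceed physical GPU memory. A minimal illustration using Numba's CUDA bindings (the array size and kernel are illustrative assumptions; this shows the general CUDA mechanism, not Caffe's implementation):

```python
import numpy as np
from numba import cuda

@cuda.jit
def scale(x, factor):
    """Multiply every element of x by factor on the GPU."""
    i = cuda.grid(1)
    if i < x.size:
        x[i] *= factor

n = 1 << 20
# Managed (unified) memory: one allocation visible to both CPU and GPU;
# the CUDA driver migrates pages between host and device on demand.
x = cuda.managed_array(n, dtype=np.float32)
x[:] = 1.0                                            # written on the host

threads = 256
scale[(n + threads - 1) // threads, threads](x, 2.0)  # used on the device
cuda.synchronize()
print(x[:4])                          # read back on the host: [2. 2. 2. 2.]
```

The price is paging traffic over PCIe (or NVLink), which is why LMS runs are typically slower than runs that fit entirely in device memory.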
3. Third Solution: Swap Out/In of Atomic Operations
Source: C. Meng et al., "Training Deeper Models by GPU Memory Optimization on TensorFlow", NIPS 2017
- Feature maps are transferred (swapped out) to CPU memory during the forward pass and transferred back (swapped in) to GPU memory when the backward pass needs them (a sketch follows at the end of this section)
- Table 1(a) of the paper: the maximum batch size can be increased by up to 4x
- Larger batches can in turn give better results
- Table 1(b) of the paper: the conventional approach can only train up to a ResNet-200 model before hitting "OOM" (out of memory), but swap out/in enables training ResNet-1001 and ResNet-2000 without OOM
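The swap-out/swap-in idea can be sketched in a few lines of PyTorch with a custom autograd function. This is an illustrative reimplementation of the concept, not the paper's TensorFlow code; `SwappedSquare` and all its details are hypothetical:

```python
import torch

class SwappedSquare(torch.autograd.Function):
    """y = x*x, but the saved activation lives in host memory."""

    @staticmethod
    def forward(ctx, x):
        y = x * x
        # Swap out: keep only a CPU copy of the activation that the
        # backward pass will need, freeing GPU memory in the meantime.
        ctx.save_for_backward(x.detach().to("cpu"))
        return y

    @staticmethod
    def backward(ctx, grad_y):
        (x_cpu,) = ctx.saved_tensors
        # Swap in: move the activation back to the GPU only at the
        # moment the gradient d(x*x)/dx = 2x actually needs it.
        x = x_cpu.to(grad_y.device)
        return 2 * x * grad_y

x = torch.randn(4, device="cuda", requires_grad=True)
SwappedSquare.apply(x).sum().backward()
print(x.grad)  # equals 2*x, computed from the swapped-in activation
```

A real implementation such as the paper's overlaps these transfers with computation on separate CUDA streams and decides which tensors to swap from the dataflow graph; the sketch only shows the round trip itself.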