Systems for AI
Recently, machine learning (ML) and deep learning (DL) have advanced rapidly and achieved success in a wide variety of fields, including image classification, translation, and text-to-image generation. With this trend, the size of ML/DL models and datasets is also growing. This growth makes it increasingly difficult to train and/or serve large-scale ML/DL models on commodity devices such as GPUs, or it demands larger hardware resources, which raises costs and leads to higher energy usage.
The goal of systems for artificial intelligence (AI) is to research new systems and algorithms that can effectively train or serve various ML/DL models by efficiently utilizing and/or scheduling hardware resources in various system environments, including small systems, distributed systems, and cloud computing.
Micro-Batch Processing: An Improved Training Methodology for Enabling Large Batch Size Training Beyond the GPU Memory Limit
Micro-Batch Processing (MBP) is a method that allows deep learning models to be trained with batch sizes exceeding the GPU memory capacity. MBP uses a batch streaming method and loss normalization to effectively train large batches within limited GPU memory. In the experiments, MBP succeeded in training batches up to 128× larger than the previous maximum trainable size. Theoretically, MBP allows the batch size to grow up to the total size of the dataset.
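The exact streaming and normalization details of MBP are not reproduced here; the following is a minimal sketch, assuming a PyTorch-style training loop, of the general idea: a large logical batch is streamed through the GPU in micro-batches, each micro-batch loss is normalized, and gradients are accumulated before a single optimizer step. The function and parameter names (`train_large_batch`, `micro_batch_size`) are illustrative, not from the paper.

```python
import torch

def train_large_batch(model, optimizer, loss_fn, inputs, targets,
                      micro_batch_size):
    """Sketch: emulate one large-batch update by streaming micro-batches.

    `inputs`/`targets` form one logical batch that may exceed GPU memory.
    Each micro-batch is moved to the GPU on demand, its loss is normalized
    by the number of micro-batches, and gradients are accumulated so that
    a single optimizer step approximates the large-batch update.
    """
    optimizer.zero_grad()
    num_micro = (len(inputs) + micro_batch_size - 1) // micro_batch_size
    for i in range(0, len(inputs), micro_batch_size):
        x = inputs[i:i + micro_batch_size].cuda(non_blocking=True)
        y = targets[i:i + micro_batch_size].cuda(non_blocking=True)
        loss = loss_fn(model(x), y)
        # Normalize so the accumulated gradient matches one large-batch
        # loss (exact when all micro-batches are equal-sized).
        (loss / num_micro).backward()
    optimizer.step()
```

Because only one micro-batch resides on the GPU at a time, the logical batch size is bounded by host memory rather than GPU memory, which is why, in principle, it can grow up to the full dataset size.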
GMM: An Enhanced Model Serving System for Multiple Models on GPU Memory-Constrained Systems
GMM is a new model serving system that can serve more DNN inference models on a single GPU system than previously possible. GMM uses a GPU memory sharing method that lets all models efficiently share the GPU memory, and a fast model allocation method that quickly transfers parameters to GPU memory. Overall, GMM successfully served up to 10× more models than previous serving systems.
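GMM's actual memory-sharing and allocation mechanisms are not detailed here; the sketch below only illustrates the underlying idea under assumed mechanics: model parameters are kept in pinned host memory (which allows fast host-to-GPU transfers) and are moved onto the GPU only for the duration of a request, so many models can take turns using the limited GPU memory. The class and method names are hypothetical, not GMM's API.

```python
import torch

class SwappingModelServer:
    """Sketch: serve many models on one GPU by swapping parameters.

    Models wait in pinned host memory and are transferred to the GPU
    only while handling a request, then moved back so the freed memory
    can be reused by the next model. This illustrates memory sharing
    plus fast allocation in general, not GMM's actual implementation.
    """

    def __init__(self, models):
        self.models = {}
        for name, model in models.items():
            self.models[name] = self._pin(model.eval())

    @staticmethod
    def _pin(model):
        # Pinned (page-locked) host memory enables fast GPU copies.
        for p in model.parameters():
            p.data = p.data.pin_memory()
        return model

    @torch.no_grad()
    def infer(self, name, inputs):
        model = self.models[name].cuda()   # fast transfer from pinned memory
        outputs = model(inputs.cuda())
        # Evict the model and re-pin it so GPU memory is freed for others.
        self.models[name] = self._pin(model.cpu())
        return outputs.cpu()
```

Under this scheme the number of servable models is limited by host memory rather than GPU memory, at the cost of a parameter transfer per request, which is why minimizing that transfer time matters.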