Research

Heterogeneous Distributed Computing

Heterogeneous Distributed Computing (HDC) is a research area that utilizes different types of computing resources (CPUs, GPUs, FPGAs, etc.) to efficiently process tasks in a distributed environment. In modern computing environments, where hardware is increasingly optimized for specific tasks, effectively leveraging these heterogeneous resources has become essential to maximize computational performance, increase resource utilization, and improve power efficiency. The goal of HDC research is to develop new approaches to scheduling and load balancing, energy-efficiency optimization, resource scalability, and parallel algorithms for heterogeneous architectures.

Heterogeneous Multi-Core Processors

Heterogeneous multi-core systems have cores that are not identical in terms of micro-architecture, clock frequency, cache size, and so on, whereas homogeneous multi-core systems include only identical cores. Current operating systems are not designed to handle heterogeneous multi-core processors; in particular, it is difficult to exploit the heterogeneity and affinity between resources and tasks. To fully exploit heterogeneous multi-core systems, intelligent task scheduling becomes one of the critical issues.
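
As a rough illustration of heterogeneity-aware scheduling, the sketch below assigns tasks to cores using a simple minimum-completion-time heuristic over an estimated time-to-compute (ETC) table. The task names, core types, and ETC values are hypothetical, and the heuristic is only a minimal example of the general idea, not the schedulers studied in our work.

    # Hypothetical ETC table: etc[task][core] = estimated execution time of
    # that task on that core type. Values are made up for illustration.
    etc = {
        "t1": {"big_core": 2.0, "little_core": 6.0},
        "t2": {"big_core": 3.0, "little_core": 2.0},
        "t3": {"big_core": 4.0, "little_core": 12.0},
    }

    def schedule(etc):
        # each core keeps a "ready time": when it becomes free for the next task
        ready = {core: 0.0 for core in next(iter(etc.values()))}
        plan = []
        for task in sorted(etc, key=lambda t: min(etc[t].values())):
            # pick the core on which this task would finish earliest (exploits affinity)
            core = min(ready, key=lambda c: ready[c] + etc[task][c])
            ready[core] += etc[task][core]
            plan.append((task, core, ready[core]))
        return plan

    for task, core, finish in schedule(etc):
        print(f"{task} -> {core} (finishes at {finish:.1f})")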

Heterogeneous Multi-robot Task Scheduling

Robots have significant advantages over humans when performing tasks in extreme environments, such as collecting objects at disaster sites. The terrain of a disaster site is often very uneven, so the cost of traveling over it differs from robot to robot. In many cases, robots are also required to move to multiple locations and perform several different tasks. In such cases, composing the team from different types of robots can be beneficial: flying robots can traverse harsh surfaces with ease but cannot carry heavy objects, ground robots can carry heavy objects, and robots with various equipment can handle various tasks. Task scheduling algorithms are needed to perform these tasks efficiently. A scheduling algorithm should consider the resources, such as energy and time, consumed both to travel to task locations and to perform the tasks. Since robots have limited battery capacity, this constraint must be considered as well.

We research algorithms that schedule heterogeneous multi-robot teams under different conditions: cases where all robot and task locations are known in advance or are unknown, where each robot can perform multiple tasks or only a single task, and where tasks are generated continuously while the robots are active or are all known before scheduling begins.
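
As a rough sketch of the kind of decision such an algorithm makes, the example below greedily assigns each task to a capable robot whose remaining battery covers both travel and execution energy. The robots, tasks, and cost values are hypothetical, and the greedy rule stands in for the actual scheduling algorithms we study.

    # Minimal sketch with hypothetical robots and tasks: each task goes to the
    # capable robot with the lowest total energy cost (travel + execution),
    # subject to that robot's remaining battery. Illustration only.
    import math

    robots = [
        {"id": "drone",  "pos": (0, 0), "battery": 30.0, "skills": {"scan"},  "cost_per_m": 0.2},
        {"id": "ground", "pos": (5, 5), "battery": 80.0, "skills": {"carry"}, "cost_per_m": 0.5},
    ]
    tasks = [
        {"id": "scan_area", "pos": (10, 2), "skill": "scan",  "energy": 3.0},
        {"id": "move_box",  "pos": (8, 9),  "skill": "carry", "energy": 10.0},
    ]

    def travel_cost(robot, task):
        # energy spent moving from the robot's current position to the task
        return math.dist(robot["pos"], task["pos"]) * robot["cost_per_m"]

    assignments = []
    for task in tasks:
        candidates = []
        for r in robots:
            cost = travel_cost(r, task) + task["energy"]
            if task["skill"] in r["skills"] and cost <= r["battery"]:
                candidates.append((cost, r))
        if not candidates:
            continue  # no robot can serve this task within its battery budget
        cost, best = min(candidates, key=lambda c: c[0])
        best["battery"] -= cost
        best["pos"] = task["pos"]
        assignments.append((task["id"], best["id"], round(cost, 1)))

    print(assignments)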

Distributed Mobile Computing

Distributed Mobile Computing (DMC) is an environment consisting of mobile devices that communicate wirelessly; because battery capacity is limited, energy management is a critical issue for the overall system. Devices in this environment may have heterogeneous features (e.g., computing performance, battery capacity), tasks must be completed by their deadlines, and tasks may have affinities to different devices.

In multi-hop DMC, each device has its own resource management system (RMS), which is responsible for selecting the most appropriate device to execute a task. The mobile devices form an ad hoc network with no base station and communicate with each other directly or over multiple hops.

In single-hop cell-based DMC, there is only one RMS in the whole system, assumed to be at a fixed location (a base station). Every mobile node in the cell communicates through the RMS. The RMS holds the status information of all tasks and mobile nodes within communication range and uses this information to determine the destination node that will complete each task.
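
The sketch below illustrates this selection step under simple assumptions: the RMS picks, for each arriving task, a node whose estimated completion time meets the deadline and whose remaining energy covers the work. The node parameters and energy model are hypothetical and only illustrate the general idea.

    # Minimal sketch of single-hop, cell-based selection: the RMS knows every
    # node's speed, queue, and energy, and picks a feasible node that finishes
    # earliest. All names and numbers are hypothetical.
    nodes = [
        {"id": "phoneA", "speed": 2.0, "ready_at": 0.0, "energy": 20.0, "joule_per_unit": 0.5},
        {"id": "phoneB", "speed": 1.0, "ready_at": 1.0, "energy": 50.0, "joule_per_unit": 0.3},
    ]

    def select_node(task, now=0.0):
        """task = {"work": units of computation, "deadline": absolute time}"""
        feasible = []
        for n in nodes:
            finish = max(now, n["ready_at"]) + task["work"] / n["speed"]
            energy_needed = task["work"] * n["joule_per_unit"]
            if finish <= task["deadline"] and energy_needed <= n["energy"]:
                feasible.append((finish, energy_needed, n))
        if not feasible:
            return None  # no node can meet the deadline; the RMS may reject the task
        finish, energy_needed, best = min(feasible, key=lambda f: (f[0], f[1]))
        best["ready_at"] = finish
        best["energy"] -= energy_needed
        return best["id"]

    print(select_node({"work": 4.0, "deadline": 5.0}))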

<Featured Publications>

"A method to construct task scheduling algorithms for heterogeneous multi-core systems," S. Kim and J.-K. Kim, IEEE Access.
"Energy aware task scheduling for a distributed MANET computing environments," J. Kim and J.-K. Kim, Journal of Electrical Engineering and Technology.
"DiSCo: Distributed scalable compilation tool for heavy compilation workload," K. Jo, S. Kim, and J.-K. Kim, IEICE Transactions on Information and Systems.

Deep Learning

Deep learning (DL) is a core technology in artificial intelligence (AI) that solves complex problems by learning from large amounts of data based on multi-layer neural networks. Mimicking the neural networks of the human brain, deep learning extracts patterns from data and understands their structure, which enables it to make predictions and decisions. In recent years, advances in data and computing resources have led to rapid advances in deep learning technology, revolutionizing a wide range of applications such as translation, anomaly detection, and text-to-image generation. 

Multimodal Deep Learning

Multimodal learning aims to learn joint representations of different modalities. A multimodal deep learning model combines two deep Boltzmann machines, each corresponding to one modality; an additional hidden layer is placed on top of the two Boltzmann machines to produce the joint representation [1]. The goal of this research is to profile a person's preferences from verbal and non-verbal data (audio and video) using deep learning architectures.
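
The following sketch illustrates the joint-representation idea in PyTorch, using simple feed-forward encoders in place of deep Boltzmann machines; all layer sizes are hypothetical and the code is a minimal example, not the model from [1].

    # Minimal PyTorch sketch: one encoder per modality, plus an extra layer on
    # top that fuses both into a shared embedding. Feed-forward encoders stand
    # in for deep Boltzmann machines; dimensions are hypothetical.
    import torch
    import torch.nn as nn

    class MultimodalEncoder(nn.Module):
        def __init__(self, audio_dim=128, video_dim=512, hidden=256, joint=64):
            super().__init__()
            self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
            self.video_enc = nn.Sequential(nn.Linear(video_dim, hidden), nn.ReLU())
            # joint layer placed on top of both modality-specific encoders
            self.joint = nn.Sequential(nn.Linear(2 * hidden, joint), nn.ReLU())

        def forward(self, audio, video):
            a = self.audio_enc(audio)
            v = self.video_enc(video)
            return self.joint(torch.cat([a, v], dim=-1))

    model = MultimodalEncoder()
    audio = torch.randn(8, 128)       # batch of audio features
    video = torch.randn(8, 512)       # batch of video features
    print(model(audio, video).shape)  # torch.Size([8, 64])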

The figures show example illustrations from a previous study, which introduces an enhanced fake news detection model using a multimodal deep learning approach.

Generative Models

Bird image generation

Enhancing Text-to-Image generative model using RL

Generative models can generate new data in many forms, including images, text, audio, and video, demonstrating both the creativity and practicality of AI. Typical examples of generative models include generative adversarial networks (GANs) and variational autoencoders (VAEs). Moreover, with significant advances in diffusion models in particular, generative models are showing remarkable growth.

The figures show examples of images generated by our previously proposed approaches. The images in the figure above were generated by GAN models, while the images in the figure below were generated by RL-based diffusion models.

Reinforcement Learning

Reinforcement learning is a type of machine learning where an agent learns to make decisions by interacting with an environment. In this process, the agent observes the current state of the environment and chooses an action in response. The environment then transitions to a new state and gives the agent a reward based on the action's effectiveness. The goal of the agent is to maximize the cumulative reward over time, effectively learning a policy that dictates the best actions to take from any given state. This learning paradigm is powerful for tasks where explicit supervision is unavailable, and is commonly used in areas like robotics, gaming, and autonomous vehicles, where learning through trial and error is vital.
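
The interaction loop described above can be illustrated with a minimal tabular Q-learning example on a toy chain environment; the environment, reward, and hyperparameters are hypothetical and chosen only to show the state-action-reward-update cycle.

    # Minimal sketch of the agent-environment loop: tabular Q-learning on a
    # toy 5-state chain where moving right eventually reaches a rewarding
    # terminal state. Everything here is a hypothetical illustration.
    import random

    N_STATES, ACTIONS = 5, [0, 1]        # action 0 = move left, 1 = move right
    alpha, gamma, eps = 0.1, 0.9, 0.1
    Q = [[0.0, 0.0] for _ in range(N_STATES)]

    def step(state, action):
        nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
        reward = 1.0 if nxt == N_STATES - 1 else 0.0
        return nxt, reward, nxt == N_STATES - 1   # next state, reward, done

    for episode in range(200):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection (random tie-breaking)
            if random.random() < eps or Q[s][0] == Q[s][1]:
                a = random.choice(ACTIONS)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2, r, done = step(s, a)
            # Q-learning update: move Q(s, a) toward reward + discounted best future value
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2

    print([[round(q, 2) for q in row] for row in Q])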

The figure shows an example illustration from our previous study, which introduces a multi-agent reinforcement learning approach for traffic control.

<Featured Publications>

"TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer," E. Choi and J.-K. Kim, ISIF'24.
"Enhancing Reinforcement Learning Finetuned Text-to-Image Generative Model using Reward Ensemble," K. Back, X. Piao, and J.-K. Kim, ITS'24.
"Audio-to-Visual Cross-Modal Generation of Birds," J.Y. Shim, J. Kim, and J.-K. Kim, IEEE Access.
"Micro Junction Agent: A Scalable Multi-agent Reinforcement Learning Method for Traffic Control," B. Choi, J. Choe, and J.-K. Kim, ICAART'22.

Systems for Artificial Intelligence

With significant advances in deep learning, the scale of DL models and datasets is also increasing. This trend makes it increasingly difficult to train and/or serve large-scale ML/DL models on commodity devices such as GPUs, or requires larger hardware resources, resulting in higher cost and energy usage. The goal of systems for artificial intelligence (AI) is to research new systems or optimization methodologies that can effectively train or serve various ML/DL models by efficiently utilizing and/or scheduling hardware resources in various computing environments, including small embedded systems, distributed systems, and cloud computing.

Techniques for Large Batch Training

With the recent increase in the size and complexity of deep learning models and the scale of datasets, it has become difficult to train them efficiently, requiring larger-scale memory and computing systems. To overcome this limitation, many parallelism techniques (e.g., data, model, and pipeline parallelism) have been introduced to alleviate the problems that deep learning methods face. The goal of systems for AI is to explore fast and resource-efficient large-scale training approaches.

The figure above shows a learning method we previously proposed, called Micro-Batch Processing (MBP), which allows deep learning models to be trained with batch sizes that exceed GPU memory capacity. MBP uses a batch streaming method and a loss correction to effectively train large batches within limited GPU memory. In our experiments, MBP succeeded in training batches up to 128× larger than previously trainable. Theoretically, MBP allows the batch size to grow up to the total size of the dataset.
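
The sketch below illustrates the batch-streaming idea under simple assumptions: a large logical batch is split into micro-batches that fit in memory, gradients are accumulated across them, and a single optimizer step is taken for the whole batch. It shows the general mechanism (similar to gradient accumulation) rather than MBP's actual loss-correction scheme, and the model and data are hypothetical.

    # Minimal PyTorch sketch of batch streaming via gradient accumulation.
    # Not MBP itself: it omits MBP's loss correction and memory management.
    import torch
    import torch.nn as nn

    model = nn.Linear(32, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    large_x = torch.randn(1024, 32)                  # logical batch too big for memory
    large_y = torch.randint(0, 2, (1024,))
    micro_size = 64

    optimizer.zero_grad()
    n_micro = large_x.size(0) // micro_size
    for i in range(n_micro):
        xb = large_x[i * micro_size:(i + 1) * micro_size]
        yb = large_y[i * micro_size:(i + 1) * micro_size]
        loss = loss_fn(model(xb), yb) / n_micro      # scale so gradients match the full batch
        loss.backward()                              # gradients accumulate across micro-batches
    optimizer.step()                                 # single update for the whole logical batch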

Enhanced Model Serving Systems

A model serving system is a platform or framework for deploying and managing pre-trained models for use as network-callable services. Such systems handle numerous requests from users, process input data, and return predictions or results computed by the pre-trained models with minimal latency. The goal of systems for AI is to explore more efficient and faster model serving architectures and methodologies, such as energy-aware or resource-efficient model serving systems.

The figure above shows a model serving system we previously studied, called GMM, which serves more DNN inference models on a single GPU system than was previously possible. GMM uses a GPU memory sharing method that lets all models access shared GPU memory, and a fast model allocation method that quickly transfers parameters to GPU memory. Overall, GMM served up to 10× more models than previous serving systems.
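
To illustrate the setting GMM targets, the sketch below keeps a small, fixed number of models resident on the device, loading parameters on demand and evicting the least recently used model when the budget is exceeded. This simple LRU cache is only a stand-in for the problem; it is not GMM's memory-sharing or allocation mechanism.

    # Minimal sketch of the underlying problem: many models, limited GPU memory.
    # An LRU cache of resident models is used purely for illustration.
    from collections import OrderedDict
    import torch
    import torch.nn as nn

    device = "cuda" if torch.cuda.is_available() else "cpu"
    MAX_RESIDENT = 2                                   # hypothetical budget, counted in models

    cpu_models = {f"model{i}": nn.Linear(16, 4) for i in range(5)}   # stand-ins for pre-trained models
    resident = OrderedDict()                           # model name -> module currently on the device

    def serve(name, x):
        if name not in resident:
            if len(resident) >= MAX_RESIDENT:
                _, module = resident.popitem(last=False)             # evict least recently used model
                module.to("cpu")
            resident[name] = cpu_models[name].to(device)             # transfer parameters to the device
        resident.move_to_end(name)                                   # mark as most recently used
        with torch.no_grad():
            return resident[name](x.to(device))

    print(serve("model3", torch.randn(1, 16)).shape)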

<Featured Publications> 

"GMM: An Efficient GPU Memory Management-based Model Serving System for Multiple DNN Inference Models," X. Piao and J.-K. Kim, ICPP'24.
"Enabling Large Batch Size Training for DNN Models Beyond the Memory Limit While  Maintaining Performance," X. Piao, D. Synn, J. Park, and J.-K. Kim, IEEE Access.
"ADaPT: An Automated Dataloader Parameter Tuning Framework using AVL Tree-based Search Algorithms," M. Ryu, X. Piao, J. Park, D. Synn, and J.-K. Kim, KTCCS.