With the rapid growth of machine learning, Machine-Learning-as-a-Service (MLaaS) clusters have emerged to meet the needs of researchers. However, such clusters generally suffer from low resource utilization and high energy consumption. For this reason, most prior work has focused on resource-efficient scheduling that consolidates resources. Consolidation, however, also affects job performance, and minimizing energy consumption while ensuring the quality of service (QoS) of jobs remains a serious challenge. In this paper, we analyze the differences in QoS between offline and online jobs. To address the performance interference caused by shared-resource contention among online tasks, we characterize the relationship between shared resources and job performance. We then propose an online learning scheduling algorithm for colocation MLaaS clusters (OLSC), which is based on alternating multiplier gradient descent and aims to guarantee the QoS of jobs while minimizing cluster energy consumption. The results show that, in colocation MLaaS clusters, the cluster energy consumption under OLSC is only 59% of that under RI-FFD and 83% of that under RS-FFD.