Parallel multipath transmission for burst traffic optimization in point-to-point NoCs

Network-on-chip (NoC) is a promising solution to connect more than hundreds of processing elements (PEs). As the number of PEs increases, the high communication latency caused by the burst traffic hampers the speedup gained by computation acceleration. Although parallel multipath transmission is an...

Full description

Saved in:
Bibliographic Details
Main Authors: Chen, Hui, Zhang, Zihao, Chen, Peng, Zhu, Shien, Liu, Weichen
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language:English
Published: 2021
Subjects:
Online Access:https://hdl.handle.net/10356/152657
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Network-on-chip (NoC) is a promising solution to connect more than hundreds of processing elements (PEs). As the number of PEs increases, the high communication latency caused by the burst traffic hampers the speedup gained by computation acceleration. Although parallel multipath transmission is an effective method to reduce transmission latency, its advantages have not been fully exploited in previous works, especially for emerging point-To-point NoCs since: (1) Previous static message splitting strategy increases contentions when traffic loads are heavy, degrading NoC performance. (2) Only limited shortest paths are chosen, ignoring other possible paths without contentions. (3) The optimization of hardware that supports parallel multipath transmission is missing, resulting in additional overhead. Thus, we propose a software and hardware collaborated design to reduce latency in point-To-point NoCs through parallel multipath transmission. Specifically, we revise hardware design to support parallel multipath transmission efficiently. Moreover, we propose a reinforcement learning-based algorithm to decide when and how to split messages, and which path should be used according to traffic loads. Experiments show that our algorithm achieves a remarkable performance improvement (+12.1% to +21.0%) when compared with the state-of-The-Art dual-path algorithm. Also, our hardware decreases power and area consumption by 23.2% and 10.3% over the dual-path hardware.