Efficient and fault tolerant HLA-based simulation

Bibliographic Details
Main Author: Li, Zengxiang
Other Authors: Stephen John Turner
Format: Theses and Dissertations
Language: English
Published: 2012
Subjects:
Online Access: https://hdl.handle.net/10356/48170
Institution: Nanyang Technological University
Description
Summary: Distributed simulation subdivides a complex simulation (federation) into a group of simulation components (federates) and executes them in a distributed manner. The High Level Architecture (HLA), an IEEE 1516 standard, provides a general framework for developing large-scale distributed simulations. The Runtime Infrastructure (RTI) is middleware that controls communication among federates according to the HLA interface specification. Simulation executions may involve a large number of computationally intensive federates and are therefore time- and resource-consuming. Worse, these federates may be subject to crash-stop and Byzantine failures, and the risk of federation failure increases with the scale of the federation. In this thesis, we propose mechanisms to support efficient and fault-tolerant HLA-based simulation by exploiting the advantages of a decoupled federate architecture, in which a federate connects to the federation through its corresponding Decoupled RTI Component (DRC). Workload imbalance generally leads to poor distributed simulation performance. To achieve load balancing, we propose migrating federates from heavily loaded computing nodes to lightly loaded ones. With the decoupled federate architecture, only the federate needs to be migrated to the destination computing node, whereas the DRC can stay in place and keep its connection to the federation. A one-phase migration protocol is first proposed to illustrate the federate migration process. Two-phase and relay-based migration protocols are then developed to reduce migration overhead by overlapping federate migration with continuous federate execution.
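The load-balancing idea in the summary (moving federates from heavily loaded nodes to lightly loaded ones while each DRC stays put) can be illustrated with a small sketch. It assumes a toy model in which every federate has a fixed numeric load and a node's load is simply the sum of its federates' loads. The rebalancing rule is a generic greedy heuristic, not the migration policy from the thesis, and the names `Federate`, `Node`, `rebalance` and `drc_node` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Federate:
    name: str
    load: float      # hypothetical, fixed computational demand of this federate
    drc_node: str    # node hosting this federate's DRC (assumed never to move)


@dataclass
class Node:
    name: str
    federates: list = field(default_factory=list)

    def load(self) -> float:
        # Toy model: a node's load is the sum of its federates' loads.
        return sum(f.load for f in self.federates)


def rebalance(nodes):
    """Hypothetical greedy heuristic: repeatedly migrate one federate from the
    most loaded node to the least loaded node while that narrows the gap."""
    migrations = []
    while True:
        heavy = max(nodes, key=lambda n: n.load())
        light = min(nodes, key=lambda n: n.load())
        gap = heavy.load() - light.load()
        # Moving a federate of load w turns the gap into |gap - 2w|,
        # which is an improvement only when 0 < w < gap.
        candidates = [f for f in heavy.federates if 0 < f.load < gap]
        if not candidates:
            break
        # Choose the federate that brings the two nodes closest to equal.
        best = min(candidates, key=lambda f: abs(gap - 2 * f.load))
        heavy.federates.remove(best)
        light.federates.append(best)
        # Only the federate moves; in the decoupled architecture its DRC keeps
        # the federation connection, so best.drc_node is deliberately unchanged.
        migrations.append((best.name, heavy.name, light.name))
    return migrations


if __name__ == "__main__":
    nodes = [
        Node("A", [Federate("f1", 4, "A"), Federate("f2", 3, "A"), Federate("f3", 2, "A")]),
        Node("B", [Federate("f4", 1, "B")]),
        Node("C"),
    ]
    print(rebalance(nodes))
    print([n.load() for n in nodes])
```

In this example, f1 moves from A to C and then f3 moves from A to B, which leaves node loads of 3, 3 and 4. The loop always terminates because each accepted move strictly lowers the sum of squared node loads. The sketch deliberately ignores what the migration protocols address: how the federate's state is transferred and how much execution is paused while that happens.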