FIDUCIAL MARKER IMAGE RESTORATION WITH SUPER RESOLUTION FOR PRECISION LANDING GUIDANCE ON A QUADROTOR
Format: Theses
Language: Indonesia
Institution: Institut Teknologi Bandung
Online Access: https://digilib.itb.ac.id/gdl/view/86701
Summary: Unmanned aerial vehicles (UAVs) are finding increasing applications in domains such as mapping, agriculture, aerial surveillance, and disaster management. Landing accurately at the desired position is a crucial component of UAV operations. Because of the accuracy limitations of GPS-based landing guidance, computer vision is being investigated as an alternative for improving landing precision, especially for quadcopters. Vision-based landing requires reference points such as fiducial markers, for example ArUco markers. However, problems arise from the limited resolution of the camera and from motion blur caused by UAV movement: as altitude increases, the fiducial markers appear smaller and hazier, and motion blur further reduces image sharpness.
Single-frame super-resolution with deep learning offers two benefits: it reconstructs motion-blurred ArUco marker images and increases their resolution. Interpolation-based super-resolution techniques fall short at reconstructing images degraded by motion blur, whereas the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) handles this well. ESRGAN is trained on a dataset of high-resolution (HR) and low-resolution (LR) ArUco marker images. The LR images are motion blurred and downscaled from 448×448 to 112×112 pixels, one quarter of the HR image size. Given the blurred LR images, ESRGAN outputs 448×448-pixel images, matching the resolution of the original HR images. To improve the Structural Similarity Index (SSIM), an SSIM-based loss term (LSSIM) is added to the overall loss of the ESRGAN generator.
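The LR degradation described above (motion blur followed by a 4× downscale) can be sketched as follows. This is a minimal NumPy sketch, not the thesis's actual pipeline: the 7-pixel horizontal blur kernel and block-average downscaling are illustrative assumptions, and ESRGAN training itself is not shown.

```python
import numpy as np

def motion_blur_kernel(length=7):
    """Horizontal motion-blur kernel: averages `length` pixels along a line.
    (Kernel length and direction are illustrative assumptions.)"""
    k = np.zeros((length, length))
    k[length // 2, :] = 1.0 / length
    return k

def convolve2d(img, kernel):
    """Naive 'same' 2-D convolution with zero padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def degrade(hr, scale=4, blur_len=7):
    """HR -> blurred, 4x block-averaged LR, mirroring the 448x448 -> 112x112 setup."""
    blurred = convolve2d(hr, motion_blur_kernel(blur_len))
    h, w = blurred.shape
    return blurred.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
```

A 448×448 HR marker image passed through `degrade` yields the 112×112 blurred LR input that the generator learns to invert.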
The Structural Similarity Index (SSIM), the Feature Similarity Index (FSIM), and the energy spectrum of the image's frequency components were used to assess the performance of ESRGAN against bicubic interpolation. For this analysis, fifty 112×112-pixel ArUco marker images, each subjected to random motion blur, were used. Resolution enhancement and reconstruction with ESRGAN achieved an average similarity of 84.14% in SSIM, 82.80% in FSIM, and a total energy spectrum reaching 99.90% of the original image's. Super-resolution with bicubic interpolation yielded a similarity of 70.46% in SSIM, 63.20% in FSIM, and 92.73% in total energy spectrum, while bicubic interpolation combined with a Wiener filter produced 69.58% in SSIM, 62.49% in FSIM, and 92.53% in total energy spectrum.
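Two of the metrics above can be illustrated with a minimal sketch: a simplified single-window SSIM (global statistics rather than the standard windowed SSIM presumably used in the thesis) and the total spectral energy ratio computed from the 2-D FFT.

```python
import numpy as np

def ssim_global(x, y, L=1.0):
    """Simplified single-window SSIM over the whole image (illustrative;
    the standard metric averages SSIM over local windows)."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def energy_spectrum_ratio(restored, original):
    """Total spectral energy of the restored image relative to the original,
    from the squared magnitude of the 2-D FFT."""
    e_r = np.sum(np.abs(np.fft.fft2(restored)) ** 2)
    e_o = np.sum(np.abs(np.fft.fft2(original)) ** 2)
    return e_r / e_o
```

An identical pair scores 1.0 on both measures; the reported percentages correspond to these ratios averaged over the fifty test images.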
Image reconstruction and enhancement with ESRGAN's super-resolution produced better image quality, as seen in its image spectrum, which closely matches the original spectrum. In contrast, the bicubic-interpolation and Wiener-filter methods produced degraded quality, with darker spectrum regions, especially at the edges. Moreover, only the ESRGAN output had a total energy spectrum closest to the original image's, along with higher SSIM and FSIM values. This indicates that ESRGAN is better able to reconstruct and improve ArUco marker images affected by motion blur.
Keywords: super resolution, deep learning, reconstruction, ESRGAN.