Repository logo
Article

A novel adaptive checkpointing method based on information obtained from workflow structure

Loading...
Thumbnail Image

Date

Presentation Date

Editor

Other contributors

Access rights

Access: otwarty dostęp
Rights: CC BY 4.0
Attribution 4.0 International

Attribution 4.0 International (CC BY 4.0)

Other title

Resource type

Version

wersja wydawnicza
Item type:Journal Issue,
Computer Science
2016 - Vol. 17 - No. 3

Pagination/Pages:

pp. 387-406

Research Project

Event

Description

Bibliogr. s. 404-405.

Abstract

Scientific workflows are data- and compute-intensive, thus, they may run for days or even weeks on parallel and distributed infrastructures such as grids, supercomputers, and clouds. In these high-performance computing infrastructures, the number of failures that can arise during scientific-workflow enact- ment can be high, so the use of fault-tolerance techniques is unavoidable. The most-frequently used fault-tolerance technique is taking checkpoints from time to time, when failure is detected, the last consistent state is restored. One of the most-critical factors that has great impact on the effectiveness of the checkpointing method is the checkpointing interval. In this work, we propose a Static (Wsb) and an Adaptive (AWsb) Workflow Structure Based checkpointing algorithm. Our results showed that, compared to the optimal checkpointing strategy, the static algorithm may decrease the checkpointing overhead by as much as 33% without affecting the total processing time of workflow execution. The adaptive algorithm may further decrease this overhead while keeping the overall processing time at its necessary minimum.

Access rights

Access: otwarty dostęp
Rights: CC BY 4.0
Attribution 4.0 International

Attribution 4.0 International (CC BY 4.0)