Paul J. Holenstein Executive Vice President, Gravic, Inc.
Dr. Bruce Holenstein President and CEO, Gravic, Inc.
Dr. Bill Highleyman Managing Editor, Availability Digest
The need to back up data has existed as long as data itself. Sometimes the backup is needed for historical purposes, for example, to preserve a snapshot of the information in time. In other cases, it is used to maintain an accurate and up-to-date copy of the information if the primary copy is lost or corrupted.
Magnetic tape is the oldest backup medium still in use. It was introduced in 1951, but tape sales began to fall with the introduction of high-speed and high-capacity hard disks, DVD’s, CD’s, and other innovations such as cloud storage. However, utilizing magnetic tape is on the rise again. With so much big data created by mobile devices and IoT sensors, there is a growing need for an economical and efficient way to back up this data. Many companies are returning to tape to fill this need.
Physical Tape, Virtual Tape, and the Backup Problem
Magnetic tape, however, has its disadvantages. Physical tapes are bulky and handling large numbers of them is a cumbersome and time-consuming process, including shipping offsite or retrieving from storage. Tape is primarily a streaming medium, and it is relatively slow to access an arbitrary position to write or read the data stored on it. Inserting additional information often has to be appended at the end instead of in the middle where other related data may be stored.
To solve the most problematic of these issues, tape was virtualized to allow disks and other storage media to archive the information, thereby allowing automatic processing for recording or retrieving the information with high-speed supporting networks to more easily transfer the information offsite or onsite. Despite these advances, classic backup and restore methodologies using tape, virtual tape, or other technologies still suffer from numerous inefficiencies that must be overcome to allow backups and restores to function in the new big data environments. This article will discuss advances to address these issues.
In current systems, the volume of data being generated that needs to be backed up can easily overwhelm even the fastest virtual tape methods. The problem compounds itself if the body of data grows or quickly changes (big data volumes), and/or the database is constantly and actively being accessed to provide a critical service. We call these mission-critical databases and mission-critical services. In this article, we primarily focus on backing up and restoring transactional mission-critical databases, since these databases support most companies and organizations’ critical applications.
Most mission-critical databases cannot be taken offline, even briefly; therefore, enterprises must create backups of actively updated databases. Unfortunately, since transaction processing is active while the backup occurs, some of the data changes that are being backed up may abort and subsequently come undone, which means the backup has “dirty” data in it. Additionally, as the database is being backed up, the data that was previously backed up is being changed, causing an inconsistent backup.
Fortunately, methods have evolved over time to not only back up the database, but to also capture the subsequent change data that has occurred since the backup started (or completed) so that the inconsistent and stale copy can be made consistent and brought current when retrieved and restored.
How is an online backup process accomplished, and how can it be improved and made more efficient? Doing so would lead to faster backup and recovery methods, use less storage, and provide more consistent and current information when the backup copy is maintained and eventually restored.
Online Backup of an Active Database
A method to back up an active (“online”) database is needed to ensure that the backup is current, consistent, and complete:
- Current means that the backup is up-to-date and not stale. A snapshot of the data means that all of the data that was backed up is kept current to a specific point in time.
- Consistent means that the backup is accurate, (e.g., referential integrity is preserved; the so-called dirty data is removed).
- Complete means that the backup represents the entire database (or a specific/important subset of the data).
- Additionally, the backup should not consume more resources (such as disk or other persistent storage) than is needed to reconstruct the database – either to a point-in-time or to the current state.
The Traditional Backup Method
It is common practice (the Traditional Backup Method) periodically to back up a database onto a medium such as magnetic tape, virtual tape, cloud infrastructure, solid-state storage, or other persistent storage as shown in Figure 1 1. Throughout this article, the use of the phrase tape for the backup copy medium is meant to include all of these storage medium locations and technologies and is not meant to limit the reference to just classic electronic tape technologies2. The use of the word tape or phrase backup medium implies a persistent storage device.
1 For more information on the rise and fall and rise again of tape, please see the Availability Digest article,
2 The recent advances in tape density, writing and reading speeds, and the longevity of tape media over other storage technologies has reinvigorated the use of tried-and-true physical tape for saving copies of information for long periods of time.
As shown in Figure 1, a backup is taken of a source database (2) while it is actively supporting transaction processing (3). Thus, the source database is changing as the backup takes place. This is known as an online backup (4).
The problem with an online backup is that it takes time to complete, and changes are occurring to the database during this time. Data written to the backup could be changing, and if the transaction aborts, the changes will be undone. Data written early in the backup phase is missing subsequent changes, but data written later in the backup contains more of the application’s changes. Therefore, the data in the backup is inconsistent. The classic method to resolve this issue is to capture all changes made to the database while the backup occurs, and eventually to replay them over a subsequently restored copy of the database to “roll” it forward to make it consistent and current.
More specifically, in order to restore a consistent (e.g., from a relational perspective, logically complete and usable to applications) database on a target system, the changes that are occurring during and following the backup must be written to a persistent change log such as an audit trail, a redo log, a journal, or equivalent data structure. In Figure 1, the oldest changes were written to Change Log 1 (5) and the newest changes to Change Log 4 (6).
The restore process then typically involves marking the persistent change log via various methods to note the time or relative position in the change log at which the backup began (7). The database is restored onto the target system by loading the backup copy onto it, and the pertinent change logs are sequentially rolled forward (8) to apply the changes that occurred after the backup started in order to make the target database current, consistent, and complete.
In Figure 1, the pertinent change logs are Change Logs 2, 3, and 4. (Change Log 1 was created before the backup began, and its changes are already reflected in the source database and were captured by the backup operation at the time the backup began.) Therefore, in Figure 1, once the backup copy has been loaded onto to the target database, the changes in Change Logs 2, 3, and 4 must be applied to the target database to bring it current and to a consistent state. It at least must be brought current to the time that the backup operation ended, since additional changes were likely made to the source database after the backup ended.
A problem with this technique is that several change logs may be required to hold the changes that occurred during the backup. For a very active source application with many changes occurring per second, there may be many such change logs required to hold all of the changes that occurred during the backup. These change logs all must be saved and made available (typically very quickly) if a restore sequence is needed.
For instance, as shown in Figure 1, Account 374 initially is backed up with an account value of $10. This change was made in log file 1, which occurred before the backup began. Account 374 subsequently is updated by the application to $74, then $38, and finally to $92; this sequence is reflected in the log files. These values are applied to Account 374 as the roll forward takes place. More specifically, the restore writes the initial value of account 374 from when the original backup occurred ($10). The log files then replay in succession, starting with log file 2, then log file 3, then log file 4 as shown in Figure 1. Unfortunately, the old values for this account replay before ultimately ending at the correct account value of $92. Besides being a lengthy process, which also requires a lot of storage for the log files, any access to the database during this time experiences old and inconsistent information while the replay of the data occurs. If the original database fails, denying users access to this information during this time will prolong an outage.
Furthermore, as shown in Figure 2, many of the changes that occur during the backup operation already may have been captured by the backup if they occurred after the backup operation started, but before those particular data objects (or part of the database) were copied to the backup medium. Thus, these changes are a duplicate of data that already was backed up. Worse, there could be a series of changes to the same data that occurred after the backup began, but before that data was subsequently backed up, and rolling forward through those changes will actually cause the restored data to reflect older (and inconsistent) values while it is being rolled forward, as shown in Figure 2. Account 374 starts off at $10 (when the backup starts), is updated to $74, then $38, and finally to $92; however, it is not backed up until it is $38, as represented by the change captured in log file 3. Using this method of restore and roll forward, Account 374 is initially restored from the backup to $38, but then is updated to old account values ($74 in log file 2, then $38 in log file 3, then $92 in log file 4) while all of the log files are processed and the changes are rolled forward.
Consequently, restoring a backup requires rolling forward through several change logs, which may take a great deal of time and consume a great deal of storage medium resources for all of the change log files. Furthermore, rolling forward through all of the changes that occurred during the backup makes the restored data out-of-date and inconsistent until the final set of changes are replayed from the log file(s). Additionally, during this process, the source database is still being updated; these changes must be logged and rolled forward to update the restored backup to a current and consistent state to when the backup operation ended. All of this processing takes a considerable amount of time to accomplish.
The Better Backup Method
The Better Backup Method is shown in Figure 3. It is similar to the Traditional Backup Method shown in Figure 1 in that the contents of the source database (2) are written to a backup medium (1).
The Better Backup Method – Change Logs
Since the source database is actively being updated, restoring it from the backup medium does not provide a consistent database, because some of the data may be dirty, and changes made to that portion of the source database that were previously backed up are not included in the backup copy. These changes must be captured in a change log and applied to the restored version in order to make it consistent, current, and complete.
The Better Backup Method recognizes that changes for data that are not backed up yet do not have to be written to a change log. These changes were made to the data in the source database and will be carried to the backup medium when they are written to that medium as part of the backup operation. Thus, the consistency of the backup database is preserved without having to roll forward these changes.
The Better Backup Method – Database Restore
During the restore process, the captured changes in the change logs must be rolled forward to the restored copy of the backed-up database. In Figure 3, Change Log 1 (3) contains changes that were made to the source database before the backup began (4). Therefore, its contents do not have to be rolled forward to the backup copy of the database when it is restored. However, Change Log 2 (5) contains some changes that were made to the source database following the initiation of the backup; and these changes must be rolled forward to the restored backup copy to make the database consistent. Once the changes have caught up with the online backup, there is no further need to log changes and to roll them forward. All changes to the source database will be included in the online backup data stream (6), guaranteeing the consistency of the backup database. Therefore, Change Logs 3 and 4 (and perhaps some changes in Change Log 2) do not have to be saved nor applied to the backup when it is restored.
Note that during the restore process, the database is not in a consistent state; it is made consistent once all of the changes in the change log are rolled forward to it. Thus, the restored database eventually is consistent, current, and complete, which is also known as eventual consistency.
Also, note that the data being restored is not going to revert to previous values during the restore process. For instance, assume that the backup begins at time T1, and data D1 is changed after T1 to D2, then to D3, then to D4. This data object backs up at time T2 when its value is D2. The classic approach backs up D2, then rolls forward changes and sets it back to D1 (as that is the first change restored), then D2, D3, and finally D4. Therefore, the database is very inconsistent during the restore process and in fact, is rolled back to a previous value when D1 is applied.
One alternative approach is to capture the database at D2 and not replay the D1 or D2 changes, and only replay the D3 and D4 changes. Over time, the database is consistent; it resets to older values than the final value, but not older than the initial value. Another alternative approach is to capture D2 and then overlay it with D3 and later D4 (either in the change log or the backup copy itself) before beginning the restore process.
To resolve backed up dirty data, either aborted information is removed from the logs during replay, or the dirty data is overwritten by the eventual “backout” data that is written when a transaction aborts. Removing the aborted information is a simple process if the logs are read in reverse, as discussed later, or if a list of aborted transactions is maintained along with the change logs so that when the change logs are applied (rolled forward), any aborted transactions can be skipped.
Only portions of the change logs that are required under the Traditional Backup Method are needed in the Better Backup Method. The fewer the change logs, the less processing is required to create them and the less storage is required to save them. Perhaps even more importantly, the fewer the change logs, the less time is required to roll them forward, and the online backup/restore processing becomes much faster and more efficient. Additionally, the restored data goes through fewer data consistency issues (and in some implementations no issues) while it is being restored to a current and complete value.
Performance and Efficiency Improvements
An improvement in performance and efficiency can be achieved by saving only the last change to a specific data object that is being modified multiple times, as shown in Figure 4. In the figure, only the most recent change to a particular data item is shown; previous changes to that same data item are removed. More specifically, if a change is made to a data object that was previously changed, the first change can be located in the change log and replaced with the new change. If the first change previously was backed up, it can be located on the backup medium and replaced with the new change.
Alternatively, changes to previously backed up data directly can be made to the backup medium as shown in Figure 5. This method eliminates the need for change logs and roll-forward operations.
Another potential performance improvement can be achieved by reading the log files in reverse during the backup, and eliminating any data for transactions that abort as well as only saving the most recent (committed) change for each data item encountered.
In a similar manner, the backup operation can physically process the source database, block by block, rather than logically processing it by ascending (or descending) key path or some other logical or physical order (as mandated by the technology being used). This physical process can make the determination of whether to save a change that has occurred, since the backup is much faster. More specifically, using a physical path (such as the physical order the blocks appear in the file) to access the data is often much faster than using a logical path (such as an index tree) to access the data when the backup is initially taken.
The Continuous Backup Method
The Continuous Backup Method provides the capability to continuously save further changes made to the source database after the backup is taken in a persistent change log. As the backup copy is initially copied, any changes that were made to the previously copied portion are written to the continuous backup change log. Thereafter, all further changes to the source database also are written to the continuous backup change log. The backup copy becomes consistent, current, and complete at that (and every) point in time by continuously rolling forward the changes in the continuous backup change log to the backup copy.3 When it is time to restore the database, the backup copy simply is written to the target database to bring it consistent, current, and complete.
3Of course, performing a continuous backup starts to approach the availability and consistency/completeness of using a classic data replication engine to create and maintain the backup copy. While we advocate using data replication techniques to provide a viable backup copy of your production database (visit www.ShadowbaseSoftware.com/solutions/business-continuity/ for such a data replication engine implementation), we understand that some customers will continue to require backup copies via the more traditional methods, especially for creating snapshot point-in-time copies of data. We hope that the new methods discussed in this article will help improve state-of-the-art solutions for such backups.