Students who have been away for the summer may be hearing from their professors about the campus-wide data crash that interrupted research and reportedly even delayed some graduations.
The problem started over the Memorial Day weekend when a bottom-up software upgrade at the BYU Data Center went awry, corrupting data and locking users out of data across campus. Some problems, like interruptions in email service, were short-term, but BYU’s Office of Information Technology is still working on data recovery, and going through the difficult process of assessing the importance and recovery costs of large volumes of computer data.
“Priority is a difficult thing to determine,” said Nyle Elison, OIT product line manager, who said several major factors determine priorities for data recovery. “What’s important to me probably isn’t important to you. But certainly, it’s priority and the greatest number of people affected first.”
The university is closer to knowing the scope of the data failure but has not released any information about the costs, which include damaged computer hardware, personnel time, lost opportunity costs, missed deadlines for grants and the costs of recovering data from damaged hard drives.
An off-site data recovery center worked on the damaged servers soon after “the event” — as Elison has named the large crash. A trial run of 20 terabytes was first repaired to see whether any data could be salvaged. With the successful return of that data, other servers were sent to specialized data recovery firms.
Data recovery centers near BYU estimated what they would charge to recover 20 terabytes of data spread over multiple hard drives on a Windows system. More technical details would come into play, each center said, but overall the estimated price was well into the thousands of dollars, with a turnaround time of two to three weeks for each job.
The cost can increase dramatically depending on the number of damaged drives the data is spread over. Based on estimates from local recovery centers, recovering data from one drive costs between $2,000 and $5,000. If the 20 terabytes were spread over 20 one-terabyte drives in a cluster, the cost could be anywhere from $40,000 to $100,000.
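The arithmetic behind that range can be sketched in a few lines of Python. This is only an illustration of the per-drive estimate above; the function name and its parameters are made up for the example, and it assumes the cost scales linearly with the number of drives, which actual recovery quotes may not do.

```python
def recovery_cost_range(num_drives, low_per_drive=2_000, high_per_drive=5_000):
    """Return a (low, high) dollar estimate for recovering num_drives drives.

    Assumes a flat per-drive price between $2,000 and $5,000, the range
    quoted by local recovery centers, with no bulk discount or surcharge.
    """
    return num_drives * low_per_drive, num_drives * high_per_drive


# 20 terabytes spread over 20 one-terabyte drives
low, high = recovery_cost_range(20)
print(f"${low:,} to ${high:,}")  # prints "$40,000 to $100,000"
```

The same data stored on fewer, larger drives would give a lower estimate under this assumption, which is why the recovery centers said they would need to see the hardware before quoting a firm price.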
“Obviously it would depend on the drive and I would need access to the drive to see for sure,” said Yu Chao, owner of Complete Data Recovery Service in Provo. “But I wouldn’t be surprised if it was up to fifty, sixty, seventy thousand dollars to recover that data.”
Brent Jackson, an engineer with Advanced Data Recovery in Provo, said they recently dealt with a case involving 35 to 40 terabytes of information and it took them more than two weeks to complete the data recovery process. He estimated the cost of that case to be around $50,000.
Dave Robinson, vice president of marketing for the online backup service Mozy, said he didn’t want to sound “too geeky,” but examples like this are why it’s smart for every person and business to keep backups off-site.
“If your server failed, it’s not too much money to get your data from a remote location,” Robinson said. “When you get into the second scenario, when you lose the back-up, that’s when it becomes a very expensive endeavor.”
The Data Center houses much of the backup data on campus, but about 30 campus departments manage their own backups, each using its own procedures. The amount of damage each department experienced depended, in part, on how its backups were being managed. The different data management practices have significantly complicated OIT’s work in the recovery phase.
Elison said potential failure in the physical drive recovery process is the reason OIT sent only one server to be fixed.
“This recovery I know of probably 10 to 12 servers so far that need to be sent off-site,” Elison said in June. “So far we have sent one, simply because we wanted to make sure this was going to work. It’s fairly expensive to do; we want the data back but before we said ‘do them all’ we wanted to make sure it worked.”
The process of recovering data depends entirely on what failed in the server, Jackson said. A usual procedure begins in the “clean room,” where the failed drive is physically cleaned. Next, sector-by-sector copies of each drive are made on either one larger hard drive or multiple smaller hard drives. An engineer then rebuilds the drive, working through the complex puzzle piece by piece. Finally, the files are tested to verify the data has been recovered.
OIT has projected its repair work will continue at least through the end of the year. Departments that lost their summertime research window have said they have projects that will be delayed at least for a year.