Fixing BYU’s massive data crash will take months


Some long-term effects of a major BYU computer system meltdown are still unknown after more than two weeks of around-the-clock work to repair damage from a failed software upgrade.

Most university-wide systems are functioning, but the university’s Office of Information Technology continues to work around the clock with individual departments on campus.

Gary Glade, desktop support manager for FHSS Computer Services, said he heard from students whose inaccessible projects posed problems for graduation, and professors who could miss deadlines and lose grants because their research was now unavailable.

Visual Arts secretary Sonya Schiffman said the department’s website was down during the BFA animation online application deadline. The application program is hosted on an off-campus server, but because the website was down, students had to contact the department directly to get the link.

“We may have missed some students,” Schiffman said. “If I didn’t have their email, I don’t know who’s having problems.”

Effects from the server outage varied across campus because of different data management strategies.

“The university Data Center is divided into two large domains,” said Tracy Flinders, Managing Director of the Office of Information Technology. “One is the managed domain, where core enterprise university systems are actively managed by OIT. The other domain, referred to as the independent domain, is where email and some department and college data resides.”

The May 27 software upgrade had been planned for some time, scheduled when campus activity and computer activity were minimal: a Sunday evening on a holiday weekend after the end of the core academic year. Changes at the basic storage level had the potential to affect the entire base data layer. Engineers were on hand as the software upgrade began. Problems began popping up quickly.

“We noticed error messages between the storage layer and the virtualization layer as the upgrade process began,” Flinders said. “As you can imagine, the storage layer goes underneath everything, so if you have a failure at that low level then it ripples through the rest of the system. That’s essentially what happened during the upgrade process.”

Problems were immediate and widespread. Sleeping bags and food were brought into OIT headquarters as around-the-clock fixes began.

Some critical data had been secured before the upgrade. Additional data was restored quickly because recent backups were available. Programs like payroll and the BYU.edu homepage were largely unaffected, Flinders said.

The independent domain, housing email accounts and college and department data, was more widely affected. According to Flinders, a significant percentage of @byu.edu email accounts were unavailable for between 24 and 48 hours. Voicemail, learning outcomes and teaching assessments were also among the affected services.

Flinders estimates another week will pass before engineers can turn full attention to the core causes of the upgrade failure. He said there is not yet an accounting of the effects on the many independently managed data systems on campus — each of which has unique data management and data backup practices.

“The OIT data in the managed domain has a very rigorous backup schedule,” Flinders said. “I would say the same is true for most of the departmental data. There were pockets where a backup may not have occurred recently. My experience with most of the departments is that many of them have fairly mature backup strategies, but it is likely that there were some that were a little less mature.”

Flinders believes permanent data loss is “minimal,” but recognizes that any data loss is unfortunate.

“We believe a very high percentage of the data will be able to be restored, but it is going to take weeks, or perhaps months, to work through this process,” Flinders said. “We recognize that all data is critical to those who rely upon it and recognize the pain associated with any data loss. The critical university data is protected in the managed domain and has been restored. It’s the departmental data that we’re trying to get back as much as possible.”

Flinders said Monday that OIT staff continue to work around the clock.

A report on BYU’s homepage chronicles individual computer system problems and tracks fixes. As of Monday, voicemail, Life Sciences resources, FHSS resources, LDS Philanthropies’ U drive and P drive, Counselling and Career Center U drive, IPTV, Fine Arts resources, Student Life resources, Active Directory, Digital Dialog, Faculty Center S drive, visualarts.byu.edu, Student Financial Services’ S drive, internationalservices.byu.edu and BOB Portlet may still be experiencing problems. The OIT website states that workers will next begin recovery for FHSS and Life Sciences.

For the latest information, OIT has a web page that posts updates to BYU services as they happen. Campus departments still experiencing problems are advised to call the IT Service Desk at 801-422-4000 or contact it by email.
