BYU Law creates language database to help interpret Constitution

BYU Law Dean Gordon Smith shares at a corpus linguistics conference. (Lynette Rands)

The Constitution is America’s central legal document. However, it was written a long time ago, and language has since evolved. Changing language can make the law difficult for lawyers and judges to interpret.

What does it really mean to “bear arms”? How should readers understand the phrase “high crimes and misdemeanors”? BYU Law created a database to help answer questions like these.

This database is called the Corpus of Founding Era American English, also known as COFEA. “Corpus” refers to a collection of written texts on a particular subject. The corpus holds founding-era documents that legal professionals can use for free to make informed legal decisions.

BYU linguistics professor Mark Davies creates various corpora for the linguistics department and was involved in the beginning stages of the corpus. 

“We have all these words in the Constitution — words and phrases that, 200-250 years later, we don’t really know what they meant at that time. We can’t go in a time travel machine to go back 240 years, but what we can do is scoop in hundreds of millions worth of text from that time and say, oh well, when people were using a word or phrase, they were using it in this context,” Davies said.

D. Gordon Smith, dean of the J. Reuben Clark Law School, said the idea for the corpus began with a simple discussion.

Stephen Mouritsen, a former BYU law student who also has a master’s in linguistics, asked Smith about the difference between the words “stockholder” and “shareholder.” They found that though the words are used interchangeably, they mean different things. 

After Mouritsen graduated and began working as a lawyer, Smith became the dean of the J. Reuben Clark Law School and worked to develop a course based on corpus linguistics in law. Mouritsen taught the course remotely via Skype. Smith began to blog about the idea and received positive feedback from colleagues at institutions like Georgetown University.

The concept of corpus linguistics was also embraced by Utah Associate Chief Justice Thomas Lee.

In Utah, corpus linguistics was used in the case State v. Rasabout, in which the court had to interpret the word “discharge” in a statute that makes it a class B misdemeanor to “discharge” a firearm from an automobile. Because several rounds were fired, it was difficult to determine what “discharge” actually meant. Lee was able to use corpus linguistics to accurately sentence the perpetrator.

After the success of the case, Lee began to visit other states’ supreme courts to introduce the database and train judges on how to use it.

Based on decisions like this, BYU Law began to develop the corpus. According to Smith, it has been adopted by supreme courts in Michigan, Idaho, and most recently, Utah. Federal courts have started to use the database as well.

Despite all of its success, the database is still in its beta form. Brett Hashimoto, a corpus linguistics research fellow for BYU Law, is currently working on further developing the corpus. The database is still available for use in its present state.

“One project that I’m involved with right now involves what is called linguistic canons of statutory interpretation,” Hashimoto said. “These canons are basically the guidelines by which many judges interpret the law.”

BYU holds annual conferences about corpus linguistics and continues to develop its research.

The Corpus of Founding Era American English can be found at
