
Leavey MSBA Students Discover AI’s Power as a Financial Tool Benefits from Helping Human Hands
When it comes to measuring financial sentiment, large language models (LLMs) outperform traditional tools used by data scientists, with a caveat: They still require careful and deliberate guidance from people to work properly.
That’s the high-level summary of “Can AI Read Between the Lines? Benchmarking LLMs on Financial Nuance,” research posted to the arXiv preprint repository in May 2025. The work comes not from seasoned faculty researchers but from an impressive group of master’s students at the Leavey School of Business.
“LLMs are so powerful for financial analysis, but they need hand-holding,” says Dominick Kubica, one of the student researchers. “They require domain expertise. That human element really is one of our key takeaways.”
The research paper emerged from students in Leavey’s master’s in business analytics (MSBA) and master’s in information systems (MSIS) programs, specifically the programs’ practicum, in which students work directly on current business challenges with established companies such as Microsoft, Google, and Meta, or with local startups. In this case, a group of four students worked directly with Microsoft under the tutelage of Leavey lecturer and accomplished consultant Charlie Goldenberg.
Kubica and teammates Nanami Emura, Derleen Saini, and Dylan Gordon were presented with a broad topic for the practicum, then worked directly with Microsoft and Goldenberg to narrow its focus to an important research question: Can today’s LLMs handle the nuance of financial sentiment?
LLMs are a form of AI; examples include ChatGPT, Gemini, and Microsoft Copilot. They’re increasingly popular in a variety of fields, but research into their effectiveness is still new. So the team set about benchmarking, comparing the models with more traditional sentiment-analysis tools such as Python libraries. Importantly, most traditional tools require coding skills to use effectively. If LLMs proved effective, they would be a more widely accessible option for assessing financial texts.
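To make that contrast concrete (the team’s actual code and tooling aren’t reproduced here), a minimal sketch of the two routes might look like the following, assuming NLTK’s VADER as a stand-in for the “traditional” Python library and a placeholder call_llm() helper for whichever chat model is under test:

```python
# Illustrative sketch of the two approaches being benchmarked.
# Assumptions: NLTK's VADER stands in for the "traditional" Python tooling,
# and call_llm() is a placeholder for an API call to ChatGPT, Gemini, or Copilot.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

sentence = "Margins narrowed this quarter, but management reiterated full-year guidance."

# Traditional route: a lexicon-based scorer, fast and free but blind to financial nuance.
vader = SentimentIntensityAnalyzer()
print(vader.polarity_scores(sentence))  # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}

# LLM route: no coding beyond the prompt, but the model needs explicit instructions.
prompt = (
    "You are a financial analyst. Classify the sentiment of this earnings-call "
    "sentence as positive, negative, or neutral, and briefly justify the label:\n"
    + sentence
)
# result = call_llm(prompt)  # hypothetical helper wrapping the model's API
```

The point of the sketch is simply the asymmetry the team describes: the library needs a programmer, while the model needs a careful prompt.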
“We ultimately came at it from two fronts, saying LLMs really are impressive and are performing the best,” Gordon says. “But then we also had to bring people back to reality, noting that you can’t just depend on them fully yet. We detailed in documentation all the times we struggled, all the times AI lied to us or hallucinated. It struggled with certain formats, post-processing errors, pre-processing errors, you name it.”
Emura provides a specific example: “When I tried to split a set of transcripts up into different business lines, simple prompts like ‘Split this by business line’ did not work. The Microsoft team advised me to do more detailed prompts and more structured instruction, and then we were able to get better results. That made me realize that the effectiveness of LLMs really depends on how we guide them.”
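Her exact prompts aren’t reproduced in the article, but the difference she describes might look something like this sketch; the business-line labels and output format are illustrative assumptions, not the team’s actual instructions:

```python
# Hypothetical contrast between the vague and structured prompting styles
# described above; the segment names and the JSON format are assumptions.
vague_prompt = "Split this transcript by business line."

structured_prompt = """You are analyzing a quarterly earnings-call transcript.
Split the transcript into sections by business line, using exactly these labels:
- Productivity and Business Processes
- Intelligent Cloud
- More Personal Computing
Label any passage that fits none of these as "Other".
Return JSON: a list of objects with keys "business_line" and "text".
Quote the transcript verbatim; do not paraphrase or summarize."""
```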
Ultimately, the team’s research showed that LLMs outperform traditional models for measuring financial sentiment. But the paper also calls for more attention to enterprise-level needs and to making sure different systems communicate effectively, and it underscores, of course, the continuing need for and value of human expertise.
Real Problem, Real Results
Perhaps even more important than the results of this research — at least for the students involved — was the experience of creating something “real” beyond the classroom.
“It’s rare for a practicum project to progress to publication,” says Haibing Lu, co-chair of the information systems and analytics department. “This team did an amazing job.”
The team also gained a comprehensive experience that mimics what they’ll face as they earn their degrees and head into the workforce, from working effectively with others to presenting results to a group of stakeholders.
For instance, they learned how to divide and conquer based on each team member’s strengths and expertise. They had varying levels of experience with both financial sentiment and LLMs, but each member brought a unique perspective to the table. Gordon drew on his background in biochemistry to help shape the research question and methodology. Kubica put his coding skills to work building an app to carry out the research. In addition to the data analysis and troubleshooting mentioned above, Emura reached out to professional journals to get the research published. And Saini used her data analytics and presentation skills to help streamline exactly what the team put in front of the Microsoft team, ensuring it was clear for people with varied levels of technical understanding.
“Charlie [Goldenberg] really helped us narrow down the work we were doing and figure out the insights we wanted to present,” Saini says. “In a presentation, you have to be able to back up your research and your claims in a way that even nontechnical people can understand.”
At the final presentation, their conclusions came through so clearly that the Microsoft team asked whether the students planned to publish their work. Not only did Microsoft find the insights valuable as a company; they thought others in the industry would, too.
“We wanted to make sure we came across as professional and brought real value to the table,” Gordon says, “so their response really gave us some confidence that we’re on par with these working professionals, that we can be part of a back-and-forth conversation about challenges, and that we can be respected for our thoughts and work.”
“Real” is a common word in discussions of the practicum as a whole and this research in particular. Reality is exactly what it provides: real business challenges, real presentation skills, and real outcomes.
“To solve a real problem is a key to this education,” Lu says. “AI is everywhere, and AI literacy is incredibly important. A project like this benefits the students, and it helps prove to the outside world that we are a leader in this space.”