Can LLMs Reliably Support Domain-Specific Research?
In our previous blog, we delved into advancements in data management and analysis tools with Microsoft Fabric and Copilot in Power BI. As we continue on this journey of harnessing the rapid evolution of machine learning and artificial intelligence for international development projects, we now shift our focus to the transformative capabilities of Large Language Models (LLMs) and their potential impact on domain-specific applications. In doing so, it is essential to acknowledge that text-focused machine learning has been a subject of research and development for years. From sentiment analysis to text classification, our daily digital experiences are heavily shaped by computers’ abilities to understand text. However, the recent breakthroughs in LLMs, such as ChatGPT and others, have propelled the field forward, enabling us to explore new possibilities in understanding and processing language.
In this blog, we will take a look at some of the opportunities and risks associated with the application of LLMs, and discuss how we're exploring the development of domain-specific models that can ingest both unstructured documents and structured data, enabling powerful insights through conversational AI.
Identifying the Possibilities and Risks for Research
Large language models offer a powerful solution for analyzing both structured and unstructured data, presenting new opportunities for data-driven decision-making. Traditionally, structured data with predefined formats has been easier to analyze and derive insights from. However, unstructured data, such as text documents and social media posts, contains valuable information that often goes untapped. LLMs excel at processing and extracting insights from unstructured data sources, enabling organizations to gain a comprehensive understanding of complex phenomena. That said, they're not the first tools to offer text analysis solutions.
When it comes to qualitative data analysis, tools like NVivo have been widely used to analyze unstructured data in a structured manner. NVivo allows researchers to organize, code, and analyze qualitative data, such as interviews, surveys, and open-ended responses. By leveraging NVivo's capabilities, researchers can extract meaningful insights, identify patterns, and explore themes within their qualitative data. While the comparison may not be apples to apples, from a user's perspective, the ability to manage large volumes of text-based data, explore themes, and summarize information certainly has its parallels. Furthermore, ATLAS.ti has recently released automated coding solutions for qualitative research, leveraging OpenAI to develop tags. However, I'm not the only one with deep concerns about using opaque large language models to manage qualitative data analysis coding tasks. And unfortunately, the risks associated with LLMs as research tools have extended into multiple sectors.
Beyond the identification of key themes in text, the recent case involving law professor Jonathan Turley highlights the serious risks of relying solely on general-purpose LLMs for domain-specific research. These are not semantic search engines that surface evidence-backed journalism and research, but tools that predict the next word in a sequence. In this instance, an AI chatbot fabricated a sexual harassment allegation against Turley, citing a non-existent article from The Washington Post. This incident underscores the importance of acknowledging these models' inherent risks and biases, prompting us to explore ways to leverage their power while ensuring reliable information delivery.
Risk Mitigation Strategies
To address the inherent risks and biases associated with LLMs, it is crucial to employ methods and approaches that promote reliable information delivery. Here are some key strategies to consider:
Curated Training Data: Building LLMs on curated and verifiable datasets specific to the domain of interest can enhance the reliability of generated insights. By carefully selecting and curating training data, we can mitigate the risk of exposing models to outdated or biased information.
Domain-Specific Fine-Tuning: Fine-tuning LLMs on domain-specific data and incorporating expert knowledge allows for greater accuracy and relevance in the generated outputs. By tailoring the models to the specific needs of the domain, we can ensure that the insights produced are aligned with the requirements and nuances of the field.
Evaluation and Bias Mitigation: Rigorous evaluation methodologies should be employed to assess the performance and biases of LLMs. By continually monitoring and identifying potential biases, we can take proactive steps to mitigate them, ensuring that the generated information is reliable and unbiased (a minimal sketch of such an evaluation harness follows this list).
Collaboration and Peer Review: Engaging in collaboration and subjecting the models to peer review processes can enhance their reliability. By seeking input from domain experts and researchers, we can incorporate diverse perspectives and refine the models based on collective knowledge and scrutiny.
Transparency and Accountability: Emphasizing transparency in model development and sharing information about the data sources, training methods, and limitations helps establish accountability. Openly addressing the strengths and limitations of the models fosters trust and enables users to make informed judgments about the reliability of the generated insights.
Continuous Improvement: Commitment to ongoing research, development, and refinement of LLMs is essential to improve their performance and address any identified shortcomings. Regular updates and advancements based on feedback and new insights contribute to the production of more reliable information over time.
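To make the evaluation strategy above concrete, here is a minimal sketch of what a domain-specific evaluation harness might look like. Everything in it is an illustrative assumption: `query_model` stands in for whatever LLM you deploy, the QA pairs stand in for a curated, expert-reviewed test set, and the lexical overlap score is a deliberately crude placeholder for richer metrics and human review.

```python
# Minimal sketch of an evaluation harness for a domain-specific LLM.
# Assumptions: `query_model` is a hypothetical callable wrapping the
# deployed model, and the QA pairs come from a curated, expert-reviewed set.

def token_overlap(answer: str, reference: str) -> float:
    """Crude lexical overlap between a model answer and a reference answer."""
    answer_tokens = set(answer.lower().split())
    reference_tokens = set(reference.lower().split())
    if not reference_tokens:
        return 0.0
    return len(answer_tokens & reference_tokens) / len(reference_tokens)

def evaluate(query_model, curated_qa_pairs, threshold=0.5):
    """Flag answers that diverge from curated references for human review."""
    flagged = []
    for question, reference in curated_qa_pairs:
        answer = query_model(question)
        score = token_overlap(answer, reference)
        if score < threshold:
            flagged.append({"question": question, "answer": answer, "score": score})
    return flagged

# Placeholder examples of expert-written QA pairs from a domain corpus.
curated_qa_pairs = [
    ("What mediation approach did the 2019 assessment recommend?",
     "Community-level dialogue facilitated by trusted local leaders."),
    ("Which indicators signaled rising tension in the baseline study?",
     "Displacement rates, market disruptions, and inter-group rhetoric."),
]
```

Even a simple harness like this turns "continuous improvement" from an aspiration into a loop: every flagged answer becomes a candidate for curation, fine-tuning data, or a documented limitation.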
By adopting these methods and approaches, we may be able to reduce the bias of LLMs and improve the delivery of reliable information. But with LLMs seemingly concentrated in the hands of technology giants such as Google and Microsoft, what are our options? And are the efforts to improve language models worth it? I tend to think so.
Implications for Domain-Specific Work
The potential of reliable large language models extends to various domains, offering transformative data analysis and decision-making possibilities. By combining advanced language processing capabilities with domain-specific knowledge, LLMs could empower researchers, policymakers, and practitioners to navigate complex challenges.
At exchange.design, our exploration of LLMs in domain-specific work has recently focused on the realm of conflict analysis research. While it may seem unconventional to apply artificial intelligence in an inherently human field (never mind the use of remote sensing image classification for a variety of conflict-related tasks), the nature of conflict management work presents significant challenges in digesting vast amounts of information about people, social groups, events, identities, and lessons learned from past efforts. What if we could triangulate lessons learned from hundreds of documents and bring out insights that are most relevant to our current work? How could LLMs be used to make sense of public statements (beyond sentiment analysis) and synthesize their change over time? What are the implications (and risks) of text-to-SQL for putting data analysis tools in the hands of non-analysts?
In response to these questions, we are developing conversational AI tools capable of ingesting unstructured documents and structured data. These models enable dynamic conversations with a transparent body of information, facilitating an exploration of heterogeneous data sources and unlocking powerful insights. This goes far beyond semantic search and into the realm of information summarization. Our current approach to bias minimization centers on curated training data and domain-specific fine-tuning. Frameworks such as LlamaIndex and LangChain have put these capabilities into the hands of developers, enabling the creation of conversational AI tools that integrate with multiple LLMs. We're early on this journey but are already seeing impressive results.
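For a sense of what document ingestion and conversational querying look like in practice, below is a minimal sketch using LlamaIndex's high-level API. The folder name and question are placeholders, and import paths vary by version (newer releases import from `llama_index.core`), so treat this as illustrative rather than our exact pipeline.

```python
# Minimal sketch of document ingestion and querying with LlamaIndex.
# Import paths are version-dependent; this reflects the high-level API.
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Ingest a folder of unstructured documents (PDFs, Word files, plain text).
documents = SimpleDirectoryReader("./conflict_reports").load_data()

# Build a vector index so queries retrieve relevant passages, grounding
# the LLM's answers in the curated corpus rather than its training data.
index = VectorStoreIndex.from_documents(documents)

# Ask a question in natural language; the engine retrieves supporting
# text and asks the underlying LLM to synthesize an answer from it.
query_engine = index.as_query_engine()
response = query_engine.query(
    "What lessons learned recur across past mediation efforts?"
)
print(response)
```

The key design choice here is retrieval: because answers are synthesized from retrieved passages in a known corpus, they can be traced back to source documents, which is what separates this approach from asking a general-purpose chatbot cold.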
Acknowledging Risks, Advancing Possibilities
To be honest, rarely a day goes by when I'm not completely taken aback by the power of these technologies. As LLMs and their associated tools continue to evolve, it is evident that they are here to stay. The incidents and challenges discussed earlier highlight the importance of understanding their limitations and biases. But rather than dismissing these technologies, we must embrace them with a proactive mindset. By acknowledging the risks, investing in research and development, and implementing robust methodologies and data governance strategies, we can harness the power of LLMs while ensuring more reliable and less biased information delivery.
In a future blog, we may delve further into the fascinating realm of LLMs and their incredible power to transform natural language into SQL. With this capability, LLMs bridge the gap between human language and the structured world of databases, enabling users to query and manipulate data using plain, conversational language. This breakthrough opens up new possibilities for data analysis, making it more accessible and intuitive for a broader range of users. Stay tuned as we explore the transformative potential of LLMs in unlocking the power of natural language querying for domain-specific work.
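As a preview of that pattern, here is a minimal, hedged sketch of a guarded text-to-SQL flow. The schema, the `llm_complete` callable, and the guard logic are illustrative assumptions, not our production implementation; frameworks like LlamaIndex and LangChain ship their own versions of this pipeline.

```python
# Minimal sketch of a guarded text-to-SQL flow. `llm_complete` is a
# hypothetical callable wrapping any LLM; the schema is a placeholder.
import sqlite3

SCHEMA = """
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    region TEXT,
    event_type TEXT,
    event_date TEXT,
    fatalities INTEGER
);
"""

def question_to_sql(llm_complete, question: str) -> str:
    """Ask the LLM to translate a natural-language question into SQL."""
    prompt = (
        "Given this SQLite schema:\n"
        f"{SCHEMA}\n"
        "Write a single read-only SQL query answering the question. "
        "Return only SQL.\n"
        f"Question: {question}"
    )
    return llm_complete(prompt).strip()

def run_guarded(conn: sqlite3.Connection, sql: str):
    """Reject anything that isn't a plain SELECT before touching the data."""
    if not sql.lower().lstrip().startswith("select"):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql}")
    return conn.execute(sql).fetchall()
```

The read-only guard reflects one of the risks raised above: generated SQL should never be executed blindly against live data, especially when the people asking the questions are not the people who built the database.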