Top 3 security vulnerabilities for Large Language Models

Every time I think about web application security, OWASP comes to mind. What is OWASP?

OWASP stands for the Open Web Application Security Project. It is a nonprofit organization that focuses on improving the security of software. OWASP provides freely available tools, documentation, and resources for developers, security professionals, and organisations to improve the security of their web applications.

One of the key contributions of OWASP is the OWASP Top 10, which is a regularly updated list of the top ten most critical security risks facing web applications. This list helps developers prioritise their security efforts and address the most pressing vulnerabilities.

Any application built on top of an existing LLM or GenAI is essentially a web application, so OWASP also monitors this aspect. Let's explore the top 10 potential security risks when deploying and managing Large Language Models.

Prompt Injection

What is it? The Prompt Injection Vulnerability arises when an attacker manipulates a large language model (LLM) using carefully crafted inputs, thereby causing the LLM to carry out the attacker's objectives unwittingly. This manipulation can occur either by directly breaching the system's prompt or indirectly through manipulated external inputs, which may result in various concerns such as data theft, social engineering, and other related issues.

How it’s working? An attacker could inject biased language or leading questions into the prompts given to a language model, causing it to generate outputs that promote misinformation, hate speech, or other undesirable content.

Real world example Application for HR, which uses an LLM to check if the uploaded CV fit the needs.

In this example, a malicious actor uploads a document containing hidden instructions—a prompt injection—to manipulate a Large Language Model (LLM). The document suggests it's an excellent job application. When processed by an unsuspecting user through the LLM, the injected prompt alters the LLM's output to praise the document's quality falsely. This demonstrates how prompt injections can deceive LLMs into producing biased or manipulated results, posing significant risks to data integrity and decision-making processes.

Prevention Prompt injection vulnerabilities exist due to the inherent nature of LLMs, which do not differentiate between instructions and external data. Because LLMs use natural language, they treat both forms of input as equally valid. As a result, there's no fool-proof prevention mechanism within the LLM to mitigate prompt injection risks, but you can lower the impact using strategies like:

Implement privilege control for LLM access to backend systems
Incorporate human oversight to enhance functionality
Separate external content from user prompts
Regularly monitor LLM input and output manually

Insecure Output Handling

What is it? Insecure Output Handling involves inadequate validation, sanitization, and management of outputs generated by large language models before they're used elsewhere. It can lead to XSS, CSRF in web browsers, as well as SSRF, privilege escalation, or remote code execution on backend systems.

How it’s working? In the scenario where LLM output is directly inserted into a system shell or a similar function, it can lead to remote code execution. This means that if the LLM generates code or commands that are executed by the system, an attacker could potentially manipulate the LLM to produce malicious code, leading to unauthorized actions on the system, such as accessing sensitive data or compromising its security.

Real world example A website which can share generated LLM content between different users

In this scenario, a web application utilizes a Large Language Model (LLM) to generate content based on user text prompts without properly sanitizing the output. This lack of output sanitization means that the content produced by the LLM is not checked for potentially malicious code or scripts before being presented to users.

An attacker could exploit this vulnerability by submitting a carefully crafted prompt to the web application, causing the LLM to return an unsanitized JavaScript payload as part of the generated content. When this content is rendered in a victim's browser, the embedded JavaScript payload executes, leading to a Cross-Site Scripting (XSS) attack.

Prevention

Treat the model as any other user
Follow the other good practices for input validation and sanitization
Encode model output to mitigate undesired code execution by JS or Markdown

Training Data Poisoning

What is it? The critical aspect of any machine learning method lies in its training data, often referred to as "raw data." This data should encompass various domains, genres, and languages to achieve high capabilities, such as linguistic and worldly knowledge. Large language models utilise deep neural networks to generate outputs based on patterns learned from training data.

Training data poisoning involves manipulating pre-training data or data used in fine-tuning or embedding processes to introduce vulnerabilities, backdoors, or biases that could compromise the model's security, effectiveness, or ethical behavior. Poisoned information may surface to users or create risks like performance degradation, downstream software exploitation, and reputational damage.

How it’s working? In this context, a malicious actor or a competitor brand intentionally creates inaccurate or malicious documents that are included in the dataset used for a model's pre-training, fine-tuning, or embedding processes.

This action is intended to manipulate the model's learning process, potentially leading to several adverse outcomes:

Bias Introduction
Model Degradation
Security Risk
Reputation Damage

Real world example Brand A implemented a chatbot powered by a Large Language Model (LLM), designed to provide quick and efficient customer assistance with their inquiries and technical issues.

Competitor Brand B, seeking to undermine Brand A's reputation and gain a competitive edge, devises a nefarious plan. Brand B infiltrates the training data of a machine learning model. This could be done by injecting documents containing misinformation, biased content, or even malicious code.

As a result of the injected false training data, Brand A's chatbot begins to exhibit erratic behavior. It provides incorrect answers to customer questions, offers inappropriate solutions to technical problems, and, in some cases, may even respond with offensive or nonsensical replies.

Prevention

Verify the supply chain of the training data
Verify your use-case for the LLM
Ensure sufficient sandboxing through network controls is present to prevent the model from scraping unintended data sources
Use strict vetting or input filters for specific training data
Testing and Detection

"With great power comes great responsibility"

As Large Language Models (LLMs) become increasingly integrated into various applications, it's crucial to recognize their potential security vulnerabilities. While LLMs offer immense power and capabilities in natural language processing, they also introduce significant threats to application security.

In this article, we explore the top three security vulnerabilities associated with LLMs. These vulnerabilities can include prompt injections, output handling flaws, and training data poisoning. Each poses unique risks, ranging from unauthorized access to sensitive data to the introduction of biased or malicious outputs.

It's essential to understand that LLMs are not standalone entities but integral parts of application infrastructures. Therefore, addressing their security vulnerabilities is paramount for safeguarding overall system integrity.

To mitigate these risks effectively, companies and development teams are encouraged to stay informed about emerging security issues related to LLMs. Reading reports from organizations like OWASP can provide valuable insights into best practices and strategies for securing LLM implementations.

In conclusion, while LLMs offer transformative capabilities in natural language processing, they also bring significant security challenges. By remaining vigilant and proactive in addressing these vulnerabilities, organizations can harness the power of LLMs while minimizing potential security risks.