September 28, 2024

Few words about AI Security

Introduction

Hello, I started a LinkedIn post a couple of weeks ago with the same words as the title of this post. Anyway, this post is intended to be an introduction to AI security and to the things I have found around it. There is much more out there, but since I don't have much free time I have prioritized these topics, partly because they are the ones I'm most interested in. In this shortish post I will try to introduce some frameworks, methods, and techniques, point out the biggest actors in this area, and show where you can start if you want to find out more. Like always, I start with the basics.

AI security involves using artificial intelligence to enhance an organization's security posture. This includes automating threat detection, prevention, and remediation to better combat cyberattacks and data breaches. AI systems can analyze vast amounts of data, such as traffic trends, app usage, and network activity, to discover patterns and establish a security baseline. Any activity outside that baseline is flagged as an anomaly and a potential cyber threat, allowing for swift remediation. AI security tools often use machine learning and deep learning to analyze data, and generative AI to convert security data into plain-text recommendations, streamlining decision-making for security teams. Another aspect of AI security is securing AI systems themselves from cyber threats. This involves understanding how threat actors can use AI to improve existing cyberattacks or exploit new attack surfaces. For example, large language models can help attackers create more personalized and sophisticated phishing attacks.

OWASP & TOP TEN

"The Open Worldwide Application Security Project (OWASP) is a nonprofit foundation that works to improve the security of software." They have (thankfully) also entered the AI sector and have released a couple of new Top Tens. "The OWASP Top Ten is a standard awareness document for developers and web application security. It represents a broad consensus about the most critical security risks to web applications. It was started in 2003 to give organizations and developers a starting point for secure development. Over the years it's grown into a pseudo standard that is used as a baseline for compliance, education, and vendor tools."

Large Language Model Applications

Founded in May 2023, the OWASP Top 10 for LLM Applications Working Group set out to create a definitive guide on vulnerabilities, mitigations, and best practices for LLM applications. Released in August 2023, the guide received widespread acclaim, marking the beginning of a dynamic and expanding initiative. With over 1,000 members, the group has updated the core document, translated it into multiple languages, and developed additional resources like "The LLM AI Cybersecurity & Governance Checklist." Engagement with standards bodies such as NIST and MITRE has further established the group's role in shaping cybersecurity practices. The OWASP Top 10 for Large Language Model Applications project aims to educate developers, designers, architects, managers, and organizations about the potential security risks when deploying and managing Large Language Models (LLMs). The project provides a list of the top 10 most critical vulnerabilities often seen in LLM applications, highlighting their potential impact, ease of exploitation, and prevalence in real-world applications. The LLM AI Security and Governance Checklist is available as a PDF.
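To make the "learn a baseline, flag anomalies" idea above a bit more concrete, here is a minimal sketch in Python, assuming scikit-learn is available. The telemetry features (requests per minute, bytes out, failed logins) and every number in it are made up for illustration; a real deployment would use far richer data and careful tuning.

```python
# A minimal sketch of the "learn a baseline, flag anomalies" idea described above.
# Assumption: synthetic network telemetry; real systems would use richer features.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Baseline period: normal-looking activity the model learns from.
baseline = np.column_stack([
    rng.normal(120, 15, 500),     # requests per minute
    rng.normal(5_000, 800, 500),  # bytes sent out per minute
    rng.poisson(1, 500),          # failed logins per minute
])

model = IsolationForest(contamination=0.01, random_state=42)
model.fit(baseline)

# New observations: one that looks normal, one that deviates sharply.
new_activity = np.array([
    [125, 5_200, 1],    # business as usual
    [900, 60_000, 40],  # traffic burst, heavy egress, many failed logins
])

# predict() returns 1 for inliers (within baseline) and -1 for anomalies.
for row, label in zip(new_activity, model.predict(new_activity)):
    status = "anomaly - investigate" if label == -1 else "within baseline"
    print(row, "->", status)
```

IsolationForest is just one convenient choice here; the point is only the workflow of fitting a model on baseline activity and flagging anything that falls outside it.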
One of the important features of that report is the AI Threat Map, which illustrates the challenge of balancing the different types of threats. The map is taken from the PDF above, and there will be more about it in coming blog posts.

Machine Learning Security

"The primary aim of the OWASP Machine Learning Security Top 10 project is to deliver an overview of the top 10 security issues of machine learning systems. As such, a major goal of this project is to develop a high quality deliverable, reviewed by industry peers. This project will provide an overview of the top 10 security issues of machine learning systems. Due to the rapid adoption of machine learning systems, there are related projects within OWASP and other organisations that may have narrower or broader scope than this project. As an example, while adversarial attacks are a category of threats, this project will also cover non-adversarial scenarios, such as security hygiene of machine learning operational and engineering workflows." – The charter of the Machine Learning Security project. NOTE: the project version is currently DRAFT 0.3.

ML01:2023 Input Manipulation Attack

Input Manipulation Attack is an umbrella term that includes Adversarial Attacks, a type of attack in which an attacker deliberately alters input data to mislead the model (a minimal numeric sketch of such a perturbation appears after the ML06 entry below). Example attack: a deep learning model is trained to classify images into different categories, such as dogs and cats. An attacker crafts an image that is very similar to a legitimate image of a cat but contains small, carefully crafted perturbations that cause the model to misclassify it as a dog. When the model is deployed in a real-world setting, the attacker can use the manipulated image to bypass security measures or cause harm to the system.

ML02:2023 Data Poisoning Attack

Data Poisoning Attacks occur when an attacker manipulates the training data to cause the model to behave in an undesirable way. Example attack: an attacker poisons the training data for a deep learning model that classifies emails as spam or not spam. The attacker executes this attack by injecting maliciously labeled spam emails into the training data set. This could be done by compromising the data storage system, for example by hacking into the network or exploiting a vulnerability in the data storage software. The attacker could also manipulate the data labeling process, for example by falsifying the labels of the emails or by bribing the data labelers to provide incorrect labels.

ML03:2023 Model Inversion Attack

Model Inversion Attacks occur when an attacker reverse-engineers the model to extract information from it. Example attack: an advertiser wants to automate their advertising campaigns by using bots to perform actions such as clicking on ads and visiting websites. However, online advertising platforms use bot detection models to prevent bots from performing these actions. To bypass these models, the advertiser trains a deep learning model for bot detection and uses it to invert the predictions of the bot detection model used by the online advertising platform. The advertiser inputs their bots into the model and is able to make the bots appear as human users, allowing them to bypass the bot detection and successfully execute their automated advertising campaigns. The advertiser executed this attack by training their own bot detection model and then using it to reverse the predictions of the bot detection model used by the online advertising platform.
They were able to access this other model through a vulnerability in its implementation or by using an API. The end result of the attack was that the advertiser successfully automated their advertising campaigns by making their bots appear to be human users.

ML04:2023 Membership Inference Attack

Membership Inference Attacks occur when an attacker manipulates the model's training data in order to cause it to behave in a way that exposes sensitive information. Example attack: a malicious attacker wants to gain access to sensitive financial information of individuals. They do this by training a machine learning model on a dataset of financial records and using it to query whether or not a particular individual's record was included in the training data. The attacker can then use this information to infer the financial history and sensitive information of individuals. The attacker executed this attack by training a machine learning model on a dataset of financial records obtained from a financial organization. They then used this model to query whether or not a particular individual's record was included in the training data, allowing them to infer sensitive financial information.

ML05:2023 Model Theft Attack

Model Theft Attacks occur when an attacker gains access to the model's parameters. Example attack: a malicious attacker works for a competitor of a company that has developed a valuable machine learning model. The attacker wants to steal this model so that their company can gain a competitive advantage and start using it for their own purposes. The attacker executes this attack by reverse engineering the company's machine learning model, either by disassembling the binary code or by accessing the model's training data and algorithm. Once the attacker has reverse engineered the model, they can use this information to recreate the model and start using it for their own purposes. This can result in significant financial loss for the original company, as well as damage to its reputation.

ML06:2023 AI Supply Chain Attack

In AI Supply Chain Attacks, threat actors target the supply chain of ML models. This category is broad and important, as the software supply chain in machine learning includes even more elements than classic software. It consists of specific elements such as MLOps platforms, data management platforms, model management software, model hubs, and other specialized types of software that enable ML engineers to effectively test and deploy software. Example attack: the attacker, who wants to compromise a machine learning project, knows that the project relies on several open-source packages and libraries. During the attack, they modify the code of one of the packages that the project relies on, such as NumPy or scikit-learn. The modified version of the package is then uploaded to a public repository, such as PyPI, making it available for others to download and use. When the victim organization downloads and installs the package, the malicious code is also installed and can be used to compromise the project. This type of attack can be particularly dangerous, as it can go unnoticed for a long time; the victim may not realize that the package they are using has been compromised. The attacker's malicious code can be used to steal sensitive information, modify results, or lead the machine learning model to return erroneous predictions.
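For the supply-chain scenario in ML06, one simple control is to verify the integrity of a downloaded artifact before installing it. Below is a minimal sketch, assuming you have a trusted, published SHA-256 digest for the exact package file you expect; the EXPECTED_SHA256 value is a placeholder, not a real hash. In practice, pip's hash-checking mode (pip install --require-hashes -r requirements.txt) gives you the same protection natively.

```python
# Minimal integrity check for a downloaded package artifact (.whl, .tar.gz, ...).
# Assumption: EXPECTED_SHA256 is a trusted digest published by the project.
import hashlib
import sys

EXPECTED_SHA256 = "replace-with-the-published-sha256-of-the-artifact"  # placeholder

def sha256_of(path: str) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    artifact = sys.argv[1]  # path to the downloaded package file
    actual = sha256_of(artifact)
    if actual != EXPECTED_SHA256:
        print(f"Integrity check FAILED for {artifact}: got {actual}")
        sys.exit(1)
    print(f"Integrity check passed for {artifact}")
```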
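Going back to ML01, here is the minimal numeric sketch of an input manipulation (adversarial) perturbation referenced there. The "model" is a made-up logistic-regression scorer with random weights, and the attack is an FGSM-style step (add a small epsilon times the sign of the loss gradient with respect to the input); everything is synthetic and only meant to show how a tiny, targeted change can flip a prediction.

```python
# Toy illustration of ML01: a small, carefully crafted perturbation flips a
# classifier's decision. The "model" is a made-up logistic-regression scorer.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=64)   # pretend these are trained weights: score > 0.5 means "cat"
x = 0.1 * w               # a synthetic input the model confidently scores as "cat"
y = 1.0                   # true label: cat

def predict(features):
    return sigmoid(w @ features)

# For logistic regression with cross-entropy loss, the gradient of the loss with
# respect to the input is (p - y) * w. The FGSM-style step adds eps * sign(gradient).
p = predict(x)
grad_x = (p - y) * w
eps = 0.3                 # small per-feature perturbation budget (illustrative)
x_adv = x + eps * np.sign(grad_x)

print(f"clean score:     {predict(x):.3f}")      # confidently "cat"
print(f"perturbed score: {predict(x_adv):.3f}")  # pushed toward "dog"
print(f"max change to any feature: {np.abs(x_adv - x).max():.2f}")
```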
ML07:2023 Transfer Learning Attack

Transfer Learning Attacks occur when an attacker trains a model on one task and then fine-tunes it on another task to cause it to behave in an undesirable way. Example attack: an attacker trains a machine learning model on a malicious dataset that contains manipulated images of faces. The attacker wants to target a face recognition system used by a security firm for identity verification. The attacker then transfers the model's knowledge to the target face recognition system, and the target system starts using the attacker's manipulated model for identity verification. As a result, the face recognition system starts making incorrect predictions, allowing the attacker to bypass the security and gain access to sensitive information. For example, the attacker could use a manipulated image of themselves, and the system would identify them as a legitimate user.

ML08:2023 Model Skewing Attack

Model Skewing Attacks occur when an attacker manipulates the distribution of the training data to cause the model to behave in an undesirable way. Example attack: a financial institution is using a machine learning model to predict the creditworthiness of loan applicants, and the model's predictions are integrated into its loan approval process. An attacker wants to increase their chances of getting a loan approved, so they manipulate the feedback loop in the MLOps system. The attacker provides fake feedback data to the system, indicating that high-risk applicants have been approved for loans in the past, and this feedback is used to update the model's training data. As a result, the model's predictions are skewed towards low-risk applicants, and the attacker's chances of getting a loan approved are significantly increased. This type of attack can compromise the accuracy and fairness of the model, leading to unintended consequences and potential harm to the financial institution and its customers.

ML09:2023 Output Integrity Attack

In an Output Integrity Attack scenario, an attacker aims to modify or manipulate the output of a machine learning model in order to change its behavior or cause harm to the system it is used in. Example attack: an attacker has gained access to the output of a machine learning model that is being used to diagnose diseases in a hospital. The attacker modifies the output of the model, making it provide incorrect diagnoses for patients. As a result, patients are given incorrect treatments, leading to further harm and potentially even death.

ML10:2023 Model Poisoning Attack

Model Poisoning Attacks occur when an attacker manipulates the model's parameters to cause it to behave in an undesirable way. Example attack: consider a scenario where a bank is using a machine learning model to identify handwritten characters on cheques in order to automate its clearing process. The model has been trained on a large dataset of handwritten characters, and it has been designed to accurately identify the characters based on specific parameters such as size, shape, slant, and spacing. An attacker who wants to poison a machine learning model may manipulate the parameters of the model by altering the images in the training dataset or by directly modifying the parameters in the model. This can result in the model being reprogrammed to identify characters differently.
For example, the attacker could change the parameters so that the model identifies the character "5" as the character "2", leading to incorrect amounts being processed. The attacker can exploit this vulnerability by introducing forged cheques into the clearing process, which the model will process as valid due to the manipulated parameters. This can result in significant financial loss to the bank.

Everyone knows, or should know, the MITRE ATT&CK and D3FEND frameworks. Ladies and gentlemen, let me introduce MITRE ATLAS.

MITRE ATLAS

MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) is a globally accessible, living knowledge base of adversary tactics and techniques against AI-enabled systems, based on real-world attack observations and realistic demonstrations from AI red teams and security groups. The ATLAS Matrix shows the progression of tactics used in attacks as columns from left to right, with the ML techniques belonging to each tactic listed below it; the "&" symbol indicates an adaptation from ATT&CK. Mitigations represent security concepts and classes of technologies that can be used to prevent a technique or sub-technique from being successfully executed. There are also case studies available, or you can build your own. The ATLAS Navigator, which also shows ATT&CK Enterprise techniques, can be found here.

MITRE ATLAS Matrix

NIST AI & AI Security

NIST (National Institute of Standards and Technology, U.S. Department of Commerce) has also released publications on AI and has started its Generative AI evaluation program. NIST aims to cultivate trust in the design, development, use, and governance of Artificial Intelligence (AI) technologies and systems in ways that enhance safety and security and improve quality of life. NIST focuses on improving measurement science, technology, standards, and related tools, including evaluation and data. With AI and Machine Learning (ML) changing how society addresses challenges and opportunities, the trustworthiness of AI technologies is critical. Trustworthy AI systems are those demonstrated to be valid and reliable; safe, secure, and resilient; accountable and transparent; explainable and interpretable; privacy-enhanced; and fair, with harmful bias managed. The agency's AI goals and activities are driven by its statutory mandates, Presidential Executive Orders and policies, and the needs expressed by U.S. industry, the global research community, other federal agencies, and civil society. NIST's AI goals include: conducting fundamental research to advance trustworthy AI technologies; applying AI research and innovation across the NIST Laboratory Programs; establishing benchmarks, data, and metrics to evaluate AI technologies; leading and participating in the development of technical AI standards; and contributing technical expertise to discussions and development of AI policies.

NIST AI Security

"AI risks should not be considered in isolation. Treating AI risks along with other critical risks, such as cybersecurity, will yield a more integrated outcome and organizational efficiencies. Some risks related to AI systems are common across other types of software development and deployment. Overlapping risks include security concerns related to the confidentiality, integrity, and availability of the system and its training and output data, along with the general security of the underlying software and hardware for AI systems. Cybersecurity risk management considerations and approaches are applicable in the design, development, deployment, evaluation, and use of AI systems." Read more here.
Microsoft AI Security - A principled approach to detecting and blocking threat actors

"In line with Microsoft's leadership across AI and cybersecurity, today we are announcing principles shaping Microsoft's policy and actions mitigating the risks associated with the use of our AI tools and APIs by nation-state advanced persistent threats (APTs), advanced persistent manipulators (APMs), and cybercriminal syndicates we track. These principles include:

Identification and action against malicious threat actors' use: Upon detection of the use of any Microsoft AI application programming interfaces (APIs), services, or systems by an identified malicious threat actor, including nation-state APT or APM, or the cybercrime syndicates we track, Microsoft will take appropriate action to disrupt their activities, such as disabling the accounts used, terminating services, or limiting access to resources.

Notification to other AI service providers: When we detect a threat actor's use of another service provider's AI, AI APIs, services, and/or systems, Microsoft will promptly notify the service provider and share relevant data. This enables the service provider to independently verify our findings and take action in accordance with their own policies.

Collaboration with other stakeholders: Microsoft will collaborate with other stakeholders to regularly exchange information about detected threat actors' use of AI. This collaboration aims to promote collective, consistent, and effective responses to ecosystem-wide risks.

Transparency: As part of our ongoing efforts to advance responsible use of AI, Microsoft will inform the public and stakeholders about actions taken under these threat actor principles, including the nature and extent of threat actors' use of AI detected within our systems and the measures taken against them, as appropriate."

Read more from Microsoft Threat Intelligence publications.

SANS AI Security

The potential of GenAI extends beyond augmenting security measures; it also introduces complex challenges. Cybercriminals are using GenAI to create more convincing phishing emails, automate code generation for malware, and even mimic behavioral patterns to bypass biometric security systems. Recognizing these threats is crucial for developing a responsive cybersecurity strategy that integrates GenAI as an essential component of the cybersecurity curriculum.

Strategies for Upskilling and Reskilling

Tailored Training Programs: It is critical to develop training that covers both the defensive and offensive uses of GenAI. Such training programs should include real-world simulations where cybersecurity teams must counteract GenAI-driven attacks, providing hands-on experience in a controlled environment.

Collaboration with Academic Institutions and Tech Companies: By partnering with academia and technology firms, organizations can access the latest research and developments in GenAI, including those used maliciously. These collaborations can enrich training programs and ensure that they are as current as possible.

Certification and Continuous Education: Cybersecurity professionals should be encouraged to pursue advanced certifications that focus on AI and cybersecurity. The SANS Institute and Global Information Assurance Certification (GIAC) are reputable, world-class providers offering specialized training and certifications in areas critical to defending against and leveraging AI technologies. Continuous education through seminars, workshops, and courses on the latest GenAI developments is essential.
Take a look at what resources are available in SANS training.

Mentorship and Peer Learning: Implement a mentorship program that focuses on GenAI in cybersecurity. Experienced professionals who understand how to implement and counteract GenAI technologies can provide invaluable insights to less experienced staff, accelerating their learning curve.

End words

This was a bit longer than I expected, so not as short as I promised. But above are some of the industry frameworks and standards you can use for AI security. The AI industry is booming, and all of us, at least security people, need to stay awake. It seems this AI Security topic will also grow into a series, so follow me for more and subscribe to my blog.