How to Prevent Leaking of Sensitive Data When Using ChatGPT

ChatGPT has taken the world by storm. Users can enter prompts and ChatGPT will respond with humanlike conversational dialogue. Ask it for a riddle or a joke and it will provide one. Ask it to give a recipe based on limited ingredients and it will offer a meal. It can even help a student create a study plan for a particular topic.

It also has many business uses:

  • Generate summaries of text such as meeting notes or an article
  • Provide outlines for a piece of content you want to write
  • Brainstorm ideas such as titles and blog ideas
  • Write and debug code
  • Create responsive chatbots
  • Enhance research
  • Much more

The potential of ChatGPT and generative AI in general is limitless. In fact, a McKinsey report found that “generative AI and other technologies have the potential to automate work activities that absorb 60 to 70 percent of employees’ time today.” That same report also states that “generative AI could add the equivalent of $2.6 trillion to $4.4 trillion annually.” It is radically changing the way we look at work…and life.

That being said, in some ways, we may have put the cart before the horse. In May 2023, Bloomberg reported that Samsung had banned generative AI use for its employees after discovering a ChatGPT data leak. An employee had accidentally leaked sensitive internal source code by uploading it to ChatGPT.

Since ChatGPT stores your inputs and learns from them, one inherent risk of using ChatGPT is the potential exposure of sensitive data. This risk, however, is not new to cybersecurity. “When it comes to security, the risks of ChatGPT and generative AI are the same risks we always battle,” according to vCom security specialist Darrin Good. To minimize risk when using ChatGPT, it’s important to bring security in at the beginning and ensure that sensitive data is never input into ChatGPT in the first place.

One way to do that is to block the use of ChatGPT on all company devices. However, doing so would deprive your organization of the benefits that come with using ChatGPT. Another solution is to restrict access to sensitive information: users can still use ChatGPT, but the risk of exposing sensitive data is heavily reduced if they don’t have access to it in the first place. To do this, we recommend implementing what is known as a zero trust network access security framework.

Zero Trust Network Access

Zero trust network access (ZTNA) “assumes all users, systems, and processes are potentially malicious—and thus require strict access control, regular monitoring, and a variety of preventative strategies.” ZTNA is complicated, but for our purposes, there are two major components:

  1. Users must go through strict identity verification before being trusted with information.
  2. Once verified, users are only given access to information that is relevant to their role and function. In other words, information is provided strictly on a need-to-know basis.

Multi-Factor Authentication (MFA)

The first part is typically done through Multi-Factor Authentication (MFA), which requires users to verify themselves in two or more ways to sign in. The first is the standard method: a username and password. The second usually involves entering a unique code received by email, text, phone call, or an authenticator app. This ensures that only the person with both forms of verification can log in to that specific account, which matters for the next part.
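
To make the second factor concrete, here is a minimal sketch of how an authenticator-app code is generated and checked, using only Python’s standard library and following the time-based one-time password scheme (TOTP, RFC 6238). In practice you would rely on your identity provider’s MFA rather than rolling your own; the secret and window values below are illustrative assumptions.

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32: str, at_time: float, digits: int = 6, step: int = 30) -> str:
    """Derive a time-based one-time password (RFC 6238) from a base32 secret."""
    key = base64.b32decode(secret_b32.upper())
    counter = struct.pack(">Q", int(at_time) // step)  # 30-second time window
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                         # dynamic truncation (RFC 4226)
    code = (struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)

def verify(secret_b32: str, submitted: str, window: int = 1) -> bool:
    """Accept the current code plus one step either side, to tolerate clock drift."""
    now = time.time()
    return any(
        hmac.compare_digest(totp(secret_b32, now + i * 30), submitted)
        for i in range(-window, window + 1)
    )

# Illustrative usage with a made-up secret:
secret = base64.b32encode(b"example-shared-key").decode()
print(verify(secret, totp(secret, time.time())))  # True
```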

Role-Based Access Control (RBAC)

The second component is known as Role-Based Access Control, or RBAC: the idea that users are given only the access rights “strictly required to do their jobs.” These roles are defined by the organization and can be as generic as admin, contributor, and guest, or as specific as “software engineering” or “finance” (meaning only people in those roles can access the data). Implementing RBAC takes some planning, since you need to determine which information should be provided to which users.
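
As a sketch of the idea, the mapping below grants each role only the resources it needs. The role and resource names are hypothetical examples; a real deployment would configure this in your identity platform (for example, Google Workspace or Microsoft 365 groups) rather than in application code.

```python
# Hypothetical role-to-resource mapping. Anything not listed is denied.
ROLE_PERMISSIONS = {
    "admin":       {"hr-records", "source-code", "finance-reports"},
    "engineering": {"source-code"},
    "finance":     {"finance-reports"},
    "guest":       set(),
}

def can_access(role: str, resource: str) -> bool:
    """Grant access only if the user's role explicitly includes the resource."""
    return resource in ROLE_PERMISSIONS.get(role, set())

assert can_access("engineering", "source-code")
assert not can_access("finance", "source-code")  # default deny
```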

Tools for ZTNA

To implement ZTNA, you must use tools that offer features such as MFA and RBAC, so make sure you do your research. In particular, focus on the tools where your data is actually stored. Google Workspace and Microsoft 365 are examples of tools that offer both data storage and ZTNA features.

Applying ZTNA to ChatGPT

Applying ZTNA requires understanding which information is required and which users need access to it. With regard to ChatGPT, the information provided must not contain anything private. To determine the relevant information and users, it helps to take a problem-focused approach:

  1. Determine what problem needs to be solved.
  2. Figure out which information is needed to solve the problem.
  3. Provide information only to the people needed to solve the problem.
  4. Revoke access to the information once the problem is solved and the data is no longer required.

1. Determine what problem needs to be solved.

Information should only be provided as needed to solve a problem. First, figure out your organization’s business needs and which of them, if any, call for ChatGPT. If a problem requires private information, ChatGPT may not be the best way to solve it.

2. Figure out which information is needed to solve the problem.

What information is needed to solve the problem at hand? Only information pertinent to the issue should be accessible. If the problem requires the use of ChatGPT, ensure the information is free of anything you don’t want exposed. For example, if you want to use ChatGPT to debug code, only code without proprietary information can be provided for that purpose. If the code contains proprietary information, you will need a different way to debug it.
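
One practical way to enforce this is to scan a snippet for sensitive markers before anyone pastes it into ChatGPT. The sketch below is a hypothetical pre-submission filter; the patterns are illustrative assumptions, and a real deployment would use a data loss prevention (DLP) tool with a much richer rule set.

```python
import re

# Hypothetical blocklist: patterns that suggest a snippet is not safe to share.
BLOCKLIST_PATTERNS = [
    re.compile(r"(?i)api[_-]?key\s*[:=]"),                    # hard-coded API keys
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # private key material
    re.compile(r"(?i)\binternal\.example\.com\b"),            # internal hostnames (example)
    re.compile(r"(?i)\bCONFIDENTIAL\b"),                      # classification markers
]

def safe_to_share(snippet: str) -> bool:
    """Return True only if no blocklisted pattern appears in the snippet."""
    return not any(p.search(snippet) for p in BLOCKLIST_PATTERNS)

code = "def add(a, b):\n    return a + b"
print(safe_to_share(code))                   # True: nothing sensitive found
print(safe_to_share("API_KEY = 'abc123'"))   # False: would be blocked
```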

3. Provide information only to the people needed to solve the problem.

This is where RBAC comes into play. Once the relevant information is established, there is no need for everyone in the organization to have access to it. The more people with access, the more likely it is that information gets leaked. Determine which people are needed to solve the problem and give only them access to the relevant information.

For example, people who debug code using ChatGPT would only get access to code without proprietary information, so there is no chance of them accidentally inputting it into ChatGPT.

In this specific scenario, you might need to define two role-based controls: one for debugging code that contains proprietary information, and one for debugging code that does not. You would allow ChatGPT only on the devices of users with access to non-proprietary code, and block it on devices with access to proprietary code, as sketched below. If that’s not possible and you only have room for one role, make sure to clarify which code can be used with ChatGPT and which cannot. If you think this is risky (and it is), it might be best not to use ChatGPT to debug code at all.
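
To illustrate, here is a hypothetical default-deny policy in that spirit. The role names are assumptions, and in practice this would be enforced through your device management or network filtering rather than application code.

```python
# Hypothetical roles; in practice these would live in your identity platform.
CHATGPT_ALLOWED_ROLES = {"debugger-nonproprietary"}
CHATGPT_BLOCKED_ROLES = {"debugger-proprietary"}

def chatgpt_policy(user_roles: set[str]) -> str:
    """Decide whether ChatGPT should be reachable from a user's device."""
    if user_roles & CHATGPT_BLOCKED_ROLES:
        return "block"  # a proprietary-code role always wins
    if user_roles & CHATGPT_ALLOWED_ROLES:
        return "allow"
    return "block"      # default deny, in keeping with zero trust

print(chatgpt_policy({"debugger-nonproprietary"}))                          # allow
print(chatgpt_policy({"debugger-nonproprietary", "debugger-proprietary"}))  # block
```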

4. Revoke access to the information once the problem is solved.

When all is said and done, best practice is to revoke data access when it is no longer needed. A huge part of ZTNA is regular access monitoring and updating. Since the goal is to mitigate risk as much as possible, any amount of access to data, however small, leaves room for a data leak. Even if the data input into ChatGPT is already free of private information, you want to give attackers as small a pathway as possible to your private data. Once the problem is solved, remove data access from everyone who no longer needs it.
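
One lightweight way to make revocation the default rather than an afterthought is to time-box every grant so it expires on its own. The sketch below is an illustrative assumption, not a feature of any particular tool; most identity platforms offer equivalent access-review or expiry settings.

```python
from datetime import datetime, timedelta, timezone

class Grant:
    """A hypothetical time-boxed access grant that expires automatically."""

    def __init__(self, user: str, resource: str, ttl_hours: int = 72):
        self.user = user
        self.resource = resource
        self.expires_at = datetime.now(timezone.utc) + timedelta(hours=ttl_hours)

    def is_active(self) -> bool:
        return datetime.now(timezone.utc) < self.expires_at

# Access must be re-requested after expiry, so revocation does not depend
# on someone remembering to clean up.
grants = [Grant("alice", "debug-sandbox", ttl_hours=24)]
grants = [g for g in grants if g.is_active()]  # periodic sweep drops expired grants
```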

The Future of AI Tools

ChatGPT and other generative AI tools represent a revolutionary advancement in technology, offering organizations unprecedented capabilities to automate tasks, create content, and unlock new opportunities. The potential economic benefits are substantial, and generative AI is poised to reshape industries and how we work. Yet with great power comes great responsibility, and the risks associated with generative AI are significant. The exposure of sensitive data is one of the biggest, and protecting against it is the first step toward safeguarding your organization. Zero trust network access can address many risks (more than those we covered). While implementing ZTNA may seem to limit the use cases of ChatGPT, it cannot be emphasized enough how crucial it is to protect your organization’s data.