For teams working on transformational, data-intensive initiatives in natural language processing (NLP) and artificial intelligence (AI), data security and governance issues can have a major impact on project success.
In our recent survey of NLP practitioners, we asked respondents to identify their top three challenges and grouped the answers by each organization's level of NLP maturity. Among organizations still evaluating NLP use cases or in early testing, data security and governance was the most frequently cited challenge (64%).
So what is causing security and governance issues at this stage? And how can companies prepare to make this part of the process as smooth as possible?
Gather the necessary data and sources
In many organizations, the data is not owned by the team for which the NLP project is being built. As a result, collecting the right data to build training and test sets can take a tremendous amount of time.
To avoid this problem, we recommend that teams start by creating a responsibility assignment matrix (RACI: Responsible, Accountable, Consulted, Informed) for the project. Use it to clearly define each department's responsibilities for supplying the necessary data and resources.
This is where a Project Champion comes in (and we highly recommend having one): someone who can work with the department heads who are critical to the NLP project. This creates clear accountability for the team that owns the data, controls its quality, and ultimately delivers it to the NLP project team.
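To make the matrix idea concrete, here is a minimal Python sketch of a RACI matrix for data sourcing, assuming one row per data-delivery task and one column per department. The task and department names are hypothetical placeholders, and the validation rule (exactly one Accountable and at least one Responsible per task) is the common RACI convention, not a requirement specific to any particular project.

```python
from __future__ import annotations

from enum import Enum


class Role(Enum):
    RESPONSIBLE = "R"  # does the work (e.g., extracts and cleans the data)
    ACCOUNTABLE = "A"  # owns the outcome and signs off
    CONSULTED = "C"    # gives input before the work is done
    INFORMED = "I"     # kept up to date on progress


# Hypothetical tasks (rows) mapped to hypothetical departments (columns).
RACI: dict[str, dict[str, Role]] = {
    "Provide support-ticket text": {
        "Customer Service": Role.ACCOUNTABLE,
        "IT": Role.RESPONSIBLE,
        "Legal": Role.CONSULTED,
        "NLP Project Team": Role.INFORMED,
    },
    "Review tickets for PII": {
        "Legal": Role.ACCOUNTABLE,
        "Data Governance": Role.RESPONSIBLE,
        "NLP Project Team": Role.INFORMED,
    },
    "Deliver training and test sets": {
        "Data Governance": Role.ACCOUNTABLE,
        "IT": Role.RESPONSIBLE,
        "NLP Project Team": Role.CONSULTED,
    },
}


def validate(matrix: dict[str, dict[str, Role]]) -> list[str]:
    """Flag tasks that break the usual RACI rules."""
    problems = []
    for task, assignments in matrix.items():
        roles = list(assignments.values())
        accountable = roles.count(Role.ACCOUNTABLE)
        if accountable != 1:
            problems.append(f"{task}: expected 1 Accountable, found {accountable}")
        if Role.RESPONSIBLE not in roles:
            problems.append(f"{task}: no Responsible party")
    return problems


def owned_by(matrix: dict[str, dict[str, Role]], department: str) -> list[str]:
    """Tasks where a department is Responsible or Accountable."""
    return [
        task
        for task, assignments in matrix.items()
        if assignments.get(department) in (Role.RESPONSIBLE, Role.ACCOUNTABLE)
    ]


print(validate(RACI))        # → []
print(owned_by(RACI, "IT"))  # → ['Provide support-ticket text', 'Deliver training and test sets']
```

Running a check like `validate` at project kickoff is a cheap way to catch a dataset that nobody is accountable for before it delays the build.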
Identify security and governance issues for all sources
Next, identify security and governance issues for each data source or collection. This should include information about the origin of your data, the nature of the information it contains, and how you plan to use it. For example:
- Does it include personally identifiable information or have other privacy implications?
- Does it contain images as well as text?
- Is it derived from data you own, or is it held by third-party sources?
- Is it structured, unstructured, or both?
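One way to record these answers consistently is a small inventory entry per data source. The Python sketch below is a hypothetical structure, not a prescribed schema: the field names, the example source, and the review messages are illustrative assumptions that map directly onto the questions above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Structure(Enum):
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"
    BOTH = "both"


@dataclass
class DataSource:
    name: str
    origin: str            # where the data comes from
    intended_use: str      # how the NLP project plans to use it
    contains_pii: bool
    contains_images: bool
    third_party: bool      # held by a third party rather than owned in-house
    structure: Structure

    def review_flags(self) -> list[str]:
        """Issues to raise with the data owner and privacy/legal reviewers."""
        flags = []
        if self.contains_pii:
            flags.append("PII: confirm legal basis and redaction plan")
        if self.contains_images:
            flags.append("Images: confirm how non-text content will be handled")
        if self.third_party:
            flags.append("Third party: check license and data-sharing terms")
        if self.structure is not Structure.STRUCTURED:
            flags.append("Unstructured content: plan parsing and quality checks")
        return flags


# Hypothetical example entry.
contracts = DataSource(
    name="Vendor contracts",
    origin="Procurement document store",
    intended_use="Clause extraction",
    contains_pii=True,
    contains_images=True,
    third_party=False,
    structure=Structure.UNSTRUCTURED,
)
for flag in contracts.review_flags():
    print(flag)
```

Keeping one entry like this per source gives privacy and legal reviewers a single place to see what each dataset contains and why the project needs it.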
This information is essential for ensuring that your activities comply with your internal data collection and privacy policies, your Responsible AI framework, and any other frameworks or policies you have in place.
Best practices for data security
Regardless of your level of AI maturity, data challenges will only continue to grow, especially when it comes to choosing which AI models and algorithms to use. The data privacy and copyright infringement concerns that data protection authorities around the world have raised about ChatGPT are just one example of why it's important to understand what your data and machine learning models contain.
In our recent survey of business, technical and academic natural language AI experts, data privacy and security were the top concerns (73%) for enterprise adoption of large language models and generative AI.
Pending ESG (environmental, social, and governance) reporting regulations in Europe and other parts of the world are another factor: the data and technology used in AI projects will come under greater scrutiny as part of ESG measurement.
With this in mind, here are some general questions to consider when designing and collecting data for your NLP projects:
- Will you host your data on premises, or will it be in the cloud?
- Is your data protected from unauthorized access?
- Does your approach meet internal guidelines?
- Is your approach GDPR compliant?
- Are your vendors certified (e.g., SOC 2 Type 2, ISO/IEC 27001)?
Privacy and Legality
- Does your data include personally identifiable information (PII)?
- Does your training data include anything that might be protected by copyright?
- Do you have a procedure for removing PII or copyrighted material from your data collection?
- Have you kept humans in the loop during and after the training process?
- Do you know where your data comes from?
- Do you know how your data has been tagged?
- Can you explain how your model achieved its results?
- Does your model have safeguards to prevent algorithmic bias?
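As a starting point for a PII-removal procedure, the following Python sketch masks email addresses and phone numbers with regular expressions and reports how many of each it replaced, so the counts can be logged for audit. The patterns and placeholder labels are illustrative assumptions only: simple regexes like these miss names and postal addresses entirely and will flag some dates or ID numbers as phone numbers, so a production pipeline would typically combine them with named-entity recognition or a dedicated tool such as Microsoft Presidio, plus human review.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately simple patterns. Order matters: emails are
# masked first so their digits are not picked up by the phone pattern.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a [LABEL] placeholder and count replacements."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


clean, counts = redact("Contact jane.doe@example.com or +1 555-123-4567.")
print(clean)   # → Contact [EMAIL] or [PHONE].
print(counts)  # → {'EMAIL': 1, 'PHONE': 1}
```

Returning the counts alongside the cleaned text makes it easy to record what was removed from each document, which supports the provenance questions above.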
For additional considerations to ensure the success of your NLP projects, read the first post in this series, "AI Maturity: Assessing Your Readiness for NLP Program Success," to discover how to tackle the top adoption challenges, and download our complete guide, The Roadmap to NLP Success.
Get the ultimate guide to NLP project success
Our experience at expert.ai has taught us a few things about what teams need to be successful in this important, transformative investment.