Guide

Preparing Your Data to Take Full Advantage of Generative AI

Generative AI applications open up a host of new and innovative ways for organizations to monetize their data. A key question is how companies should prepare their data to seize these opportunities.

March 08, 2023

ChatGPT and other generative artificial intelligence applications have entered the chat, and they are poised to affect nearly every facet of your business. That is, if your organization can prepare its own data assets to put these technologies to work successfully.

If businesses didn’t already feel compelled to get their data house in order to accommodate advanced analytics and data science, the possibility of combining generative AI with their own data assets should certainly compel them now.

These AI applications can write, document, and review code. They can generate marketing and sales content and plans, new training procedures, social media posts, and blogs. Evaluate service agreements, conduct data analyses, provide next-level automated customer support? Yes. Scientists have even used language models similar to ChatGPT to design new protein sequences, and pharmaceutical companies are using generative AI to accelerate drug discovery.

Executives have every reason to be excited about the opportunities these applications can bring. Yet getting your data ready for generative AI comes with a slew of technical, legal, privacy, and strategic challenges. Here’s where business leaders should start.

Chapter 1: Data Management Implications for AI

Ensuring the effective and ethical use of generative AI may require many organizations to renew their focus on, and investment in, certain data management capabilities. These include:

Improving Data Availability. In particular, “dark data” (information or content used for a single operational purpose and then often forgotten) can generate new value streams when ingested by generative AI applications. Yet much of this kind of data remains in archives or in offline storage. To take full advantage of generative AI, organizations will need to make information such as emails, contracts, customer transaction and call-center records, text documents, images, production data, and legal documents available and well organized. The more diverse the data, the better, as AI applications excel at identifying patterns and synergies among different data sources.
Enacting Data Governance. Establishing guardrails and controls for how and when different kinds of data are used by AI applications is of utmost importance, particularly for ensuring compliance, protecting privacy, and reducing bias. AI models tend to inherit and perpetuate biases from the data they are trained on.
Ensuring Data Quality. The old “garbage in, garbage out” adage may be even more apt for generative AI than for traditional operational or analytic technologies, because of the deep inferences these models make and their current problems with factual accuracy. Organizations will have to double down on data quality to ensure accurate, complete, timely, and consistent data for feeding this type of AI application.
Adding Data Annotations. Generative AI applications rely not only on the data itself but also, much more heavily than other forms of processing, on metadata. This means your organization may have to raise its metadata game significantly to ensure appropriate data tags, labels, provenance, lineage, quality indicators, and other “data about the data.”
Curating New Data Sources. Some of the most innovative uses of generative AI may require data your organization doesn’t currently curate, such as social media content, web content, certain market research or competitive data, or any of the thousands of data sets offered by data brokers and aggregators. While most organizations have a department dedicated to procuring office supplies, most do not have one for procuring “data supplies.” Now is the time to form one.
Validating AI-Generated Content. In addition to the data collected or generated during the normal course of business operations, organizations will start generating and collecting information produced by generative AI applications. This information will need special governance to ensure it is validated, because generative AI applications are known to play fast and loose with facts, as in a recent example in which ChatGPT responded that “cow eggs are larger than chicken eggs.” Organizations should develop policies and procedures for how and when AI-generated content (e.g., facts, lists, writings, code, analyses) is appropriate to use. This will require additional types of governance controls and monitoring, including tagging information as AI-generated and recording whether it has been reviewed and validated.
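As a minimal sketch of the tagging and review controls described above, the Python below attaches governance metadata to a piece of content and only treats AI-generated content as usable once a reviewer has validated it. The record fields (`ai_generated`, `source_model`, `review_status`, `reviewed_by`), the three review states, and the rule that human-authored content passes automatically are all hypothetical illustrations, not a standard schema or any vendor's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ReviewStatus(Enum):
    # Hypothetical review states; adapt them to your own governance policy.
    UNREVIEWED = "unreviewed"
    VALIDATED = "validated"
    REJECTED = "rejected"


@dataclass
class ContentRecord:
    """A piece of content plus the metadata a governance policy might require."""
    text: str
    ai_generated: bool
    source_model: Optional[str] = None  # which model produced it, if AI-generated
    review_status: ReviewStatus = ReviewStatus.UNREVIEWED
    reviewed_by: Optional[str] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def mark_reviewed(self, reviewer: str, approved: bool) -> None:
        # Record who reviewed the content and the outcome of that review.
        self.reviewed_by = reviewer
        self.review_status = ReviewStatus.VALIDATED if approved else ReviewStatus.REJECTED


def approved_for_use(record: ContentRecord) -> bool:
    """Assumed policy: human-authored content passes; AI-generated content needs validation."""
    if not record.ai_generated:
        return True
    return record.review_status is ReviewStatus.VALIDATED


draft = ContentRecord(
    text="Cow eggs are larger than chicken eggs.",
    ai_generated=True,
    source_model="example-llm",  # hypothetical model name
)
print(approved_for_use(draft))    # False: not yet reviewed
draft.mark_reviewed(reviewer="fact-checker", approved=False)
print(draft.review_status.value)  # rejected
print(approved_for_use(draft))    # False
```

Keeping the AI-generated flag and the review outcome on the record itself means downstream systems can filter on them without re-checking the content, which is one way to make "reviewed and validated or not" queryable rather than tribal knowledge.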

Chapter 2: Using Use Cases To Focus on Which Data to Prepare

When given access to your data, these new AI applications will be able to provide a variety of benefits. Conceiving use cases alongside an assessment of how well prepared the underlying data is will help you prioritize them.

For example, AI language models can be used to automate responses to customer service inquiries, such as answering frequently asked questions, handling simple complaints, and resolving issues more quickly and efficiently, including through chatbots. Similarly, these applications will be able to personalize product recommendations, advertisements, and other content, or analyze customer feedback and sentiment, allowing companies to gain insight into customer attitudes and preferences. But these uses will require access to high-quality customer, call-center, and product data, and likely to curated third-party data sources as well.

Other high-value uses such as content creation, fraud detection, employee or customer training, strategic or operational decision-making, and compliance monitoring will require access to specific and varied data sources related to each.

Chapter 3: Focus on Data Security and Accessibility, Learn the Legalities

Organizations have to balance data accessibility and security. Encryption, access controls, firewalls, and regular data backups should be used to keep data secure. At the same time, data should be stored in a centralized location where authorized users can access it easily. For additional flexibility, consider cloud-based storage and data management tools.

GPT-3 can be used for proprietary purposes: according to OpenAI’s terms of use, you own your inputs (prompts) and hold the rights to the outputs. Depending on the service and the terms in effect, however, OpenAI may use both inputs and outputs to improve its models unless you opt out. Understanding the legal issues in this respect will be critical.

That said, it’s still early days, and ethical, copyright, and privacy issues have yet to be fully worked out. As such, business leaders should think through what data they need to redact, mask, and/or synthesize before running it through generative AI programs. Customers’ personal information, healthcare data, and anything whose disclosure would violate client confidentiality agreements should be evaluated before being entered into a generative AI system.
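To make the masking step concrete, here is a minimal Python sketch that replaces email addresses and U.S.-style phone numbers with typed placeholders before a prompt is sent to a generative AI service. The `PATTERNS` table and the `mask_prompt` function are illustrative assumptions, not part of any product; simple regular expressions like these miss many kinds of sensitive information (names, addresses, health details, contract terms), so a real pipeline would need dedicated detection tooling and legal review.

```python
import re

# Illustrative patterns only: they catch common email and U.S.-style phone
# formats, not every kind of personal or confidential information.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}


def mask_prompt(text: str) -> str:
    """Replace each detected value with a typed placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


prompt = "Summarize the complaint from jane.doe@example.com, reachable at 555-123-4567."
print(mask_prompt(prompt))
# Summarize the complaint from [EMAIL], reachable at [PHONE].
```

Typed placeholders keep the prompt readable for the model while the underlying values stay inside the organization. Whether a given data set should be masked, fully redacted, or replaced with synthetic values remains a policy decision rather than a coding one.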

Business leaders should also prepare for legal challenges and regulatory guidelines when designing data controls and processes. For instance, the European Commission and the U.S. National Institute of Standards and Technology (NIST) have released useful AI deployment frameworks and guidance. Yet we’re far from any binding regulations, and most challenges (at least in the U.S.) will likely play out in the courts.

Chapter 4: Extend Your Data Strategy

None of the above will be effective without a clear, transparent, and considered data strategy that goes beyond your current strategy for generating, collecting, managing, and using data for traditional operational or analytical purposes. Ask yourself:

Has my organization identified clear objectives, goals, and use cases (e.g., reducing costs, improving customer experience) for generative AI?
Are we collecting and making available the right type of data for these applications?
Does my organization have the right infrastructure to make these goals a reality? For instance, you might need to update existing databases and tweak APIs to ensure smooth integration with current technology systems.
Do my employees have the requisite skills to leverage generative AI?
Who will be responsible for implementing AI-related decisions? Consider a cross-functional team including data scientists, legal counsel, and leaders of various business functions.
What metrics will we use to measure success (e.g., customer retention, cost savings, risk management, customer satisfaction, new revenue streams)?

Generative AI is here to stay, and it will keep evolving rapidly, creating a new competitive battlefield for businesses around the world. As organizations start to leverage these applications and their own data to improve everything from customer support to IT to research and development, they should embrace the new opportunities while making the necessary preparations to succeed.
