How To Train ChatGPT Using Your Business Data in 2024

Transform your customer service, sales, and operations using a ChatGPT version explicitly trained on your business data.

By: R. Paulo Delgado
September 25, 2023
13 minute reading
train chatgpt

Many businesses are intrigued by the idea of training ChatGPT on their data, envisioning a tailored artificial intelligence that understands the nuances of their industry and internal jargon. However, the reality is you can’t train ChatGPT yourself. Only OpenAI can fine-tune ChatGPT for you, and that option is often out of reach for most companies.

But don’t lose hope—while full-scale customization may be off the table, there are innovative alternatives. Options like RAG (retrieval-augmented generation), prompt chaining, and direct access to the ChatGPT API allow you to tweak ChatGPT to operate in a way that feels like it’s been trained on your own data, opening up a whole new realm of possibilities for your specific needs. 

Why train ChatGPT on your custom data?

ChatGPT was trained on a wide range of data, which makes it unsuitable to use in scenarios where answers should come from a specific dataset. For example, a custom chatbot for your website should offer answers related only to your product and service. 

By tweaking ChatGPT to focus on your company’s dataset, you can harness the power of ChatGPT’s generative capabilities while keeping its responses confined to your custom knowledge base. A sophisticated AI chatbot can save your business hours every day by handling most customer interactions and support requests without having to engage a human.

Another use case is when using ChatGPT internally, such as offering new employee onboarding or answering employee questions about your business. A version of ChatGPT custom-trained on your company’s web content, PDF files, CSVs, and other documents allows users to find data rapidly through a familiar chat interface. 

Busting the myths: How is ChatGPT trained (for real)?

Unfortunately, many people and companies loosely use the phrase “train OpenAI’s ChatGPT” to attract clicks and attention. In its strictest sense, you can’t “train” ChatGPT because it has already been pre-trained.

Companies that produce large language models (LLMs) complete the training process before releasing them. People can fine-tune some of the open-source pre-trained models, but most use the word fine-tuning inaccurately.

Fine-tuning is essentially what most people think of as training ChatGPT. It’s an extensive and expensive process that requires your company to work with OpenAI during the fine-tuning process. Currently, fine-tuning is only available for GPT-3.5 Turbo, although OpenAI has announced that support for fine-tuning GPT-4—OpenAI’s latest version—should be ready toward the end of the year. 

Any claims you read on the internet about training ChatGPT usually refer to:

  • Retrieval-augmented generation (RAG)

  • Better prompt-engineering 

  • Prompt chaining 

RAG refers to various methods of augmenting a prompt or response to inform ChatGPT’s answers. It could mean programmatically injecting custom instructions into every prompt, such as “Keep all answers short” or “Relate all answers to Fiverr.” It could also mean building an index of your business’s data and getting ChatGPT to always refer to this index for every response. 

Important: ChatGPT doesn’t keep the context of chats stored in your browser. Some people believe they’re training ChatGPT when they return to a saved chat and continue it. For privacy reasons, this isn’t the case. ChatGPT can only learn based on the conversation from the current session. 

“There are currently four approaches to training an LLM on business data,” says Michael King, CMO of AIPRM, an AI prompt marketplace for ChatGPT, DALL-E, and Midjourney. “The most expensive and unwieldy would be training your own language model from scratch on your business data. Companies like Bloomberg are doing this." 

The second would be to fine-tune a foundational model like GPT-3.5 Turbo. This is a well-worn path that people have been doing since GPT-2. The third would be to use a retrieval-augmented generation (RAG) approach, wherein documents or knowledge graphs are provided to a language model at runtime to inform the response. 

"This approach is similar to how ChatGPT’s custom instructions and AIPRM’s custom profiles work," continues King. "The fourth is that there are some tools out there that let you upload files, and they build custom indexes for you using libraries like LangChain or Llama Index that are provided to the language model to guide its responses. That’s still a RAG approach, but it’s done for you rather than requiring you to deal with any code.”

Better prompt-engineering

The first step for getting better results with AI models is to improve the quality of the prompt. Talking to ChatGPT the right way is so important that OpenAI has written extensive instructions on properly structuring an AI prompt. 

Behind all the magic of AI are algorithms driven by mathematics and sophisticated computer software and hardware. Although this combination has created something of a natural language processing (NLP) revolution, AI can’t think like humans do. 

Understanding this caveat, you can start thinking like a machine and generate better prompts to elicit more specific results. This skill has become so valuable that people are even hiring AI prompt engineers who’ve spent countless hours learning how to write AI prompts

To get the best results from the following ChatGPT prompt engineering tips, you must use ChatGPT-4, which requires a subscription to ChatGPT Plus.

1. Write clear instructions. 

Be specific in what you ask ChatGPT to do. Define the tone and style, the persona, and the level of technicality. Delimiters and furnishing your own examples can help describe the prompt more clearly. 

Poor example: Write an article about cats. 

Good example: Write a 500-word article about cats aimed at potential cat toy purchasers at the awareness stage of the sales funnel. The article’s purpose is informative, but will casually reference cat products that Acme Cat Store sells. The target market for the article is cat owners and cat lovers. 

2. Use reference texts to help avoid hallucinatory answers. 

Providing reference texts helps ChatGPT avoid hallucinations, especially if you’re asking for links, quotes, or lesser-known topics. The problem of ChatGPT hallucination is particularly true when asking for data that only became available after 2021, because of ChatGPT’s knowledge cutoff date.

Poor example:Write a social media post about Elon Musk and Twitter. 

Good example:Consider the article between the ### delimiters and write a LinkedIn post about Elon Musk and Twitter.

### insert relevant news article or similar here ###

3. Break complex tasks into more straightforward tasks. 

Poor example:Write a Python script that reads all the images in a folder and organizes them. 

Good example:

Write a Python script that does the following:

  • Open a dialog box to select a folder. We’ll call this the Root Folder. 

  • If no folder is selected, exit the script.

  • Loop through the files in the folder and all subfolders, looking for files with one of the following non-case-sensitive file extensions: JPEG, JPG, PNG, GIF, TIFF, or BMP.

  • When you find a file, read its Date Created property and save its year and month to a variable called “subfolder_name” in the format: “YYYY.MM.”

  • We want to move the file to a subfolder of the Root Folder with the name stored in “subfolder_name.”

  • If the file is already in the subfolder titled “subfolder_name,” move to the following file. 

  • If not, look for a folder named “subfolder_name.”

  • If none exists, create one. 

  • Move the file to the subfolder with the name stored “subfolder_name.”

  • Move to the next file. 

  • When finished, display a helpful message to the user. 

Include error handling in the script. 

ChatGPT probably wouldn’t need this much explanation for this particular task, but the script could be made more complex, such as by determining what each image is about through AI. 

You can break even more complex tasks into different prompts, getting ChatGPT to achieve the goal of each step before moving to the next. This is called prompt chaining

4. Give ChatGPT more time, such as by asking it to reevaluate its answers. 

Like humans, ChatGPT needs time to process data and tends to give more accurate answers if you slow it down. One way to do this is to ask it to show all its steps to calculate a solution. 

5. Leverage external tools, such as code execution engines or the AI chatbot solution we describe below. 

As a language model, ChatGPT lacks certain functionality. For example, it doesn’t have inherent calculation capabilities. When you give it highly complex mathematical problems to solve, it often will get them wrong. The best option is to connect ChatGPT to an external tool to implement functionality that ChatGPT doesn’t have built-in. 

To connect ChatGPT to an external service, you can buy AI development services from Fiverr freelancers to help you. 

6. Test, test, test, and use a systematic method to do it. 

Finally, you must test your prompt engineering results and compare them to earlier ones. Do this systematically, noting which prompts worked better than others. You could keep a spreadsheet that records tips for better prompting. 

Another option is to buy AI development services from Fiverr freelancers to create an AI app where you can tag successful prompts and store them in a library. Whenever you need a successful prompt, just select it from a list. 

Retrieval-augmented generation (RAG)—restricting ChatGPT to company-specific data

Retrieval-augmented generation (RAG) is a method that combines the capabilities of two separate models: a text retriever and a text generator. It first searches a database to find relevant information (retrieval) and then uses that information to generate a response (generation). It’s different from fine-tuning but is often mistaken for it. 

RAG can enhance ChatGPT’s performance by using external data to provide more precise and informative answers.

Warning: This is an advanced method of tweaking ChatGPT’s responses, but it’s also the most effective. If you’re unfamiliar with programming, we recommend buying AI integrationor Python programming services from Fiverr freelancers to help you. 

If you’re familiar with Python, you can easily navigate through the following step-by-step guide. 

Step 1: Collect your data

For this example, we’re going to build an index of company-specific PDFs and then create a web-based chat interface that lets us use ChatGPT on a local URL. With simple modifications to the Python code below, you can also train ChatGPT on any other type of data.

We saved several guides as PDFs from Fiverr’s business, marketing, and branding guides and stored them in a folder, as shown in the screenshot below: 

ChatGPT training data as PDF files.

ChatGPT training data as PDF files.

Step 2: Get an OpenAI API key

This solution will create a standalone ChatGPT version using your OpenAI API. We’ll still query ChatGPT, but it will refer to the text in the above folder to restrict its answers. 

Visit OpenAI’s API keys page and click “Create new secret key.”

Generate a new secret key.

Generate a new secret key.

Copy this key and save it somewhere safe, because OpenAI will only show it to you once, for security reasons.

OpenAI offers an initial free tier, but you’ll eventually need to pay for monthly credits to use this solution. The pricing for the OpenAI API isn’t connected with ChatGPT Plus, and OpenAI bills you for the API based on usage. 

Step 3: Install Python

Install Python on your machine if you don’t already have it. Our example is OS-independent, so it doesn’t matter if you’re using Windows, Linux, or a Mac. 

Step 4: Install the necessary Python libraries

You need to install the following libraries:

  • OpenAI library, to give us access to ChatGPT through its API. 

  • Gradio library, a framework for quickly building machine-learning web apps and interfaces. 

  • Langchain library, a suite of powerful tools for building LLM applications. 

  • Llama Index, formerly called GPT Index. It provides connectors so your LLM app can ingest data from sources such as SQL databases, PDFs, and CSV files. 

  • Pypdf and PyCryptodome libraries, Python libraries for working with PDF files. 

To install these libraries, run the following command: 

pip install openai langchain llamaindex pypdf PyCryptodome gradio

Llama Index is available only in Python, so this solution won’t work if you want to ingest company data using another programming language such as Java or C#. But it’s indeed possible to create similar solutions in languages other than Python. An open-source C# ChatGPT desktop client already exists, and you could buy freelancer AI development services from Fiverr to integrate a training solution into that. 

Step 5: Write the Python code

We created some rudimentary Python code to index your local data, based on this outdated code that no longer works

You can find our updated version of the code here

Python code to train ChatGPT on your business data.

Python code to train ChatGPT on your business data.

The code above is rudimentary but serves the purpose. Under the hood, Llama indexes our content and stores it in a “vector index,” which is best suited for comparison searches.

An index is a mathematical representation of your data, allowing Llama Index to query ChatGPT with large chunks of your data embedded into the prompt without hitting ChatGPT’s prompt limitations. Llama saves these indexes as JSON (JavaScript Object Notation) files in a folder you specify. 

Llama Index provides other types of indexes, such as keyword and list indexes, and the topic of which one is best can quickly become complicated. Each index type is better suited to specific tasks, such as building chatbots that answer FAQs or chatbots that have a deeper understanding of your company’s docs.

After the code indexes the files in your selected folder, it opens a local URL where you can query it. If you want the URL to be accessible on the internet, replace the last line of the code with:

iface.launch(share=True)

The Gradio library takes care of the interface and is fully themable. Consider buying website design services from Fiverr freelancers to build a custom theme that matches your brand.

Step 6: Test and refine

The final step is to test and refine the AI chatbot. 

We tested our demo AI chatbot by asking it how to create a website. It replied with the precise steps from one of our training documents, How to Build a Website from Scratch

Results of trained ChatGPT version.

Results of trained ChatGPT version.

We then asked it how to make money gaming, and our trained ChatGPT model answered precisely according to its training data:

Results of trained ChatGPT version.

Results of trained ChatGPT version.

“You have to thoughtfully curate the content that you provide, says King. “You also still have to create effective prompts to generate the output that you’re looking for. I’ve seen people throw all their files or web pages into a custom index and do a half-baked prompt, get bad output, and conclude that the tech isn’t good or it doesn’t work. We generally recommend that you get clear on the goals for the content, and look for places in your existing workflows and what tools you’re already using that have incorporated generative AI. Then, selectively curate the content and build out and test the prompts to ensure they are giving you the output consistency you’re looking for.”

Building a more advanced solution with the above script

Using Gradio, writing code so the model accepts images as prompts is also possible, letting you create AI generators for your business. With some simple modifications, you could create an AI app to edit videos, generate music with AI, or use AI to write song lyrics. 

Finding the right AI freelancer for your project

Fiverr’s expert pool of AI freelancers can come to the rescue if you’re not tech-savvy or don’t have the time to train ChatGPT yourself. 

Here’s how to find a qualified freelancer on Fiverr:

Search for competencies: Use keywords like “Python,” “AI development,” or “ChatGPT” to find freelancers with a suitable skill set. You can also use the menu at the top to browse. 

Examine portfolios: Review past projects and reviews to gauge their expertise and reliability.

Ask questions: Many freelancers welcome direct questions via Fiverr’s messaging system. Ask them about their experience with similar projects, their understanding of AI, and their suggested approach for your project.

Type of questions to ask: “Can you provide examples of similar AI development projects you’ve worked on?” and “What’s your approach to meeting project deadlines?”

Customer support: Fiverr’s rated freelancers and pros take their work seriously, but you can always contact Fiverr’s support team if you need help.

Ready to kickstart your AI project? Sign up for Fiverr today to get started.

About Author

R. Paulo Delgado Tech & Business Writer

R. Paulo Delgado is a tech and business freelance writer with nearly 17 years of software development experience under his belt, including WordPress programming. He is also a crypto journalist for Moneyweb, and proudly a member of Fiverr's Pro Seller program — hand-vetted professionals, verified by Fiverr for quality and service.