Google Colab: https://colab.research.google.com/drive/1-0yZZmDe6RDmeyrWNXOg-ruT4ZPjK6li
Let us try to build a clone of https://despam.io/ or of OpenAI's text moderation. In the process, we will come to understand:
- Why fine-tuning an LLM is important
Let's understand this with an example. The point is that we can use system prompts to get the desired output; however, system prompts may not work reliably in many cases.
# Checking the efficiency of system prompts for moderation-style classification.
import json
import os

import requests

url = "https://api.openai.com/v1/chat/completions"

# The system prompt instructs the model to act as a multi-label classifier.
payload = json.dumps({
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "system",
            "content": "You are a text classification model. Given an input text, "
                       "you will return a JSON object containing the probability scores for the following "
                       "categories: 'toxic', 'indecent', 'threat', 'offensive', 'erotic', and 'spam'. "
                       "The JSON object should have keys corresponding to these categories, and the values "
                       "should be floating-point numbers between 0 and 1, representing the probability of "
                       "the input text belonging to that category. The probabilities should have a precision "
                       "of 8 decimal places. Please respond with only the JSON object, without any additional "
                       "text or explanation."
        },
        {
            "role": "user",
            "content": "Hello boss, I can not sustain your ego anymore."
        }
    ],
})

headers = {
    "Content-Type": "application/json",
    # Never hardcode API keys; read the key from an environment variable.
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
}

response = requests.post(url, headers=headers, data=payload)
print(response.text)
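Continuing from the request above, here is a minimal sketch of how the response could be parsed. It assumes the call succeeded and that the model actually returned pure JSON, which, as noted, a system prompt cannot guarantee:

# Continues from the snippet above; `response` and `json` are already defined.
result = response.json()
raw_scores = result["choices"][0]["message"]["content"]

try:
    scores = json.loads(raw_scores)
    print("spam:", scores.get("spam"), "toxic:", scores.get("toxic"))
except json.JSONDecodeError:
    # The model ignored the instructions and wrapped the JSON in extra text.
    print("Non-JSON output:", raw_scores)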
- Imagine that, after human review, we have identified that some predictions are incorrect. We want to teach GPT about those particular comments, and for that we will need to fine-tune GPT (a preview of the training-data format appears at the end of this section).
- Notice the size of the system prompt: it consumes 139 tokens, which is a bit large. If we can shrink the input prompt to under 100 tokens, we will save roughly 40 tokens per request.
Let's say our classifier is used by a few Telegram or Discord servers and we receive around 100K requests per day. We would then save around 4M input tokens per day, which is about 40 USD per day at the gpt-3.5-turbo rates assumed here. Over a year that is 40 * 365, roughly 14,600 USD, which is a decent amount for a newborn startup. That said, fine-tuning comes with its own cost: inference with fine-tuned models is usually priced significantly higher, so the cost-saving argument is weaker than it first appears.
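As a sanity check on that estimate, here is the back-of-the-envelope arithmetic in code. The per-token price below is simply the value implied by the 40 USD/day figure, not an official rate, so substitute the current gpt-3.5-turbo pricing:

# Back-of-the-envelope savings estimate; price_per_token is implied by the
# 40 USD/day figure above, not an official OpenAI rate.
tokens_saved_per_request = 139 - 99            # trimming the prompt to ~99 tokens
requests_per_day = 100_000
price_per_token = 40 / 4_000_000               # USD per input token (assumed)

tokens_saved_per_day = tokens_saved_per_request * requests_per_day  # 4,000,000
usd_per_day = tokens_saved_per_day * price_per_token                # ~40 USD
usd_per_year = usd_per_day * 365                                    # ~14,600 USD
print(f"{tokens_saved_per_day:,} tokens/day -> ${usd_per_day:.0f}/day -> ${usd_per_year:,.0f}/year")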
Let's try to fine-tune OpenAI's gpt-3.5-turbo to save input tokens. In the next section, we will prepare and preprocess the dataset.
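As a preview of that next section, OpenAI's chat fine-tuning for gpt-3.5-turbo expects a JSONL file with one conversation per line, where the assistant message holds the corrected answer. The shortened system prompt and the probability scores below are hypothetical placeholders, not real labels:

import json

# Hypothetical human-corrected training example in OpenAI's chat fine-tuning
# format (one JSON object per line of a .jsonl file); scores are placeholders.
corrected_example = {
    "messages": [
        {"role": "system", "content": "Classify the text."},  # shortened prompt to save tokens
        {"role": "user", "content": "Hello boss, I can not sustain your ego anymore."},
        {"role": "assistant", "content": json.dumps({
            "toxic": 0.81, "indecent": 0.02, "threat": 0.04,
            "offensive": 0.65, "erotic": 0.01, "spam": 0.03,
        })},
    ]
}

with open("train.jsonl", "a") as f:
    f.write(json.dumps(corrected_example) + "\n")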