Update README_for_domain_specific_ChatGPT.md

There are essentially three ways to interact with ChatGPT for domain-specific purposes:

2. Inject content into prompts: The second approach, which I took in my demo example, is to inject domain-specific context into your prompt. In this scenario, ChatGPT uses its well-practiced natural language capabilities, but then looks to your specific content when formulating an answer.

3. Fine-tune a model: Currently, only the previous and less powerful version of ChatGPT’s neural network model (GPT-2) is available to install and use in your own environment. With GPT-2 and some other pre-trained models, you can alter aspects of the model in a process called transfer learning, and train it on your domain-specific content.

The newest model (GPT-3) can be accessed via the OpenAI API. You can “fine tune” it on your content and save a new version of it (at OpenAI) for future use via the API. But you cannot fundamentally alter and retrain it in the traditional machine learning sense. One reason is the sheer size of the pre-trained model -- the time and cost of retraining would be prohibitive for virtually all users.

Instead, with GPT-3, you create a new version of the model and feed it your domain-specific content. The model then runs in the background, seeking to maximize correct answers by updating some of the model’s parameters (see discussion of neural networks below). When complete, it creates a proprietary version of the model for your organization.
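As a rough sketch of the preparation step, OpenAI's fine-tuning process at the time accepted a JSONL file of prompt/completion pairs built from your domain content. The example pairs and file name below are made up for illustration:

```python
import json

# Hypothetical domain-specific Q&A pairs to fine-tune on.
examples = [
    {"prompt": "What is our 2023 outlook for industrials?\n\n###\n\n",
     "completion": " Cautiously optimistic, led by aerospace.\n"},
    {"prompt": "Which sectors face margin pressure?\n\n###\n\n",
     "completion": " Consumer staples and utilities.\n"},
]

# The fine-tuning endpoint expects one JSON object per line (JSONL),
# each with "prompt" and "completion" keys.
with open("train_data.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

The resulting file would then be uploaded to OpenAI to kick off the fine-tuning run that produces your organization's saved model version.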

The second and third approaches above use a technique known as “in-context” learning. The key difference between the two is that the second injects the domain-specific content in real time into the prompt, whereas approach three tailors the model to your needs and produces a reusable customized model, with potentially more accurate results. With approach two, the base model is used unchanged and the model retains no "memory" of the injected content, outside of the current session.
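The prompt-injection approach (approach two) can be sketched in a few lines of Python. The company, context text and model name here are hypothetical, and the actual API call is shown only as a comment:

```python
# Sketch of "in-context" prompt injection: domain content is pasted into
# the prompt so the model answers from it rather than from its training data.

domain_context = (
    "Acme Corp 2023 outlook: We expect capital spending to rise 8%, "
    "led by our aerospace division."
)  # hypothetical content, e.g. pulled from internal documents

question = "What is Acme's expected capital spending growth in 2023?"

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context: {domain_context}\n\n"
    f"Question: {question}\nAnswer:"
)

# The assembled prompt would then be sent to the model, e.g. via the
# OpenAI API (network call omitted here):
# import openai
# response = openai.Completion.create(model="text-davinci-003", prompt=prompt)
print(prompt.splitlines()[0])
```

Because the injected content lives only in the prompt, nothing persists between sessions -- each new question must carry its own context.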

Each token also gets a numerical representation of the word or word fragment called an embedding.

In ChatGPT's case, each token has 4,096 data points or dimensions associated with it. In addition, ChatGPT's artificial intelligence model -- a deep neural network -- pays attention to words that come before and after, so it holds on to context as it "reads in" new words.
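To make the idea of tokens and their data points concrete, here is a toy sketch -- random numbers stand in for learned values, and the tiny vocabulary is invented; this is not ChatGPT's real tokenizer or embedding table:

```python
import random

EMBED_DIM = 4096  # each token maps to a 4,096-dimension vector, per the text

# Toy vocabulary and embedding table filled with random values -- real
# models learn these numbers during training.
vocab = {"the": 0, "market": 1, "rallied": 2}
random.seed(0)
embeddings = [[random.random() for _ in range(EMBED_DIM)] for _ in vocab]

# "Reading in" a sentence means looking up one vector per token.
tokens = [vocab[w] for w in "the market rallied".split()]
vectors = [embeddings[t] for t in tokens]
print(len(vectors), len(vectors[0]))  # 3 tokens, 4,096 dimensions each
```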

## GPT-3: One of the World’s Largest Neural Networks

Neural networks are often described as brain-like, with “neurons” and their connecting “synapses.” In the simple example below, the far left layer takes in input (the word-derived tokens) and the far right layer is the output (the answer or response). In between, the input goes through many layers and nodes, including the embedding, depending on the complexity of the model. This part is “hidden” in that what each node represents is not easily discernible.

The lines between the model's nodes (similar to synapses connecting neurons in the brain) receive a mathematical weighting that maximizes the chances that the output is correct. These weightings are called parameters.

![image](https://github.com/robjm16/domain_specific_ChatGPT/blob/main/basic_nn.png?raw=true)

The ChatGPT model (GPT-3) has 175 billion potential line weightings or parameters, but not all of them “fire” for any given prompt. By contrast, GPT-2 has 1.5 billion parameters.
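A toy example shows how those parameter counts arise from a network's connections: every line between nodes in adjacent layers is one weight, plus one bias per non-input node. The layer sizes below are arbitrary:

```python
# Counting parameters in a toy fully connected network.
layers = [8, 16, 16, 4]  # hypothetical sizes: input, two hidden, output

weights = sum(a * b for a, b in zip(layers, layers[1:]))  # one per "line"
biases = sum(layers[1:])                                  # one per node
print(weights + biases)  # prints 484
```

GPT-3's 175 billion parameters come from the same kind of bookkeeping, just over vastly larger and more elaborate layers.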

The ChatGPT model also has an “attention” mechanism that allows it to differentially weight the importance of different parts of the input text, leading to a more coherent and fluent response.
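A stripped-down sketch of the idea behind attention follows -- scaled dot-product attention over toy vectors. ChatGPT's actual mechanism is multi-headed and operates over far larger matrices, so treat this only as an illustration of the weighting idea:

```python
import math

def softmax(xs):
    # Normalize scores into weights that sum to 1.
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    # Score each key against the query (dot product), scale, turn the
    # scores into weights, then blend the values by those weights.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim)]

# Toy 2-dimensional example: the query matches the first key best, so the
# output leans toward the first value vector.
out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                [[10.0, 0.0], [0.0, 10.0]])
print(out)
```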

## The ChatGPT Ecosystem

OpenAI was founded in 2015 by a group that includes Elon Musk. As mentioned earlier, Microsoft is an investor and key partner.

Microsoft plans to incorporate ChatGPT into many of its offerings. For example, it could be integrated with Microsoft Word and PowerPoint, for help in writing, editing and summarization. It could be used to augment Microsoft’s Bing search engine, providing direct answers to questions and more semantic search results. ChatGPT’s coding assistance abilities could be integrated with Microsoft’s Visual Studio code editing product. (Microsoft offers GitHub Copilot, a code auto-completion tool, and some coders are already using Copilot and GPT-3 in tandem to improve their productivity.) Lastly, Microsoft Azure’s cloud computing services are already incorporating GPT-3 -- for example, helping large companies fine-tune ChatGPT on domain-specific content.

The other large cloud providers – Google and Amazon Web Services (AWS) – will no doubt integrate GPT-3 into their AI offerings, while continuing to enhance their own AI models.

Google’s CEO has reportedly called a “code red” following the release of ChatGPT, challenging the company to quickly incorporate Google’s own ChatGPT-like models into its dominant search platform.

Google, in fact, developed several of the most powerful “large language models” similar to GPT-3 (they go by the names BERT, T5 and XLNet).

Meta is also a key player, with Facebook’s RoBERTa model.

AWS’s suite of AI services is called SageMaker. It includes pre-built algorithms and enables companies to quickly build, train and deploy machine learning models.

Another player is Hugging Face, which hosts a popular community website for sharing open-source models and for quickly prototyping and deploying natural language processing models. The platform includes a library of pre-trained models, a framework for training and fine-tuning models, and an API for deploying models to production. You can access and use GPT-2 through Hugging Face (with GPT-3 available through the OpenAI API).

## Data Security

Each organization will have to make its own security judgments around using ChatGPT, including server location, VPN, firewall, encryption and data security issues.

The fine-tuned model is different. OpenAI does not have access to the prompts you use to fine-tune your version of the model and thus could not use them to train the base model.

In addition, your organization can purchase GPT-3 licenses for on-premises deployment or a "fully managed" enterprise solution hosted on the Microsoft Azure cloud.

Bottom line: organizations will need to think carefully about restricting or sanitizing some inputs and choosing the right fine-tuning and security arrangements.