Resources: https://playground.tensorflow.org/
Large Language Models (LLMs) have taken the world of AI by storm, revolutionizing natural language processing (NLP) tasks. At the heart of LLMs lie powerful architectures like the Transformer. A Transformer consists of two sub-components: encoders and decoders. Encoders process the input text, capturing its meaning and the relationships between words. Decoders then leverage this encoded information to generate text, translate languages, or answer questions. A key element within Transformers is the attention mechanism, which allows the model to focus on specific parts of the input sequence, which is essential for understanding context.
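To make the attention idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a Transformer's attention layer. The shapes and random inputs are illustrative assumptions, not taken from any particular model.

```python
# A minimal sketch of scaled dot-product attention using NumPy.
# Array shapes and inputs are illustrative assumptions only.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how strongly each token attends to every other token
    weights = softmax(scores, axis=-1)   # attention weights sum to 1 for each query token
    return weights @ V                   # weighted sum of value vectors

# Toy example: 4 tokens with 8-dimensional representations
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```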
LLM Training Arc
Training LLMs is no small feat. It starts with data collection.
Data Collection: A massive amount of data is collected, generally measured in terabytes or petabytes. For instance, GPT-3 is said to have been trained on roughly 45 TB of data. Usually, companies resort to web scraping to collect data available across the internet.
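As a rough illustration of what web scraping looks like in practice, here is a minimal sketch using the `requests` and `beautifulsoup4` libraries. The URLs are placeholders, not a real data source, and a production crawler would be far more elaborate.

```python
# A minimal sketch of scraping text from web pages.
# Assumes `requests` and `beautifulsoup4` are installed; URLs are placeholders.
import requests
from bs4 import BeautifulSoup

urls = ["https://example.com/article-1", "https://example.com/article-2"]

corpus = []
for url in urls:
    resp = requests.get(url, timeout=10)
    if resp.status_code != 200:
        continue                                        # skip pages that fail to load
    soup = BeautifulSoup(resp.text, "html.parser")
    text = soup.get_text(separator=" ", strip=True)     # drop HTML tags, keep visible text
    corpus.append(text)

print(f"Collected {len(corpus)} documents")
```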
Data Cleaning: Collected data is often very 💩ty, full of duplicates, boilerplate, and low-quality text. Human reviewers clean it up.
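Alongside human review, simple automated filters handle much of the grunt work. The sketch below shows the flavor of such cleaning (whitespace normalization, length filtering, exact deduplication); the thresholds are made-up assumptions, not anyone's production pipeline.

```python
# A minimal sketch of automated data cleaning; thresholds are illustrative assumptions.
import re

def clean_corpus(docs):
    seen = set()
    cleaned = []
    for doc in docs:
        text = re.sub(r"\s+", " ", doc).strip()  # normalize whitespace
        if len(text) < 200:                      # drop very short fragments (assumed cutoff)
            continue
        key = text.lower()
        if key in seen:                          # remove exact duplicates
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned

docs = ["Hello world.  " * 50, "Hello world.  " * 50, "tiny"]
print(len(clean_corpus(docs)))  # 1 -- the duplicate and the short fragment are removed
```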
Training: Supervised learning with backpropagation is a common approach. The LLM is presented with training data consisting of input text and desired output (e.g., translation, answer to a question). The model then makes predictions, and the errors between predictions and actual outputs are calculated. Backpropagation propagates these errors back through the network, adjusting the model's internal parameters to improve future predictions.
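Here is a minimal PyTorch sketch of that loop: make predictions, measure the error against the desired output, backpropagate it, and adjust the parameters. The toy model and random data are illustrative assumptions, not how a real LLM is trained at scale.

```python
# A minimal sketch of supervised training with backpropagation in PyTorch.
# The toy "model" (embedding + linear layer predicting a token id) and the
# random data are made-up assumptions for illustration.
import torch
import torch.nn as nn

vocab_size, embed_dim = 50, 16
model = nn.Sequential(nn.Embedding(vocab_size, embed_dim),
                      nn.Linear(embed_dim, vocab_size))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy training pairs: input token ids and the desired output token ids
inputs = torch.randint(0, vocab_size, (32,))
targets = torch.randint(0, vocab_size, (32,))

for step in range(100):
    logits = model(inputs)           # the model makes predictions
    loss = loss_fn(logits, targets)  # error between predictions and desired outputs
    optimizer.zero_grad()
    loss.backward()                  # backpropagation: errors flow back through the network
    optimizer.step()                 # adjust internal parameters to improve future predictions
```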