Hello, I noticed that the code for subword tokenization is not provided in the blog. It says the code is below, but it's not there. Please add the implementation.
Hey Eli,
My bad, looks like I forgot to add the implementation. This is how you can implement it:
from transformers import AutoTokenizer

# Load the pretrained BERT tokenizer and split the text into subword tokens
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
tokens = tokenizer.tokenize("Happy Bday to me!")
print(tokens)
Note that AutoTokenizer simply loads whichever tokenizer belongs to the model ID you pass in, so you can swap in different model names and compare how each tokenizer splits the same text.
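For example, here is a minimal sketch comparing two tokenization schemes on the same sentence, assuming the transformers library is installed; the model IDs ("google-bert/bert-base-uncased" and "openai-community/gpt2") are just illustrative choices.

```python
from transformers import AutoTokenizer

text = "Happy Bday to me!"

# WordPiece tokenizer (BERT): out-of-vocabulary words are split into
# pieces, with continuation pieces marked by a leading "##"
bert_tok = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
bert_tokens = bert_tok.tokenize(text)
print(bert_tokens)

# Byte-level BPE tokenizer (GPT-2): a leading space is folded into the
# token and shown as "Ġ"
gpt2_tok = AutoTokenizer.from_pretrained("openai-community/gpt2")
gpt2_tokens = gpt2_tok.tokenize(text)
print(gpt2_tokens)
```

Running both side by side makes the differences concrete: the WordPiece output marks subword continuations with "##", while the BPE output encodes word boundaries with "Ġ".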
© Copyright Algoholic (OPC) Private Limited