Hello, I noticed that the code for subword tokenization is not provided in the blog. It says the code is below, but it's not there. Please add the implementation.
Hey Eli,
My bad, looks like I forgot to add the implementation. This is how you can implement it:
from transformers import AutoTokenizer

# Load the pretrained BERT tokenizer and split the text into subword tokens
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
tokens = tokenizer.tokenize("Happy Bday to me!")
print(tokens)
Note that AutoTokenizer simply loads whichever tokenizer belongs to the model ID you pass in, so you can swap in different model names and compare how each tokenizer splits the same text.
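For example, here is a minimal sketch comparing two tokenization schemes on the same sentence, assuming the transformers library is installed; the model IDs ("google-bert/bert-base-uncased" and "openai-community/gpt2") are just illustrative choices.

```python
from transformers import AutoTokenizer

text = "Happy Bday to me!"

# WordPiece tokenizer (BERT): out-of-vocabulary words are split into
# pieces, with continuation pieces marked by a leading "##"
bert_tok = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
bert_tokens = bert_tok.tokenize(text)
print(bert_tokens)

# Byte-level BPE tokenizer (GPT-2): a leading space is folded into the
# token and shown as "Ġ"
gpt2_tok = AutoTokenizer.from_pretrained("openai-community/gpt2")
gpt2_tokens = gpt2_tok.tokenize(text)
print(gpt2_tokens)
```

Running both side by side makes the differences concrete: the WordPiece output marks subword continuations with "##", while the BPE output encodes word boundaries with "Ġ".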
© Copyright Algoholic (OPC) Private Limited