Blog - Today I Learned

My daily learnings in computer science, machine learning and development.

tokenizer

Normalization:

# Normalization:
- Cleanup the text (remove accents, extra spaces, Unicode normalization, and others...)

```python
your_name = "Nathan"  # replace with y...
```
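The cleanup steps above can be sketched with the standard library alone (this is a minimal illustration, not the Hugging Face `tokenizers` implementation; the function name `normalize` is my own):

```python
import unicodedata

def normalize(text: str) -> str:
    # Unicode normalization: NFD splits accented chars into base char + combining mark
    text = unicodedata.normalize("NFD", text)
    # Remove accents: drop combining marks (Unicode category "Mn")
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    # Collapse repeated whitespace and lowercase
    return " ".join(text.split()).lower()

print(normalize("Héllo   Wörld"))  # hello world
```

Real tokenizer pipelines compose such steps (NFD, strip accents, lowercase) in a configurable sequence rather than hard-coding them.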

tokenizer

Pre-Tokenizer:

# Pre-Tokenizer:
- A tokenizer cannot be trained on raw text alone.
- Instead, we first need to split the texts into small entities, like words.
- ...
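Splitting raw text into small entities can be sketched as follows (a stdlib-only approximation, similar in spirit to a whitespace/punctuation pre-tokenizer; the function name `pre_tokenize` is my own, not a library API):

```python
import re

def pre_tokenize(text: str) -> list[tuple[str, tuple[int, int]]]:
    # Split into word runs and punctuation runs, keeping character offsets
    # so each piece can be mapped back to its position in the raw text.
    return [(m.group(), (m.start(), m.end()))
            for m in re.finditer(r"\w+|[^\w\s]+", text)]

print(pre_tokenize("Hello, world!"))
# [('Hello', (0, 5)), (',', (5, 6)), ('world', (7, 12)), ('!', (12, 13))]
```

The subword tokenizer (BPE, WordPiece, etc.) is then trained on these pre-tokenized pieces rather than on the raw string.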