Blog - Today I Learned

My daily learnings in computer science, machine learning and development.

tokenizer

Normalization:

# Normalization:
- Cleanup the text (remove accents, extra spaces, Unicode normalization, and others...)

```python
your_name = "Nathan"  # replace with y...
```
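The cleanup steps above can be sketched with the standard library alone (this is a minimal illustration, not the Hugging Face `tokenizers` implementation; the function name `normalize` is my own):

```python
import unicodedata

def normalize(text: str) -> str:
    # Unicode normalization: NFD splits accented chars into base char + combining mark
    text = unicodedata.normalize("NFD", text)
    # Remove accents: drop combining marks (Unicode category "Mn")
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    # Collapse repeated whitespace and lowercase
    return " ".join(text.split()).lower()

print(normalize("Héllo   Wörld"))  # hello world
```

Real tokenizer pipelines compose such steps (NFD, strip accents, lowercase) in a configurable sequence rather than hard-coding them.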

tokenizer

Pre-Tokenizer:

# Pre-Tokenizer:
- A tokenizer cannot be trained on raw text alone.
- Instead, we first need to split the texts into small entities, like words.
- ...
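Splitting raw text into small entities can be sketched as follows (a stdlib-only approximation, similar in spirit to a whitespace/punctuation pre-tokenizer; the function name `pre_tokenize` is my own, not a library API):

```python
import re

def pre_tokenize(text: str) -> list[tuple[str, tuple[int, int]]]:
    # Split into word runs and punctuation runs, keeping character offsets
    # so each piece can be mapped back to its position in the raw text.
    return [(m.group(), (m.start(), m.end()))
            for m in re.finditer(r"\w+|[^\w\s]+", text)]

print(pre_tokenize("Hello, world!"))
# [('Hello', (0, 5)), (',', (5, 6)), ('world', (7, 12)), ('!', (12, 13))]
```

The subword tokenizer (BPE, WordPiece, etc.) is then trained on these pre-tokenized pieces rather than on the raw string.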