NLP Specialist: BERT & Beyond
Module 11 of 11
11. NLP Cheatsheet
Tokenizer
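from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# return_tensors="pt" returns PyTorch tensors (input_ids, attention_mask, ...)
tokens = tokenizer("Hello world", return_tensors="pt")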
Regex
import re
# Find emails
re.findall(r'[\w\.-]+@[\w\.-]+', text)
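The pattern above is deliberately permissive: it accepts any run of word characters, dots, and hyphens on either side of the `@`, so it also picks up a sentence-ending period. The sketch below shows one way to use it on a sample string and trim that trailing punctuation. The helper name `find_emails` and the sample text are illustrative assumptions, not part of the course material, and this is not a full email validator.

```python
import re

# Same permissive pattern as in the cheatsheet (not an RFC-compliant validator).
EMAIL_PATTERN = re.compile(r'[\w\.-]+@[\w\.-]+')

def find_emails(text):
    """Return email-like substrings with trailing '.' or '-' stripped.

    Hypothetical helper for illustration only.
    """
    return [match.rstrip('.-') for match in EMAIL_PATTERN.findall(text)]

# Made-up sample text; the second address is followed by a sentence-ending period.
sample = "Contact alice@example.com or bob.smith@mail.example.org. Thanks!"
emails = find_emails(sample)
print(emails)  # ['alice@example.com', 'bob.smith@mail.example.org']
```

Without the `rstrip` step, the second match would come back as `bob.smith@mail.example.org.` with the period attached, which is worth keeping in mind when using the raw pattern.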
Embeddings
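from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
# encode() returns a NumPy array by default (384 dimensions for this model)
emb = model.encode("Hello world")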
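A common next step with sentence embeddings is to compare two of them by cosine similarity. The sketch below is a minimal pure-Python version, assuming short made-up vectors as stand-ins for real `model.encode` output so that it runs without downloading a model. The function name `cosine_similarity` and the example vectors are illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector has zero length (a convention chosen here).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Stand-in vectors, made up for illustration (real embeddings are much longer).
emb_a = [1.0, 0.0, 1.0]
emb_b = [1.0, 0.0, 1.0]   # same direction as emb_a
emb_c = [0.0, 1.0, 0.0]   # orthogonal to emb_a

same = cosine_similarity(emb_a, emb_b)       # close to 1.0
different = cosine_similarity(emb_a, emb_c)  # 0.0
```

Scores near 1 indicate vectors pointing in the same direction and scores near 0 indicate unrelated directions. The same function should work on the arrays returned by `model.encode`, since it only iterates over the values.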