resumeanalyser.text_cleaning
Module Contents
Functions
remove_punctuation(text) | Remove punctuation and special characters from the input text. |
tokenize(text) | Tokenize the input text into individual words. |
to_lower(tokens) | Convert all tokens in the input list to lowercase. |
remove_stop_words(tokens) | Remove stop words from the list of tokens. |
lemmatize(tokens) | Apply lemmatization to each token in the list. |
clean_text(text) | Clean text by applying a series of processing steps: tokenization, converting to lowercase, removing stop words, and applying lemmatization. |
- resumeanalyser.text_cleaning.remove_punctuation(text)
Remove punctuation and special characters from the input text.
Parameters: text (str): A string containing the text to be processed.
Returns: str: The text with all punctuation and special characters removed.
Example:
>>> remove_punctuation("Hello, world!")
'Hello world'
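The documented behaviour can be approximated with the standard library alone. The following is a minimal sketch, not the module's actual implementation; the name `remove_punctuation_sketch` and the choice of regular expression are assumptions made for illustration.

```python
import re


def remove_punctuation_sketch(text):
    # Hypothetical stand-in: delete every character that is neither a
    # word character (letters, digits, underscore) nor whitespace.
    return re.sub(r"[^\w\s]", "", text)


print(remove_punctuation_sketch("Hello, world!"))  # Hello world
```

Note that `\w` also matches the underscore, so this sketch keeps underscores; the real function may treat them differently.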
- resumeanalyser.text_cleaning.tokenize(text)
Tokenize the input text into individual words.
Parameters: text (str): A string containing the text to be tokenized.
Returns: list: A list of words (tokens) extracted from the input text.
Example:
>>> tokenize("Hello, world!")
['Hello', ',', 'world', '!']
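The example above shows punctuation marks coming back as separate tokens. A regular expression can reproduce that behaviour for simple input. This is a rough sketch under that assumption; `tokenize_sketch` is a hypothetical name, and the tokenizer the module actually uses is not shown in this reference.

```python
import re


def tokenize_sketch(text):
    # Hypothetical stand-in: a token is either a run of word characters
    # or a single character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)


print(tokenize_sketch("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

A pattern like this splits contractions such as "don't" into three tokens, which a linguistic tokenizer would typically handle differently.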
- resumeanalyser.text_cleaning.to_lower(tokens)
Convert all tokens in the input list to lowercase.
Parameters: tokens (list): A list of tokens (words).
Returns: list: A list of tokens in lowercase.
Example:
>>> to_lower(['Hello', 'WORLD'])
['hello', 'world']
- resumeanalyser.text_cleaning.remove_stop_words(tokens)
Remove stop words from the list of tokens.
Parameters: tokens (list): A list of tokens (words).
Returns: list: A list of tokens with stop words removed.
Example:
>>> remove_stop_words(['this', 'is', 'a', 'sample'])
['sample']
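Stop-word removal is, in essence, a membership filter against a fixed word list. The sketch below illustrates the idea with a deliberately tiny, hypothetical list; `remove_stop_words_sketch`, `EXAMPLE_STOP_WORDS`, and the extra `stop_words` parameter are assumptions and not part of the documented API.

```python
# Hypothetical, deliberately small stop-word set for illustration only;
# the list used by the real module is not shown in this reference.
EXAMPLE_STOP_WORDS = frozenset({"this", "is", "a", "the", "of", "and"})


def remove_stop_words_sketch(tokens, stop_words=EXAMPLE_STOP_WORDS):
    # Keep only tokens that are not in the stop-word set, preserving order.
    return [token for token in tokens if token not in stop_words]


print(remove_stop_words_sketch(["this", "is", "a", "sample"]))  # ['sample']
```

The comparison in this sketch is case-sensitive, so a token such as 'This' would pass through unchanged. That is one reason to lowercase tokens before filtering.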
- resumeanalyser.text_cleaning.lemmatize(tokens)
Apply lemmatization to each token in the list.
Parameters: tokens (list): A list of tokens (words).
Returns: list: A list of lemmatized tokens.
Example:
>>> lemmatize(['running', 'jumps'])
['running', 'jump']
- resumeanalyser.text_cleaning.clean_text(text)
Clean text by applying a series of processing steps: tokenization, converting to lowercase, removing stop words, and applying lemmatization.
Parameters: text (str): A string containing the text to be cleaned.
Returns: str: The cleaned text as a single string.
Example:
>>> clean_text("This is a sample sentence, showing off the stop words filtration.")
'sample sentence showing stop word filtration'
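To make the order of the steps concrete, the pipeline can be sketched as a chain of small list transformations. Everything in this block is a hypothetical stand-in: the stop-word list is tiny, `toy_lemmatize` only strips a plural "s" and is not real lemmatization, and the punctuation filter is an assumption inferred from the example output, which contains no punctuation.

```python
import re

# Hypothetical, deliberately tiny stop-word list; the real list is not shown here.
STOP_WORDS = {"this", "is", "a", "off", "the"}


def toy_lemmatize(token):
    # Toy stand-in for lemmatization: strips a trailing plural "s" only.
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def clean_text_sketch(text):
    tokens = re.findall(r"\w+|[^\w\s]", text)                       # tokenize
    tokens = [t.lower() for t in tokens]                            # lowercase
    # Assumption: keep only tokens with at least one letter or digit.
    tokens = [t for t in tokens if any(ch.isalnum() for ch in t)]
    tokens = [t for t in tokens if t not in STOP_WORDS]             # remove stop words
    tokens = [toy_lemmatize(t) for t in tokens]                     # toy lemmatization
    return " ".join(tokens)                                         # back to one string


print(clean_text_sketch("This is a sample sentence, showing off the stop words filtration."))
```

With this input the sketch prints 'sample sentence showing stop word filtration', matching the documented example. On other inputs the toy lemmatizer and the short stop-word list will diverge from the real function.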