Write the code to count the number of distinct tokens in a text
To count the number of distinct tokens in a text, you can use Python with the nltk library for natural language processing. If nltk is not installed, you can install it with pip install nltk. Here's a simple example:
import nltk
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

# word_tokenize needs the Punkt tokenizer data (one-time download).
# Recent NLTK releases look for "punkt_tab"; older ones use "punkt".
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)

def count_distinct_tokens(text):
    # Tokenize the text
    tokens = word_tokenize(text)
    # Calculate the frequency distribution of tokens
    frequency_distribution = FreqDist(tokens)
    # Count the number of distinct tokens
    num_distinct_tokens = len(frequency_distribution)
    return num_distinct_tokens

# Example text
sample_text = "This is a sample text. It contains some sample tokens for demonstration purposes."

# Count distinct tokens
result = count_distinct_tokens(sample_text)
print(f"Number of distinct tokens: {result}")
In this example, the count_distinct_tokens function takes a text input, tokenizes it using word_tokenize from the nltk library, and then calculates the frequency distribution of the tokens using FreqDist. The length of the frequency distribution gives the number of distinct tokens. Note that punctuation marks count as tokens and matching is case-sensitive, so "This" and "this" would be counted separately.
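Where only words matter, a variant that lowercases the text and skips punctuation may be closer to what you want. The following is a minimal sketch using only the standard library. The function name count_distinct_words, the case_sensitive parameter, and the regular expression are illustrative assumptions, not part of nltk, and the pattern is a rough approximation of word boundaries rather than a full tokenizer.

```python
import re

# Hypothetical pattern (an assumption for this sketch): runs of word
# characters, optionally followed by an apostrophe part such as "don't".
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?")

def count_distinct_words(text, case_sensitive=False):
    """Count distinct word tokens, ignoring punctuation."""
    if not case_sensitive:
        # Fold case so "This" and "this" count as the same token
        text = text.lower()
    # A set keeps one copy of each token, so its size is the distinct count
    return len(set(TOKEN_PATTERN.findall(text)))

sample_text = "This is a sample text. It contains some sample tokens for demonstration purposes."
print(f"Number of distinct words: {count_distinct_words(sample_text)}")
```

For the sample text this prints 12, since "sample" appears twice and the two periods are ignored. A set is enough here because only the number of distinct tokens is needed; FreqDist is the better fit when the per-token counts are also of interest.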