Dividing text using chunking
The chunking procedure can be used to divide large text into small, meaningful chunks.
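For example, a 10,000-word document chunked 1,000 words at a time yields ten pieces, each small enough to process on its own.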
How to do it...
- Create a new Python file and import the following packages:
import numpy as np
from nltk.corpus import brown
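If the Brown corpus is not already present on the machine, NLTK raises a LookupError the first time it is accessed. A one-time download fixes this; this step is an addition for completeness and is not part of the original recipe:
import nltk
nltk.download('brown')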
- Define a function that divides the text into chunks:
# Split a text into chunks
def splitter(content, num_of_words):
    words = content.split(' ')
    result = []
- Initialize the variables that track the current chunk:
    current_count = 0
    current_words = []
- Iterate over the words:
    for word in words:
        current_words.append(word)
        current_count += 1
- Once the required number of words has been collected, store the chunk and reset the variables:
        if current_count == num_of_words:
            result.append(' '.join(current_words))
            current_words = []
            current_count = 0
- Attach the remaining words as the final chunk and return the output:
    # Append any leftover words as the last chunk (skipped when empty)
    if current_words:
        result.append(' '.join(current_words))
    return result
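As a quick sanity check (not part of the original recipe), the function can be tried on a short sentence:
print(splitter('the quick brown fox jumps over the lazy dog', 3))
The expected output is ['the quick brown', 'fox jumps over', 'the lazy dog'].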
- Import the data of the Brown corpus and consider the first 10000 words:
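The section breaks off at this step. The following is a minimal sketch of how it can be completed, assuming the goal is simply to chunk the first 10000 Brown corpus words; the chunk size of 800 is an arbitrary choice, not taken from the recipe:
# Read the first 10000 words from the Brown corpus and join them
# into a single space-separated string
content = ' '.join(brown.words()[:10000])
# Split the text into chunks of 800 words each
chunks = splitter(content, 800)
print('Number of text chunks =', len(chunks))
Under these assumptions, the script prints Number of text chunks = 13: twelve full chunks of 800 words and one final chunk holding the remaining 400 words.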