After we download the 20 newsgroups dataset by whichever means we prefer, the data object called groups
is available in the program. It takes the form of a key-value dictionary. Its keys are as follows:
>>> groups.keys()
dict_keys(['description', 'target_names', 'target', 'filenames',
'DESCR', 'data'])
The target_names key gives the newsgroup names:
>>> groups['target_names']
['alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc', 'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x', 'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball', 'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med', 'sci.space', 'soc.religion.christian', 'talk.politics.guns', 'talk.politics.mideast', 'talk.politics.misc', 'talk.religion.misc']
The target key holds the newsgroup of each document, encoded as an integer:
>>> groups.target
array([7, 4, 4, ..., 3, 1, 8])
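To make the integer encoding concrete, here is a minimal sketch that translates labels back into newsgroup names. It relies on each integer being an index into the target_names list, which is how scikit-learn encodes the labels. The names sample_targets and sample_names are introduced here only for illustration, and the sample holds just the six values visible in the truncated output above, not the full array.

```python
# Newsgroup names, copied from the groups['target_names'] output above.
target_names = [
    'alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc',
    'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x',
    'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball',
    'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med',
    'sci.space', 'soc.religion.christian', 'talk.politics.guns',
    'talk.politics.mideast', 'talk.politics.misc', 'talk.religion.misc',
]

# Illustrative sample (an assumption): only the six label values that are
# visible in the truncated array output, not the whole groups.target array.
sample_targets = [7, 4, 4, 3, 1, 8]

# Each integer label is an index into target_names.
sample_names = [target_names[label] for label in sample_targets]

for label, name in zip(sample_targets, sample_names):
    print(label, name)
```

With the real data object, the same lookup would read groups.target_names[groups.target[0]] for the first document.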
What, then, are the distinct values of these integers...
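One way to answer this kind of question is NumPy's unique function, which returns the sorted distinct values of an array and, optionally, how often each occurs. The sketch below runs on a small stand-in array named sample_target (a hypothetical name) containing only the six values visible in the truncated output, so its result says nothing about the full dataset.

```python
import numpy as np

# Stand-in for groups.target (an assumption): just the six values visible
# in the truncated output above, not the full label array.
sample_target = np.array([7, 4, 4, 3, 1, 8])

# np.unique returns the sorted distinct values; with return_counts=True
# it also returns the number of occurrences of each value.
distinct, counts = np.unique(sample_target, return_counts=True)

print(distinct)
print(counts)
```

Applied to the real groups.target, the same call should yield the integers 0 through 19, one for each of the 20 entries in target_names.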