Testing machine learning models
Before proceeding, depending on your system, you may need to free some memory for the machine learning models by discarding data structures that are no longer needed. To do so, delete the variables that are not required anymore, call gc.collect, and then check the available memory as reported by the psutil.virtual_memory function:
import gc
import psutil

del([tfv_q1, tfv_q2, tfv, q1q2, question1_vectors, question2_vectors,
     svd_q1, svd_q2, q1_tfidf, q2_tfidf])
del([w2v_q1, w2v_q2])
del([model])
gc.collect()
psutil.virtual_memory()
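The raw result of psutil.virtual_memory reports its sizes in bytes, which can be hard to read at a glance. As a minimal sketch, the helper below converts the total and available fields to GiB and keeps the percent field as is. The names summarize_memory and report_memory are hypothetical and are not part of the original code; they only assume the total, available, and percent fields that psutil exposes.

```python
import gc


def summarize_memory(vm):
    """Summarize a psutil-style memory result (hypothetical helper).

    `vm` is any object exposing `total` and `available` (in bytes) and
    `percent` (percentage of memory in use), as psutil.virtual_memory() does.
    """
    gib = 1024 ** 3
    return {
        "total_gib": round(vm.total / gib, 2),
        "available_gib": round(vm.available / gib, 2),
        "percent_used": vm.percent,
    }


def report_memory():
    """Run a garbage collection pass, then summarize system memory."""
    import psutil  # third-party package; imported here so the helper above stays dependency-free

    gc.collect()
    return summarize_memory(psutil.virtual_memory())
```

Calling report_memory() once before and once after the del statements gives a rough idea of how much memory was released, although the exact figures depend on the operating system and on the other processes running at the time.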
At this point, we simply recap the feature sets created so far and what each one contains:
fs_1: List of basic features
fs_2: List of fuzzy features
fs3_1: Sparse data matrix of TFIDF for separated questions
fs3_2: Sparse data matrix of TFIDF for combined questions
fs3_3: Sparse data matrix of SVD
fs3_4: List of SVD statistics
fs4_1: List of w2vec distances
fs4_2: List of wmd distances...