Splitting the data into a training set and a testing set is not trivial. Both sets have to be representative of the full dataset. If your dataset contains apartment sizes ranging from 15 to 200 square meters, it is not a good idea to use the observations with an area below 50 square meters as the training set and the rest as the testing set: a model trained only on small apartments would then be evaluated on sizes it has never seen. Both the training and the testing samples must contain areas from the whole range. Randomly splitting the data is often sufficient and results in a good representation of the features in both sets.
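To make the idea of a random split concrete, here is a minimal sketch using only the Python standard library. The `random_split` helper, the 20% test fraction, the seed, and the list of apartment areas are all assumptions made for illustration; they are not taken from a particular dataset or library.

```python
import random


def random_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and cut it into (train, test)."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(rows)          # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical apartment areas in square meters, covering 15 to 200.
areas = list(range(15, 201))
train, test = random_split(areas)

# Both samples should span most of the original range.
print("train:", len(train), "rows, from", min(train), "to", max(train))
print("test: ", len(test), "rows, from", min(test), "to", max(test))
```

Because the rows are shuffled before the cut, neither sample is restricted to one end of the range, unlike the split at 50 square meters described above.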
However, some situations require a different approach. One example is when the target variable (or any of the categorical features) is imbalanced, meaning that some classes are predominant. In this case, we need to make sure that both the training and the testing samples preserve the same class proportions...
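One way to preserve class proportions is a stratified split: split each class separately, then merge the pieces. The sketch below is a standard-library illustration under assumed inputs; the `stratified_split` helper, the "common" and "rare" labels, and the 90/10 imbalance are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, labels, test_fraction=0.2, seed=42):
    """Split (row, label) pairs so each class keeps roughly the same
    share in the train and test samples."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append((row, label))

    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)                        # shuffle within the class
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])

    rng.shuffle(train)                              # mix the classes again
    rng.shuffle(test)
    return train, test


# Hypothetical imbalanced target: 90 "common" rows and 10 "rare" rows.
rows = list(range(100))
labels = ["common"] * 90 + ["rare"] * 10

train, test = stratified_split(rows, labels)
train_counts = Counter(label for _, label in train)
test_counts = Counter(label for _, label in test)
print(train_counts, test_counts)
```

With these inputs, the rare class makes up 10% of both samples. In practice, scikit-learn's `train_test_split` provides the same behavior through its `stratify` argument.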