We can certainly drop attributes that we think will not help the classifier distinguish between good and not-so-good answers. But we have to be cautious here. Some attributes will never be used directly as features, yet we still need to keep them:
- The PostTypeId attribute, for example, is necessary to distinguish between questions and answers. It will not be picked to serve as a feature, but we will need it to filter the data.
- CreationDate could be useful for determining the time span between the posting of the question and the posting of each individual answer. In this chapter, however, we will ignore it.
- Score is important as an indicator of the community's evaluation.
- ViewCount, in contrast, ...
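
To make the filtering role of PostTypeId concrete, here is a minimal sketch that splits posts into questions and answers while keeping only a handful of attributes. It assumes the row-per-post XML layout of the Stack Exchange data dump, in which a PostTypeId of 1 marks a question and 2 marks an answer. The sample rows are made up for illustration, and the names `SAMPLE`, `KEEP`, and `load_posts` are hypothetical, not taken from the chapter.

```python
import xml.etree.ElementTree as ET

# Made-up rows in the shape of the dump's posts file; not real data.
SAMPLE = """<posts>
  <row Id="1" PostTypeId="1" CreationDate="2011-01-01T10:00:00.000" Score="5" ViewCount="120" />
  <row Id="2" PostTypeId="2" ParentId="1" CreationDate="2011-01-01T10:30:00.000" Score="3" />
  <row Id="3" PostTypeId="2" ParentId="1" CreationDate="2011-01-02T09:00:00.000" Score="-1" />
</posts>"""

# Assumed PostTypeId convention of the Stack Exchange dump.
QUESTION, ANSWER = "1", "2"

# Attributes kept for filtering and labeling; everything else is dropped.
KEEP = ("Id", "PostTypeId", "ParentId", "Score")


def load_posts(xml_text):
    """Parse the rows and keep only the attributes listed in KEEP."""
    posts = []
    for row in ET.fromstring(xml_text).iter("row"):
        post = {name: row.get(name) for name in KEEP}
        post["Score"] = int(post["Score"])  # Score arrives as a string
        posts.append(post)
    return posts


posts = load_posts(SAMPLE)

# PostTypeId is used only to filter; it is not a feature itself.
questions = [p for p in posts if p["PostTypeId"] == QUESTION]
answers = [p for p in posts if p["PostTypeId"] == ANSWER]
```

Missing attributes, such as ParentId on a question, come back as `None` from `row.get`, so the same dictionary layout works for both post types.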