| Question asked
|
In '''3.3 Step 1: Syntactic Filtering''' the authors write "tags that are too small … or too large … are discarded. … Tags containing numbers are also filtered according to a set of custom heuristics: … we consider global tag frequency and discard any unpopular tags. Finally, common stop-words … are discarded." In class we've discussed some of the problems that can arise in discarding tags. Here it appears that this applies only to tags with numbers. The authors also apply a number of filters (to address misspellings, neologisms, and domain-specific terminologies). Are there any effects of this, other than what they intend, of which we should be aware?
|