Classification Statistics - Thursday 2006-03-30 0553, last modified 2006-03-30 0740
Categories: Nerdy, ryanlee.org

These are the token statistics my web spam classifier has gathered since classification began.

Five most unnatural terms (probability closest to 1):

  • guestbook
  • scripts
  • cgispy
  • cgi
  • mp3 (when used as the text of a link)

Most neutral term (probability closest to 0.5): www

Five most natural terms (probability closest to 0):

  • I
  • have
  • but
  • are
  • not
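
For anyone curious how rankings like these can fall out of a classifier, here is a minimal sketch in Python. It is not the code behind the numbers above. It assumes a simple Graham-style per-token estimate (how often a token shows up in spam versus non-spam, clamped away from 0 and 1), and the function names and the tiny sample corpus are hypothetical, made up for illustration.

```python
from collections import Counter


def token_probabilities(spam_docs, ham_docs):
    """Estimate a spam probability for every token seen in either corpus.

    Assumption: a token counts once per document, and the estimate is
    spam_rate / (spam_rate + ham_rate), clamped to [0.01, 0.99].
    """
    spam_counts = Counter(tok for doc in spam_docs for tok in set(doc.split()))
    ham_counts = Counter(tok for doc in ham_docs for tok in set(doc.split()))
    n_spam = max(len(spam_docs), 1)  # avoid dividing by zero on an empty corpus
    n_ham = max(len(ham_docs), 1)

    probs = {}
    for tok in set(spam_counts) | set(ham_counts):
        spam_rate = spam_counts[tok] / n_spam
        ham_rate = ham_counts[tok] / n_ham
        p = spam_rate / (spam_rate + ham_rate)
        probs[tok] = min(0.99, max(0.01, p))
    return probs


def summarize(probs, n=5):
    """Pick the n most natural, n most unnatural, and the single most neutral token."""
    by_prob = sorted(probs, key=probs.get)  # ascending: most natural first
    return {
        "most_natural": by_prob[:n],
        "most_unnatural": by_prob[-n:][::-1],
        "most_neutral": min(probs, key=lambda t: abs(probs[t] - 0.5)),
    }


if __name__ == "__main__":
    # Hypothetical sample data, not the real training set.
    spam = ["guestbook cgi scripts www", "mp3 cgi www"]
    ham = ["I have www but not", "are not www"]
    print(summarize(token_probabilities(spam, ham)))
```

A token that appears equally often on both sides, like www in the sample data, lands at 0.5, which is why it tells the classifier nothing. Treating a token differently depending on where it appears (such as mp3 as link text) would need a tokenizer that tags tokens with their context, which this sketch leaves out.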
