Explicit Semantic Analysis Demo

WikipediaESA Demo: Enter two short texts (in German!) to compute the semantic relatedness between them. For performance reasons, the term database contains only the 88,537 most common German words.
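As ESA is usually described, each text is mapped to a weighted vector over Wikipedia articles (the "dimensions"), and relatedness is the cosine similarity of the two vectors. The sketch below shows that idea under stated assumptions; it is not this demo's actual code, which is not documented here. The index `TERM_VECTORS`, its article titles and weights, and the already-stemmed input terms are all hypothetical, invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy index: stemmed term -> sparse vector {article_title: weight}.
# In ESA these weights are typically TF-IDF scores of the term in each
# Wikipedia article; the numbers below are made up.
TERM_VECTORS = {
    "hund": {"Hund": 0.9, "Haustier": 0.6, "Wolf": 0.4},
    "katz": {"Hauskatze": 0.9, "Haustier": 0.7},
    "tisch": {"Tisch": 0.95, "Möbel": 0.8},
}


def text_vector(terms):
    """Sum the term vectors, weighted by how often each term occurs.
    Terms missing from the index are ignored."""
    vec = Counter()
    for term, count in Counter(terms).items():
        for article, weight in TERM_VECTORS.get(term, {}).items():
            vec[article] += count * weight
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def relatedness(terms_a, terms_b):
    """Semantic relatedness of two (stemmed) term lists."""
    return cosine(text_vector(terms_a), text_vector(terms_b))
```

With these toy weights, `relatedness(["hund"], ["katz"])` comes out at about 0.32 because both terms load on the shared "Haustier" dimension, while `relatedness(["hund"], ["tisch"])` is 0.0 since the two vectors share no article.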

Demo: Compute Semantic Relatedness

Some examples to try out: "IBM", "Software", "Korruption", "Chemie", "VW", "Papst", "Tisch", "Hund", "Katze", "Deutschland", "Hamburg"

Stats

data source: Wikipedia (DE), 2008-03-20
number of articles used (dimensions): 592,542
number of extracted terms: ~3,900,000
number of term vectors: 63,439
stemming algorithm: Porter (German)
data size: 278,137,881 bytes
min. term length: 2 characters
max. term length: 33 characters
avg. term length: 10.26 characters
min. vector length: 1 dimension
max. vector length: 317,515 dimensions
avg. vector length: 1,310.64 dimensions
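The figures above can be combined into a rough size estimate. The sketch below assumes that the average vector length is taken over all term vectors and, more speculatively, that the data size consists only of those sparse vectors; neither assumption is stated on this page.

```python
# Figures copied from the stats table.
term_vectors = 63_439
avg_vector_length = 1_310.64  # average non-zero dimensions per term vector
data_size_bytes = 278_137_881

# Approximate number of stored (term, article) weights, assuming the
# average is computed over all term vectors.
total_entries = term_vectors * avg_vector_length

# Bytes per stored weight, assuming (hypothetically) that the data size
# covers only these sparse vectors.
bytes_per_entry = data_size_bytes / total_entries

print(f"{total_entries:,.0f} entries, {bytes_per_entry:.2f} bytes/entry")
```

This yields roughly 83 million stored weights at about 3.3 bytes each, which would only hold if the stored weights are compactly encoded.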
Created by Henning Jacobs