WikipediaESA Demo: Enter two short texts (in German) to compute the semantic relatedness between them. For performance reasons, the term database contains only the 88,537 most common German words.
Some examples to try out: "IBM", "Software", "Korruption", "Chemie", "VW", "Papst", "Tisch", "Hund", "Katze", "Deutschland", "Hamburg"
| data source: | Wikipedia (DE) 2008-03-20 |
|---|---|
| # used articles (dimensions): | 592,542 |
| # extracted terms: | ~ 3,900,000 |
| # term vectors: | 63,439 |
| stemming algorithm: | Porter (German) |
| data size: | 278,137,881 bytes |
| min term length: | 2 characters |
| max term length: | 33 characters |
| avg term length: | 10.26 characters |
| min vector length: | 1 dimension |
| max vector length: | 317,515 dimensions |
| avg vector length: | 1,310.64 dimensions |
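
The table describes each term as a sparse vector over Wikipedia articles (the "dimensions"). As a rough illustration of how two texts could be compared on top of such vectors, the following sketch sums the term vectors of each text and takes the cosine of the results. This is an assumption about the scoring, based on how Explicit Semantic Analysis is commonly described, not a description of this demo's actual implementation. The names `text_vector`, `cosine_relatedness`, and `TOY_TERM_VECTORS`, as well as the article names and weights, are made up for the example, and stemming is left out.

```python
import math

# Hypothetical toy term database: term -> {article (dimension): weight}.
# The real database maps stemmed terms to far larger sparse vectors.
TOY_TERM_VECTORS = {
    "hund": {"Haustier": 0.8, "Säugetier": 0.6},
    "katze": {"Haustier": 0.6, "Säugetier": 0.8},
    "tisch": {"Möbel": 1.0},
}


def text_vector(terms, term_vectors):
    """Sum the sparse vectors of all known terms; unknown terms are skipped."""
    combined = {}
    for term in terms:
        for article, weight in term_vectors.get(term, {}).items():
            combined[article] = combined.get(article, 0.0) + weight
    return combined


def cosine_relatedness(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    if not u or not v:
        return 0.0
    # Iterate over the shorter vector when computing the dot product.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(weight * v[article] for article, weight in u.items() if article in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


dog = text_vector(["hund"], TOY_TERM_VECTORS)
cat = text_vector(["katze"], TOY_TERM_VECTORS)
table = text_vector(["tisch"], TOY_TERM_VECTORS)

print(cosine_relatedness(dog, cat))    # shared articles, high score
print(cosine_relatedness(dog, table))  # no shared articles, score 0.0
```

Storing vectors as dictionaries keeps only the non-zero dimensions, which matters when the average vector has about 1,310 entries out of 592,542 possible articles.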