Benchmarks
Similarity Benchmark Sim-B
We conducted a user study and created a benchmark containing 1726 user-annotated sentence pairs from the German and the English Wikipedia.
The benchmark Sim-B and its description can be downloaded as a MySQL import script here.
The file "benchmark_multiwiki_simb.sql" contains a MySQL export script that creates six tables. The main table, "sentence_pairs", contains an English and a German sentence in each row. Each pair is assigned a user-evaluated similarity value. The other tables contain additional information: some of it was part of the Wikipedia markup (internal links, external links), and the rest was extracted with external tools.
Page | Avg. Rating | Article ID1 | Article ID2 | Text1 | Text2 |
---|---|---|---|---|---|
European Union | 1.0 | en-635761078 | de-136109478 | In 2012, the EU was awarded the Nobel Peace Prize. | 2012 wurde der Europäischen Union der Friedensnobelpreis zuerkannt. |
Nicolaus Copernicus | 0.4375 | en-634443003 | de-134393612 | He died about 1483. | Als sein Vater 1483 starb, war Nikolaus zehn Jahre alt. |
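The sample rows above can be read into simple records for quick inspection. The following is a minimal sketch, not part of the benchmark distribution: the names `SentencePair`, `split_article_id`, and `parse_row` are hypothetical, the field names simply mirror the column headers of the sample table (the actual column names in "sentence_pairs" may differ), and the split of an article ID into a language prefix and a number is an assumption based on the visible format of the IDs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SentencePair:
    # Field names mirror the sample table headers, not the real SQL schema.
    page: str
    avg_rating: float
    article_id1: str
    article_id2: str
    text1: str
    text2: str


def split_article_id(article_id: str) -> Tuple[str, int]:
    """Split an ID such as 'en-635761078' into its prefix and its number.

    Assumption: the part before the first hyphen is a language code.
    """
    prefix, _, number = article_id.partition("-")
    return prefix, int(number)


def parse_row(line: str) -> SentencePair:
    """Parse one pipe-delimited sample row.

    Assumes the sentence texts themselves contain no '|' character.
    """
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    page, rating, id1, id2, text1, text2 = cells
    return SentencePair(page, float(rating), id1, id2, text1, text2)


row = (
    "Nicolaus Copernicus | 0.4375 | en-634443003 | de-134393612 | "
    "He died about 1483. | "
    "Als sein Vater 1483 starb, war Nikolaus zehn Jahre alt. |"
)
pair = parse_row(row)
print(pair.page, pair.avg_rating, split_article_id(pair.article_id1))
```

For the full benchmark, importing the SQL script into MySQL and querying "sentence_pairs" directly is likely the more robust route; this parser only covers the displayed excerpt.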
Text Passage Alignment Benchmark Align-B
We conducted a user study and created a benchmark containing 55 article pairs (English/German and English/Russian), each with its text passages aligned by at least three users.
The benchmark Align-B and its description can be downloaded as a MySQL import script here.
The file "benchmark_multiwiki_alignb.sql" contains a MySQL export script that creates ten tables. The main table, "passage_pairs", contains text passage pairs extracted and aligned by users. Each pair is assigned a user-given title. The other tables contain additional information: some of it was part of the Wikipedia markup (internal links, external links), and the rest was extracted with external tools.
Page | User ID | Article ID1 | Article ID2 | Passage1 | Passage2 | Title |
---|---|---|---|---|---|---|
Winger (sports) | 43 | en-664310306 | de-664310306 | 10-11-12-13 | 7-8-9 | Football |
Johann Hugo von Orsbeck | 43 | en-668382450 | de-144384256 | 4-5 | 8-9 | Birth and parents |
Johann Hugo von Orsbeck | 47 | en-668382450 | de-144384256 | 29 | 61 | The end |
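To compare how different users aligned the same article pair, the sample rows can be grouped by their two article IDs. The sketch below rests on assumptions: the Passage1 and Passage2 values are treated as hyphen-separated integer identifiers (the excerpt does not state what they index), and `parse_passage` and `group_by_article_pair` are hypothetical helper names, not part of the benchmark.

```python
from collections import defaultdict


def parse_passage(passage: str) -> list:
    """Turn a value such as '10-11-12-13' into [10, 11, 12, 13].

    Assumption: the value is a hyphen-separated list of integers.
    """
    return [int(part) for part in passage.split("-")]


def group_by_article_pair(rows):
    """Collect all user alignments that share the same two article IDs."""
    grouped = defaultdict(list)
    for page, user_id, id1, id2, passage1, passage2, title in rows:
        grouped[(id1, id2)].append(
            {
                "user_id": user_id,
                "passage1": parse_passage(passage1),
                "passage2": parse_passage(passage2),
                "title": title,
            }
        )
    return dict(grouped)


# The three sample rows from the table above, copied verbatim.
sample_rows = [
    ("Winger (sports)", 43, "en-664310306", "de-664310306",
     "10-11-12-13", "7-8-9", "Football"),
    ("Johann Hugo von Orsbeck", 43, "en-668382450", "de-144384256",
     "4-5", "8-9", "Birth and parents"),
    ("Johann Hugo von Orsbeck", 47, "en-668382450", "de-144384256",
     "29", "61", "The end"),
]

alignments = group_by_article_pair(sample_rows)
for (id1, id2), entries in alignments.items():
    users = sorted({entry["user_id"] for entry in entries})
    print(id1, id2, "aligned by users", users)
```

Grouping by article pair rather than by page title keeps alignments separate if the same page title ever appears with different article IDs.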