Datasets for evaluating compositional distributional models of meaning
- Grefenstette and Sadrzadeh Compositional Distributional Model Evaluation Dataset, EMNLP 2011 (file)
- Grefenstette and Sadrzadeh Compositional Distributional Model Evaluation Dataset, adjective-noun based transitive sentences, 2012 (file)
- Kartsaklis, Sadrzadeh and Pulman Term-Definition Dataset, COLING 2012 (file)
- Disambiguation Dataset used in Kartsaklis et al., CoNLL 2013. This previously unpublished dataset was produced by Mehrnoosh Sadrzadeh and Edward Grefenstette (file)
- Kartsaklis and Sadrzadeh transitive sentence similarity dataset, EMNLP 2013. This dataset extends the verb-object part of the Mitchell and Lapata (2010) dataset by adding appropriate subject nouns. This version uses the original human judgements from the M&L 2010 dataset (file)
- Kartsaklis and Sadrzadeh transitive sentence similarity dataset, QPL 2014. This is the same dataset as the one used in the EMNLP 2013 paper, but with re-evaluated human scores collected from Amazon Mechanical Turk (file)
- Kartsaklis and Sadrzadeh entailment datasets, LACL 2016/COLING 2016 (subject-verb, verb-object, and subject-verb-object)
- Wijnholds and Sadrzadeh verb disambiguation and sentence similarity datasets involving verb phrase ellipsis, NAACL 2019 (repo)
Word embeddings
- Cheng and Kartsaklis code (in Python/Theano) and word embeddings, EMNLP 2015 (link)
Code
- Code for reproducing the self- and other-repair detection experiments of Purver, Hough and Howes (TopiCS 2018).
- Code and documentation for reproducing the categorical compositional models of Grefenstette and Sadrzadeh (EMNLP 2011), Kartsaklis et al. (COLING 2012), and Milajevs et al. (EMNLP 2014).
- STIR (“Strictly Incremental Repair Detection”) – an open-source set of tools for self-repair detection in dialogue data.
- DyLan (“Dynamics of Language”) – an open-source Java implementation of Dynamic Syntax, including a word-by-word incremental semantic parser and generator, and integration with the Jindigo dialogue system.
- DiaSim – an open-source Java project for calculating lexical, syntactic and semantic similarity in dialogue corpora, including within- and between-speaker similarity and comparison to various randomly re-ordered baselines – see this paper.