Resources

Datasets for evaluating compositional distributional models of meaning

  • Grefenstette and Sadrzadeh Compositional Distributional Model Evaluation Dataset, EMNLP 2011 (file)
  • Grefenstette and Sadrzadeh Compositional Distributional Model Evaluation Dataset, adjective-noun based transitive sentences, 2012 (file)
  • Kartsaklis, Sadrzadeh and Pulman Term-Definition Dataset, COLING 2012 (file)
  • Disambiguation Dataset used in Kartsaklis et al., CoNLL 2013. This previously unpublished dataset was produced by Mehrnoosh Sadrzadeh and Edward Grefenstette (file)
  • Kartsaklis and Sadrzadeh transitive sentence similarity dataset, EMNLP 2013. This dataset extends the verb-object part of the Mitchell and Lapata (2010) dataset by adding appropriate subject nouns. This version uses the original human judgements from the M&L 2010 dataset (file)
  • Kartsaklis and Sadrzadeh transitive sentence similarity dataset, QPL 2014. This is the same dataset as the one used in the EMNLP 2013 paper, but with re-evaluated human scores collected from Amazon Mechanical Turk (file)
  • Kartsaklis and Sadrzadeh entailment datasets, LACL 2016/COLING 2016 (subject-verb, verb-object, and subject-verb-object)

Word embeddings

  • Cheng and Kartsaklis code (in Python/Theano) and word embeddings, EMNLP 2015 (link)

Code

  • Code and documentation for reproducing the categorical compositional models of Grefenstette and Sadrzadeh (EMNLP 2011), Kartsaklis et al. (COLING 2012), and Milajevs et al. (EMNLP 2014).
  • DyLan (“Dynamics of Language”) – an open-source Java implementation of Dynamic Syntax, including a word-by-word incremental semantic parser and generator, and integration with the Jindigo dialogue system.
  • DiaSim – an open-source Java project for calculating lexical, syntactic and semantic similarity in dialogue corpora, including within- and between-speaker similarity and comparison to various randomly re-ordered baselines – see this paper.