NoCaCoDa

Laboratory for Typological Study of Languages    Institute for Linguistic Studies, Russian Academy of Sciences

How to read the data

Each full entry in the database contains a stimulus sentence from the questionnaire translated into a target language and annotated for a number of parameters. The dataset containing all full entries in a single spreadsheet is available here. The annotations used in the database come from three sets:

  1. Semantic parameters associated with individual stimulus sentences.
  2. Parameters associated with individual translations into target languages.
  3. Parameters associated with language-specific nominal causal markers.

In addition, the database contains generalized data at the level of individual languages; these are shown in the aggregate table here. All four sets of annotations (the three listed above plus the language-level parameters) are explained below.

Inclusion criteria

A translation counts as part of the database only if it meets the inclusion criteria. Translations that fail them nonetheless remain visible to database users, often as a way of documenting the difficulties encountered. Such sentences are not tagged for their causal marker (see ‘Parameters associated with individual translations into target languages’ below) and are excluded from all calculations based on the data.

The inclusion criteria are as follows:

  1. The causing event is syntactically represented by a noun phrase. The minimal necessary condition is the compatibility of its head with unquestionably nominal marking; thus, English gerunds can be considered “syntactically nouns” (because of doing), while English infinitives cannot (* because of (to) do).
  2. This noun phrase has approximately the same meaning as the English/Russian noun phrase in the stimulus sentence.
  3. This noun phrase is not a syntactic subject (i.e., sentences like ‘The tears made the handkerchief wet’ are not included).
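Because sentences that fail the inclusion criteria are not tagged for a causal marker and are left out of all calculations, anyone working with the spreadsheet will probably want to filter them out first. The following is only a minimal sketch: the rows are invented, the field names `language` and `marker` are hypothetical, and it assumes that a missing marker tag appears as an empty value, which may not match the actual export.

```python
# Invented example rows. Only "stimulus_no" is a field name taken from the
# documentation; "language" and "marker" are hypothetical placeholders.
entries = [
    {"stimulus_no": 1, "language": "A", "marker": "m1"},
    {"stimulus_no": 2, "language": "A", "marker": None},  # assumed: failed the criteria, so no marker tag
    {"stimulus_no": 3, "language": "A", "marker": "m2"},
]


def included(rows):
    """Keep only the rows that carry a causal-marker tag."""
    return [row for row in rows if row["marker"]]


usable = included(entries)
print(len(usable), "of", len(entries), "entries enter the calculations")
```

Using the absence of a marker tag as the filter keeps the sketch independent of any dedicated "included" column, which the documentation does not mention.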

Semantic parameters associated with individual stimulus sentences

Stimulus sentences are identified by numeric codes (“stimulus_no”: 1, 2, … 54). The 54 stimulus sentences were preannotated for five semantic parameters:

These parameters are explained in the Questionnaire section. Their values ascribed to the stimulus sentences are inherited by the entries in the final dataset.

Parameters associated with individual translations into target languages

Each entry in the dataset contains the following fields:

Parameters associated with language-specific nominal causal markers

Each language-specific causal marker was tagged for its morphosyntactic type, degree of explicitness, and polysemy/syncretism. These values are inherited by the full database entries. The following fields belong to this set of annotations:

Fields related to non-causal meanings of markers

Many nominal markers attested in the dataset can also convey non-causal (e.g., spatial) meanings. For explicit causal markers, this can result in syncretism, whereas contextual causal markers have other meaning(s) “by definition”. The most salient and context-independent of these meanings are classified into five macro-types, labelled “goal”, “instrument”, “source”, “location”, and “object”. Each macro-type comprises one or several meanings and is named after one of them. The macro-types are not mutually exclusive: a marker can display more than one syncretic non-causal meaning. A sixth, residual type, “other”, covers relatively rare non-causal meanings. Each marker is tagged “yes” or “no” for each of the six types; “yes” is chosen if the marker is productively used with at least one of the meanings in that type. A marker with six “no”s is a dedicated causal marker.
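The tagging scheme above can be read off mechanically. The sketch below assumes that each marker is available as a mapping from the six type labels to “yes” or “no”; the actual column names in the spreadsheet may differ, and both example markers are invented.

```python
# The six type labels named in the text; using them as dictionary keys is an
# assumption about how the spreadsheet columns are called.
MACRO_TYPES = ("goal", "instrument", "source", "location", "object", "other")


def is_dedicated(marker):
    """True if the marker is tagged "no" for all six types."""
    return all(marker[t] == "no" for t in MACRO_TYPES)


def syncretic_types(marker):
    """List the types for which the marker is tagged "yes"."""
    return [t for t in MACRO_TYPES if marker[t] == "yes"]


# Two invented markers for illustration.
dedicated = dict.fromkeys(MACRO_TYPES, "no")
spatial = {**dedicated, "source": "yes", "location": "yes"}

print(is_dedicated(dedicated), is_dedicated(spatial))
print(syncretic_types(spatial))
```

Because the types are not mutually exclusive, `syncretic_types` returns a list rather than a single label.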

Parameters associated with individual languages

Several parameters are identified at the level of individual languages. Some of them were predefined, while others summarize the results obtained with the help of the database. The spreadsheet with the values of these parameters can be seen here. The following parameters are used: