author | Vincent Ambo <mail@tazj.in> | 2023-01-24T22:36+0300 |
---|---|---|
committer | clbot <clbot@tvl.fyi> | 2023-01-24T22:41+0000 |
commit | 192dac5a749edece1b5b3fb0b8acb92819df22e0 | |
tree | cac2112b2f0eb7bd7dbfede8e0b0b7b4be903fb6 /corp | |
parent | 80723b708d6edc44be00b61fb02260800101dcbb | |
feat(corp/data-import): map OR word types to sets of OC grammemes r/5752
Change-Id: I674f3a66fcd65314431a2ebd747e3830aa2dd7a1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7924
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: tazjin <tazjin@tvl.su>
Diffstat (limited to 'corp')
-rw-r--r-- | corp/russian/data-import/src/mappings.rs | 13 |
1 files changed, 13 insertions, 0 deletions
```diff
diff --git a/corp/russian/data-import/src/mappings.rs b/corp/russian/data-import/src/mappings.rs
index 8a581ff86ba8..985088a56628 100644
--- a/corp/russian/data-import/src/mappings.rs
+++ b/corp/russian/data-import/src/mappings.rs
@@ -1,5 +1,18 @@
 //! Manual mapping of some data structures in OC/OR corpora.
 
+/// Maps the *names* of OpenRussian word types (the `word_type` field
+/// in the `or_words` table) to the *set* of OpenCorpora grammemes
+/// commonly attached to lemmata of this type in OC.
+///
+/// Some word types just don't map over, and are omitted. Many words
+/// also have an empty word type.
+pub const WORD_TYPES_GRAMMEME_MAP: &'static [(&'static str, &'static [&'static str])] = &[
+    ("adjective", &["ADJF"]),
+    ("adverb", &["ADVB"]),
+    ("noun", &["NOUN"]),
+    ("verb", &["INFN"]), // or "VERB" ...
+];
+
 /// Maps the *names* of OpenRussian grammemes (the `form_type` fields
 /// in the `or_word_forms` table) to the *set* of OpenCorpora
 /// grammemes attached to them corresponding lemma in the `oc_lemmas`
```
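For illustration, here is a minimal sketch of how importer code might consult the new table. The helper `grammemes_for_word_type` is hypothetical and not part of this commit; the constant is copied verbatim from the diff so the example compiles on its own. Because unmapped and empty word types are simply absent from the table, the lookup returns `None` for them.

```rust
/// Copied verbatim from `mappings.rs` in this commit so the sketch is self-contained.
pub const WORD_TYPES_GRAMMEME_MAP: &'static [(&'static str, &'static [&'static str])] = &[
    ("adjective", &["ADJF"]),
    ("adverb", &["ADVB"]),
    ("noun", &["NOUN"]),
    ("verb", &["INFN"]), // or "VERB" ...
];

/// Hypothetical helper (not in the commit): returns the OpenCorpora grammemes
/// for an OpenRussian `word_type`, or `None` if the type is empty or has no
/// mapping in the table above.
pub fn grammemes_for_word_type(word_type: &str) -> Option<&'static [&'static str]> {
    WORD_TYPES_GRAMMEME_MAP
        .iter()
        // Linear scan: the table has only a handful of entries.
        .find(|(name, _)| *name == word_type)
        .map(|(_, grammemes)| *grammemes)
}

fn main() {
    // The empty string stands in for the many OR words with no word type.
    for word_type in ["adjective", "verb", ""] {
        match grammemes_for_word_type(word_type) {
            Some(grammemes) => println!("{word_type:?} -> {grammemes:?}"),
            None => println!("{word_type:?} -> (no mapping)"),
        }
    }
}
```

A plain slice of pairs keeps the mapping a compile-time `const` and is cheap to scan at this size; a `HashMap` would only pay off if the table grew much larger. Note that `"verb"` resolves to `INFN` only, matching the commit's choice (the inline comment leaves `VERB` as an alternative).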