author    Vincent Ambo <mail@tazj.in> 2023-01-24T22·36+0300
committer clbot <clbot@tvl.fyi> 2023-01-24T22·41+0000
commit    192dac5a749edece1b5b3fb0b8acb92819df22e0
tree      cac2112b2f0eb7bd7dbfede8e0b0b7b4be903fb6 /corp/russian
parent    80723b708d6edc44be00b61fb02260800101dcbb
feat(corp/data-import): map OR word types to sets of OC grammemes r/5752
Change-Id: I674f3a66fcd65314431a2ebd747e3830aa2dd7a1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7924
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: tazjin <tazjin@tvl.su>
Diffstat (limited to 'corp/russian')
-rw-r--r-- corp/russian/data-import/src/mappings.rs 13
1 files changed, 13 insertions, 0 deletions
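The `WORD_TYPES_GRAMMEME_MAP` table added in this commit is a plain slice of `(name, grammemes)` pairs, so an importer would most likely consume it with a linear lookup. The sketch below is only an illustration under that assumption: `grammemes_for_word_type` is a hypothetical helper that is not part of this commit. The constant is copied with elided lifetimes, which `const` items treat as `'static`. Unmapped and empty word types yield `None`.

```rust
/// Copy of the table from `mappings.rs`. Lifetimes are elided here;
/// in a `const` item they default to `'static`.
pub const WORD_TYPES_GRAMMEME_MAP: &[(&str, &[&str])] = &[
    ("adjective", &["ADJF"]),
    ("adverb", &["ADVB"]),
    ("noun", &["NOUN"]),
    ("verb", &["INFN"]), // or "VERB" ...
];

/// Hypothetical helper (not in the commit): returns the OpenCorpora
/// grammemes for an OpenRussian `word_type`, or `None` if the type is
/// unmapped (including the empty word type many words carry).
pub fn grammemes_for_word_type(word_type: &str) -> Option<&'static [&'static str]> {
    WORD_TYPES_GRAMMEME_MAP
        .iter()
        // Linear scan is fine for a table of four entries.
        .find(|(name, _)| *name == word_type)
        .map(|(_, grammemes)| *grammemes)
}
```

With only a handful of entries, a linear scan over a `const` slice avoids building a `HashMap` at runtime. For example, `grammemes_for_word_type("verb")` gives `Some(&["INFN"])`, while `grammemes_for_word_type("")` gives `None`.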
diff --git a/corp/russian/data-import/src/mappings.rs b/corp/russian/data-import/src/mappings.rs
index 8a581ff86ba8..985088a56628 100644
--- a/corp/russian/data-import/src/mappings.rs
+++ b/corp/russian/data-import/src/mappings.rs
@@ -1,5 +1,18 @@
 //! Manual mapping of some data structures in OC/OR corpora.
 
+/// Maps the *names* of OpenRussian word types (the `word_type` field
+/// in the `or_words` table) to the *set* of OpenCorpora grammemes
+/// commonly attached to lemmata of this type in OC.
+///
+/// Some word types just don't map over, and are omitted. Many words
+/// also have an empty word type.
+pub const WORD_TYPES_GRAMMEME_MAP: &'static [(&'static str, &'static [&'static str])] = &[
+    ("adjective", &["ADJF"]),
+    ("adverb", &["ADVB"]),
+    ("noun", &["NOUN"]),
+    ("verb", &["INFN"]), // or "VERB" ...
+];
+
 /// Maps the *names* of OpenRussian grammemes (the `form_type` fields
 /// in the `or_word_forms` table) to the *set* of OpenCorpora
 /// grammemes attached to them corresponding lemma in the `oc_lemmas`