Age | Commit message (Collapse) | Author | Files | Lines |
|
I'm changing strategies to importing both OC and another dataset
before continuing to normalise the data, as it might be easier to do
in a set of table-constructing queries inside of SQLite with all raw
data in place.
Change-Id: I26b41af80586fc1bfd8e26a6be20579068a82507
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7872
Autosubmit: tazjin <tazjin@tvl.su>
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
|
|
This makes the actual imported database of the ~whole Russian
language (all lemmas, grammemes, forms etc.) a Nix build target which
is built in CI.
This still needs schema normalisation (it's fairly directly mapped to
the raw data), but it's already starting to be a useful data set.
This also happens to be a pretty cool demonstration of the power of
Nix. You can do `nix-build -A corp.russian.data-import.database` and
out comes a perfectly valid SQLite database with a valid external data
import!
Change-Id: I5d6d15e67d0e4a7ff590fad06252be34f5d561fd
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7866
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
|
|
Change-Id: I61ad021c7a5318b099f3adc8bc6aedef65500974
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7865
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
|
|
Change-Id: Iebdbc8f884f28064d7b00b8f8808b5030fa3d05c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7864
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
|
|
Change-Id: Iae01d1dc6894117dc693b4690d8bc79861212ae6
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7863
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
|
|
Otherwise up to 1000 elements might be missing.
Change-Id: I20d6238424eec27f0e758e7737c9c31bcb81b23d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7862
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
|
|
This is an initial and kind of dumb table structure, but there's some
massaging that needs to be done before this makes more sense.
Change-Id: I441288b684ef86be507099bcc4ebf984598789c8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7861
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
|
|
Change-Id: I1e4efcfc8e555f61578b563411d5e6ed9590d8e8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7860
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
|
|
Adds the beginning of a tool which can import OpenCorpora data into a
SQLite database. This is quite a lot of toil and there's probably a
better way to do this, but overall becoming this intimately familiar
with the data structures is quite helpful for understanding what I
can/can't do with only this dataset.
Change-Id: Ieab33a8ce07ea4ac87917b9c8132226bbc6523b1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7859
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
|