path: root/corp/russian
Age | Commit message | Author | Files | Lines
2023-12-29 | r/7278 chore(3p/sources): bump channels & overlays (2023-12-29) | Vincent Ambo | 2 | -11/+11
* all: update wasm-bindgen to 0.2.89 in WASM projects
* users/grfn: explicitly set pinentry for gpg-agent
* 3p/crate2nix: drop patches that were merged upstream
* 3p/rust-crates: fix one more package name that was broken by crates.io
* 3p/overlays: bump telega backend to new required version
The update for agenix has been dropped. It caused strange build errors with messages like these:
    patching script interpreter paths in /nix/store/0g0wpa3vxfb4w461s6ny3s1wr08faj73-agenix-0.15.0
    /nix/store/0g0wpa3vxfb4w461s6ny3s1wr08faj73-agenix-0.15.0/bin/agenix: interpreter directive changed from "#!/usr/bin/env bash" to "/nix/store/q8qq40xg2grfh9ry1d9x4g7lq4ra7n81-bash-5.2-p21/bin/bash"
    stripping (with command strip and flags -S -p) in /nix/store/0g0wpa3vxfb4w461s6ny3s1wr08faj73-agenix-0.15.0/bin
    Running phase: installCheckPhase
    no Makefile or custom installCheckPhase, doing nothing
    agenix version: 0.15.0
    error: creating directory '/nix/var': Permission denied
    There is no rule for secret1.age in ./secrets.nix.
    /nix/store/d4jf1cbbk494zwgbqz31pxgigpsbh6w2-stdenv-linux/setup: line 138: test: =: unary operator expected
    /nix/store/d4jf1cbbk494zwgbqz31pxgigpsbh6w2-stdenv-linux/setup: line 131: pop_var_context: head of shell_variables not a function context
    builder for '/nix/store/0ivvf44hxy0zv4gg8nvchdkp895xw5ri-agenix-0.15.0.drv' failed with exit code 2
I can't be bothered to deal with that right now.
Change-Id: Ia052af0d97dbe9ef0c0d4f3e2214ac00ca8645a2
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10458
Reviewed-by: aspen <root@gws.fyi>
Tested-by: BuildkiteCI
2023-11-12 | r/6998 chore(3p/sources): bump nixpkgs & channels (2023-11-12) | Vincent Ambo | 2 | -11/+11
* update wasm-bindgen in all Rust-wasm projects
* remove stable overlays that work again in unstable
* add texlive to stable overlays (see linked nixpkgs PR)
* bump tdlib to 1.8.18, new minimum for telega.el
Change-Id: Ib8e202de7dfbc35115fda31d0a98b6314b2adf17
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10010
Tested-by: BuildkiteCI
Autosubmit: tazjin <tazjin@tvl.su>
Reviewed-by: flokli <flokli@flokli.de>
2023-08-08 | r/6475 chore(3p/sources): Bump channels & overlays | Vincent Ambo | 2 | -21/+32
* web/pwcrypt: bump wasm-bindgen
* corp/tvixbolt: bump wasm-bindgen
* corp/rih/frontend: bump wasm-bindgen
* corp/predlozhnik: bump wasm-bindgen
* 3p/overlays: set hiPrio on nixpkgs-review package
There is some upstream bug causing a conflict with the ZSH completion files generated by home-manager.
Change-Id: Ibe5de5564d3214d48469abe175cbebe5356acf74
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9046
Autosubmit: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
2023-03-10 | r/5947 fix(predlozhnik): use correct link to source code after move | Vincent Ambo | 1 | -1/+1
Change-Id: I74da72818d9afa96d6bfbfd02f0110707ef8b721
Reviewed-on: https://cl.tvl.fyi/c/depot/+/8248
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-02-07 | r/5839 chore(3p/sources): bump nixpkgs & overlays (2023-02-07) | Vincent Ambo | 2 | -11/+11
Included fixes:
* //3p/overlays: tdlib override no longer needed (bump has landed upstream)
* //corp/{predlozhnik,tvixbolt}: bump wasm-bindgen to match nixpkgs
Home-manager has not been bumped, as it introduced an incompatibility with Nix 2.3.
Change-Id: I96ac3462b82c73db1ba23be03d7968f10abc9b53
Reviewed-on: https://cl.tvl.fyi/c/depot/+/8033
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
Reviewed-by: sterni <sternenseemann@systemli.org>
2023-01-25 | r/5755 fix(corp/data-import): `rank` is an integer field | Vincent Ambo | 2 | -2/+2
Change-Id: Ifc9cd46e5b5521096db19628bd8bcf026106dcc9
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7926
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-24 | r/5752 feat(corp/data-import): map OR word types to sets of OC grammemes | Vincent Ambo | 1 | -0/+13
Change-Id: I674f3a66fcd65314431a2ebd747e3830aa2dd7a1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7924
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: tazjin <tazjin@tvl.su>
2023-01-24 | r/5751 feat(corp/data-import): map OC lemma grammemes to OR form types | Vincent Ambo | 1 | -15/+103
Change-Id: Ie804d185269336b0d9fe417754e5e795918e65b8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7923
Autosubmit: tazjin <tazjin@tvl.su>
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-24 | r/5750 feat(corp/data-import): map OC word grammemes to OR form types | Vincent Ambo | 2 | -0/+85
This table maps the grammemes for individual word forms (*not* for lemmata in either corpus!) to the corresponding grammemes from the other dataset. These have drastically different shapes, so the mapping is not perfect, but it will help in determining which forms are intended to be the same on both sides.
Change-Id: Ib0717e2f7a79d96bcb5e955a20f551e391fcd759
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7918
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Autosubmit: tazjin <tazjin@tvl.su>
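A minimal Python sketch of what one slice of such a mapping might look like, covering noun forms only. The OpenCorpora case and number grammemes (`nomn`, `gent`, `datv`, `accs`, `ablt`, `loct`, `sing`, `plur`) are real OC tags, but the OpenRussian form-type naming scheme (`ru_noun_sg_nom` and so on) and the helper `or_form_type` are assumptions for illustration, not the tool's actual table.

```python
from typing import Optional, Set

# Illustrative mapping of OpenCorpora (OC) noun-form grammemes to parts of
# OpenRussian (OR) form-type names. The OR naming scheme is an assumption.
NUMBER = {"sing": "sg", "plur": "pl"}
CASE = {
    "nomn": "nom",   # nominative
    "gent": "gen",   # genitive
    "datv": "dat",   # dative
    "accs": "acc",   # accusative
    "ablt": "inst",  # instrumental
    "loct": "prep",  # prepositional
}


def or_form_type(oc_grammemes: Set[str]) -> Optional[str]:
    """Map the OC grammemes of a single noun form to an assumed OR form type.

    Returns None unless exactly one known number and one known case are
    present, reflecting that the two datasets do not line up perfectly.
    """
    numbers = [NUMBER[g] for g in oc_grammemes if g in NUMBER]
    cases = [CASE[g] for g in oc_grammemes if g in CASE]
    if len(numbers) != 1 or len(cases) != 1:
        return None
    return f"ru_noun_{numbers[0]}_{cases[0]}"
```

For example, `or_form_type({"plur", "gent"})` yields `"ru_noun_pl_gen"`, while an ambiguous or incomplete grammeme set yields `None` rather than a guess.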
2023-01-22 | r/5732 feat(corp/data-import): add import of OR 'translations' table | Vincent Ambo | 3 | -0/+70
The original dataset contains translations into different languages, but only the English ones are imported here. Note that translations are for lemmata only.
Change-Id: Ifb9c32c25fda44c38ad899efca9d205c520c0fa3
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7895
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
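The filtering step described in this commit can be sketched in Python as below, assuming each raw translation row carries the ID of the lemma it belongs to, a language code and the translated text. The `Translation` type and its field names are hypothetical, not the dataset's real column names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Translation:
    # Hypothetical shape of a raw translation row.
    word_id: int  # ID of the lemma (translations exist for lemmata only)
    lang: str     # language code, e.g. "en" or "de"
    text: str     # the translated text


def english_only(rows: Iterable[Translation]) -> List[Translation]:
    """Keep only English translations, dropping every other language."""
    return [row for row in rows if row.lang == "en"]
```

Filtering before insertion keeps the imported table small; other languages could be added later by widening the predicate.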
2023-01-21 | r/5730 feat(corp/data-import): add import of OR 'words_forms' table | Vincent Ambo | 3 | -6/+69
This is the table of full morphological sets for all the words from the lemmata table (which the dataset calls 'words' rather than lemmata).
Change-Id: I6f5be673c5f59f11e36bd8c8c935844a7d4fd170
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7894
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
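To illustrate how the forms table relates to the lemmata table, here is a Python sketch using the standard `sqlite3` module. The table names `words` and `words_forms` come from these commit messages; the column names (`id`, `bare`, `word_id`, `form_type`, `form`) are assumptions for illustration.

```python
import sqlite3
from typing import List, Tuple


def forms_of(conn: sqlite3.Connection, lemma: str) -> List[Tuple[str, str]]:
    """Return (form_type, form) pairs for every form of the given lemma.

    Assumes `words` holds one row per lemma and `words_forms` references
    it through a `word_id` column; these column names are hypothetical.
    """
    return conn.execute(
        """
        SELECT f.form_type, f.form
        FROM words AS w
        JOIN words_forms AS f ON f.word_id = w.id
        WHERE w.bare = ?
        ORDER BY f.form_type
        """,
        (lemma,),
    ).fetchall()
```

Each lemma row fans out to many form rows, which is why the two tables are imported separately rather than flattened into one.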
2023-01-21 | r/5729 feat(corp/data-import): add import of OpenRussian 'words' table | Vincent Ambo | 6 | -30/+348
This is actually the lemmata table of this corpus, not the forms of all words (they're in a separate table).
Change-Id: I89a2c2817ccce840f47406fa2a636f4ed3f49154
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7893
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-21 | r/5728 chore(corp/data-import): make OR data archive available in env | Vincent Ambo | 1 | -8/+15
Change-Id: Idacf42743051eae0cf7010f952a4f91af17ad708
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7892
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-18 | r/5703 docs(corp/data-import): document OpenRussian format | Vincent Ambo | 1 | -4/+53
This is the second dataset I want to integrate as it contains some more practically useful, but somewhat less structured, information.
Change-Id: Ib46b2597a33e76f59e030f889a0961ecc5a144eb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7873
Tested-by: BuildkiteCI
Autosubmit: tazjin <tazjin@tvl.su>
Reviewed-by: tazjin <tazjin@tvl.su>
2023-01-18 | r/5702 chore(corp/data-import): namespace tables for OpenCorpora data | Vincent Ambo | 2 | -22/+22
I'm changing strategy: I'll import both OC and another dataset before continuing to normalise the data, as the normalisation might be easier to do with a set of table-constructing queries inside SQLite once all the raw data is in place.
Change-Id: I26b41af80586fc1bfd8e26a6be20579068a82507
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7872
Autosubmit: tazjin <tazjin@tvl.su>
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
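As a rough illustration of the "table-constructing queries" idea, the Python sketch below keeps raw data in namespaced tables (`oc_lemmas` for OpenCorpora, `or_words` for the second dataset) and derives a new table from both with a single `CREATE TABLE ... AS SELECT`. All table and column names here are hypothetical; the commit does not say what the derived tables will be.

```python
import sqlite3


def build_shared_lemmas(conn: sqlite3.Connection) -> None:
    """Derive a table of lemmas whose text appears in both raw datasets.

    `oc_lemmas` and `or_words` are hypothetical namespaced raw tables;
    the derived table is dropped and rebuilt on every call.
    """
    conn.executescript(
        """
        DROP TABLE IF EXISTS shared_lemmas;
        CREATE TABLE shared_lemmas AS
            SELECT oc.id AS oc_id, orw.id AS or_id, oc.lemma AS lemma
            FROM oc_lemmas AS oc
            JOIN or_words AS orw ON orw.bare = oc.lemma;
        """
    )
```

Because the raw tables stay untouched, a derived table like this can be rebuilt cheaply whenever the normalisation queries change.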
2023-01-18 | r/5693 feat(corp/data-import): build morphology database in derivation | Vincent Ambo | 1 | -6/+10
This makes the actual imported database of the ~whole Russian language (all lemmas, grammemes, forms etc.) a Nix build target which is built in CI.
This still needs schema normalisation (it's fairly directly mapped to the raw data), but it's already starting to be a useful data set.
This also happens to be a pretty cool demonstration of the power of Nix. You can do `nix-build -A corp.russian.data-import.database` and out comes a perfectly valid SQLite database with a valid external data import!
Change-Id: I5d6d15e67d0e4a7ff590fad06252be34f5d561fd
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7866
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
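Once built, the database can be inspected with any SQLite client. Below is a small Python sketch that lists the tables in a connected database by querying `sqlite_master`; it assumes nothing about the schema the import tool produces.

```python
import sqlite3
from typing import List


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Return the names of all tables in the connected database, sorted.

    Indexes, views and triggers also live in sqlite_master; the type
    filter keeps only tables.
    """
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

Assuming the build output is the database file itself, reachable through the `result` symlink that `nix-build` creates by default, it could be opened read-only with `list_tables(sqlite3.connect("file:result?mode=ro", uri=True))`. Read-only mode suits a file that lives in the immutable Nix store.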
2023-01-18 | r/5692 feat(corp/data-import): let users specify output path | Vincent Ambo | 1 | -6/+14
Change-Id: I61ad021c7a5318b099f3adc8bc6aedef65500974
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7865
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
2023-01-18 | r/5691 feat(corp/data-import): parse and import links | Vincent Ambo | 2 | -3/+78
Change-Id: Iebdbc8f884f28064d7b00b8f8808b5030fa3d05c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7864
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-18 | r/5690 feat(corp/data-import): parse and import link types | Vincent Ambo | 2 | -2/+54
Change-Id: Iae01d1dc6894117dc693b4690d8bc79861212ae6
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7863
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
2023-01-18 | r/5689 fix(corp/data-import): commit the final transaction, too | Vincent Ambo | 1 | -0/+2
Otherwise up to 1000 elements might be missing.
Change-Id: I20d6238424eec27f0e758e7737c9c31bcb81b23d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7862
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
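The bug fixed here is a common one with batched inserts: rows are committed every N inserts, and the last partial batch is lost unless it is committed explicitly after the loop. A Python sketch with the standard `sqlite3` module, assuming batches of 1000 rows and a hypothetical `lemmas` table (the import tool's own code is not shown in this log):

```python
import sqlite3
from typing import Iterable, Tuple

BATCH_SIZE = 1000  # assumed batch size


def insert_batched(
    conn: sqlite3.Connection, rows: Iterable[Tuple[int, str]]
) -> int:
    """Insert rows into the hypothetical `lemmas` table in batches.

    Commits once every BATCH_SIZE rows and returns the number of rows
    inserted.
    """
    count = 0
    for row in rows:
        conn.execute("INSERT INTO lemmas (id, lemma) VALUES (?, ?)", row)
        count += 1
        if count % BATCH_SIZE == 0:
            conn.commit()
    # Without this final commit, the last partial batch stays in an open
    # transaction and is rolled back if the connection closes uncommitted.
    conn.commit()
    return count
```

Batching keeps the number of transactions (and thus disk syncs) low during a large import, while the trailing `commit()` makes sure the remainder is persisted.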
2023-01-18 | r/5688 feat(corp/data-import): insert OpenCorpora data into SQLite | Vincent Ambo | 2 | -9/+155
This is an initial and kind of dumb table structure, but there's some massaging that needs to be done before this makes more sense.
Change-Id: I441288b684ef86be507099bcc4ebf984598789c8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7861
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-18 | r/5684 feat(corp/data-import): parse lemmas from OpenCorpora dump | Vincent Ambo | 2 | -14/+135
Change-Id: I1e4efcfc8e555f61578b563411d5e6ed9590d8e8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7860
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
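A Python sketch of streaming lemmas out of the dump with the standard library's `xml.etree.ElementTree.iterparse`, assuming the OpenCorpora dictionary XML layout in which each `<lemma id="...">` holds one `<l t="...">` normal form and several `<f t="...">` forms, each with `<g v="...">` grammeme children. The `Lemma` type is hypothetical, and the actual import tool's parser is not shown in this log.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import BinaryIO, Iterator, List, Union


@dataclass
class Lemma:
    id: int
    text: str             # normal form, from <l t="...">
    grammemes: List[str]  # grammemes attached to the normal form
    forms: List[str]      # surface forms, from each <f t="...">


def parse_lemmas(source: Union[str, BinaryIO]) -> Iterator[Lemma]:
    """Stream <lemma> elements from an OpenCorpora dictionary dump.

    iterparse processes the file incrementally, so the whole dump does not
    have to be loaded into a single tree before parsing lemmas.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "lemma":
            continue
        norm = elem.find("l")
        if norm is None:
            continue  # skip malformed entries without a normal form
        yield Lemma(
            id=int(elem.get("id", "0")),
            text=norm.get("t", ""),
            grammemes=[g.get("v", "") for g in norm.findall("g")],
            forms=[f.get("t", "") for f in elem.findall("f")],
        )
        elem.clear()  # drop the processed subtree to reduce memory use
```

`parse_lemmas` accepts either a file path or a binary file object, so it can read the dump directly from disk.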
2023-01-18 | r/5683 feat(corp/russian/data-import): new OpenCorpora data import tool | Vincent Ambo | 6 | -0/+829
Adds the beginning of a tool which can import OpenCorpora data into a SQLite database. This is quite a lot of toil and there's probably a better way to do this, but overall becoming this intimately familiar with the data structures is quite helpful for understanding what I can/can't do with only this dataset.
Change-Id: Ieab33a8ce07ea4ac87917b9c8132226bbc6523b1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7859
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-17 | r/5678 chore(tazjin/predlozhnik): move to //corp | Vincent Ambo | 8 | -0/+934
This is currently hosted by the company, and I'm assigning my copyright to the company, which also runs an ad placement on the page. Note that the NixOS module for hosting it has not been moved yet.
Change-Id: Iba9e1cab9370faa79e43c3344fbfbbbabead50b3
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7857
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI