diff options
author | Vincent Ambo <mail@tazj.in> | 2023-02-14T12·02+0300 |
---|---|---|
committer | tazjin <tazjin@tvl.su> | 2023-03-13T20·30+0000 |
commit | 025c67bf4d5666411b4d6cdc929e1a677ebc0439 (patch) | |
tree | e6e683d5c194686c8112518965f7e11458b18f27 /tvix/eval/builtin-macros/src | |
parent | cbb4137dc08620af9d6360057c75891bf4d03b5f (diff) |
refactor(tvix/eval): flatten call stack of VM using generators r/5964
Warning: This is probably the biggest refactor in tvix-eval history, so far. This replaces all instances of trampolines and recursion during evaluation of the VM loop with generators. A generator is an asynchronous function that can be suspended to yield a message (in our case, vm::generators::GeneratorRequest) and receive a response (vm::generators::GeneratorResponsee). The `genawaiter` crate provides an interpreter for generators that can drive their execution and lets us move control flow between the VM and suspended generators. To do this, massive changes have occured basically everywhere in the code. On a high-level: 1. The VM is now organised around a frame stack. A frame is either a call frame (execution of Tvix bytecode) or a generator frame (a running or suspended generator). The VM has an outer loop that pops a frame off the frame stack, and then enters an inner loop either driving the execution of the bytecode or the execution of a generator. Both types of frames have several branches that can result in the frame re-enqueuing itself, and enqueuing some other work (in the form of a different frame) on top of itself. The VM will eventually resume the frame when everything "above" it has been suspended. In this way, the VM's new frame stack takes over much of the work that was previously achieved by recursion. 2. All methods previously taking a VM have been refactored into async functions that instead emit/receive generator messages for communication with the VM. Notably, this includes *all* builtins. This has had some other effects: - Some test have been removed or commented out, either because they tested code that was mostly already dead (nix_eq) or because they now require generator scaffolding which we do not have in place for tests (yet). - Because generator functions are technically async (though no async IO is involved), we lose the ability to use much of the Rust standard library e.g. in builtins. This has led to many algorithms being unrolled into iterative versions instead of iterator combinations, and things like sorting had to be implemented from scratch. - Many call sites that previously saw a `Result<..., ErrorKind>` bubble up now only see the result value, as the error handling is encapsulated within the generator loop. This reduces number of places inside of builtin implementations where error context can be attached to calls that can fail. Currently what we gain in this tradeoff is significantly more detailed span information (which we still need to bubble up, this commit does not change the error display). We'll need to do some analysis later of how useful the errors turn out to be and potentially introduce some methods for attaching context to a generator frame again. This change is very difficult to do in stages, as it is very much an "all or nothing" change that affects huge parts of the codebase. I've tried to isolate changes that can be isolated into the parent CLs of this one, but this change is still quite difficult to wrap one's mind and I'm available to discuss it and explain things to any reviewer. Fixes: b/238, b/237, b/251 and potentially others. Change-Id: I39244163ff5bbecd169fe7b274df19262b515699 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8104 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: Adam Joseph <adam@westernsemico.com> Tested-by: BuildkiteCI
Diffstat (limited to 'tvix/eval/builtin-macros/src')
-rw-r--r-- | tvix/eval/builtin-macros/src/lib.rs | 150 |
1 files changed, 97 insertions, 53 deletions
diff --git a/tvix/eval/builtin-macros/src/lib.rs b/tvix/eval/builtin-macros/src/lib.rs index ba59cb2e1685..dfd0948c7d9f 100644 --- a/tvix/eval/builtin-macros/src/lib.rs +++ b/tvix/eval/builtin-macros/src/lib.rs @@ -6,20 +6,24 @@ use quote::{quote, quote_spanned, ToTokens}; use syn::parse::Parse; use syn::spanned::Spanned; use syn::{ - parse2, parse_macro_input, parse_quote, Attribute, FnArg, Ident, Item, ItemMod, LitStr, Meta, - Pat, PatIdent, PatType, Token, Type, + parse2, parse_macro_input, parse_quote, parse_quote_spanned, Attribute, FnArg, Ident, Item, + ItemMod, LitStr, Meta, Pat, PatIdent, PatType, Token, Type, }; -struct BuiltinArgs { - name: LitStr, -} +/// Description of a single argument passed to a builtin +struct BuiltinArgument { + /// The name of the argument, to be used in docstrings and error messages + name: Ident, -impl Parse for BuiltinArgs { - fn parse(input: syn::parse::ParseStream) -> syn::Result<Self> { - Ok(BuiltinArgs { - name: input.parse()?, - }) - } + /// Type of the argument. + ty: Box<Type>, + + /// Whether the argument should be forced before the underlying builtin + /// function is called. + strict: bool, + + /// Span at which the argument was defined. + span: Span, } fn extract_docstring(attrs: &[Attribute]) -> Option<String> { @@ -97,6 +101,11 @@ fn parse_module_args(args: TokenStream) -> Option<Type> { /// builtin upon instantiation. Using this, builtins that close over some external state can be /// written. /// +/// The type of each function is rewritten to receive a `Vec<Value>`, containing each `Value` +/// argument that the function receives. The body of functions is accordingly rewritten to "unwrap" +/// values from this vector and bind them to the correct names, so unless a static error occurs this +/// transformation is mostly invisible to users of the macro. +/// /// A function `fn builtins() -> Vec<Builtin>` will be defined within the annotated module, /// returning a list of [`tvix_eval::Builtin`] for each function annotated with the `#[builtin]` /// attribute within the module. If a `state` type is specified, the `builtins` function will take a @@ -114,10 +123,10 @@ fn parse_module_args(args: TokenStream) -> Option<Type> { /// /// #[builtins] /// mod builtins { -/// use tvix_eval::{ErrorKind, Value, VM}; +/// use tvix_eval::{GenCo, ErrorKind, Value}; /// /// #[builtin("identity")] -/// pub fn builtin_identity(_vm: &mut VM, x: Value) -> Result<Value, ErrorKind> { +/// pub async fn builtin_identity(co: GenCo, x: Value) -> Result<Value, ErrorKind> { /// Ok(x) /// } /// @@ -125,7 +134,7 @@ fn parse_module_args(args: TokenStream) -> Option<Type> { /// // argument with the `#[lazy]` attribute /// /// #[builtin("tryEval")] -/// pub fn builtin_try_eval(vm: &mut VM, #[lazy] x: Value) -> Result<Value, ErrorKind> { +/// pub async fn builtin_try_eval(co: GenCo, #[lazy] x: Value) -> Result<Value, ErrorKind> { /// todo!() /// } /// } @@ -156,7 +165,7 @@ pub fn builtins(args: TokenStream, item: TokenStream) -> TokenStream { .position(|attr| attr.path.get_ident().iter().any(|id| *id == "builtin")) { let builtin_attr = f.attrs.remove(builtin_attr_pos); - let BuiltinArgs { name } = match builtin_attr.parse_args() { + let name: LitStr = match builtin_attr.parse_args() { Ok(args) => args, Err(err) => return err.into_compile_error().into(), }; @@ -169,10 +178,11 @@ pub fn builtins(args: TokenStream, item: TokenStream) -> TokenStream { .into(); } - // Determine if this function is taking the state parameter. - let mut args_iter = f.sig.inputs.iter_mut().peekable(); + // Inspect the first argument to determine if this function is + // taking the state parameter. + // TODO(tazjin): add a test in //tvix/eval that covers this let mut captures_state = false; - if let Some(FnArg::Typed(PatType { pat, .. })) = args_iter.peek() { + if let FnArg::Typed(PatType { pat, .. }) = &f.sig.inputs[0] { if let Pat::Ident(PatIdent { ident, .. }) = pat.as_ref() { if ident.to_string() == "state" { if state_type.is_none() { @@ -184,20 +194,28 @@ pub fn builtins(args: TokenStream, item: TokenStream) -> TokenStream { } } - // skip state and/or VM args .. - let skip_num = if captures_state { 2 } else { 1 }; + let mut rewritten_args = std::mem::take(&mut f.sig.inputs) + .into_iter() + .collect::<Vec<_>>(); + + // Split out the value arguments from the static arguments. + let split_idx = if captures_state { 2 } else { 1 }; + let value_args = rewritten_args.split_off(split_idx); - let builtin_arguments = args_iter - .skip(skip_num) + let builtin_arguments = value_args + .into_iter() .map(|arg| { + let span = arg.span(); let mut strict = true; - let name = match arg { + let (name, ty) = match arg { FnArg::Receiver(_) => { - return Err(quote_spanned!(arg.span() => { - compile_error!("Unexpected receiver argument in builtin") + return Err(quote_spanned!(span => { + compile_error!("unexpected receiver argument in builtin") })) } - FnArg::Typed(PatType { attrs, pat, .. }) => { + FnArg::Typed(PatType { + mut attrs, pat, ty, .. + }) => { attrs.retain(|attr| { attr.path.get_ident().into_iter().any(|id| { if id == "lazy" { @@ -209,34 +227,66 @@ pub fn builtins(args: TokenStream, item: TokenStream) -> TokenStream { }) }); match pat.as_ref() { - Pat::Ident(PatIdent { ident, .. }) => ident.to_string(), - _ => "unknown".to_string(), + Pat::Ident(PatIdent { ident, .. }) => { + (ident.clone(), ty.clone()) + } + _ => panic!("ignored value parameters must be named, e.g. `_x` and not just `_`"), } } }; - Ok(quote_spanned!(arg.span() => { - crate::BuiltinArgument { - strict: #strict, - name: #name, - } - })) + Ok(BuiltinArgument { + strict, + span, + name, + ty, + }) }) - .collect::<Result<Vec<_>, _>>(); + .collect::<Result<Vec<BuiltinArgument>, _>>(); let builtin_arguments = match builtin_arguments { - Ok(args) => args, Err(err) => return err.into(), + + // reverse argument order, as they are popped from the stack + // slice in opposite order + Ok(args) => args, }; - let fn_name = f.sig.ident.clone(); - let num_args = f.sig.inputs.len() - skip_num; - let args = (0..num_args) - .map(|n| Ident::new(&format!("arg_{n}"), Span::call_site())) - .collect::<Vec<_>>(); - let mut reversed_args = args.clone(); - reversed_args.reverse(); + // Rewrite the argument to the actual function to take a + // `Vec<Value>`, which is then destructured into the + // user-defined values in the function header. + let sig_span = f.sig.span(); + rewritten_args.push(parse_quote_spanned!(sig_span=> mut values: Vec<Value>)); + f.sig.inputs = rewritten_args.into_iter().collect(); + + // Rewrite the body of the function to do said argument forcing. + // + // This is done by creating a new block for each of the + // arguments that evaluates it, and wraps the inner block. + for arg in &builtin_arguments { + let block = &f.block; + let ty = &arg.ty; + let ident = &arg.name; + if arg.strict { + f.block = Box::new(parse_quote_spanned! {arg.span=> { + let #ident: #ty = generators::request_force(&co, values.pop() + .expect("Tvix bug: builtin called with incorrect number of arguments")).await; + + #block + }}); + } else { + f.block = Box::new(parse_quote_spanned! {arg.span=> { + let #ident: #ty = values.pop() + .expect("Tvix bug: builtin called with incorrect number of arguments"); + + #block + }}) + } + } + + let fn_name = f.sig.ident.clone(); + let arg_count = builtin_arguments.len(); let docstring = match extract_docstring(&f.attrs) { Some(docs) => quote!(Some(#docs)), None => quote!(None), @@ -247,24 +297,18 @@ pub fn builtins(args: TokenStream, item: TokenStream) -> TokenStream { let inner_state = state.clone(); crate::Builtin::new( #name, - &[#(#builtin_arguments),*], #docstring, - move |mut args: Vec<crate::Value>, vm: &mut crate::VM| { - #(let #reversed_args = args.pop().unwrap();)* - #fn_name(inner_state.clone(), vm, #(#args),*) - } + #arg_count, + move |values| Gen::new(|co| generators::pin_generator(#fn_name(inner_state.clone(), co, values))), ) }}); } else { builtins.push(quote_spanned! { builtin_attr.span() => { crate::Builtin::new( #name, - &[#(#builtin_arguments),*], #docstring, - |mut args: Vec<crate::Value>, vm: &mut crate::VM| { - #(let #reversed_args = args.pop().unwrap();)* - #fn_name(vm, #(#args),*) - } + #arg_count, + |values| Gen::new(|co| generators::pin_generator(#fn_name(co, values))), ) }}); } |