Skip to content

Commit

Permalink
Autoharness Subcommand (#3874)
Browse files Browse the repository at this point in the history
This PR introduces an initial implementation for the `autoharness`
subcommand and a book chapter describing the feature. I recommend
reading the book chapter before reviewing the code (or proceeding
further in this PR description).

## Callouts
`--harness` is to manual harnesses what `--include-function` and
`--exclude-function` are to autoharness; both allow the user to filter
which harnesses/functions get verified. Their implementation is
different, however--`--harness` is a driver-only flag, i.e., we still
compile every harness, but then only call CBMC on the ones specified in
`--harness`. `--include-function` and `--exclude-function` get passed to
the compiler. I made this design choice to try to be more aggressive
about improving compiler performance--if a user only asks to verify one
function and we go try to codegen a thousand, the compiler will take
much longer than it needs to. I thought this more aggressive
optimization was warranted given that crates are likely to have far many
more functions eligible for autoverification than manual Kani harnesses.

(See also the limitations in the book chapter).

## Testing
Besides the new tests in this PR, I also ran against
[s2n-quic](https://github.com/aws/s2n-quic) to confirm that the
subcommand works on larger crates.

Towards #3832

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 and MIT licenses.
  • Loading branch information
carolynzech authored Feb 11, 2025
1 parent 4f78926 commit 8b2ec77
Show file tree
Hide file tree
Showing 55 changed files with 1,516 additions and 101 deletions.
1 change: 1 addition & 0 deletions docs/src/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
- [Reference](./reference.md)
- [Attributes](./reference/attributes.md)
- [Experimental features](./reference/experimental/experimental-features.md)
- [Automatic Harness Generation](./reference/experimental/autoharness.md)
- [Coverage](./reference/experimental/coverage.md)
- [Stubbing](./reference/experimental/stubbing.md)
- [Contracts](./reference/experimental/contracts.md)
Expand Down
84 changes: 84 additions & 0 deletions docs/src/reference/experimental/autoharness.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
# Automatic Harness Generation

Recall the harness for `estimate_size` that we wrote in [First Steps](../../tutorial-first-steps.md):
```rust
{{#include ../../tutorial/first-steps-v1/src/lib.rs:kani}}
```

This harness first declares a local variable `x` using `kani::any()`, then calls `estimate_size` with argument `x`.
Many proof harnesses follow this predictable format—to verify a function `foo`, we create arbitrary values for each of `foo`'s arguments, then call `foo` on those arguments.

The `autoharness` subcommand leverages this observation to automatically generate and run harnesses.
Kani scans the crate for functions whose arguments all implement the `kani::Arbitrary` trait, generates harnesses for them, then runs them.
These harnesses are internal to Kani--i.e., Kani does not make any changes to your source code.

## Usage
Run either:
```
# cargo kani autoharness -Z unstable-options
```
or
```
# kani autoharness -Z unstable-options <FILE>
```

If Kani detects that all of a function `foo`'s arguments implement `kani::Arbitrary`, it will generate and run a `#[kani::proof]` harness, which prints:

```
Autoharness: Checking function foo against all possible inputs...
<VERIFICATION RESULTS>
```

However, if Kani detects that `foo` has a [contract](./contracts.md), it will instead generate a `#[kani::proof_for_contract]` harness and verify the contract:
```
Autoharness: Checking function foo's contract against all possible inputs...
<VERIFICATION RESULTS>
```

Kani generates and runs these harnesses internally—the user only sees the verification results.

The `autoharness` subcommand has options `--include-function` and `--exclude-function` to include and exclude particular functions.
These flags look for partial matches against the fully qualified name of a function.

For example, if a module `my_module` has many functions, but we are only interested in `my_module::foo` and `my_module::bar`, we can run:
```
cargo run autoharness -Z unstable-options --include-function foo include-function bar
```
To exclude `my_module` entirely, run:
```
cargo run autoharness -Z unstable-options --exclude-function my_module
```

## Example
Using the `estimate_size` example from [First Steps](../../tutorial-first-steps.md) again:
```rust
{{#include ../../tutorial/first-steps-v1/src/lib.rs:code}}
```

We get:

```
# cargo kani autoharness -Z unstable-options
Autoharness: Checking function estimate_size against all possible inputs...
RESULTS:
Check 3: estimate_size.assertion.1
- Status: FAILURE
- Description: "Oh no, a failing corner case!"
[...]
Verification failed for - estimate_size
Complete - 0 successfully verified functions, 1 failures, 1 total.
```

## Request for comments
This feature is experimental and is therefore subject to change.
If you have ideas for improving the user experience of this feature,
please add them to [this GitHub issue](https://github.com/model-checking/kani/issues/3832).

## Limitations
Kani will only generate an automatic harness for a function if it can determine that all of the function's arguments implement Arbitrary.
It does not attempt to derive/implement Arbitrary for any types, even if those types could implement Arbitrary.

If a function contains a loop with a loop contract, Kani will detect the presence of a loop contract and verify that contract.
If, however, the loop does not have a contract, then there is currently no way to specify an unwinding bound for the function, meaning that Kani may hang as it tries to unwind the loop.
We recommend using the `--exclude-function` option to exclude any functions that have this issue (or `--harness-timeout` to bail after attempting verification for some amount of time).
9 changes: 9 additions & 0 deletions kani-compiler/src/args.rs
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,9 @@ pub enum ReachabilityType {
PubFns,
/// Start the cross-crate reachability analysis from all *test* (i.e. `#[test]`) harnesses in the local crate.
Tests,
/// Start the cross-crate reachability analysis from all functions in the local crate.
/// Currently, this mode is only used for automatic harness generation.
AllFns,
}

/// Command line arguments that this instance of the compiler run was called
Expand Down Expand Up @@ -88,6 +91,12 @@ pub struct Arguments {
/// Print the final LLBC file to stdout.
#[clap(long)]
pub print_llbc: bool,
/// If we are running the autoharness subcommand, the functions to autoharness
#[arg(long = "autoharness-include-function", num_args(1))]
pub autoharness_included_functions: Vec<String>,
/// If we are running the autoharness subcommand, the functions to exclude from autoverification
#[arg(long = "autoharness-exclude-function", num_args(1))]
pub autoharness_excluded_functions: Vec<String>,
}

#[derive(Debug, Clone, Copy, AsRefStr, EnumString, VariantNames, PartialEq, Eq)]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -253,7 +253,7 @@ impl CodegenBackend for LlbcCodegenBackend {
units.store_modifies(&modifies_instances);
units.write_metadata(&queries, tcx);
}
ReachabilityType::Tests => todo!(),
ReachabilityType::Tests | ReachabilityType::AllFns => todo!(),
ReachabilityType::None => {}
ReachabilityType::PubFns => {
let unit = CodegenUnit::default();
Expand Down
6 changes: 4 additions & 2 deletions kani-compiler/src/codegen_cprover_gotoc/compiler_interface.rs
Original file line number Diff line number Diff line change
Expand Up @@ -264,7 +264,7 @@ impl CodegenBackend for GotocCodegenBackend {
let reachability = queries.args().reachability_analysis;
let mut results = GotoCodegenResults::new(tcx, reachability);
match reachability {
ReachabilityType::Harnesses => {
ReachabilityType::AllFns | ReachabilityType::Harnesses => {
let mut units = CodegenUnits::new(&queries, tcx);
let mut modifies_instances = vec![];
let mut loop_contracts_instances = vec![];
Expand Down Expand Up @@ -375,7 +375,9 @@ impl CodegenBackend for GotocCodegenBackend {
// Print compilation report.
results.print_report(tcx);

if reachability != ReachabilityType::Harnesses {
if reachability != ReachabilityType::Harnesses
&& reachability != ReachabilityType::AllFns
{
// In a workspace, cargo seems to be using the same file prefix to build a crate that is
// a package lib and also a dependency of another package.
// To avoid overriding the metadata for its verification, we skip this step when
Expand Down
185 changes: 161 additions & 24 deletions kani-compiler/src/kani_middle/codegen_units.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,11 @@
//! according to their stub configuration.
use crate::args::ReachabilityType;
use crate::kani_middle::attributes::is_proof_harness;
use crate::kani_middle::metadata::{gen_contracts_metadata, gen_proof_metadata};
use crate::kani_middle::attributes::{KaniAttributes, is_proof_harness};
use crate::kani_middle::kani_functions::{KaniIntrinsic, KaniModel};
use crate::kani_middle::metadata::{
gen_automatic_proof_metadata, gen_contracts_metadata, gen_proof_metadata,
};
use crate::kani_middle::reachability::filter_crate_items;
use crate::kani_middle::resolve::expect_resolve_fn;
use crate::kani_middle::stubbing::{check_compatibility, harness_stub_map};
Expand All @@ -20,8 +23,8 @@ use rustc_middle::ty::TyCtxt;
use rustc_session::config::OutputType;
use rustc_smir::rustc_internal;
use stable_mir::CrateDef;
use stable_mir::mir::mono::Instance;
use stable_mir::ty::{FnDef, IndexedVal, RigidTy, TyKind};
use stable_mir::mir::{TerminatorKind, mono::Instance};
use stable_mir::ty::{FnDef, GenericArgKind, GenericArgs, IndexedVal, RigidTy, TyKind};
use std::collections::{BTreeMap, BTreeSet, HashMap, HashSet};
use std::fs::File;
use std::io::BufWriter;
Expand Down Expand Up @@ -57,26 +60,66 @@ pub struct CodegenUnit {
impl CodegenUnits {
pub fn new(queries: &QueryDb, tcx: TyCtxt) -> Self {
let crate_info = CrateInfo { name: stable_mir::local_crate().name.as_str().into() };
if queries.args().reachability_analysis == ReachabilityType::Harnesses {
let base_filepath = tcx.output_filenames(()).path(OutputType::Object);
let base_filename = base_filepath.as_path();
let harnesses = filter_crate_items(tcx, |_, instance| is_proof_harness(tcx, instance));
let all_harnesses = harnesses
.into_iter()
.map(|harness| {
let metadata = gen_proof_metadata(tcx, harness, &base_filename);
(harness, metadata)
})
.collect::<HashMap<_, _>>();
let base_filepath = tcx.output_filenames(()).path(OutputType::Object);
let base_filename = base_filepath.as_path();
let args = queries.args();
match args.reachability_analysis {
ReachabilityType::Harnesses => {
let all_harnesses = get_all_manual_harnesses(tcx, base_filename);
// Even if no_stubs is empty we still need to store rustc metadata.
let units = group_by_stubs(tcx, &all_harnesses);
validate_units(tcx, &units);
debug!(?units, "CodegenUnits::new");
CodegenUnits { units, harness_info: all_harnesses, crate_info }
}
ReachabilityType::AllFns => {
let mut all_harnesses = get_all_manual_harnesses(tcx, base_filename);
let mut units = group_by_stubs(tcx, &all_harnesses);
validate_units(tcx, &units);

// Even if no_stubs is empty we still need to store rustc metadata.
let units = group_by_stubs(tcx, &all_harnesses);
validate_units(tcx, &units);
debug!(?units, "CodegenUnits::new");
CodegenUnits { units, harness_info: all_harnesses, crate_info }
} else {
// Leave other reachability type handling as is for now.
CodegenUnits { units: vec![], harness_info: HashMap::default(), crate_info }
let kani_fns = queries.kani_functions();
let kani_harness_intrinsic =
kani_fns.get(&KaniIntrinsic::AutomaticHarness.into()).unwrap();
let kani_any_inst = kani_fns.get(&KaniModel::Any.into()).unwrap();

let verifiable_fns = filter_crate_items(tcx, |_, instance: Instance| -> bool {
// If the user specified functions to include or exclude, only allow instances matching those filters.
let user_included = if !args.autoharness_included_functions.is_empty() {
fn_list_contains_instance(&instance, &args.autoharness_included_functions)
} else if !args.autoharness_excluded_functions.is_empty() {
!fn_list_contains_instance(&instance, &args.autoharness_excluded_functions)
} else {
true
};
user_included
&& is_eligible_for_automatic_harness(tcx, instance, *kani_any_inst)
});
let automatic_harnesses = get_all_automatic_harnesses(
tcx,
verifiable_fns,
*kani_harness_intrinsic,
base_filename,
);
// We generate one contract harness per function under contract, so each harness is in its own unit,
// and these harnesses have no stubs.
units.extend(
automatic_harnesses
.keys()
.map(|harness| CodegenUnit {
harnesses: vec![*harness],
stubs: HashMap::default(),
})
.collect::<Vec<_>>(),
);
all_harnesses.extend(automatic_harnesses);
// No need to validate the units again because validation only checks stubs, and we haven't added any stubs.
debug!(?units, "CodegenUnits::new");
CodegenUnits { units, harness_info: all_harnesses, crate_info }
}
_ => {
// Leave other reachability type handling as is for now.
CodegenUnits { units: vec![], harness_info: HashMap::default(), crate_info }
}
}
}

Expand All @@ -94,7 +137,15 @@ impl CodegenUnits {
/// We flag that the harness contains usage of loop contracts.
pub fn store_loop_contracts(&mut self, harnesses: &[Harness]) {
for harness in harnesses {
self.harness_info.get_mut(harness).unwrap().has_loop_contracts = true;
let metadata = self.harness_info.get_mut(harness).unwrap();
metadata.has_loop_contracts = true;
// If we're generating this harness automatically and we encounter a loop contract,
// ensure that the HarnessKind is updated to be a contract harness
// targeting the function to verify.
if metadata.is_automatically_generated {
metadata.attributes.kind =
HarnessKind::ProofForContract { target_fn: metadata.pretty_name.clone() }
}
}
}

Expand Down Expand Up @@ -252,3 +303,89 @@ fn apply_transitivity(tcx: TyCtxt, harness: Harness, stubs: Stubs) -> Stubs {
}
new_stubs
}

/// Fetch all manual harnesses (i.e., functions provided by the user) and generate their metadata
fn get_all_manual_harnesses(
tcx: TyCtxt,
base_filename: &Path,
) -> HashMap<Harness, HarnessMetadata> {
let harnesses = filter_crate_items(tcx, |_, instance| is_proof_harness(tcx, instance));
harnesses
.into_iter()
.map(|harness| {
let metadata = gen_proof_metadata(tcx, harness, &base_filename);
(harness, metadata)
})
.collect::<HashMap<_, _>>()
}

/// For each function eligible for automatic verification,
/// generate a harness Instance for it, then generate its metadata.
/// Note that the body of each harness instance is still the dummy body of `kani_harness_intrinsic`;
/// the AutomaticHarnessPass will later transform the bodies of these instances to actually verify the function.
fn get_all_automatic_harnesses(
tcx: TyCtxt,
verifiable_fns: Vec<Instance>,
kani_harness_intrinsic: FnDef,
base_filename: &Path,
) -> HashMap<Harness, HarnessMetadata> {
verifiable_fns
.into_iter()
.map(|fn_to_verify| {
// Set the generic arguments of the harness to be the function it is verifying
// so that later, in AutomaticHarnessPass, we can retrieve the function to verify
// and generate the harness body accordingly.
let harness = Instance::resolve(
kani_harness_intrinsic,
&GenericArgs(vec![GenericArgKind::Type(fn_to_verify.ty())]),
)
.unwrap();
let metadata = gen_automatic_proof_metadata(tcx, &base_filename, &fn_to_verify);
(harness, metadata)
})
.collect::<HashMap<_, _>>()
}

/// Determine whether `instance` is eligible for automatic verification.
fn is_eligible_for_automatic_harness(tcx: TyCtxt, instance: Instance, any_inst: FnDef) -> bool {
// `instance` is ineligble if it is a harness or has an nonexistent/empty body
if is_proof_harness(tcx, instance) || !instance.has_body() {
return false;
}
let body = instance.body().unwrap();

// `instance` is ineligble if it is an associated item of a Kani trait implementation,
// or part of Kani contract instrumentation.
// FIXME -- find a less hardcoded way of checking the former condition (perhaps upstream PR to StableMIR).
if instance.name().contains("kani::Arbitrary")
|| instance.name().contains("kani::Invariant")
|| KaniAttributes::for_instance(tcx, instance)
.fn_marker()
.is_some_and(|m| m.as_str() == "kani_contract_mode")
{
return false;
}

// Each non-generic argument of `instance`` must implement Arbitrary.
body.arg_locals().iter().all(|arg| {
let kani_any_body =
Instance::resolve(any_inst, &GenericArgs(vec![GenericArgKind::Type(arg.ty)]))
.unwrap()
.body()
.unwrap();
if let TerminatorKind::Call { func, .. } = &kani_any_body.blocks[0].terminator.kind {
if let Some((def, args)) = func.ty(body.arg_locals()).unwrap().kind().fn_def() {
return Instance::resolve(def, &args).is_ok();
}
}
false
})
}

/// Return whether the name of `instance` is included in `fn_list`.
/// If `exact = true`, then `instance`'s exact, fully-qualified name must be in `fn_list`; otherwise, `fn_list`
/// can have a substring of `instance`'s name.
fn fn_list_contains_instance(instance: &Instance, fn_list: &[String]) -> bool {
let pretty_name = instance.name();
fn_list.contains(&pretty_name) || fn_list.iter().any(|fn_name| pretty_name.contains(fn_name))
}
2 changes: 2 additions & 0 deletions kani-compiler/src/kani_middle/kani_functions.rs
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,8 @@ pub enum KaniIntrinsic {
CheckedAlignOf,
#[strum(serialize = "CheckedSizeOfIntrinsic")]
CheckedSizeOf,
#[strum(serialize = "AutomaticHarnessIntrinsic")]
AutomaticHarness,
#[strum(serialize = "IsInitializedIntrinsic")]
IsInitialized,
#[strum(serialize = "ValidValueIntrinsic")]
Expand Down
Loading

0 comments on commit 8b2ec77

Please sign in to comment.