Commit-hook integration
biston is designed to scan a whole repository, but when you wire it into a pre-commit hook you usually don’t want every unrelated clone pair in the codebase to fail a commit, only the clones that involve files the committer touched. The --focus-args / --files / --files-from flags narrow the report to those files while still scanning the whole tree, so cross-file clones between a changed file and the rest of the repo are still detected.
With pre-commit / prek
If you use the pre-commit framework (or prek), add this entry under the top-level repos: key in .pre-commit-config.yaml:

```yaml
repos:
  - repo: https://github.com/mojzis/biston
    rev: v0.5.0
    hooks:
      - id: biston
```
That wires up biston scan --focus-args, which receives staged Python files as positional arguments and narrows the report to clones involving any of them. An empty staged set (no Python files touched) passes silently. A companion biston-stats hook is available for CI gating on pair counts.
Heads up: if you write your own local hook definition for biston (repo: local) instead of using the repo above, you must set require_serial: true. Without it, pre-commit may split the staged files into batches and run them as parallel invocations, so cross-file clones spanning batches will be silently missed, defeating the point of running biston as a hook.
The shell recipe
For raw .git/hooks/pre-commit scripts, or for CI integration outside of pre-commit:
```shell
git diff --name-only --diff-filter=ACM -- '*.py' \
  | biston scan --files-from - .
```
What each piece does:
- git diff --name-only --diff-filter=ACM -- '*.py': list Python files whose status is Added, Copied, or Modified. As written, this compares the working tree against the index (unstaged changes). Add --cached to compare the index against HEAD, which gives the staged changes a pre-commit hook is about to commit, or use HEAD~1..HEAD to cover the most recent commit, depending on hook timing.
- biston scan --files-from -: read that list from stdin, one path per line. Paths are resolved relative to the current working directory.
- The positional .: the root of the scan. biston still discovers and parses everything under it; the focus list only restricts which pairs make it into the report.
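To see the status filter and the --cached variant in action, the throwaway-repository sketch below stages one modified, one added, one deleted, and one non-Python file, then runs the hook's git diff with --cached. It assumes only git, mktemp, and a POSIX shell; the file names are made up for the demo. It should print added.py and kept.py (git lists paths in sorted order): removed.py is dropped by --diff-filter=ACM and notes.txt by the '*.py' pathspec.

```shell
#!/bin/sh
# Throwaway-repo demo of which staged paths the hook's git diff keeps.
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .

# Inline identity (and no signing) so the demo commit works on any machine.
gitc() { git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false "$@"; }

printf 'x = 1\n' > kept.py
printf 'y = 2\n' > removed.py
git add kept.py removed.py
gitc commit -q -m initial

printf 'x = 3\n' >> kept.py   # Modified
printf 'z = 4\n' > added.py   # Added
printf 'notes\n' > notes.txt  # Added, but not Python
git rm -q removed.py          # Deleted (git rm stages the deletion)
git add kept.py added.py notes.txt

# Index vs HEAD: the staged changes a pre-commit hook is about to commit.
git diff --cached --name-only --diff-filter=ACM -- '*.py'
```

Dropping --cached from the last command would compare the working tree against the index instead, and since everything above is already staged it would print nothing.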
An empty list (no Python files changed) correctly emits no pairs — the hook passes silently. That’s why --files-from - is the right shape for hooks: --files $(git diff --name-only) silently expands to nothing when the diff is empty, which reverts to a full-repo scan and can trip the hook on pre-existing clones unrelated to the commit.
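The empty-expansion pitfall is easy to check without biston. The sketch below uses a hypothetical show_args helper that prints each argument it receives; the variable changed stands in for git diff output that happens to be empty. It prints [scan], [--files], and [.] on separate lines: the changed-file list vanishes from the command line and --files ends up directly in front of the scan root.

```shell
#!/bin/sh
# show_args is a hypothetical helper: print each argument a command would
# receive, one per line, wrapped in brackets.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

changed=""   # stand-in for a `git diff --name-only` that printed nothing

# Unquoted on purpose, as in --files $(git diff --name-only): an empty
# expansion contributes zero words to the argument list.
show_args scan --files $changed .
```

Piping the same empty output into --files-from - instead hands biston an explicitly empty list, which the Semantics table lists as emitting no pairs.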
Semantics
Given a repo with clones A ↔ B (inside the committer’s change) and C ↔ D (elsewhere):
| Invocation | Pairs emitted |
|---|---|
| biston scan . | A↔B, C↔D |
| biston scan --files A.py . | A↔B (and any A↔X with X anywhere in the repo) |
| biston scan --focus-args A.py | A↔B (same as --files A.py .) |
| biston scan --files-from - . with empty stdin | (none) |
| biston scan --focus-args (no positionals) | (none) |
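The focused rows of the table follow one rule: once a focus mode is active, a pair survives only if at least one side is a focus path. The shell function below is a hypothetical model of that rule for illustration, not biston's implementation; it assumes pairs arrive as whitespace-separated path pairs and that paths contain no spaces. With A.py as the only focus path it keeps A.py <-> B.py and X.py <-> A.py and drops C.py <-> D.py.

```shell
#!/bin/sh
# Hypothetical model of the focus rule; this is NOT biston's implementation.
# stdin: one clone pair per line as "left right" (paths must not contain spaces).
# arguments: the focus paths.
# A pair is printed when either side is a focus path, so calling the function
# with no focus paths prints nothing.
focus_pairs() {
  focus=" $* "
  while read -r left right; do
    [ -n "$left" ] || continue   # skip blank lines
    case "$focus" in
      *" $left "* | *" $right "*) printf '%s <-> %s\n' "$left" "$right" ;;
    esac
  done
}

# A<->B and X<->A involve the focus path A.py; C<->D does not.
printf 'A.py B.py\nC.py D.py\nX.py A.py\n' | focus_pairs A.py
```

Calling focus_pairs with no arguments emits nothing, matching the two (none) rows. The unfocused biston scan . row, which reports every pair, is outside what this model covers.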
The three focus modes — --files, --files-from, and --focus-args — are mutually exclusive; pass only one per invocation.
A focus path that can’t be resolved (e.g. a file deleted in the same changeset) is warned about and skipped, not treated as a fatal error — the scan continues with whatever focus paths did resolve.
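The warn-and-skip behavior can also be mimicked in a hook before biston runs, for example to log unresolvable paths yourself. The keep_resolvable function below is a hypothetical sketch of that rule, not biston's code, and its warning text is invented: it reads paths one per line, keeps the ones that exist relative to the current directory, and reports the rest on stderr. In the demo it prints present.py and warns about deleted.py.

```shell
#!/bin/sh
# Hypothetical sketch of the warn-and-skip rule; NOT biston's code, and the
# warning text is invented. Reads paths one per line from stdin, prints the
# ones that exist relative to the current directory, warns about the rest.
keep_resolvable() {
  while IFS= read -r path; do
    [ -n "$path" ] || continue   # this sketch ignores blank lines
    if [ -e "$path" ]; then
      printf '%s\n' "$path"
    else
      printf 'warning: skipping focus path %s (not found)\n' "$path" >&2
    fi
  done
}

# Demo in a scratch directory: present.py exists, deleted.py does not.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'x = 1\n' > present.py
printf 'present.py\ndeleted.py\n' | keep_resolvable
```

In a hook it would sit between git diff and biston scan --files-from - ., although with --diff-filter=ACM deleted files are already excluded.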
Tips
- Use --diff-filter=ACM to avoid passing deleted files. biston tolerates them, but filtering them out is clearer at the hook level.
- Combine with --format sarif if your CI wants to upload findings as annotations; the SARIF output is filtered the same way.
- biston stats supports the same flags, so you can gate a hook on a numeric threshold: biston stats --files-from - --format json . | jq '.clone_pairs'.
- For a dry run, drop --files-from to see the full-repo report and confirm the focused scan isn't hiding surprises.
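For the threshold gate in the tips above, a hook needs to compare the number jq extracts against an allowed maximum. The check_pair_budget helper below is hypothetical, and the commented pipeline assumes that .clone_pairs in the biston stats --format json output is an integer count; check the actual JSON shape before relying on it. The demo call prints "within budget".

```shell
#!/bin/sh
# Hypothetical helper: succeed when a clone-pair count is within budget.
# Returns 1 when the count exceeds the maximum and 2 when the count is not
# a non-negative integer (e.g. jq printed "null" for a missing key).
check_pair_budget() {
  count=$1
  max=$2
  case $count in
    '' | *[!0-9]*)
      printf 'unexpected clone-pair count: %s\n' "$count" >&2
      return 2 ;;
  esac
  if [ "$count" -gt "$max" ]; then
    printf '%s clone pair(s) involve the changed files (allowed: %s)\n' \
      "$count" "$max" >&2
    return 1
  fi
}

# Example hook usage (requires biston and jq; assumes .clone_pairs is an
# integer count in the JSON output):
#   pairs=$(git diff --cached --name-only --diff-filter=ACM -- '*.py' \
#     | biston stats --files-from - --format json . | jq '.clone_pairs')
#   check_pair_budget "$pairs" 0 || exit 1

check_pair_budget 0 0 && echo "within budget"
```

Rejecting non-numeric input explicitly matters here: without the case check, a "null" from jq would make the numeric test error out, and the function would fall through and report success.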
Repeated --files
For one-off use outside a hook, --files is repeatable:
```shell
biston scan --files src/auth.py --files src/session.py .
```
Each --files takes a single path; repeat the flag to add more. Like the other focus modes, this form can't be combined with --files-from or --focus-args, so pick one per invocation.
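When the paths come from a list rather than being typed by hand, the repeated form has to be built one flag per path. One POSIX-sh way, sketched below with the two example paths from above, is to reuse the positional parameters as an argument list, which keeps paths containing spaces intact. It prints the resulting argument vector one item per line; the biston call itself is left as a comment.

```shell
#!/bin/sh
# Build "--files PATH" once per path. POSIX sh has no arrays, so the
# positional parameters serve as a safely quoted argument list.
set --
for path in src/auth.py src/session.py; do
  set -- "$@" --files "$path"
done

# Show the argument vector, one item per line.
printf '%s\n' scan "$@" .

# With biston installed, the real call would be:
#   biston scan "$@" .
```

For a list long enough to be awkward on the command line, piping it into --files-from - avoids building the flags at all.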