Status: Planned (Rating-based disputes under design)
The real problem in open-source ecosystems
Open-source projects depend on external contributors. But evaluating contributions fairly is one of the hardest unsolved problems in OSS. Most platforms struggle to answer simple but critical questions:
- Was this pull request actually good?
- Did it improve the project long-term?
- How much should this contribution be rewarded?
- Should this code be merged, revised, or rejected?
Why current evaluation methods fail
1. Quantitative metrics don’t measure quality
Common signals like:
- lines of code,
- number of commits,
- issue count,
- activity frequency
measure activity, not quality, and are easy to game.
2. Maintainer-only evaluation does not scale
Relying solely on maintainers:
- creates bottlenecks,
- introduces bias,
- burns out core teams,
- discourages contributors.
On top of reviewing code, maintainers are forced to act as:
- judges,
- gatekeepers,
- and conflict managers.
3. Pure AI-based evaluation breaks in real-world codebases
Some platforms experimented with AI-based PR evaluation. A real example:
- Platforms like OnlyDust tested automated or AI-assisted evaluation of contributions.
- While useful for surface-level analysis, these systems failed when:
  - evaluating smart contracts,
  - judging protocol-level logic,
  - understanding security implications,
  - reviewing unfamiliar languages or paradigms.
In practice, AI evaluators:
- misjudge intent,
- misunderstand context,
- fail at domain-specific reasoning,
- and confidently score incorrect or risky code.
Why human judgment is unavoidable
Code quality is not just correctness. It includes:
- architectural fit,
- security assumptions,
- readability,
- long-term maintainability,
- alignment with project goals.
The missing layer: decentralized, incentivized code evaluation
Justly introduces a new primitive: distributed human evaluation with economic incentives. Instead of:
- one maintainer deciding,
- or a black-box AI scoring,
Justly combines (see the sketch after this list):
- multiple independent reviewers,
- clear evaluation criteria,
- economic stakes to discourage bad judgments.
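A minimal sketch of what this primitive could look like as data. The names below (EvaluationTask, Juror, Criterion) are illustrative assumptions, not Justly's actual schema:

```ts
// Illustrative types only; Justly's real data model may differ.

// A reviewer with an economic stake behind their judgment.
interface Juror {
  id: string;
  stakeUsdc: number; // amount staked in a stablecoin such as USDC
}

// One dimension jurors score, e.g. "security" or "readability".
interface Criterion {
  name: string;
  weight: number; // relative importance when scores are aggregated
}

// The primitive itself: one artifact, several independent jurors,
// explicit criteria, and stakes that can be rewarded or slashed.
interface EvaluationTask {
  artifactUrl: string;           // e.g. the pull request under review
  criteria: Criterion[];
  jurors: Juror[];
  scores: Map<string, number[]>; // jurorId -> one score per criterion (0-10)
}
```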
How Justly works for code evaluation
Typical flow:
- A contributor submits a pull request.
- The PR enters an evaluation phase.
- Jurors stake stablecoins (e.g. USDC) to participate.
- Jurors review:
  - code quality,
  - correctness,
  - security implications,
  - adherence to project standards.
- Each juror assigns a quality score or verdict.
- Scores are aggregated (see the aggregation sketch after this list).
- Outcomes are executed automatically:
  - merge,
  - request changes,
  - reject,
  - distribute rewards.
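A minimal sketch of the aggregation and outcome step, assuming a stake-weighted average and placeholder thresholds. The function names and cutoffs below are illustrative, not Justly's specification:

```ts
type Verdict = "merge" | "request-changes" | "reject";

interface JurorScore {
  jurorId: string;
  stakeUsdc: number; // stake backing this juror's judgment
  score: number;     // overall quality score, assumed to be on a 0-10 scale
}

// Aggregate individual scores into one stake-weighted quality score.
function aggregateScore(scores: JurorScore[]): number {
  const totalStake = scores.reduce((sum, s) => sum + s.stakeUsdc, 0);
  return scores.reduce(
    (sum, s) => sum + s.score * (s.stakeUsdc / totalStake),
    0,
  );
}

// Map the aggregated score to an automatic outcome; thresholds are placeholders.
function decide(aggregated: number): Verdict {
  if (aggregated >= 7) return "merge";
  if (aggregated >= 4) return "request-changes";
  return "reject";
}

// Example: three jurors with different stakes and scores.
const verdict = decide(
  aggregateScore([
    { jurorId: "a", stakeUsdc: 100, score: 8 },
    { jurorId: "b", stakeUsdc: 50, score: 6 },
    { jurorId: "c", stakeUsdc: 50, score: 9 },
  ]),
); // -> "merge" (weighted score 7.75)
```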
Example: smart contract contribution
Scenario
- A contributor submits a smart contract PR.
- The code compiles and passes tests.
- An AI reviewer gives it a high score.
- Maintainers feel unsure about edge cases and security assumptions.
- Jurors with relevant expertise review the contract.
- They evaluate:
  - attack surfaces,
  - economic exploits,
  - logic soundness.
- The PR receives a weighted quality score (see the scoring sketch after this list).
- Rewards and merge decisions reflect real risk and value.
This avoids both:
- blind trust in automation,
- single-point human failure.
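One way such a weighted quality score could be computed in this scenario is to score each criterion separately and weight security-related criteria more heavily. The criteria names, weights, and scores below are assumptions for illustration only:

```ts
// Criteria and weights for a smart contract PR; both are illustrative.
const criteriaWeights = {
  "attack-surface": 0.4,
  "economic-exploits": 0.4,
  "logic-soundness": 0.2,
} as const;
type ContractCriterion = keyof typeof criteriaWeights;

// One juror's per-criterion scores, assumed to be on a 0-10 scale.
type JurorReview = Record<ContractCriterion, number>;

// Weighted quality score: average each criterion across jurors, then
// combine the criteria using the security-heavy weights above.
function contractQualityScore(reviews: JurorReview[]): number {
  let total = 0;
  for (const criterion of Object.keys(criteriaWeights) as ContractCriterion[]) {
    const mean =
      reviews.reduce((sum, r) => sum + r[criterion], 0) / reviews.length;
    total += mean * criteriaWeights[criterion];
  }
  return total;
}

// Two security-focused jurors flag a likely economic exploit.
contractQualityScore([
  { "attack-surface": 7, "economic-exploits": 3, "logic-soundness": 8 },
  { "attack-surface": 6, "economic-exploits": 4, "logic-soundness": 9 },
]); // -> 5.7: well below a merge-worthy score, despite passing tests
```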
Example: OSS reward distribution
Problem
An OSS platform has a fixed monthly reward pool. Multiple contributors submit PRs of varying quality. Without Justly:
- rewards are distributed arbitrarily,
- maintainers decide behind closed doors,
- contributors feel underpaid or ignored.
With Justly:
- each merged PR is scored by jurors,
- rewards scale with contribution quality (see the distribution sketch after this list),
- incentives align with long-term project health.
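A minimal sketch of splitting a fixed monthly pool in proportion to juror-assigned scores. The pool size, PR identifiers, and scores are made up for illustration:

```ts
interface ScoredContribution {
  prId: string;
  qualityScore: number; // aggregated juror score, assumed 0-10
}

// Split a fixed reward pool (e.g. in USDC) in proportion to quality scores,
// so a PR scoring twice as high earns twice the reward.
function distributeRewards(
  poolUsdc: number,
  contributions: ScoredContribution[],
): Map<string, number> {
  const totalScore = contributions.reduce((sum, c) => sum + c.qualityScore, 0);
  const rewards = new Map<string, number>();
  for (const c of contributions) {
    rewards.set(c.prId, poolUsdc * (c.qualityScore / totalScore));
  }
  return rewards;
}

// A 10,000 USDC monthly pool split across three merged PRs.
distributeRewards(10_000, [
  { prId: "pr-101", qualityScore: 9 },
  { prId: "pr-102", qualityScore: 6 },
  { prId: "pr-103", qualityScore: 5 },
]); // -> pr-101: 4,500 / pr-102: 3,000 / pr-103: 2,500
```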
Why stablecoin staking matters
Using stablecoins (like USDC):
- removes token volatility,
- avoids speculation,
- keeps incentives neutral.
Staked jurors are rewarded for (see the sketch after this list):
- accuracy,
- alignment with consensus,
- honest evaluation.
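A minimal sketch of how staked jurors could be settled against the consensus score. The tolerance, slash fraction, and fee pool are hypothetical parameters, not Justly's actual rules:

```ts
interface JurorResult {
  jurorId: string;
  stakeUsdc: number; // stablecoin stake, e.g. USDC
  score: number;     // this juror's score, assumed 0-10
}

// Jurors close to the consensus score share a fee pool; jurors far from it
// lose part of their stake. Both thresholds are illustrative assumptions.
function settleJurors(
  jurors: JurorResult[],
  consensusScore: number,
  feePoolUsdc: number,
  tolerance = 2,      // max distance from consensus to count as "accurate"
  slashFraction = 0.1,
): Map<string, number> {
  const accurate = jurors.filter(
    (j) => Math.abs(j.score - consensusScore) <= tolerance,
  );
  const payouts = new Map<string, number>();
  for (const j of jurors) {
    if (Math.abs(j.score - consensusScore) <= tolerance) {
      // Accurate jurors split the fee pool equally in this sketch.
      payouts.set(j.jurorId, feePoolUsdc / accurate.length);
    } else {
      // Outliers are slashed, discouraging careless or dishonest scoring.
      payouts.set(j.jurorId, -j.stakeUsdc * slashFraction);
    }
  }
  return payouts;
}

// Consensus lands at 7; the juror who scored 2 is slashed.
settleJurors(
  [
    { jurorId: "a", stakeUsdc: 100, score: 8 },
    { jurorId: "b", stakeUsdc: 100, score: 7 },
    { jurorId: "c", stakeUsdc: 100, score: 2 },
  ],
  7,
  60,
); // -> a: +30, b: +30, c: -10
```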
Benefits for OSS platforms
For maintainers
- Reduced evaluation burden.
- Less conflict with contributors.
- More consistent decisions.
- Better security outcomes.
For contributors
- Fair recognition of work.
- Transparent evaluation.
- Clear incentive alignment.
For ecosystems
- Higher code quality.
- Reduced gaming of metrics.
- Stronger long-term sustainability.
Beyond pull requests
The same mechanism applies to (see the sketch after this list):
- issue prioritization,
- bug severity scoring,
- grant allocation,
- retroactive funding,
- roadmap impact evaluation.
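Because the underlying primitive only needs a subject, criteria, and jurors, extending it beyond pull requests is largely a matter of what the subject is. The union type below is an illustrative sketch, not Justly's API:

```ts
// The same juror/stake/score machinery, parameterized by what is judged.
// All names below are illustrative.
type EvaluationSubject =
  | { kind: "pull-request"; url: string }
  | { kind: "issue-priority"; issueUrl: string }
  | { kind: "bug-severity"; reportUrl: string }
  | { kind: "grant-application"; proposalUrl: string }
  | { kind: "retroactive-funding"; projectUrl: string };

interface EvaluationRound {
  subject: EvaluationSubject;
  criteria: string[];    // e.g. ["impact", "urgency"] for issue prioritization
  jurorScores: number[]; // aggregated exactly as for code review
}
```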
The takeaway
Open-source fails when:
- effort is rewarded instead of impact,
- evaluation is opaque,
- incentives are misaligned.
Justly fixes this with:
- a transparent process,
- backed by economic accountability,
- scalable across ecosystems.
Code quality evaluation is expected to rely on rating-based disputes and may use Tier 2 or higher to ensure sufficient diversity of judgment. See Dispute tiers.