Paying bounties for links to AI-related evidence
Cross-posting from the Bountied Rationality Facebook group:
Here is a Google Doc that lists 21 important beliefs that Palisade Research has about AI. For each belief, we're looking for the strongest evidence that exists in favor of that idea, and the strongest evidence that exists against it. We'll award at least $20 for the best evidence in favor, and at least $20 for the best evidence against each idea. We'll use our discretion for what we consider the "best" evidence, but the kind of thing we're looking for includes empirical research or convincing arguments. Empirical research, or arguments clearly backed by empirical observations, will be preferred over pure arguments.
To submit a piece of evidence, you can either comment here, making it clear which specific idea(s) you're giving evidence for, or you can add a comment to the linked document. A piece of evidence should include a link, should be clearly associated with a specific idea, and should include a short sentence about how the evidence applies to that idea.
For example, you might comment on the idea "a strategic AI system will aim to appear convincingly aligned with human goals, or incapable of harming humans, whether it really is or not" with a link to a paper on AI sandbagging (e.g. "AI Sandbagging: Language Models can Strategically Underperform on Evaluations") and a sentence like "This work on AI sandbagging shows that existing AI systems already strategically underperform when they can tell they are being evaluated."
In addition to the base $20 bounty for the best evidence on each side of each idea, we'll also give bonuses of $50 for pieces of evidence that we think are especially strong. We'll give at least 4 of these bonuses, and up to 20, depending on our subjective sense of the quality of submissions.
So in total, we're offering at least 21 * 2 * $20 + 4 * $50 = $1040, and up to 21 * 2 * $20 + 20 * $50 = $1840 in bounties.
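For anyone who wants to check the arithmetic, here is a minimal sketch of the totals above. The constant and function names are just illustrative, and it assumes every base bounty is paid at exactly the $20 minimum.

```python
# Figures taken from the post; names are illustrative only.
NUM_BELIEFS = 21      # ideas listed in the doc
SIDES = 2             # best evidence for, best evidence against
BASE_BOUNTY = 20      # dollars, minimum per idea per side
BONUS = 50            # dollars per especially strong piece of evidence
MIN_BONUSES = 4
MAX_BONUSES = 20


def total_payout(num_bonuses: int) -> int:
    """Total dollars paid out if `num_bonuses` bonuses are awarded."""
    if not MIN_BONUSES <= num_bonuses <= MAX_BONUSES:
        raise ValueError("number of bonuses must be between 4 and 20")
    return NUM_BELIEFS * SIDES * BASE_BOUNTY + num_bonuses * BONUS


print(total_payout(MIN_BONUSES))  # 1040
print(total_payout(MAX_BONUSES))  # 1840
```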
Max bounty: $500 per person. All bounties paid via PayPal. Tentative deadline is October 1.