Tool Use Expert
Role Description
Mercor is partnering with an AI research organization to engage independent evaluation contractors who can assess agentic tool-use quality: specifically, whether a model calls search appropriately and rewrites user prompts into effective queries. This short-term engagement focuses on high-accuracy judgments, clear rationales, and consistency across a large volume of model–rater traces. The work is well-suited for experts in information retrieval, prompt engineering, and product QA who prefer remote, asynchronous projects.
Key Responsibilities
- Review model interaction logs and decide whether invoking the search tool was appropriate given the initial prompt and context.
- Evaluate the rewritten search query for clarity, specificity, and fidelity to the user’s intent.
- Provide concise, evidence-based rationales tied to rubric criteria; label edge cases and ambiguities.
- Score query quality (e.g., intent capture, keyword selection, operator use) and overall tool-use timing.
- Calibrate against gold examples; surface rubric gaps and propose improvements.
- Track decisions in a task portal; maintain high inter-rater agreement and meet throughput targets.
- Flag potentially sensitive content according to provided safety guidelines.
Qualifications
- Excellent written communication; able to justify decisions succinctly with references to instructions/rubrics.
- Meticulous attention to detail; comfortable working independently with minimal oversight.
- Nice to have: familiarity with annotation tools, basic scripting (Python/SQL), and multilingual proficiency.
Requirements
- Remote and asynchronous—contractors set their own hours.
- Expected commitment: ~10–20 hours/week; flexible, project-based workload.
- Duration: an initial 6–10 weeks, with potential for additional task batches.
- Shared resources and best-practice guides are provided; a support team is available for questions.
Compensation & Contract Terms
- Compensation for completed work: an estimated $45/hour equivalent, or per-task rates calibrated to complexity and geography (final rates confirmed before work begins).
- Payment for services rendered is made via the platform (e.g., weekly through Stripe Connect, where available).
- Independent contractor engagement; project-based statement of work; no employment relationship or benefits implied.
Application Process
- Submit a brief profile (CV or LinkedIn) and note relevant evaluation/search experience.
- Complete a short skills check and sample grading exercise to demonstrate rubric alignment.
- If matched, you’ll sign a simple contract/NDA and receive task access details.
- Follow-up typically comes within a few days of the sample review.
Company Description
- Mercor is a talent marketplace connecting experts with leading AI labs and research groups.
- Backed by Benchmark, General Catalyst, Adam D’Angelo, Larry Summers, and Jack Dorsey.
- Thousands of professionals across domains—research, engineering, law, and creative—partner with Mercor on frontier AI projects.