How AI-assisted candidates are gaming interviews in 2025

Quick takeaway: AI overlays now feed candidates real-time answers during live interviews. The result is an interview that tests reading fluency and AI prompting, not actual capability. The fix is not detection. It is a different kind of interview altogether, one built around evidence that no overlay can generate in advance.

A tool built explicitly for invisible interview cheating raised eight figures in venture funding in 2025. Not because the idea was fringe. Because the market for it turned out to be real.

In April 2025, a company called Cluely launched with the tagline “Cheat on Everything.” The product was an AI overlay that ran invisibly on the user’s screen during video calls, followed the conversation, and fed the user suggested answers in real time. It was designed to be undetectable. The marketing was explicit: use this during your interviews. Seventy thousand people signed up in the first week. The company raised a $5.3M seed round, followed by $15M from Andreessen Horowitz. Annualized revenue reportedly doubled to $7M within weeks of the a16z check. The marketing copy was later scrubbed of its most explicit “cheating” language after public and press scrutiny, but the product remained.

That sequence of events is worth sitting with. A tool whose core use case is deceiving hiring managers during job interviews attracted tens of thousands of users in seven days and enough credibility to close institutional venture capital. That is not a product story. It is a signal about the state of the interview.

What changed

The pre-AI interview was, at its core, a memory and reasoning test. A candidate prepared by recalling past experiences, practiced articulating them, and showed up hoping their real history was strong enough to carry the conversation. The interviewer’s job was to distinguish candidates who had done the thing from candidates who had only talked about the thing.

That gap between doing the thing and talking about it still exists, but the friction has moved.

Today a candidate can install an invisible overlay before a video call. The overlay captures the interviewer’s questions from the call, generates fluent, contextually appropriate answers, and displays them for the candidate to read naturally. The candidate’s job is no longer to recall experiences; it is to prompt effectively and read convincingly. A conventional interview with a skilled Cluely user is not testing that person’s professional capability. It is testing their prompting skill.

The consequence for hiring managers is practical and immediate. The confident, fluent candidate who gave a perfectly structured answer to every question may have demonstrated nothing beyond the ability to read under pressure. Your gut read on “this person seems sharp” — already unreliable before AI — is worth less now than it has ever been.

Why “detection” is the wrong fix

The intuitive response to AI-assisted cheating is to detect it. Eye-tracking, audio analysis, behavioral markers, AI-detection software. Some products in this space are already building in that direction. It is the wrong frame for three reasons.

First, detection escalates. Every detection method exists in an adversarial relationship with the tool it is trying to catch. When the detection method improves, the evasion improves in response. This dynamic mirrors the anti-spam and anti-plagiarism cycles that have played out for two decades: the detection vendor and the cheating tool race each other indefinitely, and the hiring manager is caught in the middle. There is no stable endpoint.

Second, detection is adversarial toward candidates by default. A detection regime positions every candidate as a suspected fraudster until proven innocent. That creates friction, distrust, and legal exposure without delivering a hiring signal. And it will produce false positives — candidates with speech patterns, communication styles, or response latencies that trigger the model — who get flagged for something they did not do.

Third, and most importantly, detection is solving the wrong problem. The real question is not whether the candidate used a tool. The real question is whether the candidate can actually do the job. A candidate who memorized perfect answers from an interview prep website, or who rehearsed with a coach, or who scripted responses with ChatGPT ahead of the call, is doing functionally the same thing as one using an overlay during the call: giving you a performance, not evidence. Detection catches the live tool and misses the prepared script. The frame should not be “did they cheat” but “what evidence did they actually demonstrate.”

The evidence-based interview

Structured interviewing research has been answering this question for more than thirty years. Schmidt and Hunter’s 1998 synthesis of 85 years of selection research reported substantially higher predictive validity for structured interviews than for unstructured ones (validity coefficients of .51 vs .38 against job performance) ¹. The reason is simple: structure forces comparable evidence across candidates, instead of rewarding whoever performed best on the day.

The AI era does not change that principle. It makes it newly urgent.

An evidence-based interview has four components:

Job-specific criteria, written before the first candidate enters. Not generic competencies copied from a job post. Specific: what does strong evidence look like for this role, at this company, at this stage? A generic question like “tell me about a time you led a team” has a convincing answer sitting in any language model’s training data. A question tied to a specific, unusual scenario for this role narrows the surface area considerably.

Multi-layer probing that depends on what the candidate just said. Ask for a specific example. Then ask how exactly. Then change a variable in the scenario. Then ask what they would do differently. Each question depends on the answer to the previous one. No overlay can predict the third question if it depends on the second answer. Memon, Meissner, and Fraser (2010), in a meta-analytic review of the cognitive interview, found that its structured retrieval techniques, which have the interviewee revisit the same event from several angles, significantly increase the recall of accurate episodic detail, with only a small increase in inaccurate detail ². The technique was developed for forensic interviews, but the underlying mechanism carries over to hiring: real memory sustains depth under repeated probing; constructed or read-aloud answers do not.

Live evidence notes, not impressions. During the interview, write down what the candidate actually said: verbatim phrases tied to specific criteria. Not “answered well.” Not “seemed confident.” What did they say, about what specific situation, and does it meet the rubric? DePaulo et al. (2003), in a meta-analysis of 158 cues to deception across 120 independent samples, identified lack of specific detail and lower plausibility as among the most consistent verbal markers of non-genuine accounts ³. You do not need to run a deception-detection algorithm. You need to capture what was actually said and check whether the detail is there.

Scoring against a rubric, not a vibe. Strong, mixed, or weak evidence for each criterion. Written before the interview. Consulted during it. The candidate who sounds confident and gives a generic answer gets a weak score on that criterion. The candidate who sounds hesitant but cites a specific situation, with a number, a decision, and an outcome, gets a strong score. The voice in the room is not the data. The captured evidence is.
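The rubric discipline above can be sketched as a small data structure. This is a minimal, hypothetical illustration in Python, not Recrutador’s implementation: the names `Criterion`, `EvidenceNote`, `score_note`, and `score_interview`, and the rule that a strong score needs a specific situation plus a number, a decision, and an outcome, are assumptions drawn from the paragraph above. What it makes concrete is that the score comes from elements captured in the note, never from how confident the delivery sounded.

```python
# Hypothetical sketch of rubric-based scoring; not Recrutador's implementation.
from dataclasses import dataclass
from enum import IntEnum


class Evidence(IntEnum):
    """Three-level scale from the text: weak, mixed, or strong evidence."""
    WEAK = 1
    MIXED = 2
    STRONG = 3


@dataclass(frozen=True)
class Criterion:
    name: str
    strong_looks_like: str  # one sentence, written before the first candidate enters


@dataclass(frozen=True)
class EvidenceNote:
    criterion: str            # must match a criterion written in advance
    quote: str                # verbatim phrase captured live, not an impression
    specific_situation: bool  # tied to one real, identifiable situation?
    number: bool              # includes a concrete figure?
    decision: bool            # names a decision the candidate made?
    outcome: bool             # states what happened as a result?


def score_note(note: EvidenceNote) -> Evidence:
    """Assumed mapping: the concrete elements present decide the level."""
    if not note.specific_situation:
        return Evidence.WEAK  # generic answer, however fluent
    if note.number and note.decision and note.outcome:
        return Evidence.STRONG
    return Evidence.MIXED


def score_interview(criteria: list[Criterion], notes: list[EvidenceNote]) -> dict[str, Evidence]:
    """Best evidence captured per criterion; a criterion with no notes scores WEAK."""
    scores = {c.name: Evidence.WEAK for c in criteria}
    for note in notes:
        if note.criterion not in scores:
            # The rubric is fixed before the interview; no criteria invented mid-call.
            raise ValueError(f"criterion not in the pre-written rubric: {note.criterion}")
        scores[note.criterion] = max(scores[note.criterion], score_note(note))
    return scores


if __name__ == "__main__":
    rubric = [
        Criterion("prioritization", "Cuts scope under a real constraint and says what was dropped and why."),
        Criterion("stakeholder_management", "Names a specific disagreement and how it was resolved."),
    ]
    notes = [
        # Confident but generic: no specific situation, so WEAK.
        EvidenceNote("stakeholder_management", "I always keep everyone aligned on outcomes.",
                     False, False, False, False),
        # Hesitant but specific (hypothetical quote): situation, number, decision, outcome, so STRONG.
        EvidenceNote("prioritization", "When the budget was cut 30% in Q3, I dropped two features and we still shipped.",
                     True, True, True, True),
    ]
    for name, level in score_interview(rubric, notes).items():
        print(name, level.name)
```

Running it prints `prioritization STRONG` and `stakeholder_management WEAK`: the hesitant but specific answer outscores the fluent but generic one. Raising an error on an unknown criterion enforces the rule that the rubric is written before the interview, not improvised during it.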

For the full protocol on building this from scratch, the guide on how to run a structured interview covers each step with templates.

Where Recrutador sits

Two overlays now exist. One whispers answers to the candidate. Recrutador sits on the other side of the table and pulls out the truth.

That one-liner is not rhetorical positioning. It describes a real architectural difference.

Recrutador is a Hiring Intelligence Platform with five phases: the Strategist (a chat-first consultant) defines the role’s evaluation criteria (the Blueprint); the system generates a job description from those criteria; it triages resumes with per-criterion coverage analysis; the live HUD runs a semi-structured interview (every candidate starts from the same probe library, and depth adapts to each answer); and at the end the system generates the Hiring Memo, with cited evidence per criterion. The HUD phase is where Recrutador’s answer to AI-assisted cheating is most visible, but the platform covers the full lifecycle, not just the live session.

During the interview, the HUD listens to the conversation, identifies when an answer is vague or generic, and surfaces the specific follow-up probe that forces the next layer of evidence, based on what the candidate just said. That is the capability that breaks the AI-overlay advantage: an overlay fed from a training corpus can anticipate “tell me about a time you led a team.” It cannot anticipate “you said the budget was cut mid-project — who made that call, and did you agree with it?” because that question did not exist until the candidate said something specific. Recrutador generates the multi-layer probe from the live conversation.

The Integrity Signals feature is built on the same principle. When the live transcript surfaces a pattern worth checking (an abrupt vocabulary shift mid-answer, an unusually long pause before an unusually polished response, a term that suddenly appears at a register far above the rest of the conversation), the HUD flags it for the interviewer. It never claims cheating occurred. It never labels the candidate. It never renders a verdict. The interviewer reads the signal and decides. That is the “Augment, never replace” principle made concrete: the platform provides evidence, signals, and structure; the human makes every call.

The tool is transparent and used with consent. The interviewer brings it to the call, and the candidate knows the interview is being assisted. Audio is never stored on any server, a deliberate architecture decision rather than a compliance checkbox. That design choice is central to keeping the tool LGPD-clean and GDPR-clean, and to keeping it on the right side of the ethical line that Cluely crossed. The cheating tools brag about being invisible. This one is the opposite. Recrutador is the counter-architecture: transparent where Cluely is hidden, evidence-grounded where Cluely is performance-optimized, and interviewer-augmenting where Cluely is candidate-substituting.

The real cost of a bad hire makes the financial case for why this matters: U.S. Department of Labor estimates put a single bad hire at 30 to 50 percent of first-year salary, with SHRM putting replacement costs at 50 to 200 percent of annual salary. That is the bill you are paying when an interview yields a performance instead of evidence.
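To make those percentages concrete, the sketch below applies both quoted ranges to a single salary. The $80,000 salary is a hypothetical example, and the function `cost_ranges` is plain arithmetic on the figures cited above, not a cost model.

```python
# Percentage ranges as quoted in the text; the salary below is a hypothetical example.
COST_RANGES_PCT = {
    "bad hire (DOL, % of first-year salary)": (30, 50),
    "replacement (SHRM, % of annual salary)": (50, 200),
}


def cost_ranges(annual_salary: int) -> dict[str, tuple[float, float]]:
    """Convert each percentage range into a currency range for one salary."""
    return {
        label: (annual_salary * low / 100, annual_salary * high / 100)
        for label, (low, high) in COST_RANGES_PCT.items()
    }


if __name__ == "__main__":
    for label, (low, high) in cost_ranges(80_000).items():
        print(f"{label}: ${low:,.0f} to ${high:,.0f}")
```

For that salary it prints $24,000 to $40,000 for the bad-hire estimate and $40,000 to $160,000 for replacement.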

What hiring teams should do this quarter

The structural problem is real. The fix does not require a new tool. It requires a different discipline in how interviews are run.

Write evaluation criteria before any candidate enters the process. Not after reviewing resumes. Not during the interview. Before. List the four to six things the role actually requires, and describe in one sentence what strong evidence looks like for each. This alone eliminates the most common failure mode: scoring candidates against an implicit standard that shifts depending on who just walked out the door.

Use a three-layer questioning pattern. For every criterion: ask for a specific example (not a general description). Ask how exactly — probe the detail, the process, the numbers, the people involved. Then change one variable in the scenario and see whether the candidate can reason from first principles or only replay the scripted version. The third question is where prepared answers run out.

Take live evidence notes scored against the rubric. Not impressions. Not adjectives. What did the candidate say, specifically, and does it meet the rubric for that criterion? This takes practice but it is the discipline that makes the interview defensible and the decision readable three weeks later.

Treat Integrity Signals as verification prompts, not verdicts. When a pattern surfaces — a vocabulary register that does not match the rest of the conversation, a pause duration that seems inconsistent with what was asked — treat it as a cue to probe one more layer, not as a finding. The discipline of asking one more follow-up is also the best real-time counter to AI-assisted cheating: no overlay can prepare for a question that depends on the previous answer.

If you want to see the full interview cycle before running it against a real candidate, the 10-minute mock interview lets you walk through a scripted session with a fictional candidate, see the HUD working live, and read the Hiring Memo it generates at the end. You pay only when you run a real one.

For the technique-level companion to this essay — including specific question formulations, the signals that distinguish real answers from rehearsed or AI-fed ones, and how to score the evidence you capture — see how to tell if a candidate is faking answers.

The interview was already an imperfect instrument before 2025. Fluency was already being rewarded over substance. First impressions were already contaminating decisions made from reconstructed memory. AI overlays did not create those problems. They turned up the dial until the problems became impossible to ignore.

The fix is not a detection arms race. It is what the research has recommended for three decades: structured criteria, layered probing, live evidence capture, and a rubric that was written before the first candidate sat down. That methodology was designed for the world where candidates practiced answers. It holds in the world where they get those answers in real time.

Talk to the team and we’ll run your first interview with you.

Frequently asked questions

Can employers detect AI cheating in interviews?

Detection tools exist but are unreliable and invite a cat-and-mouse game with each model release. The more durable fix is an interview design that makes AI assistance irrelevant: multi-layer probing where each question depends on the candidate’s previous answer, live notes scored against pre-defined rubrics, and job-specific criteria that generic AI cannot anticipate. A candidate feeding perfect answers from a hidden tool cannot sustain three layers of follow-up on the same case.

What is Cluely and why does it matter for hiring?

Cluely launched in April 2025 with the tagline “Cheat on Everything.” It is a real-time AI overlay that reads a user’s screen during calls — including job interviews — and feeds suggested answers. It raised $5.3M seed and a $15M follow-on from a16z. Its rapid growth confirmed that demand for AI-assisted interview cheating is real and commercially significant, not a fringe phenomenon.

Is using AI during an interview cheating?

It depends on what is disclosed. A candidate who reads AI-generated text during an interview without disclosure is misrepresenting their real-time reasoning ability. Beyond the ethical problem, the practical harm is predictive: the interview stops measuring what it was designed to measure, and the hiring decision is made on faked evidence. The result is the same whether or not the tool was technically “allowed.”

What is an evidence-based interview?

An evidence-based interview is structured around pre-defined, job-specific evaluation criteria. Each question targets a criterion. Follow-ups are layered: ask for a specific example, then probe how exactly, then change the scenario. The interviewer records what the candidate actually said — verbatim — and scores it against a rubric that was written before the first candidate sat down. The output is a documented decision, not a feeling.

How does Recrutador help with AI-assisted interview cheating?

Recrutador sits on the interviewer’s side of the call, not the candidate’s. It covers the full hiring lifecycle — from the Strategist that defines role criteria, through resume triage, to the live HUD that runs the semi-structured interview, to the Hiring Memo generated at the end. During the live interview, the HUD suggests the next follow-up probe based on what the candidate just said, generating the multi-layer questioning that AI-fed answers cannot anticipate. The Integrity Signals feature flags patterns worth verifying — such as vocabulary shifts or unusual pause timing — without ever claiming cheating or labeling the candidate. The interviewer reads the signal and decides. Audio is never stored (a deliberate architecture decision, not a checkbox). The candidate consents to the tool’s presence. The interviewer makes every judgment call. That is the “Augment, never replace” principle in practice.

References

¹ Schmidt, F. L., & Hunter, J. E. (1998). The Validity and Utility of Selection Methods in Personnel Psychology: Practical and Theoretical Implications of 85 Years of Research Findings. Psychological Bulletin, 124(2), 262-274.
² Memon, A., Meissner, C. A., & Fraser, J. (2010). The Cognitive Interview: A meta-analytic review and study space analysis of the past 25 years. Psychology, Public Policy, and Law, 16(4), 340-372.
³ DePaulo, B. M., Lindsay, J. J., Malone, B. E., Muhlenbruck, L., Charlton, K., & Cooper, H. (2003). Cues to deception. Psychological Bulletin, 129(1), 74-118.