I spent part of my early career at Bell Labs in a Center of Excellence focused on discovering new ways to solve business problems. We were not optimizing existing workflows. We were experimenting, building and learning. We helped pioneer concepts such as continuous auditing, applied early machine learning algorithms to customer churn prediction, and chased weak signals others overlooked. The environment rewarded curiosity and innovation above all else.
I have been thinking about that experience as organizations race toward generative and agentic AI.
Today, AI agents are being deployed across analytics, software development, customer service, operations and decision-making. And while the term “agents” means different things to different people, I keep hearing the same reassurance from leaders: “There will still be a human in the loop.”
In my own forthcoming TDWI research, when respondents were asked which mitigation capabilities their company considers non-negotiable before scaling generative or agentic AI, selecting up to three responses, the top answer was human oversight, cited by approximately half of respondents.1 It is the right instinct. But that phrase is everywhere, and it is worth examining more fully.
Human-in-the-Loop Is Not Human-on-the-Loop
Many discussions treat “human in the loop” as a single concept. It is not. There are meaningfully different models, and they create very different cognitive roles for the humans involved.
In a true human-in-the-loop (HITL) system, the human is a necessary part of the execution path. The system cannot finalize an important decision without explicit human involvement. Human judgment is required. Technically speaking, the human’s contribution acts as a gate: a necessary step without which the process cannot proceed.2
Human-on-the-loop (HOTL) is different. Here, AI systems act autonomously within predefined parameters. Humans monitor behavior, review outputs or intervene only when anomalies appear or thresholds are crossed.3 They are watching, not deciding.
The distinction matters because the two models create fundamentally different relationships between people and work. Supervising a process is cognitively different from participating in it. Participating builds judgment. Supervising only monitors for exceptions.
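For readers who build these systems, the structural difference can be made concrete. The following is a minimal sketch in Python, not a reference implementation: the names `Proposal`, `run_hitl`, `run_hotl`, the confidence score and the threshold value are all hypothetical and chosen only for illustration. In the first function the human's approval gates execution. In the second, the system acts on its own and a human is notified only when a threshold is crossed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Proposal:
    """A hypothetical action proposed by an AI system."""
    action: str
    confidence: float  # assumed model confidence between 0.0 and 1.0


def run_hitl(proposal: Proposal, approve: Callable[[Proposal], bool]) -> str:
    """Human-in-the-loop: nothing executes without an explicit human decision."""
    if approve(proposal):
        return f"executed: {proposal.action}"
    return f"blocked: {proposal.action}"


def run_hotl(
    proposals: List[Proposal],
    threshold: float,
    notify: Callable[[Proposal], None],
) -> List[str]:
    """Human-on-the-loop: the system acts autonomously within a parameter
    (here, an assumed confidence threshold) and alerts a human only when
    a proposal falls below it."""
    results = []
    for p in proposals:
        if p.confidence < threshold:
            notify(p)  # the human sees only the exceptions
            results.append(f"escalated: {p.action}")
        else:
            results.append(f"executed: {p.action}")
    return results


if __name__ == "__main__":
    # HITL: the reviewer callback is consulted for every proposal.
    print(run_hitl(Proposal("issue refund", 0.97), approve=lambda p: False))

    # HOTL: the reviewer is consulted only for the low-confidence case.
    seen_by_human: List[Proposal] = []
    batch = [Proposal("issue refund", 0.97), Proposal("close account", 0.41)]
    print(run_hotl(batch, threshold=0.8, notify=seen_by_human.append))
    print(len(seen_by_human))
```

In the HITL function the human's decision is on the execution path for every proposal, however confident the system is. In the HOTL function the human never sees the high-confidence cases, which is where a confident error would pass unreviewed.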
‘In the Loop’ May Not Mean What We Think
Here is the more uncomfortable problem: Even formal HITL structures do not automatically preserve genuine human judgment. Research has documented this for decades.
Automation bias, the tendency to defer to a machine’s output rather than independently evaluate it, has been observed across multiple industries.4 When an AI system presents a confident recommendation, humans tend to accept it, even when it is wrong. Research has found that AI errors significantly compromised human decision-making even in cases where the human had initially reached the correct conclusion on their own. The AI’s confident error overrode sound human judgment.5
A 2025 study of 319 knowledge workers by researchers at Microsoft and Carnegie Mellon University reached a related conclusion: Higher confidence in generative AI is associated with less critical thinking.6 The apparent authority of AI outputs can dull the skepticism that makes human oversight meaningful.
This suggests that organizations designing HITL systems face a challenge beyond simply keeping humans in the process. They must actively design for genuine cognitive engagement or risk creating oversight roles that are human in name only.
The HOTL Problem: Oversight Without Knowledge
Human-on-the-loop arrangements carry their own risks, particularly when the humans monitoring AI systems lack deep expertise in the domain being automated.
Consider an organization that deploys agentic AI to handle all customer service interactions, assigning humans to monitor performance and escalate edge cases. Who should those monitors be? Someone without call center experience cannot effectively evaluate whether an agent is mishandling a frustrated customer, missing a retention opportunity or violating a nuanced policy. Meaningful oversight requires the contextual knowledge that comes from having done the work. If you stop doing the work, the knowledge fades.
Research on what human factors scientists call the “out-of-the-loop” problem has long established that passive monitoring degrades situational awareness over time.4 When people are removed from active execution, they lose the ability to catch the errors they were assigned to catch. The oversight role becomes, over time, an empty one.
Two Futures, One Decision
Organizations are making choices right now (perhaps without realizing it) about what kind of human work they will ultimately create.
In one future, AI handles execution and humans supervise outputs. Efficiency improves. Costs may fall. But the knowledge workers who remain are gradually reduced to validators of work they may have had no hand in shaping. Their judgment weakens because they don’t exercise it. Institutional knowledge becomes harder to build and harder to pass on. This is not a path to success.
In another future, AI expands the range of what humans can explore. Analysts test more hypotheses. Leaders stress-test more strategic options. Researchers surface patterns that no individual could have found alone. The AI accelerates and augments human thinking rather than replacing it. There is no assembly-line mentality here.
Both futures can be described as keeping humans in the loop. Only one of them keeps humans genuinely in the work.
The Leadership Question
The organizations that navigate this well will be the ones whose leaders stop treating HITL as a compliance checkbox and start treating it as a design decision.
That means asking: Where in our AI implementations are humans actively reasoning, and where are they passively reviewing? Are we building systems that develop human expertise over time, or erode it? When our AI systems are wrong (and they will be), do we have people in place with the knowledge to recognize that?
The goal of human-in-the-loop should not simply be to keep humans nominally present. It should be to keep them engaged in thinking, exploration and the development of judgment itself.
That is not an AI design question. It is a leadership one.
Sources
1 Halper, F. Data Foundations for AI. TDWI Blueprint Report, forthcoming. Survey question: “Which mitigation capabilities does your company consider non-negotiable before scaling generative or agentic AI?” Respondents could select up to three responses. tdwi.org/research
2 Tóth, B., et al. “Constitutive vs. Corrective: A Causal Taxonomy of Human Runtime Involvement in AI Systems.” arXiv, 2025. arxiv.org/abs/2603.19213
3 Durham, D., as cited in Serco. “Human in the Loop vs. Human on the Loop: Navigating the Future of AI.” Serco Tech and Innovation Podcast, 2025. serco.com
4 Parasuraman, R., and Manzey, D.H. “Complacency and Bias in Human Use of Automation: An Attentional Integration.” Human Factors, vol. 52, no. 3, 2010, pp. 381–410. Accessed via https://www.scribd.com/document/800797485/Parasuraman-Manzey-2010-Complacency-and-Bias-in-Human-Use-of-Automation-An-Attentional-Integration
5 Fernández-Martínez, C., et al. “The Impact of AI Errors in a Human-in-the-Loop Process.” Cognitive Research: Principles and Implications, 2024. pmc.ncbi.nlm.nih.gov/articles/PMC10772030
6 Lee, H., et al. “The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers.” Proceedings of CHI ’25, ACM, 2025. dl.acm.org | Free version: microsoft.com/en-us/research





