Managing AI Projects and Vendors
The graveyard of enterprise AI is littered with projects that looked promising in discovery, started well in pilot, and then failed at scale. McKinsey estimates that fewer than 20% of AI pilots are eventually deployed in production at meaningful scale. Understanding why AI projects fail — and what management practices distinguish the successful from the abandoned — is one of the highest-leverage capabilities an executive can develop. This module focuses on the practical realities of managing AI projects and the vendors who support them.
Why AI Projects Differ from Software Projects
Leaders who have extensive experience managing software development projects frequently underestimate how different AI project management is. The mental models that work for traditional software — clear requirements, deterministic outputs, testable functionality, linear progress from specification to delivery — systematically mislead when applied to AI.
Non-Deterministic Outputs
Traditional software, given the same input, produces the same output every time. This predictability makes specification, testing, and acceptance criteria straightforward. AI systems are fundamentally probabilistic: they produce outputs based on learned statistical patterns, and those outputs vary — sometimes subtly, sometimes significantly — across runs, time periods, and input distributions. You cannot test an AI system against a fixed specification in the way you test software. Evaluation requires statistical thinking: how accurate is the system across a representative sample? What is the distribution of errors? How does performance degrade at the tails of the input distribution?
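One way to make this statistical thinking concrete is to report an accuracy estimate together with an uncertainty interval rather than a single pass/fail result. The sketch below is illustrative only: the sample figures (460 correct out of 500) are hypothetical, and the Wilson score interval is one common choice among several for a proportion.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval for a proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Hypothetical evaluation sample: 460 of 500 cases handled correctly.
observed = 460 / 500
low, high = wilson_interval(correct=460, total=500)
print(f"observed accuracy {observed:.3f}, plausible range {low:.3f} to {high:.3f}")
```

The interval is the useful part for an acceptance discussion: a larger representative sample narrows it, and the question for stakeholders becomes whether the low end of the range is still good enough for the use case.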
The practical implication is that "done" means something different in AI projects. A feature in a traditional software project is done when it works as specified. An AI model is never done in the same sense — it achieves some level of performance across some input distribution, and the questions are always: is this performance good enough for the use case? Is it stable? How does it behave on cases it has not seen? These are probabilistic judgments, not binary pass/fail tests, and leaders who insist on traditional software acceptance criteria for AI systems create impossible situations for their teams.
Data Dependency
Traditional software is primarily limited by engineering hours: more engineers, faster delivery. AI projects are primarily limited by data availability, quality, and labeling. A project that appears straightforwardly scoped can stall for months waiting for data that turns out to be unavailable, inaccessible due to privacy constraints, of insufficient quality to train on, or unlabeled and requiring expensive expert annotation. Data discovery, meaning an honest assessment of what data is available and in what condition, is one of the most critical early activities in any AI project, and it systematically receives too little investment.
Evaluation Challenges
Defining what "good" looks like is harder for AI systems than for traditional software. An AI model that achieves 92% accuracy in classifying customer support tickets by category might seem impressive — but if the business requirement is routing tickets to the right team, and routing errors on the 8% generate disproportionate customer dissatisfaction, accuracy is the wrong metric. Designing evaluation frameworks that connect model performance metrics to business outcome metrics requires collaboration between AI practitioners and business stakeholders that many projects skip, leading to the common failure mode of technically successful models that do not solve the business problem.
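The gap between accuracy and business impact can be illustrated with a cost-weighted error metric. In the sketch below, the team names, the per-team misrouting costs, and the two models are all hypothetical; the point is only that two models with identical accuracy can carry very different business costs.

```python
# Hypothetical relative cost of misrouting a ticket that belonged to each team.
MISROUTING_COST = {"billing": 1.0, "outage": 20.0}


def accuracy(results):
    """Fraction of tickets routed to the correct team."""
    return sum(1 for r in results if r["predicted"] == r["actual"]) / len(results)


def mean_misrouting_cost(results, cost_by_team):
    """Average cost per ticket, charging each error the cost of the team it should have reached."""
    total = sum(cost_by_team[r["actual"]] for r in results if r["predicted"] != r["actual"])
    return total / len(results)


def make_results(billing_errors, outage_errors):
    """Build 50 billing and 50 outage tickets with the given number of misroutes."""
    results = []
    for i in range(50):
        predicted = "outage" if i < billing_errors else "billing"
        results.append({"actual": "billing", "predicted": predicted})
    for i in range(50):
        predicted = "billing" if i < outage_errors else "outage"
        results.append({"actual": "outage", "predicted": predicted})
    return results


model_a = make_results(billing_errors=8, outage_errors=0)  # errors on low-cost tickets
model_b = make_results(billing_errors=0, outage_errors=8)  # errors on high-cost tickets

for name, results in (("A", model_a), ("B", model_b)):
    cost = mean_misrouting_cost(results, MISROUTING_COST)
    print(f"model {name}: accuracy={accuracy(results):.2f}, mean cost per ticket={cost:.2f}")
```

Both hypothetical models are 92% accurate, yet model B concentrates its errors on the tickets where misrouting hurts most. Agreeing on the cost table is a business decision and cannot be settled by modeling alone, which is why it needs stakeholder input.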
In a widely cited analysis of enterprise AI project failures, Gartner found that evaluation framework weaknesses — teams building AI systems without clear success criteria connected to business outcomes — were a contributing factor in the majority of projects that failed to reach production. Building the evaluation framework before building the model, and getting business stakeholders to agree on it, is one of the most effective project management practices available.
Iterative AI Development Practices
The development practices that work for AI projects are inherently iterative — more closely analogous to hypothesis-driven research than to agile software development, although agile practices can be usefully adapted. The CRISP-DM framework (Cross-Industry Standard Process for Data Mining), while showing its age in an era of foundation models, correctly identifies the iterative, non-linear nature of AI development: understanding the business problem, understanding the data, preparing the data, modeling, evaluating, and deploying are phases that cycle back on each other rather than proceeding linearly.
In practice, successful AI project management looks like this: small, time-boxed experiments with clearly defined hypotheses; rapid evaluation against pre-defined success criteria; explicit go/no-go decisions at defined checkpoints; and a culture that treats early failure as informative rather than shameful. The organizations that manage AI projects well treat the discovery of a non-viable approach early in development as a success — they have avoided wasting resources on a path that would not have worked. The organizations that manage AI projects poorly treat any failure as a problem to be hidden from leadership, which results in zombie projects that consume resources without producing value.
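A checkpoint of this kind can be written down before the experiment starts, so the go/no-go decision is mechanical rather than negotiated after the fact. The following is a minimal sketch under assumed names: the `Criterion` class, the metric names, and the thresholds are hypothetical and would be replaced by whatever the team and its stakeholders agree on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One pre-agreed success criterion for a time-boxed experiment."""
    metric: str
    threshold: float
    higher_is_better: bool = True

    def passed(self, observed: float) -> bool:
        if self.higher_is_better:
            return observed >= self.threshold
        return observed <= self.threshold


def checkpoint_decision(criteria, observed):
    """Return ("go" or "no-go", names of failed metrics).

    A metric that was never measured counts as a failure.
    """
    failed = [
        c.metric
        for c in criteria
        if c.metric not in observed or not c.passed(observed[c.metric])
    ]
    return ("go" if not failed else "no-go"), failed


# Hypothetical criteria agreed before the experiment began.
criteria = [
    Criterion("recall", 0.85),
    Criterion("p95_latency_ms", 300.0, higher_is_better=False),
]

decision, failed = checkpoint_decision(criteria, {"recall": 0.81, "p95_latency_ms": 240.0})
print(decision, failed)
```

A "no-go" here is the informative early failure described above: it names the metric that fell short and frees the team to try a different approach or stop.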
The Two-Track Model
One of the most effective structural practices for AI project management is running two parallel tracks: a research track that explores the technical feasibility of the AI approach with maximum speed and flexibility, and an engineering track that builds the production infrastructure and integration framework. Many organizations make the mistake of waiting for the research track to complete before beginning engineering work, leading to a final integration phase where the technically validated model turns out to be incompatible with production requirements. Running the two tracks in parallel, with defined integration milestones, significantly compresses the time from model validation to production deployment.
Managing Stakeholder Expectations
Stakeholder expectation management is one of the most politically sensitive and practically important aspects of AI project management. The challenge is that AI often enters organizations trailing a cloud of hype. Executives who have read breathless coverage of AI capabilities, attended vendor demos showing cherry-picked examples, or compared their AI ambitions to highly publicized successes at technology companies arrive with expectations that are systematically inflated relative to what is achievable in their specific organizational context.
The leader's job is to manage this without crushing the enthusiasm that drives investment and organizational support. The most effective approach is concrete specificity: replacing vague discussions of AI potential with specific descriptions of what the project will accomplish, what it will not accomplish, what success looks like, what the timeline is, and what risks could prevent delivery. Vague enthusiasm breeds unrealistic expectations; concrete specificity enables informed judgment.
Regular structured communication (monthly written updates that cover what was learned, what did not work, what was adjusted, and what is planned) maintains stakeholder trust through the inevitable turbulence of AI development better than infrequent dramatic updates. The worst pattern is radio silence followed by a major delay announcement, which destroys trust and gives stakeholders no opportunity to course-correct before a project is in serious trouble.
AI Project Failure Modes
Understanding the recurring patterns by which AI projects fail is essential for avoiding them. The following taxonomy draws on documented analysis from Gartner, McKinsey, and the academic AI deployment literature.
Vendor Contracts for AI
Procurement and legal teams that negotiate AI vendor contracts using standard software contract templates will consistently produce agreements that fail to address the most important risks in AI vendor relationships. AI contracts require specific provisions that standard software contracts do not.
Data Rights and Privacy
The contract must explicitly address who owns the data used to train or fine-tune the AI system, whether the vendor has the right to use your data to improve their models, what data is transmitted to the vendor's systems during inference, and how data is retained and deleted. Many enterprise AI buyers have been surprised to discover that their vendor contracts permitted the vendor to use their operational data to train shared models — a provision that raises both competitive intelligence and privacy compliance concerns. Explicit, legally reviewed language on data rights is non-negotiable.
Performance Guarantees
Traditional software SLAs define availability and response time. AI systems require additional performance commitments: accuracy guarantees (what performance levels the vendor commits to on defined test sets), fairness commitments (that the system meets defined criteria for performance parity across demographic groups, where relevant), and drift monitoring obligations (that the vendor will notify you if model performance degrades beyond defined thresholds). These commitments are difficult for vendors to make without careful qualification, but their absence leaves buyers with no recourse when AI performance degrades.
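A drift monitoring obligation is easier to negotiate when the threshold is stated as a number that either party can check. The sketch below assumes a contract that defines a baseline accuracy and a maximum tolerated drop; the function name, the monthly figures, and the 3-point threshold are hypothetical.

```python
def drift_alerts(baseline_accuracy, monthly_accuracy, max_drop):
    """Return the periods in which accuracy fell more than max_drop below the baseline."""
    return [
        period
        for period, acc in monthly_accuracy.items()
        if baseline_accuracy - acc > max_drop
    ]


# Hypothetical figures: contractual baseline of 0.92, tolerated drop of 0.03.
monthly = {"2024-01": 0.91, "2024-02": 0.90, "2024-03": 0.86}
alerts = drift_alerts(baseline_accuracy=0.92, monthly_accuracy=monthly, max_drop=0.03)
print("notification required for:", alerts)
```

In practice the contract would also need to say which test set the accuracy is measured on and who measures it, since a threshold is only enforceable when both sides compute it the same way.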
Explainability and Audit Rights
In regulated industries — financial services, healthcare, insurance — the contract must specify what level of explainability the AI system provides and what audit rights the buyer retains. As AI regulation has intensified, regulators have increasingly required that financial institutions and healthcare organizations be able to explain how AI-driven decisions were reached. A vendor who cannot provide adequate explainability, or who claims trade secret protection over model internals in ways that prevent regulatory compliance, is a vendor you cannot safely use in regulated contexts.
Exit and Portability
Vendor lock-in is a significant risk in AI systems, where the model, training data, fine-tuning, and integration work may all be proprietary to the vendor. The contract should specify data portability requirements — you must be able to extract your data and fine-tuned model weights — and define a reasonable transition period and assistance obligation if you decide to switch vendors. Many organizations have discovered only at contract renewal that they had inadvertently accepted near-total lock-in, leaving them with no negotiating leverage.
Measuring Vendor Performance
Managing AI vendors requires a structured performance measurement framework that goes beyond the service level metrics of traditional IT contracts. A practical framework has three layers.
Technical performance metrics track the AI system's performance against defined benchmarks over time: accuracy on monthly held-out test sets, false positive and false negative rates, latency, and availability. These metrics should be measured by the buyer — not just reported by the vendor — using evaluation datasets that the vendor has not seen.
Business outcome metrics track whether the AI system is actually delivering the business results it was procured for. This requires establishing baselines before deployment and measuring impact in terms of the business outcomes that motivated the purchase — not just model performance. A customer service AI that maintains 90% accuracy but reduces customer satisfaction scores has a business performance problem regardless of its technical metrics.
Relationship and responsiveness metrics track the vendor's performance as a partner: response time to performance issues, quality of documentation and support, transparency about system changes, and proactiveness in communicating relevant developments. AI systems are living systems that evolve, and a vendor relationship that is not actively managed tends to drift toward the vendor's interests rather than yours.
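The technical layer of this framework can be computed by the buyer with very little tooling. The sketch below assumes a binary classification task and a buyer-held test set the vendor has not seen; the labels and predictions shown are made-up placeholders, and the function name is illustrative.

```python
def error_rates(labels, predictions):
    """Compute accuracy, false positive rate, and false negative rate for binary labels (0 or 1).

    A rate is reported as None when its denominator is zero.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    tp = fp = tn = fn = 0
    for actual, predicted in zip(labels, predictions):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / len(labels) if labels else None,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else None,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else None,
    }


# Placeholder data standing in for a buyer-held monthly test set.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
rates = error_rates(labels, predictions)
print(rates)
```

Tracking these figures month over month, alongside the vendor's own reports, gives the buyer an independent basis for the accountability conversations described above.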
The organizations that manage AI projects and vendors most successfully treat these relationships as ongoing partnerships requiring active management — not as procurement events followed by passive consumption. They invest in the capacity to independently evaluate AI system performance, they maintain internal expertise sufficient to hold vendors accountable, and they build the contractual and technical flexibility to make changes when systems underperform. AI strategy without AI governance produces capabilities that erode.