How to Evaluate an AI Vendor If You're in Healthcare, Finance, or Legal

Most AI vendor evaluations are structured around the wrong criteria.

The standard enterprise software evaluation — features, pricing, integrations, customer references, security certifications — captures enough to make a reasonable decision for general-purpose software. For AI systems in regulated industries, it misses the questions that will determine whether the system survives contact with your compliance team, your legal team, or your regulators.

This is a practical guide to the evaluation questions that matter, written from the perspective of someone who has spent three decades building enterprise infrastructure in regulated environments and now builds AI systems for them.

Before you evaluate vendors: know what you're buying

AI is not a category. It's a spectrum of technologies — large language models, machine learning classifiers, predictive analytics, automated decision engines — with significantly different compliance profiles.

Before you evaluate vendors, be precise about what you're buying:

Is the system making decisions or informing decisions? An AI system that generates recommendations for human review has a different compliance footprint than one that makes autonomous decisions that trigger actions. The latter requires substantially more rigor in explainability, audit trail design, and change management.

What data will the system access? The compliance requirements for a system that accesses protected health information under HIPAA are different from one that accesses financial transaction data under GLBA, and different again from one that accesses personally identifiable information under state privacy laws. Know your data classification before you write an RFP.

What regulatory frameworks apply to the decisions this system affects? The answer should drive your vendor requirements, not emerge from them. If the system will affect credit decisions, ECOA and FCRA requirements apply. If it will affect insurance underwriting, state insurance laws prohibiting unfair discrimination apply. If it will affect patient care pathways, FDA guidance on AI/ML-based software as a medical device may apply. Map the regulatory framework first.

The seven evaluation questions that matter

1. "Walk me through the audit trail for a specific past decision."

Not a demo. An actual decision the system made. Ask the vendor to show you what the audit trail looks like, who can access it, in what format, and how long it's retained.

What you're looking for: a complete, queryable record of the inputs, the model version, the decision logic, and the output — in a format a non-technical compliance officer can follow. What disqualifies a vendor: "we can pull logs" without a clear answer on what the logs contain, how they're structured, or who is responsible for producing them in a regulatory examination.
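To make "complete and queryable" concrete, here is a minimal Python sketch of what one audit-trail record might capture. It assumes a hypothetical schema: `DecisionRecord`, `missing_fields`, `plain_summary`, and every field name are illustrative, not any vendor's actual format. The sketch pairs a machine-queryable JSON form with a one-sentence summary a compliance reader can follow.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical schema: field names are illustrative, not any vendor's format.
REQUIRED_FIELDS = ("decision_id", "timestamp", "model_version",
                   "inputs", "decision_logic", "output")


@dataclass(frozen=True)
class DecisionRecord:
    """One immutable audit-trail entry per decision."""
    decision_id: str
    timestamp: str                 # ISO 8601, UTC
    model_version: str             # exact model version that produced the output
    inputs: dict                   # the data the model actually received
    decision_logic: str            # threshold, rule, or reason codes applied
    output: str                    # what the system returned or triggered
    human_reviewer: Optional[str] = None  # None when no human was in the loop


def missing_fields(record: DecisionRecord) -> list:
    """Names of required fields that are empty, so gaps surface before an exam."""
    data = asdict(record)
    return [name for name in REQUIRED_FIELDS if data[name] in (None, "", {})]


def to_json(record: DecisionRecord) -> str:
    """Machine-queryable form for storage and retrieval."""
    return json.dumps(asdict(record), sort_keys=True)


def plain_summary(record: DecisionRecord) -> str:
    """One-sentence version a non-technical compliance officer can follow."""
    reviewer = record.human_reviewer or "no human reviewer"
    return (f"Decision {record.decision_id} at {record.timestamp}: "
            f"model {record.model_version} returned '{record.output}' "
            f"because {record.decision_logic} ({reviewer}).")


if __name__ == "__main__":
    rec = DecisionRecord(
        decision_id="D-1001",
        timestamp="2024-03-01T14:22:05Z",
        model_version="risk-model-2.3.1",
        inputs={"income": 52000, "debt_to_income": 0.41},
        decision_logic="score 0.37 was below the 0.50 approval threshold",
        output="refer to underwriter",
    )
    print(missing_fields(rec))
    print(plain_summary(rec))
```

During an evaluation, you can ask the vendor to map their real log fields onto a checklist like `REQUIRED_FIELDS`. Any field they cannot populate for a past decision is a gap worth documenting.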

2. "What happens when a regulator asks you to explain this decision six months from now?"

This is the first question restated from the regulator's perspective rather than the vendor's. Make the vendor simulate the scenario: a regulator asks for documentation of a specific decision the system made six months ago. Walk through the process. Who is responsible for producing the documentation? What format does it take? How long does it take to compile?

If the vendor hesitates, redirects to their security team, or describes a process that involves significant engineering effort to reconstruct, that's a foundational gap.

3. "How does the system handle a data deletion requirement?"

This question reveals more about AI-specific compliance maturity than almost any other. When a data deletion requirement applies (a deletion right under a state privacy law, a return-or-destroy clause in a HIPAA business associate agreement, or another contractual obligation), the AI system needs to handle the deletion in a way that doesn't create compliance gaps or compromise model integrity.

For most general-purpose AI platforms, this is an unsolved problem. They can delete data from their storage systems. If that data was used to train or fine-tune the model, they cannot guarantee that the model hasn't encoded it in its weights in ways that affect future outputs. A compliance-ready system has a defined answer to this question. If the vendor doesn't, note the gap.
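One way a vendor can have a defined answer is data lineage: recording which dataset snapshots each model version was trained or fine-tuned on, so a deletion request can be traced to the model versions that may have encoded the data. The Python sketch below assumes hypothetical snapshot and version identifiers; `ModelVersion`, `affected_versions`, and `delete_subject` are illustrative names. It only identifies which versions need retraining or review. It does not remove anything from model weights.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: str
    # IDs of the dataset snapshots this version was trained or fine-tuned on
    training_snapshots: set = field(default_factory=set)


def affected_versions(subject_id, snapshot_members, models):
    """Return model versions whose training data included subject_id.

    snapshot_members maps a snapshot ID to the set of subject IDs it contains.
    This only identifies candidates for retraining or review; it does not
    remove anything from model weights.
    """
    hits = {snap for snap, members in snapshot_members.items()
            if subject_id in members}
    return sorted(m.version for m in models if m.training_snapshots & hits)


def delete_subject(subject_id, snapshot_members):
    """Remove the subject from every stored snapshot; return the snapshots changed."""
    changed = []
    for snap, members in snapshot_members.items():
        if subject_id in members:
            members.discard(subject_id)
            changed.append(snap)
    return sorted(changed)
```

Order matters here: `affected_versions` has to run before `delete_subject`, because once the subject is purged from storage the lineage lookup can no longer find them. A production design would likely also keep a deletion ledger so that the link between a request and the affected model versions survives the deletion itself.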

4. "What has changed in your model in the last 90 days, and how was each change validated?"

AI systems are not static. Models are retrained, updated, and modified. For regulated industries, every material change to an AI system that affects regulated decisions needs to be validated against compliance requirements before it goes into production.

Ask for the change log. Ask what triggered each change. Ask what validation was performed and by whom. Ask whether your organization is notified of changes that may affect compliance. The answer tells you whether the vendor treats compliance as a process or a one-time certification.
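The questions above amount to a check you can run against the vendor's change log. This minimal Python sketch assumes the vendor can export entries with the hypothetical fields shown (`ModelChange` and its attributes are illustrative, not a standard format). It lists changes from the last 90 days that affected regulated decisions but lack a named validator or a notice to your organization.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class ModelChange:
    version: str
    deployed_on: date
    trigger: str                       # why the change happened
    affects_regulated_decisions: bool  # material to credit, underwriting, care, etc.
    validated_by: Optional[str]        # who signed off on compliance validation
    customer_notified: bool            # whether your organization was told


def changes_in_window(log, as_of, window_days=90):
    """Changes deployed within window_days up to and including as_of."""
    start = as_of - timedelta(days=window_days)
    return [c for c in log if start <= c.deployed_on <= as_of]


def compliance_gaps(log, as_of, window_days=90):
    """Material changes in the window missing validation sign-off or notice."""
    return [c.version for c in changes_in_window(log, as_of, window_days)
            if c.affects_regulated_decisions
            and (not c.validated_by or not c.customer_notified)]
```

An empty result does not prove the vendor's process is sound; it only shows that the log records who validated each material change and whether you were told.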

5. "Show me your disparate impact testing results for a population similar to mine."

For systems affecting credit, insurance underwriting, employment, benefits, or healthcare access, avoiding unlawful disparate impact is a legal and regulatory requirement, not an ethical nice-to-have. ECOA, the Fair Housing Act, state insurance unfair-discrimination laws, and the federal employment discrimination laws the EEOC enforces all impose obligations on decisions that affect protected classes.

Ask the vendor for their disparate impact testing methodology and results. Ask how often they test and what triggers a retest. Ask what they do when disparate impact is detected. A vendor who hasn't run structured disparate impact testing on their model hasn't finished building a compliance-ready system.
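As an illustration of the simplest kind of result a vendor might show, here is a Python sketch of the adverse impact ratio behind the "four-fifths rule" in the EEOC's Uniform Guidelines on Employee Selection Procedures (29 CFR 1607.4(D)). Under that rule, a group whose selection rate is below 80% of the highest group's rate is generally regarded as showing evidence of adverse impact. This is a screening heuristic, not a complete disparate impact analysis. Statistical significance, sample size, and less discriminatory alternatives also matter, and credit and insurance contexts may call for different methods. The function names and group labels are hypothetical.

```python
def adverse_impact_ratios(outcomes):
    """Each group's favorable-outcome rate divided by the highest group's rate.

    outcomes maps a group label to (favorable_count, total_count).
    """
    rates = {}
    for group, (favorable, total) in outcomes.items():
        if total <= 0:
            raise ValueError(f"group {group!r} has no observations")
        rates[group] = favorable / total
    best = max(rates.values())
    if best == 0:
        raise ValueError("no group received a favorable outcome")
    return {group: rate / best for group, rate in rates.items()}


def four_fifths_flags(outcomes, threshold=0.8):
    """Groups whose ratio falls below the four-fifths screening threshold."""
    ratios = adverse_impact_ratios(outcomes)
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)
```

For example, if 480 of 800 applicants in one group are approved (60%) and 270 of 600 in another (45%), the second group's ratio is 0.45 / 0.60 = 0.75, below 0.8, so it is flagged for closer review.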

6. "Who is responsible for your regulatory compliance, and what is their background?"

This is a people question, not a features question. Compliance-ready AI requires people who understand the regulatory frameworks that apply — not just engineers who understand model architecture. Ask who owns compliance at the vendor, what their background is, and how they stay current with regulatory developments.

For healthcare AI: are they tracking FDA guidance on AI/ML-based software as a medical device? ONC interoperability requirements? For financial AI: are they tracking model risk management guidance from the Federal Reserve, OCC, and FDIC (SR 11-7 / OCC Bulletin 2011-12)? State-level AI regulation? For insurance: are they tracking NAIC guidance on AI use in underwriting?

If the answer is "our legal team handles compliance" without a named owner who engages directly with AI-specific regulatory developments, flag it.

7. "What does a security incident look like for your AI system specifically?"

General enterprise security attestations and certifications (SOC 2, ISO 27001) address infrastructure and organizational security controls. For AI systems, the incident surface also includes model-specific risks: training data poisoning, model inversion attacks, prompt injection (for LLM-based systems), and adversarial inputs designed to produce incorrect outputs.

Ask what the vendor's incident response plan looks like for AI-specific incidents. Ask what monitoring they have in place to detect anomalous model behavior. If the answer is limited to infrastructure security, the vendor hasn't fully mapped the security surface of their AI system.
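One common form of that monitoring is watching for drift in the distribution of model outputs, which can be an early signal of poisoning, adversarial inputs, or an unannounced model change. The Python sketch below computes a population stability index (PSI), a metric widely used in credit model monitoring, between a baseline and a recent window of binned scores. The 0.25 alert threshold is an industry rule of thumb treated here as an assumption, and `drift_alert` is a hypothetical helper, not a vendor API.

```python
import math


def population_stability_index(baseline_counts, recent_counts, eps=1e-6):
    """PSI between two binned distributions of model outputs.

    Bins must line up (same score ranges, same order). eps avoids log(0)
    when a bin is empty.
    """
    if len(baseline_counts) != len(recent_counts):
        raise ValueError("bin counts must have the same length")
    base_total, recent_total = sum(baseline_counts), sum(recent_counts)
    if base_total == 0 or recent_total == 0:
        raise ValueError("both distributions need observations")
    psi = 0.0
    for b, r in zip(baseline_counts, recent_counts):
        b_pct = max(b / base_total, eps)
        r_pct = max(r / recent_total, eps)
        psi += (r_pct - b_pct) * math.log(r_pct / b_pct)
    return psi


def drift_alert(baseline_counts, recent_counts, threshold=0.25):
    """True when drift exceeds the assumed alerting threshold."""
    return population_stability_index(baseline_counts, recent_counts) > threshold
```

A drift alert is not proof of an attack. It tells you the model is behaving differently than it did at validation, which is exactly the event the vendor's AI-specific incident response plan should say how to investigate.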

Red flags that apply regardless of the RFP score

The demo is better than the documentation. If a vendor's demo is compelling but their compliance documentation is thin, vague, or contingent on future roadmap items, weight the documentation.

Compliance is a feature, not an architecture. If compliance capabilities are described as features that can be turned on or off, or as add-ons available at a higher tier, the system wasn't built compliance-ready.

The references are from different industries. Customer references from non-regulated industries don't validate compliance readiness. Ask for references from organizations that have used the system through a regulatory examination, audit, or compliance incident.

They can't tell you when their model last changed. If a vendor can't produce a clear change history for their model, they can't support your change management documentation requirements.

The evaluation is part of the compliance program

The way you evaluate and select an AI vendor is itself a compliance activity. The documentation you produce during evaluation — the questions you asked, the answers you received, the gaps you identified and accepted — becomes part of your organization's evidence base if a decision made by that system is ever questioned.

Treat the evaluation accordingly. The vendors who are genuinely compliance-ready welcome that standard. They've built for it.

Hector DeJesus is the founder and CEO of Develom. He has 33 years of enterprise IT experience, is a certified GCP Pro Architect, and has built infrastructure for organizations in healthcare, finance, insurance, and regulated enterprise environments. Develom builds AI systems for regulated industries.

[Talk to us about evaluating AI for your environment →](https://develom.com/contact)

This post is for informational purposes only and does not constitute legal advice. Regulatory requirements vary by jurisdiction, organization type, and specific use case. Consult qualified legal counsel for guidance applicable to your situation.