Code Red Flags: Spotting AI-Generated Code in M&A Tech Due Diligence

In the rapidly evolving tech acquisition landscape, a new challenge has emerged for due diligence teams: evaluating the quality, security, and sustainability of AI-generated code. As strategic buyers navigate the SaaS acquisition market, the prevalence of AI coding tools like GitHub Copilot, Lovable, Bolt, Amazon CodeWhisperer, and ChatGPT has introduced both opportunities and risks that warrant careful examination.

AI coding tools have revolutionised software development workflows, enabling teams to deliver features faster and tackle technical challenges with fewer resources. However, many companies don’t fully disclose their reliance on these tools: not out of deception, but because they have become so embedded in their workflows that they no longer think to mention it.

This silent adoption creates blind spots in traditional due diligence approaches. Code that appears functional at first glance may harbour hidden issues that only become apparent after the deal closes – when remediation costs fall directly on the acquirer.

Five Critical Assessment Areas for AI-Generated Code

When evaluating a potential acquisition target where AI-generated code may be present, focus your technical due diligence on these five key areas:

1. Security Vulnerabilities

AI models can inadvertently introduce security vulnerabilities by suggesting code patterns that haven’t been properly vetted against the latest security standards. Static analysis tools like PMD can help identify these issues without needing to run the code. Security vulnerabilities in AI-generated code often follow recognisable patterns: excessive use of deprecated methods, over-permissioning, or insufficient input validation. These patterns can be detected through automated scans and targeted code reviews of security-critical components.
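To make the idea of recognisable patterns concrete, here is a minimal sketch of a pattern scan in Python. The rule set is hypothetical and deliberately tiny; it is not how PMD or any other analyser works internally, and a real review would rely on a maintained static analysis tool rather than hand-written regular expressions.

```python
import re

# Hypothetical rule set for illustration only. Each rule maps a label to a
# regular expression that matches one risky pattern on a single line.
RISK_PATTERNS = {
    "dynamic code execution": re.compile(r"\b(eval|exec)\s*\("),
    "TLS verification disabled": re.compile(r"verify\s*=\s*False"),
    "shell command injection risk": re.compile(r"shell\s*=\s*True"),
    "world-writable permissions": re.compile(r"chmod\([^)]*0o?777"),
}


def scan_source(text):
    """Return (line_number, label) pairs for every risky pattern found."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RISK_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings


# Made-up sample input standing in for a file from the target's repository.
sample = (
    "import os, requests\n"
    "requests.get(url, verify=False)\n"
    "os.chmod(path, 0o777)\n"
    "total = sum(values)\n"
)
for number, label in scan_source(sample):
    print(f"line {number}: {label}")
```

A scan like this only surfaces candidates for manual review; it cannot tell whether a flagged line was written by a person or suggested by an AI tool.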

2. Intellectual Property and License Compliance

One of the most significant risks with AI-generated code is the potential for intellectual property contamination. Large language models trained on public repositories may reproduce copyrighted code segments or introduce components with incompatible licenses. Tools like ScanCode Toolkit can identify license inconsistencies and potential IP issues by scanning codebases for license declarations and copyright notices. This step is crucial for reducing post-acquisition legal risk.
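As a rough illustration of what scanning for license declarations involves, the sketch below looks for SPDX license tags in file contents and sorts files into three groups. It is a simplified stand-in, not ScanCode Toolkit's method; the allow-list and the sample files are hypothetical, and the acceptable set of licenses depends on the acquirer's own policy.

```python
import re

# SPDX tags look like "SPDX-License-Identifier: MIT" in a file header.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")

# Hypothetical allow-list; the real one is a legal and policy decision.
ALLOWED = {"MIT", "Apache-2.0", "BSD-3-Clause"}


def classify_files(files):
    """Group file paths by license status: allowed, flagged, or undeclared."""
    report = {"allowed": [], "flagged": [], "undeclared": []}
    for path, text in sorted(files.items()):
        match = SPDX_TAG.search(text)
        if match is None:
            report["undeclared"].append(path)
        elif match.group(1) in ALLOWED:
            report["allowed"].append(path)
        else:
            report["flagged"].append((path, match.group(1)))
    return report


# Made-up file contents standing in for a checked-out repository.
files = {
    "src/api.py": "# SPDX-License-Identifier: MIT\n",
    "src/parser.py": "# SPDX-License-Identifier: GPL-3.0-only\n",
    "src/utils.py": "def helper():\n    return 1\n",
}
print(classify_files(files))
```

Undeclared files deserve as much attention as flagged ones: reproduced code rarely arrives with its original license header attached.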

3. Code Quality and Maintainability

While AI tools are often good at producing code that works for the immediate task, they often fall short in creating maintainable solutions. AI tends to prioritise immediate functionality over long-term maintainability concerns like readability, modularity, and adherence to project-specific architectural patterns. Static analysis tools like PMD can identify complexity hotspots, excessive method lengths, and other code quality issues that frequently appear in AI-generated segments. Look particularly for inconsistent coding styles within files: a potential indicator of mixed AI and human authorship.
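The kind of measurement behind a complexity hotspot report can be sketched with Python's standard ast module. This is a simplified illustration for Python sources only, with hypothetical thresholds; the branch count is a crude proxy for complexity, not the metric PMD computes.

```python
import ast

# Node types counted as branching points in this simplified metric.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp, ast.IfExp)


def function_metrics(source):
    """Return {function_name: (line_count, branch_count)} for a module."""
    metrics = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            branches = sum(
                isinstance(child, BRANCH_NODES) for child in ast.walk(node)
            )
            metrics[node.name] = (length, branches)
    return metrics


def hotspots(source, max_lines=50, max_branches=10):
    """Names of functions that exceed either (hypothetical) threshold."""
    return sorted(
        name
        for name, (length, branches) in function_metrics(source).items()
        if length > max_lines or branches > max_branches
    )


# Made-up sample module with one simple and one more tangled function.
sample = '''
def simple(x):
    return x + 1

def tangled(a, b):
    if a and b:
        for i in range(a):
            if i % 2:
                b += i
    return b
'''
print(function_metrics(sample))
print(hotspots(sample, max_lines=5, max_branches=3))
```

Thresholds are a judgment call; what matters in due diligence is where the outliers cluster, not the absolute numbers.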

4. Performance and Scalability Concerns

Performance issues in AI-generated code are particularly challenging to identify without running the application. However, static analysis can reveal common performance anti-patterns like inefficient algorithms, potential memory leaks, or database query inefficiencies. Pay special attention to critical path components and data processing pipelines, as these are areas where performance issues can have outsized impacts on overall system scalability.
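One database query inefficiency that static analysis can catch is a query issued inside a loop (the so-called N+1 pattern). The sketch below detects it in Python sources using the standard ast module. The set of method names treated as database calls is an assumption made for illustration, and the check only looks at loop bodies, so it will miss queries hidden behind helper functions.

```python
import ast

# Hypothetical method names treated as database calls in this sketch.
QUERY_METHODS = {"execute", "fetchall", "fetchone"}


def queries_in_loops(source):
    """Return line numbers of query-like method calls inside loop bodies."""
    hits = set()
    for loop in ast.walk(ast.parse(source)):
        if not isinstance(loop, (ast.For, ast.While)):
            continue
        # Only the body runs once per iteration; the iterable of a for
        # loop is evaluated once, so it is deliberately not inspected.
        for statement in loop.body:
            for node in ast.walk(statement):
                if (
                    isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr in QUERY_METHODS
                ):
                    hits.add(node.lineno)
    return sorted(hits)


# Made-up sample: one query up front, then one query per user in the loop.
sample = '''
users = cursor.execute("SELECT id FROM users").fetchall()
for user in users:
    cursor.execute("SELECT * FROM orders WHERE user_id = ?", (user[0],))
'''
print(queries_in_loops(sample))
```

Hits on critical path components are the ones worth raising with the target's engineers, since a per-row query that is harmless at a hundred rows can dominate response times at a million.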

5. Team Capability and Knowledge Distribution

Perhaps the most overlooked aspect of AI-generated code is its impact on team knowledge distribution and capability. Teams may become overly dependent on AI tools, creating dangerous knowledge gaps where no team member fully understands critical system components. This risk is measured by the ‘bus factor’ (BF), a metric that tracks knowledge distribution in a project: the minimum number of engineers who would have to leave for the project to stall. The lower the bus factor, the more concentrated the knowledge.
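To show how a bus factor can be estimated from commit history, here is a minimal sketch using a simplified greedy heuristic. It assumes that an engineer "knows" a file if they made at least 30% of its commits, and that the project stalls once more than half of its files have no knowledgeable engineer left. Both thresholds, the function names, and the sample data are illustrative assumptions, not the algorithm used by any particular tool.

```python
def knowledgeable_authors(commits_by_author, min_share=0.3):
    """Authors holding at least min_share of a file's commits."""
    total = sum(commits_by_author.values())
    return {a for a, n in commits_by_author.items() if n / total >= min_share}


def bus_factor(file_commits, min_share=0.3, stall_ratio=0.5):
    """Greedy estimate: (bus factor, engineers whose exit stalls the project)."""
    owners = {
        path: knowledgeable_authors(commits, min_share)
        for path, commits in file_commits.items()
    }
    remaining = set().union(*owners.values())
    departed = []
    while remaining:
        orphaned = sum(1 for authors in owners.values() if not authors & remaining)
        if orphaned > stall_ratio * len(owners):
            break
        # Remove the remaining engineer who knows the most files.
        top = max(
            sorted(remaining),
            key=lambda a: sum(a in authors for authors in owners.values()),
        )
        remaining.discard(top)
        departed.append(top)
    return len(departed), departed


# Made-up commit counts per file and author (e.g. derived from git log).
file_commits = {
    "billing/invoice.py": {"alice": 40, "bob": 2},
    "billing/tax.py": {"alice": 25},
    "auth/login.py": {"alice": 10, "carol": 2},
    "ui/dashboard.py": {"bob": 30, "carol": 5},
}
print(bus_factor(file_commits))
```

In this made-up example a single departure leaves three of the four files without a knowledgeable engineer, which is exactly the kind of concentration worth probing in interviews.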

JetBrains’ Bus Factor Explorer offers valuable insights here. The tool analyses Git repositories to identify knowledge concentration risks: areas of code where only one or two developers have made contributions. You can then trace these risk areas back to individual engineers or teams.

The resulting report highlights components with dangerous knowledge concentration, which may coincide with heavy AI tool usage. During interviews, ask team members to explain these components. Hesitation or vague responses can reveal over-reliance on AI tools.

Practical Implementation in Your Due Diligence Process

To effectively incorporate these assessments into your M&A process, consider this streamlined approach:

1. Begin with automated scanning using PMD and other static analysis tools to establish a baseline quality assessment.
2. Focus manual reviews on components flagged by automated tools, particularly security-critical paths and performance-sensitive areas.
3. Conduct targeted interviews with developers responsible for high-risk components to assess their understanding and capability to maintain the code.
4. Develop a weighted scoring system that prioritises different risk factors based on your investment thesis and the target company’s market.
5. Calculate remediation costs for identified issues and factor these into your valuation and negotiation strategy.
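The weighted scoring and remediation cost steps above can be sketched as simple arithmetic. Every number here is a placeholder: the weights, the 0 to 10 risk scores, the effort estimates, and the day rate are hypothetical and would come from your own investment thesis and findings.

```python
# Hypothetical weights per assessment area; they need not sum to 1
# because the score is normalised by the total weight.
WEIGHTS = {
    "security": 0.35,
    "licensing": 0.25,
    "maintainability": 0.20,
    "performance": 0.10,
    "knowledge": 0.10,
}


def weighted_risk(scores, weights):
    """Weighted average of per-area risk scores (0 = no risk, 10 = severe)."""
    total_weight = sum(weights.values())
    return sum(scores[area] * w for area, w in weights.items()) / total_weight


def remediation_cost(findings, day_rate):
    """Total cost of fixing all findings at a flat engineering day rate."""
    return sum(finding["engineer_days"] for finding in findings) * day_rate


# Made-up assessment results for a single target.
scores = {
    "security": 7,
    "licensing": 4,
    "maintainability": 6,
    "performance": 3,
    "knowledge": 8,
}
findings = [
    {"issue": "input validation gaps in public API", "engineer_days": 15},
    {"issue": "replace component with incompatible license", "engineer_days": 30},
    {"issue": "refactor query-in-loop hotspots", "engineer_days": 10},
]

print(f"overall risk: {weighted_risk(scores, WEIGHTS):.2f} / 10")
print(f"remediation estimate: {remediation_cost(findings, day_rate=800)}")
```

Keeping the weights explicit makes the scoring easy to challenge and adjust during negotiation, and the cost figure gives a starting point for a price adjustment or escrow discussion.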

The Bottom Line

AI-generated code isn’t inherently problematic. In fact, it can significantly enhance development productivity when used appropriately. The key is identifying where AI tools have been used as responsible accelerators versus dangerous shortcuts.

By incorporating these specialised assessment techniques into your technical due diligence process, you can more accurately evaluate the true technical debt in potential acquisitions and make more informed investment decisions. In today’s market, where technical quality can make or break an acquisition’s success, this additional layer of scrutiny isn’t just nice to have. It’s essential.

Remember: in technical due diligence, what you don’t know can hurt you. And with AI-generated code becoming increasingly prevalent, knowing where to look and what to ask has never been more important.

Ready to talk about your M&A Due Diligence requirements? Contact us today.
