
Everywhere you look, everywhere you click, it feels like Artificial Intelligence is ever-present, a perpetual companion to your web browsing, whether you like it or not. As AI proliferates across the far reaches of the internet, AI software is popping up in phones, email, messenger apps, and, perhaps most concerningly, in software that deals in matters of law and public policy, such as e-procurement software.
AI chatbots can be helpful for figuring out recipes with the remaining contents of your refrigerator, but should you use them to draft RFPs and other official documents when, as we (actual people) write this, there is a high probability of an AI ‘hallucinating’ nonexistent laws? Is it worth trying to save time when AI so often gets facts (especially in areas of law) completely wrong? Is it worth leaving matters of regulation to software that, as of now, is itself completely unregulated? Is the risk greater than the reward?
Compliance or Complications?
Picture this. You have an RFP to write: many pages of instructions and requirements that demand the high degree of exactitude public procurement calls for. You want to save time and ease your workload, so you try using a large language model (LLM) to write up some text.
It’s highly important that the language complies with all the rules and regulations, so you feed your policies, guidelines, and bylaws into the AI and ask it to draft text that complies with them. It returns what looks like parts of a polished, well-written RFP. On the surface, without careful review, the document looks like it did indeed follow every rule you gave it. You give it a once-over and submit it for review.
However, once a few more eyes read through it, it becomes clear that there aren’t just flaws in the language: the AI didn’t understand or comply with the regulations you fed it at all. You try to fix the errors, but find so many structural and regulatory flaws that you have to scrap it, or large parts of it, and begin writing anew, wasting time for you and your team. Or worse: you post it, inviting vendors to respond to something you didn’t intend and to comply with guidelines that don’t reflect your actual regulations. You can’t put the toothpaste back in the tube.
Issues with AI-written text
AI products excel at writing authoritative, confident text quickly. It may be tempting to ask an AI to write language for your RFPs and other solicitation documents, but there are a number of problems with this. You might think you can solve the problem by feeding an LLM the laws, statutes, guidelines, and policies that govern your purchasing operation, but as of spring 2025, LLMs cannot truly do this. Even with all of the statutes fed in as a prompt, the LLM cannot interpret the rules: the software isn’t thinking and cannot comprehend what it’s been given.
Despite the appearances these software packages give — or the claims they make — of comprehension and conversation, an LLM can only generate text based on its mathematical models. While a person could also misinterpret the rules, the difference is this: a person can be held accountable. If a vendor raised a protest over the outcome of an evaluation, would you want to have to point to the AI and say, “the machine did it”?
Risk and Riskier
One of the most crucial jobs of any purchasing staff is to eliminate risk and create safeguards against error, misuse, and mismanagement. If you use AI to write any portion of a proposal or contract language for you, you risk errors, inaccuracies, and flat-out incorrect information. Commercial computing can accept a higher risk and failure rate, and even there, there are numerous cases where AI has misled customers with incorrect information (such as this); indeed, commercial insurers have begun offering coverage for mistakes made by AI. But for matters of law and policy, of which public procurement is one, the risk and failure rate needs to be as close to zero as possible. New York City discovered this when its chatbot began encouraging business owners to break the law.
Precision is crucial in any public contracting language, as is accuracy in referring back to the contract. If you use AI, it generates a facsimile that looks the way a contract is supposed to look, but each word is produced based on how likely it is to appear next in the sentence (a likelihood learned from millions of web pages, social media posts, and so on), and that process does not refer back to the contract or to any past language of your documents at all.
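To make that concrete, here is a deliberately simplified sketch, in Python, of what “picking the next word from a probability distribution” looks like. The words and probabilities are invented for illustration; this is a toy model of the concept, not how any particular product works.

```python
# Purely illustrative: a toy "next word" picker. Real LLMs use large neural
# networks over tens of thousands of tokens, but the core idea is the same:
# each word is sampled from a probability distribution, not looked up in your
# contract or your regulations.
import random

# Invented probabilities for what might follow "the contractor shall ..."
next_word_probabilities = {
    "provide": 0.40,
    "deliver": 0.25,
    "indemnify": 0.20,
    "submit": 0.15,
}

def pick_next_word(probabilities):
    """Sample one word according to its probability of appearing next."""
    words = list(probabilities.keys())
    weights = list(probabilities.values())
    return random.choices(words, weights=weights, k=1)[0]

# Two runs of the same "prompt" can produce different continuations,
# and none of them consult the actual contract text.
print(pick_next_word(next_word_probabilities))
print(pick_next_word(next_word_probabilities))
```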
Much of the data fed into an LLM comes from forum posts and websites like Reddit, which are not authoritative sources. When untrustworthy, unreliable, or questionable content makes its way into generated text, the information can be distorted or simply incorrect. AI also has a tendency to ‘hallucinate’, that is, to make up incorrect information and present it as accurate. This poses a serious risk if you use it in government contracts. Even if you’re having AI write copy that doesn’t contain specific facts and figures, you still risk the AI producing the wrong contract language for your specific needs.
Two short, entertaining but grim, anecdotes. The first: recently, two New York lawyers tried to use ChatGPT to write court filings. It came up with convincing-looking examples of case law. One problem: it included six case citations that did not exist. The judge on their case sanctioned them for making false and misleading statements to the court.
The trustworthiness of AI on even the simplest tasks has come into question with a recent scandal embroiling Google, our second anecdote. Google recently started using AI (its Gemini model) to generate an answer at the top of search results. But because that answer is partially sourced from forums and social media rather than peer-reviewed or reputable sources, the results can be inaccurate or totally wrong. Google ended up giving results like this one, where it claims that glue can help cheese stick to your homemade pizza (a ‘fact’ pulled from Reddit). If anything, this should serve as an example of the variability of results.
AI cannot be used for scoring, period.
For scoring proposals, using AI tools would be more than a misstep; it would be unethical. An AI cannot analyze a proposal the way a human can. It doesn’t see the entire document as a whole: it sees a series of symbols on a page, processed piece by piece, without the nuance a human reader brings.
A large language model is a black box. Ask it the exact same prompt in different sessions and it will give different answers, and if you ask it why it gave one answer over another, it cannot tell you. This has obvious negative implications for scoring proposals; you absolutely cannot trust an AI to do scoring for you.
There are also technical issues with AI that can inhibit your ability to score down the road, even if you don’t use it for the scoring itself. If you use AI to summarize a proposal or any part of an important document, you run the risk of key information missing from the summary. And because AI doesn’t preserve tone and language choice, you risk contaminating the evaluation with value judgments that aren’t yours.
Public purchasing also demands a fair, unbiased evaluation of vendor proposals. No AI company can guarantee that its tools will offer a fair and unbiased evaluation. All the uncertainty that AI introduces has disastrous implications for the procurement industry.
These tools cannot be inspected to understand their “reasoning” process, and if an AI tool does a poor job reading and evaluating a proposal, it introduces an unfairness that undermines the impartiality of the purchasing process, affecting not just the vendors on a given solicitation but taxpayers as well. And if a vendor discovered through a FOIA request that an agency was using AI to assist with scoring, it wouldn’t just invite a lawsuit; it would be a PR disaster.
Unacceptable risk for sensitive data
If an error caused by AI made its way into a highly sensitive solicitation, such as one for a large-scale transit system or a water treatment plant, it would cause a scandal of immense proportions. With the seemingly nonstop feed of articles about AI-related scandals, there is extra scrutiny on AI usage, especially by publicly funded entities.
Even for something as simple as a summary, you or your staff run the risk of the AI missing key details and passing those gaps forward until essential language is obfuscated. AI often has trouble retaining all the facts or components of the text being summarized. And if you attempt to have AI generate the language itself, the problems described above only compound.
Should e-procurement software use AI?
E-procurement software companies that use AI may make a lot of big promises about how much it will help you at your job, but since AI is so hot right now, a lot of companies are duct-taping AI onto their products in a way that, at best, introduces uncertainty and, at worst, causes disasters that ripple throughout your agency. Companies see an opportunity and are jumping on the bandwagon. But the truth is, when you’re dealing with government contracting, you can’t afford to leave any piece of your work to a machine, especially one as unpredictable and unreliable as the current implementations of AI.
At BidLabs, at the time of writing in spring 2025, we do not have any AI integration in our platform, because we do not believe that AI tools are currently reliable for a field where specificity and accuracy are as important as they are in government contracting. We are closely following developments in AI for law and public policy, and when those tools have proven themselves reliable, we will be ready to incorporate them. For now, however, the important work in public purchasing should always be checked and double-checked by humans.
Sometimes, AI just isn’t worth the risk.