How accountants can appropriately rely on AI
Danielle Supkis Cheek, CPA, vice president and head of Analytics & AI for Caseware, had an interesting way to continue her exploration of the ethics of using AI tools in accounting: She asked ChatGPT to give her an answer.
The response helped in her assessment of such tools. She shared more about that result — and answered our questions — in this episode of the JofA podcast.
What you’ll learn from this episode:
- What a well-known generative AI tool had to say about the risks of using large language models in accounting.
- The surprising thing an early version of ChatGPT said about Supkis Cheek.
- The top-of-mind AI ethics concerns for the accounting profession.
- How the development of smaller-scale language models built on accounting-specific content can improve confidence in large language models.
- An explanation of the MAYA principle as it relates to AI.
- Thoughts on taking a “measured approach” when it comes to AI auditing.
Play the episode below or read the edited transcript:
— To comment on this episode or to suggest an idea for another episode, contact Neil Amato at Neil.Amato@aicpa-cima.com.
Transcript
Neil Amato: Welcome back to the Journal of Accountancy podcast. This is Neil Amato with the JofA. Today’s episode focuses on a hot topic these days, the ethics of artificial intelligence. The speaker is a friend of the program, a repeat guest. Her name is Danielle Supkis Cheek. She is a CPA and also vice president at Caseware, serving as the company’s head of analytics and AI.
In this episode, we’re going to discuss advice and some of the questions to consider related to AI ethics. Danielle, first, welcome to the show, and I’m going to kick things off with a very light-hearted question. I’ve heard this, but I have to ask to make sure it’s true: Did you really ask ChatGPT what the accounting ethical risks of using a large language model are?
Danielle Supkis Cheek: Of course, I did. Why wouldn’t I ask that? It’s really important to understand how sophisticated a model is, and if you think about it, probably the second question I ever asked ChatGPT was pretty much the ethical considerations and concerns of using itself in our industry.
It gave a really good response. It was pretty comprehensive, and it covered a lot of different things that I knew to be risks and other things that were also interesting to consider. To me, the concept of using the output of a tool to assess the caliber of that tool is profoundly important. It’s actually part of the field of prompt engineering to be able to assess tools in that way.
Amato: I’d also like to know this, have you asked ChatGPT what it knows about Danielle Supkis Cheek?
Supkis Cheek: I have. That was the first question I ever asked it. I think almost all of us put in our names. Now, when I first did it, it was clearly early stages, and it was before some of the work that’s been done to prevent hallucinations about people who aren’t super-public figures, presidents-of-the-world types.
At the time I got a ridiculous hallucination. It believed I was an art professor in Pennsylvania who specialized in lithographs. Clearly, none of that is true. None of it is representative of any family member, either. I don’t share similar names with family members, but if you start piecing it together, I do have family members who enjoy art, my mother is from Pennsylvania, and I’m a professor in accounting in Houston.
When you start to take little snippets of that, you can see how it’s stitched something together, and you’re like, I see some familiarity there. But it was categorically untrue. Now, the less interesting part, or maybe the good part, is that if you type something like that in now, I’m not a public enough figure, like a president of a country, to be known, and ChatGPT will just say it doesn’t know anything about that person and that the person is probably not the president of a particular country.
Amato: It’s good to know that Danielle is indeed a CPA, indeed based in Houston, not an art professor from a different time zone. That wouldn’t be a bad thing, but it’s good to know.
Supkis Cheek: I also was, I think, 20 years older in the ChatGPT response.
Amato: Well, we definitely know that’s a hallucination. Tell me this, what are some of the aspects of AI ethics that people should be thinking about but aren’t?
Supkis Cheek: I think the space is ever-evolving. The area that everybody dealt with first, as a table-stakes approach, is making sure that confidential information is secure. If you do have information going back into a model itself, make sure you’re not putting confidential information in, and, if you can’t control the model, there are approaches to take apart your version of an LLM, or large language model, and segregate it so that it’s not going back into a main model.
Maybe you’re learning from your prompts, but it’s not going back into the general public. That’s the big area that has been table stakes, and I think it is the space everyone has focused on first. The next iterations become more and more complex: How do you appropriately rely? Clearly, we’ve already talked about hallucinations.
What can you do to make sure you can comfortably rely on the results of — right now, heavy focus is obviously on generative AI, but also on other, let’s say, iterative algorithms and machine learning. I think the current standard that has come out from IESBA [the International Ethics Standards Board for Accountants] puts a heavy emphasis on assessing the output of a technology.
They liken it to assessing the use of an expert and relying on an expert, which I find to be an elegant solution. There are different ways to assess whether you can rely upon that output. Clearly, one way is to be able to very specifically test my inputs and test my process; therefore, I feel very comfortable relying on my output. But in other cases, you may not have that level of transparency, or the approach may be so complicated [that] it’s very hard to put that together.
You’re going to start to see people finding ways to do this — we can get into it a little bit more in a bit — of how to appropriately rely. I think that ends up being your next biggest ethical consideration: How do I appropriately rely? The next piece of this to me — and I know I’m being a little bit long-winded, but we’re on the third big-bucket aspect — is a concept that’s a little bit more granular and refined than our first concept related to confidential information.
Most people are worried about a broad disclosure of confidential information. That’s the one that we talked about first. As we get more nuance, there’s actually a risk of a more limited-scope disclosure of confidential information.
One of the up-and-coming concerns is if your model allows for learning from prompts, and you appropriately put confidential information, with all the appropriate safeguards, into a prompt about, let’s say, Company A: Summarize this contract for me. Then later on you say, provide me a sample paragraph with this kind of accounting treatment for me to show as an example to a client. To the extent that, for some reason, the model includes some of Company A’s confidential information in the response for Company B, and that then makes it to a client of Company B, you have extra complexity related to a more limited disclosure of confidential information, but confidential information nonetheless.
It’s very similar to the very public fights over whose IP [intellectual property] is what. I think you’re going to start to see more and more complexity about the nuance. There are some safeguards, by the way, that are already in place for a lot of providers of generative AI products, and a lot of them are related to provenance, the origin of something. There’s already a lot of work on having that transparency. I think you’re going to start to see some of these more granular, narrow-scope ethical issues coming up and then getting resolved pretty quickly. Those are the big-ticket items; there are clearly going to be a lot more little nuanced areas as well.
Amato: Sure. There definitely are. You mentioned IESBA, that’s the International Ethics Standards Board for Accountants. You touched on the topic of the next question, which is assurance. As accountants are often the purveyors of assurance, what are the opportunities out there to assure that some of these algorithms aren’t biased?
Supkis Cheek: I think there are a couple of approaches we can take, since accountants are in two different aspects of the world of assurance. They have to provide assurance over financial statements that may use AI, either on the client’s side or in their own internal processes for providing that assurance. But then, to your point, there’s the ability to provide assurance over other people’s technology as well.
I think there are going to be three tiers to think about here. How we look at the biases you just talked about is going to depend on where the bias exists and what the use case is for feeling confident in the information, that there is a low risk of bias. I think you’re going to start to see the same trends we’ve seen in the past with SOC reporting and the ESG assurance that’s being talked about a lot, and, as the next kind of frontier, AI assurance: third-party assurance over AI systems to help with the transparency of processing as well as bias.
There are companies out there that already do this. As you start to see more regulation coming out related to transparency in AI, I think you’re going to start to see the need for more assurance over AI. It may be external, but there may also be work that auditors do related to their own use of AI to understand where there could be a bias risk, as well as understanding what their clients’ uses of AI are and whether that needs to change some of their audit processes. I see it as a three-tier system. That’s probably a little bit of an over-answer to your question, but the scope varies wildly with the use case and the kind of assurance needed.
Amato: Obviously, with all of this — we’re recording here in late April — by the first of May this could easily change a lot. It’s changing all the time.
Supkis Cheek: Daily. It feels like it’s daily.
Amato: Exactly. This whole topic is going to continue to grow and change. But what does the next year or so hold as it relates to AI in accounting?
Supkis Cheek: I think it’s going to be tough to predict without a magic eight ball, the magic sphere of understanding, or a crystal ball. But, to me, I think there’s going to be a lot of intersection between the refinement of technology and solving some of the problems behind the hesitations and concerns of accountants. I see it falling into two tranches. The generative AI side is the first tranche, and I see a lot of organizations working toward making sure that there is clear provenance — and we’ve talked about it just in passing — of where the source of a prompt came from.
That is going to be an important aspect of innovation. I think you’re going to start to see more work done on something called RAG [retrieval-augmented generation] and on knowledge bases or small language models, where you’re putting in more specialized content related to our profession so that you get a reduced risk of hallucination, along with other mitigation factors, to increase the caliber and precision of responses. I think you’re going to start to see even more work on that.
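To make the retrieval-augmented pattern she describes concrete, here is a minimal sketch. The tiny knowledge base, the keyword-overlap scoring, and the prompt wording are illustrative assumptions for this example only, not any vendor's actual implementation; real systems typically use embedding-based search over a curated body of professional content.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# The knowledge base, scoring method, and prompt wording are illustrative
# assumptions, not an actual product's implementation.

KNOWLEDGE_BASE = [
    "ASC 606 requires revenue to be recognized when control of goods or services transfers to the customer.",
    "ASC 842 requires lessees to recognize right-of-use assets and lease liabilities for most leases.",
    "AU-C 240 addresses the auditor's responsibilities relating to fraud in a financial statement audit.",
]

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Score each knowledge-base snippet by keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(question: str) -> str:
    """Ground the model in retrieved, profession-specific context before asking."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question))
    return (
        "Answer using only the context below; say 'not found' if the context "
        "does not cover it.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

if __name__ == "__main__":
    # The assembled prompt would then be sent to a language model; constraining
    # it to retrieved, authoritative text is what reduces hallucination risk.
    print(build_prompt("When is revenue recognized under ASC 606?"))
```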
I also think you’re going to start to see more work on interesting areas related to agents in large language models, which, in effect, take the response of a model, execute a task, and then move on to the next step. Think of it as linking different steps to each other. There’s actually a concept called chain-of-thought reasoning that helps you have transparency in all those different steps, so that you don’t have something going off the rails. It’s a very structured process where you can follow through the entire approach. I think that’s the innovation you’re going to see in the generative AI world.
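In the same spirit, here is a hedged sketch of that step-linking idea: each step takes the previous step's output, performs one task, and logs what it did, so a reviewer can trace the whole chain rather than only the final answer. The step functions, the toy invoice text, and the 8% tax rate are all made-up illustrations, not an actual audit workflow or a specific agent framework.

```python
# Illustrative agent-style pipeline: each step consumes the prior step's
# output, does one task, and records what it did for later review.
# Steps, data, and the tax rate are invented for illustration only.

from typing import Callable

def extract_total(invoice_text: str) -> float:
    """Step 1: pull a numeric total out of raw text (toy parser)."""
    return float(invoice_text.split("total:")[-1].strip())

def apply_tax(total: float) -> float:
    """Step 2: apply an assumed 8% tax rate."""
    return round(total * 1.08, 2)

def format_summary(amount: float) -> str:
    """Step 3: produce a reviewer-readable summary."""
    return f"Amount due including tax: ${amount:,.2f}"

def run_chain(data, steps: list[Callable], log: list[str]):
    """Run each step on the previous output and log it, so every intermediate
    result is visible instead of only the final answer."""
    for step in steps:
        data = step(data)
        log.append(f"{step.__name__} -> {data!r}")
    return data

if __name__ == "__main__":
    audit_trail: list[str] = []
    result = run_chain("invoice total: 1250.00",
                       [extract_total, apply_tax, format_summary],
                       audit_trail)
    print(result)
    print("\n".join(audit_trail))  # the step-by-step transparency described above
```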
In the more traditional AI world, I think you’re going to start to see more willingness to accept and understand machine learning and the use cases of machine learning. I think you’re going to start to see more almost like citizen data scientists on engagement teams. A lot of students are learning some advanced capabilities in school on data science and data analytics. I think you’re going to start to see more robust computational capabilities coming out of teams, but not to the point where it’s some scary robot uprising concern. I think you’re going to just start to see more of those skills from other domains coming into accounting and being considered a bit more mainstream versus highly advanced, highly foreign, and overly scary.
Generative AI has scared us enough that we’re seeing the potential and the value, but also seeing that I can figure out which areas I need to get my risk profile around and understand the risks. When you start to compare some of the generative AI risks to those of more straightforward approaches (and I don’t mean straightforward as in no risk), it’s sometimes easier to understand and get your arms around the risks of machine-learning, computational approaches. I think you’re going to start seeing a little bit more willingness to embrace some of that technology in the mainstream parts of accounting and auditing.
Amato: Now I’ve heard you mention this phrase, I guess the first part of it is an acronym. The phrase is, the MAYA principle, MAYA being M-A-Y-A. What is that principle?
Supkis Cheek: Yeah, it’s “most advanced yet acceptable.” It’s actually rooted in design. It’s the concept that people want to be advanced, but if you go too advanced too fast, or you go too theoretical and don’t think about the practical change management aspects, you’re going to lose people. You don’t want to take people so far outside their comfort zone that everything just stops and there’s no going forward from there; people just shut down on the concepts.
If you think about when ChatGPT first came out, most firms took very much a “we will not allow anybody to use this tool until we can assess it” stance. It was very advanced, and everybody instantaneously had a fear of what it could be and couldn’t spend the time to assess it. Then within six months, a lot of firms had already put into place appropriate safeguarded systems and had requisite principles, guidelines, ethics, whatever is the right term for that particular organization, of how to safely use a product.
That one was a pretty advanced shock to the system. But there are incredible enhancements in technology that could be used in a very meaningful way in audit, but they may not have the full transparency that our profession needs, or they may be too advanced to the point of, wait, is the auditor losing some judgment here? We could automate many parts of the audit to the point of actually subordinating judgment for the auditors, which is not OK.
I think, for me, MAYA is incredibly important to understand. We are a highly regulated profession. The public puts a lot of trust in us. We have to have a measured approach to cool technology as well as mainstream technology to make sure we’re doing it all in a very safe and ethical way that fits the needs of our users. For me, that’s always looking at what’s really cool and happening in other professions that are innovating very quickly. But then how do we bring it back and put the appropriate safeguards around it to make sure that it creates that appropriate experience for auditors?
Amato: I really appreciate you taking us through some of the aspects of this topic. Anything else you’d like to add as a closing thought?
Supkis Cheek: One last little closing thought, and hopefully it doesn’t date our session too much, but you already said we’re at late April. One of the things to also look out for, and I’ve alluded to it only a little bit, is the coming regulation of AI. Be watching what happens in the mainstream news about the regulation of AI. It’s moving very quickly.
For those of you who have lived through all the different Wayfair-like and SALT [state and local tax] implications, where very disparate regulations came and moved very quickly and auditors, or accountants, had to respond in a very meaningful way, creating some uncertainty: Get in front of what those are, just to see what’s happening.
If you want to do some work to prepare, I would suggest spending some time inventorying what AI systems you have in place, or asking your clients to start thinking about and inventorying what AI systems they have in place. Maybe you don’t need to take any big actions right now, but it’s an area to watch in the next year as well. It’s not really innovation in the traditional sense of applying technological innovation. But I think what you’re also going to see a lot of in the next year is: What do you do to get ready for coming regulations associated with AI that may be pretty broadly applicable, or even just narrow-scope applicable, to your clients, your firms, or those around you?
Amato: Thanks to Danielle Supkis Cheek for joining us on the JofA podcast. Thanks also to you, the listeners. This is Neil Amato. We’ll talk again next week.