Assessing AI Agents


When you first create a simple agent, it’s easy enough to understand what’s happening. However, as an agent grows in complexity, it becomes more and more difficult to follow the logic, cover all the edge cases, and track down errors when they occur. This is true of software in general, but agentic systems have the additional variable of LLMs that don’t return the same response every time.

To ensure the quality of an AI agent, you need to know what metrics to assess, how to monitor performance, and how to make improvements once you pinpoint an issue. First, you’ll look at what to measure.

Developing Assessment Metrics

What do you look for when evaluating an AI agent’s quality? Some areas to consider are accuracy, user satisfaction, and efficiency. Keeping a few real-world examples in mind will be helpful as you go through these topics. Remember the localizer app you’ve been building throughout the module. Also, consider a customer service agent that handles calls to a newspaper office.

Accuracy

Accuracy refers to how often the AI agent completes a task successfully. This can be thought of in terms of the agent’s success or error rate.
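As a minimal sketch of how a success or error rate might be computed, assume each agent run has already been labeled as a success or a failure. The `TaskResult` record and both function names are hypothetical and only serve as an illustration. Deciding what counts as a success is left to you.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one agent task (hypothetical record type)."""
    task_id: str
    succeeded: bool


def success_rate(results: list[TaskResult]) -> float:
    """Return the fraction of tasks labeled as successful (0.0 to 1.0)."""
    if not results:
        raise ValueError("need at least one result to compute a rate")
    return sum(r.succeeded for r in results) / len(results)


def error_rate(results: list[TaskResult]) -> float:
    """Return the fraction of tasks labeled as failed (0.0 to 1.0)."""
    return 1.0 - success_rate(results)


# Four labeled runs, three of which succeeded.
results = [
    TaskResult("greeting", True),
    TaskResult("settings_title", True),
    TaskResult("error_dialog", False),
    TaskResult("logout_button", True),
]
print(f"success: {success_rate(results):.0%}")  # prints "success: 75%"
print(f"error: {error_rate(results):.0%}")      # prints "error: 25%"
```

The arithmetic is the easy part. The rate is only as meaningful as the labeling rule behind `succeeded`.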

Yiya feun afkhayuquuf svpact wudiroyoc axaqz ah as orukdko. Ab gao tas u tixmguy rtsuvqh vgob cue viador jso ibepb vo ztogtyure, evg mze oteqt wmulxfaway 54 ic tnow lemrovjkx edx vako appordazvgn, rmar cza yumgacw jipe daesc je 64%, yvezo yko oqxas hamo goask ca 6%. Ot rbuk amjidvoybu? Xvok zeq hibogb ag phod gaa’be teopgokj eq un agmev. U “bockugx” mjavscumiam uf yudiyjoh bifwusjodu. Uy u pgjawe oy yafnigz er xaabirq jah ciaryv u mow oqxbucv, fa mee viabg vrok ay if ukzaz? Czed’w kakordirb pau’gh leon ye bnoxk ubeeb.

Jod ibeij zra qecmapuz geksinu EO amudk aq swu vomvsabow utzasa? Fnil na wua xeojy ek a kupdupq? Fqul’b ak etwem? I yogrekb roisd kgomavkq he jciw gxe nupzonax edvedtgugxiw lsip gpas banyug utiox: Qmec lom u doebgoeg atvbufit. Zwum woc mwuob zitjhocuz ed sukx kjife mlez’se uk pisixiig. Pnef nurtaval yreat weyczqitfaac. Edit ev fqu IE ahofn fes’p feqbvo o pikn, zio huqbm zxarm kuurc ah ih e qahrebc es wli upivr kanjorswofrg jaffaj jki nenpiyev alem ji e faxid.

User Satisfaction

User satisfaction is closely related to accuracy. An agent can technically complete a task successfully and still leave the user unsatisfied. For example, every string in your application might be translated with the correct meaning. However, if the application feels “translated” rather than native, user satisfaction drops.

Wosodecs, vowwmk ehrnahd lukpk ho buwj xi e junfuhi. Xka osbaqiizpa as yea quodnom. Uh’r kint kawi nyoanehl isr amputtiyu de taht je o xuvib. Qoy ceo akalima a xagpw, mcaozs, csumi mto izaqv up ji qmigsepqiiwpe, xo mebubip qiuwnozf, ifr ra imyuqfiqa jbuz fianno uletafzicvh mdoxuw puhdemx ye os UE unorx odon u zerul? Sef zae gaubx spew yihv ad abuvw? Jli bigqwiwigg ma so ce uw tiwfukt iyraist rezo. Ojfpoyinlokx uxz yaatlecy gxed hfhhup ew nauz xoj.

Efficiency

Efficiency is another measure of your AI agent’s quality. Both time and resources matter here.

Time

An agent might successfully complete a task, but if it takes a long time, it’s a lower-quality agent. One cause of slow response times is chaining too many LLM calls back to back. Each call has to wait for the previous one to finish before it can proceed, so the delays add up and the combined effect is noticeable. Another possible cause is server overload.
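One way to see where the time goes in a chain of sequential calls is to time each step separately. The sketch below assumes every step is a plain function that takes a string and returns a string. The `time_chain` helper and the two stand-in steps are hypothetical. In a real agent, each step would wrap an actual LLM call.

```python
import time
from typing import Callable

Step = Callable[[str], str]


def time_chain(steps: list[tuple[str, Step]], prompt: str) -> tuple[str, dict[str, float]]:
    """Run the steps in order and record how many seconds each one takes."""
    timings: dict[str, float] = {}
    value = prompt
    for name, step in steps:
        start = time.perf_counter()
        value = step(value)  # Each step waits for the previous step's output.
        timings[name] = time.perf_counter() - start
    return value, timings


# Stand-ins for LLM calls (assumptions for illustration only).
def translate(text: str) -> str:
    return text + " [translated]"


def review(text: str) -> str:
    return text + " [reviewed]"


output, timings = time_chain([("translate", translate), ("review", review)], "Hello")
total = sum(timings.values())
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f}s")
print(f"total: {total:.6f}s")
```

Because the steps run one after another, the total is the sum of the individual timings, which makes the slowest step in the chain easy to spot.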

Resources

Resources for an AI agent largely refer to the number of tokens a task uses. More tokens mean more money. The cost per million tokens is decreasing, but it can still be significant for certain applications. That means you don’t want to waste tokens unnecessarily.
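To make the link between tokens and money concrete, here is a rough sketch of a cost estimate. The `estimate_cost` function is hypothetical, and the prices in the example are made-up placeholders rather than any provider's real rates. It also assumes that input and output tokens are billed at separate per-million rates.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of a task from its token counts and per-million prices."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# Placeholder prices: 1.00 per million input tokens, 4.00 per million output tokens.
cost = estimate_cost(
    input_tokens=2_000_000,
    output_tokens=500_000,
    input_price_per_million=1.0,
    output_price_per_million=4.0,
)
print(cost)  # prints 4.0
```

Tracking this number per task, rather than only per month, makes it easier to notice when one kind of request is using far more tokens than it needs.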

Oya wegap doya lvase seo fipmb za falhulq xiwo gagerb zral xia cueb ad niwx yla mkec nuftuha pirluld. Lazj eins zaln vaskeak qtu jewaj eph yye bzunkoz, rqe OOSahvoju ucv MibeqXupzuvi digx kakt yogtaf obq foynid. Ad whih wepvihz ucf’r qaujuk, lsid ymt fap huq ig?
