
Blue Skies Ahead: An AI Case Study on LLM Use for a Graph Theory Related Application

A case study showing how LLMs accelerate graph theory development, reducing coding time from hours to minutes and boosting research productivity.

By Gerald Rigdon · May 12, 2025 · Analysis

Research activities require the exploration of ideas, which may involve significant software development, experimentation, and testing effort. The ability of Large Language Models (LLMs) to generate executable software has been demonstrated in various use cases. This study highlights the positive outcomes of a use case involving a graph theory application. Through a series of successful prompts, functional software was produced within minutes, a task that would otherwise have taken several hours or days to complete. Rather than focusing solely on cost savings from reduced engineering effort, this study aims to highlight the substantial opportunity benefits provided by this technology, especially since research typically involves evaluating multiple ideas before selecting the optimal solution.

Introduction

When considering great systems-thinking quotes, one from Donella H. Meadows stands out: “Remember, always, that everything you know, and everything everyone knows, is only a model. Get your model out there where it can be viewed. Invite others to challenge your assumptions and add their own.”

In the context of software engineering, one way to wrap your head around a complex code base is to create a static model of it and analyze its behavior accordingly. The classic book Safeware (Leveson, 1995) states: “Static analysis evaluates the software without executing it. Instead, it examines a representation of the software. In some ways, static analysis is more complete than dynamic analysis, since general conclusions can be drawn and not just conclusions limited to the particular test cases that were selected. On the other hand, static analysis necessarily is limited to evaluating a representation of a behavior rather than examining the behavior itself.”

A paper written several years ago (Rigdon, 2010) discusses the role of static analysis in medical device software development, providing insight into the process at Boston Scientific Corporation concerning the use of design constraints in software engineering activities. These design constraints, distinct from requirements, help ensure that detailed design and implementation remain within domain-specific boundaries set and enforced by static analysis activities. A concrete example comes from the following story, which also presents an opportunity to, as Meadows exhorted, “Get your model out there.”

A few weeks ago, a Boston Scientific colleague identified a concern emerging from the development of a new device platform and proposed creating another design constraint. The issue involved creating a new firmware utility function interface and ensuring developers used this new interface in specific contexts instead of the existing interface that had been in use for years. Integrating a new function into an existing code base generally conjures up images in the minds of software engineers, who immediately want to understand how this new piece fits into the larger whole. As stated so well in (Hopstock, 2022), “Being able to model the inter-procedural control flow as a call graph is one of the most important building blocks when analyzing programs. Many of the more advanced analyses depend on this information being available.”

Conveniently, as discussed in (Rigdon, Doshi, Zheng, 2010), the Boston Scientific implantable firmware development environment includes call graph data for static analysis, produced by a customized Static Analysis Tool (SA Tool). But this new design constraint required leveraging that graph data differently, necessitating the development of software to experiment with and determine its viability for this new application. It was estimated that, although the result was not likely to be a lot of code, it would still take three to four days of Python development and debugging given the esoteric nature of the task. Consequently, the task was assigned to an available developer. The developer's completed, working code then served as a baseline for this case study.

Pacemakers and defibrillators, namely, safety-critical Class III implantable devices, are custom machines that are typically programmable. This includes a set of parameters programmed at the time of manufacture, later at the time of device implant, and in post-operative clinical settings. These telemetry interactions are one example of activities that result in firmware utility function use. Returning to the colleague's proposal for another design constraint managing finer control of device firmware utility function use, the first order of business was to find all uses of the existing utility function, namely, utl_ParamUpdate(), and then:

  • Decide which cases should be updated to use the newly proposed firmware utility function
  • Implement the new utility function and refactor the existing firmware to make use of it
  • Enforce, by means of a design constraint, that future design and implementation use the proper utility function 

Interestingly, this software engineering problem highlights an opportunity for testing the viability of another type of model, a Large Language Model (LLM).

Identifying Existing Uses

Figure 1 is a snippet of a much larger call graph in this code base that shows all callers of the utl_ParamUpdate() firmware utility function under discussion. For each caller, encased in the bright pink rectangles, there is an exploded view available as shown in Figure 2 and Figure 3. These subsequent views could likewise contain the same bright pink visuals leading to further exploded views until arriving at one or more root nodes or graph origin nodes.


Figure 1: Snippet of the call graph showing the callers of utl_ParamUpdate()

Figure 2: Exploded view of the call graph

Figure 3: Exploded view of the call graph

It was expected that the utl_ParamUpdate() firmware utility function would be called in the following cases, identifiable by the graph root nodes:

  • During initialization of the firmware and following a reset
  • When commanded by an authenticated external device connected through telemetry
  • Dynamically, during firmware execution including monitoring and therapy delivery, etc.

Text searches such as grep are not very useful for discovering these call paths, because the goal is to find the root nodes, which, as stated, allows each path to be classified into the bulleted cases above. Granted, on a simple graph such searches can succeed. However, discovery becomes much more difficult when the call graph is both deep and wide or, as shown in Figure 3 above, contains multiple root node paths. Hence, a manual process would be more susceptible to yielding inaccurate results as graph complexity grows. It is a problem in search of an automated solution, especially since the call graphs themselves are auto-generated.

Hand-Crafted Code Creation

As stated earlier, this project commenced with an engineer writing software from some general requirements, using the call graph output CSV file generated by the SA Tool as input. Figure 4 is a terse example of a call graph file: a list of all file and function caller/callee pairs.

Figure 4: Example of a call graph file listing file and function caller/callee pairs


Each row in this graph CSV file represents two connected nodes in the call graph. In a directed graph, these connections are known as edges; each edge represents the relationship between the caller (source) node that invokes the callee (target) node. With a list of these caller/callee pairs, one has all the information necessary to build out a visual call graph. Further, with open-source tools like Graphviz, this information can be converted to the DOT graph description language and ultimately rendered in formats such as JPEG for displaying visual graphs.
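
To make this concrete, the following Python sketch converts caller/callee pairs into DOT text for Graphviz. It is not the SA Tool's own exporter; the CSV header names ("caller_function", "callee_function") and the file names are assumptions chosen for illustration.

import csv

# Minimal sketch: turn caller/callee rows from a call graph CSV into
# Graphviz DOT text. The header names are assumptions about the CSV layout.
def csv_to_dot(csv_path, dot_path):
    lines = ["digraph callgraph {"]
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            caller = row["caller_function"]
            callee = row["callee_function"]
            lines.append(f'    "{caller}" -> "{callee}";')
    lines.append("}")
    with open(dot_path, "w") as f:
        f.write("\n".join(lines) + "\n")

# The DOT file can then be rendered with Graphviz, for example:
#   dot -Tjpeg callgraph.dot -o callgraph.jpg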

This case study was based on a project whose graph CSV file consisted of over 3,500 rows. The primary software requirements are conveyed as follows:

Req1: The software shall use the graph CSV as an input file and produce a list of all caller root node functions and subsequent node edges that result in all possible unique directed graph paths that end in a specified callee function.

Req2: Each path in Req1 shall be output as a separate row in a destination CSV file where the edges between each node are represented visually with “->” characters.
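
To make Req1 and Req2 concrete, here is a minimal Python sketch of the required behavior. It is neither the hand-crafted script nor either LLM-generated script discussed below, and it assumes a simple two-column caller,callee layout without the file-name columns present in the real SA Tool output.

import csv
from collections import defaultdict

def unique_paths_to(csv_path, target, out_path):
    # Build a reverse adjacency map from the caller/callee pairs.
    callers_of = defaultdict(set)   # callee -> set of callers
    with open(csv_path, newline="") as f:
        for caller, callee in csv.reader(f):
            callers_of[callee].add(caller)

    # A root node is a function that is never called by another function.
    def is_root(fn):
        return not callers_of[fn]

    paths = []

    # Walk backwards from the target toward the roots, depth-first,
    # guarding against cycles with the 'seen' set (Req1).
    def walk(fn, suffix, seen):
        if is_root(fn):
            paths.append([fn] + suffix)
            return
        for caller in sorted(callers_of[fn]):
            if caller not in seen:
                walk(caller, [fn] + suffix, seen | {caller})

    walk(target, [], {target})

    # Each unique path becomes one row, nodes joined with "->" (Req2).
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for p in paths:
            writer.writerow(["->".join(p)])

# Hypothetical usage:
#   unique_paths_to("calls.csv", "utl_ParamUpdate", "paths.csv")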

Although the final product was not a large Python script, as had been anticipated, it still took three days to develop. This is not surprising, given that most developers fluent in Python are not well versed in writing and debugging graph code unless that is their primary job focus. So the initial development effort estimates were close. The developed Python code yielded an output file containing 106 unique paths. An excerpt capturing five of these paths is shown below in Figure 5.

Figure 5: Excerpt of five paths from the output of the hand-crafted script


Code Creation With OpenAI ChatGPT-4o

Now, with working source code and its generated output in hand, the question was whether an LLM could write a Python script that produces the same output. The study used a custom AI web application known as Blue Sky (a tribute to Jeff Lynne and the Electric Light Orchestra, and nothing to do with the social media application of the same name). Blue Sky offers several features; the one chosen was the basic chat with document selection, configured for connectivity to OpenAI GPT-4o. As shown in Figure 6, the graph CSV named “TestProject.bsci.calls.csv”, the file produced by the SA Tool for the project of interest, was uploaded.

Figure 6: The uploaded graph CSV “TestProject.bsci.calls.csv”


To produce the correct Python script, the GPT-4o LLM was provided with the prompts shown in Figure 7 (for brevity, the intermediate LLM responses are omitted). A close examination reveals some liberties were taken with the wording of the prompts compared to the previously described requirements, Req1 and Req2. Only three prompts were needed, taking less than fifteen minutes to generate the desired results. After each prompt, the recommended script was copied into a file and executed, and its output was compared against the output produced by the hand-crafted code until a perfect match was found.

Figure 7: Prompts provided to the GPT-4o LLM


The GPT-4o-generated Python script shown in Figure 8, although different from the hand-crafted script, was accepted as final when its output matched the output from the hand-crafted script. Figure 9 is an obfuscated, side-by-side difference report intended to show that the files are identical (no markups) rather than the specific content of the output.


Figure 8: The GPT-4o-generated Python script



Figure 9: Side-by-side difference report showing identical output files


Code Creation With Anthropic Haiku – Attempt 1

Unfortunately, the first attempt with the Anthropic Haiku LLM did not go very well. The strategy was to lead off with the same initial prompt given to OpenAI ChatGPT-4o and then respond accordingly to the LLM recommendations. Figure 10 captures all the prompts sent to Haiku (for brevity, the intermediate LLM responses are omitted) until it was decided to give up. Knowing when to quit and start over is not straightforward. When working with LLMs, results sometimes deteriorate, and it becomes apparent that things are beyond recovery. In these cases, it is best to clear the chat memory and get a fresh start.


Figure 10: All the prompts sent to Haiku in Attempt 1



Code Creation With Anthropic Haiku – Attempt 2

For the second attempt, a different prompt more aligned with the wording from Req1 was used. Figure 11 captures all the prompts leading up to success in this round.


Figure 11: The prompts leading up to code creation success with Haiku in Attempt 2



Figure 12 shows the Python script that generated output identical to that captured in Figure 9. Notably, the script in Figure 12 is quite different from the GPT-4o-generated script shown in Figure 8: it is smaller and takes advantage of the pandas and NetworkX Python packages.

Figure 12: The Haiku-generated Python script
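
The script itself is not reproduced here, but for illustration, a pandas/NetworkX solution to the same requirements might look roughly like the sketch below. It is not the code in Figure 12, and the “caller” and “callee” column names are assumptions about the CSV header.

import pandas as pd
import networkx as nx

def unique_paths_nx(csv_path, target, out_path):
    # Load the caller/callee pairs and build a directed graph.
    df = pd.read_csv(csv_path)
    g = nx.from_pandas_edgelist(df, source="caller", target="callee",
                                create_using=nx.DiGraph())

    # Root nodes have no incoming edges, i.e., they are never called.
    roots = [node for node, degree in g.in_degree() if degree == 0]

    # Enumerate every simple (cycle-free) path from each root to the target.
    rows = []
    for root in roots:
        for path in nx.all_simple_paths(g, source=root, target=target):
            rows.append("->".join(path))

    # One path per row in the destination CSV.
    pd.DataFrame(rows).to_csv(out_path, index=False, header=False)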



More on Prompting

Given that the ChatGPT-4o experiment captured in Figure 7 required three prompts, one might intuitively conclude that combining all three into a single prompt would work and be the most efficient method. In practice, the intuition proved true: the consolidated single prompt shown in Figure 13 produced a script that generated the correct output file.


Figure 13: The consolidated single prompt provided to ChatGPT-4o



However, given the Anthropic Haiku Attempt 1 failure after fourteen unsuccessful prompt responses shown in Figure 10, one might consider alternative prompt strategies altogether. A well-known research paper (Wei, Wang, Schuurmans, Bosma, Ichter, Xia, Chi, Le, Zhou, 2022) draws a comparison between Standard Prompting and Chain-of-Thought Prompting. Instead of a conventional, open-ended approach to prompting that may intuitively feel “more conversational,” this research suggests that using a series of intermediate reasoning steps, likened to chains of logic, yields better LLM responses to queries involving complex reasoning. Figure 14 highlights the additional intermediate reasoning steps.


Figure 14: Chain-of-Thought prompt with additional intermediate reasoning steps



After a brief amount of time spent with the Chain-of-Thought prompting strategy in further experiments, it became clear that more learning is necessary. Notably, it was observed that over-engineering a prompt can result in poor responses; exploring this further is out of the scope of this study.

Testing Considerations

Testing is a critical component whether the script is manually written or generated by an LLM. When developing a script, it is essential to verify its functionality through testing. Typically, this involves using a reduced test code base, for example, code with known call graph data and expected results. The test design might include something like ten to twenty call paths, which are representative and sufficient to build confidence that the script will work correctly with a larger code base containing hundreds or thousands of call paths. Therefore, as a script is developed, it needs to be evaluated against the test code base. This testing approach is equally valid and necessary when an LLM is used to generate scripts from prompts. In this study, because the input file was a real project file rather than a test file, the output of the manually crafted script, containing the 106 unique paths specified by Req1 and Req2 for the actual project code base, served as the expected result against which the LLM-generated scripts' output was evaluated.
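
As a sketch of that evaluation step (the file names here are hypothetical), the check can be as simple as confirming that a generated script's output contains exactly the same set of paths as the baseline; sorting first makes the comparison insensitive to the order in which different scripts happen to emit the paths.

def outputs_match(expected_path, actual_path):
    # Compare two path-list files as sorted sets of rows, ignoring
    # ordering differences and blank lines.
    def load(path):
        with open(path) as f:
            return sorted(line.strip() for line in f if line.strip())
    return load(expected_path) == load(actual_path)

if __name__ == "__main__":
    # Hypothetical file names: the baseline from the hand-crafted script
    # versus the output of an LLM-generated script.
    ok = outputs_match("baseline_paths.csv", "llm_generated_paths.csv")
    print("MATCH" if ok else "MISMATCH")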

Conclusion

This case study compared a traditional software engineering approach (including web searches of Stack Overflow and similar forums, where one typically encounters partial solutions requiring programmer will and brute force to solve the problem) with an approach using LLMs. The comparison revealed that what took days could be accomplished in a few hours, even minutes. The findings are summarized in Figure 15:

Figure 15: Summary of the case study findings



As previously alluded to, the Chain-of-Thought experiments did not work so well. A couple of hours of additional work went into this excursion, where the threshold for declaring failure was set at the number of prompts it took to reach success in the standard prompt case. The big takeaway is that even if one spent an entire day experimenting with various prompts, for this case study it still would have reduced the overall engineering effort by two-thirds. And the upside is that time spent in prompt experimentation builds skills that will prove beneficial when confronted with the next problem in need of a solution.

Personal Thoughts

During a recent conference call on the use of LLMs, I was introduced to a book on how people confront change (Johnson, 2002). After discovering it was an Amazon best seller with millions of copies sold, I wondered how it had escaped my attention for so long. To summarize, the book is a quick read about mice and “Littlepeople” (creatures the size of mice that act like people) in a maze, and how they react when their respective cheese stores unexpectedly move to an unknown location. The book allows for multiple interpretations of its message, making it a malleable guide to self-reflection on how we react and adapt to ever-changing environments. Change may not always be for the better, but it sometimes is, and it is inevitable nevertheless. Echoing Meadows again, this story is another great example of an explanatory model.

As a product of the 1980s software engineering generation, I recall my hesitation when I was first introduced to the web search interface. After all, my bookcases were full of books that I had relied on for years to get my work done. However, I soon discovered the power of the internet, and there was no looking back. Admittedly, while I still hang on to many of these old texts, I rarely use them in my day-to-day work. In a metaphorical sense, web-only search has become my books of the past, and the LLM prompt is now the latest technology. Changing habits is not easy, however, especially when going straight to web search has become second nature. Therefore, I am actively retraining myself to make the LLM prompt my primary go-to option; other search activities, like my old books, are still there if I need them.

References

Rigdon, G. (2010, July). Static Analysis Considerations for Medical Device Firmware. Embedded Systems Conference Proceedings.

Rigdon, G., Doshi, H., Zheng, X. (2010, July). Static Analysis Considerations for Stack Usage. Embedded Systems Conference Proceedings.

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022).

Johnson, S. (1998, 2002). Who Moved My Cheese? G. P. Putnam’s Sons.

Hopstock, S. (2022, November). Call Graphs: The Bread and Butter of Program Analysis. guardsquare.com. https://www.guardsquare.com/blog/call-graphs-the-bread-and-butter-of-program-analysis

Leveson, N. (1995). Safeware. Addison-Wesley.

