How Calculus Solved Our Mobile Waiting Problem

Waiting for an action to complete sounds trivial until you try to do it well. While building mobile native support on the fore ai platform, we had to figure out when an action (a tap, a text entry, a swipe) was actually done, so that we could take a screenshot of the result and compare it with the state before the action to validate that the action was generated correctly. Taking the “before” screenshot is easy. Knowing when to take the “after” one is the hard part.

The first thing that comes to mind is to take the screenshot as soon as Appium responds to the action (i.e. right after a click, send_keys or other action returns). In most cases that is too soon: the application might still be loading an image or opening a new activity. The second most obvious option is a fixed timeout, but how can one be sure that it does not fire too soon? One could set a large timeout, but then every action waits longer than it needs to and the whole test generation becomes slower. We need an adaptive approach that is both reliable and fast.

One solution would be to introduce a new model that constantly checks the state of the app and adaptively tells us when the action has finished. This is, however, quite expensive, both in cost and in execution time during runs. We need a programmatic solution that is cheap yet adaptive. But what would “adaptive” even look like here?

Exploration

With problems like this, it always helps to look at the data to understand the process better. Let us take a car booking flow test as an example. Suppose we want to analyse the following five actions of the flow: Accept Cookies, Continue as guest, Dismiss notifications, Select pickup station field and Enter pickup location. For these we plot the number of frames drawn since the start of the test. Getting frame stats is quite simple on an emulator: just run the command adb shell dumpsys gfxinfo com.your.app.package in the terminal. After each step is executed, a fixed timeout of 8 seconds is applied. Note that the frame stats are sampled every second, whereas the actions run in a different thread, so the timings are not synchronised.
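As a rough sketch of how the frame count could be sampled from Python, the snippet below shells out to the same adb command and reads the "Total frames rendered" line from its output. The helper names parse_total_frames and read_frame_count are hypothetical, and the exact gfxinfo output can differ between Android versions, so treat the regular expression as an assumption to verify on your own device.

```python
import re
import subprocess

# Assumption: the gfxinfo dump contains a summary line of the form
# "Total frames rendered: <number>". Check this on your Android version.
_TOTAL_FRAMES = re.compile(r"Total frames rendered:\s*(\d+)")


def parse_total_frames(dumpsys_output: str) -> int:
    """Extract the cumulative frame count from a gfxinfo dump."""
    match = _TOTAL_FRAMES.search(dumpsys_output)
    if match is None:
        raise ValueError("no 'Total frames rendered' line found")
    return int(match.group(1))


def read_frame_count(package: str) -> int:
    """Run adb against the connected device or emulator and return the frame count."""
    result = subprocess.run(
        ["adb", "shell", "dumpsys", "gfxinfo", package],
        capture_output=True,
        text=True,
        check=True,  # raise if adb itself fails
    )
    return parse_total_frames(result.stdout)
```

Calling read_frame_count("com.your.app.package") once per second and appending each result to a list would give a series like the one plotted here.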

As one can see, after the first three actions the graph stabilises (no new frames are drawn). After the last two actions, however, the frame count rises steadily. This is because the screen contains an input field which, once it has been typed into, shows a text cursor blinking at a constant rate.

So how do we programmatically determine the point at which our graph stabilises, that is, the point from which it either grows linearly at a constant rate (the blinking animation) or does not change at all (no animation)? To answer this question, we have to dust off some high-school calculus!

Function derivatives

As one might remember from high school, the derivative of a function gives the rate of change of its output with respect to its input. In our case, for the function $f(t)$ of frame count against time, regions where no frames are drawn have derivative $f'(t)=0$, and regions with a constant animation have $f'(t)=k$, $k>0$. Let us plot the derivative of $f(t)$.
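Since the frame counter is sampled at discrete times rather than given as a formula, the derivative can be approximated by the difference between consecutive samples. A minimal sketch, assuming a fixed sampling interval (one second in the exploration above); the function name first_difference is hypothetical and the sample values are invented purely for illustration:

```python
def first_difference(counts):
    """Approximate f'(t) for samples taken at a fixed interval.

    The result is in frames per sampling interval.
    """
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


# Illustrative values only, not measurements: a burst of frames, a pause,
# then a steady one-frame-per-sample animation.
counts = [0, 12, 30, 30, 30, 31, 32, 33]
print(first_difference(counts))  # [12, 18, 0, 0, 1, 1, 1]
```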

This does not look very helpful, but do not give up! The trick is to take the derivative a second time. Why? From the graph above it becomes clear that we are searching for regions where the first derivative is constant, i.e. $f'(t)=k$, $k \geq 0$. In other words, the rate of change of the first-order derivative should be zero, or $f''(t)=0$. Let us plot the chart and mark the points where the second derivative becomes zero after each step.
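On sampled data the second derivative becomes a second difference, and "becomes zero" can be checked numerically. The sketch below is one possible way to do that, not necessarily the implementation used on the platform. The names stabilisation_index, tolerance and window are hypothetical, and requiring a short run of near-zero values rather than a single zero is an assumption added to avoid stopping on a coincidental zero.

```python
def second_difference(counts):
    """Approximate f''(t) as the difference of consecutive first differences."""
    first = [later - earlier for earlier, later in zip(counts, counts[1:])]
    return [later - earlier for earlier, later in zip(first, first[1:])]


def stabilisation_index(counts, tolerance=0, window=2):
    """Return the index of the first sample at which the last `window`
    second differences are all within `tolerance` of zero, or None."""
    run = 0
    for i, value in enumerate(second_difference(counts)):
        run = run + 1 if abs(value) <= tolerance else 0
        if run >= window:
            # second_difference[i] is built from counts[i], counts[i+1], counts[i+2]
            return i + 2
    return None


# Illustrative values only: settles into a steady blinking-cursor slope.
print(stabilisation_index([0, 12, 30, 30, 30, 31, 32, 33]))  # 7
# Illustrative values only: settles into no animation at all.
print(stabilisation_index([0, 10, 25, 25, 25, 25]))  # 5
```

Both the flat case and the constant-slope case are caught by the same check, which is the point of going to the second derivative.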

Comparing the marked points with the original $f(t)$, our estimate stops right when the animation becomes constant or there is no animation at all. It is not perfect (apps with continuous background animation or video playback will need extra handling), but for the vast majority of flows, the second derivative of a frame counter is enough. A nice reminder that not every problem in mobile testing needs a model.

That said, this approach does have its limits. Animations with irregular or non-constant frame rates (think physics-based spring animations, or loaders that ease in and out) can produce a second derivative that never fully settles. A hard maximum waiting time guards against waiting forever in those cases. In practice, we found these edge cases rare enough in the flows we test that the heuristic holds up well. For the few flows where it struggles, a small per-flow override on the stabilisation threshold is enough to bring it back in line. Overall, the approach has proven robust for our use case and remains a lightweight alternative to polling a vision model after every action.
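Putting the pieces together, a waiting loop with a hard maximum waiting time and an overridable threshold might look roughly like the sketch below. All names and defaults here (wait_until_stable, read_count, interval, max_wait, tolerance, window) are assumptions for illustration. The 8 second cap simply mirrors the fixed timeout used in the exploration, and read_count stands for any callable that returns the current cumulative frame count.

```python
import time


def wait_until_stable(read_count, interval=1.0, max_wait=8.0,
                      tolerance=0, window=2, sleep=time.sleep):
    """Poll the frame counter until its second difference stays near zero.

    Returns (stable, elapsed_seconds). `stable` is False when the hard
    `max_wait` cap was reached before the counter settled.
    """
    counts = [read_count()]
    elapsed = 0.0
    run = 0  # consecutive near-zero second differences seen so far
    while elapsed < max_wait:
        sleep(interval)
        elapsed += interval
        counts.append(read_count())
        if len(counts) >= 3:
            second_diff = counts[-1] - 2 * counts[-2] + counts[-3]
            run = run + 1 if abs(second_diff) <= tolerance else 0
            if run >= window:
                return True, elapsed
    return False, elapsed
```

A per-flow override then amounts to passing a different tolerance (or window) for the flows that need it, while max_wait bounds the worst case for animations that never settle.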
