Performance has a significant impact on user satisfaction, loyalty, and, consequently, your app's revenue, so app developers should never overlook it. You can read more about this topic in Boosting App Revenue: 6 Reasons Why Mobile Performance Matters.
Consider this scenario: you're feeling hungry and attempt to order food through Food Delivery App A. However, App A takes an eternity to load, leaving you hungry and frustrated. In a world with numerous food delivery options, you swiftly switch to Food Delivery App B, which offers quick order placement and delivery, effectively meeting your needs. As a result, you become a loyal customer of the fast app, while the slow App A leaves a negative impression. This personal experience vividly illustrates the importance of app performance for user satisfaction and loyalty.
If this experience hasn't resonated with you yet, let's take a look at some data:
- According to mobile marketing analytics vendor AppsFlyer, nearly one in every two apps is uninstalled within 30 days.
- A study by Andrew Chen found that losing 80% of mobile users is "normal" for all but the most popular apps.
- A report by Think Storage Now found that 70% of mobile app users will abandon an app that takes too long to load.
- An older but still often cited Compuware study found that 84% of app users will abandon an app if it fails just two times.
In this competitive market, mobile app developers must prioritize continuous availability and a seamless user experience to reduce churn. Mobile users expect instant responses to taps, swipes, or any inputs. To ensure a consistently excellent user experience, continuous monitoring and enhancing app performance are paramount to retaining users and succeeding in the highly competitive mobile app landscape.
When it comes to optimizing existing code or writing efficient code from scratch, it turns out that most optimization opportunities fit into just a few patterns. These patterns are the most common, have the greatest impact on the user experience, and are often relatively low effort to address.
Pattern #1. Wasting time in unrecognized queues
One of the most common code patterns slowing down your app is wasting time in unrecognized queues. When you design a mobile app and try to implement some feature or page, you may have multiple processes running concurrently, utilizing multiple threads.
Even if the process you design is fine on its own, once you schedule work across multiple threads, you will inevitably run into bottlenecks when scheduling something to execute on a parallel thread, as shown in the example below.
Instead of the function you scheduled executing instantly, it only runs after a delay. That delay is not caused by your process but by unrelated work blocking the thread it is trying to execute on.
This happens most commonly on the main thread because many methods must be executed there. However, it can also occur on threads dedicated to a particular purpose, such as local database updates.
These queues often remain hidden within the implementation details of the libraries or dependencies we employ. For instance, consider this diagram showcasing the behavior of the widely used OkHttp library.
OkHttp is popular among developers and appears as a transitive dependency because of its use for networking in third-party libraries like ExoPlayer.
In this example, imagine an action on the main thread, like a button press, triggering a network request. OkHttp employs a default dispatcher with a somewhat obscure implementation detail: for asynchronous calls, it limits concurrent requests to the same host to just five (the default value of the dispatcher's maxRequestsPerHost setting). Therefore, if five requests to a host are already in progress when you attempt to initiate a sixth, the additional request has to wait for one of the existing requests to finish before it can be executed.
In the diagram above, we illustrate that OkHttp thread #2 completes one of these network requests, allowing our pending network request, which happens to be the sixth, to proceed on a separate thread for data retrieval. This situation is not unique to OkHttp; similar situations can occur in various contexts. For example, the Room database restricts transactions to one at a time. Additionally, misunderstandings about how code behaves can lead to these bottlenecks. In RxJava applications, operations that were meant to run in parallel may execute sequentially.
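The queueing effect described above can be reproduced without any networking library. The following is a minimal sketch in plain Java, assuming a fixed-size thread pool as a stand-in for a dispatcher with a concurrency limit; `HiddenQueueDemo` and `queueWaitMillis` are hypothetical names, and this is not OkHttp's implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a dispatcher that runs at most `limit` tasks at once.
class HiddenQueueDemo {

    /**
     * Occupies all `limit` slots with simulated slow requests, submits one more
     * task, and returns how many milliseconds that task waited before starting.
     */
    static long queueWaitMillis(int limit, long busyMillis) {
        ExecutorService dispatcher = Executors.newFixedThreadPool(limit);
        try {
            CountDownLatch allSlotsBusy = new CountDownLatch(limit);
            for (int i = 0; i < limit; i++) {
                dispatcher.execute(() -> {
                    allSlotsBusy.countDown();
                    sleep(busyMillis); // simulated in-flight request
                });
            }
            allSlotsBusy.await(); // every slot is now taken

            long submittedAt = System.nanoTime();
            long[] startedAt = new long[1];
            CountDownLatch extraStarted = new CountDownLatch(1);
            dispatcher.execute(() -> {
                startedAt[0] = System.nanoTime(); // runs only once a slot frees up
                extraStarted.countDown();
            });
            extraStarted.await();
            return TimeUnit.NANOSECONDS.toMillis(startedAt[0] - submittedAt);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            dispatcher.shutdown();
        }
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("extra task waited ~" + queueWaitMillis(5, 200) + " ms");
    }
}
```

The extra task does no slow work itself, yet its start is delayed by roughly the time the busy slots take to free up. That wait is invisible at the call site, which is why a profiler or tracing tool is usually needed to spot it.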
Understanding such situations can be challenging, making it imperative to employ tools to help understand what's happening under the hood.
Pattern #2. The main thread is overwhelmed
The second pattern is closely related to the previous one and could even be considered a special case of it: the queue in question is the main thread itself. Many engineers are already aware of this problem and track it. It occurs when too much work happens on the main thread, and it affects the user's experience because some important calls can only run there, including rendering each frame you see on the screen.
The problem is most visible when something on the screen should move continuously, such as a scroll, an animation, or a transition, and you expect smooth, even movement from frame to frame. Here the time you can afford to spend waiting in the queue keeps shrinking: as phones become more advanced, manufacturers and app developers aim for ever higher refresh rates. On most modern Android phones, you can get a frame rate as high as 120 frames per second, which leaves just over eight milliseconds between consecutive frames. Within this interval, you have to fit the rendering of the next frame and any other work you want to execute on the main thread. As a result, even a tiny 10-millisecond delay affects the user experience.
As you can see in the scrolling example below, it can cause a frame drop, resulting in screen jitter (aka jank).
The user will immediately notice it, unlike in other cases where a 10-millisecond delay is negligible.
Most Android phones operate at 60 frames per second, which allows about 16 milliseconds per frame. Even though 10 milliseconds of work fits within one frame, it uses up most of the time available. An important point about frame rates is that you cannot rely on the Android operating system alone to tell you when you're dropping frames. Android only sends a missed frames message when 30 frames or more are dropped. At 60 frames per second, that amounts to about half a second of delay, which would be very noticeable in any app and circumstance.
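The frame-budget arithmetic above can be written down as a small helper. This is a sketch under a simplifying assumption: the blocking work starts exactly at a frame boundary and nothing else runs on the main thread. `FrameBudget` and its methods are hypothetical names, not an Android API.

```java
// Hypothetical helper for reasoning about frame budgets; not part of Android.
class FrameBudget {

    /** Time available to produce one frame, in milliseconds. */
    static double budgetMillis(int framesPerSecond) {
        return 1000.0 / framesPerSecond;
    }

    /**
     * Whole frame deadlines missed when the main thread is blocked for
     * blockedMillis, assuming the block starts right at a frame boundary.
     */
    static int framesMissed(double blockedMillis, int framesPerSecond) {
        return (int) Math.floor(blockedMillis * framesPerSecond / 1000.0);
    }

    public static void main(String[] args) {
        System.out.printf("120 fps budget: %.2f ms%n", budgetMillis(120));
        System.out.printf("60 fps budget: %.2f ms%n", budgetMillis(60));
        System.out.println("10 ms block at 120 fps misses " + framesMissed(10, 120) + " frame(s)");
        System.out.println("10 ms block at 60 fps misses " + framesMissed(10, 60) + " frame(s)");
    }
}
```

Under this model, a 10-millisecond block costs one frame at 120 frames per second but none at 60, and the 30-frame threshold at 60 frames per second corresponds to a 500-millisecond stall.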
One example of moving work off the main thread is provided by the author of a popular Android library called Lottie. Lottie is an animation library that does two things:
- it draws the animation on the screen
- it computes an update for the next frame.
Before a recent release of the library, the update computations ran on the main thread before drawing. This is demonstrated in the diagram below.
The recent release moved the update step to a worker thread. Once a frame has been drawn from the last computed state, the update for the next frame can start immediately on the background thread. That frees up the main thread for additional drawing or other work that must happen there.
Furthermore, the update takes quite a bit longer than the drawing, so the change dramatically reduced the main thread footprint of the animations. The tradeoff is a slightly more complicated program flow that requires concurrency primitives such as synchronized blocks and locks. But the result is a significant improvement that benefits all of Lottie’s many users.
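The draw-then-update-in-background flow can be illustrated with a minimal sketch in plain Java. This is not Lottie's code: `OffThreadAnimator`, its methods, and the trivial `expensiveUpdate` are hypothetical, and a single-thread executor plays the role of the worker thread. The synchronized blocks correspond to the concurrency primitives the tradeoff mentions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical animator illustrating the idea; not Lottie's implementation.
class OffThreadAnimator {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Object lock = new Object();
    private double computedProgress = 0.0; // guarded by lock
    private volatile String lastUpdateThread = "none";
    private Future<?> pendingUpdate;

    /** Called once per frame on the "main" thread: draw, then schedule the next update. */
    String drawFrame(double nextProgress) {
        String frame;
        synchronized (lock) {
            frame = "frame@" + computedProgress; // cheap: uses the last finished computation
        }
        pendingUpdate = worker.submit(() -> {
            double updated = expensiveUpdate(nextProgress); // heavy work, off the main thread
            synchronized (lock) {
                computedProgress = updated;
            }
            lastUpdateThread = Thread.currentThread().getName();
        });
        return frame;
    }

    /** Stand-in for the real update math; here it only clamps to [0, 1]. */
    private static double expensiveUpdate(double progress) {
        return Math.max(0.0, Math.min(1.0, progress));
    }

    /** Helper for demos and tests: waits for the scheduled update. A real draw loop would not block. */
    void awaitPendingUpdate() {
        try {
            if (pendingUpdate != null) {
                pendingUpdate.get();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    String lastUpdateThread() {
        return lastUpdateThread;
    }

    void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        OffThreadAnimator animator = new OffThreadAnimator();
        System.out.println(animator.drawFrame(0.5));
        animator.awaitPendingUpdate();
        System.out.println(animator.drawFrame(1.0));
        animator.shutdown();
    }
}
```

Each call to `drawFrame` only reads the already computed state under the lock, so the time spent on the calling thread stays small regardless of how long the update takes.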
So far, previous patterns covered cases where the process we're examining is well-designed, and the focus was on how it interacts with the rest of the app's processes. Now, we're shifting our attention to the process itself and how we can optimize its execution.
The next three categories are all related to one central pattern or improvement opportunity: dispatching tasks to parallel threads as early as possible. This is a fantastic opportunity to improve user experience by reducing wait time. However, it's not always straightforward to implement in practice because we simply don't always spot it while writing code.
Let's identify the next three common situations where dispatching happens too late and could benefit from optimization.
Pattern #3. Unnecessary UI dependencies
The first one is Unnecessary UI Dependencies. It occurs when a part of an app, running on a parallel thread that's not visible to the user (like loading content from a server), depends on a certain view update. As shown in the example below, you might have two network requests that could be sent in sequence, but often, the first request updates the UI and delays the second one. The key is to schedule the second network request as early as possible once the data from the first one is parsed, even if it's just a few frames earlier. This can save valuable user wait time, sometimes even seconds.
You might create delays if you wait for a UI update, especially an animation at the beginning of page loading. Often, these are seen as product or design decisions, but they can impact loading times. Avoid dependency on animation and other UI updates for a smoother user experience.
Below, we can see two apps designed for trading stocks. Both apps have similar pages with overviews of traded companies. However, there's a noticeable difference in handling data display and loading.
Flink is significantly slower at displaying retrieved data, whereas Robinhood presents data much more quickly. Both apps have a transition animation revealing the next screen (the company details page). Both apps show stock data, with a line chart showing changes in the stock price. While both apps reveal the new page at a similar time, the one on the left shows the stock price chart much later.
A common and straightforward reason for such a delay in showing the data is that the app on the left only starts downloading chart data when the transition animation is complete, and the layout is ready for the data to appear. Such a dependency creates an inevitable visually noticeable delay. Alternatively, downloading the same data during the animation helps utilize user wait time efficiently and create a smoother experience.
You’ll see this sort of main thread dependency because new developers might be taught a certain architectural way to do it in tutorials. However, these tutorials might only consider the ease of understanding the code without the impact on performance. Consider the code example below, which is representative of many beginner tutorials.
You have a ListActivity, say, a list of stocks. When you tap on a stock in the list, you make a transition to a new DetailActivity, and once the fragment has animated onto the screen, inflated all of its views, and done all the work that it needs to do in its onCreate method, only then do you start to load the data.
It’s straightforward to move the couple of lines at the very bottom of the code above (the client.newCall call) into the button's click listener and start the network request as soon as the tap occurs. That makes the animation more delightful because the data is loading while the user watches it. The time spent animating is also time spent loading, so the data appears sooner.
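The effect of moving the request into the tap handler can be sketched in plain Java with the animation and the network call simulated by sleeps. Everything here is an assumption made for illustration: `EarlyFetchDemo`, `playTransition`, and `fetchDetails` are hypothetical names, and no Android or OkHttp API is used.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation of "load after animation" versus "load on tap".
class EarlyFetchDemo {

    static String fetchDetails(long networkMillis) {
        sleep(networkMillis); // simulated network round trip
        return "chart data";
    }

    static void playTransition(long animationMillis) {
        sleep(animationMillis); // simulated screen transition
    }

    /** Tutorial-style flow: the request starts only after the transition has finished. */
    static long loadAfterAnimation(long animationMillis, long networkMillis) {
        long start = System.nanoTime();
        playTransition(animationMillis);
        fetchDetails(networkMillis);
        return elapsedMillis(start);
    }

    /** Request starts at the tap, so the download overlaps with the transition. */
    static long loadOnTap(long animationMillis, long networkMillis) {
        ExecutorService io = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            CompletableFuture<String> details =
                    CompletableFuture.supplyAsync(() -> fetchDetails(networkMillis), io);
            playTransition(animationMillis);
            details.join(); // data is often already there when the animation ends
            return elapsedMillis(start);
        } finally {
            io.shutdown();
        }
    }

    static long elapsedMillis(long startNanos) {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("after animation: ~" + loadAfterAnimation(200, 200) + " ms");
        System.out.println("on tap: ~" + loadOnTap(200, 200) + " ms");
    }
}
```

In the first flow, the time to data is roughly the animation time plus the network time; in the second, it is roughly the longer of the two.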
Pattern #4. Network requests: late scheduling
The fourth pattern is highly focused on network requests and their scheduling. This pattern has some intersection with the previous one. The common issue this pattern addresses is network requests being scheduled too late.
The reason for identifying it as a separate pattern is that network requests usually account for most of the wait time in many apps. Therefore, any optimizations related to rescheduling and parallel execution of network requests can significantly improve the user experience by reducing wait times.
Often, the problem arises when dispatching a network request to the server, as you have to choose when to schedule it. Ideally, schedule it as early as possible. However, what often happens is that you end up waiting, either for a UI update (as mentioned in the previous example) or for various other reasons, particularly around app start-up.
So, anytime you're scheduling a significant network request that your user will be waiting for, take a moment to reconsider and ask yourself if you could have started it earlier.
This pattern is especially interesting when we examine traces around app start. We often encounter lengthy app start times during which a lot of work is being done. It is quite common for modern apps to use various libraries for analytics, authentication, or crash reporting. These libraries often advise initializing them in the Application object’s onCreate method. This may be necessary, but sometimes you can delay these initializations.
Offloading some tasks that typically occur at app startup can help you reach a usable screen more quickly. For instance, you can initiate a network request in a service before loading all these other libraries you'll use later.
Another interesting observation is the impact of dependency injection in this context. At the app's start, many dependency injection frameworks will inflate an object graph of all the objects in the application scope. That makes it important to scope objects in their dependency-injection-related configurations properly. Proper scoping prevents loading objects at application startup that could be deferred until later in the user flow or might not even be necessary.
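One way to defer such work until it is actually needed is lazy initialization. The following is a minimal sketch in plain Java; `Lazy` is a hypothetical helper written for this example (dependency injection frameworks and Kotlin offer their own equivalents), and the "analytics" object in the usage is a placeholder string rather than a real SDK.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical helper: runs an expensive initializer on first use instead of
 * at application startup. The initializer must not return null.
 */
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> initializer;
    private volatile T value;

    Lazy(Supplier<T> initializer) {
        this.initializer = initializer;
    }

    @Override
    public T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                result = value;
                if (result == null) {
                    result = initializer.get(); // runs at most once
                    value = result;
                }
            }
        }
        return result;
    }
}

class LazyStartupDemo {
    public static void main(String[] args) {
        AtomicInteger initializations = new AtomicInteger();
        // Declared at startup, but nothing expensive happens yet.
        Lazy<String> analytics = new Lazy<>(() -> {
            initializations.incrementAndGet();
            return "analytics-sdk"; // placeholder for a real SDK object
        });
        System.out.println("initialized at startup: " + initializations.get());
        analytics.get(); // first real use, later in the user flow
        analytics.get(); // reuses the same instance
        System.out.println("initialized after use: " + initializations.get());
    }
}
```

Whether a given library tolerates late initialization depends on the library. Crash reporting, for example, may need to be active early, so this is a decision to make per dependency.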
Pattern #5. Network requests: sequential requests that could run in parallel
This pattern is also related to network requests. In many page-loading scenarios, you need to fetch data from the server. Very often, we see multiple network requests that could be running in parallel but are executed sequentially. There can be multiple reasons for this, but the most actionable one is failing to recognize that the requests are, in fact, independent.
If you have a sequence of network requests in your loading pattern, ask yourself whether the second or third one needs the data from the previous one to run. You may have enough information to schedule them in parallel or separate them so they can be scheduled in parallel. One reason parallelization is so significant is that network delays are often considered issues on the back-end. Back-end changes may require a lot of resources. But a lot of network-related optimization can be done on the front-end.
Often, when optimizing the back-end, an improvement of a few percent, or even 10%, is already considered a big win. On the front-end, if you take two requests that were running in sequence and run them in parallel, you can save something like 30% to 50% of the wait time. The shorter network request has its effective delay reduced to zero because it moves off the critical path, which reduces the total wait to that of the longer request. So this is a very cheap improvement in engineering effort, but it gives you an outsized reduction in the delay.
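The saving can be demonstrated with two simulated, independent requests. This is a sketch in plain Java, assuming sleeps in place of real network calls; `ParallelRequestsDemo`, `request`, and the request names are hypothetical. An explicit two-thread executor is used so that the two requests are guaranteed their own threads instead of depending on the size of a shared pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation of two independent requests needed by one page.
class ParallelRequestsDemo {

    static String request(String name, long millis) {
        try {
            Thread.sleep(millis); // simulated network round trip
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name;
    }

    /** Second request waits for the first even though it does not need its data. */
    static long sequentialMillis(long firstMillis, long secondMillis) {
        long start = System.nanoTime();
        request("profile", firstMillis);
        request("feed", secondMillis);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Both requests start immediately; the page waits only for the slower one. */
    static long parallelMillis(long firstMillis, long secondMillis) {
        ExecutorService io = Executors.newFixedThreadPool(2);
        try {
            long start = System.nanoTime();
            CompletableFuture<String> profile =
                    CompletableFuture.supplyAsync(() -> request("profile", firstMillis), io);
            CompletableFuture<String> feed =
                    CompletableFuture.supplyAsync(() -> request("feed", secondMillis), io);
            profile.thenCombine(feed, (p, f) -> p + "+" + f).join(); // both results are ready here
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            io.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sequential: ~" + sequentialMillis(200, 300) + " ms");
        System.out.println("parallel: ~" + parallelMillis(200, 300) + " ms");
    }
}
```

With simulated durations of 200 ms and 300 ms, the sequential wait is about 500 ms and the parallel wait about 300 ms, a saving of roughly 40%. In general, the saving is the shorter duration divided by the sum of both, which reaches 50% when the two requests take equally long.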
Given that previously discussed patterns were similar and highly intersecting, here is a bonus that stands out from this crowd and doesn’t stop surprising us.
Bonus pattern: timer dependency and other legacy code
Another issue that can affect the performance of an app is the quick workaround: a trade-off made to ship features fast.
Unfortunately, this kind of workaround tends to stay in the code much longer than one might expect. This isn’t just a factor for smaller apps; we have also found cases in huge apps with tons of legacy code. Among those quick fixes, the one we see most often is the hard-coded delay: a timer or fixed wait used to work around race conditions or to ensure that resources are ready before making method calls that would otherwise fail.
Consider the figure below.
If you cannot check when a certain event is finished to schedule the second one, you might introduce a hard-coded delay to cover the majority of the use cases. While this may work to get the app out to market, you will find that all your users will be limited by the behavior on the lowest-end devices (on the slowest phones on the market) because this hard-coded delay must work for all your users.
It would be better to eliminate the hard-coded delay by detecting when it is possible to move forward rather than waiting an arbitrary time. While it may require more sophisticated code to ensure each device experiences the best performance possible, it is worth the tradeoff. Furthermore, the gains carry forward: as devices get faster, the app automatically waits less, whereas a hard-coded delay stays the same.
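The difference between a fixed delay and a readiness signal can be sketched in plain Java. The resource, its initialization time, and all names here (`ReadinessDemo`, `initResource`, `useAfterFixedDelay`, `useWhenReady`) are hypothetical; a `CompletableFuture` completed by a background thread stands in for whatever callback or event a real component offers.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical simulation of a resource that takes a device-dependent time to initialize.
class ReadinessDemo {

    /** Starts initialization in the background and returns a future completed when it is done. */
    static CompletableFuture<String> initResource(long deviceMillis) {
        CompletableFuture<String> ready = new CompletableFuture<>();
        Thread init = new Thread(() -> {
            sleep(deviceMillis); // fast devices finish sooner, slow ones later
            ready.complete("ready");
        });
        init.setDaemon(true);
        init.start();
        return ready;
    }

    /** Legacy approach: wait a fixed time and hope the resource is ready by then. */
    static String useAfterFixedDelay(long deviceMillis, long delayMillis) {
        CompletableFuture<String> resource = initResource(deviceMillis);
        sleep(delayMillis);
        return resource.getNow("NOT READY"); // fails if the device was slower than the delay
    }

    /** Event-driven approach: continue exactly when the resource reports readiness. */
    static String useWhenReady(long deviceMillis) {
        return initResource(deviceMillis).join();
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("fast device, 500 ms delay: " + useAfterFixedDelay(50, 500));
        System.out.println("slow device, 100 ms delay: " + useAfterFixedDelay(600, 100));
        System.out.println("fast device, signal: " + useWhenReady(50));
    }
}
```

The fixed delay is wrong in both directions: on a fast device it wastes most of the wait, and on a device slower than the chosen delay it still fails. The readiness signal waits only as long as each device needs. On Android, the blocking `join()` would typically be replaced by a non-blocking callback so the main thread is never held up.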