Intro

Saturn is a community-centric social media app that lets students share their schedules with friends, see who's in their classes, and find out when their friends are free during the day. It was recently featured as one of the '50 Most Promising Startups' by The Information.

The background 

Saturn's iOS team has attracted some of the best engineering talent and uses the most advanced tools on the market. Like every fast-growing application, however, they struggled to maintain performance while under pressure to ship new features fast. When we began collaborating, they were adding new features every week. Everything you currently see in the app was transformed during our work on its performance. Our tooling ran regardless of whether features were shipping, much like a continuous integration process, and it proved even more helpful during development than after release. For instance, the "friends" section on the main screen did not exist when we started, and the circle in the middle underwent a complete rewrite.

User Flow: The app cold start

One of the critical issues was the app's cold start: the app only became responsive to user actions after a delay of up to 4 seconds, even on the latest iPhones.

The issue wasn’t even detectable with their existing tools. 

The standard approach, used by common observability tools, measures app start by tracking Android's onResume() and iOS's applicationDidBecomeActive(_:), the callbacks that fire when the app comes to the foreground. This is easy to measure but often fails to reflect the user experience: if the screen is unresponsive and content hasn't loaded after the splash screen disappears, users don't care that the splash screen was short. Merely activating the app does not mean users can meaningfully interact with it.

It is crucial to distinguish between the splash screen and the app start: focusing solely on optimizing the splash screen can mean overlooking performance opportunities between the splash screen and the home screen, where users can engage with your app meaningfully.
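As a sketch of what measuring to the point of meaningful interaction might look like (StartupClock and the timings below are illustrative, not Saturn's actual code): record a timestamp as early as possible in the process, and report the elapsed time only once real content is on screen, not when the app merely becomes active.

```swift
import Foundation

// Illustrative sketch: StartupClock is a hypothetical helper. The idea is
// to measure from launch to the first meaningful interaction, not merely
// to applicationDidBecomeActive(_:).
enum StartupClock {
    // Capture this as early as possible, e.g. at the top of main().
    static let launch = DispatchTime.now()

    static func elapsedMs() -> Double {
        Double(DispatchTime.now().uptimeNanoseconds - launch.uptimeNanoseconds) / 1e6
    }
}

_ = StartupClock.launch                       // force early initialization

Thread.sleep(forTimeInterval: 0.05)           // simulated pre-foreground work
let becameActiveMs = StartupClock.elapsedMs() // what generic tools report

Thread.sleep(forTimeInterval: 0.10)           // content loading, view inflation…
let interactiveMs = StartupClock.elapsedMs()  // what the user actually waits for

// The gap between the two numbers is the wait that splash-screen-only
// measurement hides from the team.
print(interactiveMs > becameActiveMs)         // true
```

The difference between `becameActiveMs` and `interactiveMs` is exactly the window the text describes: the time after the splash screen disappears but before the user can do anything.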

Other tools allowed them to track only system functions. The screenshot below shows what Saturn could learn from Firebase (Google's monitoring solution) about how long users waited for the application to start:

From that tooling, Saturn engineers could see only how long the app start took, with no clue about what took that long. Moreover, from their own experience using the app, they knew that even this information was misleading.

Firebase and similar tools track only system functions, but every app's start is a unique sequence of both system and custom functions. By monitoring only the system functions, you cannot be sure that users are actually getting what they were waiting for once those functions have executed. In this case, the tools measured the app start up to the moment the logo animation appeared on screen. That is of little use: by design, the logo animation shouldn't be visible at all unless the user has a low-end device or poor network connectivity. Here, the actual time the application needed to start was as long as 4 seconds.

The Product Science team accurately identified the issues and specific sections of code that required attention. With the assistance of our tool, the performance engineers at Saturn managed to reduce the wait time from 4 seconds to 1.8 seconds within the first month. 

Performance optimization is a continuous process, and once the low-hanging fruit is picked, many other opportunities reveal themselves. Soon after, the Saturn team reduced the wait time even further, down to 0.7 seconds in the second month.

Other solutions typically let the development team manually mark certain functions for tracking, and even for those marked functions, visibility is relatively low.

Typically, engineers place these markers in areas they already understand deeply, because that is where their attention is focused. The real issue, however, often resides in an entirely neglected section of the code.
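For context, this manual marking typically looks like Apple's os_signpost API (the subsystem and function names below are illustrative): each region has to be wrapped by hand, so only the code the team already suspects ever gets instrumented.

```swift
import os.signpost

// Illustrative only: manual instrumentation covers just the functions
// someone remembered to wrap, leaving neglected code invisible.
let startupLog = OSLog(subsystem: "com.example.saturn", category: "startup")

func loadSchedule() {
    os_signpost(.begin, log: startupLog, name: "loadSchedule")
    defer { os_signpost(.end, log: startupLog, name: "loadSchedule") }
    // ... the one function someone remembered to instrument ...
}
```

Anything outside such a marked region simply does not show up in the trace, which is how the real bottleneck stays hidden.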

What’s important 

Our experience revolves around visibility. PS Tool gives engineers the ability to see where potential problems lie, allowing them to promptly address them during feature development. Our tool operates before the product ships, assisting engineers before they deliver the final product to clients.

PS Tool played a crucial role in identifying critical issues that occurred when the user launched the app. Specifically, the app was taking a staggering 4 seconds to respond to user actions. These insights were not apparent to the Saturn team when using other tools, such as Sentry, which estimated a wait time of only 0.6 seconds for this particular user flow.

The root cause of the biggest delay, whose removal allowed them to cut the time for the app to become responsive, was that the app inflated one of its major views 8 times instead of the expected once.

Not only was PS Tool able to show that these 8 iterations happened and cost significant extra time, it also highlighted what triggered each of the inflations. Saturn engineers were using iOS publishers to drive view updates, and they had subscribed one of them to changes in a particular value. However, the action fired not only when a different value was assigned but also when the assigned value didn't change. Some iOS engineers consider this common knowledge, but it doesn't follow directly from the Apple documentation whether assigning the same value triggers a publisher. The point is that nobody in the industry, however experienced, can remember every pitfall and always predict exactly how their code will behave. Debugging such issues can take hours or days, while PS Tool highlights the delay and what is causing it in a matter of seconds.
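The behavior is easy to reproduce in isolation. In this sketch (a minimal model, not Saturn's actual code), a Combine CurrentValueSubject fires its subscribers on every assignment, even when the value is unchanged; adding removeDuplicates() is one common fix, and @Published properties behave the same way.

```swift
import Combine

// Hypothetical simplified model: a publisher driving view updates.
final class FriendsModel {
    let friendCount = CurrentValueSubject<Int, Never>(0)
}

let model = FriendsModel()
var rawEmissions = 0
var dedupedEmissions = 0

// Subscribing directly: Combine emits on every assignment,
// even when the new value equals the old one.
let rawSub = model.friendCount
    .dropFirst()                       // ignore the initial replayed value
    .sink { _ in rawEmissions += 1 }

// removeDuplicates() suppresses same-value assignments, so the view
// is only re-rendered when the value actually changes.
let dedupedSub = model.friendCount
    .dropFirst()
    .removeDuplicates()
    .sink { _ in dedupedEmissions += 1 }

// Assign the same value repeatedly, then a new one.
model.friendCount.send(3)
model.friendCount.send(3)
model.friendCount.send(3)
model.friendCount.send(5)

print(rawEmissions)      // 4 — one view update per assignment
print(dedupedEmissions)  // 2 — only genuine changes
```

If each spurious emission re-inflates a heavy view, the raw subscription above is exactly the "8 inflations instead of 1" pattern the trace revealed.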

After the fix

Another scenario occurs when certain tasks are scheduled on threads that are excessively busy, causing them to be blocked without the developer's knowledge.

The code was written in a way that gave the impression that certain portions would execute first, but in reality that was not always the case. Most thread queues operate on a first-in, first-out basis: they do not reorder the scheduled code segments to prioritize them appropriately. If engineers had a clear picture of what was actually happening, they could easily change the sequence.
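A minimal illustration of that FIFO behavior (the queue labels and timings are made up): a serial dispatch queue runs work strictly in submission order, so a startup-critical task enqueued behind slow bulk work waits for all of it.

```swift
import Dispatch
import Foundation

// Illustrative: a serial queue is strictly FIFO, so the task the user is
// waiting on runs only after everything submitted before it.
let sharedQueue = DispatchQueue(label: "com.example.shared") // serial by default
var order: [String] = []
let done = DispatchSemaphore(value: 0)

sharedQueue.async { Thread.sleep(forTimeInterval: 0.05); order.append("bulk-1") }
sharedQueue.async { Thread.sleep(forTimeInterval: 0.05); order.append("bulk-2") }
sharedQueue.async { order.append("critical"); done.signal() } // user-visible work

done.wait()
print(order) // ["bulk-1", "bulk-2", "critical"] — the critical work ran last

// Once the dependency is visible, one fix is simply to give startup-critical
// work its own queue so it never sits behind bulk work:
let criticalQueue = DispatchQueue(label: "com.example.critical",
                                  qos: .userInitiated)
```

The fix here is a one-line reshuffle, but only once the trace makes the hidden queue dependency visible.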

The Constraints of Conventional APM and Observability Tools in Mobile Development

Prior to adopting PS Tool, our clients depended on top-notch APM and observability tools, which they still use for purposes beyond improving mobile application performance. These clients also have highly skilled performance engineers dedicated to resolving performance issues. Yet it was only through Product Science that they achieved notable results quickly.

The primary focus of Product Science is on:

  • performance optimization, and
  • mobile development

In contrast, other solutions primarily focus on:

  • error/API accessibility monitoring, and
  • backend development (with no multi-threading)

While other solutions are shifting towards performance management, their primary focus remains on monitoring.

Product Science surpasses the mere identification of problem areas in an app; we also provide insights into the underlying causes of low performance and offer solutions for improvement. We offer comprehensive visibility into the user experience and pinpoint the root causes of issues, enabling customers to resolve them more efficiently. No other tool in the market offers such a high level of visibility.