6 Things I Learned From WebRTC Stress Testing

Tsahi Levent-Levi at Spearline outlines six things they have learnt from their WebRTC stress testing.

For the last 8 years I’ve been on a journey of WebRTC testing and monitoring with testRTC, and along the way I’ve had the chance to learn a lot from our customers. I’d like to share some of these learnings with you, focusing on WebRTC stress testing:

#1 – WebRTC Stress Testing Comes in Different Shapes and Sizes

When developing a WebRTC application, there comes a point in time when you need to scale that application – make sure it works for more users, in more locations, in more ways.

At that point, you will need to stress test your WebRTC service.

The question to ask yourself is: what is WebRTC stress testing, anyway? Here are a few ways in which I’ve seen our clients stress test their WebRTC app:

Figure out how many users can be crammed into a single session

As active speakers
Passive viewers in a stream or a session
With cameras open

How many users and sessions can fit into a single media server

To decide on sizing
Because we want to optimize the machine specs we end up using in the cloud

Scaling out

Check that as more users/rooms join the service, you load balance them on multiple media servers and signaling servers
See what happens when users from multiple geographies join a single session when your service optimizes for location, connecting users to the closest media server

Registrations per second, calls per second, …

What happens when many users join the service in the exact same second? Do they all succeed? Do some fail to connect?
Figure out what CPS (calls per second) you support properly

Soak testing

If we load the service for a long period of time – what happens?
Do we observe crashes? Memory leaks? Increased latency?

End-to-end

Stressing the web servers, API servers, signaling servers and media servers all at the same time
Getting to the max capacity we want to serve across the whole service

Different developers look at and understand stress in very different ways. Which ones apply to you?
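To make one of these shapes concrete, here is a minimal sketch of the calls-per-second question: bucket join attempts by second and find the highest rate at which every join still succeeded. The `JoinAttempt` record and both function names are assumptions made for illustration; they are not part of testRTC or any other tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JoinAttempt:
    """One simulated user trying to join (hypothetical record shape)."""
    timestamp: float  # seconds since the test started
    succeeded: bool   # True if the user ended up connected


def joins_per_second(attempts):
    """Bucket attempts into whole seconds: {second: (attempted, succeeded)}."""
    buckets = defaultdict(lambda: [0, 0])
    for attempt in attempts:
        second = int(attempt.timestamp)
        buckets[second][0] += 1
        if attempt.succeeded:
            buckets[second][1] += 1
    return {second: tuple(counts) for second, counts in sorted(buckets.items())}


def max_clean_cps(attempts):
    """Highest attempt rate in any one-second bucket where every join succeeded."""
    clean = [
        attempted
        for attempted, succeeded in joins_per_second(attempts).values()
        if attempted == succeeded
    ]
    return max(clean, default=0)
```

Feeding it the join log of a burst test gives a rough first answer to "what CPS do we support properly?". A real analysis would also look at how long each join took, not only whether it succeeded.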

#2 – Predictable and Repeatable Win the Day

Predictable and repeatable means that if I run the same test scenario multiple times, I expect to get roughly the same results and experience the same behavior.

WebRTC is a complex beast. It relies on a lot of moving parts: location, network performance, connectivity, etc. Getting all these lined up and static for repeated executions of test scenarios is important.

Why is that important? So that developers can run and debug the exact test scenario that QA reported as failing, and so that QA can rerun that same test once the bug is fixed to validate the fix.

It also enables you to work on optimizing the performance of the application and then run the same test scenario to see whether the optimization delivered the improvement you expected.

Repeatable bugs are golden. They almost invite you to solve them since they aren’t hiding anywhere. The more bugs you can make repeatable, the better it is for you and your developers.

While we’re at it, having a testing infrastructure in place that makes as many moving variables as possible static means that more of your tests are going to be predictable and repeatable.
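One way to check whether a scenario really is repeatable is to run it several times and measure how much each metric moves between runs. The sketch below uses the coefficient of variation for that. The metric names, the 10% tolerance and the function names are all assumptions chosen for illustration, and the metrics are assumed to be non-negative.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean.

    Assumes non-negative metrics; an all-zero series counts as perfectly stable.
    """
    average = mean(values)
    if average == 0:
        return 0.0
    return pstdev(values) / average


def unstable_metrics(runs, tolerance=0.10):
    """Return the metrics whose run-to-run variation exceeds the tolerance.

    `runs` maps a metric name to the value observed in each repeated run.
    """
    return sorted(
        name
        for name, values in runs.items()
        if coefficient_of_variation(values) > tolerance
    )
```

A metric that shows up in `unstable_metrics` across identical runs is a hint that some moving part (location, network, machine type) is not yet pinned down in the test setup.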

#3 – Seeing the Forest for the Trees

Running a large test with 100s of browsers? Great!

What do you do with the results? Do you go from one webrtc-internals dump file to the next to figure out how each and every browser behaved in that test?

With so many trees (=browsers), how do you see the forest (test scenario results)?

Here are a few things you’ll be needing to see the forest:

Aggregate metrics analysis – averaged or summed
Failures and warnings from individual browsers, rolled up into the overall test result

With WebRTC, we aren’t interested only in browser performance metrics. We are interested in the WebRTC metrics data and its aggregated analysis.
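As a rough illustration of both points, the sketch below averages or sums per-browser WebRTC metrics and rolls individual warnings and failures up into a single test status. The `BrowserResult` fields and the three status values are hypothetical; a real rig would fill them from whatever it collects per browser.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class BrowserResult:
    """Per-browser outcome (hypothetical fields, for illustration only)."""
    browser_id: str
    packet_loss_pct: float
    rtt_ms: float
    bitrate_kbps: float
    status: str = "ok"  # one of "ok", "warning", "failure"


SEVERITY = {"ok": 0, "warning": 1, "failure": 2}


def summarize(results):
    """Aggregate per-browser results into one test-level summary."""
    worst = max(results, key=lambda r: SEVERITY[r.status])
    return {
        "browsers": len(results),
        # Averaged metrics: what a typical browser experienced.
        "avg_packet_loss_pct": mean([r.packet_loss_pct for r in results]),
        "avg_rtt_ms": mean([r.rtt_ms for r in results]),
        # Summed metric: total load carried during the test.
        "total_bitrate_kbps": sum(r.bitrate_kbps for r in results),
        # Roll-up: the test is only as healthy as its worst browser.
        "warnings": sum(r.status == "warning" for r in results),
        "failures": sum(r.status == "failure" for r in results),
        "status": worst.status,
    }
```

Taking the worst browser status as the test status is a deliberately strict choice. You may prefer to tolerate a small percentage of warnings in very large tests.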

#4 – Seeing the Trees in the Forest

The opposite is equally important. Once you have a test result and you’ve looked at the aggregate data, it is imperative to be able to look at specific trees (=browsers).

Two things that are critical here:

The ability to easily pinpoint in which browser there were any WebRTC “fluctuations” – things that shouldn’t have happened, like high packet loss, disconnections, high latencies or plain failures
Drill down capabilities so you can analyze the individual browser and the WebRTC peer connections it had

If there’s no simple way to understand and pinpoint the culprit in a large test then you are going to waste a lot of time searching for it.
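Pinpointing can start as something very simple: flag every browser that disconnected or crossed a threshold, and record why. The sketch below assumes per-browser results are plain dictionaries. The keys and the threshold values are hypothetical, so tune them to what "shouldn't have happened" means for your service.

```python
# Hypothetical limits, not recommendations.
THRESHOLDS = {
    "packet_loss_pct": 5.0,
    "rtt_ms": 300.0,
}


def find_suspects(browsers, thresholds=THRESHOLDS):
    """Return {browser_id: [reasons]} for browsers worth drilling into."""
    suspects = {}
    for browser in browsers:
        reasons = []
        # A missing "connected" key is treated as still connected.
        if not browser.get("connected", True):
            reasons.append("disconnected")
        for metric, limit in thresholds.items():
            value = browser.get(metric)
            if value is not None and value > limit:
                reasons.append(f"{metric}={value} > {limit}")
        if reasons:
            suspects[browser["id"]] = reasons
    return suspects
```

The output is a short list of browsers to open first, which is where the drill-down into individual peer connections begins.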

#5 – Small Tests, Large Outcomes

Large tests come at a price:

They use more resources so they cost more
They take longer to plan, run and analyze

In many cases, what you should be after is a kind of Minimum Viable Test (MVT, anyone?)

What I mean here is that you want to be able to find as many bugs as possible without running tests that are too large. And from experience, most of the bugs you will find with 1,000 or more browsers can be found with 100 or 200 browsers targeting just a single server unit planned for production.

This isn’t to say you shouldn’t be running larger tests – just that you should try to get the most out of the smaller WebRTC stress tests first. This way, you’ll be able to clean and fix your WebRTC application a lot faster and waste less time while doing so. Once ready, you can move on to the next set of bigger stress tests.

Running large tests each time is great, but you’ll end up with a lot of false positives in the failures, as well as too much log data to review in order to find bugs that are more easily caught in smaller test runs.

My suggestion? Aim for incremental progression of your stress testing: Conquer 10 concurrent browsers. Then move on to 50 or 100. From there, 200-500. And then aim at where you want the larger ones to be.
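That progression can be sketched as a simple ramp: run the stages from smallest to largest and stop at the first one that fails, so you fix what broke before paying for a bigger run. Here `run_test` is a hypothetical callable you would supply. It takes a browser count and returns whether that stage passed.

```python
def run_ramp(stages, run_test):
    """Run stages in increasing size; stop at the first stage that fails.

    `run_test` is a caller-supplied function: browser count -> bool (passed).
    Returns the largest stage size that passed, or 0 if even the first failed.
    """
    largest_passed = 0
    for browsers in sorted(stages):
        if not run_test(browsers):
            break  # fix what broke at this size before going bigger
        largest_passed = browsers
    return largest_passed
```

For example, `run_ramp([10, 50, 100, 200, 500], run_test)` follows the stages suggested above, and you would append your own target size as the final stage.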

#6 – Shorter Cycles = Lower Overall Costs

Here’s the crux of it all. What we are trying to do is to reduce our WebRTC development cycles. Shorter cycles mean faster development. Jeff Atwood sums it up nicely in an old post of his titled Boyd’s Law of Iteration:

“This leads to Boyd’s Law of Iteration: speed of iteration beats quality of iteration.”

If you are stress testing, then you are in for a world of pain:

Things are likely to break in different ways
The test script will need to be modified to fit your infrastructure’s behavior under load
Signaling or media servers might fail and will need fixing
Configurations of servers will need fine tuning
Network capacities and routes may need to be revisited
Etc

There are just too many things that can and will break during the WebRTC stress testing process.

To excel and go through this efficiently, it is best to aim for shorter test cycles and quick iterations. And as I’ve said already, bigger tests take more time and resources.

So running smaller stress tests first, and growing the test size as you progress through your test plan, gains you speed of iteration.

And the faster you iterate, the faster you’ll be able to solve things and question the results you see, helping you towards the next stage of your WebRTC stress testing project.

Where to Start?

Developing a WebRTC application?

Think and plan ahead for your stress testing from day 1:

Decide if you want to build the whole testing rig in-house or use a 3rd party (I know a great vendor for your WebRTC testing if you’re looking for one)
Determine the scale and parameters you are going to be most interested in (you can change them as you move along, but it is better to have some guidelines to start with)
Aim for effective testing that shortens your development cycle and favors speedier iterations

Author: Guest Author

Published On: 16th Jan 2023