What is k6?

k6 is an open-source, developer-friendly, and flexible load testing tool maintained by Grafana Labs. It helps developers prevent performance issues and improve the reliability of their applications by simulating real-world load conditions.

I had the privilege of using k6 as part of the API Performance Testing team. I played a major role in generating scripts for each API endpoint developed. Here's how the Data Lake Platform team benefits from this, along with some insights about k6 testing.

Initially, this content was meant for sharing with the team, but I'll expand it into an article to share my thoughts with the public and explain why it should be a standard practice for any team building an API. I have no experience with other performance testing tools, but I believe k6 is the most flexible, customizable, and enjoyable for software engineers who love to code.

What is Load Testing?

Load testing is the process of putting demand on a system and measuring its response.

Key Points:

- Focuses on how well a system works under load.
- Different from functional testing, which checks whether a system works.
- Measures qualitative aspects like responsiveness and reliability.

Why Do Performance Testing?

Performance testing helps us identify potential bottlenecks early and address them. Once an API is fully developed, we add a performance test for it. We use the Grafana dashboard to run the load test and document the load the API can handle. Any anomalies are flagged and reviewed for improvements. This keeps us ahead of user demand.

Benefits:

- Improve user experience.
- Identify bottlenecks early.
- Prepare for unexpected demand.
- Increase confidence in the application.
- Assess and optimize infrastructure.

As a backend engineer, I prioritize data integrity and correctness, and testing by the Data Engineers and Data Solutions teams alone is not enough. The software engineering team must also be able to test thousands, if not millions, of transaction records in detail to minimize potential issues. In the k6 automation script, which we run in a pipeline, we added an extension that checks for discrepancies between the expected data and the data streamed from the API call. This surfaces issues so that QA can report them and the team can investigate the API further. Sometimes the problem is caused by column changes in the database or by incorrectly defined logic that drops rows. If an issue cannot be resolved quickly, we mark it as a known issue.
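
The kind of discrepancy check described here can be sketched in plain JavaScript. This is only an illustration, not the team's actual extension: the function name `findDiscrepancies`, the `id` key column, and the sample rows are all hypothetical. It reports rows that are missing from the API response and rows whose column values differ from the expected data.

```javascript
// Hypothetical helper: compare expected rows against rows returned by the API.
// Rows are matched on a key column (assumed here to be `id`).
function findDiscrepancies(expectedRows, actualRows, key = 'id') {
    const actualByKey = new Map(actualRows.map((row) => [row[key], row]))
    const missing = []      // keys present in the expected data but absent from the API
    const mismatched = []   // keys whose column values differ

    for (const expected of expectedRows) {
        const actual = actualByKey.get(expected[key])
        if (actual === undefined) {
            missing.push(expected[key])
            continue
        }
        // Collect every column whose value differs (or is absent) in the API row
        const columns = Object.keys(expected).filter((col) => expected[col] !== actual[col])
        if (columns.length > 0) {
            mismatched.push({ key: expected[key], columns })
        }
    }
    return { missing, mismatched }
}

// Sample data, invented for illustration only
const expectedRows = [
    { id: 1, amount: 100 },
    { id: 2, amount: 250 },
    { id: 3, amount: 75 },
]
const apiRows = [
    { id: 1, amount: 100 },
    { id: 2, amount: 200 },   // wrong value
]                             // id 3 is missing

const discrepancyReport = findDiscrepancies(expectedRows, apiRows)
console.log(JSON.stringify(discrepancyReport))
// {"missing":[3],"mismatched":[{"key":2,"columns":["amount"]}]}
```

Inside a k6 script, a report like this could feed a `check` that passes only when both arrays are empty, so that discrepancies show up in the test summary.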

Common Excuses & Counterpoints:

- "Our application is too small" → Even small systems benefit.
- "It's expensive/time-consuming" → Cost of not testing is higher.
- "Requires extensive technical knowledge" → Range of complexity available.
- "We don’t have a performance environment" → Alternatives exist.
- "The cloud is infinite" → Efficiency still matters.

I believe it's a good practice and greatly improves the efficiency and quality of the API. It also makes life easier for QA, as they no longer have to manually check every detail of data streams. Now, let's learn more about k6 and what else it can do besides load and stress testing.

Load Testing vs. Performance Testing

                Performance testing != Load testing

Performance testing verifies how well a system works as a whole, including aspects such as scalability, elasticity, availability, reliability, resiliency, and latency.
Load testing is just one type of performance testing, and it is an approach that can be used to test many aspects of application performance. However, not all performance testing involves load testing.
Load testing is a sub-practice of performance testing.

Load Test Scenarios

A load test scenario combines specific values of test parameters. Each scenario recreates a certain situation or set of conditions that the application will be exposed to.

Load test scenarios are often called load test types. Some of the most common scenarios are described later in this article.

Example Code

import http from 'k6/http'
import { sleep, check } from 'k6'
import { Rate, Trend } from 'k6/metrics'

// Custom defined metrics
const lowerThan2sRate = new Rate('lower_than_2s')
const durationInSeconds = new Trend('duration_in_seconds')

// localhost works when k6 runs directly on the host machine.
// When k6 runs inside a Docker container, localhost points at the container itself,
// so replace it with the IP address of the host.
const BASE_URL = 'http://localhost:3000'

export const options = {
    vus: 1000,          // 1000 users will be simulated
    duration: '1m',     // the test will run for 1 minute
    thresholds: {       // pass/fail criteria for the test
        lower_than_2s: [{               // custom metric defined above
            threshold: 'rate>0.75',     // more than 75% of requests must finish in under 2s
            abortOnFail: true,          // abort the test as soon as this threshold fails
        }],
    }
}

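// `data` is the value returned by a setup() function, which is not shown here.
// It is expected to contain `token` and `incomeExpenseTypes`.
export default function(data) {
    // These headers let the request pass the authentication and authorization middleware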
    const params = {
        headers: {
            'Content-Type': 'application/json',
            Authorization: `Bearer ${data.token}`
        }
    }


    data.incomeExpenseTypes.forEach(t => {
        const payload = {
            value: 10000,
            description: 'Test',
            income_expense_type_id: t.id,
            is_income: false
        }
        const res = http.post(`${BASE_URL}/income-expense`, JSON.stringify(payload), params)

        // check() runs on every iteration and records whether each condition passed
        check(res, {
            'is success': (r) => r.json().success,
            'duration below 2s': (r) => r.timings.duration < 2000
        })

        // Record one sample: true if the request took less than 2s, false otherwise
        lowerThan2sRate.add(res.timings.duration < 2000)

        // k6 reports durations in milliseconds;
        // for demonstration purposes, we convert them to seconds
        durationInSeconds.add(res.timings.duration / 1000)
    })

    // sleep for one second at each iteration
    sleep(1)
}

Test parameters

This is a non-exhaustive list of common test parameters:

- Virtual users (VUs): The activity of 1 VU represents that of 1 real user.
- Iterations: The total number of repetitions of the script to be executed by the VUs.
- Throughput: A measure of how much load the test generates over time.
- Load profile: The shape of the traffic generated by the test over time.
- Duration: The time it takes to run the entire test and its individual stages.
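
As a rough guide, these parameters map onto fields of the k6 `options` object. The sketch below shows four separate option objects as plain JavaScript; every number and the scenario name `steady_rate` are illustrative assumptions, not recommendations. In a real script, one of these objects would be exported as `options`, as in the example script above.

```javascript
// VUs + duration: a fixed number of virtual users for a fixed time
const fixedLoadOptions = {
    vus: 10,          // virtual users
    duration: '5m',   // total test time
}

// VUs + iterations: a fixed amount of work shared between the VUs
const fixedWorkOptions = {
    vus: 10,
    iterations: 200,  // total across all VUs, not per VU
}

// Load profile: each stage ramps the VU count toward its target
const shapedLoadOptions = {
    stages: [
        { duration: '2m', target: 20 },  // ramp up to 20 VUs
        { duration: '5m', target: 20 },  // hold at 20 VUs
        { duration: '1m', target: 0 },   // ramp down
    ],
}

// Throughput: an arrival-rate executor starts iterations at a fixed rate
const fixedThroughputOptions = {
    scenarios: {
        steady_rate: {                    // scenario name is arbitrary
            executor: 'constant-arrival-rate',
            rate: 50,                     // 50 iterations...
            timeUnit: '1s',               // ...per second
            duration: '5m',
            preAllocatedVUs: 100,         // VUs reserved to sustain the rate
        },
    },
}

console.log(Object.keys(fixedThroughputOptions.scenarios))
```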

Shakeout Test

A small test, sometimes known as a smoke test, that checks for major issues before more time and resources are spent on larger tests.

It typically uses one or a few VUs that run for a short amount of time. If the test fails, any issues must be resolved first.

Name: Shakeout Test
Total VUs: 5    
Ramp-up: 0 seconds    
Duration: 10 minutes
Ramp-down: 0 seconds
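
These shakeout parameters could be expressed as k6 options roughly as follows. The VU count and duration come from the table above; the threshold on the built-in `http_req_failed` metric, and its 1% limit, is an assumption added here so that the run fails when major issues appear.

```javascript
// Shakeout test: 5 VUs for 10 minutes, with no ramp-up or ramp-down
const shakeoutOptions = {
    vus: 5,
    duration: '10m',
    thresholds: {
        // Assumed criterion: fail the run if 1% or more of requests fail
        http_req_failed: ['rate<0.01'],
    },
}

console.log(JSON.stringify(shakeoutOptions))
```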

Average Load Test

Simulates typical production load. It typically includes ramp-up and ramp-down periods to simulate users gradually logging in and interacting with the system. The test sustains the steady-state load for an hour or so.

Name: Average Load Test
Total VUs: 100    
Ramp-up: 30 minutes
Steady state: 60 minutes
Ramp-down: 10 minutes
Total duration: 100 minutes
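
A ramp-up, steady-state, ramp-down profile like this one is usually written as a list of k6 `stages`. The sketch below uses the values from the table; the helper `totalMinutes` is hypothetical and only confirms that the stages add up to the stated total.

```javascript
// Average load test: ramp to 100 VUs, hold, then ramp down
const averageLoadStages = [
    { duration: '30m', target: 100 },  // ramp-up
    { duration: '60m', target: 100 },  // steady state
    { duration: '10m', target: 0 },    // ramp-down
]

// Sum stage durations that are written in whole minutes, e.g. '30m'
function totalMinutes(stages) {
    return stages.reduce((sum, stage) => sum + parseInt(stage.duration, 10), 0)
}

console.log(totalMinutes(averageLoadStages)) // 100
```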

Stress Test

Simulates the traffic that the application is expected to experience at the highest point of the day or season.

It is a good scenario for testing rush hours or sale periods during which the application faces abnormally heavy load.

Soak Test

Soak tests, also called endurance tests, run for a longer duration than average.
Some performance bottlenecks, such as ones caused by defects in memory management, appear only during longer periods of time.

Name: Soak Test
Total VUs: 50
Ramp-up: 30 minutes
Steady state: 480 minutes
Ramp-down: 10 minutes
Total duration: 520 minutes (8 hours and 40 minutes)
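
The soak profile has the same shape as any ramp-hold-ramp test, only with a much longer steady state. As a sketch, the hypothetical helper `rampHoldRamp` below builds k6-style stages from the four numbers in the table, and `formatMinutes` (also hypothetical) confirms the total duration.

```javascript
// Build a three-stage profile: ramp up, hold, ramp down (all in minutes)
function rampHoldRamp(targetVUs, rampUpMin, steadyMin, rampDownMin) {
    return [
        { duration: `${rampUpMin}m`, target: targetVUs },
        { duration: `${steadyMin}m`, target: targetVUs },
        { duration: `${rampDownMin}m`, target: 0 },
    ]
}

// Render a number of minutes as hours and minutes
function formatMinutes(minutes) {
    return `${Math.floor(minutes / 60)}h ${minutes % 60}m`
}

const soakStages = rampHoldRamp(50, 30, 480, 10)
const soakTotalMinutes = soakStages.reduce((sum, s) => sum + parseInt(s.duration, 10), 0)

console.log(soakTotalMinutes, formatMinutes(soakTotalMinutes)) // 520 8h 40m
```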

Spike Test

A spike test recreates a situation where the application experiences a sudden and massive increase in traffic. Spike tests are good for simulating timed events like:

- product launches or sales of concert tickets
- deadlines (the last days of tax submissions)

Name: Spike Test    
Total VUs: 300
Ramp-up: 1 minute
Steady state: 20 minutes
Ramp-down: 5 minutes
Total duration: 26 minutes
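
To see how sharp this profile is, the sketch below writes it as k6-style stages and uses a hypothetical helper, `plannedVUsAt`, to estimate the planned VU count at a given minute. The helper assumes linear ramps between targets and a start from 0 VUs; it is a planning aid, not part of k6.

```javascript
// Spike test: 1-minute ramp to 300 VUs, 20-minute hold, 5-minute ramp-down
const spikeStages = [
    { duration: '1m', target: 300 },
    { duration: '20m', target: 300 },
    { duration: '5m', target: 0 },
]

// Estimate the planned VU count at a given minute (linear ramps, starting at 0)
function plannedVUsAt(stages, minute) {
    let stageStart = 0   // minute at which the current stage begins
    let fromVUs = 0      // VU count at the start of the current stage
    for (const stage of stages) {
        const length = parseInt(stage.duration, 10)
        if (minute <= stageStart + length) {
            const progress = (minute - stageStart) / length
            return Math.round(fromVUs + (stage.target - fromVUs) * progress)
        }
        stageStart += length
        fromVUs = stage.target
    }
    return 0 // after the final stage
}

console.log(plannedVUsAt(spikeStages, 0.5))   // 150, halfway up the ramp
console.log(plannedVUsAt(spikeStages, 10))    // 300, steady state
console.log(plannedVUsAt(spikeStages, 23.5))  // 150, halfway down the ramp
```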

Breakpoint Test

Identifies the load level at which performance degrades and builds confidence in what a system can handle. The results from breakpoint tests provide valuable input for capacity planning.

Name: Breakpoint Test    
Total max VUs: unknown
Ramp-up: 10 minutes before each stage
Steady state: 30 minutes
Ramp-down: 0 minutes
Total duration: unknown
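
Because the maximum load is unknown in advance, one way to sketch a breakpoint test is to generate stepped stages and let a threshold abort the run once performance degrades. Everything beyond the 10-minute ramp and 30-minute steady state is an assumption here: the helper `breakpointStages`, the step size of 100 VUs, the three steps, and the 2-second limit on the built-in `http_req_duration` metric.

```javascript
// Build stepped stages: a 10-minute ramp before each 30-minute steady stage
function breakpointStages(stepVUs, steps) {
    const stages = []
    for (let i = 1; i <= steps; i++) {
        stages.push({ duration: '10m', target: stepVUs * i })  // ramp up to the next level
        stages.push({ duration: '30m', target: stepVUs * i })  // hold at that level
    }
    return stages
}

const breakpointOptions = {
    stages: breakpointStages(100, 3),  // assumed: 100, 200, then 300 VUs
    thresholds: {
        // Assumed criterion: stop once the 95th percentile exceeds 2 seconds
        http_req_duration: [{ threshold: 'p(95)<2000', abortOnFail: true }],
    },
}

console.log(breakpointOptions.stages.map((s) => s.target).join(','))
// 100,100,200,200,300,300
```

The load level reached when the threshold aborts the run gives a first estimate of the breakpoint.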

Conclusion

In conclusion, performance testing with k6 is an essential practice for any team developing APIs. By simulating real-world load conditions, k6 helps identify potential bottlenecks early, ensuring applications are reliable and responsive under various scenarios. The tool's flexibility and developer-friendly nature make it a preferred choice for software engineers. Implementing load testing not only improves user experience but also prepares the system for unexpected demands, optimizes infrastructure, and enhances overall confidence in the application. As demonstrated, k6 offers a comprehensive approach to performance testing, making it a valuable asset in maintaining high-quality software standards.

Demo

This demo is provided by Grafana Labs itself. The demo app is called QuickPizza.

The k6-workshop repository needs a minor fix before the tests will run; see this issue for the solution.

I did performance testing with K6