Prometheus¶
Prometheus is a widely used open-source software package for monitoring and alerting. It collects metrics from configured targets at given intervals to populate its internal time series database. PromQL queries can then be run against the time series data to display results. Rules let you specify conditions for generating alarm events, which the Alertmanager module can handle, for example to send notifications and silence alarms.
Integration¶
Sipfront provides "exporter" API endpoints for Prometheus to scrape test results and metrics. This allows you to integrate specific Sipfront tests into your own Prometheus, display them in your existing dashboards and alert using your existing alerting mechanisms.
Concept¶
The Prometheus exporter provides HTTP API endpoints to monitor specific Sipfront tests as well as whole projects containing multiple tests.
Test level endpoints¶
A test level endpoint allows you to scrape the measures and result status of the recurring runs of one particular test over time. There are three different types of endpoints:
- Gauges: scrape measure value/avg
https://app.sipfront.com/prometheus/tests/<testid>/gauges
- Summaries: scrape measure quantiles (median, p95, ...)
https://app.sipfront.com/prometheus/tests/<testid>/summaries
- Status: scrape result status (passed/failed)
https://app.sipfront.com/prometheus/tests/<testid>/status/hour
https://app.sipfront.com/prometheus/tests/<testid>/status/minute
https://app.sipfront.com/prometheus/tests/<testid>/status/day
Gauges¶
Gauge export endpoints allow you to capture all measure values over time in Prometheus and plot relevant graphs in your Grafana instance to see how certain metrics evolve over the course of the day or week. As with the graphs in the Sipfront WebApp UI, you will quickly spot edges and anomalies, such as those that can appear after configuration or infrastructure changes in your systems.
A gauge provides the measure value together with the time of the most recent run of a particular test:
- visqol: Recorded Audio MOS (Visqol)
- transcribe: Speech-to-text transcription confidence
The following gauges for RTP measures provide the average value across all test agent instances involved in the test run:
- rtp_rtt: Round trip time
- rtp_mos: Network MOS
The following gauges for RTP and SIPP measures provide the sum across all test agent instances involved in the test run:
- rtp_byte_per_sec: Sent and received bytes per second
- rtp_jitter: Jitter
- rtp_lost: Lost packets count
- rtp_lost_per_sec: Lost packets per second
- rtp_pkt_per_sec: Sent and received packets per second
- sipp_call_rate: Calls per second
- sipp_concurrent_calls: Concurrent call load
- sipp_failed_calls: Failed call count
A measure value can be present multiple times, i.e. once for each involved call party. Value dimensions are therefore provided depending on the measure:
- role for the call party: caller, callee, A, B, C, ...
- dir for the call direction: in, out
- confidence for the transcription confidence: min, max, avg
Scraped metric names and dimensions can be explored in the Prometheus/Grafana UI. The exported metric names for Prometheus follow a canonical format:
sipfront_Internal_Tests_v6_iotcore_sipfront_a_b:rtp_mos
- sipfront: namespace prefix
- Internal_Tests: sanitized project name 'Internal Tests'
- v6_iotcore_sipfront_a_b: sanitized test name 'v6_iotcore_sipfront_a_b'
- rtp_mos: measure
NOTE: Prometheus normally uses the scrape time as the measurement time and accepts the exported test start time as the timestamp only if it lies within the last hour. Gauges are therefore suited to tests that recur at intervals of less than an hour.
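As an illustration of how a scraped gauge can drive an alert, the following is a minimal sketch of a Prometheus alerting rule on the example metric above. The group name, the alert name, the threshold of 3.5, the `for` duration and the severity label are assumptions chosen for illustration, not Sipfront recommendations.

```yaml
# Hypothetical rules file, referenced from rule_files in prometheus.yml.
groups:
  - name: sipfront_gauges              # assumed group name
    rules:
      - alert: SipfrontLowNetworkMos   # assumed alert name
        # Metric name taken from the canonical format example above.
        # 3.5 is an illustrative threshold only.
        expr: 'sipfront_Internal_Tests_v6_iotcore_sipfront_a_b:rtp_mos < 3.5'
        for: 10m                       # assumed hold time before the alert fires
        labels:
          severity: warning
        annotations:
          summary: "Network MOS of test v6_iotcore_sipfront_a_b dropped below 3.5"
```

If the measure carries dimensions such as role or dir, a label matcher can narrow the expression to a single call party. Check the exact label names in the Prometheus UI first, and validate the file with `promtool check rules` before loading it.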
Summaries¶
While the gauges for rtp_rtt, rtp_mos, rtp_byte_per_sec, rtp_jitter, rtp_lost, rtp_lost_per_sec, rtp_pkt_per_sec, sipp_call_rate, sipp_concurrent_calls and sipp_failed_calls report the average measure value over the test run, there are also corresponding summary endpoints which include the pre-calculated median, p75, p90 and p95 quantiles as an additional dimension.
The exported summary metric names for Prometheus follow a canonical format:
sipfront_Internal_Tests_v6_iotcore_sipfront_a_b:rtp_mos_summary
- sipfront: namespace prefix
- Internal_Tests: sanitized project name 'Internal Tests'
- v6_iotcore_sipfront_a_b: sanitized test name 'v6_iotcore_sipfront_a_b'
- rtp_mos: measure
- summary: summary measure suffix
Status¶
Status endpoints provide you with the number of successful and failed test runs over the last minute, hour or day, depending on the endpoint URL. This allows you to track test results over time and set alarms if the number of failed test runs during a specific time period exceeds a certain threshold.
The status endpoint is a gauge that reports the run count values using dimensions:
- total: total number of runs started during the last full minute, hour or day
- passed: number of runs passing the Sipfront test conditions, started during the last full minute, hour or day
- failed: number of runs failing the Sipfront test conditions, started during the last full minute, hour or day
The exported metric names for Prometheus follow a canonical format:
sipfront_Selenium_Tests_Selenium_Test_UDP_v4_minute
- sipfront: namespace prefix
- Selenium_Tests: sanitized project name 'Selenium Tests'
- Selenium_Test_UDP_v4: sanitized test name 'Selenium Test UDP v4' (n/a for project level status endpoint)
- minute: endpoint period (minute, hour or day)
NOTE: The project level status endpoints provide the cumulative results of all tests of the project (sum of total/passed/failed runs).
NOTE: The total number of runs is stable, while the passed and failed counts need time to settle until the run is finished. Make sure the duration of a run is less than the period of the minute, hour or day endpoint.
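The threshold alerting described in this section could look like the sketch below. It assumes that the hour status endpoint of the example test is scraped and that the total/passed/failed dimension is exported as a label named `status`. That label name is hypothetical, so verify the actual name in the Prometheus UI before using the rule. The group name, the alert name and the threshold of 2 are likewise illustrative.

```yaml
groups:
  - name: sipfront_status                 # assumed group name
    rules:
      - alert: SipfrontTestRunsFailing    # assumed alert name
        # Metric name follows the canonical format with the hour period suffix.
        # ASSUMPTION: the dimension label is called "status"; replace it with
        # the label name actually exported by the endpoint.
        expr: 'sipfront_Selenium_Tests_Selenium_Test_UDP_v4_hour{status="failed"} > 2'
        labels:
          severity: critical
        annotations:
          summary: "More than 2 failed runs of Selenium Test UDP v4 in the last full hour"
```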
Project level endpoints¶
A project level endpoint allows you to scrape the measures and result status of the recurring runs of all tests in a project over time. The same types of endpoints are available:
- Gauges: scrape measure value/avg of each test
https://app.sipfront.com/prometheus/projects/<projectid>/gauges
For a project containing 3 tests, this is equivalent to scraping the 3 individual test level gauge endpoints, i.e.
https://app.sipfront.com/prometheus/tests/<test1id>/gauges
https://app.sipfront.com/prometheus/tests/<test2id>/gauges
https://app.sipfront.com/prometheus/tests/<test3id>/gauges
- Summaries: scrape measure quantiles (median, p95) of each test
https://app.sipfront.com/prometheus/projects/<projectid>/summaries
For a project containing 3 tests, this is equivalent to scraping the 3 individual test level summaries endpoints, i.e.
https://app.sipfront.com/prometheus/tests/<test1id>/summaries
https://app.sipfront.com/prometheus/tests/<test2id>/summaries
https://app.sipfront.com/prometheus/tests/<test3id>/summaries
- Status: scrape cumulative result status (passed/failed) of all project tests
https://app.sipfront.com/prometheus/projects/<projectid>/status/hour
https://app.sipfront.com/prometheus/projects/<projectid>/status/minute
https://app.sipfront.com/prometheus/projects/<projectid>/status/day
For a project containing 3 tests, this is equivalent to scraping the 3 individual test level status endpoints, i.e.
https://app.sipfront.com/prometheus/tests/<test1id>/status/minute
https://app.sipfront.com/prometheus/tests/<test2id>/status/minute
https://app.sipfront.com/prometheus/tests/<test3id>/status/minute
and querying the sum using a PromQL expression (sipfront_project_test1_minute + sipfront_project_test2_minute + sipfront_project_test3_minute)
Configuration¶
Integrating Sipfront tests into your Prometheus setup requires two things:
- A Sipfront API key
- The Sipfront test ID or project ID
Obtain an API key¶
To scrape the test results, you need an API key to authenticate with the Sipfront API. You can create an API key in the Sipfront web interface under Account -> API Keys.
Once created, copy the public key (the username for Prometheus) and the secret key (the password) and use them in the configuration below.
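If you prefer not to store the secret key directly in prometheus.yml, Prometheus can also read the basic-auth password from a file via `password_file`, which is used instead of `password`. The sketch below assumes a hypothetical file path and reuses test ID 99 from the examples in this document.

```yaml
scrape_configs:
  - job_name: sipfront_test_99_gauges
    scheme: https
    metrics_path: /prometheus/tests/99/gauges
    basic_auth:
      username: your-public-api-key
      # Hypothetical path; the file contains only the secret key.
      password_file: /etc/prometheus/secrets/sipfront_secret_key
    static_configs:
      - targets: ['app.sipfront.com']
```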
Configure Prometheus¶
To configure Prometheus to scrape Sipfront test results and metrics, you need to add scrape jobs to your prometheus.yml as shown in the examples below.
NOTE: Prometheus scrape jobs let you specify the scrape interval, which needs to be aligned with the frequency of your Sipfront test. To avoid losing information because of aliasing, configure a scrape interval that is half of the test recurrence interval or less.
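As a worked example of this rule, assume a test that recurs every 5 minutes (300s); the recurrence is an assumption for illustration. Half of that is 150s, so a scrape interval of 120s stays below the limit:

```yaml
scrape_configs:
  - job_name: sipfront_test_99_gauges
    scheme: https
    # Test recurs every 300s (assumed), so scrape at 150s or less.
    scrape_interval: 120s
    # The timeout must not exceed the scrape interval.
    scrape_timeout: 50s
    metrics_path: /prometheus/tests/99/gauges
    basic_auth:
      username: your-public-api-key
      password: your-secret-api-key
    static_configs:
      - targets: ['app.sipfront.com']
```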
Configure a test level gauges scrape job¶
- job_name: sipfront_test_99_gauges
  scheme: https
  scrape_interval: 60s
  scrape_timeout: 50s
  metrics_path: /prometheus/tests/99/gauges
  basic_auth:
    username: your-public-api-key
    password: your-secret-api-key
  static_configs:
    - targets: ['app.sipfront.com']
Configure a project level gauges scrape job¶
- job_name: sipfront_project_77_gauges
  scheme: https
  scrape_interval: 60s
  scrape_timeout: 50s
  metrics_path: /prometheus/projects/77/gauges
  basic_auth:
    username: your-public-api-key
    password: your-secret-api-key
  static_configs:
    - targets: ['app.sipfront.com']
Configure a test level summaries scrape job¶
- job_name: sipfront_test_99_summaries
  scheme: https
  scrape_interval: 60s
  scrape_timeout: 50s
  metrics_path: /prometheus/tests/99/summaries
  basic_auth:
    username: your-public-api-key
    password: your-secret-api-key
  static_configs:
    - targets: ['app.sipfront.com']
Configure a project level summaries scrape job¶
- job_name: sipfront_project_77_summaries
  scheme: https
  scrape_interval: 60s
  scrape_timeout: 50s
  metrics_path: /prometheus/projects/77/summaries
  basic_auth:
    username: your-public-api-key
    password: your-secret-api-key
  static_configs:
    - targets: ['app.sipfront.com']
Configure a test level status scrape job¶
- job_name: sipfront_test_99_status
  scheme: https
  scrape_interval: 60s
  scrape_timeout: 50s
  metrics_path: /prometheus/tests/99/status/minute
  basic_auth:
    username: your-public-api-key
    password: your-secret-api-key
  static_configs:
    - targets: ['app.sipfront.com']
Configure a project level status scrape job¶
- job_name: sipfront_project_77_status
  scheme: https
  scrape_interval: 60s
  scrape_timeout: 50s
  metrics_path: /prometheus/projects/77/status/minute
  basic_auth:
    username: your-public-api-key
    password: your-secret-api-key
  static_configs:
    - targets: ['app.sipfront.com']
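For orientation, the sketch below shows where such jobs sit in a complete prometheus.yml: each job is an entry in the top-level `scrape_configs` list. The `global` defaults are assumptions for illustration; the two jobs reuse the test and project IDs from the examples above.

```yaml
global:
  scrape_interval: 60s   # assumed default for jobs that do not override it
  scrape_timeout: 50s    # must not exceed the scrape interval

scrape_configs:
  # Test level gauges for test ID 99
  - job_name: sipfront_test_99_gauges
    scheme: https
    metrics_path: /prometheus/tests/99/gauges
    basic_auth:
      username: your-public-api-key
      password: your-secret-api-key
    static_configs:
      - targets: ['app.sipfront.com']

  # Project level status (last full minute) for project ID 77
  - job_name: sipfront_project_77_status
    scheme: https
    metrics_path: /prometheus/projects/77/status/minute
    basic_auth:
      username: your-public-api-key
      password: your-secret-api-key
    static_configs:
      - targets: ['app.sipfront.com']
```

Running `promtool check config prometheus.yml` before reloading Prometheus helps catch indentation and syntax mistakes.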