Bank API uptime figures are marketing numbers. The 99.5% availability a bank's developer portal advertises does not correspond to what your integration actually experiences in production. This article explains the gap between reported uptime and functional availability, and lays out a practical framework for designing systems that stay resilient when the upstream behaves badly.
Why reported uptime is the wrong metric
Article 32 of the RTS on strong customer authentication and common and secure communication (Commission Delegated Regulation (EU) 2018/389, made under PSD2) requires ASPSPs to monitor the availability and performance of their dedicated interfaces and to publish quarterly statistics on their website covering both the dedicated interface and the interface their own customers use. In practice, these figures measure whether the ASPSP's API server returns any response, not whether that response is correct, complete, or useful.
A bank API that returns HTTP 200 with an empty transactions array is, by most server-side availability metrics, "up". A bank API that processes your consent initiation but fails to generate a usable authorisation URL, returning a malformed or missing redirect link, is "up". A bank API that accepts a payment initiation but never triggers the status webhook is "up".
The metric that matters for a fintech integration is connection success rate: the proportion of user-initiated flows that complete successfully from consent initiation to token issuance, or from payment initiation to confirmed status. This is a product-level metric, not an infrastructure metric, and ASPSPs do not publish it.
How bank APIs actually fail
Bank API failures fall into five categories. Understanding the taxonomy helps you design the right mitigation for each.
1. Hard outages
Total unavailability — the endpoint returns 503 or times out. These are the failures that show up in availability statistics. They are also the easiest to handle: retry with exponential backoff and surface a clear "bank temporarily unavailable" message to the user. Hard outages account for a small fraction of total failure volume.
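A minimal Python sketch of that retry policy follows, assuming your HTTP client raises a hypothetical HardOutage exception on a 503 or a timeout. It applies exponential backoff with full jitter (each wait is a random fraction of a doubling cap), so that many clients do not retry against a recovering bank in lockstep. The attempt count and delays are illustrative, and the pattern is only safe for calls that can be repeated, such as account or transaction fetches.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HardOutage(Exception):
    """Hypothetical: raised by your HTTP client on a 503 or a timeout."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on HardOutage with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except HardOutage:
            if attempt == max_attempts - 1:
                raise  # out of attempts: show "bank temporarily unavailable"
            # The cap doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # sleeping a random fraction of it spreads retries from many clients.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Passing a fake sleep function, as the signature allows, keeps the policy testable without real waits.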
2. Degraded performance
The endpoint responds but with latency far outside normal. A token endpoint that normally responds in 800ms takes 12 seconds. A transaction fetch that normally returns in 2 seconds times out at 30 seconds. Users abandon long-loading bank redirects, which appears in your funnel as user dropout rather than API error. These failures are invisible to basic uptime monitoring unless you track response time distribution alongside error rate.
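One way to make degraded performance visible is to compare a rolling tail latency against a per-bank, per-endpoint baseline. The sketch below assumes you record every response time; the class name LatencyMonitor, the 200-sample window, and the three-times threshold are illustrative choices, not standard values.

```python
import math
from collections import defaultdict, deque
from typing import Deque, Dict, Sequence, Tuple

Key = Tuple[str, str]  # (bank id, endpoint name)


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile, q in (0, 100], of a non-empty sequence."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


class LatencyMonitor:
    """Keeps recent response times per bank and endpoint; flags slow tails."""

    def __init__(self, window: int = 200, degraded_factor: float = 3.0) -> None:
        self.degraded_factor = degraded_factor
        self.samples: Dict[Key, Deque[float]] = defaultdict(lambda: deque(maxlen=window))
        self.baseline_p95: Dict[Key, float] = {}

    def record(self, bank: str, endpoint: str, seconds: float) -> None:
        self.samples[(bank, endpoint)].append(seconds)

    def set_baseline(self, bank: str, endpoint: str, p95_seconds: float) -> None:
        self.baseline_p95[(bank, endpoint)] = p95_seconds

    def is_degraded(self, bank: str, endpoint: str) -> bool:
        key = (bank, endpoint)
        recent = self.samples.get(key)  # .get does not create empty entries
        baseline = self.baseline_p95.get(key)
        if not recent or baseline is None:
            return False
        # Degraded: the recent P95 is several times the normal P95, even if
        # every request eventually returned 200.
        return percentile(recent, 95) > self.degraded_factor * baseline
```

With a 0.8-second baseline P95 for a bank's token endpoint, a sustained run of 12-second responses pushes the recent P95 past the threshold and flags the bank as degraded even though no request returned an error.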
3. Silent errors
HTTP 200 responses with malformed or incomplete data. Empty transaction arrays when transactions exist. Balances served stale from a cached copy. Consent objects whose Status is Authorised but which cannot actually be used to fetch accounts. Silent errors are the hardest failure category to detect because your monitoring shows green while your product is broken for a subset of users at a specific bank.
4. Partial session failures
The authorisation flow completes but the resulting access token works for some endpoints and not others. A token that fetches account details successfully but returns 403 on transactions is a partial session failure. This is more common than it should be, particularly on multi-product ASPSPs where different account types (current accounts vs credit cards vs savings) are served by separate backend systems under a unified API surface.
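Partial session failures are easiest to catch right after token issuance. A hedged sketch: probe each endpoint the product depends on and record the outcome as "partial" rather than as a generic error. The probe callable and the endpoint paths used in testing are hypothetical placeholders for your own HTTP client and the ASPSP's actual resource paths.

```python
from typing import Callable, Dict, List


def classify_session(probe: Callable[[str], int], endpoints: List[str]) -> Dict[str, object]:
    """Probe each endpoint with a freshly issued token and classify the session.

    `probe` is a hypothetical function that calls one endpoint with the new
    access token and returns the HTTP status code.
    """
    if not endpoints:
        raise ValueError("need at least one endpoint to probe")
    statuses = {ep: probe(ep) for ep in endpoints}
    working = [ep for ep, status in statuses.items() if 200 <= status < 300]
    if len(working) == len(statuses):
        outcome = "ok"
    elif working:
        outcome = "partial"  # e.g. accounts 200 but transactions 403
    else:
        outcome = "failed"
    return {"outcome": outcome, "statuses": statuses}
```

A "partial" result with a 403 on transactions is the multi-product pattern described above, and is worth counting separately per bank and, where you can tell, per account type.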
5. Maintenance windows without notice
PSD2 Article 32 requires ASPSPs to give TPPs five days' notice of planned maintenance that affects the dedicated interface. In practice, notification quality varies. Some UK banks give reliable notice; others publish a notice to their developer portal at midnight before a Sunday morning maintenance window. Your integration should subscribe to ASPSP status feeds where available and design for unannounced maintenance windows regardless.
Designing SLOs when your upstream is a retail bank
Service Level Objectives for a fintech product that depends on bank APIs must be structured differently from typical SLOs for a product with a self-owned backend.
Separate the SLO from the upstream constraint
Define your SLO in terms of what your product guarantees to users, then track separately the portion of SLO budget consumed by upstream bank failures. If your product SLO is 99% monthly connection success rate, and a single large bank has a 6-hour partial outage that degrades 15% of your users' connection attempts, that event consumes your error budget independently of anything your engineering team did.
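The budget arithmetic is simple enough to sketch directly. This Python helper (a hypothetical name, not a library API) splits one period's error budget between bank-side and internal failures, assuming you already attribute each failed flow to one cause.

```python
from dataclasses import dataclass


@dataclass
class BudgetReport:
    budget: float          # failed flows the SLO allows in the period
    upstream_spent: float  # fraction of budget consumed by bank-side failures
    internal_spent: float  # fraction consumed by failures in your own stack
    remaining: float       # fraction left; negative means the SLO is breached


def error_budget(total_attempts: int, slo: float,
                 upstream_failures: int, internal_failures: int) -> BudgetReport:
    """Split one period's error budget between upstream and internal causes."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction such as 0.99")
    if total_attempts <= 0:
        raise ValueError("total_attempts must be positive")
    budget = total_attempts * (1 - slo)
    upstream = upstream_failures / budget
    internal = internal_failures / budget
    return BudgetReport(budget, upstream, internal, 1 - upstream - internal)
```

For example, with a hypothetical 1,000,000 attempts a month, a 99% SLO allows 10,000 failed flows. If 40,000 attempts hit the affected bank during its 6-hour incident and 15% of them fail, those 6,000 failures consume 60% of the month's budget before your own stack has failed once.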
Per-bank error budgets
Aggregate uptime across all banks masks the variance between institutions. An aggregate success rate of 99.2% across 2,000 banks can conceal that one bank delivers 94% and another delivers 99.8%. For your product, the distribution matters more than the average, because each user is tied to a specific bank: a user at a bank with chronic failures gets that bank's failure rate on every attempt, with no healthier bank to average it out.
Maintain per-bank success rate metrics. Set per-bank degradation thresholds that trigger: circuit breakers to stop sending traffic to a failing ASPSP; user-facing messages specific to that bank ("Your bank is currently experiencing issues — try again in a few hours"); and internal alerts to your on-call engineer.
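A minimal per-bank circuit breaker sketch, assuming each completed flow is reported as a success or failure for its bank. The threshold, window, and cooldown are illustrative defaults. When allow() returns False, the caller shows the bank-specific message instead of attempting the flow, and the point where the breaker opens is where an on-call alert would fire. After the cooldown, traffic is let through again and the next result either closes the breaker or restarts the cooldown, a simplified half-open state.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class BankCircuitBreaker:
    """Per-bank circuit breaker: one bank's failures never block another bank."""

    def __init__(self, threshold: float = 0.5, min_samples: int = 20,
                 window: int = 50, cooldown: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold      # open when the failure rate exceeds this
        self.min_samples = min_samples  # don't judge a bank on a handful of flows
        self.window = window            # number of recent outcomes considered
        self.cooldown = cooldown        # seconds before traffic is tried again
        self.clock = clock
        self._outcomes: Dict[str, Deque[bool]] = {}
        self._opened_at: Dict[str, float] = {}

    def is_open(self, bank: str) -> bool:
        return bank in self._opened_at

    def allow(self, bank: str) -> bool:
        """False while open and cooling down: show the bank-specific message."""
        opened = self._opened_at.get(bank)
        return opened is None or self.clock() - opened >= self.cooldown

    def record(self, bank: str, success: bool) -> None:
        if bank in self._opened_at:
            # Open or half-open: the next result decides. A success closes the
            # breaker with a clean history; a failure restarts the cooldown.
            if success:
                del self._opened_at[bank]
                self._outcomes.pop(bank, None)
            else:
                self._opened_at[bank] = self.clock()
            return
        outcomes = self._outcomes.setdefault(bank, deque(maxlen=self.window))
        outcomes.append(success)
        if len(outcomes) >= self.min_samples:
            failure_rate = outcomes.count(False) / len(outcomes)
            if failure_rate > self.threshold:
                self._opened_at[bank] = self.clock()  # alert on-call here
```

Keying all state by bank is the design choice that matters: a global breaker would take every bank offline because one bank is failing.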
Differentiate transient vs persistent failures
A connection failure that succeeds on the first retry is a transient failure and should not count against your SLO budget. A flow that still fails after three consecutive retries at 30-second intervals is a persistent failure and should. Your error classification must distinguish the two. Retrying a payment initiation without classifying the failure first risks duplicate payment initiations: some ASPSPs do not implement idempotency keys consistently, so a second initiation attempt may create a second payment order.
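The transient/persistent distinction can be sketched as follows, assuming a hypothetical call() that runs one attempt of a flow and returns True on success. The text leaves open how a success on the second or third retry should count; this sketch assumes any success after a retry is transient. For payment initiations, every attempt must carry the same idempotency key.

```python
import time
from enum import Enum
from typing import Callable


class Outcome(Enum):
    SUCCESS = "success"        # worked on the first attempt
    TRANSIENT = "transient"    # failed, then succeeded on a retry: not an SLO miss
    PERSISTENT = "persistent"  # failed every retry: counts against the SLO


def attempt_flow(call: Callable[[], bool], retries: int = 3,
                 interval: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> Outcome:
    """Run `call`, retrying up to `retries` times at a fixed interval."""
    if call():
        return Outcome.SUCCESS
    for _ in range(retries):
        sleep(interval)
        if call():
            return Outcome.TRANSIENT
    return Outcome.PERSISTENT


def counts_against_slo(outcome: Outcome) -> bool:
    return outcome is Outcome.PERSISTENT
```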
Idempotency key handling
UK OBIE specifies the x-idempotency-key header for payment initiation. Berlin Group specifies X-Request-ID as an idempotency identifier. Use these headers on every payment initiation, and durably store the mapping between your internal order ID and the idempotency key before making the API call. On retry, reuse the same idempotency key. Never retry a payment initiation with a different key: the financial consequences of a duplicate payment exceed the cost of designing the retry layer correctly.
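A sketch of the store-before-call rule, using Python's built-in sqlite3 module as a stand-in for whatever durable store you use (an in-memory database is fine for trying it out; production needs a file or server database). The table name payment_keys and the send callable are hypothetical. Because the key is committed before the request is sent, a retry after a crash or a timeout finds and reuses the same key.

```python
import sqlite3
import uuid
from typing import Callable, Dict


def get_or_create_idempotency_key(db: sqlite3.Connection, order_id: str) -> str:
    """Return this order's idempotency key, creating and committing it on first use."""
    db.execute("CREATE TABLE IF NOT EXISTS payment_keys ("
               "order_id TEXT PRIMARY KEY, idempotency_key TEXT NOT NULL)")
    # INSERT OR IGNORE keeps the existing key if this order already has one.
    db.execute("INSERT OR IGNORE INTO payment_keys VALUES (?, ?)",
               (order_id, str(uuid.uuid4())))
    db.commit()  # durable before any request leaves the process
    row = db.execute("SELECT idempotency_key FROM payment_keys WHERE order_id = ?",
                     (order_id,)).fetchone()
    return row[0]


def initiate_payment(db: sqlite3.Connection, order_id: str,
                     send: Callable[[Dict[str, str]], int]) -> int:
    """`send` is a hypothetical function that POSTs the initiation with these headers."""
    key = get_or_create_idempotency_key(db, order_id)
    # Header name from the UK OBIE specification; every retry sends the same value.
    return send({"x-idempotency-key": key})
```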
What to measure in production
The metrics dashboard for a fintech integration should include, per bank and in aggregate:
- Connection success rate: user-initiated flows that reach a usable access token ÷ total initiation attempts
- Token fetch latency P50 / P95 / P99: measured at the token endpoint response, not your application's end-to-end time
- SCA redirect completion rate: users who land on the bank's authorisation URL and return with an authorisation code ÷ users who were redirected
- Post-auth fetch success rate: successful account or transaction fetches using an active token ÷ total fetch attempts with a non-expired token
- Webhook delivery latency (if using webhooks for payment status): time from payment initiation to first webhook receipt
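Most of these metrics are ratios over per-bank counters. The sketch below assumes hypothetical counter names and computes the ratios plus token endpoint P50/P95/P99 with statistics.quantiles from the standard library; webhook latency is left out for brevity. A ratio with a zero denominator is reported as NaN rather than as 0% or 100%, so a bank with no traffic does not look either perfect or broken.

```python
import statistics
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BankFunnel:
    initiations: int = 0          # user-initiated consent flows started
    redirected: int = 0           # users sent to the bank's authorisation URL
    returned_with_code: int = 0   # users who came back with an authorisation code
    tokens_issued: int = 0        # flows that reached a usable access token
    fetch_attempts: int = 0       # fetches made with a non-expired token
    fetch_successes: int = 0
    token_latencies: List[float] = field(default_factory=list)  # seconds, token endpoint only


def ratio(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else float("nan")


def summarise(f: BankFunnel) -> Dict[str, float]:
    summary = {
        "connection_success_rate": ratio(f.tokens_issued, f.initiations),
        "sca_redirect_completion_rate": ratio(f.returned_with_code, f.redirected),
        "post_auth_fetch_success_rate": ratio(f.fetch_successes, f.fetch_attempts),
    }
    if len(f.token_latencies) >= 2:
        # 99 cut points; index 49 is P50, 94 is P95, 98 is P99.
        cuts = statistics.quantiles(f.token_latencies, n=100, method="inclusive")
        summary.update(token_p50=cuts[49], token_p95=cuts[94], token_p99=cuts[98])
    return summary
```

Keeping one BankFunnel per bank, plus one aggregate, gives both views the dashboard needs.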
The gap between connection success rate and SCA redirect completion rate often reveals UX issues at the bank's authorisation screen that you cannot control but can surface to users with better pre-redirect copy ("This will take you to [bank name]'s secure login — you'll need your mobile banking app to complete the step").
The aggregator's role in reliability
A bank API aggregator adds a reliability layer between your application and the raw ASPSP surface. At minimum, this means standardised per-bank retry logic, circuit breaker state tracked across the aggregator's entire connection pool, and normalised error codes that let you handle ASPSP-specific failures without per-bank conditional logic in your own code.
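Normalised error codes amount to a translation table from each ASPSP's responses to a small set of categories your code handles. The sketch below applies whether you are evaluating an aggregator or building the layer yourself. The category names and the example-bank override are hypothetical, and the HTTP status defaults are only a starting point that real ASPSPs frequently deviate from.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class BankError(Enum):
    UNAVAILABLE = "unavailable"          # retry later; bank temporarily unavailable
    RATE_LIMITED = "rate_limited"        # back off before retrying
    CONSENT_INVALID = "consent_invalid"  # send the user back through SCA
    FORBIDDEN_RESOURCE = "forbidden"     # token lacks access to this endpoint
    CLIENT_ERROR = "client_error"        # likely our bug; do not retry
    UNKNOWN = "unknown"


# Defaults by HTTP status; real ASPSPs often deviate, hence the overrides table.
STATUS_DEFAULTS: Dict[int, BankError] = {
    401: BankError.CONSENT_INVALID,
    403: BankError.FORBIDDEN_RESOURCE,
    429: BankError.RATE_LIMITED,
    500: BankError.UNAVAILABLE,
    502: BankError.UNAVAILABLE,
    503: BankError.UNAVAILABLE,
    504: BankError.UNAVAILABLE,
}

# Hypothetical per-bank quirk: (bank id, bank error code) -> normalised error.
OVERRIDES: Dict[Tuple[str, str], BankError] = {
    ("example-bank", "SVC_MAINT"): BankError.UNAVAILABLE,  # sent with HTTP 400
}


def normalise(bank: str, status: int, bank_code: Optional[str] = None) -> BankError:
    if bank_code is not None and (bank, bank_code) in OVERRIDES:
        return OVERRIDES[(bank, bank_code)]
    if status in STATUS_DEFAULTS:
        return STATUS_DEFAULTS[status]
    if 400 <= status < 500:
        return BankError.CLIENT_ERROR
    if status >= 500:
        return BankError.UNAVAILABLE
    return BankError.UNKNOWN
```

The per-bank override table is where the ongoing maintenance cost lives: each new ASPSP quirk becomes one row rather than a new conditional branch in application code.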
The questions to ask an aggregator about reliability:
- Do you publish per-bank 90-day success rate metrics, or only aggregate uptime?
- How do you handle idempotency key propagation on retried payment initiations?
- What is your circuit breaker threshold and recovery window?
- How quickly do you detect and suppress traffic to an ASPSP in a silent-error degraded state?
Silent error detection — where the ASPSP returns 200 but the data is wrong — requires the aggregator to maintain a canonical view of what a valid response looks like for each bank and each endpoint. This is expensive to build and requires ongoing maintenance as bank APIs evolve. It is the category of reliability work that most aggregators underinvest in relative to basic uptime monitoring.
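A minimal sketch of that "canonical view" as a per-bank rule set, assuming responses have already been parsed into a simplified dict with hypothetical keys (transactions, amount, booking_date, balance_time). Real checks would be richer and tuned per bank and per endpoint; the 24-hour balance age is an illustrative default.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple


@dataclass
class BankExpectations:
    """Hypothetical per-bank definition of what a valid 200 response contains."""
    max_balance_age: timedelta = timedelta(hours=24)
    required_fields: Tuple[str, ...] = ("amount", "booking_date")


def find_silent_errors(payload: Dict, expect: BankExpectations,
                       had_recent_activity: bool, now: datetime) -> List[str]:
    """Return the problems found in a response that arrived with HTTP 200."""
    problems: List[str] = []
    txns = payload.get("transactions", [])
    if not txns and had_recent_activity:
        problems.append("empty transactions for an account with recent activity")
    for i, txn in enumerate(txns):
        missing = [f for f in expect.required_fields if txn.get(f) is None]
        if missing:
            problems.append(f"transaction {i} missing {', '.join(missing)}")
    balance_time = payload.get("balance_time")
    if balance_time is None:
        problems.append("balance has no timestamp")
    elif now - balance_time > expect.max_balance_age:
        problems.append("balance older than expected; possibly a stale cached copy")
    return problems
```

Any non-empty result should be recorded as a failed fetch in the per-bank success rate, even though the HTTP status was 200.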
Practical recommendations
To summarise the design principles:
- Never use ASPSP-reported uptime as the basis for your own product SLOs
- Track connection success rate per bank, not aggregate uptime
- Implement circuit breakers with per-bank state, not global state
- Classify failures before retrying — especially on payment initiations
- Use idempotency keys consistently and store them durably before making the call
- Subscribe to all ASPSP maintenance feeds and build for unannounced maintenance regardless
- Surface bank-specific failure messages to users rather than generic error states
Bank API reliability is a domain problem more than an infrastructure problem. The tools are standard — circuit breakers, retries, error budgets — but the inputs (per-bank failure modes, ASPSP-specific error semantics, PSD2 regulatory requirements around retries) require familiarity with how specific banks actually behave in production, not just how their API documentation says they should.


