Account aggregation across European banks: the data normalisation problem nobody warns you about

[Figure: data normalisation pipeline mapping diverse bank transaction schemas to a unified format]

Every bank implementing PSD2 returns account and transaction data in a slightly different format. The specifications are shared — UK OBIE, Berlin Group NextGenPSD2 — but specifications are not implementations. This article documents the specific normalisation work that makes or breaks a European account aggregation product, drawn from connecting to banks across 28 countries.

Why normalisation is harder than it looks

When an engineering team first reads the UK Open Banking Read/Write API Spec or the Berlin Group XS2A spec, the data schemas look reasonable. Transaction objects have amounts, dates, descriptions. Account objects have identifiers and balances. The fields are named consistently within each spec.

The problem is that banks implement the specs selectively. Required fields are sometimes absent. Optional fields are populated inconsistently. The same logical concept — the amount of a transaction, the date it hit the account — is represented differently across institutions even within the same specification. And then there is the UK-vs-EU split, which adds an entirely separate set of schema differences on top of per-bank variation.

What follows is a catalogue of the inconsistencies that cause the most production problems, and the normalisation strategies that address them.

1. Amount sign conventions

This is the most common source of display bugs in personal finance apps.

Under UK OBIE, transaction amounts are always positive. The debit/credit direction is indicated by the CreditDebitIndicator field, which is either "Credit" or "Debit". A payment of £42.50 leaving the account is represented as amount "42.50" with indicator "Debit".

Under Berlin Group, transaction amounts are typically signed: a debit transaction is returned as "-42.50". But not always. Some Berlin Group implementations return unsigned amounts with a creditDebitIndicator field. Some return signed amounts with the indicator as well, leading to an apparent double-negative on debit transactions if you naively apply the sign and the indicator.

Normalisation rule: determine the signing convention for each ASPSP (account servicing payment service provider, the bank holding the account) and convert to one canonical representation at ingestion time: a signed amount, where negative means debit. Do not transform signs at display time. Doing so creates subtle bugs when the same transaction is accessed by multiple clients, each of which must then repeat the conversion correctly.
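A minimal sketch of ingestion-time sign normalisation follows, assuming a hand-maintained per-ASPSP profile built from observed responses. The ASPSP identifiers (`example-uk-bank` and so on) and the `SignConvention` names are hypothetical. The sketch also assumes some implementations send ISO 20022-style `CRDT`/`DBIT` codes instead of the OBIE spellings, so it accepts both. When a bank sends both a sign and an indicator, only the indicator is trusted, which avoids the double negative.

```python
from decimal import Decimal
from enum import Enum
from typing import Optional


class SignConvention(Enum):
    UNSIGNED_WITH_INDICATOR = "unsigned_with_indicator"  # OBIE style
    SIGNED = "signed"                                    # typical Berlin Group
    SIGNED_WITH_INDICATOR = "signed_with_indicator"      # sign AND indicator present


# Hypothetical per-ASPSP profiles, maintained from observed response data.
ASPSP_SIGN_PROFILES = {
    "example-uk-bank": SignConvention.UNSIGNED_WITH_INDICATOR,
    "example-de-bank": SignConvention.SIGNED,
    "example-nl-bank": SignConvention.SIGNED_WITH_INDICATOR,
}

DEBIT_CODES = {"debit", "dbit"}
CREDIT_CODES = {"credit", "crdt"}


def canonical_amount(aspsp_id: str, raw_amount: str,
                     indicator: Optional[str] = None) -> Decimal:
    """Return a signed Decimal where negative means debit."""
    convention = ASPSP_SIGN_PROFILES[aspsp_id]
    amount = Decimal(raw_amount)
    if convention is SignConvention.SIGNED:
        return amount  # the sign already carries the direction
    if indicator is None:
        raise ValueError(f"{aspsp_id}: expected a credit/debit indicator")
    # Ignore any sign on the raw value and take direction from the indicator only,
    # so "-42.50" + "Debit" does not become +42.50.
    magnitude = abs(amount)
    direction = indicator.strip().lower()
    if direction in DEBIT_CODES:
        return -magnitude
    if direction in CREDIT_CODES:
        return magnitude
    raise ValueError(f"{aspsp_id}: unrecognised indicator {indicator!r}")
```

Using `Decimal` rather than `float` keeps amounts such as 42.50 exact through the conversion.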

2. Date fields: booked vs value date

Transaction objects in both specs carry a booking date (when the transaction was posted to the account: BookingDateTime in OBIE, bookingDate in Berlin Group) and optionally a value date (when the funds settled, relevant for interest calculations: ValueDateTime or valueDate). The problem is not the existence of these fields but their population.

Many UK banks return BookingDateTime as a full ISO 8601 timestamp with a meaningful time component. Others effectively return only a date, with the time component zeroed or omitted. Some return a ValueDateTime; others do not populate the value date at all. A bank that processes card payments may only populate the booking date; a bank with strong treasury operations may populate the value date for all transactions.

For a spending analytics product that shows "you spent £X on Saturday", the choice of which date field to display is a product decision — but the normalisation layer must handle the case where the expected field is absent. A safe fallback: use value date if present, fall back to booking date. Never use the transaction creation timestamp in your own system as the display date — it will be wrong for pre-existing transactions fetched at initial consent authorisation.
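The fallback rule can be sketched as a small resolver that checks the spec field names in priority order (value date first, then booking date) and accepts both date-only and full timestamp values. It assumes the transaction arrives as a plain dict keyed by the raw OBIE or Berlin Group field names. Note that `.date()` returns the calendar date in the timestamp's own UTC offset; converting to the user's local time zone is a separate product decision.

```python
from datetime import date, datetime
from typing import Optional

# Value date first, booking date as fallback; both specs' field names are checked.
DATE_FIELD_PRIORITY = ("ValueDateTime", "valueDate", "BookingDateTime", "bookingDate")


def _parse_date(value: Optional[str]) -> Optional[date]:
    """Parse either a date-only string or a full ISO 8601 timestamp."""
    if not value:
        return None
    if "T" not in value:
        return date.fromisoformat(value)  # e.g. Berlin Group "2024-03-02"
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00")).date()


def display_date(txn: dict) -> date:
    """Return the date to display; never falls back to our own ingestion time."""
    for field in DATE_FIELD_PRIORITY:
        parsed = _parse_date(txn.get(field))
        if parsed is not None:
            return parsed
    raise ValueError("transaction has neither a value date nor a booking date")
```

Raising rather than defaulting to the ingestion timestamp is deliberate: a transaction with no usable date should surface as a data quality issue, not be silently shown on the wrong day.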

3. Transaction descriptions and reference fields

The field that should contain "Costa Coffee Moorgate" contains whatever the bank decided to put there. Across banks you will encounter:

  • Truncated merchant names at various character limits (some banks cut at 18 characters, some at 35, some at 140)
  • Uppercase-only descriptions from older bank systems ("COSTA COFFEE 0347 MOORGATE")
  • Reference fields that contain payment references rather than merchant names ("PAY REF: 203847293")
  • Berlin Group's remittanceInformationUnstructured field, which combines merchant name, payment reference, and sometimes the counterparty account number into a single unstructured string
  • Descriptions that have been normalised by the bank (merchant name matching applied by the bank, resulting in a cleaner string but sometimes incorrect categorisation)

There is no reliable way to extract a clean merchant name from raw transaction descriptions across all banks without an enrichment layer. The normalisation approach: pass the raw description field through unchanged, with a parallel enriched description field populated by a merchant name matching process (either your own or a third-party enrichment service). Do not overwrite the raw field — debugging data quality issues requires access to what the bank actually returned.
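A minimal sketch of the "raw plus enriched" pattern: the raw field is stored exactly as returned and the enriched field is filled alongside it. The regex lookup table stands in for a real merchant matcher or third-party enrichment service; `MERCHANT_PATTERNS` and the two sample merchants are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for a real merchant matching / enrichment service.
MERCHANT_PATTERNS = [
    (re.compile(r"\bCOSTA\b", re.IGNORECASE), "Costa Coffee"),
    (re.compile(r"\bTESCO\b", re.IGNORECASE), "Tesco"),
]


@dataclass(frozen=True)
class DescribedTransaction:
    raw_description: str                        # exactly what the bank returned
    enriched_description: Optional[str] = None  # None when no match was found


def enrich(raw_description: str) -> DescribedTransaction:
    """Attach an enriched merchant name without touching the raw field."""
    for pattern, merchant in MERCHANT_PATTERNS:
        if pattern.search(raw_description):
            return DescribedTransaction(raw_description, merchant)
    return DescribedTransaction(raw_description)
```

For example, `enrich("COSTA COFFEE 0347 MOORGATE")` keeps the original string in `raw_description` and sets `enriched_description` to `"Costa Coffee"`, while a payment reference such as `"PAY REF: 203847293"` is left unenriched. The frozen dataclass makes accidental overwrites of the raw field an error.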

4. Transaction unique identifiers

UK OBIE requires a TransactionId field for posted transactions. Berlin Group includes transactionId as optional — and many ASPSPs omit it or return non-unique values across the transaction history.

Non-unique transaction IDs are more common than they should be. Some banks reuse a transaction reference number that is only unique within a short rolling window, not across the full transaction history. Some return a sequential ID that resets monthly.

Deduplication strategy: use a composite key for identity — amount + booking date + creditor/debtor name + entry reference or remittance information. This will fail on edge cases (two identical transactions to the same merchant on the same day for the same amount) but handles the vast majority of cases. For the edge cases, accept duplicates and surface them in your data quality monitoring rather than silently dropping one.
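The composite-key strategy can be sketched as follows, assuming each transaction is a dict with already-normalised `amount`, `booking_date`, `counterparty` and `reference` keys (hypothetical names). Identical transactions within one fetch are kept, distinguished by an ordinal suffix, and reported for monitoring; the ordinal scheme is one possible design choice, not something the specs define. Because a re-fetch of the same history produces the same keys, re-ingesting overwrites rather than duplicates.

```python
import hashlib
from collections import Counter
from decimal import Decimal
from typing import Dict, List, Optional


def composite_key(amount: Decimal, booking_date: str,
                  counterparty: Optional[str], reference: Optional[str]) -> str:
    """Hash amount + booking date + counterparty + entry reference/remittance info."""
    parts = [
        format(amount.normalize(), "f"),  # "42.50" and "42.5" map to the same key
        booking_date,
        (counterparty or "").strip().upper(),
        (reference or "").strip().upper(),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def ingest_batch(store: Dict[str, dict], batch: List[dict]) -> List[str]:
    """Upsert one fetched batch; return keys that collided inside the batch."""
    seen: Counter = Counter()
    collisions: List[str] = []
    for txn in batch:
        base = composite_key(txn["amount"], txn["booking_date"],
                             txn.get("counterparty"), txn.get("reference"))
        ordinal = seen[base]
        seen[base] += 1
        if ordinal > 0:
            # Genuine look-alike (e.g. two identical coffees): keep it and flag it.
            collisions.append(base)
        store[f"{base}:{ordinal}"] = txn
    return collisions
```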

5. Balance types and the ASPSP balance model

The OBIE spec defines multiple balance types: ClosingAvailable, ClosingBooked, Expected, InterimAvailable, InterimBooked, OpeningAvailable, OpeningBooked, and others. The Berlin Group spec has closingBooked, expected, authorised, and openingBooked among others.

The balance type your product needs for a "current balance" display is typically InterimAvailable in OBIE (balance available for use at the current moment) or expected in Berlin Group. But not all banks return all balance types. Some banks return only ClosingBooked (yesterday's end-of-day balance), which will be wrong for a user who has made transactions today.

The normalisation rule: define a priority order of balance types for the canonical "current balance". Preferred: InterimAvailable (OBIE) or expected (Berlin Group); fallback: InterimBooked (OBIE) or closingBooked (Berlin Group); last resort: a closing balance for the most recent transaction date, reconstructed from transaction history. Expose the balance type actually used in your API response so downstream applications can make informed decisions about staleness.
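The priority rule can be sketched as a lookup over per-standard lists. This assumes the bank's balances have already been reduced to a mapping from balance type name to a sign-normalised amount; the `"obie"` / `"berlin_group"` keys are hypothetical labels. Returning `None` signals that the caller should fall back to reconstruction from transaction history.

```python
from decimal import Decimal
from typing import Dict, NamedTuple, Optional

# Priority lists mirror the rule above; adjust per product.
BALANCE_PRIORITY = {
    "obie": ["InterimAvailable", "InterimBooked"],
    "berlin_group": ["expected", "closingBooked"],
}


class CurrentBalance(NamedTuple):
    amount: Decimal
    balance_type: str  # exposed to API consumers so they can judge staleness


def select_current_balance(standard: str,
                           balances: Dict[str, Decimal]) -> Optional[CurrentBalance]:
    """Pick the highest-priority balance type the bank actually returned."""
    for balance_type in BALANCE_PRIORITY[standard]:
        if balance_type in balances:
            return CurrentBalance(balances[balance_type], balance_type)
    return None  # last resort: reconstruct from transaction history
```

Carrying `balance_type` in the return value, rather than just the number, is what makes it possible to expose the type in the API response.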

6. Account type classification

Current accounts, savings accounts, credit cards, loans and mortgages are all accessible via open banking APIs in markets where ASPSPs offer broad product coverage. How banks classify the accounts they return, however, varies significantly.

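OBIE defines an AccountType field with the values Personal or Business, and a separate AccountSubType field (CurrentAccount, Savings, CreditCard, etc.). Berlin Group instead exposes a cashAccountType field carrying ISO 20022 account type codes (CACC for a current account, SVGS for savings, and so on) plus a free-text product field, and implementations populate both inconsistently. German banks often classify accounts differently from Dutch or Spanish ones.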
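A sketch of mapping both standards onto one internal category set. OBIE's AccountSubType values and Berlin Group's ISO 20022 cashAccountType codes (CACC, SVGS, CARD, LOAN) map fairly directly; the free-text `product` keyword fallback is a hypothetical heuristic, and the category strings are illustrative. Mapping ChargeCard to `credit_card` is a judgment call.

```python
from typing import Optional

OBIE_SUBTYPES = {
    "CurrentAccount": "current",
    "Savings": "savings",
    "CreditCard": "credit_card",
    "ChargeCard": "credit_card",
    "Loan": "loan",
    "Mortgage": "mortgage",
}

# Berlin Group cashAccountType carries ISO 20022 ExternalCashAccountType1Code values.
BG_CASH_ACCOUNT_TYPES = {
    "CACC": "current",
    "SVGS": "savings",
    "CARD": "credit_card",
    "LOAN": "loan",
}

# Hypothetical keyword fallback for the free-text `product` field.
PRODUCT_KEYWORDS = [
    ("kreditkarte", "credit_card"),
    ("credit card", "credit_card"),
    ("sparkonto", "savings"),
    ("spaarrekening", "savings"),
    ("savings", "savings"),
]


def classify_account(sub_type: Optional[str] = None,
                     cash_account_type: Optional[str] = None,
                     product: Optional[str] = None) -> str:
    """Map raw account metadata to one internal category, or 'unknown'."""
    if sub_type in OBIE_SUBTYPES:
        return OBIE_SUBTYPES[sub_type]
    if cash_account_type and cash_account_type.upper() in BG_CASH_ACCOUNT_TYPES:
        return BG_CASH_ACCOUNT_TYPES[cash_account_type.upper()]
    if product:
        lowered = product.lower()
        for keyword, category in PRODUCT_KEYWORDS:
            if keyword in lowered:
                return category
    return "unknown"  # route to manual review rather than guessing
```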

Credit card accounts via PSD2 present a special case. The balance of a credit card is a liability, not an asset: a positive balance means money owed. Many banks return credit card balances as positive numbers with nothing explicit to indicate that the figure is a liability. Your normalisation layer must classify accounts by type and apply the correct balance sign semantics for each type.
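A minimal sketch of per-type sign semantics, assuming the account has already been classified into a category string such as `credit_card`, and that a hypothetical per-bank flag `positive_means_owed` records, from observed data, whether that bank reports debt as a positive number. The canonical output is the user's net position, where negative means money owed.

```python
from decimal import Decimal

# Illustrative set of categories treated as liabilities.
LIABILITY_CATEGORIES = {"credit_card", "loan", "mortgage"}


def net_position(reported_balance: Decimal, account_category: str,
                 positive_means_owed: bool) -> Decimal:
    """Return the balance from the user's side: negative = money owed."""
    if account_category in LIABILITY_CATEGORIES and positive_means_owed:
        # Flip rather than take -abs(): an overpaid card reported as negative
        # correctly becomes a positive (in-credit) position.
        return -reported_balance
    return reported_balance
```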

7. International character sets and encoding

French, German, Spanish, and Polish bank data contains diacritics. Most banks return UTF-8, but not all do. Some older bank systems return Latin-1 (ISO 8859-1) or Windows-1252 encoded strings in an API response that claims to be UTF-8. The result is mojibake in merchant names and creditor references.

Normalisation rule: detect encoding at ingestion time (not at storage time) and transcode to UTF-8. Maintain a per-bank encoding profile based on observed response data. Do not trust the Content-Type: application/json; charset=utf-8 header — it is frequently wrong.
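One way to sketch ingestion-time detection is to try strict UTF-8 on the raw response bytes and, if that fails, fall back to the encoding recorded in a per-bank profile, counting what was actually used. The bank IDs and profile structures are hypothetical. Windows-1252 is used as the default fallback because it agrees with Latin-1 on the printable range; this is a heuristic, since some Latin-1 byte sequences happen to be valid UTF-8.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple

# Hypothetical per-bank profile: fallback encoding, plus observed usage counts.
FALLBACK_ENCODING: Dict[str, str] = {"example-legacy-bank": "cp1252"}
ENCODING_OBSERVATIONS: Dict[str, Counter] = defaultdict(Counter)


def decode_response(bank_id: str, body: bytes) -> Tuple[str, str]:
    """Decode raw response bytes, ignoring the Content-Type charset claim."""
    try:
        text, used = body.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        used = FALLBACK_ENCODING.get(bank_id, "cp1252")
        # cp1252 leaves a few bytes undefined, so replace rather than crash.
        text = body.decode(used, errors="replace")
    ENCODING_OBSERVATIONS[bank_id][used] += 1  # feeds the per-bank profile
    return text, used
```

The decoded text can then be passed to `json.loads`. Working on raw bytes matters: once an HTTP client has already decoded a Latin-1 body as UTF-8, the damage is harder to undo.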

8. Pending vs posted transaction status

Banks differ on whether they return pending transactions in the transactions endpoint at all, and on how they classify and update pending transactions as they post.

UK banks in the OBIE standard expose a Status field on transactions with the values Booked and Pending. Some UK banks return pending card transactions; others do not. Berlin Group handles the distinction differently: the bookingStatus query parameter (booked, pending or both) selects which transactions are returned, and the response lists them in separate booked and pending arrays. Some European banks do not support pending transactions at all.

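The critical data management issue: a transaction that appears as Pending in one fetch may appear as Booked in a later fetch, potentially with a slightly different amount or booking date after final settlement. Your data model must handle the transition from pending to booked: match the booked transaction to its pending predecessor and update that record rather than creating a duplicate posted entry. Because the amount and date can shift at settlement, the match has to be looser than the deduplication key described above, for example the same counterparty plus an amount tolerance and a date window.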
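A sketch of pending-to-booked reconciliation under stated assumptions: amounts are already canonical signed `Decimal`s, and a booked transaction matches a pending one when the counterparty is the same, the dates fall within a window, and the amount is within a relative tolerance. The 20% tolerance and 5-day window are illustrative values, not recommendations; pre-authorisations such as fuel or hotel holds may need wider settings. The `Txn` type and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal
from typing import Dict, Optional


@dataclass
class Txn:
    txn_id: str
    amount: Decimal   # canonical signed amount (negative = debit)
    txn_date: date    # booking date, or the pending item's transaction date
    counterparty: str
    status: str       # "pending" or "booked"


def find_pending_match(store: Dict[str, Txn], booked: Txn,
                       tolerance_ratio: Decimal = Decimal("0.20"),
                       window: timedelta = timedelta(days=5)) -> Optional[Txn]:
    """Return the pending record most likely to have settled as `booked`."""
    name = booked.counterparty.strip().upper()
    candidates = [
        p for p in store.values()
        if p.status == "pending"
        and p.counterparty.strip().upper() == name
        and abs(p.txn_date - booked.txn_date) <= window
        and abs(p.amount - booked.amount) <= tolerance_ratio * abs(p.amount)
    ]
    if not candidates:
        return None
    # Prefer the closest date, then the closest amount.
    return min(candidates, key=lambda p: (abs((p.txn_date - booked.txn_date).days),
                                          abs(p.amount - booked.amount)))


def apply_booked(store: Dict[str, Txn], booked: Txn) -> str:
    """Promote the matching pending record in place, or insert a new one."""
    match = find_pending_match(store, booked)
    if match is None:
        store[booked.txn_id] = booked
        return booked.txn_id
    match.amount = booked.amount      # settled amount may differ from pending
    match.txn_date = booked.txn_date  # booking date replaces the pending date
    match.status = "booked"
    return match.txn_id               # record keeps its original identity
```

Updating in place keeps the original record ID stable, so clients that already displayed the pending item see it change state rather than disappear and reappear.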

The case for a single normalisation layer

Maintaining per-bank normalisation logic in-house means maintaining an expanding set of bank-specific edge cases as banks upgrade their implementations, change their API versions, and add or remove optional fields. In practice, most in-house aggregation implementations eventually accumulate a file called something like bank_normalisation_hacks.py that grows without bound.

The alternative is using an aggregation API that maintains this normalisation layer centrally. The trade-off: you lose access to bank-specific fields that the normalisation layer does not expose, and you depend on the aggregator's data quality monitoring rather than your own. For most fintech products, the exchange is worthwhile — the bank-specific edge cases that normalisation hides are not features your product can actually build on.

What to verify when evaluating a normalisation layer: does it apply consistent signed-amount semantics? Does it handle pending-to-booked transitions without creating duplicates? Does it surface balance type metadata? Does it expose the raw description field alongside any enriched version? These are the questions that reveal whether normalisation is superficial (renaming fields) or substantive (handling the full range of per-bank variation).