SportsCloud Integration Hub

This statement explains how we handle integrations and data exchange between different systems.

SportsCloud Data Hub

A digital solution that acts as a synchronization hub, providing data synchronization between multiple platforms via a centralized interface.

In this document we will focus on two parts:

  • Offer a platform that makes synchronization between multiple systems possible when it cannot be configured via a web interface
  • Provide reusability, so that a marketer or developer can set up the same connections cost-efficiently

Brief overview of data synchronization

Synchronization types

There are commonly two types of data synchronization:

  • Bulk, typically executed periodically (e.g. each night)
  • Real-time, often established via a webhook

To make sure both platforms contain exactly the same data, you will need to apply both a bulk sync and a real-time sync. One of the platforms may be temporarily down or return errors; in that case a bulk sync will always make sure the missing data is corrected. The next time that data entry is changed, the real-time webhook will process it directly.

When working with webhooks only, certain data will eventually get out of sync. Profiles created before the webhook was activated are also never synced, as no webhook was triggered for them (yet).

On the other hand, if you only apply bulk updates, especially when they are not scheduled frequently, you will not have real-time data available. As a marketer you will not be able to act immediately (e.g. starting a campaign right after a purchase).

Active and passive

Synchronizations can operate in the following modes:

  • Active, the performing platform consumes the data source and checks if there is new data available
  • Passive, the performing platform waits for signals from the data source (events or webhooks)

Webhooks are passive, as the platform receives a signal (webhook) when a profile is changed. This webhook can either be a POST request (the profile is pushed within the webhook) or just a GET request with an identifier. In the latter case you only know that a profile has been updated; you should then query the other platform for that profile (using the id) and decide what to do with the new data.

Bulk updates are always active, as you decide yourself when to consume a list (which can also be a single profile) from the other platform and what to do with that list.
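
As an illustration, below is a minimal sketch of a passive webhook receiver in Django REST framework (the stack listed under Technology). The endpoint, the process_profile() handler and the origin platform URL are hypothetical assumptions, not an existing API.

# Hypothetical DRF view receiving passive webhook signals.
import requests
from rest_framework.views import APIView
from rest_framework.response import Response

ORIGIN_API = "https://origin.example.com/api"  # assumed origin platform URL

def process_profile(profile):
    # Hypothetical handler: decide what to do with the new profile data.
    print("processing", profile)

class ProfileWebhook(APIView):
    def post(self, request, profile_id):
        # Push variant: the changed profile is included in the request body.
        process_profile(request.data)
        return Response(status=204)

    def get(self, request, profile_id):
        # Notify variant: only the identifier is sent, so we consume the
        # origin platform for that profile before deciding what to do.
        resp = requests.get(f"{ORIGIN_API}/profiles/{profile_id}/", timeout=10)
        resp.raise_for_status()
        process_profile(resp.json())
        return Response(status=204)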

Data sources & CRUD operations (create, read, update & delete)

Static files

Static files are well known by all kinds of platforms and developers. Formats like .csv, .xml and .xls are often used. The big disadvantage of these files is the lack of validation while storing them: the outgoing platform has to make sure the file follows the agreed requirements, and the incoming platform has to follow those same requirements in order to read it. Static files also perform badly for real-time synchronization. The big advantage is that they are easy to implement.

Static files do require transfer via storage, like (S)FTP, AWS S3 or Azure Blob Storage.
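
As a sketch, a nightly bulk import from such a file could look like the snippet below, assuming the file has already been transferred via e.g. SFTP; the column names and the upsert_profile() helper are hypothetical.

# Hypothetical bulk import of a nightly .csv export.
import csv

def upsert_profile(row):
    # Hypothetical call into the receiving platform (create or update).
    print("upserting", row["email"])

with open("profiles.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        # The format enforces no validation, so both platforms must agree
        # on the requirements (e.g. an "email" column must be present).
        if not row.get("email"):
            continue
        upsert_profile(row)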

APIs

Most modern data sources can be consumed and mutated via a REST API. A REST API offers great validation, is easy to implement and is often well documented.

The requirements, like field definitions and authentication, can differ per API. This makes almost every API unique and requires specific knowledge and custom work to implement.
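
For example, actively consuming a paginated REST API with the Python requests library could look like this sketch; the URL, header and field names are assumptions:

# Hypothetical active consumption of a paginated profiles endpoint.
import requests

url = "https://origin.example.com/api/profiles/"
while url:
    resp = requests.get(url, headers={"Api-key": "my-secret-key"}, timeout=10)
    resp.raise_for_status()  # the API validates requests and reports errors
    payload = resp.json()
    for profile in payload["results"]:
        print(profile["email"])
    url = payload.get("next")  # follow pagination until the last page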

Streaming data

With very high volumes, streaming data is by far the best data source. Technologies like Apache Kafka or AWS Kinesis are often used. You can listen to multiple data streams, pick up events from them (e.g. the creation and mutation of profiles) and process those events in your platform. This technology is typically only offered in enterprise-grade implementations where you have to deal with millions of changes per day.

Also with streaming data you will need an initial sync, as you only receive partial real-time updates.
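
A minimal sketch of picking up profile events from a stream, using the kafka-python client as an example; the topic name, broker address and event layout are hypothetical:

# Hypothetical consumer listening for profile events on a Kafka topic.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "profile-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
for event in consumer:
    # e.g. {"type": "update", "profile_id": 1, "email": "j.doe@example.com"}
    print("received", event.value)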

Authorization & permissions

Every platform and data source is (and should be) protected. Commonly used authorization protocols are:

  • JWT Tokens
  • OAuth 2 (and 1)
  • API Keys

While setting up a synchronization, the platform will have to implement or pass through credentials. Some protocols are easier to use as they don't need any logic to operate (API Keys).

Protocols like OAuth 1 require creating a digital signature. With protocols like JWT Tokens and OAuth 2, a token can expire and needs to be refreshed with another token. This requires additional software to operate.
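
As an illustration of that extra logic, below is a minimal sketch of caching and refreshing an OAuth 2 style access token; the token endpoint, credentials and response fields are assumptions, not an existing API.

# Hypothetical client-credentials token cache with refresh-before-expiry.
import time
import requests

_token = {"value": None, "expires_at": 0.0}

def get_access_token():
    # Refresh the cached token shortly before it expires.
    if time.time() >= _token["expires_at"] - 30:
        resp = requests.post(
            "https://example.com/oauth/token",
            data={
                "grant_type": "client_credentials",
                "client_id": "my-client-id",
                "client_secret": "my-client-secret",
            },
            timeout=10,
        )
        resp.raise_for_status()
        payload = resp.json()
        _token["value"] = payload["access_token"]
        _token["expires_at"] = time.time() + payload["expires_in"]
    return _token["value"]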

E.g. a simple webhook with an API Key:

$ curl -X POST \
    -H "Api-key: my-secret-key" \
    -d '{"email": "j.doe@example.com", "opt-in": true}' \
    https://example.com/api/profiles/1/

These kinds of requests can easily be established and configured via a web management interface like BlueConic (if the receiving platform offers this kind of API specification).

It often occurs that the platform that needs the data requires additional logic (conditions, authentication, etc.) and can't be configured via a web interface. In that case we need a man in the middle: the SportsCloud Data Hub.

Step-by-step compatibility report

When a synchronization should be established, the steps below can typically be followed to determine whether it is feasible:

1. Can the origin platform trigger synchronizations or webhooks?
2. How can these synchronizations and webhooks be configured (e.g. which fields)?
3. Does the receiving platform offer an API to push the changes to?
4. Which authentication protocols are in place?
5. Which data formats are supported (JSON for a REST API, .csv, etc.)?

Data flows of syncs

Below we will describe some simple and more advanced data flows which are often used and preferred.

Simple (does not require man-in-the-middle software such as TDH)

1. Email has changed in origin platform
2. Origin platform triggers a webhook
3. Webhook pushes email to receiving platform via REST API with API Key
4. Receiving platform is updated

Advanced (does require TDH)

  1. Email has changed in origin platform
  2. Origin platform triggers a webhook
  3. Webhook pushes email to TDH platform via REST API with API Key
  4. TDH platform knows which client, project and sync this is based on the unique API Key
  5. TDH platform creates a signed (required for e.g. OAuth 1) request based on the data received from the webhook (see the sketch below)
  6. TDH platform pushes email to receiving platform via the created request
  7. Receiving platform is updated
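
The signing step (5) could be implemented with the requests-oauthlib package, as in this sketch; the keys, URL and payload are hypothetical:

# Hypothetical OAuth 1 signed push from TDH to the receiving platform.
import requests
from requests_oauthlib import OAuth1

auth = OAuth1("client-key", "client-secret", "token", "token-secret")
resp = requests.post(
    "https://receiving.example.com/api/profiles/1/",
    json={"email": "j.doe@example.com", "opt-in": True},
    auth=auth,  # the request is signed, as required by OAuth 1
    timeout=10,
)
resp.raise_for_status()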

Best practice & advanced (easy to configure in origin platform)

  1. Email has changed in origin platform
  2. Origin platform triggers a webhook
  3. Webhook notifies TDH platform that something has changed for the user with id 1 via REST API with API Key
  4. TDH platform knows which client, project and sync this is based on the unique API Key
  5. TDH platform consumes the profile with id 1 from the origin platform
  6. TDH platform decides what to sync to the receiving platform (can be done via a management interface, managed by a marketer, or based on a predefined template from a developer)
  7. TDH platform creates a signed (required for e.g. OAuth 1) request based on the changes to be made
  8. TDH platform pushes email to receiving platform via the created request
  9. Receiving platform is updated

In this way we apply the mutations to the receiving platform ourselves, so we have full control over what to change (instead of leaving that to the origin platform). GET requests are also very simple to configure in the origin platform, rather than (sometimes complex) POST requests with all kinds of custom fields and a wide variety of variable implementations, whose maintenance is hard. The notify webhook just consists of a simple URL, including a single id and an API Key, e.g. https://example.com/api/notify...
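
A sketch of how TDH could handle such a notify call end to end (steps 4-9 above); the sync registry, URLs and field template are all hypothetical assumptions:

# Hypothetical TDH handler for a notify-style webhook.
import requests

SYNCS = {  # step 4: resolve client/project/sync from the unique API Key
    "my-secret-key": {
        "origin": "https://origin.example.com/api",
        "origin_key": "origin-api-key",
        "target": "https://receiving.example.com/api",
        "target_key": "target-api-key",
        "fields": ["email", "opt-in"],  # predefined template (step 6)
    },
}

def handle_notify(api_key, profile_id):
    sync = SYNCS[api_key]
    # Step 5: consume the full profile from the origin platform.
    resp = requests.get(
        f"{sync['origin']}/profiles/{profile_id}/",
        headers={"Api-key": sync["origin_key"]}, timeout=10,
    )
    resp.raise_for_status()
    profile = resp.json()
    # Step 6: decide what to sync based on the configured template.
    changes = {field: profile[field] for field in sync["fields"]}
    # Steps 7-8: build and push the request (plain API Key auth here; a
    # signed OAuth 1 request would be built the same way).
    resp = requests.put(
        f"{sync['target']}/profiles/{profile_id}/",
        json=changes, headers={"Api-key": sync["target_key"]}, timeout=10,
    )
    resp.raise_for_status()  # step 9: receiving platform is updated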

This is seen as best practice, as you can easily update your own software platform and apply it for every client, instead of applying changes in many different systems. You can also prevent sending too many expensive and unnecessary requests to receiving platforms.

Functional requirements

MVP

  • Management interface
  • Manage clients, projects and types of synchronizations
  • Manage API Keys / authentication tokens
  • Setup should be testable before becoming active
  • Reusable setups for clients
  • Insight into usage per client, per project & per synchronization entry

User scenario #1:

When a profile is updated in BlueConic, the corresponding profile interests should be updated in Copernica as well. If the profile doesn't exist, it should be created.
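
A sketch of that update-or-create behaviour, assuming the receiving API answers 404 for unknown profiles; the URLs and fields are hypothetical, not the actual Copernica API:

# Hypothetical update-or-create of profile interests.
import requests

headers = {"Api-key": "my-secret-key"}
base = "https://receiving.example.com/api/profiles"
interests = {"interests": ["football", "tennis"]}

resp = requests.put(f"{base}/1/", json=interests, headers=headers, timeout=10)
if resp.status_code == 404:
    # The profile does not exist yet, so create it instead.
    resp = requests.post(base, json={"id": 1, **interests},
                         headers=headers, timeout=10)
resp.raise_for_status()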

User scenario #2:

Starting with a clean-slate campaign in ActiveCampaign, the complete list of profiles from BlueConic should be synced to ActiveCampaign. After that, whenever a profile is updated in BlueConic it should also be updated in ActiveCampaign.

User scenario #3:

When a profile is updated in BlueConic, a particular platform should be updated with the same profile as well. However, this platform does not offer developer-friendly profile endpoints in its API, but uses OAuth 1 and a combination of REST API endpoints: one for creating a contact and one for updating a contact (e.g. Copernica).

Nice-to-have

  • Manage triggers from this centralized platform in order to set up the complete synchronization without doing configuration in multiple platforms
  • Manage targets to other platforms and apply configuration (e.g. update interest properties from BlueConic to ActiveCampaign)

Non-functional requirements

  • Monitoring for performance
  • Monitoring for errors
  • Horizontally scalable (high-volume traffic, e.g. 1.000 r/s)
  • Forwarding IP
  • Secure authentication (revoke, hashes, etc.)
  • Zero-downtime deployment (prevent missing API calls)
  • Async scheduler and worker services
  • Retry mechanism with exponential backoff (e.g. do not retry 10 times in 1 second); see the sketch after this list
  • Secure network connections (SSL/TLS)
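
A minimal sketch of such a retry mechanism with exponentially growing delays between attempts; the endpoint it would be used against is hypothetical:

# Hypothetical sender that retries with exponential backoff.
import time
import requests

def send_with_retry(url, payload, attempts=5):
    for attempt in range(attempts):
        try:
            resp = requests.post(url, json=payload, timeout=10)
            resp.raise_for_status()
            return resp
        except requests.RequestException:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... between retries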

Technology

  • Python
  • Django
  • Django REST framework (Zapier is also built on Django and DRF)
  • API Keys