Rayn Data Ingestion Pipeline for AWS S3

The Rayn Data Ingestion Pipeline for AWS S3 lets you supply the Rayn Machine with content and user data. The Rayn Machine needs this data to function properly and to return accurate, relevant results for your users.

Data Ingestion Pipeline for AWS S3 is a Rayn Air Enterprise feature. Please contact Sales for a quote.

Getting Started

To get started with the Rayn Data Ingestion Pipeline, you will need an AWS account with access to Amazon S3. If you do not have one, you can sign up for an account on the AWS website. Once you have an account, you can begin setting up your data ingestion pipeline.

How to get there

In the left-hand menu, navigate to Air > Connections. Click New to create a new connection, then select Server Side Data Ingestion.

Configuring the Data Ingestion Pipeline

The configuration for the data ingestion pipeline is crucial for ensuring that the Rayn Machine receives the necessary data to provide accurate results.

Content data ingestion

Content data ingestion supplies the Rayn Machine with content data for contextualization. There are two methods of content ingestion.

AWS S3: Update content data in batch by uploading a file to an S3 bucket. In Console, you can configure the data pipeline for your file format. An added benefit of ingestion via S3 is content augmentation, which can be used to expand the scope of your content.
Content API: The Content API allows near-real-time contextualization and flexible updates of content IDs. More information on the Content API can be found on developers.rayn.io.

Set up Content Ingestion via S3

To configure the pipeline, you will need to follow these steps:

Step 1

Give the data ingestion pipeline a name that will be easy to recognise later.

Step 2

Prepare to set up the content ingestion pipeline by sourcing a sample file of the data that will be uploaded to S3. The sample file is used to validate the settings and to show an example of the output. Rayn currently accepts only CSV format for both content and user data ingestion via S3. Upload your sample file and adjust the settings so that Rayn interprets the CSV correctly. Once the sample file is uploaded, its columns are displayed. An example file is attached to this article.
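If you don't yet have a sample file, you can generate a small one locally. The Python sketch below is illustrative only: the column names `contentId`, `title` and `body` and the file name `sample_content.csv` are hypothetical, not a format Rayn requires. It writes a comma-delimited CSV with a header row and reads the header back, a quick way to confirm which columns a standard CSV parser will find before you upload the file.

```python
import csv

# Hypothetical columns for illustration; use the columns of your own content export.
FIELDNAMES = ["contentId", "title", "body"]
ROWS = [
    {"contentId": "article-001", "title": "Ten tips for trail running",
     "body": "Choose shoes with good grip, pace yourself on climbs and carry water."},
    {"contentId": "article-002", "title": "Budget city breaks in Europe",
     "body": "Travel off-season, book trains early and stay near public transport."},
]

def write_sample_csv(path, rows, fieldnames):
    """Write rows as comma-delimited CSV with a header row; fields containing commas are quoted."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

def detected_columns(path):
    """Return the header names a standard CSV parser reads from the file."""
    with open(path, newline="", encoding="utf-8") as f:
        return csv.DictReader(f).fieldnames

write_sample_csv("sample_content.csv", ROWS, FIELDNAMES)
print(detected_columns("sample_content.csv"))  # ['contentId', 'title', 'body']
```

If your export uses a different delimiter, such as a semicolon, pass `delimiter=";"` to both `DictWriter` and `DictReader`, and adjust the corresponding setting in Console.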

Additional taxonomies: By default, Rayn maps the content to the IAB Content Taxonomy 3.0. If you need to work with the IAB Content Taxonomy 2.2, enable this feature.

The contentId is the key Rayn uses to identify each unique content article. Select the field that should be used as contentId, and ensure that its value is unique for every article being ingested.
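Duplicate contentId values are easy to miss in a large export. Below is a minimal Python sketch for checking a CSV locally before upload, assuming the ID column is named `contentId` (change `id_field` if yours differs). The helper `duplicate_ids` is hypothetical and not part of any Rayn tooling.

```python
import csv
import os
import tempfile
from collections import Counter

def duplicate_ids(path, id_field="contentId"):
    """Return, sorted, the id_field values that occur more than once in a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if id_field not in (reader.fieldnames or []):
            raise KeyError(f"column {id_field!r} not found in {path}")
        counts = Counter(row[id_field] for row in reader)
    return sorted(value for value, n in counts.items() if n > 1)

# Demo on a throwaway file in which "a-2" appears twice.
with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="",
                                 encoding="utf-8", delete=False) as tmp:
    tmp.write("contentId,title\na-1,First\na-2,Second\na-2,Second again\n")
    demo_path = tmp.name
print(duplicate_ids(demo_path))  # ['a-2']
os.remove(demo_path)
```

An empty list means every contentId in the file is unique.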

The Rayn Machine processes all fields unless instructed to ignore them. Use the toggles to exclude any fields that Rayn should ignore.

Step 3 (Optional)

Configure the Rayn Machine to use LLMs to expand the scope of relevance of your content.

Content Augmentation

In addition to providing the Rayn Machine with data, the data ingestion pipeline supports content augmentation: you can instruct the Rayn Machine to generate a larger body of relevant text from the initial input. A predefined augmentation template or a custom template prompts one or more LLMs to generate the additional content.

For example, if your content consists of only a product name, you can use Content Augmentation to add a relevant body of text that Rayn can categorize.

Running an augmentation template

Choose any of the predefined templates to augment the content. Select the fields from the sample file that should be included in the processing.

Create a custom augmentation template

Write a custom prompt and insert fields from the sample file into it. The inserted fields act as placeholders.
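Console performs the substitution itself, and this article does not document its placeholder syntax. As a rough mental model only, the Python sketch below fills a hypothetical prompt, with placeholders written as `{field}` (Python `str.format` style), using the values from one row of the sample file. The field names `product_name` and `brand` and the helper names are assumptions for illustration.

```python
import string

# Hypothetical custom prompt; {product_name} and {brand} stand in for
# fields selected from the sample file.
TEMPLATE = (
    "Write a short, factual description of the product {product_name} "
    "by {brand}, covering what it is used for and who it is aimed at."
)

def placeholders(template):
    """List the field names referenced in a str.format-style template."""
    return [field for _, field, _, _ in string.Formatter().parse(template) if field]

def render_prompt(template, row):
    """Fill the template with one CSV row; raise if a referenced field is missing."""
    missing = [f for f in placeholders(template) if f not in row]
    if missing:
        raise KeyError(f"row is missing fields: {missing}")
    return template.format_map(row)

# One row from a hypothetical sample file; extra fields such as "sku" are ignored.
row = {"product_name": "TrailMax 2 running shoe", "brand": "Acme", "sku": "TM2-42"}
print(render_prompt(TEMPLATE, row))
```

In this sketch, a row that lacks one of the referenced fields raises an error instead of producing an incomplete prompt.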

Content Augmentation incurs additional costs. Please contact your sales representative for more information.

Step 4

After you complete the configuration, the Rayn Machine processes the sample file and produces sample results. Review the results to confirm that the settings are correct.

User data ingestion

User data ingestion can be used to add or update user data. There are two methods of updating user data.

AWS S3: Update user data in batch by uploading a file to an S3 bucket. In Console, you can configure the data pipeline for your file format. An added benefit of ingestion via S3 is the use of data transformers, which let you upload data in its raw form (such as a birth date) and have Rayn transform it into the appropriate audience category.
Data API: The Data API allows fast and flexible updates of user data. More information on the Data API can be found on developers.rayn.io.
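To illustrate what a birth-date transformer does conceptually, the Python sketch below parses a date in DD-MM-YYYY form (matching the example "31-12-1999" in the field table for user data) and assigns an age bucket. It is not Rayn's implementation: the bucket boundaries and labels are illustrative placeholders, not the IAB Audience Taxonomy 1.1 categories or IDs that Rayn applies. Age is computed against an explicit reference date so the result is reproducible.

```python
from datetime import date, datetime

# Illustrative age buckets only: these are NOT the IAB Audience Taxonomy 1.1
# categories or IDs that Rayn maps to.
AGE_BUCKETS = [
    (18, 24, "18-24"),
    (25, 34, "25-34"),
    (35, 44, "35-44"),
    (45, 54, "45-54"),
    (55, 64, "55-64"),
    (65, 150, "65+"),
]

def age_on(birth, today):
    """Whole years between a birth date and a reference date."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)

def age_bucket(raw, today, fmt="%d-%m-%Y"):
    """Parse a birth date such as "31-12-1999" and return an age-bucket label.

    Returns None when the age falls outside every bucket (for example, under 18).
    """
    birth = datetime.strptime(raw, fmt).date()
    age = age_on(birth, today)
    for low, high, label in AGE_BUCKETS:
        if low <= age <= high:
            return label
    return None

print(age_bucket("31-12-1999", today=date(2025, 1, 1)))  # 25-34
```

Storing the raw birth date and deriving the bucket at ingestion time means the category stays correct as users get older, which is one reason to prefer a transformer over uploading a precomputed age.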

User data ingestion cannot be used in combination with user data that has been ingested with Rayn JS.

Set up User Data Ingestion via S3

The setup for user data ingestion via S3 allows you to configure the ingestion pipeline to handle your file formatting.

Step 1

From the connections overview, select the data ingestion pipeline you wish to use. Then select the Data panel and choose Add.

Step 2

Prepare to set up the user data ingestion pipeline by sourcing a sample file of the data that will be uploaded to S3. The sample file is used to validate the settings and to show an example of the output. Rayn currently accepts only CSV format for both content and user data ingestion via S3. Upload your sample file and adjust the settings so that Rayn interprets the CSV correctly. Once the sample file is uploaded, its columns are displayed. An example file is attached to this article.
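To produce a sample user file, you can generate a small CSV locally. In the Python sketch below, the column names simply match the Rayn field names from Step 3 to keep the mapping obvious (your own columns can be named differently and mapped there); the values and the file name `sample_users.csv` are made up. A value such as `12, 58, 42` contains commas, so it must be quoted in a comma-delimited file; Python's `csv` module does this automatically.

```python
import csv
import io

# Column names chosen to match the Rayn fields in Step 3; values are made up.
FIELDNAMES = ["userId", "age", "categories"]
ROWS = [
    {"userId": "123456", "age": "31-12-1999", "categories": "12, 58, 42"},
    {"userId": "123457", "age": "15-06-1985", "categories": "7"},
]

def users_csv_text(rows, fieldnames):
    """Serialize rows to CSV text; values containing commas are quoted automatically."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

text = users_csv_text(ROWS, FIELDNAMES)
print(text, end="")
# userId,age,categories
# 123456,31-12-1999,"12, 58, 42"
# 123457,15-06-1985,7

# Save it as the sample file to upload in Console.
with open("sample_users.csv", "w", newline="", encoding="utf-8") as f:
    f.write(text)
```

Keeping userId as a string avoids losing leading zeros if your IDs have them.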

Step 3

Map the fields in the sample file to Rayn fields.

Field	Description	Example
userId	The userId to be added or updated.	"123456"
age	User age. Rayn maps the value in the sample file (for example, a birth date) to the age categories of the IAB Audience Taxonomy 1.1.	"31-12-1999"
categories	Audience categories from the IAB Audience Taxonomy 1.1, as a comma-separated list.	"12, 58, 42"
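Before uploading, it can help to sanity-check that each row fits the mapping in the table. The Python sketch below is a local pre-upload check under stated assumptions, not Rayn's own validation: it assumes `age` holds a DD-MM-YYYY birth date and that `categories` holds comma-separated numeric taxonomy IDs, as in the example values in the table. The helper names are hypothetical.

```python
import csv
import io
from datetime import datetime

def parse_categories(raw):
    """Split a comma-separated list such as "12, 58, 42" into integer IDs."""
    return [int(part) for part in (raw or "").split(",") if part.strip()]

def check_user_row(row):
    """Return a list of problems in one user row; an empty list means none were found.

    Assumed rules for a local pre-upload check, not Rayn's own validation:
    userId must be non-empty, age (if present) must be a DD-MM-YYYY date,
    and categories (if present) must be comma-separated numeric IDs.
    """
    problems = []
    if not (row.get("userId") or "").strip():
        problems.append("userId is empty")
    age = (row.get("age") or "").strip()
    if age:
        try:
            datetime.strptime(age, "%d-%m-%Y")
        except ValueError:
            problems.append(f"age {age!r} is not a DD-MM-YYYY date")
    for part in (row.get("categories") or "").split(","):
        if part.strip() and not part.strip().isdecimal():
            problems.append(f"category {part.strip()!r} is not a numeric ID")
    return problems

sample = (
    "userId,age,categories\n"
    '123456,31-12-1999,"12, 58, 42"\n'
    ",1999/12/31,12; 58\n"
)
# Line numbers start at 2 because line 1 is the header.
for line_no, row in enumerate(csv.DictReader(io.StringIO(sample)), start=2):
    problems = check_user_row(row)
    print(line_no, problems or parse_categories(row["categories"]))
```

The first data row passes and its categories parse to a list of IDs; the second is flagged for the empty userId, the slash-separated date and the semicolon-separated categories.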