Outlook connector reference

This page contains reference documentation for the Outlook connector in Lakeflow Connect.

Connection properties

When you create the Unity Catalog connection, you must specify the following properties. See Configure authentication to Microsoft Outlook for how to obtain these values.

Property | Description
Client ID | The Application (client) ID from the Microsoft Entra ID app registration.
Client secret | The client secret value from the Microsoft Entra ID app registration.
Tenant ID | The Directory (tenant) ID from the Microsoft Entra ID app registration.

Destination schema

The connector produces a single table, email_messages, under the default schema.

  • Primary key: (mailbox, outlook_message_id)
  • Incremental sync cursor: received_at, tracked per mailbox and folder
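
As a rough illustration of what the primary key and the cursor imply, the following Python sketch upserts rows keyed by (mailbox, outlook_message_id) and tracks the latest received_at per (mailbox, folder). The function name apply_batch and the in-memory dictionaries are hypothetical and exist only for this example; the connector's internal implementation is not described on this page.

```python
from datetime import datetime, timezone


def apply_batch(table, cursors, rows):
    """Upsert rows and advance the per-(mailbox, folder) cursor.

    table:   dict keyed by (mailbox, outlook_message_id) -> row dict
    cursors: dict keyed by (mailbox, folder) -> latest received_at seen
    rows:    iterable of dicts that use the email_messages column names
    """
    for row in rows:
        key = (row["mailbox"], row["outlook_message_id"])
        table[key] = row  # a repeated key overwrites the row instead of duplicating it

        cursor_key = (row["mailbox"], row["folder"])
        previous = cursors.get(cursor_key)
        if previous is None or row["received_at"] > previous:
            cursors[cursor_key] = row["received_at"]
    return table, cursors


# Hypothetical sample data: the same message arrives twice, then a second message.
table, cursors = {}, {}
batch = [
    {"mailbox": "a@example.com", "outlook_message_id": "m1", "folder": "Inbox",
     "received_at": datetime(2024, 1, 1, tzinfo=timezone.utc), "is_read": False},
    {"mailbox": "a@example.com", "outlook_message_id": "m1", "folder": "Inbox",
     "received_at": datetime(2024, 1, 1, tzinfo=timezone.utc), "is_read": True},
    {"mailbox": "a@example.com", "outlook_message_id": "m2", "folder": "Sent Items",
     "received_at": datetime(2024, 1, 3, tzinfo=timezone.utc), "is_read": True},
]
apply_batch(table, cursors, batch)
```

In this sketch the second copy of message m1 replaces the first, so the table holds one row per primary key, and each (mailbox, folder) pair keeps its own cursor value.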

email_messages

Column | Type | Description
mailbox | string | Email address of the mailbox. Part of the primary key.
outlook_message_id | string | Unique message ID from the Microsoft Graph API. Part of the primary key.
internet_message_id | string | RFC 2822 internet message ID.
conversation_id | string | Conversation thread ID.
folder | string | Folder display name (for example, Inbox).
to_recipients | array<string> | List of recipient email addresses.
cc_recipients | array<string> | List of CC recipient email addresses.
bcc_recipients | array<string> | List of BCC recipient email addresses.
from | string | Sender email address.
sender | string | Actual sender email address (might differ from the value in the from column when the message is sent on behalf of another user).
reply_to | array<string> | List of reply-to email addresses.
subject | string | Email subject line.
importance | string | Importance level (for example, normal, high, low).
is_read | boolean | Whether the message has been read.
in_reply_to | string | Internet message ID of the parent message, from email headers.
references | array<string> | Array of referenced message IDs, from email headers.
body_preview | string | Preview of the email body.
full_body_content | string | Complete body content. Format is HTML or plain text, based on the body_format option.
unique_body_content | string | Unique body content, excluding quoted text from replies.
received_at | timestamp | Date and time the message was received (ISO-8601). Used as the incremental sync cursor.
sent_at | timestamp | Date and time the message was sent (ISO-8601).
categories | array<string> | User-defined categories or tags on the message.
attachments | array<struct> | Array of attachment structs. Omitted when attachment_mode is NONE. See Attachment struct.

Attachment struct

Field | Type | Description
attachment_id | string | ID of the attachment from the Microsoft Graph API.
file_name | string | Original filename.
mime_type | string | MIME type (for example, application/pdf).
size | bigint | File size in bytes.
attachment_kind | string | Type indicator (for example, fileAttachment, itemAttachment).
is_inline | boolean | Whether the attachment is inline (for example, an embedded image in a signature).
content | binary | Base64-encoded file content.
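
To make the role of is_inline concrete, the following Python sketch mirrors the struct fields in a dataclass and shows how each attachment_mode value (described under Connector options) could select attachments. The Attachment class and the select_attachments function are hypothetical names used only for illustration. The sketch returns an empty list for NONE, whereas the connector omits the attachments column in that case.

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    """Illustrative mirror of the attachment struct fields."""
    attachment_id: str
    file_name: str
    mime_type: str
    size: int
    attachment_kind: str
    is_inline: bool
    content: bytes


def select_attachments(attachments, attachment_mode="ALL"):
    """Return the attachments that a given attachment_mode would keep."""
    if attachment_mode == "ALL":
        return list(attachments)
    if attachment_mode == "NON_INLINE_ONLY":
        return [a for a in attachments if not a.is_inline]
    if attachment_mode == "INLINE_ONLY":
        return [a for a in attachments if a.is_inline]
    if attachment_mode == "NONE":
        return []
    raise ValueError(f"unsupported attachment_mode: {attachment_mode!r}")


# Hypothetical sample: one signature image (inline) and one PDF (not inline).
logo = Attachment("att-1", "logo.png", "image/png", 2048, "fileAttachment", True, b"")
invoice = Attachment("att-2", "invoice.pdf", "application/pdf", 51200, "fileAttachment", False, b"")

kept = select_attachments([logo, invoice], "NON_INLINE_ONLY")
```

Here kept contains only invoice.pdf, which is the behavior the NON_INLINE_ONLY mode is recommended for when signature images are not wanted.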

Connector options

These options are specified under outlook_options in the pipeline specification. See Filter combination logic for how multiple filter options interact.

Option | Type | Required | Default | Description
include_mailboxes | array<string> | No | All accessible mailboxes | List of mailbox email addresses to sync. If not specified, the connector discovers and ingests all accessible mailboxes in the tenant using the Microsoft Graph GET /users endpoint.
include_folders | array<string> | No | ["Inbox"] | List of folder display names to sync. Examples: Inbox, Sent Items, Custom_Folder. Matching is case-insensitive.
include_senders | array<string> | No | All senders | Filter emails by sender email address using exact match. Example: user@vendor.com.
include_subjects | array<string> | No | All subjects | Filter emails by subject line. Values ending with * use prefix match; other values use substring match. Example: "Invoice" (substring), "Re:*" (prefix).
start_date | string | No | Complete history from epoch | Start date for the initial sync in YYYY-MM-DD format. Determines the earliest date from which to sync historical data.
body_format | string | No | TEXT_HTML | Controls the email body content format. TEXT_HTML: preserves full HTML formatting. TEXT_PLAIN: converts the body to plain text (recommended for AI/RAG pipelines to reduce token usage).
attachment_mode | string | No | ALL | Controls which attachments to ingest. ALL: all attachments. NON_INLINE_ONLY: non-inline attachments only (recommended to avoid corporate signature images). INLINE_ONLY: inline attachments only. NONE: no attachments (skips attachment API calls entirely).
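
For orientation, the following Python sketch writes a set of options as a dictionary and checks it against the types and allowed values listed above. The option names and values come from the table; the validate_outlook_options helper, the sample mailbox address, and the dictionary layout are assumptions made for this example and are not part of the pipeline specification format.

```python
from datetime import datetime

# Allowed values for the enumerated options, taken from the table above.
ALLOWED_VALUES = {
    "body_format": {"TEXT_HTML", "TEXT_PLAIN"},
    "attachment_mode": {"ALL", "NON_INLINE_ONLY", "INLINE_ONLY", "NONE"},
}
LIST_OPTIONS = {"include_mailboxes", "include_folders", "include_senders", "include_subjects"}
KNOWN_OPTIONS = LIST_OPTIONS | set(ALLOWED_VALUES) | {"start_date"}


def validate_outlook_options(options):
    """Return a list of problems found in an outlook_options dictionary."""
    errors = []
    for name, value in options.items():
        if name not in KNOWN_OPTIONS:
            errors.append(f"unknown option: {name}")
        elif name in LIST_OPTIONS:
            if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
                errors.append(f"{name} must be a list of strings")
        elif name in ALLOWED_VALUES:
            if not isinstance(value, str) or value not in ALLOWED_VALUES[name]:
                errors.append(f"{name} must be one of {sorted(ALLOWED_VALUES[name])}")
        else:  # start_date
            try:
                datetime.strptime(value, "%Y-%m-%d")
            except (TypeError, ValueError):
                errors.append("start_date must be a date in YYYY-MM-DD format")
    return errors


# Hypothetical option set; every option is optional, so any subset is valid.
outlook_options = {
    "include_mailboxes": ["finance@example.com"],
    "include_folders": ["Inbox", "Sent Items"],
    "include_subjects": ["Invoice", "Re:*"],
    "start_date": "2024-01-01",
    "body_format": "TEXT_PLAIN",
    "attachment_mode": "NON_INLINE_ONLY",
}
problems = validate_outlook_options(outlook_options)
```

An empty problems list means that every supplied option has the documented type and, for body_format and attachment_mode, one of the documented values.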

Filter combination logic

An email message is ingested when it matches at least one value from each specified filter category. Multiple filter categories are combined with AND logic; values within a single category use OR logic.

Example: include_folders=["Inbox"] AND include_senders=["user@vendor.com", "alerts@system.io"] ingests emails from the Inbox folder that are sent by either user@vendor.com OR alerts@system.io.
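
The combination rule can be sketched in Python as follows. This is a minimal illustration, not the connector's implementation. It makes several assumptions: sender filters are compared against the from column, sender and subject comparisons are case-sensitive (this page specifies case-insensitive matching only for folders), and include_mailboxes is left out for brevity. The function names subject_matches and message_matches are hypothetical.

```python
def subject_matches(subject, pattern):
    """A trailing * means prefix match; any other value means substring match."""
    if pattern.endswith("*"):
        return subject.startswith(pattern[:-1])
    return pattern in subject


def message_matches(message, options):
    """AND across filter categories, OR across the values within each category."""
    # include_folders defaults to ["Inbox"]; folder matching is case-insensitive.
    folders = options.get("include_folders", ["Inbox"])
    if message["folder"].lower() not in {f.lower() for f in folders}:
        return False

    # Unspecified categories do not restrict the result.
    senders = options.get("include_senders")
    if senders is not None and message["from"] not in senders:
        return False

    subjects = options.get("include_subjects")
    if subjects is not None and not any(
        subject_matches(message["subject"], p) for p in subjects
    ):
        return False

    return True


options = {
    "include_folders": ["Inbox"],
    "include_senders": ["user@vendor.com", "alerts@system.io"],
}
alert = {"folder": "inbox", "from": "alerts@system.io", "subject": "Disk alert"}
other = {"folder": "Inbox", "from": "someone@else.org", "subject": "Hello"}

alert_ingested = message_matches(alert, options)
other_ingested = message_matches(other, options)
```

With the options from the example above, the message from alerts@system.io in the Inbox passes both categories, while the message from someone@else.org passes the folder filter but fails the sender filter and is skipped.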