This page contains reference documentation for the Outlook connector in Lakeflow Connect.
Connection properties
When you create the Unity Catalog connection, you must specify the following properties. See Configure authentication to Microsoft Outlook for how to obtain these values.
| Property | Description |
|---|---|
| Client ID | The Application (client) ID from the Microsoft Entra ID app registration. |
| Client secret | The client secret value from the Microsoft Entra ID app registration. |
| Tenant ID | The Directory (tenant) ID from the Microsoft Entra ID app registration. |
Destination schema
The connector produces a single table, email_messages, under the default schema.
- Primary key: (mailbox, outlook_message_id)
- Incremental sync cursor: received_at, tracked per mailbox and folder
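
To make the roles of the primary key and the cursor concrete, the following is a minimal sketch, not the connector's actual implementation. It assumes rows are upserted by (mailbox, outlook_message_id) and that one received_at high-water mark is kept per (mailbox, folder) pair. The function name apply_batch and the in-memory dictionaries are hypothetical.

```python
from datetime import datetime, timezone


def apply_batch(table: dict, cursors: dict, batch: list) -> None:
    """Upsert rows by primary key and advance the per-(mailbox, folder) cursor.

    Hypothetical helper: `table` and `cursors` are plain dicts that stand in
    for the destination table and the connector's sync state.
    """
    for row in batch:
        # Primary key: (mailbox, outlook_message_id). A later copy of the
        # same message replaces the earlier one instead of duplicating it.
        key = (row["mailbox"], row["outlook_message_id"])
        table[key] = row

        # Cursor: the latest received_at seen for this mailbox and folder.
        scope = (row["mailbox"], row["folder"])
        if scope not in cursors or row["received_at"] > cursors[scope]:
            cursors[scope] = row["received_at"]


table, cursors = {}, {}
t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
first = {"mailbox": "a@example.com", "outlook_message_id": "m1",
         "folder": "Inbox", "received_at": t1, "is_read": False}
second = {"mailbox": "a@example.com", "outlook_message_id": "m2",
          "folder": "Inbox", "received_at": t2, "is_read": False}
apply_batch(table, cursors, [first, second])
# Re-delivering m1 with a changed flag updates the existing row.
apply_batch(table, cursors, [dict(first, is_read=True)])
```

Under these assumptions, a later sync for a given mailbox and folder would only need to request messages received after the stored cursor value.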
email_messages
| Column | Type | Description |
|---|---|---|
| mailbox | string | Email address of the mailbox. Part of the primary key. |
| outlook_message_id | string | Unique message ID from the Microsoft Graph API. Part of the primary key. |
| internet_message_id | string | RFC 2822 internet message ID. |
| conversation_id | string | Conversation thread ID. |
| folder | string | Folder display name (for example, Inbox). |
| to_recipients | array<string> | List of recipient email addresses. |
| cc_recipients | array<string> | List of CC recipient email addresses. |
| bcc_recipients | array<string> | List of BCC recipient email addresses. |
| from | string | Sender email address. |
| sender | string | Actual sender email address (might differ from the from column when the message is sent on behalf of another user). |
| reply_to | array<string> | List of reply-to email addresses. |
| subject | string | Email subject line. |
| importance | string | Importance level (for example, normal, high, low). |
| is_read | boolean | Whether the message has been read. |
| in_reply_to | string | Internet message ID of the parent message, from email headers. |
| references | array<string> | Array of referenced message IDs, from email headers. |
| body_preview | string | Preview of the email body. |
| full_body_content | string | Complete body content. Format is HTML or plain text, based on the body_format option. |
| unique_body_content | string | Unique body content, excluding quoted text from replies. |
| received_at | timestamp | Date and time the message was received (ISO-8601). Used as the incremental sync cursor. |
| sent_at | timestamp | Date and time the message was sent (ISO-8601). |
| categories | array<string> | User-defined categories or tags on the message. |
| attachments | array<struct> | Array of attachment structs. Omitted when attachment_mode is NONE. See Attachment struct. |
Attachment struct
| Field | Type | Description |
|---|---|---|
| attachment_id | string | ID of the attachment from the Microsoft Graph API. |
| file_name | string | Original filename. |
| mime_type | string | MIME type (for example, application/pdf). |
| size | bigint | File size in bytes. |
| attachment_kind | string | Type indicator (for example, fileAttachment, itemAttachment). |
| is_inline | boolean | Whether the attachment is inline (for example, an embedded image in a signature). |
| content | binary | Base64-encoded file content. |
Connector options
These options are specified under outlook_options in the pipeline specification. See Filter combination logic for how multiple filter options interact.
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| include_mailboxes | array<string> | No | All accessible mailboxes | List of mailbox email addresses to sync. If not specified, the connector discovers and ingests all accessible mailboxes in the tenant using the Microsoft Graph GET /users endpoint. |
| include_folders | array<string> | No | ["Inbox"] | List of folder display names to sync. Examples: Inbox, Sent Items, Custom_Folder. Matching is case-insensitive. |
| include_senders | array<string> | No | All senders | Filter emails by sender email address using exact match. Example: user@vendor.com. |
| include_subjects | array<string> | No | All subjects | Filter emails by subject line. Values ending with * use prefix match; other values use substring match. Example: "Invoice" (substring), "Re:*" (prefix). |
| start_date | string | No | Complete history from epoch | Start date for the initial sync in YYYY-MM-DD format. Determines the earliest date from which to sync historical data. |
| body_format | string | No | TEXT_HTML | Controls the email body content format. TEXT_HTML: preserves full HTML formatting. TEXT_PLAIN: converts the body to plain text (recommended for AI/RAG pipelines to reduce token usage). |
| attachment_mode | string | No | ALL | Controls which attachments to ingest. ALL: all attachments. NON_INLINE_ONLY: non-inline attachments only (recommended to avoid corporate signature images). INLINE_ONLY: inline attachments only. NONE: no attachments (skips attachment API calls entirely). |
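
As an illustration of the option types and allowed values in the table above, here is a minimal pre-flight check written as a sketch. The function validate_outlook_options is hypothetical and not part of Lakeflow Connect. It only checks a Python dictionary shaped like outlook_options before you place it in a pipeline specification, and the connector itself may validate differently.

```python
import re
from datetime import date

# Allowed values taken from the options table.
ALLOWED_BODY_FORMATS = {"TEXT_HTML", "TEXT_PLAIN"}
ALLOWED_ATTACHMENT_MODES = {"ALL", "NON_INLINE_ONLY", "INLINE_ONLY", "NONE"}
LIST_OPTIONS = ("include_mailboxes", "include_folders",
                "include_senders", "include_subjects")
KNOWN_OPTIONS = set(LIST_OPTIONS) | {"start_date", "body_format", "attachment_mode"}


def validate_outlook_options(options: dict) -> list:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    for name in options:
        if name not in KNOWN_OPTIONS:
            problems.append(f"unknown option: {name}")
    for name in LIST_OPTIONS:
        value = options.get(name)
        if value is None:
            continue
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{name} must be a list of strings")
    if "start_date" in options:
        value = options["start_date"]
        ok = isinstance(value, str) and re.fullmatch(r"\d{4}-\d{2}-\d{2}", value)
        if ok:
            try:
                date.fromisoformat(value)  # rejects dates such as 2024-02-30
            except ValueError:
                ok = False
        if not ok:
            problems.append("start_date must be a valid date in YYYY-MM-DD format")
    if options.get("body_format", "TEXT_HTML") not in ALLOWED_BODY_FORMATS:
        problems.append("body_format must be TEXT_HTML or TEXT_PLAIN")
    if options.get("attachment_mode", "ALL") not in ALLOWED_ATTACHMENT_MODES:
        problems.append("attachment_mode must be ALL, NON_INLINE_ONLY, INLINE_ONLY, or NONE")
    return problems


example_options = {
    "include_folders": ["Inbox", "Sent Items"],
    "include_subjects": ["Invoice", "Re:*"],
    "start_date": "2024-01-01",
    "body_format": "TEXT_PLAIN",
    "attachment_mode": "NON_INLINE_ONLY",
}
print(validate_outlook_options(example_options))
```

All options are optional, so an empty dictionary also passes this check.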
Filter combination logic
An email message is ingested when it matches at least one value from each specified filter category. Multiple filter categories are combined with AND logic; values within a single category use OR logic.
Example: include_folders=["Inbox"] AND include_senders=["user@vendor.com", "alerts@system.io"] ingests emails from the Inbox folder that are sent by either user@vendor.com OR alerts@system.io.
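
The combination rule can be sketched as a small predicate. This is an illustrative model only, not the connector's code: the Message class and the function names are hypothetical, and the sketch assumes that sender and subject comparisons are case-sensitive, which this page does not state. Folder matching is case-insensitive and include_folders defaults to ["Inbox"], as described in the options table.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical, simplified stand-in for a row of email_messages.
    folder: str
    sender: str
    subject: str


def subject_matches(subject: str, pattern: str) -> bool:
    # Values ending with * use prefix match; other values use substring match.
    if pattern.endswith("*"):
        return subject.startswith(pattern[:-1])
    return pattern in subject


def is_ingested(msg: Message, options: dict) -> bool:
    checks = []

    # OR within a category: any one listed value is enough.
    folders = options.get("include_folders", ["Inbox"])
    checks.append(any(msg.folder.lower() == f.lower() for f in folders))

    senders = options.get("include_senders")
    if senders is not None:
        checks.append(msg.sender in senders)  # exact match

    subjects = options.get("include_subjects")
    if subjects is not None:
        checks.append(any(subject_matches(msg.subject, p) for p in subjects))

    # AND across categories: every specified category must match.
    return all(checks)


options = {
    "include_folders": ["Inbox"],
    "include_senders": ["user@vendor.com", "alerts@system.io"],
}
print(is_ingested(Message("Inbox", "alerts@system.io", "Disk usage high"), options))
print(is_ingested(Message("Inbox", "someone@else.com", "Hello"), options))
```

With the options from the example above, the first message is accepted because it is in the Inbox and its sender is one of the two listed addresses; the second is rejected because its sender matches neither.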