Exercise - Connect to GitHub repository

In this exercise, you'll configure your connector to pull issues from a specific public GitHub repository and customize the data schema to match GitHub's issue structure.

Learning objective

Configure the connector to connect to a real GitHub repository (such as the Microsoft 365 Agents Toolkit samples repository), customize the data model to include GitHub-specific fields, and prepare the connector for data ingestion.

Scenario

You need to configure your connector to pull issues from the Microsoft 365 Agents Toolkit samples repository. You can connect to any other public repository you'd like, or even private repositories that require authentication.

Task 1: Configure environment variables for GitHub

Configure your connector to connect to the Microsoft 365 Agents Toolkit samples repository.

  1. In your VS Code project, open the /env/.env.local file.

  2. Locate and update the following variables:

    CONNECTOR_ID=github_issues_connector_001
    CONNECTOR_NAME=GitHub Issues Connector
    CONNECTOR_DESCRIPTION=Connector that indexes GitHub issues from the Microsoft 365 Agents Toolkit samples repository
    CONNECTOR_REPO=OfficeDev/microsoft-365-agents-toolkit-samples
    

    Note

    The CONNECTOR_REPO value should be in the format owner/repository. The Microsoft 365 Agents Toolkit samples repository is owned by Microsoft and is publicly accessible, making it ideal for this exercise.

  3. Save the file.
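
Because a malformed CONNECTOR_REPO value only fails later, when the GitHub API is called, it can help to check the format up front. The following is a minimal sketch, not part of the connector template: parseRepoList, RepoRef, and REPO_PATTERN are hypothetical names, and support for a comma-separated list is an assumption based on the repos.split(",") call in the retrieval code shown later in this exercise.

```typescript
// Hypothetical helper, not part of the connector template.
interface RepoRef {
  owner: string;
  repo: string;
}

// Assumption: owner and repository names consist of letters, digits, ".", "_" and "-".
const REPO_PATTERN = /^([A-Za-z0-9_.-]+)\/([A-Za-z0-9_.-]+)$/;

// Splits a comma-separated value and checks that each entry is owner/repository.
function parseRepoList(value: string): RepoRef[] {
  return value
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      const match = REPO_PATTERN.exec(entry);
      if (!match) {
        throw new Error(`Expected owner/repository but got "${entry}"`);
      }
      return { owner: match[1], repo: match[2] };
    });
}

const repoRefs = parseRepoList("OfficeDev/microsoft-365-agents-toolkit-samples");
console.log(repoRefs);
```

An entry without a slash, such as a bare repository name, makes the function throw instead of producing an invalid API URL.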

Task 2: Review the schema

Review the connector's data model and confirm that it matches the structure of GitHub issues.

  1. Open /src/models/Item.ts to see the current data model.

  2. This interface represents an item before it is translated into an externalItem and ingested through the Copilot connectors API (formerly Graph API):

    export interface Item {
      id: string;
      issueNumber: string;
      owner: string;
      repo: string;
      assignedTo: string;
      state: string;
      lastModified: string;
      title: string;
      abstract: string;
      author: string;
      content: string;
      url: string;
    }
    
  3. Open /src/references/schema.json to see the current schema, which defines how each item property is exposed in Microsoft Graph:

    [
      {
        "name": "title",
        "type": "String",
        "isQueryable": "true",
        "isSearchable": "true",
        "isRetrievable": "true",
        "labels": ["title"]
      },
      {
        "name": "owner",
        "type": "String",
        "isQueryable": "true",
        "isSearchable": "true",
        "isRetrievable": "true"
      },
      {
        "name": "repo",
        "type": "String",
        "isQueryable": "true",
        "isSearchable": "true",
        "isRetrievable": "true"
      },
      {
        "name": "assignedTo",
        "type": "String",
        "isQueryable": "true",
        "isSearchable": "true",
        "isRetrievable": "true"
      },
    ...
    
  4. Review the schema properties that are currently defined. You will add a new property to be ingested in a later exercise.
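
To make the relationship between the Item model and the schema more concrete, the following sketch shows one way an Item could be translated into the general shape of an externalItem. It is illustrative only: ExternalItemSketch and toExternalItemSketch are hypothetical names, only the four schema properties visible in the excerpt above are mapped, and access control entries, which a real externalItem also needs, are left out.

```typescript
// Copied from the Item interface shown in this task.
interface Item {
  id: string;
  issueNumber: string;
  owner: string;
  repo: string;
  assignedTo: string;
  state: string;
  lastModified: string;
  title: string;
  abstract: string;
  author: string;
  content: string;
  url: string;
}

// Hypothetical, simplified shape; the template uses its own types for this.
interface ExternalItemSketch {
  id: string;
  properties: Record<string, string>;
  content: { type: "text" | "html"; value: string };
}

// Maps an Item onto the properties shown in the schema excerpt.
function toExternalItemSketch(item: Item): ExternalItemSketch {
  return {
    id: item.id,
    properties: {
      title: item.title,
      owner: item.owner,
      repo: item.repo,
      assignedTo: item.assignedTo,
    },
    // The issue body becomes the full-text content of the item.
    content: { type: "text", value: item.content },
  };
}
```

Each key under properties has to match a name declared in schema.json, which is why the model and the schema are reviewed together.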

Task 3: Review data retrieval

Review the connector's data retrieval logic that connects to GitHub's API.

  1. Open /src/custom/getAllItemsFromAPI.ts, which is the file used to retrieve items from the GitHub API.

  2. Find the customization points (look for comments starting with [Customization point]) in the data retrieval code.

  3. Find the getAllItemsFromAPI function and review its code. It retrieves every item from each configured repository, keeping only issues and excluding pull requests. If you need additional retrieval logic, add it here:

    export async function* getAllItemsFromAPI(
      config: Config,
      since?: Date,
      pageSize = 100
    ): AsyncGenerator<Item> {
      const repos = config.connector.repos.split(",");
      for (const repo of repos) {
        config.context.log(`Fetching issues from repo: ${repo}`);
        // Url to fetch issues for the first page
        let fetchPageUrl = `https://api.github.com/repos/${repo}/issues?state=all&per_page=${pageSize}${
          since ? `&since=${since.toISOString()}` : ""
        }`;
        while (fetchPageUrl) {
          const response = await fetchIssues(config, fetchPageUrl, repo);
          const issues = await mapItemsFromResponse(response);
          for (const item of issues) {
            yield item;
          }
          // If there are no more pages, null is returned to break the loop
          fetchPageUrl = getNextPageUrl(response);
        }
      }
    }
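
The excerpt above delegates paging and filtering to helpers whose bodies aren't shown. As a rough sketch of what such helpers could do, the code below relies on two behaviors of GitHub's REST API: the next page is advertised in the Link response header with rel="next", and entries returned by the issues endpoint that are pull requests carry a pull_request key. The names nextPageUrlFromLinkHeader and onlyIssues are hypothetical and are not the template's actual implementation.

```typescript
// Hypothetical helper: returns the URL marked rel="next" in a Link header,
// or null when there is no further page.
function nextPageUrlFromLinkHeader(linkHeader: string | null): string | null {
  if (!linkHeader) {
    return null;
  }
  const match = /<([^>]+)>\s*;\s*rel="next"/.exec(linkHeader);
  return match ? match[1] : null;
}

// Hypothetical helper: keeps entries that are issues and drops pull requests,
// which the issues endpoint returns with a pull_request key.
function onlyIssues<T extends { pull_request?: unknown }>(entries: T[]): T[] {
  return entries.filter((entry) => entry.pull_request === undefined);
}

// Sample header in the format GitHub uses for paginated responses.
const sampleLinkHeader =
  '<https://api.github.com/repositories/1/issues?page=2>; rel="next", ' +
  '<https://api.github.com/repositories/1/issues?page=5>; rel="last"';
console.log(nextPageUrlFromLinkHeader(sampleLinkHeader));
```

Returning null on the last page is what lets a loop such as while (fetchPageUrl) terminate.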
    

Task 4: Review the connection configuration

Review the configuration object that represents your connector. Together, these fields enable the application to securely connect to Microsoft Graph, describe the connector, and manage its configuration and data scope.

  1. Open /config.ts to see how the Microsoft Entra app registration is represented:

    export interface Config {
      context: InvocationContext;
      clientId: string;
      connector: {
        accessToken: string;
        id: string;
        name: string;
        description: string;
        schema: ExternalConnectors.Schema;
        template: any;
        repos: string; // Comma separated list of repositories for sample purpose
      };
    }
    
  2. This configuration object contains:

    • Authentication details - Client ID and access token for Microsoft Graph

    • Connector metadata - ID, name, and description you configured

    • Data scope - The repository or repositories to index

    • Schema definition - How the data will be structured in Microsoft Graph

What you've accomplished

You've successfully configured your connector to work with a real GitHub repository. Your connector is now set up to:

  • Connect to the Microsoft 365 Agents Toolkit samples repository - A real, active open-source project with plenty of issues

  • Use an appropriate data model - Schema that matches GitHub issue structure

  • Handle GitHub-specific data - Configuration that understands GitHub's API and data format

In the next exercise, you'll run the connector to actually ingest the GitHub issues data into Microsoft Graph and verify that it works correctly.