Write action scripts for a Microsoft Discovery tool

Action scripts implement the operations that your tool exposes to Discovery agents. Each action maps to a script (or a command dispatched by a central entrypoint) that the Discovery platform calls when an agent invokes that action.

This article describes how to structure action scripts, handle multiple input formats, implement batch processing, and produce consistent output, using a molecular analysis tool as a reference example.

Note

This article applies to action-based and hybrid tools. If you're building a code environment tool, see Create a tool definition instead.

Prerequisites

You have identified the actions your tool exposes. See Plan tool requirements for Microsoft Discovery.
You have a working implementation of the tool's core logic.
You have Python 3.8 or later installed (if you follow the Python patterns in this article).

Understand the action-based tool structure

An action-based tool container typically includes the following components:

Component	Purpose
Entrypoint script	Receives the action name and parameters, validates inputs, and dispatches to the appropriate action function.
Action modules	Python modules (or equivalent) that contain the logic for each action.
I/O utilities	Helper functions for reading input files in supported formats and writing structured results.
Tool definition YAML	The file that declares each action to the Discovery platform. The `command` field in the YAML calls the entrypoint script.

Step 1: Design the entrypoint script

The entrypoint is the single executable the platform calls for every action. It receives the action name and all action parameters as command-line arguments, then dispatches to the appropriate function.

#!/usr/bin/env python3
"""Entrypoint script for action-based tool."""

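import argparse
import sys

# Action functions live in separate modules (see Step 2). The module name
# "actions" is a placeholder; adjust it to match your package layout.
from actions import run_action_one, run_action_two

AVAILABLE_ACTIONS = {
    'action_one': {'description': 'Performs the first action'},
    'action_two': {'description': 'Performs the second action'},
}

def parse_arguments():
    parser = argparse.ArgumentParser(description="My Tool")
    parser.add_argument('--action', choices=list(AVAILABLE_ACTIONS),
                        required=True, help='Action to perform')
    parser.add_argument('--input', required=True,
                        help='Path to input directory or file')
    parser.add_argument('--output', required=True,
                        help='Path to output directory')
    # Add action-specific optional parameters here. Declare every flag that the
    # tool definition's command passes (for example, --column-name); argparse
    # exits with an error when it receives an undeclared flag.
    return parser.parse_args()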

def main():
    args = parse_arguments()

    if args.action == 'action_one':
        success = run_action_one(args.input, args.output, vars(args))
    elif args.action == 'action_two':
        success = run_action_two(args.input, args.output, vars(args))
    else:
        print(f"Unknown action: {args.action}", file=sys.stderr)
        return 1

    return 0 if success else 1

if __name__ == "__main__":
    sys.exit(main())

Key principles:

Return exit code 0 on success and a non-zero code on failure. The Discovery platform uses the exit code to determine whether an action succeeded.
Write error messages to stderr and results to stdout or to the output directory.
Keep the entrypoint thin and dispatch quickly to action-specific modules rather than implementing logic directly in the entrypoint.
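The principles above can also be expressed with a lookup table in place of the if/elif chain, so that adding an action means adding one dictionary entry. The following is a minimal sketch, not a structure the platform requires. `ACTION_HANDLERS` and the two stub handlers are hypothetical placeholders for your real action modules.

```python
import argparse
import sys
from typing import Callable, Dict, List, Optional


# Hypothetical stub handlers; a real tool imports these from its action modules.
def run_action_one(input_path: str, output_path: str, params: dict) -> bool:
    print(f"action_one: {input_path} -> {output_path}")
    return True


def run_action_two(input_path: str, output_path: str, params: dict) -> bool:
    print(f"action_two: {input_path} -> {output_path}")
    return True


# Maps each action name to the function that implements it.
ACTION_HANDLERS: Dict[str, Callable[[str, str, dict], bool]] = {
    "action_one": run_action_one,
    "action_two": run_action_two,
}


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="My Tool")
    parser.add_argument("--action", choices=sorted(ACTION_HANDLERS), required=True)
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    args = parser.parse_args(argv)

    handler = ACTION_HANDLERS[args.action]
    try:
        success = handler(args.input, args.output, vars(args))
    except Exception as exc:
        # Errors go to stderr; the non-zero return value signals failure.
        print(f"{args.action} failed: {exc}", file=sys.stderr)
        return 1
    return 0 if success else 1
```

In a script, call it as `sys.exit(main())` under the usual `if __name__ == "__main__":` guard. Passing `argv` explicitly makes the entrypoint easy to call from unit tests.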

Step 2: Implement action functions

Each action function follows a consistent pattern: validate inputs, process data, write outputs, and return a boolean result.

def run_action_one(input_path: str, output_path: str, params: dict) -> bool:
    """
    Run action_one.

    Args:
        input_path: Path to the input directory or file (mounted by Discovery platform).
        output_path: Path to the output directory (mounted by Discovery platform).
        params: Additional parameters from the command line.

    Returns:
        True if the action completed successfully, False otherwise.
    """
    try:
        # 1. Set up logging (setup_logger and log_error are helpers you
        #    provide, for example in your I/O utilities)
        setup_logger('action_one', output_path)

        # 2. Find and validate input files
        input_files = find_input_files(input_path, params)
        if not input_files:
            log_error("No valid input files found at: " + input_path)
            return False

        # 3. Process each file
        results = []
        for file_path in input_files:
            file_results = process_file(file_path, params)
            results.extend(file_results)

        # 4. Write structured output (write_results is defined in Step 5)
        write_results(output_path, results, 'action_one', params)

        return True

    except Exception as e:
        log_error(f"Error in action_one: {e}")
        return False

Important

Use absolute paths throughout. The Discovery platform mounts input and output directories at container-level absolute paths such as /input and /output. Don't rely on relative paths or on the container's working directory.
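A small helper can normalize whatever path the entrypoint receives into an absolute path before any action logic runs. This is a sketch; `resolve_dir` is a hypothetical name, and whether a missing directory should be created or reported as an error is a design choice for your tool.

```python
import os


def resolve_dir(path: str, create: bool = False) -> str:
    """Return the absolute form of a directory path, optionally creating it."""
    absolute = os.path.abspath(path)
    if create:
        os.makedirs(absolute, exist_ok=True)
    if not os.path.isdir(absolute):
        raise FileNotFoundError(f"Directory not found: {absolute}")
    return absolute
```

A path that is already absolute, such as `/output`, passes through unchanged apart from normalization. A relative path supplied during local testing is resolved once against the current working directory, so the rest of the code only ever handles absolute paths.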

Step 3: Support multiple input formats

Design your scripts to handle the input file formats your users are likely to provide. A common pattern is to detect the file format from the extension and delegate to a format-specific reader.

Example: supporting SMILES, CSV, and JSON inputs

# Plain SMILES file (one SMILES string per line)
# molecules.smi
CCO
CC(=O)O
c1ccccc1
CN1C=NC2=C1C(=O)N(C(=O)N2C)C

# CSV file with named columns
smiles,name
CCO,ethanol
CC(=O)O,acetic acid
c1ccccc1,benzene

[
    {"smiles": "CCO", "name": "ethanol"},
    {"smiles": "CC(=O)O", "name": "acetic acid"}
]

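import glob
import json
import os

import pandas as pd

SUPPORTED_EXTENSIONS = {'.smi', '.csv', '.json'}


def find_input_files(input_path: str, params: dict) -> list:
    """Return a list of supported input files from a directory or single file path."""
    if os.path.isfile(input_path):
        return [input_path]

    pattern = params.get('file_pattern') or '*.*'
    candidates = glob.glob(os.path.join(input_path, pattern))
    # Keep only regular files with a supported extension, in a stable order.
    return sorted(
        path for path in candidates
        if os.path.isfile(path)
        and os.path.splitext(path)[1].lower() in SUPPORTED_EXTENSIONS
    )


def read_molecules(file_path: str, column_name: str = 'smiles') -> list:
    """Read molecule SMILES strings from a .smi, .csv, or .json file."""
    ext = os.path.splitext(file_path)[1].lower()
    if ext == '.smi':
        # One SMILES string per line; skip blank lines and '#' comment lines.
        with open(file_path) as f:
            lines = (line.strip() for line in f)
            return [line for line in lines if line and not line.startswith('#')]
    elif ext == '.csv':
        df = pd.read_csv(file_path)
        if column_name not in df.columns:
            raise ValueError(f"Column '{column_name}' not found in {file_path}")
        return df[column_name].dropna().tolist()
    elif ext == '.json':
        # Expects a JSON array of objects; objects without the key are skipped.
        with open(file_path) as f:
            data = json.load(f)
        return [item[column_name] for item in data if column_name in item]
    else:
        raise ValueError(f"Unsupported file format: {ext}")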
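The same extension-based delegation can be written as a registry that maps each extension to a reader function, which keeps the format list in one place as you add formats. The sketch below is an alternative under stated assumptions, not the article's reference implementation: it uses the standard `csv` module instead of pandas, and the `READERS` table and `read_*` names are hypothetical.

```python
import csv
import json
import os
from typing import Callable, Dict, List


def read_smi(path: str, column_name: str) -> List[str]:
    # One SMILES string per line; skip blank lines and '#' comment lines.
    with open(path, encoding="utf-8") as f:
        lines = (line.strip() for line in f)
        return [line for line in lines if line and not line.startswith("#")]


def read_csv(path: str, column_name: str) -> List[str]:
    # Rows with a missing or empty value in the column are skipped.
    with open(path, newline="", encoding="utf-8") as f:
        return [row[column_name] for row in csv.DictReader(f) if row.get(column_name)]


def read_json(path: str, column_name: str) -> List[str]:
    # Expects a JSON array of objects.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return [item[column_name] for item in data if column_name in item]


# Registry: file extension -> reader function.
READERS: Dict[str, Callable[[str, str], List[str]]] = {
    ".smi": read_smi,
    ".csv": read_csv,
    ".json": read_json,
}


def read_molecules(file_path: str, column_name: str = "smiles") -> List[str]:
    ext = os.path.splitext(file_path)[1].lower()
    if ext not in READERS:
        raise ValueError(f"Unsupported file format: {ext}")
    return READERS[ext](file_path, column_name)
```

Unlike the pandas version, this CSV reader returns an empty list rather than raising an error when the column is absent, so add an explicit header check if your tool should fail loudly in that case.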

Step 4: Implement batch processing

If your tool has an upper limit on how many items it can process at once, add batching logic in the container rather than relying on the agent to manage it.

def process_in_batches(items: list, batch_size: int, process_fn) -> list:
    """
    Process items in chunks to avoid memory exhaustion.

    Args:
        items: Full list of items to process.
        batch_size: Maximum items per batch.
        process_fn: Callable that processes a single batch and returns results.

    Returns:
        Aggregated results across all batches.
    """
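    # A zero or negative batch size would fail or silently process nothing.
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    results = []
    total = len(items)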

    for start in range(0, total, batch_size):
        batch = items[start:start + batch_size]
        print(f"Processing batch {start // batch_size + 1} "
              f"({start + 1}–{min(start + batch_size, total)} of {total})")
        batch_results = process_fn(batch)
        results.extend(batch_results)

    return results

Tip

Print progress messages to stdout. These messages appear in the Discovery conversation history and help agents and researchers understand where a long-running action is in its execution.
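The following sketch shows the batching pattern with a toy processing function and flushed progress output. Python typically buffers stdout when it isn't attached to a terminal, so `flush=True` helps each message leave the process promptly; how quickly the platform surfaces those messages is not something this sketch can guarantee. `iter_batches`, `process_with_progress`, and `measure_length` are hypothetical names.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_with_progress(items: Sequence, batch_size: int,
                          process_fn: Callable[[Sequence], List[dict]]) -> List[dict]:
    """Process items batch by batch, printing progress after each batch."""
    results: List[dict] = []
    done = 0
    for batch in iter_batches(items, batch_size):
        results.extend(process_fn(batch))
        done += len(batch)
        # flush=True pushes the message out even when stdout is piped.
        print(f"Processed {done} of {len(items)} items", flush=True)
    return results


def measure_length(batch: Sequence[str]) -> List[dict]:
    # Toy stand-in for real per-molecule analysis.
    return [{"smiles": s, "length": len(s), "status": "ok"} for s in batch]


results = process_with_progress(["CCO", "CC(=O)O", "c1ccccc1"], 2, measure_length)
```

With three items and a batch size of 2, the loop runs two batches and `results` holds one record per input string.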

Step 5: Write structured output

Write results to a results.json file in the output directory, along with any supporting files (CSVs, PDB files, plots). The JSON summary provides a machine-readable record the agent can inspect with PreviewResource.

import json
import os
from datetime import datetime

def write_results(output_path: str, results: list, action_name: str, params: dict):
    """Write results to the output directory."""
    os.makedirs(output_path, exist_ok=True)

    summary = {
        "action": action_name,
        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "parameters": params,
        "summary": {
            "total_items": len(results),
            "successful": sum(1 for r in results if r.get('status') == 'ok'),
        },
        "output_files": {},
        "status": "completed"
    }

    # Write detailed results to CSV
    import pandas as pd
    detail_path = os.path.join(output_path, f"{action_name}_detailed.csv")
    pd.DataFrame(results).to_csv(detail_path, index=False)
    summary["output_files"]["detailed"] = detail_path

    # Write summary JSON
    summary_path = os.path.join(output_path, "results.json")
    with open(summary_path, 'w') as f:
        json.dump(summary, f, indent=2)

Example results.json:

{
  "action": "identify_functional_groups",
  "timestamp": "2026-04-07 14:30:00",
  "parameters": { "files_processed": 1 },
  "summary": {
    "total_molecules": 4,
    "total_groups_found": 12,
    "group_distribution": {
      "alcohol": 2,
      "carbonyl": 1,
      "aromatic": 2
    }
  },
  "output_files": {
    "detailed_analysis": "/output/functional_groups_detailed.csv"
  },
  "status": "completed"
}
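To keep the summary shape consistent across actions, a test can check each summary before it's written. The sketch below assumes that the top-level keys in the example above are the ones you want to enforce; nothing in this article states that the Discovery platform requires them. `REQUIRED_KEYS` and `check_summary` are hypothetical names.

```python
import json

# Top-level keys taken from the example results.json (an assumption, not a platform schema).
REQUIRED_KEYS = ("action", "timestamp", "parameters", "summary", "output_files", "status")


def check_summary(summary: dict) -> list:
    """Return a list of problems found in a summary; an empty list means it passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in summary]
    if not isinstance(summary.get("output_files", {}), dict):
        problems.append("output_files must be an object")
    try:
        json.dumps(summary)
    except (TypeError, ValueError) as exc:
        problems.append(f"not JSON-serializable: {exc}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a test report every issue in a single run.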

Step 6: Test the scripts locally

Before building your container image, verify that the scripts work as expected locally.

# Test a single action
python app/entrypoint.py \
  --action identify_functional_groups \
  --input ./sample-input/ \
  --output ./sample-output/

# Verify output was written
ls ./sample-output/
cat ./sample-output/results.json
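The same check can be automated in Python so that it runs as part of a test suite. This sketch assumes the entrypoint accepts the `--action`, `--input`, and `--output` flags from Step 1 and writes results.json to the output directory as described in Step 5. `smoke_test` is a hypothetical helper name.

```python
import json
import os
import subprocess
import sys


def smoke_test(entrypoint: str, action: str, input_dir: str, output_dir: str) -> dict:
    """Run one action in a subprocess and return the parsed results.json."""
    completed = subprocess.run(
        [sys.executable, entrypoint,
         "--action", action, "--input", input_dir, "--output", output_dir],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code is how the entrypoint signals failure.
    if completed.returncode != 0:
        raise RuntimeError(
            f"exit code {completed.returncode}: {completed.stderr.strip()}"
        )
    with open(os.path.join(output_dir, "results.json"), encoding="utf-8") as f:
        return json.load(f)
```

For example, `smoke_test("app/entrypoint.py", "identify_functional_groups", "./sample-input", "./sample-output")` mirrors the shell commands above and raises an error if the action exits with a non-zero code.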

Step 7: Integrate with the tool definition

Once your scripts are working locally, create the tool definition YAML that exposes each action to the Discovery platform. The command field uses Handlebars syntax to map action parameters to the entrypoint arguments.

actions:
  - name: identify_functional_groups
    description: Identifies common functional groups in molecule input files (SMILES, CSV, or JSON format).
    infra_node: worker
    input_schema:
      type: object
      properties:
        input_directory:
          type: string
          description: "Directory containing input files (SMILES, CSV, or JSON)."
        output_directory:
          type: string
          description: "Directory where output files and analysis results are written."
        column_name:
          type: string
          description: "For CSV files, the column containing SMILES strings. Defaults to 'smiles'."
        batch_size:
          type: number
          description: "Molecules per batch. Defaults to 100."
      required:
        - input_directory
        - output_directory
    command: >
      python3 /app/entrypoint.py
      --action identify_functional_groups
      --input {{input_directory}}
      --output {{output_directory}}
      {{#if column_name}}--column-name {{column_name}}{{/if}}
      {{#if batch_size}}--batch-size {{batch_size}}{{/if}}
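The command above passes `--column-name` and `--batch-size` when the agent supplies them, so the entrypoint's parser has to declare both options. The following sketch shows one way to do that. The defaults mirror the schema descriptions ('smiles' and 100), `positive_int` is a hypothetical validator, and the sketch assumes the platform renders `batch_size` as an integer literal.

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type that accepts only integers greater than zero."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError("must be a positive integer")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="My Tool")
    parser.add_argument("--action", required=True)
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    # argparse exposes --column-name as args.column_name (hyphen becomes underscore).
    parser.add_argument("--column-name", default="smiles",
                        help="CSV column that contains SMILES strings")
    parser.add_argument("--batch-size", type=positive_int, default=100,
                        help="Molecules per batch")
    return parser
```

Because both options have defaults, the action functions can read `params['column_name']` and `params['batch_size']` from `vars(args)` whether or not the agent supplied them.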

For full tool definition guidance, see Create a tool definition.

Last updated on 2026-04-20