Transparency Note: Edge RAG Preview enabled by Azure Arc

What is a Transparency Note?

An AI system includes not only the technology, but also the people who will use it, the people who will be affected by it, and the environment in which it is deployed. Creating a system that is fit for its intended purpose requires an understanding of how the technology works, what its capabilities and limitations are, and how to achieve the best performance. Microsoft’s Transparency Notes are intended to help you understand how our AI technology works, the choices system owners can make that impact system performance and behavior, and the importance of thinking about the whole system, including the technology, the people, and the environment.

Microsoft’s Transparency Notes are part of a broader effort at Microsoft to put our AI Principles into practice. To find out more, see the Microsoft AI principles.

The basics of Edge RAG

Introduction

Edge RAG is a retrieval-augmented generation (RAG) system designed to enhance responses from language models by grounding them in external data sources. It operates at the edge, in local or hybrid environments, which enables low latency, on-premises data residency, and more context-aware AI experiences.

Edge RAG combines two core capabilities:

  • Retrieval: It searches a predefined set of indexed documents or data sources (e.g., enterprise content, local files, or customer knowledge repositories) to identify relevant information.
  • Generation: It uses a language model to generate a response that incorporates the retrieved content, aiming to provide more accurate, grounded, and contextually relevant answers.
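The two capabilities above can be sketched in a few lines of Python. This is a minimal illustration and not Edge RAG's implementation: token overlap stands in for semantic search, the function names (`retrieve`, `build_prompt`) and the `CHUNKS` dictionary are hypothetical, and the final call to a language model is left out. The dental chunks reuse wording from the examples in this note; the vision chunk is invented for contrast.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, chunks: dict[str, str], top_n: int = 2) -> list[tuple[str, int]]:
    """Rank chunks by shared tokens with the query (a stand-in for semantic search)."""
    query_tokens = Counter(tokenize(query))
    scored = []
    for chunk_id, text in chunks.items():
        overlap = sum((query_tokens & Counter(tokenize(text))).values())
        if overlap > 0:
            scored.append((chunk_id, overlap))
    # Highest overlap first; ties are broken by chunk ID for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


def build_prompt(query: str, chunks: dict[str, str], hits: list[tuple[str, int]]) -> str:
    """Assemble a grounded prompt that asks the model to cite chunk IDs."""
    context = "\n".join(f"[{chunk_id}] {chunks[chunk_id]}" for chunk_id, _ in hits)
    return (
        "Answer using only the context below and cite chunk IDs.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


# Hypothetical indexed chunks, keyed by an illustrative chunk ID.
CHUNKS = {
    "dental-01": "Annual deductible for Class II basic services like dental fillings is $25 individual/$75 family.",
    "dental-02": "No deductible applies for Class I services like preventive care.",
    "vision-01": "Eye exams are covered once every twelve months.",
}

QUERY = "What is the annual deductible for dental fillings?"
HITS = retrieve(QUERY, CHUNKS)
PROMPT = build_prompt(QUERY, CHUNKS, HITS)
```

In a real deployment the prompt would be passed to the locally hosted language model, and the chunk IDs in `HITS` would back the references shown to the user.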

Key terms

| Term | Description |
| --- | --- |
| Edge RAG | A retrieval-augmented generation (RAG) system designed to enhance responses from language models by grounding them in external data sources, operating in local or hybrid environments with low latency and privacy-preserving capabilities. |
| Retrieval | A process that searches a predefined set of indexed documents or data sources to identify relevant information. |
| Generation | A process that uses a language model to generate responses that incorporate retrieved content, with the aim of improving accuracy and relevance. |
| No-code/Low-code | Tools that enable deployment and management of AI models with minimal technical expertise. |
| Prompt Engineering | The technique of structuring prompts to optimize the performance and behavior of AI models. |
| Data Ingestion and Retrieval Pipeline | A system designed to support multiple data types for efficient data processing and retrieval. |

Capabilities

System behavior

Edge RAG is a turnkey solution designed to facilitate the creation of custom chat assistants and the extraction of insights from customer data. This package includes all necessary components for customers to build tailored applications efficiently, including:

  • A choice of Generative AI (GenAI) language models running locally with support for both CPU and GPU hardware.

  • A turnkey data ingestion and RAG pipeline that stores all data locally, with Azure role-based access control (Azure RBAC) to help prevent unauthorized access.

  • An out-of-the-box prompt engineering and evaluation tool to build, evaluate, and deploy custom chat solutions.

  • Azure-equivalent APIs to integrate into business applications, and a prepackaged UI to get started quickly.

Use cases

Edge RAG is well-suited for scenarios where contextual accuracy, data privacy, and low-latency responses are important to our customers. Recommended use cases include:

  • Enterprise Knowledge Retrieval: Assisting employees with internal documentation, policies, or procedures. For example, a user who wants to understand the specifics of dental coverage in their company’s medical plan can ask “What is the annual deductible for dental fillings?” Edge RAG then searches the company’s internal documentation (ingested earlier into its local database during the data source addition stage) to find the text chunks that answer the question. It could then respond with “Annual Deductible for Class II - Basic Services like Dental Fillings is $25 individual/$75 family” and provide references to the exact document and chunk IDs used to derive that information.

  • Customer Support: Enhancing chatbot responses with up-to-date, domain-specific knowledge. For example, a phone company wants to reduce call center volume by automatically answering common customer questions through a chatbot. The user might ask “Why is my mobile data not working even though I have a data plan?” Edge RAG then pulls relevant documents such as troubleshooting guides, FAQs, and past support tickets about mobile data issues. It synthesizes an answer like: "Mobile data might not work due to incorrect APN settings or a temporary network issue. Try restarting your phone and checking your APN configuration. If the problem continues, contact support or visit your nearest store." and gives links to customer troubleshooting documentation.

  • Field Operations: Providing offline or low-connectivity access to technical manuals or troubleshooting guides. For example, technicians servicing residential or commercial furnaces might need quick access to repair protocols, wiring diagrams, or error code explanations while on-site. A user might ask “How do I fix error code E35 on a Model X500 furnace?” Edge RAG then pulls up relevant service manuals, error code databases, and past technician notes related to Model X500 and code E35. It can produce a step-by-step response like: "Error code E35 indicates a blocked condensate drain. First, turn off the unit, inspect and clear the drain line, then reset the system. If the error persists, check the pressure switch tubing for clogs. See section 5.3 in the X500 technician manual for diagrams.”

  • Healthcare (Non-Diagnostic): Surfacing relevant clinical guidelines or documentation for trained professionals. For example, front desk staff or care coordinators in a clinic need quick access to policy information, patient instructions, or administrative procedures to assist patients efficiently. A user might ask “Can I help a patient get a same-day refill if the doctor is out?” Edge RAG then pulls relevant clinic policies, staff guidelines, and past protocol documents about medication refills and provider availability. It provides an answer like "If the prescribing doctor is unavailable, a same-day refill can be approved by the covering provider if the patient is stable and the refill request is routine. Document the request in the EHR and follow up with the covering provider for confirmation." and gives a reference to the documents that cover this policy information.

  • Education and Training: Delivering personalized learning content based on approved curricula. For example, academic advisors or administrative staff need quick access to curriculum details, enrollment policies, or training requirements to assist students effectively. A user might ask “Can a student substitute Intro to Statistics for the Quantitative Methods requirement in the Business program?” Edge RAG then gathers relevant course catalogs, program requirements, and past substitution approvals. It generates a response like: "Yes, Intro to Statistics might substitute for Quantitative Methods if the student is pursuing the Business Analytics track. Approval from the department chair is required—submit a course substitution form via the registrar’s portal.” and provides reference to the enrollment policy and course catalog.

  • Energy Sector: Power companies can leverage it to analyze real-time data from sensors and turbines to optimize energy production and better predict maintenance needs, to help ensure uninterrupted service. For example, field engineers or control room operators need quick access to safety protocols, maintenance procedures, or outage response steps to ensure smooth operations. A user might ask “What’s the procedure for re-energizing a transformer after scheduled maintenance?” Edge RAG then retrieves relevant operating manuals, maintenance logs, and safety procedures related to transformer re-energization. It generates a response like: "Ensure all grounding is removed, inspect for any visible damage, confirm isolation points are secure, then re-energize following the lockout-tagout removal checklist. Notify the control center before restoring power. Refer to SOP #T-102 for full details.” and provides a link to the manual(s) with details and figures.

  • Logistics and Supply Chain: Logistics providers can use it to streamline operations by analyzing fleet data for route optimization, improving delivery times, and reducing fuel consumption. For example, warehouse managers or logistics coordinators need quick access to shipment protocols, inventory handling procedures, or routing guidelines to keep operations running smoothly. A user might ask “What’s the process for rerouting a shipment that's already in transit due to a delivery exception?” Edge RAG then pulls relevant shipping SOPs, carrier agreements, and internal rerouting procedures. It delivers a response like: "Contact the carrier with the new delivery address and provide the exception code. Update the order status in the logistics system and notify the customer. Follow the rerouting protocol outlined in SOP #R-208 for documentation and chain of custody compliance." and provides references to the relevant policy documents.

  • Telecommunications: Network operators can deploy it to help monitor and analyze local network performance, proactively addressing connectivity issues. For example, network operations center (NOC) staff or field engineers need quick access to configuration steps, outage protocols, or service restoration procedures to maintain network uptime. A user might ask “What’s the standard process to restore service after a fiber line cut?” Edge RAG then retrieves relevant outage response protocols, fiber repair procedures, and escalation workflows. It provides a step-by-step response like: "Isolate the damaged segment and reroute traffic if possible. Dispatch a fiber repair crew, document the incident, and notify affected customers. Once repaired, test for signal integrity and restore service. Follow SOP #F-311 for full post-restoration checks." and provides a reference to the repair document.

  • Media and Entertainment: Content creators and broadcasters can utilize it to manage large volumes of video and audio data, enabling quicker content indexing and personalized recommendations for viewers. For example, post-production editors or VFX artists need quick access to version control protocols, naming conventions, or delivery standards to ensure consistency and compliance with client expectations. A user might ask “What’s the correct naming convention for final VFX shots being delivered to the studio?” Edge RAG then retrieves VFX pipeline documentation, asset naming guidelines, and client delivery standards. It produces an answer like: "Final VFX shots should follow the format: ProjectCode_SceneNumber_ShotNumber_v###_FINAL.mov (e.g., AURA_102_045_v105_FINAL.mov). Make sure all layers are flattened, color-corrected, and match the delivery spec in SOP #VFX-410." and gives a link to the guidelines document.

  • Agriculture: Farmers can employ it to analyze environmental data, optimize irrigation, and help predict crop yields, improving agricultural efficiency and sustainability. For example, field supervisors or agronomists need quick access to crop treatment protocols, equipment settings, or seasonal planting guidelines to assist farmworkers efficiently. A user might ask "What’s the recommended fertilizer mix and application rate for corn during the early growth stage?” Edge RAG then pulls relevant agronomy guides, soil reports, and past application records for corn. It provides a specific response like: "Apply a 20-10-10 NPK fertilizer at a rate of 120 lbs per acre during the V2–V4 growth stage. Ensure even distribution and avoid application right before rainfall. Refer to Fertilizer Protocol #AG-221 for soil condition adjustments." and provides a link to the fertilizer application guidance document.

Considerations when choosing other use cases

We encourage customers to leverage Edge RAG in their innovative solutions or applications. However, here are some considerations when choosing a use case:

  • Avoid scenarios where use or misuse of the system could result in significant physical or psychological injury to an individual. For example, scenarios that diagnose patients or prescribe medications have the potential to cause significant harm. Incorporating meaningful human review and oversight into the scenario can help reduce the risk of harmful outcomes.
  • Sensitive information and PII: Edge RAG retrieves and processes data from local sources, which might include personally identifiable or business-sensitive information. Consider the privacy and security implications of exposing such data during retrieval and generation, and implement appropriate safeguards such as redaction, masking, or access controls before deploying in sensitive use cases.
  • Legal and regulatory considerations: Organizations need to evaluate potential specific legal and regulatory obligations when using any AI services and solutions, which might not be appropriate for use in every industry or scenario. Restrictions might vary based on regional or local regulatory requirements. Additionally, AI services or solutions are not designed for and may not be used in ways prohibited in applicable terms of service and relevant codes of conduct.
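The redaction and masking safeguard mentioned above can be illustrated with a small preprocessing step applied before text is ingested or returned. This is a hedged sketch only: the two patterns (email addresses and US-style phone numbers) and the `redact` helper are assumptions made for illustration, they are not part of Edge RAG, and a production deployment would need a much broader PII detector.

```python
import re

# Illustrative patterns only; real PII detection needs far wider coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text: str) -> str:
    """Replace email addresses and US-style phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


SAMPLE = "Contact jane.doe@contoso.com or 555-010-1234."
REDACTED = redact(SAMPLE)
```

Applying the step at ingestion time keeps sensitive values out of the local index entirely, whereas applying it at response time leaves the source data intact but masks it from users.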

Limitations

  • Medical diagnosis and treatment: The potential for inaccurate or hallucinated outputs can result in incorrect diagnoses or dangerous treatment suggestions. Edge RAG does not meet the rigor, accountability, and approval standards required by medical regulatory bodies such as the FDA. Additionally, Edge RAG is unable to access or accurately interpret nuanced medical histories or real-time data.
  • Legal or financial advice: The potential for inaccurate or hallucinated outputs can result in incorrect legal or financial advice. This could result in lawsuits, financial losses, regulatory penalties, or other legal and financial risks. Legal and financial regulations change frequently, and Edge RAG might return obsolete or jurisdictionally irrelevant information. Additionally, Edge RAG cannot substitute for expert judgment or understand complex edge cases.
  • Autonomous decision making in safety-critical systems (e.g., aviation, autonomous vehicles, emergency response): Due to Edge RAG's limitations, it is the responsibility of the user or deploying organization to ensure appropriate oversight, validate outputs, and confirm compliance with relevant safety and regulatory standards. Generated content might appear confident but be entirely false, leading to flawed decisions. Additionally, without human oversight, Edge RAG cannot handle unexpected scenarios safely.
  • Sensitive or regulated environments: Edge RAG has not been validated for compliance with HIPAA, FedRAMP, or other regulatory frameworks.
  • Generating or verifying factual claims in journalism, academic research, or public communications: Edge RAG is not 100% accurate at retrieving all the relevant information for a question, which could lead to inaccurate answers based on partial information. Edge RAG might provide vague or inaccurate sourcing, and outputs might appear trustworthy even when they are incorrect, potentially leading to the spread of misinformation if not verified with reputable and accurate resources.

System performance

Consider the following sample scenario of how Edge RAG can be leveraged:

Contoso Corporation wants Edge RAG to help users find answers to questions about the company’s healthcare plan. The company supplies all documents related to its healthcare plan to Edge RAG during the data ingestion stage. Once data ingestion is complete, a user can access Edge RAG and ask questions about various aspects of the healthcare plan. Edge RAG supplies answers and provides references to the places in the original documents from which the answers were derived. Users should use these references to verify that each answer is accurate.

| Question | What is the annual deductible on dental fillings? |
| --- | --- |
| True Positive | Annual Deductible for Class II - Basic Services like Dental Fillings is $25 individual/$75 family. |
| False Positive | Annual deductible for dental fillings is $100 under all plans. |
| True Negative | No deductible applies for Class I services like preventive care. |
| False Negative | There is no deductible under all plans. |
| Fallback | User should refer to the original document that is cited at the bottom of each answer. For example: contoso.com://fileshare/medicalplans/dentalplan.pdf |
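Once a set of answers has been reviewed and labeled with the outcome categories above, the labels can be summarized as precision and recall. The sketch below assumes a hypothetical list of review labels; the counts are invented for illustration and are not measured Edge RAG results.

```python
from collections import Counter


def precision_recall(labels: list[str]) -> tuple[float, float]:
    """Compute precision and recall from labels such as 'TP', 'FP', 'TN', 'FN'."""
    counts = Counter(labels)
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    # Guard against division by zero when a category is absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical review outcomes for six answers.
REVIEWED = ["TP", "TP", "TP", "FP", "TN", "FN"]
PRECISION, RECALL = precision_recall(REVIEWED)
```

Precision reflects how often a supplied answer was correct, and recall reflects how often a correct answer was supplied when one existed in the ingested documents.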

Best practices for improving system performance

  • Data Quality: Ensure that the data ingested into the system is of high quality, well-structured, and relevant to the use case. Poor data quality can lead to inaccurate or misleading results.

  • Scalability: Assess the system's ability to scale to handle large datasets and high query volumes. Ensure that the infrastructure can support the computational demands of the semantic search DB and LLMs.

  • Latency: Optimize the system to minimize latency in query responses. This is particularly important for real-time applications where quick responses are critical. Latency depends on the available GPUs and on the network. Tuning hyperparameters such as top-N and chunk size can help balance latency and response quality. LLM-specific factors, such as the length of the chat session, also affect overall latency.

  • Models: Edge RAG supports both text-only and multimodal inputs. Although multimodal representations can also be applied to text, text-optimized representations are recommended, when possible, for reasoning over longer contexts.

  • Text Chunking and Hyperparameter Optimization: When setting up the system, tune hyperparameters and settings such as text-chunking sliding windows, temperature, and other configurations according to the data, the use case, and your expected user experience.

  • Compatibility: Ensure that Edge RAG is compatible with existing IT infrastructure and can be seamlessly integrated with other enterprise systems and applications.

  • Deployment Environment: Consider the deployment environment (on-premises, cloud, or hybrid) and ensure that the system is configured to operate efficiently in that environment.

  • Human-in-the-Loop: Incorporate human oversight where appropriate and especially for sensitive or high-impact decisions. This ensures that critical decisions are reviewed and validated by human experts.

  • Bias Mitigation: Regularly evaluate the system for biases in the data and the models. Implement strategies to mitigate any identified biases to ensure fair and equitable outcomes.

  • Transparency: Clearly communicate to users when AI-generated content is being used and provide explanations for the system's decisions to maintain transparency and trust.

  • Feedback Mechanism: Implement a feedback mechanism that allows users to provide input on the system's performance and outputs. Use this feedback to continuously improve the system.
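To make the chunking guidance above concrete, the following sketch shows one common way a sliding-window chunker can work: fixed-size windows of words that overlap by a set amount. The `chunk_words` function and its `window` and `overlap` parameters are hypothetical names chosen for illustration; Edge RAG's own chunking settings may be named and implemented differently.

```python
def chunk_words(text: str, window: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word windows of size `window` that share `overlap` words."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("window must be positive and overlap must be in [0, window)")
    words = text.split()
    step = window - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        # Stop once the current window reaches the end of the text.
        if start + window >= len(words):
            break
    return chunks


# Ten placeholder words, split into windows of four with an overlap of two.
SAMPLE_TEXT = " ".join(f"w{i}" for i in range(10))
SAMPLE_CHUNKS = chunk_words(SAMPLE_TEXT, window=4, overlap=2)
```

Larger windows give the model more context per retrieved chunk but increase prompt length and latency, while more overlap reduces the chance that an answer is split across a chunk boundary at the cost of a larger index.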

Learn more about responsible AI

Learn more about Edge RAG