- How does the AI's automatic assignment of sensitivity levels work in practice?
a. Will the AI be screening all email content and all email attachments?
b. To what extent does the AI rely on the way employees have classified their emails and documents in the first six weeks when the process is manual?
c. What kind of information would the AI be looking for in order to assign each specific sensitivity label?
d. Does the AI system rely on pre-defined patterns – for example, searching for SSNs, credit card numbers, driver’s license numbers, etc., just as a DLP solution would? If yes, is this done in combination with what the system has ‘learned’ during the first six weeks of the manual process?
e. Does the system learn from live data only? What safeguards are in place to ensure that the training data is balanced and that all possible scenarios are sufficiently covered?
f. For how long will any live data provided (either for training or for label assignment purposes) be retained?
- a. What controls are in place to prevent incorrect decisions or outputs by the AI system?
b. Are there any trade-offs contemplated or known in relation to the proposed system, such as the trade-off between the tool's accuracy and its explainability? For instance, a complex black-box system is more difficult to explain transparently, but that complexity may produce the most accurate and effective classifications.
c. How is the principle of data minimisation implemented and complied with?
d. What kind of output, logs, or records would be available from the MIP operation? Would these be limited to logs of triggered alarms? What information would the log contain about each event (e.g. employee name, email, date and timestamp, summary of the alert)?
e. What would be the retention periods for such logs or records? Who would have access to the logs (we need to ensure access is limited on a need-to-know basis)?
f. Can we see a detailed end-to-end process for how the AI model will process personal data to:
i. learn during the training stage
ii. operate in the live environment
- Scope -
a. Is it possible to exclude accounts from scanning based on the location of the data subjects – e.g. exclude everyone based in a certain office because of concerns related to the legislation in that country?
a. What would be the consequences for individuals if an email or document has been wrongly classified?
b. Has Microsoft prepared a DPIA on this tool? If so, can they provide us with any information about the risks and mitigations they identified?
c. Does all processing occur, and does all the information concerned reside, in Microsoft's cloud environment?