(Guidance level questions, mostly)
Suppose you have a government agency responsible for motor vehicle and driver license registration (much like DMV in the US). They have separate legacy systems for vehicle registration and driver license registration and driver infractions, which may or may not use a central customer data system. Other organizations, both government and private, require read access to this data as well as reporting and analysis, both curated and ad hoc.
I've been learning Data Lake technologies, trying to assess whether dumping all of the OLTP data into a data lake to form an enterprise data lake could fulfill all these various needs. What I've learned so far doesn't make me confident enough that I could recommend this as a general solution.
Quick Read Access
Suppose the police need quick read access to check to whom a vehicle is registered (including its history), or how many infractions a driver has. Both are cases where row-level access is required, suggesting a raw area using a row-oriented format such as Avro. We could possibly also construct separate areas for current and historical data on top of the raw area, since the current data is used most often. I haven't gotten as far as building a proof of concept that would let me test whether this approach gives good enough performance (assuming the competing solution would be e.g. Elastic).
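To make the current/historical split concrete, here is a minimal sketch of the idea in plain Python. The record fields (plate, owner, registered_to) are hypothetical placeholders; a real raw zone would of course hold Avro files queried by an engine, not in-memory dicts:

```python
# Hypothetical registration records; field names are illustrative only.
def split_zones(records):
    """Partition records into a 'current' area (open-ended
    registrations) and a 'historical' area (closed registrations)."""
    current = [r for r in records if r["registered_to"] is None]
    historical = [r for r in records if r["registered_to"] is not None]
    return current, historical

def lookup_owner(current, plate):
    """Row-level read: to whom is this vehicle currently registered?"""
    for r in current:
        if r["plate"] == plate:
            return r["owner"]
    return None

records = [
    {"plate": "ABC-123", "owner": "Alice", "registered_to": "2020-05-01"},
    {"plate": "ABC-123", "owner": "Bob",   "registered_to": None},
    {"plate": "XYZ-987", "owner": "Carol", "registered_to": None},
]

current, historical = split_zones(records)
print(lookup_owner(current, "ABC-123"))  # Bob
```

The point of the split is that the frequent police lookup only ever scans the (much smaller) current area, while history queries fall through to the historical area.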
Analytics and Reporting
Other actors, such as the department of transportation and the transportation industry, want curated reports as well as the ability to do their own analytics to support decision making. For that, AFAIK, we should construct separate areas using a column-oriented file format such as Parquet, which would then open the doors for various analytics and reporting technologies.
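The intuition behind the column-oriented choice can be sketched in a few lines. The infraction records and field names below are hypothetical, and a real curated zone would hold Parquet files written by Spark or pyarrow rather than Python dicts; the sketch just shows why an aggregate over a columnar layout touches only the columns it needs:

```python
def to_columnar(rows):
    """Pivot row-oriented records into a dict of column lists,
    the basic layout idea behind formats like Parquet."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

# Hypothetical driver-infraction records.
infractions = [
    {"driver_id": 1, "points": 2, "year": 2022},
    {"driver_id": 2, "points": 4, "year": 2022},
    {"driver_id": 1, "points": 1, "year": 2023},
]

cols = to_columnar(infractions)
# An aggregate like "total points issued" reads one column,
# not every field of every row.
total_points = sum(cols["points"])
print(total_points)  # 7
```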
There is a lot in this solution that appeals to me, but I have a lot of concerns as well. All OLTP systems are on-prem and use a variety of database technologies, so transferring data to Azure Data Lake is a bit of a challenge, but Data Factory should be able to handle that. Data security is a concern: read access requires fine-grained access control, as it deals with sensitive data.

On the upside, as one example, instead of us building a service for the police that returns vehicle or infraction data, we could just give the police limited access to the appropriate areas in the data lake and tell them to build their own app. Similarly, we could grant the department of transportation data scientists access to the curated data area so they can do their own analytics without bothering us. My main motivation for even researching this solution is that it brings all the data together, past and future, and opens new possibilities for interaction between different actors without setting up a new project every time new data is needed. Instead, we could just say, "here's the data, knock yourself out".
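For the fine-grained access part, my understanding is that ADLS Gen2 supports POSIX-style ACLs per directory, which is roughly what "give the police limited access to one area" would look like. The account, filesystem, path, and AAD group id below are placeholders, and the exact CLI syntax should be verified against current Azure documentation; this is a config sketch, not a tested command:

```shell
# Grant a hypothetical police AAD group read+traverse access to one
# zone of the lake, leaving the rest of the filesystem inaccessible.
az storage fs access set \
    --account-name dmvdatalake \
    --file-system raw \
    --path "vehicles/current" \
    --acl "group:00000000-0000-0000-0000-000000000000:r-x"
```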
What I'm looking for is some guidance on the viability of this solution, so that I can decide whether to pursue it or abandon it altogether. I realize this is very high-level, but I'm sure others have struggled with the same questions. Whitepapers, references, and case studies would be greatly appreciated.