Data Integration
I like this term: it describes the main objective well... and as a matter of fact, this is a very popular problem many are trying to solve. I recently spent a week in NY and had a chance to talk to a few partners and customers in the financial vertical. Almost everyone have this need either because of compliance (SOX for example) or for getting a single high level view of business and just for simply consolidating things like customer data stuck in application silos … The problem is data is either in one of the various platforms (SQL Server, Oracle, DB2, Informix, Sybase etc.) or in one of the many representations (xml, comma delimited files, binary representations – images, movies etc.)…
So what does SQL Server 2005 do for this scenario?
- Many Ways to get to your data: you can still use the classic data access APIs; odbc, oledb, or ado.net. You can also talk SOAP. Create a new http endpoint for stored procedure or adhoc access. No connectivity libraries to install.
- Many ways to get to external data: We are used to linked servers representing external data in a relational manner through OleDb. It could be adhoc (opendatasource or openrowset) or well defined so you can reference the objects like they are local (server.database.schema.object or execute proc AT linked server name). In SQL Server 2005 you can also use the new bulk load provider to refer to files in the file system that are either in one of the bulk-loadable formats or you can represent the whole data as a single blob and do your own parsing. I have seen a great example of it where someone defined the whole xml file as blob and converted it to xml and used XQuery in the same query to shred it to relational pieces. You could also use your user defined types from SQLCLR to parse the binary data into your server side type.
- Transform and load in batch mode: Integration services provide a great value transforming the data in a very powerful and scalable way. If you were not a big fan of DTS I’d still encourage you to take a look at the new version. Huge stride forward.
- Move data close to real-time: replication is a great way to do that. Check out the new Peer-to-Peer replication and new support for replicating directly from Oracle to SQL Server.
Comments
- Anonymous
March 09, 2005
Actually I don't think data being in various platforms or in one of many representations is the most difficult problem to solve. Practically all platforms and standard representations of data are well documented by this time and its usually a matter of applying the right practices. The real vexing problems are the "application silos" mentioned earlier where you have data that is "owned" by an application which maintains the proprietary semantics and integrity of the data. In these cases more esoteric specialised application knowledge is needed to integrate the data not just platform technology knowledge.