Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Here are some notes on “SQL Server 2008 Integration Services (SSIS)” I took while attending an advanced class on SQL Server taught by Greg Low (from https://sqlblog.com/blogs/greg_low/ and https://www.sqldownunder.com/).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without some pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
SQL Server Integration Services (SSIS)
- Introduction
- See https://www.microsoft.com/sqlserver/2008/en/us/integration.aspx
- Tooling moved to Visual Studio
- When installing, select Business Intelligence Development Studio
- Templates, including Integration Services Connection
- See https://msdn.microsoft.com/en-us/library/ms141026.aspx
- See https://msdn.microsoft.com/en-us/library/ms141711.aspx
- Upgrade from DTS addressing previous issues: performance, language, all-in-one design surface
- Separating: Control Flow, Data Flow, Event Handlers, Package Explorer
- See https://msdn.microsoft.com/en-us/library/ms137973.aspx
Demo
- Open Visual Studio
- Create new project
- In Control, drag a two items from the toolbox (execute process task)
- Write notes to the control surface, text for documentation
- Noting the red (X) indicating something is missing
- Adding properties for each task (run calc.exe)
- Two calculators running at the same time, boxes turn yellow (running)
- Execution Results – your best friend, good place to look at what happened
- Package Explorer – Looking at the two tasks created
- Creating a sequence – green line means execute on success
- Check properties of the the green line: success/failure/completion
- Adding a third task. One runs on completion, one on success.
- Adding a fourth task. Depends on another. Two starting points now.
- Options to select two lines, configure properties, change from AND to OR…
ForEach
- Unlike DTS, looping containers. Processing file by file in a batch, for instance.
- Looking at the Foreach.
- Set the collection property. For each file, For each time, For each SMO (database).
- Drop a couple task inside.
- Much simpler than DTS, no need for specific variables, etc.
Data Flow
- Two tasks, data flow in the middle, links
- Some data flows from task to task.
- Data flow has its own tab, since the data flowing can be harder to describe.
- In the Toolbox: Data Source (like ADO.NET), Data Transformation.
- Drop an ADO.NET source
- Connection – look at objects, switch to SQL command, write a query
- Look at preview – very helpful
- Look at columns – can change the name in the “Output Column”
- Error output – What do you want to do when error occurs (you can fail, redirect, send to error output)
- There’s also a connection object
- Drop a Flat File Destination, connect the two.
- Flat File – Has a connection also (?!). File name, format, columns, preview
- Mapping – What input column goes to what output file columns
- Run the package. Data flow is red… Not good.
- Look at Progress… Can’t find column at the data source.
- Look at data source and mappings again.
- Run again, it’s happy…
Derived column
- Look at transformations: Derived column.
- Drop a derived column transformation.
- Properties, add as new columns, create an expression.
- Type an expression to add two columns, typecast for integer, return as Unicode.
- Expressions, typecasts (WSTR=Unicode)
- Add new flat file destination, now shows the derived column
- Run again. Note – run in data flow tab also runs everything, not just data flow.
- Looking at the resulting text file
Unicode
- Go to SSMS, create a new database
- Create TableA (ID int identity, description varchar(30))
- Create TableB (ID int , description varchar(30))
- Insert a few rows in table A.
- Now back Visual Studio, to move data from TableA to TableB
- Set up the connection to the database
- Add ADO.NET data source, use a SQL command to
- Different between ADO.NET destination and SQL database?
- Let’s try ADO.NET destination… Not really what we want.
- Trying SQL Server destination… Create new connection. Select database, TableB.
- Mapping are OK. Validation error: cannot convert between Unicode and non-Unicode (?!)
- DTS didn’t have any type safety … Now we do!
- Note – Everything it does internally is Unicode… Description is coming up as Unicode.
- How to fix? Add a data conversion. Input, Output, Data Type… Fix the mapping. Run. OK!
Script
- You can create your down data source with a script
- Source, Destinatio or Transformation.
- Looking at the properties of the script (script, input columns, inputs and ouput, connection)
- Script language options: Visual Basic 2008, Visual C# 2008
- You can then drop to code and write whatever you want (you get a starting template).
- You can also have a script task.
- Note – This ends up as compile code, so it has good performance
- Note – You can also write your own components in .NET
Also in the toolbox
- Sources: XML, Excel
- Destination: Data Mining model, SQL Server CE
- Transformations: Most are straightforward
Fuzzy lookups
- Using the fuzzy look transformation
- Only enterprise edition
- Looking for similar columns.
- You set the “similarity threshold” under “advanced”
- Useful for data de-duplication.
Lookup
- Lookup transformation
- Important for looking up data in data warehouse
- Looking for the key that’s different in the DW (surrogate keys in DW environment)
- Inferred members – I have a fact about the member, but I don’t know the member
- Inferred members in the error output in the lookup
- Slowly changing dimensions – Customer region as it is now or as it was then?
- Type two slowly changing dimension – Surrogate keys for column, associated with a start/end date
- Lookup performance is important, since you might have several lookups as you bring data in
- Optimizations - Cache mode: full cache (connection type), partial cache, no cache
Other transformations in data flow
- Merge
- Merge Join – sort order
- Union
- FTP – Basic, considering writing your own with .NET
- Web Service task – Also basic, remember you have full .NET support here…
Variables
- Whole set of system variables
- Add your own variables
- Can be scoped to package or component
- Scripts: ReadOnly or ReadWrite variables
Package
- Project files, can add different types of files.
- Looking at project properties. Can create a deployment utility.
- Use the build option. Create a package.
- Looking at the files that are part of it (.sln, .dtsx).
- In bindeployment – bits to deploy
- DTS package is just an XML file. You can open it with notepad.
- Config and SSISDeploymentManifest are also XML.
- Copy the files, right-click the manifest and click deploy, create desired agent jobs.
Package Configurations
- You can hard code connections in the package. Could be trouble down the line.
- Consider abstracting those.
- Check enable package configuration.
- Where do you want to store the configuration? Default is an XML file, good choice
- Other options: SQL Server, parent package variable, Registry, Environment variable
- Configuration properties: Long list of items, including database connections, working directory
- In connection, for instance, you can check the box for “Server Name”
- Look at XML configuration file. It still says <DTSConfiguration> :-)
- You can change it and run again with a custom value as you run in a different environment
- Design for that upfront. Don’t put that on later…
Development
- You could build the entire package in code.
- SSIS Developer Center in MSDN
- See https://msdn.microsoft.com/en-us/sqlserver/cc511477.aspx
- Webcasts on SSIS on MSDN site
- See https://www.sqlis.com/
- Codeplex examples
- See https://ssisctc.codeplex.com/
- Consider your source code control options.
SSIS 2008
- Improved Package Upgrade
- New ADO.NET Component
- New Data Profiling Task and Data Profile View – understand the data
- Connections Project Wizard – for connection-only projects
- Script Environment
- Improved Data type handing in Import/Export wizard
- Faster bulk loading
- Support for date/time data types, higher precision options
- Support for MERGE statement – option working row by row
- Support for Composable SQL – OUTPUT clause to track inserted, deleted rows, $action
Upgrades
- Upgrade advisor: Run before any upgrade (from DTS or SSIS 2005), reports on issues
- Package Migration Wizard: Creates new SSIS package, leaves the old one alone
- Package Migration Wizard – Careful – Can’t really deal with certains things, like transactions
- From SSIS 2005: Careful – Package upgrade is one-way only
- From DTS 2000: Consider using the DTS 2000 package task, dtsrun.exe still available
- From DTS 2000: Careful – ActiveX script tasks
- From DTS 2000: Careful – Connection related issues
- From DTS 2000: You really should consider re-writing … There are much simpler ways now!
- Consider works on SSIS 2008 ahead of time, before you install the SQL Server 2008 database
Manageability, Troubleshooting
- Design-time validation – Tries to connect to data sources to validated, can work offline
- Set breakpoints, locals, call stack, immediate window, row count transform
- For the data flow, consider using only a subset of data
- Data Viewer – Helps you see the data as it is flowing through
- Data Viewers - Commonly used with a Grid, also available as data charts
- Logging, log providers (SQL, text, etc..)
- Consider adding your own logging using the logging infrastructure
- Consider using package-level event-handler for OnError event (send e-mail, perform task)
- In SSMS – Connect to Integration Services (not associated with a specific SQL instance)
Change Data Capture
- Ideal for populating data warehousing
- Allows incremental loads
- Debug dump files (.mdmp and .tmp) provide info about what happened
- Consider using LSN to track instead not time (there are functions to convert)
- Return as BINARY(10), need to deal with conversions
- Books online example
- See https://msdn.microsoft.com/en-us/library/bb895315.aspx
Security
- Layered security. Both SQL and File System-based scenarios.
- Packages can be encrypted (levels), digitally signed, stored in SQL, secure with file system ACLs
- Can use SQL Server Agent to run SSIS Packages
- See https://support.microsoft.com/kb/918760/
- Where to store credentials? Consider storing the sensitive credential info in a database
Guidelines
- Keep packages simple, readable. Use comments, naming conventions.
- Consider dividing things into multiple packages, multiple containers.
- Understand the business: Kimball’s ETL and SSIS books
- Start with some blueprint design patters or ETL frameworks
- See Erik veerman, rushabh Mehta, pass summit 2008
- Careful – Have enough threads but understand asynchronous tasks, parallelism
- Consider using more T-SQL statements, MERGE statement comes to mind
- Careful – Avoid SELECT * and defining just “table”, which does a SELECT *
- Careful – Don’t bring data you don’t need: filter data in data source, use T-SQL there as required
- Careful – Consider optimizing the queries you use, covering indexes with INCLUDE
- Careful – Use IsSorted if input is ordered. SSIS won’t know otherwise…
- Consider using FastParse for text data that are clean
- Understand the data types returned by the databases, queuing systems
- Design your deployment strategy from the beginning, use configurations.
- Buffers: SSIS uses large buffers to optimize the process
- Careful - Width of the row used to define the size of buffer
- Consider removing output components to optimize buffer use
- Partially blocking components: Merge, Merge Join, Union All
- Blocking components: Sort, Aggregate
- CPU: Pipeline with multiple threads at once. Control with EngineThreads, MaxConcurrentExecutables
- Transactions: Default isolation is Serializable. Do you really need that?
- Discussion – Where to run SSIS – On same database server or another server? It depends :-)
Data Warehouse Loading
- ETL – Extract, Transform, Load
- ELT sometimes makes more sense…
- Can do some work at extract time, use advanced queries, typecasts
- Dimensions – Loaded first typically. Consider using multiple packages, run in parallel.
- Slow changing dimensions. Some built-in support. Might need to customize…
- Discussion – What to do with inferred members?
- Careful – Lookup transform is case sensitive
- Snowflakes – Dimensions that are subsets of other dimensions. Loading order is important
- Fact tables – Load dimensions first. Consider some advanced Lookup strategies, like MRU caching
- Data mining – SSIS can help populate a data mining model. Data mining query component.
Related links
- Project REAL
https://www.microsoft.com/sqlserver/2005/en/us/project-real.aspx - Top 10 SQL Server Integration Services Best Practices
https://sqlcat.com/top10lists/archive/2008/10/01/top-10-sql-server-integration-services-best-practices.aspx - SSIS Team blog
https://blogs.msdn.com/mattm/