Import Web Log File Data
The Web server log import DTS task imports Web log file data generated by your Web site into the Data Warehouse.
The configuration of your application and Web servers affects which log file data the Web server log import DTS task imports. You use the Web server log import DTS task to specify which log files to import into the Data Warehouse and where they are located. Web server log files store data that is collected when users visit your site and click links to site pages. This data includes the length of time a user spent visiting your site, the referring site, ad clicks, ad reach, click frequency, and the path the user takes through the site, including entry and exit pages.
Imported Data
Use the Import Web Server Logs dialog box to specify the basic and advanced properties that govern the Web server log import DTS task, including which log files to import and where they are located.
The following tables list the columns extracted from the W3C log file, the transformation applied to each, and the Data Warehouse table columns the data is saved to.
Most Data Warehouse tables also contain a SiteName column. Like the TableID and TableInternalFlag columns, SiteName is not listed in the following tables, because it is always extracted from the same location for all DTS tasks.
| Source columns from the W3C log file | Transformation | Target columns from the Request table in the Data Warehouse |
|---|---|---|
| N/A | Generated: Created by the log import application and incremented for each request line of a log file that is stored. The value is unique across imports. | RequestNum |
| cs-uri-query | Copy Column: Contains the complete query string from the log file. | QueryString |
| cs-uri-stem | Copy Column | Uri |
| cs-bytes | Copy Column: Client-to-server bytes. | BytesReceived |
| sc-bytes | Copy Column: Server-to-client bytes. | BytesSent |
| N/A | Generated: A unique identifier generated within the log import application. | VisitNum |
| date and time | Concatenation: The date and time values are combined and converted to local time. | TimeStamp |
| N/A | Generated: The key member of the SiteURL class. The OLE DB provider resolves this value to SiteURLID in the physical SiteURL table. | URL |
| N/A | Generated: Determines whether this entry is considered a Request (as opposed to a Hit). It can be set to false when the hit is filtered; the most common filtered hit is a request for an image that has a Commerce Event attached to it. | IsRequest |
| cs(Referer) | Copy Column | ReferrerURL |
| cs(Referer) | Middle of String: Takes only the domain part of the URI. | ReferrerDomainName |
| cs-uri-query (UPM cookieless) OR cs(cookie) OR cs-username OR cs-ip + cs(User-Agent) | Generated: The first 8 bytes of an MD4 hash of the value used as the user key. That value depends on what the log file contains: the UPM query string (cookieless), the UPM cookie, a user-configured cookie, the user name, or the client IP address plus the browser agent string. | UserKey |
| sc-win32-status | Copy Column | Win32Status |
| sc-status | Copy Column | HttpStatus |
| N/A | Generated: Log import matches each log line to a configured server binding, including the port number (s-port) and IP address (s-ip). If the server binding is classified as secure, this column is set to true. | IsSecure |
| cs-method | Generated: GET = 1, POST = 2, HEAD = 3. | Method |
| time-taken | Copy Column | TimeTaken |
| N/A | Generated: The first request of a visit detected within the log import is 1; the value is incremented for each subsequent request in the visit. | RequestIndex |
| N/A | Generated: Set to false if log import detects no Commerce Event query string on this line. | HasCommerceEvent |
| N/A | Generated: Incremented for each instance of a log import. | TaskID |
| cs-uri-stem | Middle of String: Takes only the first level of the URI, if available. | Level1Dir |
| cs-uri-stem | Middle of String: Takes only the second level of the URI, if available. | Level2Dir |
| cs-uri-stem | Middle of String: Takes only the third level of the URI, if available. | Level3Dir |
| cs-uri-stem | Middle of String: Takes only the fourth level of the URI, if available. | Level4Dir |
| cs-uri-stem | Middle of String: Takes only the fifth level of the URI, if available. | Level5Dir |
| cs-uri-stem | Middle of String: Takes only the sixth level of the URI, if available. | Level6Dir |
| cs(User-Agent) | Copy Column | UserAgentName |
| UPM Cookie Key OR cs(cookie) OR cs-ip + cs(User-Agent) | Generated: The first 8 bytes of an MD4 hash of the value used as the user ID. That value depends on what the log file contains: the UPM query string (cookieless), the UPM cookie, a user-configured cookie, the user name, or the client IP address plus the browser agent string. | UserID |
| cs-uri-query | Copy Column: MD4 hash. | QueryStringKey |
| N/A | Generated: Used for checkpointing support. | SubTaskID |
| cs-uri-stem | Copy Column: MD4 hash. | UriKey |
| cs(Cookie) | Middle of String: Only the matched cookie is extracted. The shipping schema does not physically store this property, although the schema can be logically extended to store it. | Cookie |
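To make the Copy Column, Concatenation, and Method mappings in the Request table concrete, here is a minimal Python sketch that maps one W3C log line to a few Request columns. It assumes, as with IIS W3C extended logs, that the date and time fields are recorded in UTC and that a `-` marks an empty field; the function names, the `local_tz` parameter, and returning `None` for HTTP verbs other than GET, POST, and HEAD are illustrative assumptions, not the log import application's actual behavior.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Dict, List, Optional

# Method codes as listed in the Request table: GET = 1, POST = 2, HEAD = 3.
METHOD_CODES = {"GET": 1, "POST": 2, "HEAD": 3}


def parse_fields_directive(line: str) -> List[str]:
    """Return the field names declared by a W3C '#Fields:' directive."""
    prefix = "#Fields:"
    if not line.startswith(prefix):
        raise ValueError("not a #Fields directive: " + line)
    return line[len(prefix):].split()


def _value(values: Dict[str, str], name: str) -> Optional[str]:
    """W3C logs write '-' for an empty field; map that (or a missing field) to None."""
    raw = values.get(name, "-")
    return None if raw == "-" else raw


def to_request_row(fields: List[str], line: str, local_tz: tzinfo) -> dict:
    """Map one W3C log line to a few Request-table columns (illustrative subset only)."""
    values = dict(zip(fields, line.split()))
    # TimeStamp: concatenate date and time, treat them as UTC (assumption), convert to local time.
    utc = datetime.strptime(values["date"] + " " + values["time"], "%Y-%m-%d %H:%M:%S")
    stamp = utc.replace(tzinfo=timezone.utc).astimezone(local_tz)
    status = _value(values, "sc-status")
    sc_bytes = _value(values, "sc-bytes")
    cs_bytes = _value(values, "cs-bytes")
    return {
        "TimeStamp": stamp,
        "Uri": _value(values, "cs-uri-stem"),                     # Copy Column
        "QueryString": _value(values, "cs-uri-query"),            # Copy Column
        "HttpStatus": int(status) if status else None,            # Copy Column
        "Method": METHOD_CODES.get(values.get("cs-method", "")),  # None for other verbs (assumption)
        "BytesSent": int(sc_bytes) if sc_bytes else None,         # server-to-client bytes
        "BytesReceived": int(cs_bytes) if cs_bytes else None,     # client-to-server bytes
    }
```

Because the field order comes from the `#Fields` directive rather than being fixed, the sketch zips each line against the most recently read directive; a real importer has to track directive changes within a file.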
The VisitInfo table represents closed user visits. Log import writes this class when a visit closes, which happens when an external referrer is encountered or the visit times out.
| Source columns from the W3C log file | Transformation | Target columns from the VisitInfo table in the Data Warehouse |
|---|---|---|
| N/A | Generated | CountOfRequest |
| N/A | Generated | Duration |
| N/A | Generated | FirstTimeStamp |
| N/A | Generated | FirstHTimeStamp |
| N/A | Generated: A unique identifier generated within the log import application. This is a foreign key to the Visit table. It is also a foreign key to the VisitInfo table; however, a VisitInfo row is inserted only when the user's visit is finished. | VisitNum |
| N/A | Generated: Incremented for each instance of a log import. | TaskID |
| cs-uri-query (UPM cookieless) OR cs(cookie) OR cs-username OR cs-ip + cs(User-Agent) | Generated: The first 8 bytes of an MD4 hash of the value used as the user key. That value depends on what the log file contains: the UPM query string (cookieless), the UPM cookie, a user-configured cookie, the user name, or the client IP address plus the browser agent string. | UserKey |
| cs(User-Agent) | Copy Column | UserAgentName |
|  |  | FirstRequestNum |
|  |  | SecondRequestNum |
|  |  | LastRequestNum |
|  |  | FirstUriKey |
|  |  | SecondUriKey |
|  |  | LastUriKey |
| cs-uri-query (UPM cookieless) OR cs(cookie) OR cs-username OR cs-ip + cs(User-Agent) | Generated: The first 8 bytes of an MD4 hash of the value used as the user ID. That value depends on what the log file contains: the UPM query string (cookieless), the UPM cookie, a user-configured cookie, the user name, or the client IP address plus the browser agent string. | UserID |
| cs(Referer) | Generated: Only the domain part of the URI is loaded. | ReferrerDomainName |
| N/A | Generated: Used for checkpointing support. This is a foreign key to the LogImportSubTask table. | SubTaskID |
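The visit-closing rule for the VisitInfo table (a visit closes on an external referrer or a time-out) can be sketched in Python as follows. This is a minimal illustration, not the log import algorithm: it assumes one user's requests arrive in time order as `(RequestNum, TimeStamp, cs(Referer))` tuples, that "external" means the referrer's domain is not in a set of lowercase site host names, and that the time-out is a hypothetical 30 minutes, since the actual time-out is a configuration setting not given here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Set, Tuple
from urllib.parse import urlsplit


def referrer_domain(referrer: str) -> Optional[str]:
    """Middle of String: keep only the domain part of a cs(Referer) value."""
    if not referrer or referrer == "-":
        return None
    return urlsplit(referrer).hostname  # lowercased, port removed


@dataclass
class Visit:
    first_time_stamp: datetime
    last_time_stamp: datetime
    request_nums: List[int] = field(default_factory=list)

    @property
    def count_of_request(self) -> int:
        return len(self.request_nums)

    @property
    def duration(self) -> timedelta:
        return self.last_time_stamp - self.first_time_stamp


def split_visits(
    requests: Iterable[Tuple[int, datetime, str]],
    site_domains: Set[str],
    timeout: timedelta = timedelta(minutes=30),  # hypothetical default
) -> Tuple[List[Visit], Optional[Visit]]:
    """Split ONE user's time-ordered requests into closed visits and the open visit.

    A visit closes when the gap since its last request exceeds `timeout`
    or when a request arrives with an external referrer.
    """
    closed: List[Visit] = []
    current: Optional[Visit] = None
    for request_num, stamp, referrer in requests:
        domain = referrer_domain(referrer)
        external = domain is not None and domain not in site_domains
        if current is not None and (external or stamp - current.last_time_stamp > timeout):
            closed.append(current)  # a closed visit, as written to VisitInfo
            current = None
        if current is None:
            current = Visit(stamp, stamp)
        current.last_time_stamp = stamp
        current.request_nums.append(request_num)
    return closed, current  # still open when the input ends (compare OpenUserVisit)
```

The request that triggers the close starts the next visit, so an entry from a search engine both ends the previous visit and becomes the first request of a new one.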
The LogImportSubTask class is used internally within the Web log import to track checkpoints for the restart feature.
| Source columns from the W3C log file | Transformation | Target columns from the LogImportSubTask table in the Data Warehouse |
|---|---|---|
| N/A | Generated: Used for checkpointing support. | SubTaskID |
| N/A | Generated: Incremented for each instance of a log import. | TaskID |
| N/A | Generated: A foreign key to the Site table. | SiteID |
| N/A | Generated: The name of the server group being imported. There is a unique ServerGroup for each virtual server (for IIS, each distinct log file). | ServerGroup |
| N/A | Generated: The name of a log file within the import. | FileName |
| N/A | Generated: The offset within the log file where the checkpoint occurred. | ByteOffsetHigh |
| N/A | Generated: The offset within the log file where the checkpoint occurred. | ByteOffsetLow |
| N/A | Generated: The W3C fields directive in effect at the time of the checkpoint. | Field |
| N/A | Generated: The current log file date and time when the checkpoint occurred. | Date |
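The table stores the checkpoint offset in two columns, ByteOffsetHigh and ByteOffsetLow. A plausible reading, and it is only an assumption because the columns' encoding is not documented here, is that they hold the upper and lower 32 bits of a 64-bit byte offset, in the style of a Win32 `LARGE_INTEGER`. This Python sketch shows that split, the reverse join, and resuming a read from the stored offset; all function names are hypothetical.

```python
from typing import BinaryIO, Iterator, Tuple

_MASK32 = 0xFFFFFFFF


def split_offset(offset: int) -> Tuple[int, int]:
    """Split a 64-bit byte offset into (ByteOffsetHigh, ByteOffsetLow) halves (assumed layout)."""
    if not 0 <= offset <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("offset must fit in 64 bits")
    return offset >> 32, offset & _MASK32


def join_offset(high: int, low: int) -> int:
    """Rebuild the 64-bit byte offset from its two 32-bit halves."""
    return ((high & _MASK32) << 32) | (low & _MASK32)


def lines_after_checkpoint(log: BinaryIO, high: int, low: int) -> Iterator[bytes]:
    """Resume reading a log file (opened in binary mode) at a stored checkpoint."""
    log.seek(join_offset(high, low))
    for raw in log:
        yield raw.rstrip(b"\r\n")
```

Resumed lines cannot be interpreted on their own, because the `#Fields` directive that describes them may appear earlier in the file; that is why the checkpoint also records the Field column alongside the offset.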
The OpenUserVisit class represents users whose visits are still active when a log import ends. To change how the Web server log import DTS task handles open user visits, adjust the log file rotation settings.
| Source columns from the W3C log file | Transformation | Target columns from the OpenUserVisit table in the Data Warehouse |
|---|---|---|
| N/A | Generated: The count of requests in the current open visit. | CountOfRequest |
| N/A | Generated | FirstTimeStamp |
| N/A | Generated | LastTimeStamp |
| N/A | Generated: A unique identifier generated within the log import application. | VisitNum |
| cs-uri-query (UPM cookieless) OR cs(cookie) OR cs-username OR cs-ip + cs(User-Agent) | Generated: The first 8 bytes of an MD4 hash of the value used as the user key. That value depends on what the log file contains: the UPM query string (cookieless), the UPM cookie, a user-configured cookie, the user name, or the client IP address plus the browser agent string. | UserKey |
| N/A | Generated: Incremented for each instance of a log import. | TaskID |
| cs(User-Agent) | Copy Column | UserAgentName |
| N/A | Generated: Foreign key to the Request table; the first request in this visit. | FirstRequestNum |
| N/A | Generated: Foreign key to the Request table; the second request in this visit. | SecondRequestNum |
| N/A | Generated: Foreign key to the Request table; the last request in this visit. | LastRequestNum |
| N/A | Generated: Foreign key to the URI table; the first URI in this visit. | FirstUriKey |
| N/A | Generated: Foreign key to the URI table; the second URI in this visit. | SecondUriKey |
| N/A | Generated: Foreign key to the URI table; the last URI in this visit. | LastUriKey |
| cs-uri-query (UPM cookieless) OR cs(cookie) OR cs-username OR cs-ip + cs(User-Agent) | Generated: The first 8 bytes of an MD4 hash of the value used as the user ID. That value depends on what the log file contains: the UPM query string (cookieless), the UPM cookie, a user-configured cookie, the user name, or the client IP address plus the browser agent string. | UserID |
| cs(Referer) | Middle of String: Only the domain part of the URI is loaded. | ReferrerDomainName |
| N/A | Generated: Used for checkpointing support. | SubTaskID |
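The First, Second, and Last columns of OpenUserVisit summarize a visit that is still active when the import ends. The following Python sketch derives those summary columns from one open visit's `(RequestNum, TimeStamp)` pairs. It is illustrative only: the dataclass and function names are hypothetical, and leaving SecondRequestNum empty (`None`) for a single-request visit is an assumption the documentation does not address.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class OpenUserVisitRow:
    count_of_request: int
    first_time_stamp: datetime
    last_time_stamp: datetime
    first_request_num: int
    second_request_num: Optional[int]  # None when the visit has only one request (assumption)
    last_request_num: int


def summarize_open_visit(requests: List[Tuple[int, datetime]]) -> OpenUserVisitRow:
    """Build OpenUserVisit-style columns from one open visit's (RequestNum, TimeStamp) pairs."""
    if not requests:
        raise ValueError("an open visit has at least one request")
    ordered = sorted(requests, key=lambda r: r[1])  # order requests by time
    nums = [num for num, _ in ordered]
    return OpenUserVisitRow(
        count_of_request=len(nums),
        first_time_stamp=ordered[0][1],
        last_time_stamp=ordered[-1][1],
        first_request_num=nums[0],
        second_request_num=nums[1] if len(nums) > 1 else None,
        last_request_num=nums[-1],
    )
```

The same first/second/last pattern would apply to FirstUriKey, SecondUriKey, and LastUriKey, using each request's UriKey instead of its RequestNum.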
| Source columns from the W3C log file | Transformation | Target columns from the SiteSummary table in the Data Warehouse |
|---|---|---|
| N/A | Generated: Calculated during log import; all valid hits that the parser successfully parsed from the log file. | TotalHits |
| N/A | Generated: Total number of hits that were not filtered out by exclude criteria. | TotalImportHits |
| N/A | Generated: Total number of visits (open and closed) encountered. | TotalVisits |
| N/A | Generated: Number of hits excluded by the server exclude criteria. | TotalBadServerCount |
| N/A | Generated: Number of hits excluded because they could not be matched to a configured site. If this number is high, check your application configuration. | TotalBadSiteCount |
| N/A | Generated: Number of hits excluded by the excluded host criteria. | TotalExcludeHostCount |
| N/A | Generated: Total number of hits excluded by all criteria. | TotalExcludeCriteriaCount |
| N/A | Generated: Number of hits excluded by the crawler exclude criteria. | TotalMatchCrawlerCount |
| N/A | Generated: Physical start time of the import. | StartTime |
| N/A | Generated: Physical end time of the import. | EndTime |
| Source columns from the W3C log file | Transformation | Target columns from the LevelDir table in the Data Warehouse |
|---|---|---|
| N/A | Generated: First 8 bytes of an MD4 hash. | URI1Key |
| N/A | Generated: First 8 bytes of an MD4 hash. | URI2Key |
| N/A | Generated: First 8 bytes of an MD4 hash. | URI3Key |
| N/A | Generated: First 8 bytes of an MD4 hash. | URI4Key |
| N/A | Generated: First 8 bytes of an MD4 hash. | URI5Key |
| N/A | Generated: First 8 bytes of an MD4 hash. | URI6Key |
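Several key columns (UriKey, QueryStringKey, UserKey, UserID, and URI1Key through URI6Key) are described as the first 8 bytes of an MD4 hash. Python's `hashlib.new("md4")` depends on the underlying OpenSSL build and may be unavailable (OpenSSL 3 moves MD4 to its legacy provider), so this sketch includes a small pure-Python MD4 following RFC 1320 and then truncates the digest. Encoding the input string as UTF-8, and returning the 8 bytes rather than a 64-bit integer, are assumptions; the documentation specifies neither. `warehouse_key` is a hypothetical name.

```python
import struct

_MASK32 = 0xFFFFFFFF

# Per round: (boolean function, additive constant, message-word order, shift amounts), per RFC 1320.
_ROUNDS = (
    (lambda x, y, z: (x & y) | (~x & z), 0x00000000,
     (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), (3, 7, 11, 19)),
    (lambda x, y, z: (x & y) | (x & z) | (y & z), 0x5A827999,
     (0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15), (3, 5, 9, 13)),
    (lambda x, y, z: x ^ y ^ z, 0x6ED9EBA1,
     (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15), (3, 9, 11, 15)),
)


def _rotl(value: int, shift: int) -> int:
    """32-bit left rotation."""
    value &= _MASK32
    return ((value << shift) | (value >> (32 - shift))) & _MASK32


def md4(data: bytes) -> bytes:
    """Pure-Python MD4 (RFC 1320); returns the 16-byte digest."""
    a, b, c, d = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    bit_length = (8 * len(data)) & 0xFFFFFFFFFFFFFFFF
    # Pad with 0x80, then zeros up to 56 mod 64, then the 64-bit little-endian bit length.
    padded = data + b"\x80" + b"\x00" * ((55 - len(data)) % 64) + struct.pack("<Q", bit_length)
    for offset in range(0, len(padded), 64):
        words = struct.unpack("<16I", padded[offset:offset + 64])
        saved = (a, b, c, d)
        for func, constant, order, shifts in _ROUNDS:
            for step, k in enumerate(order):
                rotated = _rotl(a + func(b, c, d) + words[k] + constant, shifts[step % 4])
                # Rotate register roles so the next step updates what is currently d.
                a, b, c, d = d, rotated, b, c
        a, b, c, d = ((x + y) & _MASK32 for x, y in zip((a, b, c, d), saved))
    return struct.pack("<4I", a, b, c, d)


def warehouse_key(value: str) -> bytes:
    """First 8 bytes of the MD4 hash of `value` (UTF-8 encoding is an assumption)."""
    return md4(value.encode("utf-8"))[:8]
```

Hashing to a fixed 8-byte key keeps join columns compact regardless of how long the original URI, query string, or user identifier is.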
| Source columns from the W3C log file | Transformation | Target columns from the PathInfo table in the Data Warehouse |
|---|---|---|
| N/A | Generated | FirstTimeStamp |
|  |  | PathKey |
|  |  | PathTypeEnum |
| N/A | Generated: Incremented for each instance of a log import. | TaskID |
| Source columns from the W3C log file | Transformation | Target columns from the LogUser table in the Data Warehouse |
|---|---|---|
| cs-uri-query (UPM cookieless) OR cs(cookie) OR cs-username OR cs-ip + cs(User-Agent) | Generated: The first 8 bytes of an MD4 hash of the value used as the user key. That value depends on what the log file contains: the UPM query string (cookieless), the UPM cookie, a user-configured cookie, the user name, or the client IP address plus the browser agent string. | UserKey |
|  |  | UserDomainName |
|  |  | DateCreated |
| cs-username | Copy Column | UserName |
| N/A | Generated: Set to True if the user has a cookie. | HasCookie |
| s-ip | Copy Column | IpAddress |
| cs-uri-query (UPM cookieless) OR cs(cookie) OR cs-username OR cs-ip + cs(User-Agent) | Generated: The first 8 bytes of an MD4 hash of the value used as the user ID. That value depends on what the log file contains: the UPM query string (cookieless), the UPM cookie, a user-configured cookie, the user name, or the client IP address plus the browser agent string. | UserID |
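The UserKey and UserID rows say the hashed value depends on what the log file contains. This Python sketch picks that raw value, checking the sources in the order the tables list them: UPM query string, UPM cookie, user-configured cookie, user name, then client IP address plus browser agent string. Treating that listing as a strict precedence order is an assumption, as are the hypothetical `upm_name` and `custom_cookie` parameters (the actual UPM key and cookie names are not given here) and the plain string concatenation of IP address and agent. The cookie parser assumes the IIS convention of writing spaces as `+`, so cookie pairs appear separated by `;+`.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs


def parse_cookie_field(raw: str) -> Dict[str, str]:
    """Parse a W3C cs(Cookie) value whose name=value pairs are separated by ';+'."""
    cookies: Dict[str, str] = {}
    if raw in ("", "-"):
        return cookies
    for part in raw.split(";"):
        name, sep, value = part.lstrip(" +").partition("=")
        if sep:
            cookies[name] = value
    return cookies


def user_key_source(
    query: str,
    cookie_field: str,
    username: str,
    client_ip: str,
    user_agent: str,
    upm_name: str,                        # hypothetical: name of the UPM query-string/cookie key
    custom_cookie: Optional[str] = None,  # hypothetical: a user-configured cookie name
) -> str:
    """Pick the raw value that would be hashed into UserKey/UserID (assumed precedence)."""
    upm_values = parse_qs(query).get(upm_name)
    if upm_values:                                  # UPM query string (cookieless)
        return upm_values[0]
    cookies = parse_cookie_field(cookie_field)
    if upm_name in cookies:                         # UPM cookie
        return cookies[upm_name]
    if custom_cookie and custom_cookie in cookies:  # user-configured cookie
        return cookies[custom_cookie]
    if username not in ("", "-"):                   # cs-username
        return username
    return client_ip + user_agent                   # client IP plus browser agent string
```

The returned string is what the tables describe as "whatever is used as the user key"; the stored column would then be the first 8 bytes of its MD4 hash.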
| Source columns from the W3C log file | Transformation | Target columns from the URI table in the Data Warehouse |
|---|---|---|
| cs-uri-stem | Copy Column | URI |
| N/A | Generated: The home page URL matched during log import; this is the first specified URL. | URL |
| cs-uri-stem | Middle of String: Takes only the first level of the URI, if available. | Level1Dir |
| cs-uri-stem | Middle of String: Takes only the second level of the URI, if available. | Level2Dir |
| cs-uri-stem | Middle of String: Takes only the third level of the URI, if available. | Level3Dir |
| cs-uri-stem | Middle of String: Takes only the fourth level of the URI, if available. | Level4Dir |
| cs-uri-stem | Middle of String: Takes only the fifth level of the URI, if available. | Level5Dir |
| cs-uri-stem | Middle of String: Takes only the sixth level of the URI, if available. | Level6Dir |
| cs-uri-stem | Copy Column: MD4 hash. | UriKey |
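The Level1Dir through Level6Dir rows extract successive levels of `cs-uri-stem` with a Middle of String transformation. This Python sketch shows one way to do that split. The documentation does not say whether the final file name counts as a level or whether slashes are kept, so the sketch assumes the last segment is a file name (unless the stem ends with `/`) and returns bare directory names; `level_dirs` is a hypothetical name.

```python
from typing import List, Optional


def level_dirs(uri_stem: str, levels: int = 6) -> List[Optional[str]]:
    """Split cs-uri-stem into Level1Dir..Level6Dir; missing levels are None.

    Assumption: the final segment is a file name unless the stem ends with '/'.
    """
    segments = [s for s in uri_stem.split("/") if s]
    if segments and not uri_stem.endswith("/"):
        segments = segments[:-1]  # drop the file name, for example default.asp
    dirs: List[Optional[str]] = list(segments[:levels])
    return dirs + [None] * (levels - len(dirs))
```

For example, `/catalog/books/default.asp` yields `catalog` for Level1Dir, `books` for Level2Dir, and no value for the remaining four levels; levels deeper than six are ignored.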
Related Sections
How to Run the Web Server Log Import DTS Task