Web Q&A: Printing from a Web Page, Screen Scraping, Origin of an HTTP Request, and More

2019-10-24

Robert Hess

Q I need a way to use a control on a Web page as a print button, just like the print option on the Microsoft® Internet Explorer toolbar. I don't want the print dialog to pop up; I want to use the default printer, and I want to constrain the print count to one.

A The need to control printing from the browser is fairly common, and the amount of functionality you can provide will depend on which version of the browser is involved. Beginning with Internet Explorer 4.0, you could use the ExecWB function to invoke a rudimentary print process (see Figure 1), but you had no control over the print settings.
Internet Explorer 5.0 added support for the print method, which didn't really add anything new, but made the process easier. One drawback is that

  window.print()

will always bring up the print dialog.
      Now in Internet Explorer 5.5 there is a robust process for creating print previews as well as printing, called print templates: HTML files that can be created by accessing the object model. I should note, however, that this extended functionality is only available when you use C++ with the WebBrowser Control. For more information, see https://msdn.microsoft.com/workshop/browser/hosting/printpreview/reference/behaviors/templateprinter.asp.

Q What exactly is a screen scrape? Is it legal or illegal?

A Screen scraping is the act of programmatically evaluating the information displayed on the screen and extracting from it the specific information you need. Sometimes this is done simply to display the information in a richer fashion, and sometimes it is used for sending information to another location for storage. Web search engines can be considered screen scrapers. They look at the HTML of a page and record the keywords they find.
      I'm not a lawyer, nor do I play one on TV, but there are both legal and illegal ways to do a screen scrape. A legal application would be something like a search engine. An illegal screen scraper would be one that steals content from another Web site and presents it as original content.
      The methods of performing a screen scrape are as varied as its applications. In the old days, a program would either access the terminal's display buffer directly, or capture content from the I/O stream. It would then walk the display buffer character by character, looking for particular keywords. Perhaps it would look for "Name:" and then look at the text that followed in order to find the value of "name." In this manner, information could be shared between applications that otherwise had no way of passing data back and forth.
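      As a minimal sketch of that older technique, the following Python walks a captured display buffer line by line and returns the text that follows a label such as "Name:". The buffer contents and the find_field helper are hypothetical, made up purely for illustration.

```python
def find_field(buffer, label):
    """Return the text that follows `label` on the first line containing it."""
    for line in buffer.splitlines():
        pos = line.find(label)
        if pos != -1:
            # Everything after the label, trimmed of padding, is the value.
            return line[pos + len(label):].strip()
    return None  # label never appeared on the screen


# Hypothetical capture of a terminal screen, one field per line.
screen = (
    "CUSTOMER RECORD          PAGE 1\n"
    "Name:    Pat Smith\n"
    "City:    Redmond\n"
)

name = find_field(screen, "Name:")
```

      Real terminal screens often pack several fields onto one row at fixed columns, in which case you would slice by column position rather than take the rest of the line.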
      If you have a particular situation in which you need to do a screen scrape, then you should look at the format of the information you are trying to read (I assume it's HTML-based), then figure out how to best interpret that information to fit your needs. With HTML you could do things like look for a <TITLE> tag, then extract the title of the document, locate the first fragment of displayable text and use that as a summary. The pages you are reading might have description or author <META> tags which might be useful, or there might be some other formatting you can use to identify specific information.
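      A rough sketch of that HTML approach, using Python's standard html.parser module, might look like the following. The PageSummary class, the summarize function, and the sample page are all hypothetical; a real page will likely need more care (nested markup, missing tags, character encodings).

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collects the <title>, named <meta> tags, and the first visible text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.first_text = ""
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        elif not self._skip_depth and not self.first_text:
            self.first_text = text  # first displayable fragment as a summary


def summarize(html):
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return parser


sample = """<HTML><HEAD><TITLE>Widget Catalog</TITLE>
<META NAME="author" CONTENT="Pat Smith">
<META NAME="description" CONTENT="All of our widgets.">
<STYLE>body { color: black; }</STYLE></HEAD>
<BODY><H1>Widgets for every need</H1><P>More text.</P></BODY></HTML>"""

page = summarize(sample)
```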

Q Is there anything in an HTTP request (or in the Request object in ASP) that indicates if the request comes from a frame or from the top document?

A If I understand the question correctly, you should be able to use the HTTP_REFERER header information for this. For example, let's assume your pages have the following structure: https://server/path/default.html is the frameset definition, https://server/path/menu.html is the menu displayed in one of your frames, and https://server/path/document.asp is the document that is displayed.
      Upon server-side rendering of document.asp, the HTTP_REFERER, as retrieved via

  <%=Request("HTTP_REFERER")%>

will be https://server/path/default.html when the document page is loaded as part of the initial display of the frame (<frame src="document.asp">). It will be https://server/path/menu.html when the document is loaded by the user clicking on a targeted link in the menu frame.
As long as your document page is properly testing that its referer is a valid option, you should be able to find out during server-side rendering if the document is being called as part of an overall frameset definition.
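      The test described above can be sketched as follows. This is Python rather than ASP, the classify_referer helper is hypothetical, and the paths are the example ones from this answer; in a real page the value would come from the request's Referer header.

```python
from urllib.parse import urlsplit

# Paths taken from the example structure in the answer.
FRAMESET_PATH = "/path/default.html"
MENU_PATH = "/path/menu.html"


def classify_referer(referer):
    """Map a Referer header value to how the document page was reached."""
    if not referer:
        return "direct"  # typed URL, bookmark, or header not sent
    path = urlsplit(referer).path.lower()
    if path == FRAMESET_PATH:
        return "frameset"  # loaded as part of the initial frameset display
    if path == MENU_PATH:
        return "menu"      # loaded by a targeted link in the menu frame
    return "other"


how = classify_referer("https://server/path/default.html")
```

      Keep in mind that the referer is supplied by the client, so it can be missing or altered; treat it as a hint rather than a guarantee.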

Q I want to display three option buttons on my Web page, but only allow one to be selected at any time. I want to be able to pass the value of the selected button to an Active Server Page. The buttons do not need to be bound to a data source. Any suggestions?

A If you give each radio button the same name attribute, the user will only be able to select one of them at a time (see Figure 2). This is a safe choice because you won't want to pass a value for more than one button.
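As a rough sketch of what the receiving page sees: when several radio buttons share a name, the browser submits a single name=value pair for the group. The Python below parses such a posted body; the field names shipping and qty and their values are hypothetical.

```python
from urllib.parse import parse_qs

# Body a browser would post for a form with three radio buttons named
# "shipping" (values: ground, air, overnight) when "air" is selected,
# plus a hypothetical text field named "qty".
posted_body = "shipping=air&qty=2"

form = parse_qs(posted_body)

# Only the selected button's value is present; None if nothing was selected.
selected = form.get("shipping", [None])[0]
```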

Q I'm having problems aligning text with a DIV. The result of the following code is that the first DIV text is centered and the second DIV text is aligned right. What am I doing wrong?

  <style>
  .buttonHyperLink
  {position:relative;align:center;color:white;background-color:black;
  width:177px;height:19px;font-family:arial;font-size:14px;
  font-weight:bold;v-align:top;cursor:hand;}
  </style>
  <table> <tr>
      <td align="right"><a href="../products/index.htm">
      <div class="buttonHyperLink" align="center" >
      Product Information</div></a></td>
      <td align="right"><a href="../products/index.htm">
      <div class="buttonHyperLink" >Product Information
      </div></a></td></tr></table>

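A Unfortunately, you are confusing align (an attribute of an HTML element) with text-align (a Cascading Style Sheets (CSS) property). The align:center declaration in your <style> definition is not valid CSS and is ignored, so only the first DIV, which carries the align attribute, is centered; the second picks up the right alignment of its <TD>. Just change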

  <div align=center>

to the CSS syntax

  <div style="text-align:center">

and everything will work fine.

Q In the April 2000 Web Q & A column you said that developers should not perform long database tasks without providing some sort of feedback about the status of the process to the users. You suggested that instead, the operation should be performed asynchronously from the Web page.
      Could you please explain how to do this? If I should be using Remote Data Services (RDS), how can I avoid exposing the ConnectString and SQL string in the source code?

A There are many different ways to approach this problem. A worst case scenario is one in which the database request is going to be terribly long and impossible to break down into discrete components, and in which you have no way of knowing how far along the processing is.
      Such a problem would probably be best solved using an independent process running on the server. The Web request comes in to begin the data query, and this request is communicated to either a master process that is constantly running, or it starts up a new independent process. In either case the Web request gets an ID back from the external process, and can then return to the user, telling them that the process has begun.
      When the user manually requests the current status of the query (or the Web page automatically wants to request this), it returns to the server with the ID, and either asks the external process if it is done yet, or perhaps looks the ID up in a database queue which is being used to communicate the status of the request. When the external process finishes the data request, it stores the result locally, then records in the database that the process has completed and the location of the data-result. Thus, when the request comes back in for the status of the query, it can return the appropriate result-set to the user.
      So, the big hammer approach to solving this sort of problem would be to write a Windows® service which would be launched at boot time and maintain a queue of data requests that would be coming in from the Web. It would expose a COM interface which would allow a server-side component to be created in an ASP page, connect to the service, place the request for processing, get a Queue ID in return, and the component could be shut down. On later Web requests, the component could be created again in an ASP page, and request status information for a particular Queue ID. This is definitely not an elegant approach, but it would most likely provide a workable solution.
      Most any method you choose will depend on some "semi-synchronous" approach in which there is a middleman database or process that is maintaining the state and position of the requested query. Otherwise, with every trip from client to server, a huge amount of data would need to be passed back and forth in which the state of the current request is maintained. This information can be encrypted to provide security.
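      To make the submit-then-poll pattern concrete, here is a minimal in-process sketch in Python. It stands in for the external service with a background thread and an in-memory table; the submit and status functions, the job table, and the fake "query" are all hypothetical, and a real implementation would persist the queue (for example in a database) so it survives restarts.

```python
import threading
import time
import uuid

_jobs = {}                 # job id -> {"status": ..., "result": ...}
_lock = threading.Lock()


def _run(job_id, query):
    time.sleep(0.1)        # placeholder for the long-running database work
    with _lock:
        _jobs[job_id] = {"status": "done", "result": f"rows for {query!r}"}


def submit(query):
    """Start the work in the background and return an ID immediately."""
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"status": "running", "result": None}
    threading.Thread(target=_run, args=(job_id, query), daemon=True).start()
    return job_id


def status(job_id):
    """What a later Web request would call to ask whether the work is done."""
    with _lock:
        return dict(_jobs.get(job_id, {"status": "unknown", "result": None}))


job = submit("SELECT 1")   # first request: hand back the ID to the user
first_look = status(job)   # later request: poll using only the ID
```

      In this sketch only the ID travels to the client; the query text and any connection details stay on the server.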

Q I have a simple yet long form that consists of lots of small tables. I decided to break it into as many tables as possible so that I could gain the granularity such that when it came time to print out the form, the page break would happen between two of my tables. This has been consistent since Internet Explorer 3.0 but now Internet Explorer 5.5 wants to break exactly at the margins, cutting my tables in half! I read that on a page break the browser will attempt to show the whole table, otherwise it will move it to the next page.

A It looks as if you are simply relying on chance to get the page-breaks to land between your tables. So it isn't surprising that your page layout differs slightly from version to version of Internet Explorer, and in fact you will even notice that it might differ from printer to printer, as well as from system to system.
      You could set the page-break-before style on a <TD>, which will force a page break before that cell prints. The best way to make sure that page breaks occur where you want them is to use the CSS property for this, page-break-before:always.
      Thus you could say

  <table style="page-break-before:always">

in the table that you want to start on a new page.
      You can find more information at: https://msdn.microsoft.com/workshop/author/dhtml/reference/properties/pageBreakBefore.asp.

Q My application has a page that allows updates to some input fields and stores the values to a SQL database. When the user submits the form, it inserts the record into the table and displays a new page with a "Record Added" message. When I return to the update page, it does not reload the page with the right contents. I need to press F5 to refresh and display the updated values.
      I know there is an option in Internet Explorer to set the page to reload every time it loads, but this is not necessary for every user. How can I embed the RELOAD function in my code?

A Hold on! Think about this for a second. Every time the page loads, it will call the reload method, which will cause it to load again, and again, and again.
      What you really want is to prevent the page from being grabbed out of the client-side cache, and instead have it be retrieved fresh from the server.
      What you should do is use the following code:

  <%
     Response.CacheControl = "no-cache"
     Response.AddHeader "Pragma", "no-cache"
     Response.AddHeader "Expires", "0"
  %>

Or, if you aren't using ASP, you could use these META tags:

  <META HTTP-EQUIV="Pragma" CONTENT="no-cache">
  <META HTTP-EQUIV="Expires" CONTENT="0">

      For more information see the Knowledge Base article on Forward and Back Button behaviors at https://support.microsoft.com/support/kb/articles/Q199/8/05.ASP.
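      If your pages are served by something other than ASP, the same three headers can be sent from whatever server-side code produces the response. The sketch below uses Python's WSGI calling convention as one example; the app function and the page body are placeholders, not part of any particular framework.

```python
# The same three headers set by the ASP snippet.
NO_CACHE_HEADERS = [
    ("Cache-Control", "no-cache"),
    ("Pragma", "no-cache"),
    ("Expires", "0"),
]


def app(environ, start_response):
    """Minimal WSGI application whose response asks not to be cached."""
    body = b"<html><body>Current values go here.</body></html>"
    headers = [
        ("Content-Type", "text/html"),
        ("Content-Length", str(len(body))),
    ] + NO_CACHE_HEADERS
    start_response("200 OK", headers)
    return [body]


# Call the app directly with a stand-in start_response to inspect the headers.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


chunks = app({}, fake_start_response)
```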

Q I need to develop an ActiveX® control that can control hardware interrupts. How can I accomplish this?

A The handling of hardware interrupts is done via a device driver. So if you have custom hardware that you need to manage, you will need to write a device driver to work with the hardware down at the interrupt level. You can then write a fairly standard ActiveX control which interfaces with the device driver and does whatever needs to be done. For example, a keyboard is a hardware device that generates interrupts. Applications don't camp out on the hardware channel for this device. Instead, there are keyboard device drivers that understand the mechanics of the device at the hardware level, and then abstract its functionality out to the Windows operating system and other applications that might need to see the information that is coming in from it.
      To write a device driver you will need the Windows DDK, which you can download from https://www.microsoft.com/ddk.

Q I am considering using HTTP and XML in my application instead of WinSock with a custom data transmission protocol. I'd like to use port 80 to handle my HTTP traffic to avoid firewall issues, but I need to know if imposing my own HTTP traffic on port 80 will interfere with other applications such as Internet Explorer or other Web browsers?
      In other words, how does an instance of Internet Explorer filter out HTTP traffic that is intended for it?

A Neither Internet Explorer nor any other browser is in charge of monitoring all network traffic. All traffic management is handled by the Network Transport Layer provided by the operating system, which delivers each response to the connection that requested it. The easiest way to develop an application that uses HTTP connectivity is to use the WinINet functions. These allow you to connect to remote Internet hosts over HTTP, and they support FTP and Gopher as well.
      For more information, see the article "Microsoft Win32 Internet Functions Overview".

Q I'd like to write a server application to handle communications with my HTTP-enabled client app by listening for the inbound HTTP requests, parsing through them and performing some task based on the results. The application would then return some data to the client via HTTP, either in the headers or the body of the response. I'd like to build a WinSock-based listening module to do this (if I've understood the WinINet documentation correctly, it doesn't work on the server side of things). Can I manually write strings of HTTP and ship them out through a socket as a response back to the client this way?

A First of all, you're correct: WinINet works on the client side only. As for how best to perform server-side processing and I/O, instead of writing your own HTTP transport, you might want to look into ISAPI or ISAPI filters. This is a way to write an actual program that runs on the server and performs I/O based on incoming connections. In the case of ISAPI, the incoming connection is directed at the application itself. In the case of an ISAPI filter, it participates in all connections, providing things such as hit tracking, compression, or report generation.
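      To give a feel for what hand-writing HTTP over a socket involves, here is a small Python sketch of just the string handling: parsing the request text and assembling the response bytes. The parse_request and build_response helpers and the sample request are hypothetical, and the sketch leaves out everything else a real server must deal with (partial reads, keep-alive, chunked bodies, malformed input), which is a large part of why an existing server with ISAPI is usually the easier route.

```python
def parse_request(raw):
    """Split a raw HTTP/1.x request into method, target, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, headers, body


def build_response(status, body, content_type="text/xml"):
    """Assemble the bytes that would be written back to the client socket."""
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


# Hypothetical request as it would arrive on the listening socket.
request = b"GET /status?id=7 HTTP/1.1\r\nHost: example.test\r\n\r\n"
method, target, headers, _ = parse_request(request)
response = build_response("200 OK", b"<status>done</status>")
```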

Comments and Corrections

In researching the answer to the question about SQL Server™ connection strings in the November issue, I was unable to find a documented method for supplying alternate port information. Since then, two of my astute readers came through with two different methods that they use to do this.

I just wanted to point out in response to the port number question in the Web Q & A column in the November 2000 issue of MSDN® Magazine, that if your SQL Server has TCP/IP enabled, you can certainly access your data through ports other than 1433. For example, here is a connection string I use to connect to a SQL Server that is exposed through port 1510:
  <%
  ' ...
  cst = "provider = SQLOLEDB;" & _
        "network = DBMSSOCN; server = 1.2.3.4,1510;" & _
        "database = pubs; uid = sa; pwd = password"
  ' ...
  %>
The secret is to separate the IP address and the custom port number by a comma, and to make sure you override any other network settings by forcing TCP/IP (DBMSSOCN). This information can also be found at https://www.aspfaq.com/faq/faqShow.asp?fid=120

And another good suggestion:

I was just reading the Web Q & A in the November 2000 issue of MSDN Magazine. Your answer to the connection string question is not 100 percent correct. You can specify the port in the connection string. The following sections need to be added to the connection string using the SQL OLEDB provider.
  Network Library=DBMSSOCN;Network Address=NMGWEB1,1432;
or
  Network Library=DBMSSOCN;Network Address=10.0.0.11,1432;
The Network Library key tells the driver to use TCP/IP. The default is named pipes, unless the user has configured the machine to use TCP/IP by default. The first parameter of the Network Address is the machine name or IP address. Specifying the IP address would be faster because it wouldn't have to resolve the name. The second parameter of the Network Address is the port. I changed the SQL configuration on my machine from the default of 1433 to 1432 for a test.
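As a small illustration of the second reader's suggestion, the Python helper below appends those two keys to an existing connection string. The with_tcp_port function and the base string are hypothetical; only the Network Library and Network Address keys and the host,port format come from the letter above.

```python
def with_tcp_port(base, host, port):
    """Append the TCP/IP network keys to an existing connection string."""
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    base = base.rstrip("; ")  # avoid a doubled separator
    # Host and port are separated by a comma, not a colon.
    return f"{base};Network Library=DBMSSOCN;Network Address={host},{port};"


conn = with_tcp_port("Provider=SQLOLEDB;Database=pubs;", "10.0.0.11", 1432)
```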

Robert Hess is currently the host of "The MSDN Show" and is a regular contributor to various areas of MSDN. Send e-mail to webqa@microsoft.com

From the January 2001 issue of MSDN Magazine
