Extracting an HTML Table with PowerShell

tyler chau 1 Reputation point
2022-10-06T16:41:54.267+00:00

I was able to use Read-HtmlTable and get the table back with the correct columns and rows, but the data itself comes back null. The data on the page is constantly changing, so I'm not sure whether that makes a difference. Does anyone know of a way for me to extract the VA value? Sorry, I am still fairly new to PowerShell.

        Instantaneous   Average
VA      -               -

248109-image.png

Windows Server PowerShell

3 answers

  1. Rich Matheisen 45,906 Reputation points
    2022-10-08T20:03:26.353+00:00

    That is the most screwed up set of tables!

    There are 5 tables in that page. Some of them actually work (i.e., there's a header for each column). The rest either use a non-breaking space for the first column header (which sometimes seems to make the script throw an exception) or put the column headers in one table and the data in the following table (e.g., tableindex 2 and 3).

    Without trying to debug the author's script, this is the best I could come up with:

    $w = Read-HtmlTable.ps1 -InputObject c:/junk/powerenergy_mhtml.htm -tableindex 0 # -Header 'Instantaneous','Pos Average','Neg Average'  
    $x = Read-HtmlTable.ps1 -InputObject c:/junk/powerenergy_mhtml.htm -tableindex 1  
    # skip tableindex 2 -- it causes an exception
    $y = Read-HtmlTable.ps1 -InputObject c:/junk/powerenergy_mhtml.htm -tableindex 3 -Header '1','2','3','4'  
    $z = Read-HtmlTable.ps1 -InputObject c:/junk/powerenergy_mhtml.htm -tableindex 4  -Header '1','2','3'  
    
    1 person found this answer helpful.

  2. Rich Matheisen 45,906 Reputation points
    2022-10-07T01:21:21.91+00:00

    If the web page has only one table, and the table has column headers, then this should let you get both the name of each column and each row's data for that column:

    .\Read-HTMLTable https://github.com/iRon7/Read-HtmlTable |  
        ForEach-Object{  
            $td = $_  
            $td.psobject.properties.Name |  
                ForEach-Object{  
                    $_              # the column name  
                    $td.$_          # this row's value for that column  
                    "----------------------------"  
                }  
                "=============================="    # end of this row
        }  
    

    If the table has no headers you'll have to supply the values to be assigned to each column. I don't know how that would work if there were multiple tables without headers, though.
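
    The property-enumeration loop in this answer can be tried without a web page. The following is only a sketch: the `$rows` objects are hypothetical stand-ins for whatever Read-HtmlTable emits, and the property names `Name`, `Instantaneous`, and `Average` are assumptions (the real names depend on the table's headers or on the `-Header` values you supply).

    ```powershell
    # Hypothetical sample rows standing in for the objects Read-HtmlTable emits
    $rows = @(
        [pscustomobject]@{ Name = 'VA'; Instantaneous = '1.234 k'; Average = '1.200 k' }
        [pscustomobject]@{ Name = 'W';  Instantaneous = '0.987 k'; Average = '0.950 k' }
    )

    # Flatten every row into one "column = value" string per property
    $pairs = foreach ($row in $rows) {
        foreach ($prop in $row.psobject.Properties) {
            '{0} = {1}' -f $prop.Name, $prop.Value
        }
    }
    $pairs

    # Pick a single cell: the Instantaneous value of the row whose Name is 'VA'
    $va = ($rows | Where-Object Name -eq 'VA').Instantaneous
    $va
    ```

    Once the column names are known, filtering with `Where-Object` like this is usually simpler than walking every property.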


  3. Rich Matheisen 45,906 Reputation points
    2022-10-12T19:55:23.14+00:00

    Working from the HTML file (not the URL), here's one way of extracting the information from all the properties found by using the "id" attribute in the HTML elements. As long as those remain constant this should work. The only thing I can't tell from just the small data sample is whether or not anything other than "k" is used as a suffix for the values -- or whether or not there will ever be a negative value in any of the elements that use decimal fractions.
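
    If the extracted strings need to become numbers, something along these lines might do. It is a sketch under assumptions: `ConvertTo-Number` is a hypothetical helper name, "k" is assumed to mean a factor of 1000, and "k" is assumed to be the only suffix (which, as noted above, the small sample cannot confirm). Anything that does not fit the pattern returns `$null` rather than a guess.

    ```powershell
    function ConvertTo-Number {
        param([Parameter(Mandatory)][string]$Text)

        # Optional minus sign, digits, optional decimal part, optional "k" suffix.
        # -match is case-insensitive, so an upper-case "K" is accepted too.
        if ($Text -match '^\s*(-?\d+(?:\.\d+)?)\s*(k)?\s*$') {
            # Parse with the invariant culture so "." is always the decimal separator
            $number = [double]::Parse($Matches[1], [cultureinfo]::InvariantCulture)
            if ($Matches.ContainsKey(2)) { $number *= 1000 }   # assumed meaning of "k"
            return $number
        }
        return $null   # unrecognized format (e.g. an unexpected suffix)
    }

    ConvertTo-Number '2.5 k'
    ConvertTo-Number '-12'
    ```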

    # $v = Read-HtmlTable.ps1 -InputObject c:/junk/powerenergy_mhtml.htm -tableindex 0 -notrim # -Header 'Instantaneous','Pos Average','Neg Average'  
    # $w = Read-HtmlTable.ps1 -InputObject c:/junk/powerenergy_mhtml.htm -tableindex 1  
    # $x = Read-HtmlTable.ps1 -InputObject c:/junk/powerenergy_mhtml.htm -tableindex 2 -notrim -header 'blank1','Primary','2','3' # skip tableindex 2 -- it causes an exception
    # $y = Read-HtmlTable.ps1 -InputObject c:/junk/powerenergy_mhtml.htm -tableindex 3 -notrim -Header '1','2','3','4'  
    # $z = Read-HtmlTable.ps1 -InputObject c:/junk/powerenergy_mhtml.htm -tableindex 4 -notrim #-Header '1','2','3'  
      
    # Map HTML elements by "id" to $Table hash keys  
    $ids = @{  
        # Power /Real Time  
        INSWAT  = 'W Instantaneous'  
        PAVWAT  = 'W Pos Average'  
        NAVWAT  = 'W Neg Average'  
        INSVAR  = 'VAR Instantaneous'  
        PAVVAR  = 'VAR Pos Average'  
        NAVVAR  = 'VAR Neg Average'  
        INSPF   = 'PF Instantaneous'  
        PAVPF   = 'PF Pos Average'  
        NAVPF   = 'PF Neg Average'  
        INSV_A  = 'VA Instantaneous'  
        AVGV_A  = 'VA Average'  
        # Energy  
        WATHNET = 'Wh Net'  
        WATHREC = 'Wh Delivered'  
        WATHTOT = 'Wh Total'  
        WATHDEL = 'Wh Received'  
        VARHNET = 'VARh Net'  
        VARHPOS = 'VARh Delivered'  
        VARHTOT = 'VARh Total'  
        VARHNEG = 'VARh Received'  
        VAHTOT  = 'VAh Total'  
      
    }  
      
    # Power and Energy  
    $Table = [ordered]@{  
        # Power /Real Time  
        'W Instantaneous'   = '0.000 k'  
        'W Pos Average'     = '0.000 k'  
        'W Neg Average'     = '0.000'  
        'VAR Instantaneous' = '0.000 k'  
        'VAR Pos Average'   = '0.000 k'  
        'VAR Neg Average'   = '0.000'  
        'PF Instantaneous'  = '0.000 k'  
        'PF Pos Average'    = '0.000 k'  
        'PF Neg Average'    = '0.000'  
        'VA Instantaneous'  = '0.000 k'  
        'VA Average'        = '0.000 k'
        # Energy  
        'Wh Net'            = '0.000 k'  
        'Wh Delivered'      = '0.000 k'  
        'Wh Total'          = '0.000 k'  
        'Wh Received'       = '0.000 k'  
        'VARh Net'          = '0.000 k'  
        'VARh Delivered'    = '0.000 k'  
        'VARh Total'        = '0.000 k'  
        'VARh Received'     = '0.000 k'  
        'VAh Total'         = '0.000 k'  
    }  
      
    $wp = get-content c:/junk/powerenergy_mhtml.htm -Raw  
      
    $ids.GetEnumerator()|  
        ForEach-Object{  
            $k = $_.Key  
            $v = $_.Value  
            # Optional minus sign, optional decimal part, optional "k" suffix
            $r = '(?s)id="' + [regex]::Escape($k) + '">(-?\d+(?:\.\d+)?\s?k?)</td>'
            if ($wp -match $r){  
                $Table.$v = $matches[1]  
            }  
        }  
    

    There may be problems if the element ids and values are separated by line breaks. But that looks like it could be easily solved by replacing the newline characters with nothing, or perhaps with a single space. HTML ignores line breaks.
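
    A rough sketch of that newline fix follows. The `$html` fragment is made up for illustration (only the `INSV_A` id comes from the script above), and the `\s*` around the value is an added assumption so that the spaces left behind by the replacement do not block the match.

    ```powershell
    # Hypothetical fragment: the id and the value sit on different lines
    $html = "<td id=`"INSV_A`">`r`n   1.234 k`r`n</td>"

    # Replace each line break (CRLF or LF) with a single space
    $flat = $html -replace '\r?\n', ' '

    # Allow optional whitespace on either side of the value
    $value = $null
    if ($flat -match 'id="INSV_A">\s*(-?\d+(?:\.\d+)?(?:\s?k)?)\s*</td>') {
        $value = $Matches[1]
    }
    $value
    ```

    Replacing the line breaks with nothing instead of a space would also work here, since the pattern tolerates any amount of whitespace around the value.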
