Share via

VBA Code to convert html to text file

Anonymous
2014-10-28T03:13:19+00:00

HI,

I am finding a VBA idea to convert html to text, I am working on excel 2010 and tried the following script but those are very slow.

    TextLine = ""

    Length = 1

    Flag = True   

    Line Input #1, HtText

    Do While Length <= Len(HtText)        

        ReadText = Mid(HtText, Length, 1)

        If ReadText = "<" Then

            Flag = False

        ElseIf ReadText = ">" Then

            Flag = True

        Else

            'Do Nothing

        End If        

        If Flag = True And ReadText <> ">" Then

            TextLine = TextLine & ReadText

        End If        

        Length = Length + 1    

    Loop


    Do While InStr(HtText, "<") > 0

        S = InStr(HtText, "<")

        E = InStr(S, HtText, ">")

        L = Len(HtText)

        If S = 1 Then

            HtText = Right(HtText, L - E)

        ElseIf L = E Then

            HtTex = Left(HtText, S - 1)

        Else

            HtTex = Mid(HtText, 1, S - 1) & Right(HtText, L - E)

        End If

    Loop


Please suggest me a very efficient method to do html to text conversion.

Note: Please note that the VBA script should work without using CreateObject("Scripting.FileSystemObject")

Microsoft 365 and Office | Excel | For home | Windows

Locked Question. This question was migrated from the Microsoft Support Community. You can vote on whether it's helpful, but you can't add comments or replies or follow the question.

0 comments No comments

Answer accepted by question author

Anonymous
2014-10-28T08:28:04+00:00

Hi,

method 1

download a .htm file from web

and SaveAs  .txt file  in a folder on "c:\ convert htm to txt"

Sub htm_to_txt()

On Error Resume Next

Application.ScreenUpdating = False

Sheets.Add

'copy htm from web

url1 = "http://www.hsfdatabase.com/tn_greenbrier2000.htm" '<< url address

With ActiveSheet.QueryTables.Add(Connection:="URL;" & url1, Destination:=Range("$A$1"))

.FieldNames = True

.RowNumbers = False

.FillAdjacentFormulas = False

.PreserveFormatting = True

.RefreshOnFileOpen = False

.BackgroundQuery = True

.RefreshStyle = xlInsertDeleteCells

.SavePassword = False

.SaveData = True

.AdjustColumnWidth = False 'True

.RefreshPeriod = 0

.WebSelectionType = xlSpecifiedTables

.WebFormatting = xlWebFormattingNone

.WebTables = "1" '"1,2,3,4"

.WebPreFormattedTextToColumns = True

.WebConsecutiveDelimitersAsOne = True

.WebSingleBlockTextImport = False

.WebDisableDateRecognition = False

.WebDisableRedirections = False

.Refresh BackgroundQuery:=False

End With

Dim v1 As Variant, v2 As Variant

v1 = Split(url1, "/")

v2 = Split(v1(UBound(v1)), ".")

Dim fN

fN = v2(0)

Dim nFd 'add a new folder

nFd = "c:\convert htm to txt"

If Dir(nFd, vbDirectory) = Empty Then MkDir nFd

'convert activesheet to text file

Dim r As Long, c As Long, i As Long, j As Long

r = ActiveSheet.Cells.Find(What:="*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row

Dim v() As Variant

Open nFd  & fN & ".txt" For Output As #1

For i = 1 To r

c = ActiveSheet.Cells(i, Columns.Count).End(xlToLeft).Column

ReDim v(1 To c)

For j = 1 To c

v(j) = Cells(i, j)

Next

Print #1, Join(v, vbTab)

Next

Close #1

'delete active sht

Application.DisplayAlerts = False

ActiveSheet.Delete

Application.DisplayAlerts = True

Application.ScreenUpdating = True

MsgBox "done"

End Sub

XXXXXXXXXXXXXXXXXXX

method2

convert all .htm files (on PC) to .txt files

htm files in a folder "c:\files htm"

results (txt files) in a new folder "c:\convert htm to txt"

Sub Convert_HTM_TXT()

'Oct 28, 2014

Dim wb As Workbook, wb1 As Workbook

Set wb = ThisWorkbook

Dim path1

path1 = "c:\files htm"    '<< source htm path

Dim path2 'add new folder

path2 = "e:\convert htm to txt"     '<< target txt path

If Dir(path2, vbDirectory) = Empty Then MkDir path2

Dim url1

url1 = Dir(path1 & "*.htm")

Application.ScreenUpdating = False

Do While url1 <> ""

Dim v1 As Variant, v2 As Variant

v1 = Split(url1, "")

v2 = Split(v1(UBound(v1)), ".")

Dim fN

fN = v2(0)

Application.DisplayAlerts = False

Set wb1 = Workbooks.Open(Filename:=path1 & url1)

ActiveWorkbook.SaveAs Filename:=path1 & "abc.xlsx", FileFormat:=xlOpenXMLWorkbook

Application.DisplayAlerts = True

wb.Activate

Dim r As Long, c As Long, i As Long, j As Long

Dim v() As Variant

Open path2 & fN & ".txt" For Output As #1

r = wb1.Sheets(1).Cells.Find(What:="*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row

For i = 1 To r

c = wb1.Sheets(1).Cells(i, Columns.Count).End(xlToLeft).Column

ReDim v(1 To c)

For j = 1 To c

v(j) = wb1.Sheets(1).Cells(i, j)

Next

Print #1, Join(v, vbTab)

Next

Close #1

wb1.Save

wb1.Close False

url1 = Dir()

Loop

Kill path1 & "abc.xlsx"

Application.ScreenUpdating = True

MsgBox "done"

End Sub

Was this answer helpful?

0 comments No comments

2 additional answers

Sort by: Most helpful
  1. Anonymous
    2014-10-28T06:55:24+00:00

    Graham, Thanks for your quick reply.

    I have around 100 html files in a floder and I need to get some required details from it. But when I open it using a "open" command it given even html tag too. I want to supress the html tags and get only the text content as visible when i open it in browser.

    For that reason I am using the above code. But which is very slow if the html file is big.

    I need a support is there any single command for achiveing the above code or any idea to convert html to text using VBA.

    I hope the question is very clear now

    Was this answer helpful?

    1 person found this answer helpful.
    0 comments No comments
  2. Anonymous
    2014-10-28T06:29:18+00:00

    Do you need VBA for this? Open the file in a browser and SaveAs TXT. The result is almost instantaneous.

    Was this answer helpful?

    0 comments No comments