VBA method, edit CSV string for "stray" commas

Question

VBA method, edit CSV string for "stray" commas

Anonymous

Am looking for "best" way to determine if there are extra commas in a csv string from imported data so I can:

Ignore the extra commas

Fix them.

There's a LOT of explanation here, I'm sorry I could not make this any shorter. The key question is at ***

I have working code now, psuedo code follows, but am looking a better technique that's different from what I have

in mind or another way to "look" at the data.

The App: Several users are adding/changing contact data in a Google Gmail account. Contact data is exported by

me to a csv file via Google export functionality, and the .csv file is opened with Excel(Tab delimited) in my

macros and eventually ends up in a names master worksheet. Opening the *.csv using a comma delimter did not

work well for me as it was harder to process multiple excel rows resulting from one exported name.

I have not yet "asked Google" (I don't know if that's possible?) what their rules are for bookending text with "

" when the text contains commas or other special characters. See the images below.

The "offending" Google fields most of the time are addresses and a Notes field. Users are instructed to NEVER

use a comma anywhere in the Gmail contact screen, but they're human and they forget. The time it takes to fix this

error is becoming too much.

The exported data VBA challenge is that Google often puts the exported contact data into MORE then 1 row for one

contact name's data.

This is working now: Row 1 of the export has the Gmail field names of what's exported,

simplified example: "__" = fields not shown

Name,Given Name,Additional Name,Family Name,__,Notes,Group Membership,__,Address # 1 - Type,Address # 1 -

Formatted,,Address # 2 - Type,Address # 2 - Formatted,__

the quantity of fields from Name through Group Membership are FIXED. Thereafter, the quantity of fields can vary

depending on how many names are exported and how many varying types of data are attached to the name. (multiple

email addresses, phone#'s and postal addresses.)

zHoldAy = split(range("a1"), ",")

Row1UBnd = UBound(zHoldAy) 'all exported name rows SHOULD have the same ubound

'EndRow of input .csv is found

'mainline loop

For row = 2 to EndRow

DataAy = split(Range("A" & row), ",")

if UBound(DataAy) = Row1UBnd then

' all is well, no extra commas, normal processing.

else

CSVline = Range("A" & row)

GoSub SeeBelow

end if

next row

Exit Sub

this is an image of row 1 of an export. It presents here as wrapped.

Name,Given Name,Additional Name,Family Name,Yomi Name,Given Name Yomi,Additional Name Yomi,Family Name Yomi,Name

Prefix,Name Suffix,Initials,Nickname,Short Name,Maiden Name,Birthday,Gender,Location,Billing Information,Directory

Server,Mileage,Occupation,Hobby,Sensitivity,Priority,Subject,Notes,Group Membership,E-mail 1 - Type,E-mail 1 -

Value,E-mail 2 - Type,E-mail 2 - Value,Phone 1 - Type,Phone 1 - Value,Phone 2 - Type,Phone 2 - Value,Phone 3 -

Type,Phone 3 - Value,Phone 4 - Type,Phone 4 - Value,Address 1 - Type,Address 1 - Formatted,Address 1 -

Street,Address 1 - City,Address 1 - PO Box,Address 1 - Region,Address 1 - Postal Code,Address 1 - Country,Address

1 - Extended Address,Address 2 - Type,Address 2 - Formatted,Address 2 - Street,Address 2 - City,Address 2 - PO

Box,Address 2 - Region,Address 2 - Postal Code,Address 2 - Country,Address 2 - Extended Address,Organization 1 -

Type,Organization 1 - Name,Organization 1 - Yomi Name,Organization 1 - Title,Organization 1 -

Department,Organization 1 - Symbol,Organization 1 - Location,Organization 1 - Job Description,Custom Field 1 -

Type,Custom Field 1 - Value

Below are the exported rows for 1 name. The leading two digits are the row numbers(for illustration here, row #'s

are NOT IN the text) as the data will wrap in this presentation. Note that Google puts a double quote sign to the

left of data ok in the Google Notes field. It does NOT end this field with a " at the beginning of row 15 as the

note text did NOT contain a comma. If the text HAD a comma within it, the first two characters of row 15 would be

", The same holds true for address data. Cityname, MN 55423 is a bad address but I can live with it; the

USPS does not like commas and I could strip them later.

13Alma yyyker xxxxxren,Alma,yyyker,xxxxxren,,,,,,,,,,,,,,,,,,,,,,"data ok nz 27jun11 via telecon w/ Alma. prefers

two emails.

14

15,* My Contacts,* Home,@comcast.net,HomeB,@gmail.com,Home,123 123 9947,,,,,,,Home,7513 hhhhhhh Av

South

16Cityname MN 55423",,,,,,,,,,,,,,,,,,,,,,,,,,

*** The best I can come up with is to scan the column A cell for ," as the start of text that contains comma(s)

or special characters, BYPASSING "" if there's a double quote sign within the text. If the 'right side'

bookend of ", is not in the same row, look at subsequent rows.

- Since the Notes field is the big culprit for embedded commas, maybe build CSVline in parts. The Base 0 element

of it is known, work on the data to its left, work on the Notes, then work on the varying fields to the right of

"Group Membership"

I'm OPEN for any suggestions. Thanks.

SeeBelow: 'CONCAT ROWS 'TIL 'STANDARD' UBound QTY IS REACHED.

NextRow = Row + 1

Do

NextLine = Range(A & NextRow)

'here's where the extra commas get nasty

zNextAy = Split(NextLine, ",")

UBndNext = UBound(zNextAy) ' UBnd is prior row

CSVline = CSVline & NextLine

' ? are we getting to a row that completes a name's data ?

zHoldAy = Split(CSVline, ",")

UBnd = UBound(zHoldAy)

If UBnd = Row1UBnd Then 'Done, Next row completed the concat

'interim output sheet all data in 1 csv cell

OutWs.Range(A & OutRow) = CSVline

OutRow = OutRow + 1 'set up for next output

ElseIf UBnd < Row1UBnd Then 'not done, look at next row down

NextRow = NextRow + 1

Else 'ERROR CONDITION, TOO MANY COMMAS IN CSVLINE

Application.ScreenUpdating = True

ActiveWindow.ScrollRow = Row

Range("A" & Row & ":" & "A" & NextRow).Select

MsgBox UBnd & " MORE then Std UBound of " & Row1UBnd & vbCr & vbCr & "Look for dbl Quo's and commas", _

vbCritical, "Google Export"

GoTo Quit 'the equivalent of Exit Sub

End If

Loop Until UBnd >= Row1UBnd

Row = NextRow + 1

Return

Locked Question. This question was migrated from the Microsoft Support Community. You can vote on whether it's helpful, but you can't add comments or replies or follow the question.

0 comments

2 answers

Answer 1

Yes it helps a lot. For the fixed part of Textline the third argument will be 24 as the Notes

field is 25 so extra commas are OK. I will have to use the row 1 values as a better mapping tool to the Address fields and I think divide the truly variable portion of Textline into multiple splits to achieve what I want. I've outlined data examples below, so a decently quick read should enable you to say yay or nay. Thanks.

I apologize for the crappy problem statement, but I was not able to communicate the essence of the problem.

Sadly, the "complexity" of the data determines how Google exports it and how excel opens the .csv exported file. let's try a simple no-problem situation, an "easy" problem, and then the worst that I see by explanation. The row 1 literal text values are FIXED and put there by Google.

Simple No-problem: row 1 of exported file, followed by 1 data row out of say, 50 'name data' rows.

Name,FldA,FldB,Notes,Address Type,Address Formatted

Neal,,,A simple text note,Home,123 Main St City NY 12345

Note: this one is so simple that when Excel opens the .csv, the data IS ALREADY put into columns and my macro has a lot less work to do.

the Ubound of row 1 is 5 after the split. Row1UBnd = 5

splitting a name data row gives same UBound, so I KNOW there are no extra commas.

------end of simple no problem -----------------------------

Simple Problem: All of the data is in column A. Same row 1, after the spitting, same UBound of 5.

Name,FldA,FldB,Notes,Address Type,Address Formatted

The Mary data row is bad due to the embeded comma in the Note and the Address, say the string is in cell A27. Google bookends a bad field with " and "

3 would be the third argument in the Textline below, and (3) would contain the note AND the address type and address data, but I can find this easily by looking for " as the 1st character of (3) and then looking for the " to its right. I think I should be able to use the row 1 data to figure out what the (#) index values are for the data in the Worst case below. Cetainly worth a try.

Mary,,,"This note, has a comma in it.",Home,"456 Elm St XXXX, MA 02489"

The goal is to allow a note of: This note, has a comma in it. to flow downstream.

The goal is to allow an address of: 456 Elm St XXXX, MA 02489 to flow downstream

The Worst case is just more data over more rows. A solution to the above problem will work for the worst case data below.

Worst Case: Row 1 has a UBound of 66, there are 700 names,

The actual address portions of the row 1 data is:

Address # 1 - Type,Address # 1 - Formatted,Address # 2 - Type,Address # 2 - Formatted,,Address # 3 - Type,Address # 3 - Formatted

The data for 1 name can use up to 5-6 excel rows in the way Google writes it.

The challenge here is that i'll have to concatenate more than one row or count differently to know when the data for (25), the notes data ends.

Albatross,,,,,,,,,,,,,,,,,,,,,,,,,"this note text is two paragraphs, and uses 5 Excel rows to hold it

xxx,

xxx",other fields,Home,"123 Street

Kalamazoo, MI,

12345-1234", more address stuff here, rest of Google record here

Answer 2

I know you put a lot of effort into your explanation, but I had trouble following it. It is too detailed and too specific to follow if you are not familiar with it or what you are trying to do with it. However, let me tell you what I think you may be asking about and, if I am correct, how to solve it.

I'm going to use a simplified example (the numbers are wrong, but you should concetrate on the technique used). It sounds like you have text lines with a fixed amount of field, comma delimited, followed by a comment field which may contain commas that you do not want treated as delimiters. Let's say this is your text line...

TextLine = "one,two,three,this comment section's text, and any commas in it, is one field"

and that in it, there are 3 fields of data and everything after the third data field is comments which may contain comas. If that is your situation, you can split this data into four exact fields.... three data field and a fourth comment field like this...

FieldArray = Split(TextLine, ",", 3)

The contents of the FieldArray will now be as follows...

FieldArray(0) ==> "one"

FieldArray(1) ==> "two"

FieldArray(2) ==> "three"

FieldArray(3) ==> "this comment section's text, and any commas in it, is one field"

The third argument to the Split function is the upper bound you want to impose on the split out array.

Does any of the above address your problem?

Share via

VBA method, edit CSV string for "stray" commas

of it is known, work on the data to its left, work on the Notes, then work on the varying fields to the right of

2 answers