File conversion encoding default

Question

Anonymous

I have a large Word document that uses the DATABASE field to pull lookup data from a CSV file. Updating the fields causes a pop-up box asking for the File conversion encoding. Although the encoding offered is the Windows default, it asks on every single instance, so I have to sit at the keyboard and click OK for each of a few thousand entries. I don't really have the few hours this takes, and it is hardly productive. Is there a way of fixing this? Some while ago, using an earlier version of Word, I could set it running and it would update automatically; now I have to sit and watch it.

This question was migrated from the Microsoft Support Community.

3 answers

Answer 1

A few things that might help...

Does your process allow you to open the .csv in Word (as a text file) and save it as a .docx, then specify the .docx in the DATABASE fields rather than the .csv?

You should be able to modify the DATABASE fields by pressing Alt+F9 to display the field codes and then using a global Find/Replace to change ".csv" to ".docx".

You will probably get the encoding dialog once, when you open the .csv in Word, but because .docx files are basically Unicode files anyway, Word should not need to ask for an encoding when it opens the .docx named in the DATABASE field.

Another variation on this theme would be to write a small piece of code to convert the file to UTF-8 format, which I think Word will always recognise. In a recent conversation, someone mentioned an approach described here: https://stackoverflow.com/questions/2524703/save-text-file-utf-8-encoded-with-vba .
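
The linked answer uses VBA. As a rough alternative outside Word, here is a minimal Python sketch that re-encodes an export as UTF-8. It assumes the genealogy program writes Windows-1252 (cp1252) text, which is only a guess; if the guess is wrong, decoding fails with UnicodeDecodeError rather than silently mangling names. The function names are hypothetical.

```python
import codecs
from pathlib import Path


def reencode_to_utf8(raw: bytes, source_encoding: str = "cp1252",
                     with_bom: bool = False) -> bytes:
    """Decode raw bytes from source_encoding and re-encode them as UTF-8.

    Working on bytes rather than text mode leaves the export's CR/LF
    line endings exactly as they were.
    """
    # Raises UnicodeDecodeError if source_encoding is the wrong guess.
    text = raw.decode(source_encoding)
    out = text.encode("utf-8")
    # Optionally prepend the 3-byte UTF-8 byte order mark (EF BB BF).
    return codecs.BOM_UTF8 + out if with_bom else out


def convert_file_to_utf8(src: str, dst: str, source_encoding: str = "cp1252",
                         with_bom: bool = False) -> None:
    """Write a UTF-8 copy of src to dst; src itself is left untouched."""
    raw = Path(src).read_bytes()
    Path(dst).write_bytes(reencode_to_utf8(raw, source_encoding, with_bom))
```

For example, convert_file_to_utf8("Smith.csv", "Smith-utf8.csv", with_bom=True) (hypothetical file names) writes a UTF-8 copy with a BOM and leaves the original export as it was.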

You can create a value in the Windows registry that tells Word which encoding to assume for text files that it opens. You basically create a DWORD value called DefaultCPG under HKEY_CURRENT_USER\Software\Microsoft\Office\16.0\Word\Options (for Word 2016; Word 2019, 2021 and Microsoft 365 also use "16.0", but if you are using an older version of Word you will need to change "16.0" to the appropriate number). Word should then use the encoding specified in that value to open the file. The value is a code page number: for UTF-8, for example, it should be 65001; for the Windows Western European code page normally used with US English, it is 1252, and so on.
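
If you would rather not edit the registry by hand, here is a minimal Python sketch using the standard winreg module (Windows only). It assumes, as the name suggests, that DefaultCPG holds a Windows code page number such as 65001, and that your Word version uses the 16.0 key; the function and constant names are hypothetical. It is probably safest to run it with Word closed, since Word may only read the value at startup.

```python
import sys

# DefaultCPG is assumed to hold a Windows code page number, not a locale ID.
UTF8_CODE_PAGE = 65001
WESTERN_EUROPEAN_CODE_PAGE = 1252


def word_options_subkey(office_version: str = "16.0") -> str:
    """HKEY_CURRENT_USER subkey under which Word keeps its Options values."""
    return "\\".join(["Software", "Microsoft", "Office", office_version,
                      "Word", "Options"])


def set_default_cpg(code_page: int, office_version: str = "16.0") -> None:
    """Create or overwrite the DefaultCPG DWORD for the current user."""
    if sys.platform != "win32":
        raise OSError("DefaultCPG is a Windows registry value")
    import winreg  # standard library, but only present on Windows

    # CreateKeyEx opens the key, creating it first if it does not exist.
    with winreg.CreateKeyEx(winreg.HKEY_CURRENT_USER,
                            word_options_subkey(office_version)) as key:
        winreg.SetValueEx(key, "DefaultCPG", 0, winreg.REG_DWORD, code_page)


def remove_default_cpg(office_version: str = "16.0") -> None:
    """Delete DefaultCPG again (raises FileNotFoundError if it is absent)."""
    if sys.platform != "win32":
        raise OSError("DefaultCPG is a Windows registry value")
    import winreg

    with winreg.OpenKey(winreg.HKEY_CURRENT_USER,
                        word_options_subkey(office_version), 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.DeleteValue(key, "DefaultCPG")
```

set_default_cpg(UTF8_CODE_PAGE) creates or overwrites the value; remove_default_cpg() deletes it, so that Word no longer finds a DefaultCPG value at all.
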
There is one other possibility if Word is connecting to the data source using the OLE DB method. It may help if you can post the field code of one of your DATABASE fields before pursuing that.

Answer 2

As far as I can tell, the second solution will work OK. As a general rule these days, things are likely to go more smoothly if you can get the export in UTF-8 format, but I think even in that case you will need the DefaultCPG value set to 65001 (decimal). Here, I leave a DefaultCPG value in my registry, but rename it to DefaultCPH when I do not need it, so that Word ignores it.

I would guess that even in those parts of the world where people rarely need characters outside the US ASCII/Windows Western European character sets, genealogical data is increasingly likely to contain such characters, so if you can persuade your genealogy program's authors to export the .csv in UTF-8 format, so much the better. There are two varieties: one with a "Byte Order Mark" (BOM), which is a 3-byte sequence at the beginning of the file, and one without. I would suggest that anyone coding an export option should let you choose whether to include the BOM.
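
To see which variety an export actually is, you can look at its first bytes. Here is a minimal Python sketch; the classification is a heuristic (a file that decodes cleanly as UTF-8 is very likely UTF-8, but plain ASCII is valid in both encodings), and the function names are hypothetical.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if data starts with the 3-byte UTF-8 byte order mark EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def describe_export(data: bytes) -> str:
    """Rough guess at a CSV export's encoding (a heuristic, not proof)."""
    if has_utf8_bom(data):
        return "UTF-8 with BOM"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Legacy single-byte text with accented letters is rarely valid UTF-8.
        return "not valid UTF-8 (probably a legacy code page such as cp1252)"
    if data.isascii():
        return "plain ASCII (reads the same as UTF-8 or cp1252)"
    return "UTF-8 without BOM"
```

Pass it the raw bytes of the file, for example Path("Smith.csv").read_bytes() with a hypothetical file name.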

I checked the third option again. Broadly speaking, Word is likely to open the .csv using its text converter, which means that my idea probably won't work. The idea is that if you can get Word to open the file using OLE DB instead, you can specify the encoding for your file in a schema.ini file. But to do that, you have to rename the .csv to a .txt, introduce another file (an .odc file) *and* create the schema.ini file. Further, it won't work if there are more than 255 or 256 fields in a row, and so on.
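
For reference, the schema.ini part of that approach might look like the minimal Python sketch below, which writes a section for one comma-delimited text file. It assumes the text driver accepts a numeric code page in CharacterSet (65001 is widely used for UTF-8), that schema.ini sits in the same folder as the .txt file and names it in its section header, and it does not create the .odc file. The function names and the Smith.txt file name are hypothetical.

```python
from pathlib import Path


def schema_ini_text(data_file_name: str, code_page: int = 65001,
                    has_header: bool = True) -> str:
    """Build a schema.ini section describing one comma-delimited text file."""
    lines = [
        f"[{data_file_name}]",          # section name = bare data file name
        f"ColNameHeader={has_header}",  # first row holds column names
        "Format=CSVDelimited",          # fields separated by commas
        f"CharacterSet={code_page}",    # e.g. 65001 for UTF-8
    ]
    return "\n".join(lines) + "\n"


def write_schema_ini(data_file: str, code_page: int = 65001) -> Path:
    """Write schema.ini next to data_file (overwrites any existing one)."""
    data_path = Path(data_file)
    ini_path = data_path.with_name("schema.ini")
    ini_path.write_text(schema_ini_text(data_path.name, code_page),
                        encoding="ascii")
    return ini_path
```

schema_ini_text("Smith.txt") produces a four-line section headed [Smith.txt]; write_schema_ini replaces any schema.ini already in that folder, so merge sections by hand if you have several files.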

The performance problem you see is probably a consequence of Word having to open the data source file over and over again, quite possibly for every single DATABASE field it updates. Word, Windows, or one of the other technologies involved might do some caching that would speed that up, but even with the .txt/.odc/.ini approach it is difficult to see that happening. If you want to improve performance, it might be better to get the data into Word some other way or, if you are using DATABASE fields in a mail merge, to combine the merge data source and your DATABASE data sources before the merge. It is difficult to be specific about the "how" without further information.
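
As one illustration of combining the data sources beforehand, here is a minimal Python sketch that reads the lookup CSV once into a dictionary and adds the looked-up value to every record of the merge data source, so Word only has to open a single file. It assumes both files are UTF-8 CSVs and that the merge data source has a Names column matching the lookup file; the Names and Relationship column names are borrowed from the DATABASE field quoted later in the thread, and all function and file names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def build_lookup(rows: Iterable[Dict[str, str]], key: str = "Names",
                 value: str = "Relationship") -> Dict[str, str]:
    """Index the lookup rows once: key column -> value column."""
    return {row[key]: row[value] for row in rows}


def combine(merge_rows: Iterable[Dict[str, str]], lookup: Dict[str, str],
            key: str = "Names", new_column: str = "Relationship",
            missing: str = "") -> List[Dict[str, str]]:
    """Copy each merge record and add the looked-up value as an extra column."""
    return [{**row, new_column: lookup.get(row[key], missing)}
            for row in merge_rows]


def combine_csv_files(merge_csv: str, lookup_csv: str, out_csv: str) -> None:
    """Read both files once and write a single combined merge data source."""
    # utf-8-sig reads UTF-8 with or without a BOM; adjust for other encodings.
    with open(lookup_csv, newline="", encoding="utf-8-sig") as f:
        lookup = build_lookup(csv.DictReader(f))
    with open(merge_csv, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        fields = list(reader.fieldnames or [])
        rows = combine(reader, lookup)
    if "Relationship" not in fields:
        fields.append("Relationship")
    with open(out_csv, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

If the main document is a mail merge, a MERGEFIELD Relationship pointing at the combined file could then stand in for each DATABASE lookup.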

Answer 3

Hi Peter - thanks for such a quick and comprehensive response. I've tried the first option, as it was no problem to spend an extra minute converting the data source to .docx format. There were 3110 .csv replacements, and the main document ran without intervention from me for just under five hours (about 6 seconds per lookup). Maybe opening and closing the new Word source data file is a little slower than with the .csv file, but the automation outweighs that.

I might still try your second suggestion as it seems quite neat.

For your information, in connection with a possible third solution, here is a typical field from the main document (I first wrote this over a decade ago, probably closer to two, so there may be better ways of doing it now):

{DATABASE \d "E:\Genealogy\Books\{REF Name \* MERGEFORMAT }.docx" \c "Entire Spreadsheet" \s "SELECT Relationship FROM E:\Genealogy\Books\{REF Name \* MERGEFORMAT }.docx WHERE ((Names='Elizabeth Sarah Love'))" \* MERGEFORMAT }

The 'Name' variable comes from an input prompt field at the beginning of the document and is used to identify the data source file:

{ASK Name "Who is the book for?" \* MERGEFORMAT }

The data source file is generated by a genealogy program which only has CSV and PDF outputs - I have suggested they add TXT/DOC options.
