Issues executing bulk data transfers using the SQL Server bcp utility.

Juhi Bhatnagar 0 Reputation points
2026-06-15T06:46:25.35+00:00

SQL Server Tools & Utilities (bcp utility)
When we use bcp out with the native format flag (-n) to stream data through a Linux named pipe (FIFO), the exported data is corrupted. The issue occurs specifically when the table contains text columns populated with embedded Unicode, binary data, or special characters.

Why This is Failing (Technical Context)

Through our investigation, we discovered a conflict between how BCP handles character data and how the Linux kernel manages named pipe buffers:

  1. The Encoding Trap: Even when Native Mode (-n) is active, the BCP client recognizes text columns as character data and attempts to apply code page translation between the SQL Server collation and the Linux client locale.
  2. Byte Mutation: Because the customer's data contains raw Unicode strings and special character byte sequences inside a standard text field, BCP's translation engine mangles the bytes mid-stream, altering the data's length and structure.
  3. Pipe Misalignment: A Linux named pipe expects strict byte boundaries. When BCP streams this mutated data, its internal block-size calculations get thrown off. The Linux pipe reader reads a corrupted byte as an EOF marker (causing truncation).
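The pipe half of this theory can be tested in isolation, with no bcp involved. The following is a minimal sketch, assuming Python 3 on Linux; the FIFO path and the payload are made up for illustration. It pushes bytes that are sometimes taken for "EOF markers" (NUL, 0x04, 0x1A) together with multi-byte UTF-8 through a FIFO and compares hashes on both ends. A FIFO is a plain byte stream with no in-band EOF marker: the reader sees end-of-file only once every writer has closed its end.

```python
import hashlib
import os
import tempfile
import threading


def roundtrip_through_fifo(payload: bytes) -> bytes:
    """Write payload into a FIFO from one thread and read it back until EOF."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "demo.fifo")  # hypothetical path
    os.mkfifo(fifo_path)

    def writer() -> None:
        # open() blocks until the reader has opened the other end.
        with open(fifo_path, "wb") as w:
            w.write(payload)
        # Leaving the with-block closes the write end; that is the EOF signal.

    t = threading.Thread(target=writer)
    t.start()
    chunks = []
    with open(fifo_path, "rb") as r:
        while True:
            chunk = r.read(65536)
            if not chunk:  # empty read: all writers have closed
                break
            chunks.append(chunk)
    t.join()
    os.remove(fifo_path)
    os.rmdir(workdir)
    return b"".join(chunks)


# About 1.2 MB, so the payload is far larger than a typical pipe buffer.
payload = (b"\x00\x04\x1a\xff" + "ñ€漢".encode("utf-8")) * 100_000
received = roundtrip_through_fifo(payload)
print(hashlib.sha256(payload).hexdigest() == hashlib.sha256(received).hexdigest())
```

If a comparison like this passes but the same check fails with bcp as the writer, the pipe itself is an unlikely culprit, and the export options (or how the reader consumes the stream) are the more promising place to look.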

Questions to Microsoft Support

  1. Native Mode Behavior: Why does the Linux BCP client attempt character code page translation on text/LOB columns when the Native Format (-n) flag is explicitly set? Is -C RAW an undocumented requirement when exporting text columns containing arbitrary Unicode/binary content?
  2. Linux FIFO Bug: Is there a known limitation or buffer-handling bug within mssql-tools BCP when streaming raw/native binary data blocks through a POSIX named pipe (FIFO) on Linux?
  3. Long-Term Fix: Is a fix planned for the Linux BCP client to properly handle kernel pipe streaming without choking on block boundaries, or is writing to a flat file the only officially supported architecture?
  4. Configuration: Are there specific TDS packet sizes (-a) or Linux kernel pipe capacity configurations required to make Linux named pipes stable under heavy BCP binary streams?
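For reference on the pipe-capacity part of question 4, a Linux pipe's buffer size can be inspected and changed per file descriptor. This is a small sketch assuming Python 3 on Linux; it only shows the mechanism, and nothing in this thread establishes that changing the capacity affects bcp. The fallback numbers 1031 and 1032 are the Linux values of `F_SETPIPE_SZ` and `F_GETPIPE_SZ`, used when the running Python version does not expose the named constants.

```python
import fcntl
import os

# Linux-specific fcntl commands; Python 3.10+ exposes them by name.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)

read_fd, write_fd = os.pipe()
default_capacity = fcntl.fcntl(write_fd, F_GETPIPE_SZ)
print("default capacity (bytes):", default_capacity)

try:
    # Request 256 KiB. The kernel may round the value up, and unprivileged
    # processes are limited by /proc/sys/fs/pipe-max-size.
    fcntl.fcntl(write_fd, F_SETPIPE_SZ, 256 * 1024)
except OSError as exc:
    print("could not resize pipe:", exc)

new_capacity = fcntl.fcntl(write_fd, F_GETPIPE_SZ)
print("capacity after resize attempt (bytes):", new_capacity)

os.close(read_fd)
os.close(write_fd)
```

The same calls work on a descriptor obtained by opening a FIFO created with mkfifo. Capacity only changes how much data can sit in the buffer before the writer blocks; it does not alter the bytes that pass through.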
Azure SQL Database

2 answers

  1. Erland Sommarskog 134.7K Reputation points MVP Volunteer Moderator
    2026-06-15T09:18:55.7233333+00:00

    First, the text data type is deprecated, and for a good reason. Convert to varchar(MAX).

    BCP will apply character conversion even with native format, because native format does not mean that all code-page issues are removed. If you export data in native mode from a database with a Spanish collation and import it into a database with a Polish collation, the character ñ will not survive, since this character is not available in varchar (or text) in a Polish collation with a legacy code page.
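The ñ example can be reproduced outside SQL Server. This is a small sketch in Python, assuming the Spanish collation stores varchar data in Windows code page 1252 and the Polish collation in code page 1250 (the usual mappings for these collations, but verify them for your databases):

```python
# Assumption: Spanish collation -> code page 1252, Polish collation -> code page 1250.
spanish_bytes = "ñ".encode("cp1252")   # the single byte a varchar/text column would hold
print(spanish_bytes)                   # b'\xf1'

# Reinterpreting that byte under the Polish code page yields a different letter.
print(spanish_bytes.decode("cp1250"))  # ń (n with acute), not ñ

# A proper conversion fails instead: code page 1250 has no slot for ñ.
try:
    "ñ".encode("cp1250")
    survived = True
except UnicodeEncodeError:
    survived = False
print(survived)                        # False
```

Either way the original character is lost, which is why the code page of the data has to be stated explicitly rather than left to the client's defaults.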

    You can use the -C option to specify the code page of the source data, but I think that it is better to use a format file and specify the collation in the format file.

  2. AI answer

    2026-06-15T06:46:44.1733333+00:00

    The available documentation does not describe or confirm:

    • Any special or undocumented requirement such as -C RAW for text/LOB columns in native mode (-n).
    • Any known bug or limitation specific to using the Linux bcp client with POSIX FIFOs (named pipes) that would cause corruption or EOF misinterpretation.
    • Any planned fix or roadmap item for improving bcp behavior with Linux named pipes.
    • Any required or recommended TDS packet size (-a) or Linux kernel pipe configuration specifically for stabilizing bcp native/binary streams over FIFOs.

    From the provided material, the following points are documented and relevant:

    • Native mode (-n) uses native (database) data types for all data and does not prompt for each field. Unicode native mode (-N) uses native data types for noncharacter data and Unicode for character data; it is intended for SQL Server–to–SQL Server transfers, performs better than -w, and is recommended when the data contains ANSI extended characters and native-mode performance is wanted.
    • Unicode character mode (-w) uses nchar storage, tab (\t) as field separator, and newline (\n) as row terminator, and is not compatible with -c.
    • On Linux and macOS, the default field terminator is tab (\t) and line terminator is newline (\n), and backslashes in command-line arguments must be escaped. Native mode (-n) is recommended for SQL Server–to–SQL Server transfers; character mode (-c) is recommended when crossing non–SQL Server systems or when the data file should not contain extended characters.
    • bcp enforces data validation and checks (for example, verifying that Unicode data has an even-byte length). Invalid data that might have loaded in older versions can now fail during bulk load.
    • The -a option is documented as setting the network packet size in bytes (it is not the version switch, which is -V), and there is no mention of pipe-specific tuning.
    • The documentation explicitly notes that bcp supports native data files compatible with all supported SQL Server versions, but it does not extend this guarantee to streaming through FIFOs.

    Because the context does not cover:

    • Internal code-page handling for text/LOB columns in native mode on Linux,
    • FIFO-specific behavior or bugs,
    • Any special guidance for using bcp with named pipes,
    • Or any roadmap commitments,

    no further authoritative details can be provided beyond the general bcp behavior and options described above.

