How to split concatenated CSV into several sources

Antoine Leca 21 Reputation points
2021-07-19T15:27:31.507+00:00

As input I have a (rather large) text file which is the concatenation of several CSV files, each having its own header line then all the data lines. As a rule the first column is the name of the "subCSV"; the first field of the to-be-header line looks like xxxx_CAB while the other left fields for the xxxx sub-table are xxxx_DAT.

A small extract would look like this (in reality there are about 22 sub-tables, some with thousands of lines, and up to 20 fields):

FACT_CAB    CICLO   FACT_NO FECHA_EMISION
FACT_DAT    20200922    CI0916783512    20200922
RCFE_CAB    FACT_NO CAT_CONCEP  TIPO_CONCEP
RCFE_DAT    CI0916783512    Cuotas  Voz
RCFE_DAT    CI0916783512    Cuotas  Internet y Datos
RCFE_DAT    CI0916783512    Consumo Voz

I would like to transform all of this into several "sources" (or tables, or queries.) And I do not know how to start with this in M.

I have read about reading all the lines then some functions like Text.StartsWith()/Text.Split()/Text.Select(), but I see it as inefficient if I should repeat the same stuff 22 times; plus dealing with lacking sub-tables, or a newly added sub-table and the like.
Is there some smarter way to deal with that?

Not Monitored
Not Monitored
Tag not monitored by Microsoft.
39,119 questions
{count} votes

Accepted answer
  1. Lz._ 9,006 Reputation points
    2021-07-20T11:18:08.313+00:00

    Hi @Antoine Leca

    What I was writing was more about avoiding to have to deal with 22 queries, which will read 22 times the same content, filter it (to different subsets) and then perform similar steps: I believe the costs of all those full-size readings are going to be rather prohibitive

    Not really. Actually once you get your big CSV in Power Query (I bypass a couple of required steps) you Group that big table by xxx_CAB, yyy_CAB... In your case you end-up with 22 nested table, something like:

    116322-demo.png

    and load that query as a Connection only - this is a kind of staging query. Then, with VBA (or manually) you access each nested table to load it as a new query/table on a sheet

    Hope this clarify things a bit. Now, if this is a one time exercise and you know how to do it with VBA that's probably the best approach as doing what I describe above and then write some VBA to iterate over each nested table or doing it manually will necessarily take more time than a VBA script

    Any question or need for clarification let me know

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.