How to get number of occurrences in varbinary datatype?

aravind aravind 6 Reputation points
2022-08-12T10:03:34.027+00:00

I have a table with varbinary datatype (file stream enabled) column which store pdf files as binary , form that i can get result as which of file have 'invoice' word using following query.

 SELECT * FROM FileStore WHERE CONTAINS(Document,'invoice')  

Here "Document" is varbinary column which store pdf and word files as binary.

From this i want to get words which appear how many times, i mean number of occurrences.

E.G: In one of the pdf got 'invoice' word 5 times , so want to count how many time word appear in that table using query,

Note: i am asking in varbinay datatype, not for nvarchar datatype.

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,552 questions
{count} votes

4 answers

Sort by: Most helpful
  1. Erland Sommarskog 100.8K Reputation points MVP
    2022-08-12T21:59:27.79+00:00

    There are a couple DMVs for fulltext that can help you. I think you should start with sys.dm_fts_index_keywords_by_document .

    0 comments No comments

  2. Tom Phillips 17,716 Reputation points
    2022-08-15T12:38:10.98+00:00

    SQL Server does not have a function which counts the strings in full text indexes. It only identifies they exist or do not exist in the document.

    0 comments No comments

  3. Isabellaz-1451 3,616 Reputation points
    2022-08-16T06:23:39.043+00:00

    Hi @aravind aravind

    This can be done, but it’s not straight-forward because SQL Server doesn’t have a specific function to do it.

    I refer to this thread,hope it will help you :https://sql-bits.com/sql-server-how-to-count-occurrences-of-a-substring-in-a-string/

    Best Regards,
    Isabella


  4. Erland Sommarskog 100.8K Reputation points MVP
    2022-08-19T21:16:04.11+00:00

    No , this is not, in that link they mention how to configure fulltext index and so on,

    No, not in the link I was pointing to. Seems that you did not read it very carefully. Anyway attached is a demo. And, no, in this demo there is no binary data, but that shouldn't matter. You are querying the fulltext index, not the binary raw data. (I could include the code directly in the post, because apparently, I triggered a spam filter.)

    233000-slask.txt