I've been digging into this, and I 'm afraid that this is a dead end. I wrote a longer post to explain why, but the forum software ran into an unexpected error. To read the longer answer, see https://www.sommarskog.se/temp/slask.txt.
filestream fulltext-index language-for-wordbreaker
Trying again....(Not sure why my previous post was deleted)
I need to use "_" as a word breaker for full text search. Setting the language as Korean does what I need: SELECT * FROM sys.dm_fts_parser ('Happy_Hour',1042,0,0)
When I put "'Happy_Hour' inside a txt file and have a column FileStream for this file content, setting Korean as Language-Word-Breaker doesn't work for FullText Search indexing on the FileStream column. I was using sys.dm_fts_index_keywords_position_by_document to check the indexed keywords, only "Happy_Hour" is indexed.
Any ideas? Thank you Thanks
SQL Server | Other
-
Erland Sommarskog 122.4K Reputation points MVP Volunteer Moderator
2023-06-14T20:09:46.5066667+00:00
2 additional answers
Sort by: Most helpful
-
José Antonio Campero Morales 105 Reputation points
2023-06-13T22:11:12.55+00:00 Based on your description, it seems like you are facing an issue with full-text search indexing on a FileStream column when using the Korean language as the word breaker. Specifically, you mentioned that only "Happy_Hour" is indexed instead of being split into separate keywords.
First, it's important to note that the behavior of full-text search indexing can be influenced by various factors, such as the language settings, the version of SQL Server you are using, and the configuration of the full-text index. It's possible that there might be a configuration issue or limitation causing the problem you're experiencing.
Here are a few suggestions to troubleshoot the issue:
- Check the language settings: Make sure that the language settings for the FileStream column and the full-text index are both set to Korean. Double-check that the language identifiers are correct. The language identifier for Korean is 1042, as you mentioned in your example.
- Verify the word breaker behavior: You can use the sys.dm_fts_parser function to test the word breaker behavior directly on the FileStream column. Try running a query similar to the one you provided, but replace "Happy_Hour" with the name of the FileStream column. This will help you confirm whether the word breaker is functioning as expected.
- Check the full-text index configuration: Ensure that the full-text index on the FileStream column is set up correctly and that the appropriate word breaker and language settings are applied. You may need to recreate the full-text index if there are any configuration issues.
- Consider upgrading SQL Server: If you are using an older version of SQL Server, it's possible that there might be limitations or issues with full-text indexing. Upgrading to a newer version could potentially resolve the problem.
- Consult the SQL Server documentation or community: If the issue persists, it may be helpful to consult the official SQL Server documentation or reach out to the SQL Server community for further assistance. They may have specific insights or suggestions based on your SQL Server version and configuration.
Remember to take appropriate precautions when making any changes to your database or configuration. It's recommended to test changes in a non-production environment before applying them to your live system.
-
Ping S 40 Reputation points
2023-06-14T15:31:51.8533333+00:00 Thanks again.
- I couldn't figure out how to check the language setting for the filestream column. Could you point me to the right direction?
- I ran the following query but the result isn't too promising
select A.*, cast(FileBin as varchar(max)), B.* FROM [Test].[dbo].DemoFileStreamTable_FTSTest A cross apply sys.dm_fts_parser(A.FileBin,1042, NULL, 0) B Result: FileBin(FS column): 0x73776565745F737472617762657272795F69735F61735F676F6F645F6173 Converted String: sweet_strawberry_is_as_good_as Keyword: 0x777365655F7474736172627772657972695F5F737361675F6F6F5F647361 display_term: 睳敥彴瑳慲扷牥祲楟彳獡束潯彤獡
- EXEC sp_help_fulltext_columns shows the filestream column fulltext_language is 1042
After these 3 points, I'll check further 4 and 5. Thanks!