Smart URL refresh doesn't work (adds duplicate QnA Pairs)
Hi,
I am refreshing the URL of one of my sources in Language Studio using the "Refresh URL" button. According to the documentation https://learn.microsoft.com/en-us/azure/cognitive-services/language-service/question-answering/how-to/smart-url-refresh, the refresh should be "smart" in the sense that it should add, delete and merge QnA pairs. However, when I try this feature without updating the content of the source (in this case a Word document), it just adds new QnA pairs. As a result, I end up with a duplicate of every QnA pair. Does anyone know why this might happen?
Azure AI Language
Azure AI services
-
VasaviLankipalle-MSFT • 18,576 Reputation points
2023-02-28T22:40:51.88+00:00 Hi @Daniel Hjelm , Thanks for using Microsoft Q&A Platform.
Please keep in mind that smart URL refresh is only applicable to URL sources. I tried refreshing a URL (a web page, such as an FAQ page), and it worked fine for me. Could you please share more details on how you are trying this? Are you trying it with a Word document URL?
Regards,
Vasavi -
Daniel Hjelm • 50 Reputation points
2023-03-01T09:43:38.5833333+00:00 Hi @VasaviLankipalle-MSFT , thank you for your answer!
I also tried using a Web Page such as this: https://www.microsoft.com/en-us/microsoft-365/microsoft-365-for-home-and-school-faq and it worked fine as well.
I am trying to use Smart URL refresh with a URL to a Word document that is stored in an Azure Storage blob. I want to store it in a format where I can use the multi-turn formatting: https://learn.microsoft.com/en-us/azure/cognitive-services/language-service/question-answering/reference/document-format-guidelines#multi-turn-document-formatting. I also store the Word document in SharePoint, where a user can update it, and I created a Logic App that checks whether the document in SharePoint has been updated and, if so, updates the blob version in Azure Storage. My plan was to use Smart URL refresh on the blob URL to fetch new and updated information into Language Studio. I am doing this because the Security team didn't approve the permissions listed here: https://learn.microsoft.com/en-us/azure/cognitive-services/qnamaker/how-to/add-sharepoint-datasources#active-directory-manager-grant-file-read-access-to-qna-maker which the QnaMakerPortalSharepoint app requires.
From your answer, I guess a URL to a Word document doesn't work. So what would you suggest I use in my case?
Thanks!
-
VasaviLankipalle-MSFT • 18,576 Reputation points
2023-03-02T02:36:28.0966667+00:00 @Daniel Hjelm , Thanks for sharing your scenario with us. Let me double check on this and get back to you.
-
VasaviLankipalle-MSFT • 18,576 Reputation points
2023-03-06T05:55:23+00:00 Hi @Daniel Hjelm , Thank you for your patience. Looks like smart URL refresh with a URL to a Word document stored in an Azure Storage blob works, but the blob's permissions have to be public. Can you please try this? Thanks!
-
VasaviLankipalle-MSFT • 18,576 Reputation points
2023-03-08T03:38:26.53+00:00 Hi @Daniel Hjelm , is there anything more you are looking for help with?
-
Daniel Hjelm • 50 Reputation points
2023-03-09T15:15:00.5866667+00:00 Yes, it does work as you said. However, when I refresh the URL in Language Studio it is not “smart” in the sense that it adds duplicates of non-updated questions. For example, when I download this web page: https://www.microsoft.com/en-us/microsoft-365/microsoft-365-for-home-and-school-faq as an HTML page, upload it to Azure Storage as a blob, use the URL to that blob to import the content into Language Studio and then refresh the URL, it adds all the questions again and I get duplicates of all questions. This does not happen if I just use the URL above directly in Language Studio. Can you please check on this?
-
VasaviLankipalle-MSFT • 18,576 Reputation points
2023-03-10T05:14:33.7833333+00:00 Hi @Daniel Hjelm , thank you for sharing this with us. Please allow some time to check internally and get back to you.
-
VasaviLankipalle-MSFT • 18,576 Reputation points
2023-03-14T15:12:53.49+00:00 @Daniel Hjelm, Thank you for your patience. Would it be possible to share the content you imported into Language Studio, for which refreshing the URL adds all the questions again? That way we can reproduce this on our end as well.
-
Daniel Hjelm • 50 Reputation points
2023-03-15T12:48:33.9566667+00:00 @VasaviLankipalle-MSFT I provided an example of the content I've tested in my latest comment. That is, this page: https://www.microsoft.com/en-us/microsoft-365/microsoft-365-for-home-and-school-faq
Note that as I stated above, the refresh works perfectly when I provide the URL directly to Language Studio. However, if I download the page as an HTML file, put it as a blob in Azure Storage and provide a URL to that blob, it doesn't work. It just adds all the questions again.
So to reproduce this on your end, take any QnA content/file/page (for example the page I provided above), put it in an Azure blob, use the URL to that blob in Language Studio and check whether Smart URL refresh works.
-
VasaviLankipalle-MSFT • 18,576 Reputation points
2023-03-15T13:14:31.82+00:00 @Daniel Hjelm , do you mean that when you used something like "downloads/path/What%20is%20Microsoft%20Office%20and%20Microsoft%20365%20_%20FAQs.html", it added duplicates?
-
Daniel Hjelm • 50 Reputation points
2023-03-16T08:16:59.6366667+00:00 @VasaviLankipalle-MSFT Yes, something like that. I use the "Generate SAS" for the blob so the URL looks something like this: https://azurestorage.blob.core.windows.net/qna/What%20is%20Microsoft%20Office%20and%20Microsoft%20365%20_%20FAQs.html?sp=r&st=2023-03-16T08:10:49Z&se=2023-03-16T16:10:49Z&spr=https&sv=2021-12-02&sr=b&sig=HeL5qPY%2Bk%2BlmdNsR2IHFbRvsYRmF3kl%2F052RADNTAMk%3D
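For anyone reproducing this, it may help to look at what a "Generate SAS" URL like the one above actually contains. The following is a minimal Python sketch, using only the standard library, that splits the URL into the plain blob address and the SAS query parameters (`sp` for permissions, `st` and `se` for the start and expiry times). The helper name `describe_sas_url` is hypothetical, and the sketch says nothing about how Language Studio identifies a source internally; it only shows what is in the URL.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

# The SAS URL quoted in the comment above, split across lines for readability.
SAS_URL = (
    "https://azurestorage.blob.core.windows.net/qna/"
    "What%20is%20Microsoft%20Office%20and%20Microsoft%20365%20_%20FAQs.html"
    "?sp=r&st=2023-03-16T08:10:49Z&se=2023-03-16T16:10:49Z&spr=https"
    "&sv=2021-12-02&sr=b&sig=HeL5qPY%2Bk%2BlmdNsR2IHFbRvsYRmF3kl%2F052RADNTAMk%3D"
)


def describe_sas_url(url):
    """Hypothetical helper: separate the blob address from the SAS parameters."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; each SAS parameter appears once here.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}

    def parse_time(value):
        # SAS timestamps in this URL use the UTC form YYYY-MM-DDTHH:MM:SSZ.
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

    return {
        "blob_url": f"{parts.scheme}://{parts.netloc}{parts.path}",  # URL without the token
        "permissions": params.get("sp"),
        "starts": parse_time(params["st"]),   # assumes st is present, as it is in this URL
        "expires": parse_time(params["se"]),
    }


info = describe_sas_url(SAS_URL)
valid_for = info["expires"] - info["starts"]
print(info["blob_url"])
print("permissions:", info["permissions"])
print("valid for:", valid_for)
```

For this URL the token grants read permission and the window between `st` and `se` is eight hours, so a refresh attempted after the `se` time would be using an expired token.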
-
VasaviLankipalle-MSFT • 18,576 Reputation points
2023-03-17T05:06:52.3433333+00:00 Hi @Daniel Hjelm , I understand what you're trying to accomplish. If you have already added the URL https://www.microsoft.com/en-us/microsoft-365/microsoft-365-for-home-and-school-faq as a source, the knowledge base will contain a total of 67 pairs in this case. When you then add a different source, https://azurestorage.blob.core.windows.net/qna/What%20is%20Microsoft%20Office%20and%20Microsoft%20365%20_%20FAQs.html?sp=r&st=2023-03-16T08:10:49Z&se=2023-03-16T16:10:49Z&spr=https&sv=2021-12-02&sr=b&sig=HeL5qPY%2Bk%2BlmdNsR2IHFbRvsYRmF3kl%2F052RADNTAMk%3D with the same content, it adds duplicates because you have instructed it to add that content to the knowledge base, so there will now be 134 pairs. This is expected behavior because we are creating a new source and adding its pairs as new pairs. The term "smart URL refresh" refers to the process of getting the most recent content from a source URL and updating the corresponding project with a single click.
When I used the URL from the blob as you have suggested, the refresh worked perfectly. Please try to delete the existing knowledge base pairs and try once again.
I hope it helps.
-
Daniel Hjelm • 50 Reputation points
2023-03-17T10:01:20.0633333+00:00 Hi again @VasaviLankipalle-MSFT ,
No, I am not using both URLs but only this one: https://azurestorage.blob.core.windows.net/qna/What%20is%20Microsoft%20Office%20and%20Microsoft%20365%20_%20FAQs.html?sp=r&st=2023-03-16T08:10:49Z&se=2023-03-16T16:10:49Z&spr=https&sv=2021-12-02&sr=b&sig=HeL5qPY%2Bk%2BlmdNsR2IHFbRvsYRmF3kl%2F052RADNTAMk%3D
When I update this one, it adds duplicates.