Learn how to use Curl to run Apache Sqoop jobs on an Apache Hadoop cluster in HDInsight. This article demonstrates how to export data from Azure Storage and import it into a SQL Server database using Curl. This article is a continuation of Use Apache Sqoop with Hadoop in HDInsight.
Curl is used to demonstrate how you can interact with HDInsight by using raw HTTP requests to run, monitor, and retrieve the results of Sqoop jobs. This works by using the WebHCat REST API (formerly known as Templeton) provided by your HDInsight cluster.
Prerequisites
Completion of Set up test environment from Use Apache Sqoop with Hadoop in HDInsight.
A client to query the Azure SQL Database. Consider using SQL Server Management Studio or Visual Studio Code.
Curl. Curl is a tool to transfer data from or to an HDInsight cluster.
jq. The jq utility is used to process the JSON data returned from REST requests.
Familiarity with Sqoop. For more information, see the Sqoop User Guide.
Submit Apache Sqoop jobs by using Curl
Use Curl to export data using Apache Sqoop jobs from Azure Storage to SQL Server.
Note:
When using Curl or any other REST communication with WebHCat, you must authenticate the requests by providing the user name and password for the HDInsight cluster administrator. You must also use the cluster name as part of the Uniform Resource Identifier (URI) used to send the requests to the server.
For the commands in this section, replace USERNAME with the user to authenticate to the cluster, and replace PASSWORD with the password for the user account. Replace CLUSTERNAME with the name of your cluster.
The REST API is secured by basic authentication. You should always make requests by using Secure HTTP (HTTPS) to help ensure that your credentials are securely sent to the server.
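As a rough illustration of what basic authentication over HTTPS involves, the following Python sketch builds the request URL and the Authorization header without sending anything. The helper names (webhcat_url, basic_auth_header, build_status_request) are hypothetical and not part of HDInsight or WebHCat; the status path is the endpoint used later in this article, and the cluster name and credentials in the usage note are placeholders.

```python
import base64
import urllib.request


def webhcat_url(cluster_name: str, path: str) -> str:
    """Build an HTTPS WebHCat (Templeton) URL for the given cluster name."""
    return f"https://{cluster_name}.azurehdinsight.net/templeton/v1/{path.lstrip('/')}"


def basic_auth_header(username: str, password: str) -> str:
    """Return the Authorization header value for HTTP basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_status_request(cluster_name: str, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request to the WebHCat status endpoint."""
    request = urllib.request.Request(webhcat_url(cluster_name, "status"))
    request.add_header("Authorization", basic_auth_header(username, password))
    return request
```

Passing the prepared request to urllib.request.urlopen would send it. Because basic authentication only base64-encodes the credentials, the https scheme in the URL is what protects them in transit.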
For ease of use, set the variables below. This example is based on a Windows environment; revise as needed for your environment.
set CLUSTERNAME=
set USERNAME=admin
set PASSWORD=
set SQLDATABASESERVERNAME=
set SQLDATABASENAME=
set SQLPASSWORD=
set SQLUSER=sqluser
From a command line, use the following command to verify that you can connect to your HDInsight cluster:
curl -u %USERNAME%:%PASSWORD% -G https://%CLUSTERNAME%.azurehdinsight.net/templeton/v1/status
You should receive a response similar to the following:
{"status":"ok","version":"v1"}
Use the following to submit a Sqoop job:
curl -u %USERNAME%:%PASSWORD% -d user.name=%USERNAME% -d command="export --connect jdbc:sqlserver://%SQLDATABASESERVERNAME%.database.windows.net;user=%SQLUSER%@%SQLDATABASESERVERNAME%;password=%SQLPASSWORD%;database=%SQLDATABASENAME% --table log4jlogs --export-dir /example/data/sample.log --input-fields-terminated-by \0x20 -m 1" -d statusdir="wasb:///example/data/sqoop/curl" https://%CLUSTERNAME%.azurehdinsight.net/templeton/v1/sqoop
The parameters used in this command are as follows:
- -d - Since -G isn't used, the request defaults to the POST method. -d specifies the data values that are sent with the request.
  - user.name - The user that is running the command.
  - command - The Sqoop command to execute.
  - statusdir - The directory that the status for this job will be written to.
This command will return a job ID that can be used to check the status of the job.
{"id":"job_1415651640909_0026"}
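The submission step can also be sketched in Python: encode the three form fields for the POST body, then read the job ID from the JSON response. This is a minimal sketch, not a full client. The function names are hypothetical, and the Sqoop command in the usage note is shortened for illustration. One difference from the Curl command is labeled as an assumption in the code: curl -d sends the text as typed, while urlencode percent-encodes it, which a form-decoding server is expected to reverse.

```python
import json
import urllib.parse


def build_sqoop_form(user_name: str, command: str, statusdir: str) -> bytes:
    """Encode the fields that the curl -d options place in the POST body.

    Assumption: the server decodes standard form encoding, so the
    percent-encoded values arrive as the original strings.
    """
    fields = {"user.name": user_name, "command": command, "statusdir": statusdir}
    return urllib.parse.urlencode(fields).encode("ascii")


def parse_job_id(response_body: str) -> str:
    """Extract the job ID from a response such as {"id":"job_..."}."""
    return json.loads(response_body)["id"]
```

For example, build_sqoop_form("admin", "export --table log4jlogs -m 1", "wasb:///example/data/sqoop/curl") produces the body for a POST to the templeton/v1/sqoop endpoint, and parse_job_id applied to the response shown above yields job_1415651640909_0026.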
To check the status of the job, use the following command. Replace JOBID with the value returned in the previous step. For example, if the return value was {"id":"job_1415651640909_0026"}, then JOBID would be job_1415651640909_0026. Revise the location of jq as needed.
set JOBID=job_1415651640909_0026
curl -G -u %USERNAME%:%PASSWORD% -d user.name=%USERNAME% https://%CLUSTERNAME%.azurehdinsight.net/templeton/v1/jobs/%JOBID% | C:\HDI\jq-win64.exe .status.state
If the job has finished, the state will be SUCCEEDED.
Note:
This Curl request returns a JavaScript Object Notation (JSON) document with information about the job; jq is used to retrieve only the state value.
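If jq isn't available, the same extraction can be sketched in Python, along with a simple polling loop. The fetch_status callable is a hypothetical stand-in for whatever performs the authenticated GET and returns the JSON text. Treating FAILED and KILLED as terminal states alongside SUCCEEDED is an assumption based on the usual Hadoop job states; this article only mentions SUCCEEDED.

```python
import json
import time
from typing import Callable

# Assumption: these are the states after which the job no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "KILLED"}


def job_state(status_document: str) -> str:
    """Return the value that jq selects with the .status.state filter."""
    return json.loads(status_document)["status"]["state"]


def wait_for_completion(
    fetch_status: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
) -> str:
    """Poll until the job reaches a terminal state, then return that state."""
    for _ in range(max_polls):
        state = job_state(fetch_status())
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)  # wait before asking again
    raise TimeoutError("job did not reach a terminal state in time")
```

Passing the fetch step in as a callable keeps the polling logic independent of the HTTP client, so it can be exercised with canned responses before pointing it at a real cluster.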
Once the state of the job has changed to SUCCEEDED, you can retrieve the results of the job from Azure Blob storage. The statusdir parameter passed with the query contains the location of the output file; in this case, wasb:///example/data/sqoop/curl. This address stores the output of the job in the example/data/sqoop/curl directory on the default storage container used by your HDInsight cluster.
You can use the Azure portal to access stderr and stdout blobs.
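To make the mapping from the statusdir address to blob names concrete, here is a small Python sketch. It assumes that a wasb:/// address with no container or account part refers to the cluster's default container, as described above, and that the stdout and stderr blobs sit directly under the status directory. That layout and the function name are assumptions for illustration, so check the actual blob names in the portal.

```python
from urllib.parse import urlparse


def status_blob_names(statusdir: str) -> dict:
    """Map a default-container wasb:/// status directory to expected blob names.

    Assumption: the job writes blobs named stdout and stderr directly
    under the status directory.
    """
    parsed = urlparse(statusdir)
    if parsed.scheme != "wasb" or parsed.netloc:
        raise ValueError("expected a wasb:/// address on the default container")
    prefix = parsed.path.strip("/")
    return {name: f"{prefix}/{name}" for name in ("stdout", "stderr")}
```

With the address used in this article, the function returns example/data/sqoop/curl/stdout and example/data/sqoop/curl/stderr as the names to look for in the default container.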
To verify that data was exported, use the following queries from your SQL client to view the exported data:
SELECT COUNT(*) FROM [dbo].[log4jlogs] WITH (NOLOCK);
SELECT TOP(25) * FROM [dbo].[log4jlogs] WITH (NOLOCK);
Limitations
- Bulk export - With Linux-based HDInsight, the Sqoop connector used to export data to Microsoft SQL Server or Azure SQL Database doesn't currently support bulk inserts.
- Batching - With Linux-based HDInsight, when you use the -batch switch to perform inserts, Sqoop performs multiple inserts instead of batching the insert operations.
Summary
As demonstrated in this document, you can use a raw HTTP request to run, monitor, and view the results of Sqoop jobs on your HDInsight cluster.
For more information on the REST interface used in this article, see the Apache Sqoop REST API guide.
Next steps
Use Apache Sqoop with Apache Hadoop on HDInsight
For other HDInsight articles involving curl: