December 05, 2014

Pentaho Data Integration


Pentaho Data Integration (PDI, also called Kettle) is the Pentaho component responsible for Extract, Transform, and Load (ETL) processes. It can be used for the following purposes:
  • Populating data warehouses
  • Migrating data between applications or databases
  • Exporting data from databases to flat files
  • Bulk-loading data into databases
  • Data cleansing
  • Integrating applications

Spoon:
Spoon is the graphical tool with which you design and test every PDI process.
In Spoon, you build Jobs and Transformations. PDI offers two methods to save them: a database repository or plain files.
If you choose the repository method, the repository has to be created the first time you run Spoon. If you choose the files method, Jobs are saved in files with the .kjb extension, and Transformations in files with the .ktr extension.
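Both file types are XML documents. As a rough orientation, a Transformation file looks something like the simplified sketch below; this is illustrative only, since the actual .ktr files Spoon generates contain many more elements and attributes, and the step names here are made up for the example.

```xml
<!-- Simplified, illustrative sketch of a Transformation (.ktr) file.
     Real files written by Spoon contain many additional elements. -->
<transformation>
  <info>
    <name>sample_transformation</name>
  </info>
  <step>
    <name>Read input</name>
    <type>CsvInput</type>
    <!-- step-specific settings go here -->
  </step>
  <step>
    <name>Write output</name>
    <type>TableOutput</type>
  </step>
</transformation>
```

In practice you never write these files by hand; Spoon generates and maintains them as you design on the canvas.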
Starting Spoon
Start Spoon by executing spoon.bat on Windows, or spoon.sh on Unix-like operating systems. As soon as Spoon starts, a dialog window appears asking for the repository connection data. Click the No Repository button.
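Jobs and Transformations saved as files can also be run later without the GUI, using PDI's command-line tools: Pan (for Transformations) and Kitchen (for Jobs). A minimal sketch, with hypothetical file paths:

```
# Launch the Spoon GUI
./spoon.sh              # Unix-like systems
spoon.bat               # Windows

# Run a saved Transformation or Job headlessly
# (the file paths below are hypothetical examples)
./pan.sh -file=/home/user/my_transformation.ktr
./kitchen.sh -file=/home/user/my_job.kjb
```

This is what makes the files method convenient for scheduled, unattended ETL runs, for example from cron or the Windows Task Scheduler.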


