Talend Open Studio Cookbook

Book description

Getting familiar with Talend Open Studio will greatly enhance your data handling and integration capabilities. This reference book for beginner and intermediate users offers a host of practical recipes that clarify even complex features.

  • A collection of exercises covering all aspects of development, including schemas, mapping with tMap, databases, and files
  • Get your code ready for production by using contexts and scheduling jobs in Talend
  • Includes exercises for debugging and testing code
  • Many additional hints and tips regarding the exercises and their real-life applications

In Detail

Data integration is a key component of an organization’s technical strategy, yet historically the tools have been very expensive. Talend Open Studio is the world’s leading open source data integration product and has played a huge part in making open source data integration a popular choice for businesses worldwide.

This book is a welcome addition to the small but growing library of Talend Open Studio resources. From working with schemas to creating and validating test data, to scheduling your Talend code, you will get acquainted with the various Talend database handling techniques. Each recipe is designed to provide the key learning point in a short, simple and effective manner.

This comprehensive guide provides practical exercises that cover all areas of the Talend development lifecycle including development, testing, debugging and deployment. The book delivers design patterns, hints, tips, and advice in a series of short and focused exercises that can be approached as a reference for more seasoned developers or as a series of useful learning tutorials for the beginner.

The book covers the basics in terms of schema usage and mappings, along with dedicated sections that will allow you to get more from tMap, files, databases and XML.

Geared towards the whole lifecycle, the Talend Open Studio Cookbook shows readers great ways to handle everyday tasks and offers insight into every stage of the development cycle, including coding, testing, and debugging, for start-to-finish coverage of the product.

Table of contents

  1. Talend Open Studio Cookbook
    1. Table of Contents
    2. Talend Open Studio Cookbook
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers and more
        1. Why Subscribe?
        2. Free Access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    8. 1. Introduction and General Principles
      1. Before you begin
      2. Installing the software
        1. How to do it…
      3. Enabling tHashInput and tHashOutput
        1. How to do it…
    9. 2. Metadata and Schemas
      1. Introduction
        1. Schema metadata
        2. Schemas
          1. Repository schemas
          2. Generic schemas
            1. Shared schemas
            2. Generated data sources
          3. Fixed schemas and columns
      2. Hand-cranking a built-in schema
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There’s more...
          1. Date patterns
          2. Nullable elements
      3. Propagating schema changes
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There’s more…
      4. Creating a generic schema from the existing metadata
        1. How to do it…
          1. How it works…
      5. Cutting and pasting schema information
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There’s more…
      6. Dropping schemas to empty components
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There’s more…
      7. Creating schemas from lists
        1. Getting ready
        2. How to do it...
        3. How it works…
        4. There’s more…
    10. 3. Validating Data
      1. Introduction
      2. Enabling and disabling reject flows
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more...
        5. See also
      3. Gathering all rejects prior to killing a job
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more...
        5. See also
      4. Validating against the schema
        1. Getting ready
        2. How to do it…
        3. How it works…
      5. Rejecting rows using tMap
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
      6. Checking a column against a list of allowed values
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
        5. See also
      7. Checking a column against a lookup
        1. Getting ready
        2. How to do it…
        3. How it works…
      8. Creating validation rules for more complex requirements
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
        5. See also
      9. Creating binary error codes to store multiple test results
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
          1. Decrypting the error code
    11. 4. Mapping Data
      1. Introduction
        1. The tMap component
          1. Single line of code
          2. Batch versus real time
      2. Simple mapping and tMap time savers
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      3. Creating tMap expressions
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
          1. Testing expressions
          2. Expression editor
          3. Getting around the 'one line' limitation
        5. See Also
      4. Using the ternary operator for conditional logic
        1. Getting ready
        2. How to do it...
          1. Single ternary expression: if-then-else
          2. Ternary in ternary: if-then-elsif-then-else
        3. How it works…
        4. There's more…
      5. Using intermediate variables in tMap
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
      6. Filtering input rows
        1. Getting ready
        2. How to do it...
        3. How it works…
        4. There's more…
      7. Splitting an input row into multiple outputs based on input conditions
        1. Getting ready
        2. How to do it...
        3. How it works…
        4. There's more…
      8. Joining data using tMap
        1. Getting ready
        2. How to do it...
        3. How it works…
        4. There's more…
        5. See Also
      9. Hierarchical joins using tMap
        1. Getting ready
        2. How to do it...
        3. How it works…
      10. Using reload at each row to process real-time / near real-time data
        1. Getting ready
        2. How to do it...
        3. How it works…
          1. Loading the data into memory
          2. The globalMap key
          3. The WHERE clause
          4. The result
        4. There's more…
    12. 5. Using Java in Talend
      1. Introduction
      2. Performing one-off pieces of logic using tJava
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. See also
      3. Setting the context and globalMap variables using tJava
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
        5. See also
      4. Adding complex logic into a flow using tJavaRow
        1. Getting ready
        2. How to do it…
        3. How it works…
      5. Creating pseudo components using tJavaFlex
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
      6. Creating custom functions using code routines
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
        5. See also
      7. Importing JAR files to allow use of external Java classes
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
    13. 6. Managing Context Variables
      1. Introduction
        1. Transportable code
        2. Context variables
        3. Common values in contexts
        4. Passing command line parameters
        5. Setting context variables in the code
        6. Database context variables
      2. Creating a context group
        1. How to do it...
        2. How it works...
        3. There’s more…
          1. Context types
          2. Prompt for variable values using the tree mode
      3. Adding a context group to your job
        1. Getting ready
        2. How to do it...
        3. How it works…
        4. There’s more…
      4. Adding contexts to a context group
        1. Getting ready
        2. How to do it...
        3. There’s more…
      5. Using tContextLoad to load contexts
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There’s more…
          1. Print operations
          2. Warnings
          3. Context file location
      6. Using implicit context loading to load contexts
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There’s more…
      7. Turning implicit context loading on and off in a job
        1. Getting ready
        2. How to do it...
        3. How it works...
      8. Setting the context file location in the operating system
        1. Getting ready
        2. How to do it...
        3. How it works…
        4. There’s more…
          1. Variable not present
          2. Implicit context load
    14. 7. Working with Databases
      1. Introduction
      2. Setting up a database connection
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
          1. Using the connection
        5. Always create database connections
          1. Connection names
          2. Context
      3. Importing the table schemas
        1. Getting ready
        2. How to do it…
        3. How it works...
        4. There's more…
      4. Reading from database tables
        1. Getting ready
        2. How to do it…
          1. Selected rows and columns
          2. Multiple tables and complex queries
        3. How it works…
        4. There's more…
          1. Efficiency versus readability
          2. SQL string
          3. SQL style
      5. Using context and globalMap variables in SQL queries
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
          1. The globalMap variables
          2. Developing the query
          3. Reloading at each row
      6. Printing your input query
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
      7. Writing to a database table
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
          1. Creating tables
          2. Update and delete keys
          3. Batches
          4. Bulk loading
          5. Bulk loading to temp table
      8. Printing your output query
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
      9. Managing database sessions
        1. Getting ready
        2. How to do it…
        3. How it works…
          1. Executions
        4. There's more…
          1. Multiple outputs
          2. Don't forget the commit
          3. Committing but not closing
      10. Passing a session to a child job
        1. Getting ready
        2. How to do it…
        3. How it works…
      11. Selecting different fields and keys for insert, update, and delete
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more...
          1. Updating
          2. Deleting
      12. Capturing individual rejects and errors
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
          1. Die on error
          2. Efficiency
          3. Error management
      13. Database and table management
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
      14. Managing surrogate keys for parent and child tables
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more...
          1. Added efficiency using hashMap key table
          2. Ranges
          3. Sequences
          4. Auto increment keys
          5. The LastInsertId component
          6. Auto increment procedure
      15. Rewritable lookups using an in-process database
        1. Background
        2. Getting ready
        3. How to do it…
        4. How it works…
          1. In-memory components
          2. Initialize the data
          3. tMap
          4. Write back
        5. There's more…
          1. Memory
        6. See also
    15. 8. Managing Files
      1. Introduction
      2. Appending records to a file
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
          1. Concatenating files using the append method
      3. Reading rows using a regular expression
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      4. Using temporary files
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
        5. See also
      5. Storing intermediate data in the memory using tHashMap
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      6. Reading headers and trailers using tMap
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      7. Reading headers and trailers with no identifiers
        1. Getting ready
        2. How to do it...
        3. How it works...
      8. Using the information in the header and trailer
        1. Getting ready
        2. How to do it...
          1. Validation subjob
          2. Use the header information subjob
        3. How it works...
          1. Validating using the trailer information
          2. Using the header information in the detail
        4. There's more…
      9. Adding a header and trailer to a file
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
        5. See also
      10. Moving, copying, renaming, and deleting files and folders
        1. Getting ready
        2. How to do it...
          1. Copying a file to another directory
          2. Copying file to a different name
          3. Renaming a file
          4. Moving a file
          5. Deleting a file
        3. How it works...
        4. There's more…
      11. Capturing file information
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      12. Processing multiple files at once
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      13. Processing control/validation files
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      14. Creating and writing files depending on the input data
        1. Getting ready
        2. How to do it...
        3. How it works...
          1. tJavaRow code explained
        4. There's more…
    16. 9. Working with XML, Queues, and Web Services
      1. Introduction
      2. Using tXMLMap to read XML
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
          1. Document objects
          2. XML Structure
      3. Using tXMLMap to create an XML document
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      4. Reading complex hierarchical XML
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
          1. Managing the relationships
          2. File information
          3. XML to database mapping
          4. XPATH
          5. Web service XML
      5. Writing complex XML
        1. Understanding the XML structure
        2. Node
        3. Method
        4. Java DOM
        5. Getting ready
        6. How to do it...
        7. How it works...
          1. So here we go…
          2. tWriteXMLField
          3. Code utilities
          4. tFlowToIterate
          5. tHash components
          6. XPATH Condition
          7. Putting it all together
        8. There's more…
          1. Job "shape"
      6. Calling a SOAP web service
        1. Getting ready...
        2. How to do it...
        3. How it works...
        4. There’s more…
          1. Decoding the response
          2. Using web service calls in-flow
      7. Calling a RESTful web service
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      8. Reading and writing to a queue
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      9. Ensuring lossless queues using sessions
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
    17. 10. Debugging, Logging, and Testing
      1. Introduction
        1. Debugging
        2. Logging
        3. Testing
      2. Find the location of compilation errors using the Problems tab
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      3. Locating execution errors from the console output
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
        5. See also
      4. Using the Talend debug mode – row-by-row execution
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
        5. See also
      5. Using the Java debugger to debug Talend jobs
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      6. Using tLogRow to show data in a row
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      7. Using tJavaRow to display row information
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      8. Using tJava to display status messages and variables
        1. Getting ready
        2. How to do it...
        3. How it works...
      9. Printing out the context
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      10. Dumping the console output to a file from within a job
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      11. Creating simple test data using tRowGenerator
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      12. Creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      13. Creating random test data using lookups
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      14. Creating test data using Excel
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      15. Testing logic – the most-used pattern
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more…
      16. Killing a job from within tJavaRow
        1. Getting ready
        2. How to do it...
        3. How it works...
    18. 11. Deploying and Scheduling Talend Code
      1. Introduction
        1. Context Variables
        2. Executable code
        3. Managing job dependencies within Talend
      2. Creating compiled executables
        1. How to do it...
        2. How it works…
      3. Using a different context
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
      4. Adding command-line context parameters
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
      5. Managing job dependencies
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
          1. Die on error
          2. Adding error checks to the schedule
          3. Restartability
      6. Capturing and acting on different return codes
        1. Getting ready
        2. How to do it…
          1. How it works…
          2. There's more…
      7. Returning codes from a child job without tDie
        1. Getting ready
        2. How to do it…
          1. How it works…
          2. There's more…
      8. Passing parameters to a child job
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
      9. Executing non-Talend objects and operating system commands
        1. Getting ready
        2. How to do it…
        3. How it works…
        4. There's more…
    19. 12. Common Mistakes and Other Useful Hints and Tips
      1. Introduction
      2. My tab is missing
        1. How to do it…
          1. Show view:
          2. Reset the perspective
      3. Finding the code routine
        1. How to do it…
      4. Finding a new context variable
        1. How to do it…
      5. Reloads going missing at each row global variable
        1. How to do it...
      6. Dragging component globalMap variables
      7. Some complex date formats
      8. Capturing tMap rejects
      9. Adding job name, project name, and other job specific information
      10. Printing tMap variables
      11. Stopping memory errors in Talend
        1. Increasing the memory allocated to a job
        2. Reducing lookup data
        3. Using hashMap/in-memory tables
        4. Splitting the job
        5. Dropping data to disk
        6. Split the files
        7. Hardware solutions
    20. A. Common Type Conversions
    21. B. Management of Contexts
      1. Introduction
      2. Manipulating contexts in Talend Open Studio
      3. Understanding implicit context loading
      4. Understanding tContextLoad
      5. Manually checking and setting contexts
    22. Index

Product information

  • Title: Talend Open Studio Cookbook
  • Author(s): Rick Barton
  • Release date: October 2013
  • Publisher(s): Packt Publishing
  • ISBN: 9781782167266