Talend Online Training

Introduction to Talend DI for Big Data:

  • About Talend Corporation and Their Journey
  • Products under the Talend platform
  • What is Talend?
  • Advantages of using Talend over competing integration tools
  • Why is Talend gaining popularity?
  • Talend installation system requirements
  • Types of repository connections for Talend Studio
  • Use of workspaces and projects
  • What is Big Data? Software platforms that come under Big Data
  • What is Hadoop and how is it different from traditional technologies?
  • Advantages of using Hadoop from the cost and architectural feasibility perspectives
  • High-level Hadoop cluster architecture and physical core components
  • Hadoop ecosystem components
  • Challenges in implementing a Big Data project with the conventional Hadoop framework
  • Pros and cons of using Talend BD DI compared to conventional Hadoop ecosystem components
  • Talend architecture and its components
  • Demo of a sample Talend job design and execution

Talend GUI and Internal Tools

  • Main window
  • Menu bar and tools
  • Repository tree view
  • Design Workspace
  • Palette
  • Configuration tabs
  • Outline and code summary panels
  • Window menu: Show View, Preferences

Brief explanation on

  • Working with projects – create, open, import, delete, export a project
  • Jobs: create a job, add the desired components to the job
  • Types of component connection links
  • Row connections: Main, Rejects, Uniques, Duplicates; Iterate connection
  • Trigger connections: On Subjob Ok, On Component Ok, On Subjob Error, On Component Error, Run if
  • How to change label format for components and component connections
  • Component connection indicators
  • How to determine the starting point of a job

Centralize Metadata and Schemas

  • Database connection
  • Flat file, Excel file, XML file
  • Hadoop cluster
  • FTP
  • Schema types and difference between the schemas.

Data Validation:

  • Role of Die on error
  • Enable & Disable reject flows
  • Capture rejected data prior to job failure
  • Input data validation against the schema object
  • Lab practical

Pre-requisites to design and execute a Talend job

  • How to identify and fix Talend job errors with the help of the Problems tab
  • Major and commonly used components:
  • File
  • Database
  • Logs & Errors
  • Orchestration
  • System
  • Lab practical with combination of above components

Essential processing components:

  • tConvertType
  • tFilterRow
  • tSortRow
  • tJoin
  • tMap
  • tAggregateRow
  • Comparison between tJoin and tMap components

Data Mapping:

  • Basic mapping
  • Expressions in tMap
  • Conditional logic with ternary operator
  • Variables, Filters usage in tMap expressions
  • Row split into multiple routes
  • Joins in tMap
  • Reload at each row lookup
  • Reject data handling in tMap
  • Testing expressions
  • Built in Functions
  • Lab practical with tMap
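
tMap expressions are ordinary Java expressions, so the conditional logic topic above can be tried outside the Studio. The following is a minimal sketch only: the Row1 class and its amount column are hypothetical stand-ins for the row struct that Talend would generate from a schema.

```java
public class TMapTernarySketch {
    // Hypothetical stand-in for a tMap input row (assumption, not Talend-generated code).
    static final class Row1 {
        String country;
        Integer amount;

        Row1(String country, Integer amount) {
            this.country = country;
            this.amount = amount;
        }
    }

    // Mirrors an output-column expression that could be typed into tMap:
    //   row1.amount == null ? "UNKNOWN" : (row1.amount >= 1000 ? "HIGH" : "LOW")
    // The null check comes first so that unboxing the Integer cannot fail.
    static String amountBand(Row1 row1) {
        return row1.amount == null ? "UNKNOWN" : (row1.amount >= 1000 ? "HIGH" : "LOW");
    }

    public static void main(String[] args) {
        System.out.println(amountBand(new Row1("IN", 1500))); // prints HIGH
        System.out.println(amountBand(new Row1("IN", 200)));  // prints LOW
        System.out.println(amountBand(new Row1("IN", null))); // prints UNKNOWN
    }
}
```

Nesting more than two levels of ternaries quickly becomes hard to read; at that point a code routine is usually the clearer choice.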

More Practical on:

  • File – Multi structure, Regex
  • Orchestration – tFlowToIterate, tLoop
  • XML readers/writers – tXMLMap

Context Variables:

  • What are globalMap variables and how to use them
  • Context group creation
  • Add a context group to job
  • Add contexts to context group
  • tContextLoad, Implicit Load context from a file, tContextDump
  • Context file location assignment with operating system environment variables
  • Talend Job debugging
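
As a rough illustration of the globalMap topic above: in a job, globalMap behaves like a Map<String, Object> shared by the components, so values are stored with put and read back with a cast. The sketch below simulates it with a plain HashMap; the currentFile key is hypothetical, and the NB_LINE key only follows the usual component-name pattern and is set by hand here.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapSketch {
    // Simulated globalMap (assumption: a plain HashMap standing in for the job's shared map).
    static final Map<String, Object> globalMap = new HashMap<>();

    // Values come back as Object, so each read needs a cast to the expected type.
    static String describeRun() {
        String fileName = (String) globalMap.get("currentFile"); // hypothetical user-defined key
        Integer lines = (Integer) globalMap.get("tFileInputDelimited_1_NB_LINE");
        // A missing key returns null, so guard before using the value.
        return fileName + " -> " + (lines == null ? 0 : lines) + " lines";
    }

    public static void main(String[] args) {
        globalMap.put("currentFile", "orders.csv");
        globalMap.put("tFileInputDelimited_1_NB_LINE", 42);
        System.out.println(describeRun()); // prints orders.csv -> 42 lines
    }
}
```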

Custom Java in Talend

  • Conditional logic implementation with tJava & tJavaRow
  • Set context and globalMap variable values with tJava
  • Code routines
  • How to use external java classes
  • Difference between tJavaRow and tJavaFlex
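
A code routine is essentially a Java class with public static methods that components can call. The sketch below shows what a small routine might look like; the class and method names are hypothetical, and the tJavaRow line in the comment assumes a column called name.

```java
public class StringRoutineSketch {
    /**
     * Trims a value and maps null or blank input to a default.
     * Possible call from a tJavaRow (assuming a column called "name"):
     *   output_row.name = StringRoutineSketch.cleanOrDefault(input_row.name, "N/A");
     */
    public static String cleanOrDefault(String value, String defaultValue) {
        if (value == null) {
            return defaultValue;
        }
        String trimmed = value.trim();
        return trimmed.isEmpty() ? defaultValue : trimmed;
    }

    public static void main(String[] args) {
        System.out.println(cleanOrDefault("  Alice ", "N/A")); // prints Alice
        System.out.println(cleanOrDefault("   ", "N/A"));      // prints N/A
        System.out.println(cleanOrDefault(null, "N/A"));       // prints N/A
    }
}
```

Keeping such logic in a routine means the same rule can be reused from tMap expressions, tJavaRow and tJavaFlex instead of being copied into each component.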

Talend with Database readers and writers: (S3)

  • Read from database tables
  • How to use context and globalMap variables in SQL override
  • Print sql override query in output log
  • Write to database table
  • Database connection session management, Shared database connection
  • Column selection for Update, Insert operation
  • Rejects and error management – Bulk load
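
The query of a database input component is a Java string expression, which is why context and globalMap values can be concatenated into it. Below is a minimal sketch under stated assumptions: the Context class, the dbSchema variable, the lastLoadDate key and the customers table are all hypothetical, and both objects are simulated rather than generated by Talend.

```java
import java.util.HashMap;
import java.util.Map;

public class SqlOverrideSketch {
    // Hypothetical stand-in for the job's context object.
    static final class Context {
        String dbSchema = "sales";
    }

    static final Context context = new Context();
    // Simulated globalMap (assumption: plain HashMap).
    static final Map<String, Object> globalMap = new HashMap<>();

    // Builds the query text the same way it would be written in the component's query field.
    static String buildQuery() {
        return "SELECT id, name FROM " + context.dbSchema + ".customers"
                + " WHERE updated_at > '" + (String) globalMap.get("lastLoadDate") + "'";
    }

    public static void main(String[] args) {
        globalMap.put("lastLoadDate", "2024-01-01");
        String query = buildQuery();
        // Printing the resolved query to the log makes the override easy to verify.
        System.out.println(query);
    }
}
```

Because the values are concatenated as text, they should come from trusted configuration only, never from unchecked user input.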

Logging and Testing

  • Log console output to an operating system file
  • Custom job termination using System.exit(<custom return code>)
  • Code deployment & execution
  • Compiled executables – JAR files
  • Select desired context group from context group list
  • Command line context parameters
  • Job dependency management
  • Return codes from child job without Die
  • Parent & child job management
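
The custom return code topic above can be sketched in plain Java. This is an illustration only: the code value 3, the reject threshold and the names are hypothetical, and in a job the equivalent logic would typically sit in a tJava component.

```java
public class ExitCodeSketch {
    static final int OK = 0;
    static final int TOO_MANY_REJECTS = 3; // hypothetical custom return code

    // Decides the return code without exiting, so the rule can be tested separately.
    static int exitCodeFor(int rejectedRows, int maxAllowedRejects) {
        return rejectedRows > maxAllowedRejects ? TOO_MANY_REJECTS : OK;
    }

    public static void main(String[] args) {
        int code = exitCodeFor(12, 10);
        if (code != OK) {
            System.err.println("Too many rejected rows, exiting with code " + code);
            // Terminates the whole JVM; the calling script or scheduler sees this code.
            System.exit(code);
        }
    }
}
```

System.exit stops the entire Java process, so anything else running in the same JVM, such as a parent job, ends with it. That is one reason to pass return codes back from a child job when only the child should stop.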

Miscellaneous

  • Miscellaneous components – tFixedFlowInput, tRowGenerator, tMemorizeRows, tBufferInput, tBufferOutput
  • CDC implementation in Talend
  • SCD2 implementation in Talend
  • Incremental Loading
  • Unit testing
  • Joblets
  • Difference between Talend Open Studio and Talend Enterprise edition
  • Parallel job execution
  • tParallelize vs multi-thread execution

Theory on Enterprise edition features

  • Remote Repository connections
  • Sandbox Project
  • @Reference project
  • SVN branches
  • Lock Types – Checkin, Checkout
  • Talend Administration Center
  • Talend Activity Monitoring Console
  • Talend SDLC – Job deployment process
  • Job publishing into Artifact repository – from Studio or command line
  • Difference between Job Server & Runtime server
  • Talend products from a DI perspective:
  • Talend Open Studio – Data Integration edition, Big Data edition
  • Talend subscription solutions – DI, BD (Big Data), Big Data Platform, Real-Time Big Data Platform