Talend Online Training
Introduction to Talend DI for Big Data:
- About Talend Corporation and its journey
- Products under the Talend platform
- What is Talend?
- Advantages of Talend over competing integration tools
- Why is Talend gaining popularity?
- Talend installation and system requirements
- Types of repository connections available in Talend Studio
- Use of workspaces and projects
- What is Big Data? Which software platforms fall under Big Data?
- What is Hadoop and how does it differ from traditional technologies?
- Advantages of Hadoop from a cost and architectural feasibility perspective
- High-level Hadoop cluster architecture and physical core components
- Hadoop ecosystem components
- What are the challenges in implementing a Big Data project with the conventional Hadoop framework?
- Pros and cons of using Talend BD DI compared to conventional Hadoop ecosystem components
- Talend architecture and its components
- Demo of a sample Talend job design and execution
Talend GUI and Internal Tools
- Main window
- Menu bar and tools
- Repository tree view
- Design Workspace
- Configuration tabs
- Outline and code summary panels
- Window menu: Show View, Preferences
Brief explanation of
- Working with projects: create, open, import, delete, export
- Jobs: create a job, add the desired components to a job
- Types of component connection links
- Row connections: Main, Reject, Unique, Duplicate; Iterate connection
- Trigger connections: On Subjob Ok, On Component Ok, On Subjob Error, On Component Error, Run if
- How to change label format for components and component connections
- Component connection indicators
- How to determine a job's starting point
Centralize Metadata and Schemas
- Database connection
- Flat file, Excel file, XML file
- Hadoop cluster
- Schema types and difference between the schemas.
- Role of Die on error
- Enable & Disable reject flows
- Capture rejected data prior to job failure
- Input data validation against the schema object
- Lab practical
Prerequisites for designing and executing a Talend job
- How to identify and fix Talend job errors using the Problems tab
- Major and commonly used components
- Logs & Errors
- Lab practical combining the above components
Essential processing components:
- Comparison between tJoin and tMap components
- Basic mapping
- Expressions in tMap
- Conditional logic with ternary operator
- Using variables and filters in tMap expressions
- Splitting rows into multiple output flows
- Joins in tMap
- Lookup model: Reload at each row
- Reject data handling in tMap
- Testing expressions
- Built-in functions
- Lab practical with tMap
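As a rough illustration of the tMap expression topics above (ternary conditional logic and filter expressions), the sketch below writes them as plain Java methods. tMap expressions are ordinary Java expressions, but the `Row` class, the `age` and `country` columns, and both method names are hypothetical stand-ins for whatever row structure a real job defines.

```java
public class TMapExpressionSketch {
    // Hypothetical input row; in a real job the row structure (e.g. row1)
    // comes from the schema of the component feeding tMap.
    static class Row {
        Integer age;
        String country;

        Row(Integer age, String country) {
            this.age = age;
            this.country = country;
        }
    }

    // What an output-column expression with nested ternary operators might look like.
    // The null check comes first so that unboxing a null Integer cannot fail.
    static String ageBand(Row row1) {
        return row1.age == null ? "UNKNOWN" : (row1.age >= 18 ? "ADULT" : "MINOR");
    }

    // What an output filter expression might look like: it must evaluate to a boolean.
    static boolean keepRow(Row row1) {
        return row1.country != null && "US".equals(row1.country.trim().toUpperCase());
    }

    public static void main(String[] args) {
        Row r = new Row(25, " us ");
        System.out.println(ageBand(r) + " / keep=" + keepRow(r));
    }
}
```

In the tMap editor only the expression after `return` would be typed into the column or filter field; the surrounding methods exist here just to make the sketch runnable.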
More practicals on:
- File: multi-structure, regex
- Orchestration: tFlowToIterate, tLoop
- XML readers/writers: tXMLMap
- What are globalMap variables and how to use them
- Context group creation
- Add a context group to job
- Add contexts to context group
- tContextLoad, implicit context load from a file, tContextDump
- Setting the context file location through operating system environment variables
- Talend Job debugging
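To make the globalMap topic above more concrete, here is a minimal sketch of the usual put/get-with-cast pattern. It assumes globalMap behaves like a `Map<String, Object>` shared across the job; the `HashMap` below is only a stand-in, and the key names and the `rowCount` helper are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapSketch {
    // Stand-in for the job-wide map that components read and write as globalMap.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Values are stored as Object, so reading them back needs a cast.
    // A missing key is treated as zero here instead of throwing a NullPointerException.
    static int rowCount(String key) {
        Object value = globalMap.get(key);
        return value == null ? 0 : (Integer) value;
    }

    public static void main(String[] args) {
        // Example key following the <component>_NB_LINE naming pattern.
        globalMap.put("tFileInputDelimited_1_NB_LINE", 42);
        // Hypothetical custom key set by the job designer.
        globalMap.put("currentFile", "/tmp/in/orders.csv");

        String file = (String) globalMap.get("currentFile");
        System.out.println(file + " rows=" + rowCount("tFileInputDelimited_1_NB_LINE"));
    }
}
```

The explicit cast on every read is the main thing to remember: a wrong cast fails only at run time, so a small null-safe helper like `rowCount` can be worth keeping in a code routine.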
Custom Java in Talend
- Implementing conditional logic with tJava and tJavaRow
- Set context and globalMap variable values with tJava
- Code routines
- How to use external Java classes
- Difference between tJavaRow and tJavaFlex
Talend with database readers and writers: (S3)
- Read from database tables
- How to use context and globalMap variables in a SQL override
- Print the SQL override query in the output log
- Write to database tables
- Database connection session management, Shared database connection
- Column selection for Update, Insert operation
- Rejects and error management – Bulk load
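The SQL override topics above (using context and globalMap variables in a query, then printing it) can be sketched as a Java string expression. This is an illustration under assumptions: the `Context` class, the `schema` variable, the `lastLoadDate` key, and the `orders` table are all hypothetical, and plain concatenation is shown only because that is how a query field expression is typically written.

```java
import java.util.HashMap;
import java.util.Map;

public class SqlOverrideSketch {
    // Hypothetical stand-in for a job's context variables.
    static class Context {
        String schema = "sales";
    }

    // Builds the query text the way a query field expression would:
    // string literals concatenated with context and globalMap values.
    static String buildQuery(Context context, Map<String, Object> globalMap) {
        return "SELECT id, amount FROM " + context.schema + ".orders"
                + " WHERE load_date > '" + (String) globalMap.get("lastLoadDate") + "'";
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("lastLoadDate", "2024-01-01");

        // Printing the final query to the log makes the override easy to debug.
        System.out.println(buildQuery(new Context(), globalMap));
    }
}
```

Because the values are concatenated directly into the SQL text, this pattern is only reasonable for trusted, job-controlled values; untrusted input would need validation first.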
Logging and Testing
- Log console output to an operating system file
- Custom job termination using System.exit(<custom return code>)
- Code deployment & execution
- Compiled executables – JAR files
- Select desired context group from context group list
- Command line context parameters
- Job dependency management
- Return codes from child job without Die
- Parent & child job management
- Miscellaneous components: tFixedFlowInput, tRowGenerator, tMemorizeRows, tBufferInput, tBufferOutput
- CDC implementation in Talend
- SCD2 implementation in Talend
- Incremental Loading
- Unit testing
- Difference between Talend Open Studio and Talend Enterprise edition
- Parallel job execution
- tParallelize vs multi-threaded execution
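Two of the deployment topics above, command line context parameters and custom return codes via System.exit, can be illustrated together. The sketch below is not Talend's generated launcher code; it only mimics the idea of `--context_param name=value` arguments overriding defaults and a job mapping its outcome to an exit code. The `rejectThreshold` parameter and the codes 2 and 3 are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JobLauncherSketch {
    static final String FLAG = "--context_param";

    // Collects the name=value pair that follows each --context_param flag.
    static Map<String, String> parseContextParams(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 0; i < args.length - 1; i++) {
            if (FLAG.equals(args[i])) {
                String pair = args[i + 1];
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    params.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }
        return params;
    }

    // Maps a reject count to a custom return code (hypothetical convention):
    // 0 = clean run, 2 = rejects within tolerance, 3 = too many rejects.
    static int returnCode(int rejectedRows, int threshold) {
        if (rejectedRows == 0) {
            return 0;
        }
        return rejectedRows <= threshold ? 2 : 3;
    }

    public static void main(String[] args) {
        Map<String, String> params = parseContextParams(args);
        int threshold = Integer.parseInt(params.getOrDefault("rejectThreshold", "10"));

        int rejectedRows = 0; // a real job would read this from its reject flow
        int code = returnCode(rejectedRows, threshold);
        System.out.println("return code " + code);

        // A non-zero exit code lets a scheduler or parent script detect the outcome.
        if (code != 0) {
            System.exit(code);
        }
    }
}
```

A calling shell script or scheduler can then branch on the process exit status instead of parsing log output.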
Theory on Enterprise edition features
- Remote Repository connections
- Sandbox Project
- @Reference project
- SVN branches
- Lock Types – Checkin, Checkout
- Talend Administration Center
- Talend Activity Monitoring Console
- Talend SDLC: job deployment process
- Publishing jobs to the artifact repository: from Studio or the command line
- Difference between JobServer and Runtime server
- Talend products from a DI perspective:
- Talend Open Studio: Data Integration edition, Big Data edition
- Talend subscription solutions: DI, BD (Big Data), Big Data Platform, Real-Time Big Data Platform