Flume

The Flume destination writes data to a Flume source. When you write data to Flume, you pass data to a Flume client. The Flume client passes data to hosts based on client configuration properties.

When you configure a Flume destination, you can configure one or more hosts to use, connection information, the client type, and the data format to use. You can also configure some client properties, such as the Flume batch size or connection timeout.

The Flume destination can write data to the following client types:
- Apache Avro Failover RPC Client
- Apache Avro LoadBalancing RPC Client
- Apache Thrift
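The difference between the failover and load-balancing client types can be illustrated with a small sketch. This is not the destination's actual implementation; the host names and the `send` function are hypothetical, and it only models the host-selection behavior: a failover client tries hosts in priority order, while a load-balancing client spreads events across hosts.

```python
import random

def failover_send(hosts, send):
    """Try hosts in priority order; fall back to the next on failure,
    mirroring how an Avro Failover RPC Client selects hosts."""
    last_error = None
    for host in hosts:
        try:
            return send(host)
        except ConnectionError as err:
            last_error = err  # this host is down; try the next one
    raise last_error

def load_balancing_send(hosts, send):
    """Pick a host at random, one of the selection strategies a
    LoadBalancing RPC Client can use (round-robin is another)."""
    return send(random.choice(hosts))

# Hypothetical send function: only the second host is reachable.
def send(host):
    if host != "flume-2:4141":
        raise ConnectionError(f"cannot reach {host}")
    return f"event delivered to {host}"

print(failover_send(["flume-1:4141", "flume-2:4141"], send))
# event delivered to flume-2:4141
```

With the failover client, the unreachable first host is silently skipped; with the load-balancing client, every configured host must be able to accept events.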
Data Formats

The Flume destination writes data to a Flume source based on the data format that you select. You can use the following data formats:

Avro
The destination writes records based on the Avro schema. You can use one of the following methods to specify the location of the Avro schema definition:
- In Pipeline Configuration - Use the schema that you provide in the stage configuration.
- In Record Header - Use the schema included in the avroSchema record header attribute.
- Confluent Schema Registry - Retrieve the schema from Confluent Schema Registry. Confluent Schema Registry is a distributed storage layer for Avro schemas. You can configure the destination to look up the schema in Confluent Schema Registry by the schema ID or subject.

If you use the Avro schema in the stage or in the record header attribute, you can optionally configure the destination to register the Avro schema with Confluent Schema Registry.

You can also optionally include the schema definition as part of the Flume event. Omitting the schema definition can improve performance, but requires appropriate schema management to avoid losing track of the schema associated with the data.

You can also compress data with an Avro-supported compression codec. When using Avro compression, do not use other compression options available in the destination.
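The three schema-location options above can be sketched as a small resolver. This is an illustration, not the destination's real code: the record layout, the mode names, and the `registry_lookup` callback are all hypothetical stand-ins for the stage configuration, the avroSchema header attribute, and a Schema Registry client call.

```python
import json

# Hypothetical record: a payload plus header attributes, including the
# avroSchema header attribute described above.
record = {
    "headers": {
        "avroSchema": json.dumps({
            "type": "record",
            "name": "Purchase",
            "fields": [{"name": "amount", "type": "double"}],
        })
    },
    "value": {"amount": 9.99},
}

# Schema provided in the stage (pipeline) configuration.
STAGE_SCHEMA = json.dumps({
    "type": "record",
    "name": "Purchase",
    "fields": [{"name": "amount", "type": "double"}],
})

def resolve_schema(record, mode, stage_schema=None, registry_lookup=None):
    """Return the Avro schema as a dict, depending on where the
    destination is configured to find it."""
    if mode == "IN_PIPELINE_CONFIGURATION":
        return json.loads(stage_schema)
    if mode == "IN_RECORD_HEADER":
        return json.loads(record["headers"]["avroSchema"])
    if mode == "CONFLUENT_SCHEMA_REGISTRY":
        # Stand-in for a real Schema Registry client call
        # (lookup by schema ID or by subject).
        return registry_lookup()
    raise ValueError(f"unknown schema location: {mode}")

schema = resolve_schema(record, "IN_RECORD_HEADER")
print(schema["name"])  # Purchase
```

Whichever source is chosen, downstream consumers need the same schema to deserialize the event, which is why omitting the schema from the event itself demands careful schema management.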