Understanding DirectTransfer
PeopleSoft Search Framework uses DirectTransfer technology to transfer data from a PeopleSoft application database to Elasticsearch. Search data (search documents with or without attachments) can be transferred in one of two ways:
- Using Integration Gateway
- Bypassing Integration Gateway
When Integration Broker is not involved in the transfer of search data, the transfer is known as Full Direct Transfer. Full Direct Transfer is also the default option in PeopleSoft Search Framework. It became available in PeopleTools patch releases 8.55.20, 8.56.05, and 8.57.00.
If you choose not to use the Full Direct Transfer option, search documents without attachments are transferred using the Integration Gateway, but search documents with attachments are still transmitted directly to the Elasticsearch search engine.
DirectTransfer technology supports search definitions that are based on Connected Query and Query.
DirectTransfer Configurations
DirectTransfer requires the following configurations:
1. Specify whether or not you want to use the Full Direct Transfer option (default is Yes).
2. Specify the number of attachment handlers.
3. Specify the maximum attachment error count.
All of the above configurations are set via Search Options. To navigate there, go to PeopleTools -> Search Framework -> Administration -> Search Options.
1. If you want to disable Full Direct Transfer, set the option to N. Regardless of whether Full Direct Transfer is enabled or disabled, firewall rules should allow communication between the Process Scheduler and Integration Broker.
2. The default number of attachment handlers is 20, meaning that at most 20 handlers are created during indexing. This value is optimal when your average attachment size is 100 KB and the bulk thread queue size on Elasticsearch is 50 (the default value).
If you have large attachments (for example, 10 MB or more), reduce the value of Attachment Handlers, for example to 10 or lower.
The optimal value of Attachment Handlers depends on the data volume and pattern, as well as on system considerations on both PeopleSoft and Elasticsearch.
3. The Max Attachment Error Count setting specifies the maximum number of error transactions permitted during indexing. If the number of errors exceeds this value during indexing, the indexing process exits after it finishes sending the data that is already in memory.
Memory Usage on PeopleSoft
The amount of memory used by DirectTransfer for runtime data storage depends on the segment size and the number of attachment handlers. For example, with the default segment size of 10 MB and 20 attachment handlers, DirectTransfer uses an additional 200 MB on the PeopleSoft Batch Server while sending data.
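The arithmetic in this example can be sketched as a small helper. This is only an illustration of the rule of thumb above (segment size multiplied by the number of attachment handlers); the function name and parameters are hypothetical and are not part of PeopleTools.

```python
def direct_transfer_memory_mb(segment_size_mb: float = 10,
                              attachment_handlers: int = 20) -> float:
    """Estimate the extra Batch Server memory (in MB) used during data send.

    Assumption: each attachment handler holds one segment in memory,
    so the estimate is simply segment size times handler count.
    """
    if segment_size_mb <= 0 or attachment_handlers <= 0:
        raise ValueError("segment size and handler count must be positive")
    return segment_size_mb * attachment_handlers


# Defaults from the article: 10 MB segments and 20 handlers.
print(direct_transfer_memory_mb())        # 200
# Halving the handlers halves the estimate.
print(direct_transfer_memory_mb(10, 10))  # 100
```

Lowering the number of attachment handlers is therefore the most direct way to reduce this footprint if the Batch Server is short on memory.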
Memory Usage on Elasticsearch
When DirectTransfer sends data to Elasticsearch, it uses the memory configured through the ES_HEAP_SIZE environment variable, which you set during the installation of Elasticsearch. This is therefore one parameter that you may need to tune for your environment.
The incoming data is stored in bulk thread queues on Elasticsearch during the ingestion process (ingestion is the pre-processing of documents before the actual document indexing). The amount of memory used for this purpose is based on the bulk thread queue size.
The number of system cores determines how many ingestions can run in parallel: the number of parallel bulk ingestions equals the number of cores.
During ingestion, Elasticsearch uses a large amount of memory to parse documents that contain attachments. For example, parsing a large 10 MB attachment requires around 100 MB of memory.
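To get a feel for how these two figures combine, the following sketch estimates peak parsing memory. It rests on assumptions that go beyond the text: that the roughly tenfold ratio (10 MB attachment, about 100 MB to parse) holds for other sizes, and that each parallel ingestion parses one attachment at a time. The function and constant names are hypothetical.

```python
# Ratio taken from the example above: a 10 MB attachment needs ~100 MB to parse.
# Treating it as a constant for other sizes is an assumption.
PARSE_MEMORY_FACTOR = 10


def peak_parse_memory_mb(avg_attachment_mb: float, cores: int,
                         factor: float = PARSE_MEMORY_FACTOR) -> float:
    """Rough peak memory (MB) used for attachment parsing on Elasticsearch.

    Assumes one attachment is parsed per parallel ingestion and that the
    number of parallel ingestions equals the number of cores.
    """
    if avg_attachment_mb < 0 or cores <= 0:
        raise ValueError("attachment size must be >= 0 and cores must be > 0")
    return avg_attachment_mb * factor * cores


# A 4-core server parsing 10 MB attachments in parallel.
print(peak_parse_memory_mb(10, 4))  # 400
```

An estimate like this is only a starting point for sizing the Elasticsearch heap; actual usage should be confirmed by monitoring the JVM during a test index build.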
Examples:
- For a search definition with an average attachment size of 100 KB, a 2-core Elasticsearch server, and at least 8 GB of memory available to the Elasticsearch JVM, an Attachment Handlers value of 20 should suit most circumstances.
- For a search definition with an average attachment size of 1 MB, a 4-core Elasticsearch server, and at least 16 GB of memory available to the Elasticsearch JVM, an Attachment Handlers value of 10 should suit most circumstances.
- For a search definition with an average attachment size of 100 MB, a 4-core Elasticsearch server, and at least 16 GB of memory available to the Elasticsearch JVM, an Attachment Handlers value of 5 should suit most circumstances. In this scenario, the http.max_content_length setting in the elasticsearch.yml configuration file should also be increased. The default value is 100 MB, but for large attachments you can set a higher value, for example, http.max_content_length: 512mb
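The three examples above can be condensed into a simple lookup. This is a hypothetical helper, not an Oracle-provided formula: the cut-off points between the examples are assumptions, and it ignores core count and JVM memory, which the examples also vary.

```python
def suggest_attachment_handlers(avg_attachment_kb: float) -> int:
    """Suggest a starting Attachment Handlers value from average attachment size.

    The return values (20, 10, 5) come from the examples above; the
    thresholds between them are assumptions chosen for illustration.
    """
    if avg_attachment_kb < 0:
        raise ValueError("attachment size cannot be negative")
    if avg_attachment_kb <= 100:      # up to about 100 KB
        return 20
    if avg_attachment_kb <= 1024:     # up to about 1 MB
        return 10
    return 5                          # larger attachments


print(suggest_attachment_handlers(100))         # 20
print(suggest_attachment_handlers(1024))        # 10
print(suggest_attachment_handlers(100 * 1024))  # 5
```

Treat the result as an initial setting only, and adjust it after observing indexing throughput and memory use on both PeopleSoft and Elasticsearch.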
Please comment below if you would like to add anything or you need any clarification.
Hi Apurva,
What security is needed to reach the PeopleTools -> Search Framework -> Administration -> Search Options page? Currently I am not authorized to access this page.
regards
Som