Running SAP Applications on the Microsoft Platform

Memory errors during data extraction from SAP using Azure Data Factory SAP Table connector

Sachin-Ghorpade
Oct 02, 2020

 

Azure Data Factory (ADF) is a fully managed data integration service for cloud-scale analytics in Azure. ADF provides more than 90 out-of-the-box connectors to integrate with your source and target systems. When we think about enterprise systems, SAP plays a major role.

 

ADF has six different connectors to integrate with your SAP systems:

  1. SAP BW Open Hub
  2. SAP BW MDX
  3. SAP HANA
  4. SAP Table
  5. SAP C4C (Cloud for Customer)
  6. SAP ECC

You can read more about these connectors, when to use which one, supported scenarios, and more details here: https://github.com/Azure/Azure-DataFactory/blob/master/whitepaper/SAP%20Data%20Integration%20using%20Azure%20Data%20Factory.pdf.

 

In this blog, we focus on memory errors during data extraction from SAP using the Azure Data Factory SAP Table connector. The SAP Table connector depends on three layers of implementation:

  1. The SAP RFC function module (/SAPDS/RFC_READ_TABLE2 or a custom one)
  2. The SAP driver – SAP NCo (the SAP .NET Connector)
  3. The ADF SAP Table runtime – compiles the partition queries, triggers the SAP connections and calls, cleanses the data, and hands it to the downstream ADF runtime/sinks (a conceptual sketch follows below)
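
To make the third layer a little more concrete, here is a rough, hypothetical sketch of how partition settings can be compiled into WHERE-clause ranges, one per RFC call. This is not the actual ADF SAP Table runtime code; the function and column names are illustrative only.

```python
# Hypothetical sketch of "compiling partition queries": turning a partition
# column plus lower/upper bounds into a list of WHERE clauses, one per RFC call.
# This is NOT the actual ADF SAP Table runtime implementation, just the idea.
def compile_partition_queries(column, lower, upper, max_partitions):
    """Yield WHERE-clause strings that split [lower, upper] into ranges."""
    span = upper - lower + 1
    size = max(1, span // max_partitions)
    start = lower
    while start <= upper:
        end = min(start + size - 1, upper)
        yield f"{column} >= {start} AND {column} <= {end}"
        start = end + 1

# Example: split a numeric document-number range into 4 partition queries.
for where in compile_partition_queries("BELNR", 1, 1000000, 4):
    print(where)
```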

 

In a recent customer engagement, we learned that while creating an ADF pipeline and transferring data, the following error occurred on a specific table.

 

 Operation on target Copy_z0z failed: 'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed to invoke function /SAPDS/RFC_READ_TABLE2 with error: SAP.Middleware.Connector.RfcAbapRuntimeException, message: No more memory available to add rows to an internal table., stacktrack: at SAP.Middleware.Connector.RfcConnection.ThrowRfcErrorMsg() at SAP.Middleware.Connector.RfcConnection.ReadBytes(Byte* buffer, Int32 count) at SAP.Middleware.Connector.RfcConnection.ReadRfcIDBegin(Int32& length) at SAP.Middleware.Connector.RfcConnection.ReadUpTo(RFCGET readState, RfcFunction function, RFCID toRid) at SAP.Middleware.Connector.RfcConnection.RfcReceive(RfcFunction function) at SAP.Middleware.Connector.RfcFunction.RfcDeserialize(RfcConnection conn, IRfcIOStream stream) at SAP.Middleware.Connector.RfcFunction.RfcCallReceive(RfcConnection conn, IRfcIOStream stream, RFCID rid) at SAP.Middleware.Connector.RfcFunction.Invoke(RfcDestination destination) at Microsoft.DataTransfer.Runtime.SapRfcHelper.SapRfcClient.InvokeFunctionInternal(IRfcFunction function, Boolean initStatefulSession, Boolean checkForErrors) at Microsoft.Practices.TransientFaultHandling.RetryPolicy.<>c__DisplayClass1.<ExecuteAction>b__0() at Microsoft.Practices.TransientFaultHandling.RetryPolicy.ExecuteAction[TResult](Func`1 func) at Microsoft.DataTransfer.Runtime.SapRfcHelper.SapRfcClient.InvokeFunction(IRfcFunction function, Boolean initStatefulSession, Boolean checkForErrors).,Source=Microsoft.DataTransfer.Runtime.SapRfcHelper,''Type=SAP.Middleware.Connector.RfcAbapRuntimeException,Message=No more memory available to add rows to an internal table.,Source=sapnco,'

 

This error also generates a short dump in the SAP system. If you run transaction ST22 in the SAP application, you can see an error indicating that the SAP memory allocated to the user is not enough.

 

 

This means you are extracting more data than your SAP application can handle with its current memory configuration. It does not mean the server or VM running SAP is not capable; it just means the SAP application instance is configured to restrict the memory it can use. Now, logically, there are a few possible solutions:

 

  1. Increase the memory parameters to a higher value in your application instance so it can handle more load.
    1. Use the ABAP program RSMEMORY to adjust memory dynamically on the SAP application server. This is a temporary fix, and a restart of the application servers is not necessary.
    2. Alternatively, adjust the PHYS_MEMSIZE parameter to a higher value. You set this parameter in the application instance profile, and the SAP application instance where the parameter is configured must be restarted (not a VM reboot).
  2. Reduce the data extraction size.
    1. This may be possible when you are doing an incremental or delta extract. However, if you are doing a full pull, it may not be possible.
  3. Reduce the data at the source itself!
    1. This is sometimes possible by archiving or deleting data in the source system. Be careful and validate your decision to archive or delete before you do it.

Note: When you extract for the first time, you may have tons of data to load and need a lot of memory; later, with incremental loads, you may reduce that memory footprint.

The above recommendations are first aid and may already solve your problem. However, you can tune further for optimal performance.

 

Let’s take a look.

 

Sometimes this can be a code issue, either in SAP standard code or in customer code (aka Z programs).

 

This usually results from reading a dataset into an internal table and trying to process it. That works for smaller source data, but if the database table has grown beyond the size that a single work process is allowed to allocate, this error results. The program needs to handle the data in smaller chunks; otherwise, you will progressively need to increase these memory parameters, which leads to poor performance over time. This is where you work with the Basis and development teams to run traces and look at memory usage, runtime, and so on. One of the useful tools to assess an SAP ABAP program is ABAP Runtime Analysis (transaction SE30).
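
To illustrate what handling the data in smaller chunks looks like at the RFC level, here is a minimal sketch using the open-source PyRFC library (also discussed in the comments below) against the classic RFC_READ_TABLE module, which supports ROWSKIPS/ROWCOUNT paging. The ADF connector itself goes through SAP NCo and /SAPDS/RFC_READ_TABLE2, so treat this purely as an illustration of chunked reads rather than what ADF does internally; the connection details, table, and WHERE clause are placeholders.

```python
# Minimal sketch of reading an SAP table in chunks over RFC, assuming the
# pyrfc package and the classic RFC_READ_TABLE function module are available.
# Connection details, table name, and filter are placeholders.
from pyrfc import Connection

conn = Connection(ashost="sap-host", sysnr="00", client="100",
                  user="RFC_USER", passwd="secret")

CHUNK_SIZE = 50000   # rows fetched per RFC call, keeps ABAP memory use bounded
skip = 0

while True:
    result = conn.call(
        "RFC_READ_TABLE",
        QUERY_TABLE="CDPOS",
        DELIMITER="|",
        OPTIONS=[{"TEXT": "OBJECTCLAS = 'MATERIAL'"}],   # optional WHERE clause
        FIELDS=[{"FIELDNAME": "OBJECTCLAS"}, {"FIELDNAME": "CHANGENR"}],
        ROWSKIPS=skip,
        ROWCOUNT=CHUNK_SIZE,
    )
    rows = result["DATA"]
    if not rows:
        break
    # ... hand this chunk to the downstream sink here ...
    skip += len(rows)

conn.close()
```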

For performance issues and ABAP runtime terminations due to data size, tracing the load using ST12 is a good starting point. It gives you additional insights into the ABAP code, read type, and the programs and function modules involved. Additionally, it is important to do a table analysis to identify which field can be used for range calculation; transaction DB05 can be used to analyze the distinct values of a table.

 

For huge SAP tables, the analysis should be started in a background task. The result is stored in the spool of background job TABLE_ANALYSIS_<table name>.

 

 

Recommendations

Here are some recommendations to consider while using the SAP Table connector with ADF:

 

  1. Use the latest SHIR: changing the protocol from RFC to BASXML in the runtime (which should be part of recent runtimes anyway) not only helped handle larger data volumes, but also improved performance significantly.
  2. Optimize the SAP application instance memory. Sometimes this can take several iterations to reach the optimal memory value.
  3. Review the program code. For an SAP standard program causing performance issues, there is often an SAP OSS Note to fix it. For customer code (aka Z programs), the development team can trace and optimize it. Avoid using "SELECT *" in your extraction.
  4. Trigger the extraction using a batch job. Dialog processes often have a session expiration limit that causes the process to fail.
  5. When planning data extraction or replication scenarios, review the following:
    1. Table size
    2. Change rate
    3. Memory and CPU requirements on the source, target, and middleware layers

The classification of tables according to their size (small, medium, and large) and change rate drives proper sizing of all components involved.

 

  6. If the volume is quite large and there is a column that can be used for partitioning, it is recommended to leverage the partition settings of the SAP Table source, which bring three main benefits (see the sketch after this list):
    1. It can alleviate the memory consumption on the SAP instance.
    2. It can help alleviate the memory burden on the SHIR machine as well.
    3. It is also best for performance, since data can be retrieved in parallel. The unit of processing is the partition you define using the partition settings. For example, take a table with 10 million rows: if the partition settings split it into 100 partitions, each partition holds around 100K rows. With parallelCopies set to 10 (once the partition option is set to a non-NONE value, the default is 4; the auto-tuning logic is also being enhanced to increase this based on the number of IR nodes), 10 partitions are processed at the same time. If each thread achieves 3 MB/s throughput, then in the ideal case we get 3 MB/s * 10 = 30 MB/s throughput.
    4. Currently, four partition types are supported: Int, Year (yyyy), CalendarMonth (yyyyMM), and CalendarDate (yyyyMMdd).
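
For reference, a partitioned SAP Table source in an ADF copy activity looks roughly like the sketch below, written here as a Python dict that mirrors the copy activity JSON. The property names (SapTableSource, partitionOption, partitionSettings, parallelCopies) follow the public SAP Table connector documentation, but the column name, bounds, and sink are placeholders; verify the exact schema against the current ADF documentation.

```python
# Rough sketch of a partitioned SAP Table source for an ADF copy activity,
# expressed as a Python dict mirroring the pipeline JSON. Property names are
# taken from the public ADF SAP Table connector docs; values are placeholders.
copy_activity_type_properties = {
    "source": {
        "type": "SapTableSource",
        "partitionOption": "PartitionOnInt",    # or PartitionOnCalendarYear/Month/Date
        "partitionSettings": {
            "partitionColumnName": "CHANGENR",  # hypothetical integer-like column
            "partitionLowerBound": "0000000001",
            "partitionUpperBound": "0010000000",
            "maxPartitionsNumber": 100,         # ~100K rows per partition for a 10M-row table
        },
    },
    "sink": {"type": "ParquetSink"},            # placeholder sink
    "parallelCopies": 10,                       # partitions processed concurrently
}
```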

 

Thank you Damien Rodriguez, Wenbo Fu, Roman Broich, Morgan Deegan, Hitoshi Ikemoto, and Chan Uk Jin for your contribution to this blog.

Updated Feb 09, 2021
Version 2.0
  • Lawrence Duff (Brass Contributor)

    Hi Maria,

     

    This was a while ago and I seem to remember that no solution other than the "RFC_READ_TABLE2" endpoint in SAP was forthcoming from Microsoft.  We ended up staying with BODS (Business Objects Data Services) for the Extract of our ETL, then ADF and SSIS for the downstream TL (Transform and Load).

    I've just dug out the email trail I had with my colleagues, which might have some pointers for you.  There are competing connectors in open source that don't use the slow "RFC_READ_TABLE2" endpoint of the .Net Connector 3.0 that SAP wrote for Microsoft to use in Microsoft's SHIR - see the last few comments by me, "LLL".  And in the earlier comments are our thoughts and research, including third-party ETL tool comparisons with performance results and comments about the internal "engines" used in those products.

    Hopefully you might find this useful!  (I've redacted all names, sensitive info, and images) ...

    *******

     

    There’s one for node as well, and this one explicitly says it supports call-backs! …

    https://github.com/SAP/node-rfc

    Jeez, like buses, you wait for ages and then two come along at the same time!

    And why TF does everyone else’s stacks work with the SAP NW SDK except Microsoft and the .Net stack?  Maybe because SAP wrote the NCo 3.0 and deliberately left out the functionality for big tables handling (callbacks, batches needed for streaming type functions) to hobble Microsoft’s ADF connector and protect their BODS / SAP business, as PPP suggested?

    Oh wait, that’s made me think of something … oh blimey, yes, there’s a competing .Net one in Github too! …

    https://github.com/dbosoft/YaNco

    THREE buses!

    LLL

    ***

    DDD,

    By George and all the Saints – you beauty, you!

    We might be in luck …

    https://www.gitmemory.com/issue/SAP/PyRFC/60/502967179

    Looks like that guy is using call-backs with the Pyrfc kit, albeit he’s struggling with getting the config of the batches etc. required to drive the throughput.

    I thought this was a dead dog, but I’ll look into it when I get a moment,

    Cheers,

    LLL

    ***

    Hi LLL,

    As a suggestion, can we try Python (PyRFC), or do you think we will be hit by the NW RFC issue with it as well?

    A few documentation links:

    https://blogs.sap.com/2020/06/09/connecting-python-with-sap-step-by-step-guide/

    http://wbarczynski.pl/calling-bapis-with-python-and-pyrfc/

    https://blogs.sap.com/2016/02/21/how-to-use-actual-sap-netweaver-rfc-library-with-python-call-abap-report/ (calling an ABAP report using Python)

    https://stackoverflow.com/questions/24172130/open-sql-condition-in-rfc-read-table-call-via-pyrfc (something you have done)

    https://community.cloudera.com/t5/Support-Questions/Efficient-way-to-call-RFC-ABAP/td-p/284553

    https://github.com/SAP/PyRFC

    Regards,

    DDD

    ***

    Hi All,

    I've used Visual Studio on the xxxx server to develop a tester to access CDPOS.

    It seems the limitation in the ADF SAP tables connector that binds it only to the slower /SAPDS/RFC_READ_TABLE2 function but not the faster streaming /SAPDS/RFC_STREAM_READ_TABLE function is in the underlying SAP .Net Connector 3.0 (NCo) framework on which the ADF connector is based.  The NCo framework doesn’t implement the call-back function functionality required for batching the results.  This functionality is the essence of how the /SAPDS/RFC_STREAM_READ_TABLE - and similar SAP functions that use call-backs – works.  As I hit the issue I searched and found these – see the comments from Markus Tolksdorf of SAP “NCo 3.0 is not supporting callbacks from ABAP to .NET.” in both …

    https://answers.sap.com/questions/10098425/callback-function-with-nco-30.html

    https://answers.sap.com/questions/11509014/callback-not-supported-error-when-invoking-a-funct.html

    So while my tester app accesses SAP and uses the slower functions like /SAPDS/RFC_READ_TABLE2 OK, it cannot access the faster functions that use call-backs, like the streaming /SAPDS/RFC_STREAM_READ_TABLE function.

    So, beneath the NCo is a lower level framework called the “SAP NetWeaver Remote Function Call (RFC) Software Development Kit (SDK)” out of which the NCo itself is likely built …

    https://support.sap.com/en/product/connectors/nwrfcsdk.html

    I suspect vendors like Aecorsoft have used this framework to build their tool that is faster than BODS – indeed BODS itself is likely built with this low level framework too.

    I know C but not C++, so building a solution in that framework will take me a long time given it’s C/C++.  I think Aecorsoft is our only solution to get a direct connection working end-to-end from “SAP to the App”.

    For input into our next meeting,

    Cheers,

    LLL

    ***

    Hi All,

    In continuation of our testing of different ETL tools for the purpose of data extraction from SAP, we have now installed and tested Microsoft's SSIS ETL tool with the help of a third-party connector. The SAP ERP connector is provided by a company called CData and is a trial version valid for 30 days. The reason behind using this connector is that its 30-day trial version was freely available without any additional formalities (we could probably use any other connector, like Aecorsoft, as suggested by LLL). The choice of a feasible third-party connector has been left for later, as our current PoC was more centered around making SSIS work with SAP rather than making a particular connector work. Please find the link for the SAP ERP CData connector used for this exercise below.

    https://www.cdata.com/kb/tech/sap-ssis-task-import-2008.rst

     

    Now coming back to the performance of SSIS in comparison to BODS. We performed data extraction of the SAP CDPOS table from QAS 220 (which had approximately 40 million records in QAS) in both SSIS and BODS. Below are the results.

    1. RFC function usage and type of processing
    • BODS – As mentioned in our previous email, BODS uses an RFC stream function which processes data in batches.
    • SSIS – SSIS with the CData connector currently makes use of the RFC_READ_TABLE2 function, which also processes data in batches and continuously releases memory upon completion of each batch, unlike Azure, which never releases memory throughout the process because it tries to read the data in one single read instead of in batches. So this function overcomes the memory dump error that was encountered while extracting data using Azure. One point to note is that the CData connector also provides the option to specify any RFC function as shown below; however, the function needs to be customized in order to accept parameters from the connector. We attempted to use the RFC stream function used by BODS but got an error where the function expected a row count value that was not being sent by the connector.

    Result – SSIS with the CData connector processes data in batches, similar to BODS, overcoming the memory dump issue that was encountered in Azure.

     

    2. Execution time
    • BODS – The processing time in BODS to extract approx. 40 million records from the SAP CDPOS table in QAS was just 2 minutes and 35 seconds, as shown below.

    <<image removed>>

     

    The reason behind the faster execution time is that BODS uses a DOP (degree of parallelism) concept called round robin, which splits the incoming stream into multiple streams to enable parallel processing of the data, as shown below.

    <<image removed>>

     

    • SSIS – The processing time in SSIS to extract approx. 40 million records from the SAP CDPOS table in QAS was 20 minutes and 53 seconds, as shown below.

    <<image removed>>

     

    Result – It is clear from the results that processing in BODS is far faster than SSIS with the CData connector. Note that although the processing time can vary based on how the data is fetched by the third-party connector, at this point it is not clear whether any other process/transformation in SSIS can perform better than BODS.

    Please share your suggestions and let’s discuss this in our next review call. Thank you.

    Regards,

    PPP

    ***

    Hi All,

    Happy new year to you and hope you had a great time off.

    SSS & I have compared and tested the SAP data extraction mechanisms in the various ETL tools that are available or have been suggested to us. As part of this exercise, we attempted to compare four ETL tools: BODS, Azure, Talend & SSIS. Out of these four, we were able to thoroughly replicate the extraction process for BODS and Azure but ran into issues at the very initial step for the other two. The testing for these two will commence at a later stage, as there are certain dependencies (mentioned below) that need to be addressed before moving further.

    Below are our findings:

    1. BODS – When extracting data from SAP, BODS calls an RFC-enabled function called "RFC_STREAM_DATA_TABLE". This function processes data in batches and has an option to control the batch size by specifying the number of records to be fetched in each batch. The function is customizable, and the parameters can be controlled from within the SAP application server. Due to this method of extraction, BODS can pull large data sets, as memory never goes out of bounds: the function processes data in batches, consuming less memory in the SAP system and thereby improving performance. Currently, a default batch size of 5000 is configured for one RFC session per table, as shown below.

     

    2. Azure – The Azure SAP Table extract method makes use of an RFC-enabled function called "RFC_READ_TABLE2", as shown below. This function tries to read the table in one go instead of reading in batches, which impacts overall performance as the complete data set is first cached in memory. As a result, this method is not suitable for big SAP tables, as the process aborts within minutes with an insufficient memory error.

     

    To overcome the shortcomings of this process, Azure provides an option to extract data from SAP with partitions (which is an extension of the above-mentioned process), but apparently this also fails to improve performance and runs into space issues, as it creates multiple RFC sessions per table, as shown below. Another shortcoming of the partition method is that Microsoft currently supports only a few data types with this function.

     

    <<image removed>>

     

    Azure also provides an alternative method of data extraction from ECC using OData, which is SAP's way of consuming data using RESTful APIs. We were not able to test this at this point, as it requires an additional process to be configured within the SAP server after approval from the RB SAP owner and security team.

     

    3. Talend – For extracting SAP data using Talend, a Talend subscription is required, as this feature is not included in Talend Open Studio, the free version of Talend. A decision needs to be made by management on whether to proceed with this tool, which will involve the cost of procuring a license for the required subscription.

     

    4. SSIS – There's not much to mention here, as we are at a nascent stage of SSIS implementation. A ticket xxxx has already been raised with the support team for the installation.

     

    In a nutshell, we can say that with the current set of tools and configurations available to us, BODS seems to be a clear winner when it comes to data extraction from an SAP system. Azure DF, on the other hand, still needs to mature as an ETL tool, as it lacks many of the features of a full-fledged ETL tool (and requires a lot of additional configuration/setup to suit our current data extraction requirements). As mentioned at the beginning, not much can be said for Talend and SSIS at this point, as we need to test these tools first in order to do a proper comparison and see whether any of them can replace BODS as the primary SAP extraction tool.

    Please send your opinions/suggestions and let's discuss this in our next review call. Thank you.

    Regards,

    PPP

  • Maria_Fleissner (Copper Contributor)

    Dear Sachin-Ghorpade,

     

    Can you indeed share how to change the protocol of the SHIR from RFC to BASXML, as this unfortunately cannot be found in the publicly available documentation?

    LozzieD, have you found an answer to your question from last year, or how did you resolve your 'out of memory' error while loading data from SAP?

     

    Any help on this is appreciated! Best regards, 

    Maria Fleissner

  • LozzieD (Copper Contributor)

    We are suffering from the same issue and would desperately like to get hold of the documentation for changing the protocol from RFC to BASXML to solve the out of memory problem.

     

    Please could you let us know?

     

    Cheers.