SSIS Fuzzy Lookup Error
A Fuzzy Lookup Task in SSIS that I was working on was bombing out, and I had a heck of a time figuring out why. The error did not tell me much:
Error: 2011-05-18 10:42:09.33 Code: 0xC0047022 Source: Fuzzy Match InfoUSA SSIS.Pipeline Description: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component “Fuzzy Lookup” (49) failed with error code 0x8000FFFF while processing input “Fuzzy Lookup Input” (50). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.
End Error
Error: 2011-05-18 10:42:11.50 Code: 0xC02020C4 Source: Fuzzy Match InfoUSA CUSTOMER [1] Description: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.
End Error
Error: 2011-05-18 10:42:11.58 Code: 0xC0047038 Source: Fuzzy Match InfoUSA SSIS.Pipeline Description: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component “CUSTOMER” (1) returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
End Error
After a lot of digging, what I found out is that it was a server memory issue: The Fuzzy Lookup Task loads the entire reference table defined in the task into memory. A large reference table could eat up all of the memory on the server. That is what was happening to me. The server had 8MB of memory, and I was loading in 2 million records. Once all the free memory was gone, the task bombed. When the task runs, you can jump on the server and watch how the memory slowly gets eaten up. The only solution if you want to continue using the Fuzzy Lookup Task is to get more memory, free up memory, or reduce the number of rows in the reference table.
Also note that if the job does not bomb but there is very little free memory, the Fuzzy Lookup Task will perform very slowly. When I reduced the number of rows in the reference table to 1.8 million, the task did not bomb, but took over an hour as the free memory shrunk to about 3%. When I reduced the number of rows to 1.5 million, it left 10% free memory and the task took only 10 minutes.
Is there any way you can us “Blocking Indexes” to partition and run the match in parallel. Here an example. http://blog.melissadata.com/mt-search.cgi?blog_id=2&tag=Record%20Linkage&limit=20