Raise your hand if you have a love-hate relationship with SQL Server and SSIS error messages. I know I do. How terribly descriptive the messages tend to be (not) when you need them the most.
My problem today is an SSIS package that has failed two weeks in a row (it runs weekly) on exporting data to an Excel spreadsheet. The package pulls security permissions down into a document for DBA verification. We check server, database, and object level permissions. It's pretty nifty, if I do say so myself (yes, I designed the current version). Our servers run SQL 2008 64 bit, and we use the 32 bit runtime to execute this package. The package has been working fine, without changes up, until the recent failures. Which makes the first troubleshooting question: "What changed?"
My first step is to pull the errors from the job log file and put them (and only them) in a separate document for easy viewing. This is what I find:
Error: 2012-02-19 14:05:24.94
Code: 0xC0202009
Source: dt_Objects_QC Excel ObjectPerm_QC [37]
Description: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005.
End Error
Error: 2012-02-19 14:05:24.94
Code: 0xC0209029
Source: dt_Objects_QC Excel ObjectPerm_QC [37]
Description: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "input "OLE DB Destination Input" (50)" failed because error code 0xC020907B occurred, and the error row disposition on "input "OLE DB Destination Input" (50)" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
End Error
Error: 2012-02-19 14:05:24.94
Code: 0xC0047022
Source: dt_Objects_QC SSIS.Pipeline
Description: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component "Excel ObjectPerm_QC" (37) failed with error code 0xC0209029 while processing input "OLE DB Destination Input" (50). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.
End Error
Error: 2012-02-19 14:05:25.04
Code: 0xC02020C4
Source: dt_Objects_QC QC [1]
Description: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.
End Error
Error: 2012-02-19 14:05:25.04
Code: 0xC0047038
Source: dt_Objects_QC SSIS.Pipeline
Description: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component "QC" (1) returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
End Error
Oh, yeah. This helps a lot. (@Set Sarcasm = OFF). I pull the package from production, and compare it to the latest source control version to see what changed. While I'm doing that, I take the code from the Data Flow Task in question and run the SELECT in SSMS, with the intent on checking the data for bad values and the row count. Additionally, I disable all tasks but the ones pointing to our QC environment and run the package on my desktop via BIDS to see what errors out.
Oddly, the failing Data Flow task shows only 2782 rows transferred to Excel before it fails. I concentrate my efforts on finding bad data. The error messages in BIDS are exactly the same as what's in the job log file, and just as useless. I go through them one at a time. "Description: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005." is completely generic, and doesn't help because there's nothing wrong with the SQL Server connection, nor with the code being run against SQL Server. The next two messages are also completely unhelpful. The fourth message ("Description: The attempt to add a row to the Data Flow task buffer failed with error code 0xC0047020.") I might be able to Google. Google comes up with all sorts of interesting things like "That error message is completely unrelated to the real error," "the system's memory is exhausted when using full-cache mode," etc. On a whim, I check the available NAS space to verify the report has enough space to be created. Yes, it does.
I open the existing Excel sheet to see what 2782 rows loaded into the tab in question. Wait. There's 65536 rows on that tab. Really?
I check my SELECT statement (which I'd forgotten to do earlier) and see 72k+ rows in the result set.
DOH! Poor Homer needs a donut. Guess what? "The attempt to add a row to the Data Flow task buffer failed" does really describe the problem. The SSIS package was overflowing the available row size on Excel. It literally could not add another row.
Note to self: Troubleshooting SQL Server is complicated enough. Let's not ignore the simple possibilities until we check them first. I used to know this when I was fresh-minted DBA. Lesson officially re-learned.

