It takes extremely long time to run IA reports

This forum contains ProfileStage posts and now focuses at newer versions Infosphere Information Analyzer.

Moderators: chulett, rschirm

Post Reply
eimhelper01
Premium Member
Premium Member
Posts: 10
Joined: Mon May 23, 2011 5:19 am

It takes extremely long time to run IA reports

Post by eimhelper01 »

I am profiling my database in SQL Server 2008.

Column analysis is finished OK, then I am using report template to generate some profiling reports.

However, these reports takes extremely long time to finish. Some of them take around 2hours.

All Frequency distribution reports, they might be big, they run overnight and still not finished yet.

Just wondering.
1) Is there any place I can check the IA report running progress apart from the very simple interface inside IA ?

2) Why these reports take so long time to run ? Is there any way to improve ? Or how can I check if anything goes wrong ?
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

How many tables and fields? How many records?
Master data? Transactional data?
Are you including the primary keys in the frequency distribution reports?
What format? XML? PDF?

I don't know of a way to check progress beyond the standard UI.

To make them run quicker, choose what you put into them.
As I said, putting PKs into the FD report is not only going to take time, but you won't learn anything from it anyway: you can see that it's 100% unique from the Summary Statistics report.
Imagine 50 million lines in a FD report for the PK of a transactional table: crazy, not to mention time-consuming.
eimhelper01
Premium Member
Premium Member
Posts: 10
Joined: Mon May 23, 2011 5:19 am

Post by eimhelper01 »

Around 20 tables, however, most of them are very small. The big tables only have around half million rows.

maybe around 400 rows of these 20 tables are profiled.

It is a good idea that I may should tick off some columns not very meaningful. However, at this stage, I just want to take a rough look at everything.

Then I did column analysis on all columns of these tables, and run reports on all columns too.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

eimhelper01 wrote:Around 20 tables, however, most of them are very small. The big tables only have around half million rows.

maybe around 400 rows of these 20 tables are profiled.
Volumes aren't that high.
400 rows profiled? Couldn't imagine it taking more than a minute or 2 per report for that.
eimhelper01 wrote:It is a good idea that I may should tick off some columns not very meaningful. However, at this stage, I just want to take a rough look at everything.
Running a report is probably not the optimal way of getting a rough look.
Best to use the tool first: use the View Details tabs to look through before you do any kind of report. Click on the various column headings like length, format, value, frequency, data type etc to get a feel of the data. It makes it obvious which fields you would then include in the various reports.
eimhelper01 wrote:Then I did column analysis on all columns of these tables, and run reports on all columns too.
Does this mean you are doing the FD report on the big tables? Without any sort of filtering options, 500k lines of report for a single PK is a lot of report (10-20k pages), even if it's not a lot of data.

You could also cop a decent rendering overhead depending on the output format you choose.
eimhelper01
Premium Member
Premium Member
Posts: 10
Joined: Mon May 23, 2011 5:19 am

Post by eimhelper01 »

Not 400 rows, my fault, 400 columns. biggest table is around 0.5 million rows.

It could because the workload is big, then it is slow.

However, I suspect IA is not function properly, then show extremely slow performance.

Any way of checking if IA is function properly ?
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

How long did it take to run the column analysis?

Running reports does take longer than you might expect.
How long did it take to run the Summary Statistics report?

Are you doing the FD report over the whole 400 columns in 1 go?

Really hard to talk about what reasonable expectations might be unless we know more about the environment specs etc, what's going into the report and what format it is.
It's Windows: what does the Task Manager say?
eimhelper01
Premium Member
Premium Member
Posts: 10
Joined: Mon May 23, 2011 5:19 am

Post by eimhelper01 »

Really very good thoughts regarding performance.

Column analysis is not that slow, took around 2-3 hours to finish all of them. I didn't feel anything strange in column analysis.

Now, I can go to Column Analysis, the view detail to see FD.

Summary Statistics report is not that slow, just checked the log, it takes around 1.5 hours to finish.

Originally, I did all columns in one ago. However, I cancelled that job later on, now only picked up 30% of all tables to run, still show very slow performance.

All reports are in PDF format. I have no access to check server task manager.

In addition, there is one Java error (Null reference object) when I open the interface of report menu. Not quite sure if it is related.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

The java error sounds fairly normal.

Are you still doing the FD on the primary keys?
Remember, Summary Statistics prints at most a page or 2 per column.

FD might print any number of pages per column. I learned that the hard way the first time I printed the FD report for what I didn't think was a huge table, only to wait ages for a 1900 page PDF to come back. Not overly useful.

Try running reports to both CSV and PDF formats as a comparison, against the same set of tables and fields. You might find the PDF rendering is part of the issue.

That said, you should look to run the reports once you know what the data is like and know how you want to target the reports.
The Summary Statistics plus some well-made observations in the Notes can allow you to run the more detailed reports in a very targeted manner and remove the info overload element.
Post Reply