Fixing corrupted log files on reboot
Moderators: chulett, rschirm, roy
Hi All,
I've searched for posts on corrupted log files, but I'd like a more complete picture of the issue from the experienced experts here.
In one of the projects I'm currently involved in, the DS Server machine sometimes reboots due to power failures.
I know and agree that this situation is not acceptable, but it seems I'll have to live with it for a while until they fix it.
The problem is that when a power failure occurs while DS jobs are running, the logs get corrupted and need to be fixed before another run is made; the status of those jobs also needs to be reset.
As a temporary solution until the power issues are resolved, I want to automatically fix all the log files and status files on startup.
AFAIK, I need to SELECT NAME, JOBNO FROM DS_JOBS and run uvfixfile.exe on RT_LOG<JOBNO> for each job (except the fixing job itself),
and also perform a CLEAR.FILE RT_STATUS<JOBNO>.
The question is: do I need something else, or is this enough?
My plan is to build something embedded in the system startup that runs after the DS services are up and performs this operation before any regular DS jobs are run.
Any insight on this would be appreciated.
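A minimal sketch of what that startup pass might look like, written as a portable shell dry-run. Everything here is an assumption, not the real layout: the `$DSHOME` path, the `uvfixfile`/`uvsh` invocation style, and the hard-coded job numbers standing in for the result of the DS_JOBS query (on the actual Windows server this would become the equivalent BAT logic):

```shell
#!/bin/sh
# Sketch only: paths, binary names, and the uvsh invocation are assumptions.
# DRYRUN=1 (the default) just prints the commands instead of running them.
DSHOME=${DSHOME:-/opt/datastage}
DRYRUN=${DRYRUN:-1}

fix_job() {
  jobno="$1"
  if [ "$DRYRUN" = "1" ]; then
    echo "$DSHOME/bin/uvfixfile RT_LOG$jobno"
    echo "CLEAR.FILE RT_STATUS$jobno"
  else
    "$DSHOME/bin/uvfixfile" "RT_LOG$jobno"                 # repair the log hashed file
    echo "CLEAR.FILE RT_STATUS$jobno" | "$DSHOME/bin/uvsh" # reset the status file
  fi
}

# In reality the job numbers come from: SELECT NAME, JOBNO FROM DS_JOBS
# (skipping the fixing job itself). Hard-coded here for illustration.
for jobno in 12 47; do
  fix_job "$jobno"
done
```

The point of the dry-run default is that you can eyeball the generated command list once before wiring it into startup.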
Thanks in advance,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Hi,
I was wondering: if I'm not interested in the log history, would a CLEAR.FILE on RT_LOG## be enough?
Thanks in advance (again),
Roy R.
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Which hashed files in the repository you need to check depends on which hashed files were being written when the power failed. You're right that the most likely candidates will be the log files and the status files. But, if development work was being done at the time, there's also the config files and the DS_... files to check.
Clearing the files should eliminate any corruption caused by an interrupted write but you do lose information.
You can check for corruption in hashed files using uvfixfile or fixtool from the operating system command line, so that you know which ones need possible repair.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
No it doesn't. The next event number is kept in a control record called //SEQUENCE.NO in the data portion.
There are two other control records in a log file: //PURGE.SETTINGS and //JOB.STARTED.NO, which is why it's never a good idea to use CLEAR.FILE on DataStage logs.
IBM Software Services Group
Hi,
Thanks, guys.
Actually, there is no development done there; it's a production system.
The thing is that I have one control job, and it runs sequences and server jobs that are multi-instance.
When a power failure occurs, the log files get corrupted,
and after the machine comes back up, this disrupts the normal flow of things and messes it all up.
Would Ken's compile routine be more effective in this case?
I got some conflicting answers about multi-instance jobs and their RT_LOG & RT_STATUS files.
Since only jobs that were running when the power went down are candidates for this problem, I thought of checking their status, but then I thought it might be simpler to recompile the main multi-instance job, since altogether there are 40 or so jobs that run in multiple instances.
Do you have any tips on handling multi-instance jobs in this case?
Thanks,
Roy R.
A hard crash like the one you are describing is tricky to recover from programmatically. Who watches the watcher? If the main controlling job itself crashes, corrupting its log, status, and config files, then how does that get automatically rectified?
I think in the event of a catastrophic failure, such as a reboot during a run, you should simply sweep the system. I hope you see the wisdom behind building an ETL application that stages load-ready data and defers all loads until the transforms are done, so it can simply load the result sets. Not only is this easier to do and amenable to bulk loading, restarts, etc., but it also won't leave your target in a semi-updated state that is more difficult to recover. That being said, I think you should get your hands on a programmatic recompile tool and recompile all of your jobs. You mentioned using one supplied by me.
In either case, you should consider a system-wide log purge using CLEAR.FILE, noting that it doesn't do a programmatic remove but is more like a "cat /dev/null > file" type of operation. Your log purge setting row is actually commingled with the log data, so if the log is corrupted this setting is unrecoverable anyway. I have a utility for mass-setting the auto-log-purge setting if you are interested. If I were you, I'd write a Batch job to clear the status and config file for every job as well. So, to recap: a utility Batch job that sweeps all jobs and clears their status, log, and config files. Then, get the log purge setting utility I mentioned to mass-set the lost purge settings again.
I hope you also see why it is paramount to track job execution history outside DataStage, as its own internal logging structures are sensitive not only to hard system crashes and corruption; if you run the project out of disk space, the result is the same as if you kicked the power plug out.
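That sweep could be sketched as a tiny command generator: for every job number, emit the CLEAR.FILE commands for its status, log, and config files. The RT_CONFIG name is assumed here by analogy with the RT_LOG/RT_STATUS convention discussed in this thread, and in practice the output would be piped into the engine shell rather than just printed:

```shell
#!/bin/sh
# Sketch: build the sweep command list for the given job numbers.
# In real use you would feed the output to the engine shell, not just print it.
sweep_commands() {
  for jobno in "$@"; do
    for prefix in RT_STATUS RT_LOG RT_CONFIG; do
      echo "CLEAR.FILE $prefix$jobno"
    done
  done
}

# Illustration only; job numbers would come from the DS_JOBS query.
sweep_commands 12 47
```

Remember Ray's caveat above: clearing a log this way also wipes its control records, so the purge settings have to be re-applied afterwards.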
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Thanks Ken,
Actually, my job should have no problem rerunning in 97% of cases, and for the remaining 3% I have a job that reprocesses everything in about 2 hours.
I want to get something clear, if I may: if an RT_LOG file is corrupted, will a CLEAR.FILE on it and on the RT_STATUS do the job?
And another thing: as far as I understood, the RT_LOG file is shared by the multiple instances; is that so? And the RT_STATUS as well?
(I'd rather sound dumb or stupid and get it 100% right than have a 1% doubt and fail in my task.)
Thanks again,
Roy R.
CLEAR.FILE on RT_LOG and RT_STATUS is highly likely to clear any logical corruption. This is not, however, 100% guaranteed, and definitely is not guaranteed to fix any physical corruption (for example bad spot on disk).
It also means you lose the control records. DataStage will re-create the control records in the log file as needed, but you will lose any job-specific purge settings.
To automate this checking process on re-boot you should create it as a BAT file and organize for it to execute once DataStage has re-started.
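One way to sequence that reboot hook, sketched in portable shell (on the actual Windows server this would live in the BAT file): poll until the engine answers, then run the fix pass once. The probe command shown in the usage comment is hypothetical, and the retry/sleep knobs are made up for the sketch:

```shell
#!/bin/sh
# Sketch: wait until an assumed probe command succeeds, giving up after
# MAX_TRIES attempts. SLEEP is configurable so a dry run can finish instantly.
MAX_TRIES=${MAX_TRIES:-30}
SLEEP=${SLEEP:-2}

wait_for_ds() {
  tries=0
  while [ "$tries" -lt "$MAX_TRIES" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0                      # engine answered; safe to run the fix pass
    fi
    tries=$((tries + 1))
    sleep "$SLEEP"
  done
  return 1                          # engine never came up; do not touch the files
}

# Hypothetical usage: probe with a harmless engine command, then run the fix.
# wait_for_ds dsjob -ljobs MyProject && sh fix_logs.sh
```

The guard matters: running the repair pass before the services are fully up would be exactly the kind of half-finished write this thread is trying to recover from.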
The one RT_LOGnn file (hashed file) is shared by all instances of the job.
IBM Software Services Group
Thanks Ray,
For some reason, my support provider said that Ascential says there are separate RT_LOG files where multi-instance jobs are concerned, so I assumed separate STATUS files as well; I remember you mentioning this once or more here.
Since Ascential said so, I'll check this on a new, clean project ASAP.
Thanks again,
Roy R.
The advice you have received is, quite simply, wrong.
You get different views (one per instance) in the Director log view. This may have confused your support provider. (If your support provider can't understand the concept of a view, maybe you have another problem!)
However, there is only one RT_LOGxx file for all instances.
IBM Software Services Group
Thanks,
I tested this on a clean project, and as Ray said, there is only one physical RT_... set of files for a multi-instance job.
(Tested on version 6.)
Roy R.
Hi,
Just wanted to fill you in:
the CLEAR.FILE on the RT_LOGnn & RT_STATUSnn files
did the job.
Thanks, all,
Roy R.