Calculating top values
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 497
- Joined: Sun Dec 17, 2006 11:52 pm
- Location: Kolkata
- Contact:
pravin1581 wrote: It is the top 10% of a particular column in a file.
DSguru2B wrote: You need to sort these values ascending, line them up in a single row, get the max value (the last value), calculate the 10-percent breaking point, and keep all the records at or above that breaking point. If we are talking about the top 10% on values, then you have to follow the method I gave above. If it's the top 10% positional values, then all you need to do is get the number of records, sort again, and take 10% of the number of records: get all records before that breaking point.

How can the values be lined up in a single row?
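The value-based method described above can be sketched in a few shell lines. This is a minimal illustration, not the poster's actual job: the file name, the sample data, and the reading of "10-percent breaking point" as 10% below the max value are all assumptions.

```shell
# Build sample data: one numeric value per line (made-up example from the thread).
printf '%s\n' 2 4 6 8 10 10 11 11 12 13 > values.txt

# Sort ascending so the max value is the last line.
sort -n values.txt > sorted.txt
max=$(tail -1 sorted.txt)

# The "10 percent breaking point": 10% below the max value.
cutoff=$(awk -v m="$max" 'BEGIN { print m * 0.9 }')

# Keep every record at or above the breaking point.
awk -v c="$cutoff" '$1 >= c' sorted.txt
```

For this sample the max is 13, the cut-off is 11.7, and the surviving records are 12 and 13.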
chulett wrote: In Server, get the record count and sort the data descending on the value in question. Constrain your output to the 'top 10%' of the records.
pravin1581 wrote: I am facing problems with the record count as well, as DSLinkInfo always returns 0. How can we restrict the data to the top 10%?
chulett wrote: Get the record count before the job runs. For UNIX or MKS, 'wc -l' was suggested earlier so I didn't repost that. Pass that number in as a job parameter. After you sort, use a constraint where @OUTROWNUM <= 10% of the input rows.

How can we get the record count before running the job? My requirement is the row count after sorting the data, and @OUTROWNUM is not returning the row count.
The quoted post itself has the required information: 'wc -l' is the suggested approach for finding the row count before the run.
Could you please post a sample of the input data and what you expect from it?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
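The 'wc -l' suggestion amounts to a couple of shell lines run before the job. A sketch under assumed names (the file name and variable names are illustrative; in DataStage the count would be passed in as a job parameter):

```shell
# Sample input file standing in for the real one.
printf '%s\n' 2 4 6 8 10 10 11 11 12 13 > input.txt

# Record count, obtained BEFORE the job runs.
ROWCOUNT=$(wc -l < input.txt)

# 10% of the input rows, for the @OUTROWNUM constraint (integer division).
TOPN=$(( ROWCOUNT / 10 ))

echo "rows=$ROWCOUNT top=$TOPN"
```

With 10 input rows this yields a constraint value of 1, i.e. @OUTROWNUM <= 1 after the sort.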
kumar_s wrote: The quoted post itself has the required information: 'wc -l' is the suggested approach for finding the row count before the run. Could you please post a sample of the input data and what you expect from it?

Suppose the values are 2,4,6,8,10,10,11,11,12,13. There are 8 distinct values, and I want to extract the top 10% of the values. The required value is 13.
In case your input is 2,4,6,8,10,10,11,11,12,13,2,4,6,8,10,10,11,11,12,13, do you need to get 13,13 or 12,13?
I.e., only distinct values, or just the top values found in the data?
Run 'wc -l' on your file in an ExecuteCommand activity (which will give you the total record count of the file), and pass the Command.Output of the ExecuteCommand activity as a parameter to the job.
Read the file and sort it. In the transformer, restrict the rows using @INROWNUM <= InputParameter/10, where InputParameter is the row count passed in.
But this logic will vary based on your reply to this.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
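The steps above (ExecuteCommand running 'wc -l', then a transformer constraint on the row number) can be imitated end to end in plain shell. A sketch, sorting descending so the constraint keeps the top of the data rather than the bottom; file name and data are made up:

```shell
# Sample data; one value per line.
printf '%s\n' 2 4 6 8 10 10 11 11 12 13 > input.txt

# What the ExecuteCommand activity would capture as Command.Output.
count=$(wc -l < input.txt)

# Sort descending, then keep the first count/10 rows --
# the shell equivalent of the @INROWNUM <= parameter/10 constraint.
sort -nr input.txt | head -n $(( count / 10 ))
```

For this sample, count is 10, so only the single highest value (13) survives.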
Wow, too many nested quotes. Anywho...
How is your data coming in? It's not hard to line the values up. You can do it at the UNIX level using awk, or in DataStage.
Line them up, get the percentage value to find out how many positions fall in the top 10%, and take that many items. It's a few lines of code added on to the median code I provided to you earlier.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
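One hypothetical way to do the "line them up" step at the UNIX level with awk, as suggested above (sample data and file name are assumptions):

```shell
# Sample data, one value per line.
printf '%s\n' 2 4 6 8 10 10 11 11 12 13 > input.txt

# Sort ascending and join the values into a single comma-separated row.
sort -n input.txt |
  awk '{ printf "%s%s", sep, $1; sep = "," } END { print "" }'
```

This prints 2,4,6,8,10,10,11,11,12,13 on one line; once the values are in a single row, positional slicing for the top 10% is straightforward.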
-
- Premium Member
- Posts: 497
- Joined: Sun Dec 17, 2006 11:52 pm
- Location: Kolkata
- Contact:
kumar_s wrote: In case your input is 2,4,6,8,10,10,11,11,12,13,2,4,6,8,10,10,11,11,12,13, do you need to get 13,13 or 12,13?

We want only the distinct top values, so the data needs to be sorted and deduplicated.
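Given that only distinct top values are wanted, a shell sketch of the dedupe-then-cut variant. The file name is illustrative, and rounding the 10% up so at least one row survives is an assumption on my part:

```shell
# Sample data from the thread, including duplicates.
printf '%s\n' 2 4 6 8 10 10 11 11 12 13 > input.txt

# Sort numerically and drop duplicates: 8 distinct values remain.
sort -nu input.txt > distinct.txt
n=$(wc -l < distinct.txt)

# Top 10% of the DISTINCT count, rounded up so we never take zero rows.
top=$(( (n + 9) / 10 ))

# Highest distinct values: 13 for this sample, matching the expected answer.
tail -n "$top" distinct.txt
```

Note the percentage is taken over the distinct count (8), not the raw row count (10), which matches the "8 distinct values, required value 13" example given earlier.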