Re: Please help parsing file [sed, awk, fortran, bash]
On Thu, Aug 30, 2012 at 04:37:19PM -0700, daniel jimenez wrote:
> Hello all,
> I need some help fixing the format of some pretty strangely compressed
> data files. An example would be like this:
>
> 2883
> 452
> 0 7
> 1 6
> 2
> 4
> 6
> 10 7
> Parsing rules:
> The first two lines should be ignored.
> The first column is the 'index', the second column being the 'counter'.
> If there is no second number (ex. index=2), then the second number should
> be set to '1'.
> If the index skips (ex. from index=2 to index=4), then the indexes
> which were skipped should be set to '0'
> Max index is 1024.
> That is it. I'd like to be guided to an app (scripting language? awk? sed?
> I haven't used those so I really don't know where to start) that can help
> me do that effectively.
> The command, with the script possibly as the argument, is to be included
> in a bash script right before a fortran program is executed as the fortran
> program expects the file to be uncompressed and it doesn't seem intuitive
> to do it from fortran. Although it would be nice for a guru to let me know
> how to handle it from within...
> In the end, any solution would be a great help.
> Thanks.
> --
> Daniel Jimenez
Hi Daniel,
Here's my awk solution:
NR > 2 {                              # Ignore lines 1 & 2
    if (NF < 2) {                     # If there is no second field...
        counter = 1                   # ...set variable counter to one
    } else {
        counter = $2                  # Otherwise set counter to 2nd field
    }
    difference = $1 - last_index      # Subtract last index to find gaps
    if (difference > 1) {             # If gaps exist...
        for (i = 1; i < difference; i++) {
            arr[i + last_index] = 0   # Add skipped indices to array w/ zero value
        }
    }
    arr[$1] = counter                 # Add index to array with value counter
    last_index = $1                   # Remember this index for the next line
}
END {
    for (j = 0; j <= last_index; j++) {
        print j, arr[j] + 0           # Print all indices; unset ones print as 0
    }
}
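To call this from your bash script right before the Fortran program runs, something along these lines might work. Treat it as a sketch rather than a finished solution: the function name expand_counts, the file names, and the program name are placeholders I made up. It also pads the output with zeros up to index 1024, which is only my reading of "Max index is 1024"; pass a different second argument if that guess is wrong. It is a more compact variant of the same idea: store only the indices that appear and let the END loop print 0 for everything else.

```shell
#!/bin/bash
# Sketch only: expand_counts, the file names and the program name are placeholders.

# Expand a compressed count file ($1) into one "index counter" line per index.
# Optional $2 is the highest index to print (assumed here to default to 1024).
expand_counts() {
    awk -v max="${2:-1024}" '
        NR > 2 && NF > 0 {                 # skip the two header lines and blank lines
            arr[$1] = (NF < 2) ? 1 : $2    # a missing counter means 1
        }
        END {
            for (j = 0; j <= max; j++)
                print j, arr[j] + 0        # indices never seen print as 0
        }
    ' "$1"
}

# Typical use, right before the Fortran program runs (names are hypothetical):
#   expand_counts compressed.dat > expanded.dat
#   ./fortran_program
```

With your example file and a maximum of 10 (expand_counts compressed.dat 10), the output would start with "0 7", "1 6", "2 1", "3 0", "4 1" and end with "10 7".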