
Re: Please help parsing file [sed, awk, fortran, bash]



On Thu, Aug 30, 2012 at 04:37:19PM -0700, daniel jimenez wrote:
>    Hello all,
>    I need some help fixing the format of some pretty strangely compressed
>    data files. An example would be like this:
> 
>    2883
>    452
>    0  7
>    1  6
>    2
>    4
>    6
>    10  7
> 
>    Parsing rules:
>    The first two lines should be ignored.
>    The first column is the 'index', the second column being the 'counter'.
>    If there is no second number (ex. index=2), then the second number should
>    be set to '1'.
>    If the index skips (ex. from index=2 to index=4), then the indexes
>    which were skipped should be set to '0'
>    Max index is 1024.
>    That is it. I'd like to be guided to an app (scripting language? awk? sed?
>    I haven't used those so I really don't know where to start) that can help
>    me do that effectively.
>    The command, with the script possibly as the argument, is to be included
>    in a bash script right before a fortran program is executed as the fortran
>    program expects the file to be uncompressed and it doesn't seem intuitive
>    to do it from fortran. Although it would be nice for a guru to let me know
>    how to handle it from within...
>    In the end, any solution would be a great help.
>    Thanks.
>    --
>    Daniel Jimenez

Hi Daniel,

Here's my awk solution:


NR > 2 {       				# Ignore lines 1 & 2
  if (NF < 2){    			# If number of fields is less than two...
    counter=1  				# Set variable counter to one
  } else {
    counter=$2   			# Otherwise set counter to 2nd field
  }
  difference = $1 - last_index 		# Subtract last index to find gaps
  if (difference > 1){			# If gaps exist...
    for (i=1; i<difference; i++){	# Only the skipped indices, not $1 itself
      arr[i+last_index]=0		# Add skipped indices to array w/ zero value
    }
  }
  arr[$1]=counter			# Add index to array with value counter
  last_index=$1				# Remember this index for the next line
}
END {
  for (j=0; j<=last_index; j++){	
    print j, arr[j]+0			# Print all indices and their values (unset prints as 0)
  }
}
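
To use this from a bash script before the Fortran program runs, something like the sketch below might work. It is a variation on the script above, not the same code: instead of filling gaps as it reads, it relies on awk treating unset array entries as 0 when printing. The function name expand_counts, the file names, and the optional max argument are all my own assumptions. The max argument pads the output with zeros through that index, in case the Fortran program expects every index up to 1024 to be present.

```shell
#!/bin/sh
# Sketch only: expand_counts and the file names below are hypothetical.
# Reads the compressed format on stdin, writes "index counter" lines to stdout.
# Optional argument: pad with zeros through this index (assumed meaning of "max index").
expand_counts() {
  awk -v max="${1:-0}" '
    NR > 2 && NF > 0 {                  # skip the two header lines and any blank lines
      idx = $1 + 0                      # force the index to a number
      arr[idx] = (NF < 2) ? 1 : $2      # a missing counter means 1
      if (idx > last) last = idx        # track the highest index seen
    }
    END {
      if (max + 0 > last) last = max + 0
      for (j = 0; j <= last; j++)
        print j, arr[j] + 0             # unset entries print as 0
    }
  '
}

# Sample data from the original message
printf '%s\n' 2883 452 '0  7' '1  6' 2 4 6 '10  7' > compressed.dat

expand_counts < compressed.dat > expanded.dat          # stops at the last index seen
expand_counts 1024 < compressed.dat > expanded_1024.dat  # pads through index 1024
cat expanded.dat
```

For the sample data, the first call should print indices 0 through 10, with 3, 5, 7, 8 and 9 set to 0. If you would rather keep my script as it is, saving it to a file and running awk -f on that file with the data file as the argument should do the same job.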

