
Re: Please help parsing file [sed, awk, fortran, bash]



On Friday 31 August 2012 at 09:46 +0100, Jon Dowland wrote:
> On Fri, Aug 31, 2012 at 02:18:15AM +0000, Mark Blakeney wrote:
> > On Fri, 31 Aug 2012 01:31:29 +0000, Russell L. Harris wrote:
> > > This exercise provides the impetus to learn to use a very useful tool,
> > > namely Perl.
> > 
> > I would suggest python is a much better choice to a young person
> > just starting out.
> 
> Seconded. I wrote some Perl yesterday, for the first time in a while.
> I didn't miss it.

I second that too.

One possibility (in Python) would have been:
data = [0] * 1024
with open("your_file") as infile:
    # Skip the two header lines.
    infile.readline()
    infile.readline()
    for line in infile:
        sline = line.split()
        # A missing second column means a count of 1.
        data[int(sline[0])] = int(sline[1]) if len(sline) > 1 else 1
       
If your input file was more like:
2883
452
0  7
1  6
2  1
4  1
6  1
10  7
Then:

import numpy as np
data = np.zeros(1024)
# skip_header skips the two header lines
rows = np.genfromtxt("your_file", skip_header=2, dtype=int)
# column 0 holds the indices, column 1 the values
data[rows[:, 0]] = rows[:, 1]

Then just access "data" directly using the index:
>>> print(data[10])
7.0

The second example also gives you a NumPy array, so it is ready for
calculations.

I guess a function already exists out there that correctly handles
missing data in space-separated format, which would allow a 4-line
parser like the one given above for your "compressed" data.
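
I don't have a specific library function to point to, so here is a
minimal sketch of such a helper in plain Python instead. The name
parse_counts, its parameters and the sample content are all made up for
illustration; the layout assumed is two header lines followed by
"index [count]" rows where a missing count means 1.

```python
import io


def parse_counts(lines, size=1024, header_lines=2, default=1):
    """Return a list of `size` counts read from 'index [count]' lines."""
    data = [0] * size
    for lineno, line in enumerate(lines):
        if lineno < header_lines:
            continue  # skip the header lines
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        index = int(fields[0])
        # A missing second column falls back to the default count.
        data[index] = int(fields[1]) if len(fields) > 1 else default
    return data


# Hypothetical sample in the "compressed" layout (not the real file).
sample = io.StringIO("2883\n452\n0  7\n1  6\n2\n4\n6\n10  7\n")
data = parse_counts(sample)
print(data[10])  # 7
print(data[2])   # 1
```

Since the function takes any iterable of lines, an open file object can
be passed in the same way as the StringIO used here.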

Actually, if you really want to compress data, just drop the whole
ASCII thing. Either use a known binary format (I use HDF5, for
instance) and/or compress your data with a compression program such as
zip/xz/7zip... (or use the compression built into HDF5).
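
To give an idea of the binary-plus-compression route without needing
HDF5 installed, here is a rough sketch using only the standard library
(array for the raw binary layout, lzma for xz compression). This is a
stand-in, not HDF5; the helper names and the file name are made up, and
the raw layout uses the platform's native unsigned int and byte order,
so it is not portable between machines the way HDF5 would be.

```python
import array
import lzma
import os
import tempfile


def save_counts(path, counts):
    """Write counts as native unsigned ints, xz-compressed."""
    buf = array.array("I", counts)
    with lzma.open(path, "wb") as fh:
        fh.write(buf.tobytes())


def load_counts_binary(path):
    """Read back a list of counts written by save_counts."""
    buf = array.array("I")
    with lzma.open(path, "rb") as fh:
        buf.frombytes(fh.read())
    return buf.tolist()


# Hypothetical data: 1024 slots, mostly zero.
counts = [0] * 1024
counts[0] = 7
counts[10] = 7

path = os.path.join(tempfile.mkdtemp(), "counts.xz")
save_counts(path, counts)
restored = load_counts_binary(path)
print(restored == counts)
print(os.path.getsize(path), "bytes on disk")
```

For mostly-zero data like this, the compressed file ends up far smaller
than the raw array.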


