
help needed for converting strings in a file



Hi,
I am making a small tool for offline web browsing. For this I need to change the source of HTML files.
Let me explain:
 
In a web page, hyperlinks are written as

href="http://www.micronux.com/catalog/"

I want to convert this particular string to


href="./micronux.com_catalog"

The logic is:
1) delete the leading "http://www."
2) replace '/', '?' etc. with '_'
3) drop a trailing '_' and put "./" in front, as in the example above

I want to write a script using sed or awk that does this conversion for every link in a file.
Any help in this regard will be highly appreciated.
Thanking you
Sourabh Bora
Symbol Technologies India Ltd
Bangalore
India


Once I get past this stage, I plan to make further improvements toward a user-friendly offline browser.
PS:
I have written a C program, but it is very limited in functionality.
Usage:
./a.out "http://www.micronux.com/catalog/"
Output:
micronux.com_catalog
The limitation is that it converts only a single URL given on the command line. It cannot search a whole file for URLs that follow
href=


The code:
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {

    char url[200];
    char output[200];
    int urlCount;
    int outputCount;

    /* need exactly one URL that fits in the buffer */
    if (argc < 2 || strlen(argv[1]) >= sizeof url) {
        fprintf(stderr, "usage: %s url\n", argv[0]);
        return 1;
    }
    strcpy(url, argv[1]);

    /* skip the scheme: "http://" is 7 characters, "https://" is 8 */
    if (strncmp(url, "https://", 8) == 0)
        urlCount = 8;
    else if (strncmp(url, "http://", 7) == 0)
        urlCount = 7;
    else {
        fprintf(stderr, "url must start with http:// or https://\n");
        return 1;
    }

    /* now check for "www." */
    if (strncmp(url + urlCount, "www.", 4) == 0)
        urlCount += 4;

    /* now do the actual conversion: convert / \ = : * ? " < > and | to _ */
    for (outputCount = 0; url[urlCount] != '\0'; urlCount++, outputCount++) {
        if (strchr("/\\=:*\"?<>|", url[urlCount]) != NULL)
            output[outputCount] = '_';
        else
            output[outputCount] = url[urlCount];
    }

    /* if the last character is '_', remove it */
    if (outputCount > 0 && output[outputCount - 1] == '_')
        outputCount--;
    output[outputCount] = '\0';

    printf("%s\n", output);
    return 0;
}
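
One way the program above might be extended to handle every link in a file is sketched below. It is only a sketch under a few assumptions: the names rewrite_hrefs, put and is_unsafe are made up for illustration, links are assumed to be written exactly as href="..." with a lowercase attribute name and double quotes, and relative links are left untouched. It applies the same rules as the program above (skip http:// or https://, skip www., replace the special characters with '_', drop one trailing '_') and adds the "./" prefix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Characters replaced with '_' (same set as the program above). */
static int is_unsafe(char c)
{
    return c != '\0' && strchr("/\\=:*\"?<>|", c) != NULL;
}

/* Append c to out if there is still room for it and a terminator. */
static size_t put(char *out, size_t o, size_t out_size, char c)
{
    if (o + 1 < out_size)
        out[o++] = c;
    return o;
}

/*
 * Copy `in` to `out`, rewriting every href="http://..." or
 * href="https://..." value to the ./name form. All other text is
 * copied unchanged. Output is silently truncated if it does not fit.
 */
static void rewrite_hrefs(const char *in, char *out, size_t out_size)
{
    static const char key[] = "href=\"";
    const size_t key_len = sizeof key - 1;
    const char *p;
    const char *s;
    size_t start;
    size_t o = 0;

    assert(out_size > 0);
    while (*in != '\0') {
        if (strncmp(in, key, key_len) != 0) {
            o = put(out, o, out_size, *in++);
            continue;
        }
        p = in + key_len;
        if (strncmp(p, "http://", 7) == 0) {
            p += 7;
        } else if (strncmp(p, "https://", 8) == 0) {
            p += 8;
        } else {
            /* relative link: copy it as it is */
            o = put(out, o, out_size, *in++);
            continue;
        }
        if (strncmp(p, "www.", 4) == 0)
            p += 4;

        for (s = "href=\"./"; *s != '\0'; s++)
            o = put(out, o, out_size, *s);
        start = o;
        for (; *p != '\0' && *p != '"'; p++)
            o = put(out, o, out_size, is_unsafe(*p) ? '_' : *p);
        if (o > start && out[o - 1] == '_')
            o--;            /* drop one trailing '_' */
        in = p;             /* the closing quote is copied by the main loop */
    }
    out[o] = '\0';
}
```

Calling rewrite_hrefs on each line read with fgets and printing the result with fputs would process a whole file. A link that is split across two lines would be missed by that approach.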
