On Thu, 12 Jun 2014, davidson@ling.ohio-state.edu wrote:
On Thu, 12 Jun 2014, lina wrote:

Hi, I wish to grab part of the CDS entry from
http://www.ncbi.nlm.nih.gov/nuccore/KF699528.2
namely,

"MLDHSSVNSTIAPGNLLNLPVWCYLLETEEGPILVDTGMPESAV
NNEGLFNGTFVEGQILPKMTEEDRIVNILKRVGYEPDDLLYIISSHLHFDHAGGNGAF
TNTPIIVQRTEYEAALHREEYMKECILPHLNYKIIEGDYEVVPGVQLLYTPGHSPGHQ
SLFIETEQSGSILLTIDASYTKENFEDEVPFAGFDPELALSSIKRLKEVVAKEKPIIF
FGHDIEQEKGCKVFPEYIPRAE"

[snip]

so it is going to be nice to know how to get these html plain file which contains these sequence, can anyone points out something to let me go further,

using uzbl browser, along with either of the scripts on this page...

http://www.uzbl.org/wiki/dump

...i think this can be done. (you can have your choice of html or plain text.)
PS: btw, uzbl has a relatively steep learning curve. if you are in a hurry, here is a kludge that should do what you want:

  jarjar@hell:~$ nuccore_fname=KF699528.2
  jarjar@hell:~$ uzbl http://www.ncbi.nlm.nih.gov/nuccore/${nuccore_fname} 2>${nuccore_fname}_uzbl_squawks &
  [1] 2768
  jarjar@hell:~$ uzbl_pid=$!
  jarjar@hell:~$ echo 'js document.documentElement.outerHTML' | socat - unix-connect:/tmp/uzbl_socket_${uzbl_pid} > ${nuccore_fname}_done.html
  jarjar@hell:~$ grep -A 4 '/translation=' ${nuccore_fname}_done.html
  /translation="MLDHSSVNSTIAPGNLLNLPVWCYLLETEEGPILVDTGMPESAV
  NNEGLFNGTFVEGQILPKMTEEDRIVNILKRVGYEPDDLLYIISSHLHFDHAGGNGAF
  TNTPIIVQRTEYEAALHREEYMKECILPHLNYKIIEGDYEVVPGVQLLYTPGHSPGHQ
  SLFIETEQSGSILLTIDASYTKENFEDEVPFAGFDPELALSSIKRLKEVVAKEKPIIF
  FGHDIEQEKGCKVFPEYIPRAE"

if uzbl's complaints about the webpage don't interest you, replace 2>${nuccore_fname}_uzbl_squawks with 2>/dev/null.

anyways, would be interesting to hear what solutions you find.

-wes
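A possible follow-up to the grep step, in case the sequence is wanted as one unbroken string rather than five wrapped lines. This is only a sketch: the function name extract_translation is made up for illustration, and it assumes the saved page contains a literal /translation=" and that the value runs up to the next double quote, as in the grep output shown in this message. Only the first /translation= entry in the file is printed.

```sh
#!/bin/sh
# extract_translation FILE
# Hypothetical helper: print the first /translation="..." value found in
# FILE as a single line with all whitespace removed.
# Assumption: the value begins right after /translation=" and ends at the
# next double-quote character, possibly several lines later.
extract_translation() {
    awk '
        # Start collecting at the qualifier and drop everything up to
        # and including the opening quote.
        /\/translation="/ { grab = 1; sub(/.*\/translation="/, "") }
        grab {
            # A closing quote ends the value: keep what precedes it, stop.
            if (match($0, /"/)) {
                seq = seq substr($0, 1, RSTART - 1)
                exit
            }
            seq = seq $0
        }
        # Runs after exit as well: strip wrapping whitespace and print.
        END { gsub(/[ \t\r]/, "", seq); print seq }
    ' "$1"
}
```

With the file from the session above, it would be called as extract_translation ${nuccore_fname}_done.html. Using awk instead of grep -A 4 avoids hard-coding the number of wrapped lines, which will differ for proteins of other lengths.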