
Re: [0.5 OT] How to grab some entry by command line



On Thu, Jun 12, 2014 at 08:13:11PM +0800, lina wrote:
>    Hi,
> 
>    I wish to grab part of the CDS entry from
>    [1]http://www.ncbi.nlm.nih.gov/nuccore/KF699528.2

Watching that page load in Firefox, I notice that the body of the page
starts blank and then a "Loading" wheel appears before the text you're
interested in appears. To me, that suggests that the content of the page
is loaded by AJAX.

In that case, you have two options. Which one you choose probably
depends on what the page's terms of use allow. The neatest method is to
read up on how the page fetches the content and to perform that call
yourself (for example, if the page's JavaScript calls
http://backend.example.com/fetch-content.php?KF699528&version=2, then
you'd fetch that URL yourself). However, it might be against the T&Cs of
the page to access the backend yourself. In that case, you had best
pretend to be a fully-featured User-Agent. That is, you'll need to run
the JavaScript on the page above, wait for the content to be loaded and
then extract the information you need from the DOM (an inspector such as
Firebug or the Chrome inspector will help you find the appropriate
element).
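
As a rough sketch of the first option, something along these lines might
do. It assumes the made-up backend.example.com URL from my example (the
real page almost certainly calls something else, which you'd have to look
up), and it assumes the response contains the sequence as a double-quoted
run of capital letters wrapped over several lines, as in your excerpt.
The function names are my own invention.

```python
import re
from urllib.request import urlopen

# Hypothetical endpoint copied from the example URL above; not the real backend.
BACKEND = "http://backend.example.com/fetch-content.php"


def backend_url(accession, version):
    """Build the URL the page is assumed to request for one record."""
    return "%s?%s&version=%s" % (BACKEND, accession, version)


def extract_protein(text):
    """Return the longest double-quoted run of capital letters in text.

    Line breaks and indentation inside the quotes are removed.
    Returns None when nothing matches.
    """
    candidates = re.findall(r'"([A-Z\s]+)"', text)
    # Join wrapped lines and drop matches that were only whitespace.
    joined = ["".join(c.split()) for c in candidates]
    joined = [seq for seq in joined if seq]
    if not joined:
        return None
    return max(joined, key=len)


def fetch_protein(accession, version):
    """Fetch one record from the assumed backend and extract the sequence."""
    with urlopen(backend_url(accession, version)) as response:
        text = response.read().decode("utf-8", errors="replace")
    return extract_protein(text)
```

Since you have many records, you could call fetch_protein() in a loop over
your list of identifiers once you know the real URL.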

> 
>    namely,
> 
>  "MLDHSSVNSTIAPGNLLNLPVWCYLLETEEGPILVDTGMPESAV
>                       NNEGLFNGTFVEGQILPKMTEEDRIVNILKRVGYEPDDLLYIISSHLHFDHAGGNGAF
>                       TNTPIIVQRTEYEAALHREEYMKECILPHLNYKIIEGDYEVVPGVQLLYTPGHSPGHQ
>                       SLFIETEQSGSILLTIDASYTKENFEDEVPFAGFDPELALSSIKRLKEVVAKEKPIIF
>                       FGHDIEQEKGCKVFPEYIPRAE"
> 
>    I tried to view the source code, but it doesn't show this part, and when I
>    used
> 
>    wget [2]http://www.ncbi.nlm.nih.gov/nuccore/KF699528.2
> 
>    the thing I grabbed also doesn't show this part.
> 
>    Because I have lots of things with different end part
>    [3]http://www.ncbi.nlm.nih.gov/nuccore/****
> 
>    so it is going to be nice to know how to get these html plain file which
>    contains these sequence,
> 
>    can anyone points out something to let me go further,
> 
>    Thanks, lina
> 
> References
> 
>    Visible links
>    1. http://www.ncbi.nlm.nih.gov/nuccore/KF699528.2
>    2. http://www.ncbi.nlm.nih.gov/nuccore/KF699528.2
>    3. http://www.ncbi.nlm.nih.gov/nuccore/****
