
Re: Sunrise and Sunset from terminal



On Sat, Sep 23, 2023 at 06:45:08PM -0400, Felix Miata wrote:
> piorunz composed on 2023-09-23 23:35 (UTC+0100):
> 
> > SunTimes=$(curl --silent "https://www.timeanddate.com/sun/uk/london"
> > 2>/dev/null)
[...]

> > sh srss.sh
> Sunrise Today: 64:7889657242711361093201601361071834
> Sunset Today: 65:7242711361093201601361071834
> >
> 
> That sort of resembles the half day format common outside the military.

The web site in question <https://www.timeanddate.com/sun/uk/london>
gives me a page full of HTML and Javascript and gods only know what
else.  The shell script which "parses" this is absolutely not doing
it correctly, if indeed it's even *possible* to do correctly.

Take it apart piece by piece.

unicorn:~$ curl -s https://www.timeanddate.com/sun/uk/london | grep 'Sunrise Today'
</script></section></header><main class="tpl-banner__main layout-grid layout-grid--sky tpl-banner__main--astro"><section class="layout-grid__hero tpl-banner__hero"><nav class=nav-3><div class=fixed><a href="/astronomy/uk/london">Sun & Moon Today </a><a href="/sun/uk/london" class=active>Sunrise & Sunset </a><a href="/moon/uk/london">Moonrise & Moonset </a><a href="/moon/phases/uk/london">Moon Phases </a><a href="/eclipse/in/uk/london">Eclipses </a><a href="/astronomy/night/uk/london">Night Sky </a></div></nav></section><article class="layout-grid__main tpl-banner__content"><section class=bk-focus><div id=qlook class=bk-focus__qlook><img src="//c.tadst.com/gfx/n/i/sun.png" width=120 height=120><div class="h1 dn-mob">Daylight</div><p class=dn-mob>6:47 am &#8211; 6:57 pm<br>12 hours, 10 minutes</p></div><div class=bk-focus__info><table class="table table--left table--inner-borders-rows"><tbody><tr><th>Current Time: </th><td id=smct>Sep 23, 2023 at 11:51:30 pm</td></tr><tr><th>Sun Direction: </th><td id=sunaz><span class="comp sa30" title="Map direction North">↑</span> 341° North</td></tr><tr><th>Sun Altitude: </th><td id=sunalt>-37.1°</td></tr><tr><th>Sun Distance: </th><td id=sundist>93.273 million mi</td></tr><tr><th>Next Solstice: </th><td><a href="/calendar/seasons.html?n=136" title="Click to see detailed season information for London">Dec 22, 2023 3:27 am (Winter)</a></td></tr><tr><th>Sunrise Today: </th><td>6:47 am<span class="comp sa8" title="Map direction East">↑</span> 89° East</td></tr><tr><th>Sunset Today: </th><td>6:57 pm<span class="comp sa24" title="Map direction West">↑</span> 271° West</td></tr></tbody></table></div><div id=bk-map class=bk-focus__map><a href="/time/map/#!cities=136"><img title="Map showing the location of London. Click map to see the location on our worldwide Time Zone Map." 
src="//c.tadst.com/gfx/citymap/uk-10.png?9" alt="Location of London" width=320 height=160><img id=map-mrk src="//c.tadst.com/gfx/n/icon/icon-map-pin.png" class=fadeInDown style="left:136px;top:107px;position:absolute" alt=Location title="Location of London" width=18 height=34></a></div></section><div class=fixed></div>

As you can see, parsing the 2100-character-long "line" of the page which
contains the substring "Sunrise Today" gives more rubbish than answer.
Simply discarding all of the non-digit characters leaves you with all
of the digit characters from the rubbish, which is not by any means a
useful piece of output.
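
To make that concrete, here is a minimal sketch in Python that applies
the "keep only the digits" idea to just the two table rows copied from
the output above.  The function name digits_only is mine, and the
original script's exact pipeline is not shown in this thread, so treat
this as an illustration of the failure mode rather than a reproduction
of that script.

```python
import re

# Two table rows copied from the curl output shown above.
fragment = (
    '<tr><th>Sunrise Today: </th><td>6:47 am<span class="comp sa8" '
    'title="Map direction East">↑</span> 89° East</td></tr>'
    '<tr><th>Sunset Today: </th><td>6:57 pm<span class="comp sa24" '
    'title="Map direction West">↑</span> 271° West</td></tr>'
)

def digits_only(text):
    """Drop every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", text)

# The times, the CSS class suffixes and the compass bearings all run
# together into one number.
print(digits_only(fragment))  # prints 64788965724271
```

That run of digits appears to be the start of the "64:7889657242711..."
rubbish in the reported output; the remaining digits would come from the
map link and image sizes further along the same line.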

So, what to do instead?  I would first look for a data source that's
not intended to be displayed by a Javascript-enabled web browser.
Something that gives you the results in plain text would be great.
I doubt such a thing can be found easily.  Something that gives the
results in, say, JSON or XML format might be easier to find.  Then
you "only" need to write code that parses JSON or XML (realistically
meaning you call upon a dedicated tool or library for doing so).
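
As a rough sketch of how little code that takes once you have such a
source: the payload below is entirely made up (the field names
"sunrise" and "sunset" are my assumption, not any real service's
format), and the parsing is done by Python's standard json module.

```python
import json

# Hypothetical response body; a real service will use its own field names.
payload = '{"location": "London", "sunrise": "06:47", "sunset": "18:57"}'

def sun_times(body):
    """Return (sunrise, sunset) from a JSON document shaped like the sample."""
    data = json.loads(body)
    return data["sunrise"], data["sunset"]

sunrise, sunset = sun_times(payload)
print(f"Sunrise Today: {sunrise}")
print(f"Sunset Today: {sunset}")
```

The point is that the structure does the work: you ask for a named
field and get exactly that field, with no guessing about what else is
on the line.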

Failing that -- and really, this is a LAST resort, not a first resort --
you could parse the HTML here.  Find a tool that parses HTML; these
usually have names like "xpath" or "xslt" or something.  Then analyze
the HTML yourself, figure out the hierarchical structure of the elements,
and use your knowledge of this layout when applying your HTML parsing
tool.

In this specific example, I see a table with

    <tr><th>Sunrise Today: </th><td>6:47 am

inside it.  A decent HTML parsing tool should be able to zero in on
the correct table, then iterate through rows until it finds the one
with "Sunrise Today" as a substring in its <th> element, and then
spit out the first <td> element.
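
A minimal sketch of that approach, using the html.parser module from
Python's standard library.  The class CellFinder and the function
find_cell are names I made up, the sample is only the two relevant rows
from the page as it looked today, and it returns just the text at the
start of the <td>, before the nested <span>.  It will break whenever
the site changes its layout, which is exactly why this is the last
resort.

```python
from html.parser import HTMLParser

# Two rows copied from the page as fetched above, wrapped in a table.
SAMPLE = (
    '<table><tbody>'
    '<tr><th>Sunrise Today: </th><td>6:47 am<span class="comp sa8" '
    'title="Map direction East">↑</span> 89° East</td></tr>'
    '<tr><th>Sunset Today: </th><td>6:57 pm<span class="comp sa24" '
    'title="Map direction West">↑</span> 271° West</td></tr>'
    '</tbody></table>'
)

class CellFinder(HTMLParser):
    """Find the first text inside the <td> of the row whose <th> has a label."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.in_th = False
        self.th_text = ""
        self.row_matches = False  # current row's <th> contained the label
        self.in_td = False
        self.value = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row_matches = False
        elif tag == "th":
            self.in_th = True
            self.th_text = ""
        elif tag == "td" and self.row_matches and self.value is None:
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "th":
            self.in_th = False
            self.row_matches = self.label in self.th_text
        elif tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_th:
            self.th_text += data
        elif self.in_td and data.strip():
            # Keep only the first non-empty text chunk of the cell.
            self.value = data.strip()
            self.in_td = False

def find_cell(html, label):
    """Return the matching cell's leading text, or None if not found."""
    finder = CellFinder(label)
    finder.feed(html)
    return finder.value

print(find_cell(SAMPLE, "Sunrise Today"))  # prints 6:47 am
print(find_cell(SAMPLE, "Sunset Today"))   # prints 6:57 pm
```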

Finding an alternative data source that doesn't require this level of
parsing is a far superior choice, if it's possible.

By the way, do you know what tool does NOT parse HTML correctly?
A mashup of grep, awk and sed.  Seriously, don't do this, ever.

