Dynamic web page scraping (a question about how to scrape data from this page)
优采云 Published: 2021-11-05 12:09
I have a question about how to scrape data from this page:
://&version=1.11.2
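The content appears to sit inside an iframe, and the page runs a lot of JavaScript.
Whenever I try to collect elements held in span, div, or tr tags under the iframe, I cannot get at the data inside them.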
My goal is to save the inner text of the class="pane-legend-item-value pane-legend-line main" elements.
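The text evidently changes depending on where the cursor sits on the chart at a given moment. So what I tried was to set up an IE window with the page already loaded and the cursor placed at the right spot, at the end of the chart (which gives me the last data point); the cursor can then be moved off the screen. I then wrote some simple code to grab that IE window and tried GetElements, but at that point I could not retrieve any data.
Here is my code so far. It is rough because I have kept editing it as I read up on more options, without any success :( ... Any ideas or help would be greatly appreciated! (Screenshots at the bottom.)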
Sub InvestingCom()
    Dim IE As InternetExplorer
    Dim htmldoc As MSHTML.IHTMLDocument 'Document object
    Dim eleColth As MSHTML.IHTMLElementCollection 'Element collection for th tags
    Dim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tags
    Dim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tags
    Dim eleRow As MSHTML.IHTMLElement 'Row elements
    Dim eleCol As MSHTML.IHTMLElement 'Column elements
    Dim elehr As MSHTML.IHTMLElement 'Header Element
    Dim iframeDoc As MSHTML.HTMLDocument
    Dim frame As HTMLIFrame
    Dim ieURL As String 'URL
    Dim objShell As Object, x As Long, IE_count As Long, marker As Long
    Dim my_url As String, my_title As String

    'Take control of an already open IE window
    marker = 0
    Set objShell = CreateObject("Shell.Application")
    IE_count = objShell.Windows.Count
    For x = 0 To (IE_count - 1)
        my_title = "" 'reset so a failed read does not reuse the previous title
        On Error Resume Next 'some shell windows (e.g. File Explorer) have no HTML document
        my_url = objShell.Windows(x).document.Location
        my_title = objShell.Windows(x).document.Title
        If my_title Like "*" & "*" Then 'compare to find if the desired web page is already open
            Set IE = objShell.Windows(x)
            marker = 1
            Exit For
        End If
    Next
    On Error GoTo 0
    If IE Is Nothing Then Exit Sub 'no matching window was found

    'Extract data
    Set htmldoc = IE.document 'Document webpage
    ' I have tried span, tr, td etc tags and various other options
    ' I have never actually tried collecting an HTMLFrame but googled it however was unsuccessful
End Sub
(Screenshots: Excel and the VB editor open on one screen, and the IE window showing the chart with the data I want to scrape.)
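Best answer
It was really hard to get through the two nested iframes on this page to reach the required content, but I finally got it working. The code first navigates IE to the src of the outer iframe, then reads the inner iframe through contentWindow.document. Run the following code to get what you asked for: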
Sub forexpros()
    'Needs references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim frm As Object, frmano As Object, post As Object

    With IE
        .Visible = True
        .navigate "http://tv*敏*感*词*.forexpros.com/init.php?family_prefix=tv*敏*感*词*&carrier=64694b96ed4909e815f1d10605ae4e83&time=1513525898&domain_ID=70&lang_ID=70&timezone_ID=31&pair_ID=171&interval=86400&refresh=4&session=session&client=1&user=200743128&width=650&height=750&init_page=instrument&m_pids=&watchlist=&site=https://au.investing.com&version=1.11.2"
        Do While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Loop
        Application.Wait (Now + TimeValue("0:00:05")) 'give the page scripts time to build the frames
        Set frm = .document.getElementsByClassName("abs") 'this is the first iframe
        .navigate frm(0).src 'load the first iframe's page directly
        Do While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Loop
        Application.Wait (Now + TimeValue("0:00:05"))
        Set html = .document
    End With

    Set frmano = html.getElementsByTagName("iframe")(0).contentWindow.document 'this is the second iframe
    For Each post In frmano.getElementsByClassName("pane-legend-item-value pane-legend-line main")
        Debug.Print post.innerText
    Next post
    IE.Quit
End Sub
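The fixed five-second pauses in the answer are a guess at how long the chart scripts need. As a possible refinement, here is a minimal sketch that polls the frame document until the legend elements appear or a timeout expires. WaitForClass and PrintLegendValues are hypothetical helper names, not part of the original answer, and the 15-second timeout is an arbitrary assumption.

```vba
' Hypothetical helper (not from the original answer): polls a document until at
' least one element matching the class list exists, or the timeout expires.
' Returns the element collection, or Nothing on timeout.
Function WaitForClass(ByVal doc As Object, ByVal classNames As String, _
                      Optional ByVal timeoutSeconds As Double = 10) As Object
    Dim deadline As Date
    Dim found As Object

    deadline = Now + timeoutSeconds / 86400#   ' 86400 seconds in a day
    Do
        DoEvents                               ' keep Excel and IE responsive
        Set found = Nothing
        On Error Resume Next                   ' the document may still be loading
        Set found = doc.getElementsByClassName(classNames)
        On Error GoTo 0
        If Not found Is Nothing Then
            If found.Length > 0 Then
                Set WaitForClass = found
                Exit Function
            End If
        End If
    Loop While Now < deadline
    ' Falls through with WaitForClass = Nothing when nothing appeared in time
End Function

' Hypothetical usage: pass the inner iframe document (frmano in the answer).
Sub PrintLegendValues(ByVal frameDoc As Object)
    Dim values As Object, post As Object

    Set values = WaitForClass(frameDoc, _
        "pane-legend-item-value pane-legend-line main", 15)
    If values Is Nothing Then
        Debug.Print "Legend values did not appear within 15 seconds"
    Else
        For Each post In values
            Debug.Print post.innerText
        Next post
    End If
End Sub
```

In the answer's code, calling PrintLegendValues frmano in place of the final For Each loop would print the same values. Polling does not remove the need for the frames themselves to have loaded, so the readyState loops still apply.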
Source: a similar question found on Stack Overflow, "javascript - VBA dynamic web page scraping Excel".