Forum Discussion
manny213
Dec 16, 2024 · Brass Contributor
Download data from the web
Hi everyone
I need to download a bunch of files from a website:
https://www.finra.org/finra-data/browse-catalog/equity-short-interest/files
The URL doesn't include the filters that need to be applied: if you open that page and select 'Any' for both Month and Year, you will see all the files. Can someone help me create a PowerShell script that downloads all the files to a local folder on my machine?
Thank you
cc: LainRobertson
- LainRobertson · Silver Contributor
Hi manny213,
The safest, most reliable approach would be to retrieve the data using the API supplied by FINRA:
- Home | FINRA API Developer Center
- Documentation | FINRA API Developer Center
- Catalog of Datasets | FINRA API Developer Center
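I've included an alternative hack below for fetching the files you mentioned. It works today, but it depends on the internals of FINRA's web page (the hard-coded AJAX request parameters and the position of the HTML in the response), so there are many ways it could break in the future, and I don't recommend relying on it.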
Get-FinraFiles.ps1
[CmdletBinding()]
param()

# Request URI: the AJAX call the page makes when Month and Year are both set to "Any".
$uri = 'https://www.finra.org/views/ajax?_wrapper_format=drupal_ajax&custom_month%5Bmonth%5D=any&custom_year%5Byear%5D=any' +
    '&view_name=transparency_services&view_display_id=equity_short_interest_biweekly&view_args=&view_path=%2Fnode%2F336166' +
    '&view_base_path=&view_dom_id=6df270b0453b3e8d5fc32607f5a986f1d230dcb33864626710bc37237fa9aec2&pager_element=0&_drupal_ajax=1' +
    '&ajax_page_state%5Btheme%5D=finra_bootstrap_sass&ajax_page_state%5Btheme_token%5D=' +
    '&ajax_page_state%5Blibraries%5D=addtoany%2Faddtoany.front%2Cbetter_exposed_filters%2Fauto_submit%2Cbetter_exposed_filters' +
    '%2Fgeneral%2Cblazy%2Fbio.ajax%2Cbootstrap_barrio%2Fbreadcrumb%2Cbootstrap_barrio%2Fform%2Cbootstrap_barrio%2Fgesta_opensans' +
    '%2Cbootstrap_barrio%2Fglobal-styling%2Cchosen%2Fdrupal.chosen%2Cchosen_lib%2Fchosen.css%2Cfinra_bootstrap_sass' +
    '%2Fapp-dynamic-reporting%2Cfinra_bootstrap_sass%2Fback-button-handler%2Cfinra_bootstrap_sass%2Fcookie-classification' +
    '%2Cfinra_bootstrap_sass%2Fgesta%2Cfinra_bootstrap_sass%2Fglobal-styling%2Cfinra_bootstrap_sass%2FglossaryViews' +
    '%2Cfinra_bootstrap_sass%2Fopensans%2Cfontawesome%2Ffontawesome.webfonts%2Cfontawesome%2Ffontawesome.webfonts.shim' +
    '%2Cparagraphs%2Fdrupal.paragraphs.unpublished%2Csuperfish%2Fsuperfish%2Csuperfish%2Fsuperfish_hoverintent%2Csuperfish' +
    '%2Fsuperfish_supersubs%2Csuperfish%2Fsuperfish_supposition%2Csystem%2Fbase%2Cviews%2Fviews.ajax%2Cviews%2Fviews.module';

# Destination folder (created if it does not already exist).
$destination = "D:\Data\Temp\Forum\finra";
New-Item -Path $destination -ItemType Directory -Force | Out-Null;

# Invoke the web call. The Drupal AJAX response is an array of commands;
# this script assumes the third one ($data[2]) holds the HTML containing the file links.
$data = Invoke-RestMethod -Method Get -Uri $uri -UseBasicParsing -ErrorAction:Stop;

#region Use BITS to download the files.
# Match each .csv URL separately: excluding quotes, whitespace and angle brackets stops a
# match at the end of its URL, so two links on the same line of HTML are not merged.
$bitsFiles = [regex]::Matches($data[2].data, 'https://[^"''\s<>]+\.csv', [System.Text.RegularExpressions.RegexOptions]::IgnoreCase).Value |
    Select-Object -Unique |
    ForEach-Object {
        # The file name is the last segment of the URL.
        $filename = $_.Split("/")[-1];

        [PSCustomObject] @{
            Source      = $_;
            Destination = Join-Path -Path $destination -ChildPath $filename;
        }
    };

$bitsJobName = "forumExample";
$bitsFiles | Start-BitsTransfer -DisplayName $bitsJobName -ErrorAction:Stop;

# Clean up the BITS job.
Get-BitsTransfer -Name $bitsJobName | Remove-BitsTransfer;
#endregion
Cheers,
Lain
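BITS (Start-BitsTransfer) is only available on Windows. If you are on PowerShell 7 on another platform, or simply prefer not to use BITS, the same list of Source/Destination objects could be downloaded one file at a time with Invoke-WebRequest -OutFile. This is a minimal sketch, not part of the original script; the function name Save-FinraFile is hypothetical, and skipping files that already exist is an assumption about what you want on re-runs.

```powershell
# Hypothetical helper (not a built-in cmdlet): downloads each Source URL to its Destination
# path with Invoke-WebRequest instead of BITS. It accepts pipeline objects that have Source
# and Destination properties, such as the $bitsFiles list built by the script above.
function Save-FinraFile {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)]
        [string] $Source,

        [Parameter(Mandatory, ValueFromPipelineByPropertyName)]
        [string] $Destination
    )

    process {
        # Assumption: a file that already exists was downloaded on a previous run, so skip it.
        if (Test-Path -LiteralPath $Destination) {
            Write-Verbose "Skipping $Destination (already exists)."
            return
        }

        try {
            Invoke-WebRequest -Uri $Source -OutFile $Destination -UseBasicParsing -ErrorAction Stop
            Write-Verbose "Downloaded $Source"
        }
        catch {
            # Report the failure but carry on with the remaining files.
            Write-Warning "Failed to download ${Source}: $($_.Exception.Message)"
        }
    }
}
```

Usage would be `$bitsFiles | Save-FinraFile -Verbose` in place of the Start-BitsTransfer line. Downloads run sequentially, so this is slower than BITS for many files, but it has no Windows-only dependency.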
- manny213 · Brass Contributor
Thank you, LainRobertson.
I didn't know about that API; I will take a look. I agree that the API is a better approach than scraping.
I ran the script and it downloaded the files. Thank you!
- manny213 · Brass Contributor
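I took a look at the website. The files are served from a predictable path, for example:
https://cdn.finra.org/equity/otcmarket/biweekly/shrt20241129.csv
https://cdn.finra.org/equity/otcmarket/biweekly/shrt20241115.csv
so the URLs follow the pattern https://cdn.finra.org/equity/otcmarket/biweekly/shrt*.csv, where * is the file date in yyyyMMdd format. The web server won't expand the * wildcard itself, though, so each file name still has to be built from a date or taken from the list on the page.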
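One way to use that pattern is to generate candidate URLs for a date range and keep only the ones the server actually answers. This is a minimal sketch under several assumptions: files are only published for weekdays (both example dates above are Fridays), the CDN accepts HEAD requests, and a missing file comes back as an error status. The function names Get-FinraShortInterestUrl and Test-FinraUrl are hypothetical. Because the file dates are not fixed calendar days, most candidates will miss, which is why the page-scraping script or the API remains the more practical way to get the list.

```powershell
# Hypothetical helper: builds candidate shrtYYYYMMDD.csv URLs for every weekday in a range.
# Assumption: files are only published for weekdays; many of these URLs will not exist.
function Get-FinraShortInterestUrl {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [datetime] $StartDate,
        [Parameter(Mandatory)] [datetime] $EndDate,
        [string] $BaseUri = 'https://cdn.finra.org/equity/otcmarket/biweekly'
    )

    for ($day = $StartDate.Date; $day -le $EndDate.Date; $day = $day.AddDays(1)) {
        # Skip Saturdays and Sundays.
        if ($day.DayOfWeek -in [DayOfWeek]::Saturday, [DayOfWeek]::Sunday) { continue }

        # Emit e.g. https://cdn.finra.org/equity/otcmarket/biweekly/shrt20241115.csv
        '{0}/shrt{1}.csv' -f $BaseUri, $day.ToString('yyyyMMdd', [cultureinfo]::InvariantCulture)
    }
}

# Hypothetical probe: passes through only the URLs the server answers successfully.
function Test-FinraUrl {
    [CmdletBinding()]
    param([Parameter(Mandatory, ValueFromPipeline)] [string] $Uri)

    process {
        try {
            # A HEAD request avoids downloading the body; an error status (e.g. 404) throws.
            $null = Invoke-WebRequest -Uri $Uri -Method Head -UseBasicParsing -ErrorAction Stop
            $Uri
        }
        catch {
            Write-Verbose "Not found: $Uri"
        }
    }
}
```

For example, `Get-FinraShortInterestUrl -StartDate '2024-01-01' -EndDate '2024-12-31' | Test-FinraUrl` would send a couple of hundred HEAD requests for a single year, so keep the range small or prefer the approaches above.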