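Client-side script

A script that runs on the client, for example JavaScript.

Server-side script

A script that runs on the server, written with, for example, Java, Node.js, PHP, or C. It provides an interface to users and controls which data they can operate on.

When the client sends a request, the server prepares the data that goes on the page and returns it to the client.

When the client receives the response and it contains client-side script, the browser runs that script before it finishes loading the page.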

Static vs. dynamic

[Image: static vs dynamic] (image from learnwebskill)

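| | Static | Dynamic |
| --- | --- | --- |
| Definition | Built with only HTML, CSS, and JavaScript | Built with a back-end language (server-side script) |
| Database, server | ✖ | ○ |
| Advantages | 1. `Loads faster` than a dynamic site (the page is already HTML, or is converted to HTML at build time). Loading speed is a key factor in how Google evaluates site performance, and poor performance lowers the SEO score. 2. More secure: no requests go to a back end, so there is little to attack | Users can send requests that change the page content, so the `content is flexible` |
| Disadvantages | 1. `Fixed content`: changing the content means editing the HTML. 2. On a large site, one file per page is `hard to maintain` | 1. Switching views requires a request, so the `user experience is worse`. 2. Heavier load on the server |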

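In practice, though, not every site can be cleanly classified as static or dynamic.

Rendering

Rendering refers to when, where, and how a template and data are combined into the final page content.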

Rendering in the context of this series refers to how/when/where template (a preliminary version of markup) and data are combined to create the final markup content of a site.

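Rendering is divided into client-side rendering (CSR) and server-side rendering (SSR).

Client-side rendering (CSR)

CSR means rendering is done entirely by JavaScript on the client, so performance depends heavily on the browser and the device. On mobile devices it can be slower than SSR, because manipulating the DOM is often more expensive than requesting a new page from the server.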

CSR SPAs are Javascript intensive therefore, features and performance depend heavily on the browser and the device. DOM manipulation can often be more computationally expensive than requesting a new page from a server.

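CSR is common in SPAs (single-page applications). These sites usually send requests with AJAX, fetch, or APIs provided by third-party libraries and frameworks.

But CSR does not have to mean SPA; it can also be an MPA (multi-page application), for example a multi-page static site. Likewise, an SPA does not have to be CSR; it can also be universal JavaScript.

Server-side rendering (SSR)

SSR means the client sends a request, the server's template engine (e.g. EJS, Pug) renders the template and returns HTML, and the client loads the HTML it receives.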
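As a rough illustration of what a template engine does on the server, here is a minimal sketch. `renderTemplate`, `escapeHtml`, and the `{{ name }}` placeholder syntax are made up for this example; they are not the syntax or API of EJS or Pug.

```javascript
// Hypothetical stand-in for a template engine such as EJS or Pug.
// It only fills {{ name }} placeholders; real engines do far more.

// Escape characters that have special meaning in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Combine a template with data on the server and return the final markup.
function renderTemplate(template, data) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(data, key) ? escapeHtml(data[key]) : ''
  );
}

// The server would send this string as the response body.
const page = renderTemplate('<h1>{{ title }}</h1><p>{{ body }}</p>', {
  title: 'Products',
  body: 'Rendered on the server',
});
console.log(page);
```

The client receives finished HTML, so it can display the page without running any rendering code of its own.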

Isomorphic / Universal JavaScript

Isomorphic JavaScript and universal JavaScript are the same concept: the same JavaScript code can run on both the client and the server.

Isomorphic JavaScript applications are applications written in JavaScript that can run both on the client and on the server.

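Universal JavaScript can be thought of as a hybrid of SSR and CSR (hydration): the first page is server-rendered and later navigation is client-rendered. This effectively addresses the SEO problem and gives a better user experience than traditional SSR. Nuxt's SSR (universal) mode works this way.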
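The idea of one piece of code serving both sides can be sketched as follows. This is a simplified illustration and not how Nuxt implements hydration; `renderGreeting` and the `user` object are hypothetical.

```javascript
// Shared module: the same function is used by the server and by the browser.
function renderGreeting(user) {
  return `<p id="greeting">Hello, ${user.name}!</p>`;
}

// Environment check: `window` only exists in a browser.
const isServer = typeof window === 'undefined';

const user = { name: 'Ada' }; // hypothetical, trusted data

if (isServer) {
  // On the server, the markup becomes part of the HTML response.
  console.log(renderGreeting(user));
} else {
  // In the browser, the same function updates the page after navigation.
  document.body.innerHTML = renderGreeting(user);
}
```

Because the first response already contains markup, crawlers and users see content before any client-side JavaScript runs.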

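Comparison

A brief summary:

| | SSR | CSR |
| --- | --- | --- |
| SPA | Sites built with Nuxt or Next (actually universal JavaScript) | Sites built with Vue or React |
| MPA | Traditional dynamic pages (on each page switch the server renders the template and returns HTML) | Traditional static pages (SSG: the HTML is generated at build time) |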

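| | SEO | Link preview | Hosting | Maintenance as users grow | Offline support | User experience | Performance |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SSR MPA | 🌸🌸🌸 | 🌸🌸🌸 | 🌸🌸 | 🌸 | 🌸 | 🌸 | 🌸🌸 |
| SSR SPA | 🌸🌸🌸 | 🌸🌸🌸 | 🌸🌸 | 🌸 | 🌸🌸 | 🌸🌸🌸 | 🌸🌸 |
| CSR MPA | 🌸🌸🌸 | 🌸🌸🌸 | 🌸🌸🌸 | 🌸🌸🌸 | 🌸🌸 | 🌸🌸 | 🌸🌸🌸 |
| CSR SPA | 🌸 | 🌸 | 🌸🌸🌸 | 🌸🌸🌸 | 🌸🌸🌸 | 🌸🌸🌸 | 🌸 |

(More 🌸 means easier.)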

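SEO & JavaScript

It is often said that SPAs have poor SEO (because CSR renders with JavaScript), but it is not hopeless. It is fairer to say that it makes SEO harder.

Search engine optimization for Google & Bing

Around 2010, the Google and Bing crawlers were already able to crawl the content of JavaScript pages, and both companies have kept improving that ability since.

In 2019 they published: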

The new evergreen Bingbot simplifying SEO by leveraging Microsoft Edge
Introducing a new JavaScript SEO video series
The new evergreen Googlebot

The 2019 Bing post says that Bing adopted Microsoft Edge as its engine to run JavaScript and render pages, and that this eases SEO for web developers:

Today we’re announcing that Bing is adopting Microsoft Edge as the Bing engine to run JavaScript and render web pages. Doing so will create less fragmentation of the web and ease Search Engines Optimization (SEO) for all web developers.

In 2008 the Google crawler could already crawl JavaScript page content in a rudimentary way, and the 2019 Google post says the crawler's rendering engine was upgraded:

we are happy to announce that Googlebot now runs the latest Chromium rendering engine (74 at the time of this post) when rendering pages for Search.
Moving forward, Googlebot will regularly update its rendering engine to ensure support for latest web platform features.
Compared to the previous version, Googlebot now supports 1000+ new features, like:

・ ES6 and newer JavaScript features
・ IntersectionObserver for lazy-loading
・ Web Components v1 APIs

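What the crawler does

(Image from Google Search Central)

Googlebot is the crawler of Google Search.

Processing a JavaScript site takes roughly three steps: crawling, rendering, and indexing.

Before Googlebot fetches a URL, it first reads the robots.txt in the site root to check whether crawling is allowed. If the URL is marked Disallow, Googlebot skips it.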

The disallow directive specifies paths that must not be accessed by the crawlers

Google can't index the content of pages which are disallowed for crawling, but it may still index the URL and show it in search results without a snippet.
```
# Googlebot and AdsBot-Google may not crawl any URL that starts with https://tempura327.github.io/The-F2E-tourism/
# Paths in Disallow rules are case-sensitive

User-agent: Googlebot
User-agent: AdsBot-Google
Disallow: /The-F2E-tourism/
```
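To make the prefix and case-sensitivity behavior concrete, here is a minimal sketch of how a Disallow rule could be matched. `isDisallowed` is a hypothetical helper; it assumes plain path prefixes only and ignores wildcards and Allow rules, which real robots.txt parsers also handle.

```javascript
// Simplified check: a path is blocked when it starts with any Disallow value.
// Assumption: no wildcards and no Allow rules.
function isDisallowed(path, disallowRules) {
  return disallowRules.some((rule) => rule !== '' && path.startsWith(rule));
}

const rules = ['/The-F2E-tourism/'];
console.log(isDisallowed('/The-F2E-tourism/index.html', rules)); // true
console.log(isDisallowed('/the-f2e-tourism/index.html', rules)); // false: case differs
console.log(isDisallowed('/other/', rules)); // false
```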
Next it parses the HTML, looks for URLs in the href attribute of a tags, and adds them to the crawl queue. Add rel="nofollow" to links you do not want crawled.

Use the nofollow value when other values don't apply, and you'd rather Google not associate your site with, or crawl the linked page from, your site. For links within your own site, use the robots.txt disallow rule.
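```
<!-- Ask Google not to follow this link -->
<a rel="nofollow" href="https://...">Foo</a>

<!-- Exclude this content from search result snippets -->
<div data-nosnippet>not in snippet</div>
```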
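The link-collection step can be sketched roughly as below. This is not how Googlebot is implemented: `collectCrawlableLinks` is a hypothetical helper that uses a naive regular expression and assumes double-quoted attributes, whereas a real crawler uses a full HTML parser. The example.com URLs are placeholders.

```javascript
// Collect href values from <a> tags, skipping links marked rel="nofollow".
function collectCrawlableLinks(html) {
  const queue = [];
  const anchorPattern = /<a\b([^>]*)>/gi;
  for (const [, attrs] of html.matchAll(anchorPattern)) {
    const href = /\bhref\s*=\s*"([^"]*)"/i.exec(attrs);
    const rel = /\brel\s*=\s*"([^"]*)"/i.exec(attrs);
    // rel can hold several space-separated tokens, e.g. "nofollow noopener".
    const relTokens = rel ? rel[1].toLowerCase().split(/\s+/) : [];
    if (href && !relTokens.includes('nofollow')) {
      queue.push(href[1]);
    }
  }
  return queue;
}

const sampleHtml =
  '<a href="https://example.com/a">A</a>' +
  '<a rel="nofollow" href="https://example.com/b">B</a>';
console.log(collectCrawlableLinks(sampleHtml));
```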

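Pages that have gone through the steps above are put in the render queue.

If you do not want a page indexed, add noindex in a robots meta tag. Googlebot does not queue such a page for rendering.

<!-- Do not show this page, media, or resource in search results -->
<meta name="googlebot" content="noindex">

<!-- Do not show a text snippet or video preview for this page in search results. A static image thumbnail may still be shown if one is available and it improves the user experience -->
<meta name="googlebot-news" content="nosnippet">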

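Rendering
- Ordinary static sites and SSR sites
  If the HTML contains no script, the rendering engine renders the page directly
  
  If the HTML contains script, the script is executed before rendering. An external script has to be downloaded by the crawler first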
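```
<!-- Inline script: executed directly -->
<script>
 function foo(){
   return 'foo';
 }

 foo();
</script>

<!-- External script: must be downloaded first -->
<script type="text/javascript" src="https://www./.../index.js"></script>
```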
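  External scripts, however, can run into crawl budget limits
  
  Crawl budget is the number of times a site can be crawled within a period of time without causing problems or degrading the experience of its users
  
  Search engines calculate it from how much load the site's server can handle and from crawl demand (which depends on the site's popularity and staleness). The number is never unlimited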
  
  Googlebot is designed to be a good citizen of the web. Crawling is its main priority, while making sure it doesn't degrade the experience of users visiting the site. We call this the "crawl rate limit," which limits the maximum fetching rate for a given site.
  
  ・ Crawl health: If the site responds really quickly for a while, the limit goes up, meaning more connections can be used to crawl. If the site slows down or responds with server errors, the limit goes down and Googlebot crawls less.
  
  ・ Limit set in Search Console: Website owners can reduce Googlebot's crawling of their site. Note that setting higher limits doesn't automatically increase crawling.
  
  Even if the crawl rate limit isn't reached, if there's no demand from indexing, there will be low activity from Googlebot.
  
  ・ Popularity: URLs that are more popular on the Internet tend to be crawled more often to keep them fresher in our index.
  
  ・ Staleness: Our systems attempt to prevent URLs from becoming stale in the index.
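- SPA
  JavaScript has to render the page first
  
  A potential risk at this stage is that the site's JS files are incompatible with the JavaScript engine the crawler uses. For Google, though, the 2019 post says the rendering engine is updated regularly, so this should not be a big concern

Indexing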

Google's SEO guidelines

Understand the JavaScript SEO basics lists several practices that hurt or help SEO

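Fragment URLs
When Googlebot looks for links in a page, it only considers the href attribute of `<a>` elements. Avoid fragment URLs for loading page content: the fragment is never sent to the server, so the crawler will not crawl it
```

<a href="#/chapter1">link</a>
```
The History API can be used as an alternative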
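The difference between the two URL styles can be seen with the standard `URL` class. In this small sketch, `requestTarget` is a hypothetical helper and example.com is a placeholder host.

```javascript
// What the server receives for a URL: path and query only, never the fragment.
function requestTarget(href) {
  const url = new URL(href);
  return url.pathname + url.search;
}

// Fragment-style route: the server only ever sees "/".
console.log(requestTarget('https://example.com/#/chapter1'));
// History-API-style route: the server sees the real path "/chapter1".
console.log(requestTarget('https://example.com/chapter1'));

// In a browser, a router can produce the second form without a full reload:
// history.pushState({}, '', '/chapter1');
```

With path-based URLs every view has an address that a crawler can request on its own, which is why they are friendlier to SEO than fragment routes.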

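Use meaningful HTTP status codes

Do not use soft 404s in an SPA. A soft 404 is a URL that returns a page telling the user the page does not exist while the response status is 200.

A soft 404 makes an error page look like a real, live page, and Google Search excludes such pages.

As an alternative, redirect the user to a URL for which the server returns 404 Not Found, or, when the page has no content, add a robots meta tag with content set to noindex.

If Googlebot encounters noindex in a robots meta tag before it runs the page's JavaScript, it neither renders nor indexes the page.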

If Google encounters the noindex tag, it skips rendering and JavaScript execution. Because Google skips your JavaScript in this case, there is no chance to remove the tag from the page.

 fetch(`/api/products/${productId}`)
   .then(response => response.json())
   .then(product => {
     if(product.exists) {
       showProductDetails(product); // shows the product information on the page
     } else {
       // this product does not exist, so this is an error page.
       // Note: This example assumes there is no other meta robots tag present in the HTML.
       const metaRobots = document.createElement('meta');
       metaRobots.name = 'robots';
       metaRobots.content = 'noindex';
       document.head.appendChild(metaRobots);
     }
   })
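
For the other alternative in this section, having the server answer with a real 404 status, a minimal sketch could look like this. `respondToProductRequest` and the in-memory `catalog` are hypothetical; a real server would pass the status and body to its HTTP framework.

```javascript
// Hypothetical in-memory catalog standing in for a database.
const catalog = new Map([['42', { name: 'Teapot' }]]);

// Decide the status code and body for a product URL.
function respondToProductRequest(productId) {
  const product = catalog.get(productId);
  if (!product) {
    // A real 404 status tells crawlers that no page exists at this URL.
    return { status: 404, body: '<h1>Product not found</h1>' };
  }
  return { status: 200, body: `<h1>${product.name}</h1>` };
}

console.log(respondToProductRequest('42').status);
console.log(respondToProductRequest('unknown').status);
```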