Regular Expressions - lookaround


Posted by TempuraEngineer on 2022-04-27


What is lookaround

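Lookaround is a kind of zero-width assertion: it checks a position without consuming any characters. Unlike the boundary assertions ^, $, and \b, whose meaning is fixed, a lookaround tests whatever condition you write inside it. There are 4 kinds:

  1. lookahead + positive: T(?=C) → what comes after the [target] must match the "condition"
  2. lookbehind + positive: (?<=C)T → what comes before the [target] must match the "condition"
  3. lookahead + negative: T(?!C) → what comes after the [target] must not match the "condition"
  4. lookbehind + negative: (?<!C)T → what comes before the [target] must not match the "condition"

Lookarounds can also be nested or combined, for example /(A(?=B))(?=C)/ and /A(?=B(?=C))/. In the first form both conditions are tested at the same position (right after A); in the second, C is tested after B.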
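To see the four forms side by side, here is a minimal sketch. The sample text and variable names are made up for illustration, and the lookbehind forms assume an engine with ES2018 lookbehind support.

```javascript
// Made-up sample text, chosen only to exercise each form.
const text = 'price: $100, cost: 200';

// 1. Positive lookahead: digits that are followed by a comma.
const followedByComma = text.match(/\d+(?=,)/g);      // ['100']
// 2. Positive lookbehind: digits that are preceded by "$".
const afterDollar = text.match(/(?<=\$)\d+/g);        // ['100']
// 3. Negative lookahead: a run of digits not followed by a digit or a comma.
const notBeforeComma = text.match(/\d+(?![\d,])/g);   // ['200']
// 4. Negative lookbehind: a run of digits not preceded by a digit or "$".
const notAfterDollar = text.match(/(?<![\d$])\d+/g);  // ['200']

// Zero width: the condition is checked but never becomes part of the match.
const onlyA = 'ABC'.match(/A(?=B)/)[0];               // 'A'

// Nested: "A", then "B", and that "B" must itself be followed by "C".
const nestedHit = /A(?=B(?=C))/.test('ABC');          // true
const nestedMiss = /A(?=B(?=C))/.test('ABD');         // false

// Chained: both lookaheads are tested at the same position (right after "A"),
// so with literal letters the next character would have to be "B" and "C" at once.
const chainedLiteral = /(A(?=B))(?=C)/.test('ABC');   // false
// With compatible conditions the chained form can succeed.
const chainedCompatible = /(A(?=B))(?=\w{2})/.test('ABC'); // true
```

The extra `\d` inside the negative forms keeps the engine from backtracking into the middle of a number, which is a common surprise with negative lookaround.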


positive + lookahead

const insult = 'Adam is such a asshole!';
const praise = 'Adam is such a genius!';

const aheadPositive = /such\s(?=a\sasshole)/g; // if [such ] is followed by "a asshole", replace it with "not "

insult.replace(aheadPositive, 'not '); // 'Adam is not a asshole!'
praise.replace(aheadPositive, 'not '); // 'Adam is such a genius!' (unchanged)
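The difference between a lookahead and simply writing the condition into the pattern shows up most clearly with replace(). The following is a small sketch with a made-up replacement word; it only illustrates that the looked-ahead text is checked but not consumed.

```javascript
const sentence = 'Adam is such a genius!';

// Consuming version: " a genius" is part of the match, so replace() overwrites it too.
const consumed = sentence.replace(/such\sa\sgenius/, 'truly');
// consumed === 'Adam is truly!'

// Lookahead version: only "such" is matched; " a genius" is checked and left in place.
const lookedAhead = sentence.replace(/such(?=\sa\sgenius)/, 'truly');
// lookedAhead === 'Adam is truly a genius!'

// The match object confirms it: the matched text is just "such".
const m = sentence.match(/such(?=\sa\sgenius)/);
// m[0] === 'such', m.index === 8
```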


positive + lookbehind

const insult = 'Xi steamed bun, you son of bitch. Suck my dick';
const para = 'Our steamed bun is on sale now! Come to buy 3 and get 1 for free!';
const ad = 'Does steamed bun your favorite? Come to Mother Zai cooking school. We can teach you make delicious steamed bun~';

const behindPositive = /(?<=Xi\s)steamed bun/g; // if [steamed bun] is preceded by "Xi ", replace steamed bun with **

insult.replace(behindPositive, '**'); // 'Xi **, you son of bitch. Suck my dick'
para.replace(behindPositive, '**'); // unchanged
ad.replace(behindPositive, '**'); // unchanged
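Lookbehind was added to JavaScript in ES2018, so on older engines the same effect is usually achieved with a capture group. Here is a hedged sketch of both approaches; the log line and the `token=` key are invented for the example.

```javascript
// Made-up log line; "token=" is a hypothetical key used only for illustration.
const log = 'user=alice token=abc123 retry=3';

// Lookbehind: match only the value that comes right after "token=".
const viaLookbehind = log.replace(/(?<=token=)\w+/g, '***');

// Capture-group fallback: match "token=" too, then put it back with $1.
const viaCapture = log.replace(/(token=)\w+/g, '$1***');

// Both produce 'user=alice token=*** retry=3'
```

The lookbehind version keeps the replacement string simple, while the capture-group version works where lookbehind is unavailable.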


negative + lookahead

const Emma = 'E221930121';
const Allen = 'E129377814';
const Joe = 'A121916907';

const isFemale = (id) => {
    const aheadNegative = /[A-Z](?![1]{1}[0-9]{8})/; // the [letter] is not followed by "9 digits starting with 1"
    return aheadNegative.test(id);
}

isFemale(Emma); // true
isFemale(Allen); // false
isFemale(Joe); // false
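One thing to keep in mind with a negative lookahead is that an unanchored pattern succeeds as soon as any position satisfies it. The sketch below compares the pattern above with a stricter variant; `strict` and the malformed input are assumptions added for illustration, not part of the original check.

```javascript
// Same behaviour as the pattern above ([1]{1} is just 1).
const loose = /[A-Z](?!1[0-9]{8})/;
// Hypothetical stricter variant: one letter, then exactly 9 digits, the first of which is not 1.
const strict = /^[A-Z](?!1)[0-9]{9}$/;

// Made-up malformed input with two leading letters.
const malformed = 'AE129377814';

const looseOnMalformed = loose.test(malformed);   // true: "A" is followed by "E", not by 1 + 8 digits
const strictOnMalformed = strict.test(malformed); // false: the shape is wrong

const strictEmma = strict.test('E221930121');     // true
const strictAllen = strict.test('E129377814');    // false
```

Anchoring with ^ and $ makes the negative condition apply to the one position you care about instead of to any letter in the string.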


negative + lookbehind

const Emma = 'E221930121';
const Allen = 'E129377814';
const Joe = 'A121916907';

const isNotFromKaohsiung = (id) => {
    const behindNegative = /(?<![E])[1-2]{1}[0-9]{8}/; // [9 digits starting with 1 or 2] is not preceded by "E"
    return behindNegative.test(id);
}

isNotFromKaohsiung(Emma); // false
isNotFromKaohsiung(Allen); // false
isNotFromKaohsiung(Joe); // true
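The same caveat applies to negative lookbehind: without anchors the engine may slide to a later position where the condition happens to hold. The following sketch uses a made-up 10-digit input and a hypothetical anchored variant to show the effect.

```javascript
// Same pattern as above.
const loose = /(?<![E])[1-2]{1}[0-9]{8}/;
// Hypothetical anchored variant: one letter that is not "E", then 1 or 2, then 8 digits.
const strict = /^[A-Z](?<!E)[12][0-9]{8}$/;

// Made-up malformed input: starts with "E" but has 10 digits.
const tooLong = 'E1234567890';

// The "1" is preceded by "E" and is rejected, but the engine moves on:
// "234567890" is 9 digits starting with 2 and is preceded by "1", so it matches.
const looseOnTooLong = loose.test(tooLong);   // true
const strictOnTooLong = strict.test(tooLong); // false

const strictJoe = strict.test('A121916907');  // true
const strictEmma = strict.test('E221930121'); // false

// The lookbehind text is never part of the match itself.
const matched = 'A121916907'.match(loose)[0]; // '121916907'
```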


Advanced

(Updated 2023/11/3)

Problem & information

Below are several valid URL paths. I want to extract each path up to, at most, the create or edit segment.

  • /xxx
  • /xxx/aaa
  • /xxx/action
  • /xxx/aaa/action
  • /yyy-yyy/bbb/action
  • /xxx/action/iii123
  • /xxx/action/123iii
  • /xxx/bbb/action/iii123
  • /zzz/ccc/action/iii123/iii123

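From the paths above, a few rules can be derived

  1. The smallest unit that identifies a page category is "one / followed by one of xxx, yyy, yyy-yyy, zzz, aaa, bbb, ccc"
    1. It always comes at the start
    2. There are 1 or more of them
  2. The action can only be /create or /edit
    1. It always comes right after the categories
    2. There are 0 or 1 of them
  3. The id in /id is a mix of letters (upper and lower case) and digits
    1. It always comes at the end
    2. There are 0 or more of them
  4. An id is always preceded by an action, but an action is not necessarily followed by an id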
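The four rules can also be written down directly as an anchored check, which is handy for testing sample paths before building the extraction regex. This is only a sketch: `isValidPath` and the part names are hypothetical helpers introduced here, and it validates whole paths rather than extracting a prefix.

```javascript
// Rule 1: one "/" plus a known category (longer alternative listed first).
const categoryPart = '/(?:xxx|yyy-yyy|yyy|zzz|aaa|bbb|ccc)';
// Rule 2: the action is /create or /edit.
const actionPart = '/(?:create|edit)';
// Rule 3: an id is "/" plus letters and digits.
const idPart = '/[a-zA-Z0-9]+';

// Rule 4: ids may only appear after an action, so they sit inside the optional action group.
const validPath = new RegExp(`^(?:${categoryPart})+(?:${actionPart}(?:${idPart})*)?$`);

// Hypothetical helper name, used only in this example.
const isValidPath = (path) => validPath.test(path);

isValidPath('/xxx');                          // true
isValidPath('/xxx/aaa/create');               // true
isValidPath('/zzz/ccc/edit/iii123/iii123');   // true
isValidPath('/xxx/iii123');                   // false: id without an action
isValidPath('/create');                       // false: no category at the start
```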


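Solution

The thinking can be broken into a few steps

  1. From rule 1, the page category is /(xxx|yyy-yyy|yyy|zzz|aaa|bbb|ccc); adding 1-2 turns it into (/(xxx|yyy-yyy|yyy|zzz|aaa|bbb|ccc))+
  2. From rule 2, the action is /(create|edit); adding 2-2 turns it into (/(create|edit))?
  3. From rule 3, the id is /[a-zA-Z0-9]+; adding 3-2 turns it into (/[a-zA-Z0-9]+)*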

      const pageType = `(/(xxx|yyy-yyy|yyy|zzz|aaa|bbb|ccc))+`;
      const action = `(/(create|edit))?`;
      const id = `(/[a-zA-Z0-9]+)*`;
    
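  4. From rule 4 plus 2-1 and 2-2, the relationship between the page category and the action is "a page category is followed by an action, or by nothing", so it becomes (/(xxx|yyy-yyy|yyy|zzz|aaa|bbb|ccc))+(?=(/(create|edit))?)

  5. From rule 4, the relationship between the action and the id is "an action is followed by an id, or by nothing", so it becomes ((/(create|edit))?(?=(/[a-zA-Z0-9]+)*))
  6. Combining the expressions from the previous two steps gives ((/(xxx|yyy-yyy|yyy|zzz|aaa|bbb|ccc))+(?=((/(create|edit))?)(?=((/[a-zA-Z0-9]+)*)))). Text matched inside a lookahead is not part of the result, so the code below consumes the action with (action)? and keeps the lookahead only for the id.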

// Keep all the routes together here
enum Routes {
    XXX = 'xxx',
    YYY = 'yyy',
    YYY_YYY = 'yyy-yyy',
    ZZZ = 'zzz',
    AAA = 'aaa',
    BBB = 'bbb',
    CCC = 'ccc'
}

// "action" in the list of paths above stands for create or edit
const paths = [
  '/xxx',
  '/xxx/aaa',
  '/xxx/create',
  '/xxx/aaa/create',
  '/yyy-yyy/bbb/create',
  '/xxx/create/iii123',
  '/xxx/edit/123iii',
  '/xxx/bbb/edit/iii123',
  '/zzz/ccc/create/iii123/iii123',
];

// YYY_YYY must come before YYY, otherwise YYY matches first and the match stops early
const pageType = `(/(${Routes.XXX}|${Routes.YYY_YYY}|${Routes.YYY}|${Routes.ZZZ}|${Routes.AAA}|${Routes.BBB}|${Routes.CCC}))+`;
const action = `/(create|edit)`;
const id = `(/[a-zA-Z0-9]+)*`;

// Split into three parts so the complex regex is easier to read
const entireRegex = new RegExp(`${pageType}(${action})?(?=(${id})*)`);

console.log(paths.map((p) => p.match(entireRegex)?.[0]));
// ["/xxx", "/xxx/aaa", "/xxx/create", "/xxx/aaa/create", "/yyy-yyy/bbb/create", "/xxx/create", "/xxx/edit", "/xxx/bbb/edit", "/zzz/ccc/create"]
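The remark about putting yyy-yyy before yyy can be checked in isolation. Below is a minimal sketch using a reduced pattern with only those two alternatives; the variable names and the sample path are made up. It also shows, as a side observation, that a lookahead whose contents can match the empty string always succeeds, so it does not change what is extracted.

```javascript
// Reduced patterns with only the two overlapping alternatives.
const longerFirst = /^(?:\/(?:yyy-yyy|yyy))+/;
const shorterFirst = /^(?:\/(?:yyy|yyy-yyy))+/;

const samplePath = '/yyy-yyy/yyy';

// Alternatives are tried left to right, and nothing after the group forces a retry,
// so the shorter alternative wins and the match stops at "-".
const withLongerFirst = samplePath.match(longerFirst)[0];   // '/yyy-yyy/yyy'
const withShorterFirst = samplePath.match(shorterFirst)[0]; // '/yyy'

// A lookahead that may match nothing never rejects a position.
const withLookahead = /^\/xxx(?:\/(?:create|edit))?(?=(?:\/[a-zA-Z0-9]+)*)/;
const withoutLookahead = /^\/xxx(?:\/(?:create|edit))?/;
const a = '/xxx/create/iii123'.match(withLookahead)[0];     // '/xxx/create'
const b = '/xxx/create/iii123'.match(withoutLookahead)[0];  // '/xxx/create'
```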


References

Learn Regular Expressions in Fifteen Minutes (十五分鐘認識正規表達式)


#lookaround #regexp #javascript






