🔍Regex Recipes

Extract Hashtags

Extract social media hashtags with Unicode support for international characters.

Pattern

#[\w\u00C0-\u017F]+

Explanation

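Matches # followed by one or more word characters (\w: ASCII letters, digits, and underscore) or characters in the range U+00C0-U+017F, which covers the accented letters of Latin-1 Supplement and Latin Extended-A. Note that this range also contains the symbols × (U+00D7) and ÷ (U+00F7). Add more Unicode ranges for other scripts.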
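
As a rough sketch of what adding a range looks like, the variant below appends the Cyrillic block (U+0400-U+04FF) to the character class. The names latinHashtag and cyrillicHashtag and the sample string are illustrative assumptions, not part of the recipe.

```javascript
// Recipe pattern: ASCII word characters plus U+00C0-U+017F (accented Latin).
const latinHashtag = /#[\w\u00C0-\u017F]+/g;

// Assumed extension: also accept the Cyrillic block (U+0400-U+04FF).
const cyrillicHashtag = /#[\w\u00C0-\u017F\u0400-\u04FF]+/g;

// '\u00E9' is the precomposed "é".
const sample = 'Visit #caf\u00E9 and #Москва';

// match() returns null when nothing matches, so fall back to an empty array.
const latinOnly = sample.match(latinHashtag) ?? [];
const withCyrillic = sample.match(cyrillicHashtag) ?? [];

console.log(latinOnly);    // [ '#café' ]
console.log(withCyrillic); // [ '#café', '#Москва' ]
```

Listing ranges by hand works for one or two scripts; beyond that, a Unicode property class such as \p{L} is usually easier to maintain.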

Examples

Simple
  Input:  #javascript
  Output: ✓ Match

With numbers
  Input:  #web3
  Output: ✓ Match

Multiple
  Input:  Love #coding and #developer
  Output: Matches both

With accents
  Input:  #café
  Output: ✓ Match (with Unicode range)

Not hashtag
  Input:  cost is #50
  Output: ✓ Match (but may be false positive)
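
The cases above can be checked with a short script. This is a minimal sketch that runs the recipe's pattern over the same inputs; the variable names are illustrative.

```javascript
// Pattern from this recipe; the g flag returns every match, not just the first.
const hashtagPattern = /#[\w\u00C0-\u017F]+/g;

// Inputs copied from the examples ('\u00E9' is the precomposed "é").
const inputs = [
  '#javascript',
  '#web3',
  'Love #coding and #developer',
  '#caf\u00E9',
  'cost is #50',
];

// match() returns null when nothing matches, so fall back to an empty array.
const results = inputs.map(input => input.match(hashtagPattern) ?? []);

for (const [i, found] of results.entries()) {
  console.log(`${inputs[i]} -> ${JSON.stringify(found)}`);
}
```

The last input yields #50, which shows that the pattern alone cannot tell a price from a hashtag.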

Code Examples

JavaScript
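// Basic hashtag extraction
const text = 'Loving #JavaScript and #WebDev! #coding';
// match() returns null when nothing matches, so fall back to an empty array
const hashtags = (text.match(/#[\w\u00C0-\u017F]+/g) ?? [])
  .map(tag => tag.substring(1)); // Remove #
// Result: ['JavaScript', 'WebDev', 'coding']

// Broader Unicode support: letters and digits in any script (CJK, Cyrillic, etc.).
// \p{...} requires the u flag; emoji are not matched by \p{L} or \p{N}.
const unicodeHashtags = /#[\p{L}\p{N}_]+/gu;
const tags = text.match(unicodeHashtags) ?? [];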

💡 Tips

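  • Use \p{L} (a Unicode property escape) for full Unicode letter support; in JavaScript it requires the u flag
  • Platform rules: Twitter allows letters, numbers, and underscores, with no spaces or punctuation
  • Enforce a maximum length; limits vary by platform (often 100-140 chars)
  • Normalize to lowercase for comparison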
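
A minimal sketch of how these tips might combine into one helper. The function name extractNormalizedTags, the MAX_TAG_LENGTH value, and the decision to drop digits-only tags are assumptions for illustration; check the target platform's actual rules.

```javascript
// Hypothetical limit; real limits differ by platform.
const MAX_TAG_LENGTH = 100;

function extractNormalizedTags(text) {
  // \p{L} and \p{N} need the u flag; match() returns null when nothing matches.
  const matches = text.match(/#[\p{L}\p{N}_]+/gu) ?? [];
  const tags = matches
    .map(tag => tag.slice(1).toLowerCase())        // drop '#', fold case
    .filter(tag => tag.length <= MAX_TAG_LENGTH)   // length in UTF-16 code units
    .filter(tag => !/^\p{N}+$/u.test(tag));        // assumed rule: skip digits-only tags
  return [...new Set(tags)];                       // de-duplicate, keep first-seen order
}

const normalized = extractNormalizedTags('#Coding #coding #WEB3 costs #50');
console.log(normalized); // [ 'coding', 'web3' ]
```

Lowercasing before de-duplication is what lets #Coding and #coding collapse into one tag.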

⚠️ Common Pitfalls

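  • May match # in non-hashtag contexts (prices like #50, URL fragments, a # in the middle of a word)
  • Different platforms have different rules
  • Unicode support varies by regex engine (\p{L} and \uXXXX escapes are not available everywhere)
  • May need a boundary check for precise matching; \b placed before # matches only when a word character precedes it, which is the opposite of what is wanted, so check the preceding character instead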
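
One way to approach the boundary problem, sketched under the assumption that a hashtag must not directly follow a letter, digit, or underscore. It relies on a negative lookbehind, which current JavaScript engines support but some other regex engines do not; the names looseHashtag and strictHashtag are illustrative.

```javascript
// Loose pattern: matches '#' wherever it appears.
const looseHashtag = /#[\p{L}\p{N}_]+/gu;

// Stricter variant: (?<!...) rejects a '#' that directly follows a word-like character.
const strictHashtag = /(?<![\p{L}\p{N}_])#[\p{L}\p{N}_]+/gu;

const noisy = 'Read docs/page#intro or abc#def, then tag #regex';

const loose = noisy.match(looseHashtag) ?? [];
const strict = noisy.match(strictHashtag) ?? [];

console.log(loose);  // [ '#intro', '#def', '#regex' ]
console.log(strict); // [ '#regex' ]
```

This still accepts cases such as "cost is #50", so a platform-specific filter may be needed on top of it.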