Ruby / Strings and symbols
From WhyNotWiki
Aliases: Ruby / Strings, Ruby / Symbols
See also: Ruby / Regular expressions
[edit] Strings and symbols: Strings
[edit] Searching (also has [Regexp (category)])
irb -> "abcde"["abc"]
=> "abc"
irb -> "abcde"[/.b./]
=> "abc"
irb -> "abcde".match /.b./
=> #<MatchData:0xb7eed6c4>
irb -> "abcde".match(/.b./)[0]
=> "abc"
irb -> "abcde"["z"]
=> nil
irb -> "<div><div>Contents</div></div>"[%r{<div>(.*)</div>}]
=> "<div><div>Contents</div></div>"
irb -> "<div><div>Contents</div></div>"[%r{<div>(.*)</div>}, 1]
=> "<div>Contents</div>"
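The last two results come from (.*) being greedy: it matches as far as the last </div>. A minimal sketch of the difference between greedy and lazy capture, using the same string (the variable names are just for illustration):

```ruby
html = "<div><div>Contents</div></div>"

# Greedy: (.*) grabs as much as possible, so it runs up to the last </div>.
greedy = html[%r{<div>(.*)</div>}, 1]

# Lazy: (.*?) stops at the first </div> it can reach.
lazy = html[%r{<div>(.*?)</div>}, 1]

p greedy
p lazy
```

Neither form really understands nesting, so this is only suitable for quick searches, not for parsing HTML.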
[edit] Substrings (slice/[])
Unfortunately, in Ruby 1.8, [] returns a character code rather than a substring when you pass a single index instead of a range of indexes (Ruby 1.9 and later return a one-character string):
irb -> "abc"[0..2]
=> "abc"
irb -> "abc"[0..0]
=> "a"
irb -> "abc"[-1..-1]
=> "c"
But:
irb -> "abc"[-1]
=> 99
(not "c")
Another workaround (-1..-1 was the first workaround):
irb -> "abc"[-1].chr
=> "c"
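If the code has to behave the same on Ruby 1.8 and on later versions, the forms that take a range or a start index plus a length should be safe, since they return a substring everywhere. A small sketch (variable names are illustrative):

```ruby
s = "abc"

last_by_range  = s[-1..-1]  # range form, as in the first workaround
last_by_length = s[-1, 1]   # start index plus length

p last_by_range
p last_by_length
```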
[edit] Delimiters / Different ways to delimit a string literal
[edit] String interpolation
You can even nest #{} inside of #{}!
p "#{field} = #{ object.send("#{field}") } !"
[edit] %q{}, %Q{}, %q<>, etc.
Choose your own delimiter ({}, (), <>, [], ||, whatever)!
%Q{} allows string interpolation; %q{} does not.
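A quick sketch of that difference. The variable name and the strings are made up for the example:

```ruby
name = "world"

interpolated = %Q{Hello, #{name}!}  # behaves like a double-quoted string
literal      = %q{Hello, #{name}!}  # behaves like a single-quoted string

# Balanced delimiters may appear inside the literal without escaping.
nested = %q{outer {inner} outer}

puts interpolated
puts literal
puts nested
```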
[edit] Can be nested!
Can be useful for metaprogramming, or just building large strings...
['stdout', 'stderr'].each do |stream_name|
eval(%Q{
class Test_#{stream_name} < Test::Unit::TestCase
def setup
$#{stream_name} = StringIO.new
end
def test_simple_filter
filter_#{stream_name}(lambda{|input| ''}) do
noisy_command_#{stream_name}
end
assert_equal '', $#{stream_name}.string
end
end
})
end
Question: If you nest one string inside of the other, how do you control in which one the string interpolation happens?
Answer: By escaping the { characters, of course!
Interpolate now:
irb -> $a = 'test'
irb -> puts %Q{
" puts %Q{
" #{$a}
" }
" }
puts %Q{
test
}
=> nil
Interpolate later:
irb -> a = %Q{
" puts %Q{
" #\{$a\}
" }
" }
=> "\n puts %Q{\n \#{$a}\n }\n"
irb -> eval(a)
test
=> nil
[edit] Crazy powerful kung-fu heredoc syntax
[edit] To allow your terminating delimiter to be indented
irb -> <<-WayOutHere
" la dee da
" la dee da
" WayOutHere
=> "la dee da\n la dee da\n"
If you just use <<, an indented delimiter is treated as part of the string. It is only recognized as the delimiter when it starts in the first column (no indenting).
irb -> <<WayOutHere
" WayOutHere
" WayOutHere
=> " WayOutHere\n"
[edit] To disable string interpolation
(example from Phrogs on ruby-talk at 2007-01-17 08:55)
Do this:
b = <<'FOO'
b#{1+1}
FOO
instead of this:
a = <<FOO
a#{1+1}
FOO
[edit] Can start heredoc in the middle of an expression, finish the rest of your expression, and then continue with the string
Kind of strange, but cool!
Example (mine):
irb -> /^=begin[ \t\f]*#{b=''}.*?\n(.*?)\n=end/mi.match(<<End )[1]
" =begin
" require 'foo'
" foo
" =end
" End
=> "require 'foo'\nfoo"
Example from http://ruby-doc.org/core/classes/ERB.html
def build
b = binding
# create and run templates, filling member data variables
ERB.new(<<-'END_PRODUCT'.gsub(/^\s+/, ""), 0, "", "@product").result b
<%= PRODUCT[:name] %>
<%= PRODUCT[:desc] %>
END_PRODUCT
ERB.new(<<-'END_PRICE'.gsub(/^\s+/, ""), 0, "", "@price").result b
<%= PRODUCT[:name] %> -- <%= PRODUCT[:cost] %>
<%= PRODUCT[:desc] %>
END_PRICE
end
Example:
puts Subversion.help(subcommand).gsub(<<End, '')
Subversion is a tool for version control.
For additional information, see http://subversion.tigris.org/
End
... makes for nicer syntax than
puts Subversion.help(subcommand).gsub(<<End
Subversion is a tool for version control.
For additional information, see http://subversion.tigris.org/
End
, '')
In fact, that syntax isn't even valid!
syntax error, unexpected ',', expecting ')' (SyntaxError)
, '')
^
Nor is this:
puts Subversion.help(subcommand).gsub(<<End
Subversion is a tool for version control.
For additional information, see http://subversion.tigris.org/
End, '')
can't find string "End" anywhere before EOF (SyntaxError)
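The rule behind all of these cases seems to be: the heredoc body starts on the line after the one containing the <<End marker, and the rest of that first line is parsed as ordinary code. A minimal sketch that pushes this a bit further by starting two heredocs on the same line (the names ONE, TWO and bodies are arbitrary):

```ruby
# Both heredocs are opened on this line; the expression is finished here too.
# The bodies follow, in order, on the next lines.
bodies = [<<-ONE, <<-TWO].map { |text| text.strip }
  body of the first heredoc
ONE
  body of the second heredoc
TWO

p bodies
```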
[edit] padding a string
"hello".rjust(20, " ") #=> "               hello"
[edit] Indenting / Changing tab/indent
[edit] Removing indent
Let's say I want to remove the indent/leading-line-spaces from a multi-line string...
irb -> require 'facets/core/string/margin'
irb -> require 'facets/core/string/indent'
irb -> class String; def rchomp; self.gsub(/\A\n/, ''); end; end
irb -> input = %(
" line1
" line2
" ).rchomp
=> " line1\n line2\n"
irb -> puts input.margin
ine1
ine2
# Not what I wanted!
irb -> puts input.indent(-2) # Unindent by 2 spaces
line1
line2
# Good!
irb -> puts input.tab(0) # Replace any existing leading-line-spaces with 0 spaces.
line1
line2
# Good!
irb -> input = %(
" line1
" line2
" ).rchomp
=> " line1\n line2\n"
irb -> puts input.tab(0) # Replace any existing leading-line-spaces with 0 spaces.
line1
line2
=> nil
# Not quite what I wanted! I wanted line 2 to be ' line2'.
irb -> puts input.indent(-2) # Unindent by 2 spaces
line1
line2
# Yes, like that!
irb -> puts input.indent(2)
line1
line2
=> nil
irb -> puts input.tab(4)
line1
line2
=> nil
[edit] Processing a string one character at a time
irb -> "tyler".scan(/./) {|l| p l }
"t"
"y"
"l"
"e"
"r"
[edit] Checksums
irb -> "tyler".sum
=> 560
irb -> a = []; "tyler".each_byte {|l| a << l }; a.inject {|sum, i| sum + i}
=> 560
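To make explicit what the two lines above compute, here is a sketch of a hand-rolled byte checksum. The helper name byte_checksum is hypothetical, and the modulo step reflects my understanding that String#sum keeps only the low 16 bits by default:

```ruby
# Hypothetical helper: adds up the bytes of str and keeps the low `bits` bits.
def byte_checksum(str, bits = 16)
  total = 0
  str.each_byte { |byte| total += byte }
  total % (2**bits)
end

p byte_checksum("tyler")
p "tyler".sum
```

This is a very weak checksum (reordering the characters gives the same value), so it is only good for quick sanity checks.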
[edit] How do I capitalize the first letter? (the equivalent of ucfirst in PHP)
irb -> "hi there".capitalize
=> "Hi there"
irb -> "hi there".upcase
=> "HI THERE"
# Destructive modification?
irb -> original = "hi there"; new = original.dup; new.capitalize; original + " => " + new
=> "hi there => hi there"
irb -> original = "hi there"; new = original.dup; new.capitalize!; original + " => " + new
=> "hi there => Hi there"
[edit] How do I capitalize the first letter of each word? (the equivalent of ucwords in PHP)
I want to be able to do this:
irb -> "hi there".ucwords
=> "Hi There"
[edit] String#capitalize_all [Ruby Facets (category)]
http://facets.rubyforge.org/src/doc/rdoc/core/classes/String.html#M000904
capitalize_all( pattern=$;, *limit )
Capitalize all words (or other patterned divisions) of a string.
"this is a test".capitalize_all #=> "This Is A Test"
[edit] Another implementation
If I had to implement it, I would first make a change_each_word(!) iterator, and then build capitalize_each_word(!) on top of that.
# TODO: move to quality_extensions
#require 'facets/string/partitions' # Facets 2.0?
require 'facets/core/string/each_word'
require 'qualitysmith_extensions/enumerable/enum' # Future version of Ruby?: obj.enum_for(method = :each, *args)
# irb -> s = 'anthony john doe'; s.change_each_word! {|a| a.capitalize}; s
# => "Anthony John Doe"
class String
def change_each_word(&block)
self.dup.change_each_word!(&block)
end
def change_each_word!
each_word do |value, range|
self[range] = (yield value)
end
end
end
class String
def capitalize_each_word!
change_each_word! do |word|
word.capitalize
end
end
alias_method :ucwords!, :capitalize_each_word!
end
# irb -> s = 'anthony john doe'; s.map_each_word {|a| a.capitalize}
# => ["Anthony", "John", "Doe"]
class String
def map_each_word
enum(:each_word).map do |value, range|
yield value
end
end
end
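For comparison, a dependency-free sketch that needs neither Facets nor qualitysmith_extensions. It is written as a plain method instead of a String extension, and the name capitalize_each_word is reused here only for illustration:

```ruby
# Capitalizes every run of non-whitespace characters and leaves the
# whitespace between words exactly as it was.
def capitalize_each_word(str)
  str.gsub(/\S+/) { |word| word.capitalize }
end

puts capitalize_each_word("hi there")
puts capitalize_each_word("anthony  john doe")
```

Note that String#capitalize also downcases the rest of each word, so "hi tHERE" becomes "Hi There".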
[edit] [Caveats (category)] [Built-in behavior is wrong (category)] s.downcase! returns nil rather than s when nothing changes!
You tell me: is this behavior intuitive?:
irb -> 'd'.downcase!
=> nil
irb -> ['d'].include? 'd'
=> true
irb -> ['d'].include? 'd'.downcase
=> true
# But!
irb -> ['d'].include? 'd'.downcase!
=> false
irb -> 'd'.downcase!
=> nil
I find that unintuitive.
What's more, it causes some obfuscation in order to "work around" this unwanted behavior.
Example:
irb -> response = ""
=> ""
irb -> response = $stdin.getc.chr while !['a', 'd', 'i', "\n"].include?(response.downcase!)
d
d
d
^DIRB::Abort: abort then interrupt!!
from /usr/lib/ruby/1.8/irb.rb:81:in `irb_abort'
from /usr/lib/ruby/1.8/irb.rb:243:in `signal_handle'
from /usr/lib/ruby/1.8/irb.rb:66:in `start'
from (irb):16:in `call'
from (irb):16:in `getc'
from (irb):16
from :0
I had to Control-D out of the loop because the exit condition was never being satisfied. Specifically, my input, 'd', was already lowercase, so response.downcase! changed nothing and returned nil, which is not in the list of valid inputs, so it kept looping, hoping that maybe next time I'd enter something "more valid".
A workaround (that obfuscates mildly):
irb -> response = ""
=> ""
irb -> response = $stdin.getc.chr while !['a', 'd', 'i', "\n"].include?(begin response.downcase!; response end)
d
=> nil
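A less obfuscated option is probably the non-bang downcase, which always returns a string. The sketch below shows both return values side by side (the variable names are illustrative; dup is used so that no string literal is modified in place):

```ruby
already_lower = "d".dup
mixed_case    = "D".dup

p already_lower.downcase!  # nothing to change, so the result is nil
p mixed_case.downcase!     # the receiver was modified, so it is returned

valid = ['a', 'd', 'i', "\n"]
p valid.include?("D".downcase)  # non-bang form: always a string
```

So in the loop above, include?(response.downcase) should do the job without the begin/end block.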
[edit] How do I prefix all the elements in my array of strings with a prefix string?
irb -> elements = ['a', 'b', 'c']
=> ["a", "b", "c"]
This output just isn't cutting it...
irb -> puts elements
a
b
c
Let's say instead you want to display it as a simple tree.
irb -> puts '+';
puts '\\- ' + elements
+
TypeError: can't convert Array into String
from (irb):3:in `+'
from (irb):3
That certainly doesn't work. I guess we want to use map. This works...
irb -> puts '+';
puts elements.map{|e| '\\- ' + e}
+
\- a
\- b
\- c
But it would kind of be nice to have the prefix come before the array to which it is prefixed...wouldn't it?
Hmm... how about this?
irb -> puts (['\\- ']*elements.size).map {|prefix| $a ||= -1; prefix + elements[$a += 1]}
\- a
\- b
\- c
Wow, is that ugly! And unsafe.
[To do: Find better solution]
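One candidate, offered as a sketch and not as the definitive answer: turn the prefix's + method into a block, so the prefix is written before the thing it is prepended to and no global counter is needed.

```ruby
elements = ['a', 'b', 'c']
prefix   = '\\- '

# prefix.method(:+) is a Method object; & converts it to a block,
# so each element e becomes prefix + e.
lines = elements.map(&prefix.method(:+))

puts '+'
puts lines
```

Whether this reads better than elements.map{|e| '\\- ' + e} is a matter of taste.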
[edit] Split
'unicode string'.split(//) splits a string into its individual characters, even multibyte ones, so it is Unicode-safe (as far as I know, in Ruby 1.8 this relies on $KCODE being set to 'u'). (http://woss.name/2006/05/07/notes-from-a-rails-course/)
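A small sketch of the idea, assuming Ruby 1.9 or later where strings carry their own encoding. The sample word is made up, and \u00e9 is "e" with an acute accent, which takes two bytes in UTF-8:

```ruby
word = "h\u00e9llo"

by_split = word.split(//)   # split on the empty pattern
by_chars = word.chars.to_a  # String#chars gives the same characters

p by_split
p by_split.size
p word.bytes.to_a.size
```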
[edit] Example of [Heredoc (category)], Example of [String#margin (category)]
[Ruby Facets (category)]
assert_equal <<-End.margin, output.chomp
|3 + x
|=> 4
End
If we'd just done this (without using margin):
assert_equal <<-End, output.chomp
3 + x
=> 4
End
, then we would have gotten a failure:
<" 3 + x\n => 4\n"> expected but was <"3 + x\n=> 4">.
To make the strings be equal without using margin, we'd have had to left-align everything, all the way to the left margin:
assert_equal <<-End, output
3 + x
=> 4
End
Yuck. I think that's exactly the sort of thing that prompted the author of String#margin to write it...
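On Ruby 2.3 and later there is also a built-in alternative that does not need Facets: the "squiggly" heredoc, <<~, which strips the common leading whitespace from the body. A minimal sketch (the variable name expected is illustrative):

```ruby
# <<~ removes the indentation shared by all body lines,
# and the terminator may be indented as well.
expected = <<~End
  3 + x
  => 4
End

p expected
```

Unlike the margin version, this needs no | marker at the start of each line, but the trailing newline is still there, so the comparison side may still need chomp.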
[edit] The ? "byte" [operator]
irb -> ?A
=> 65
irb -> ?\n
=> 10
irb -> ?\n.chr
=> "\n"
irb -> ?\t
=> 9
irb -> ?\r
=> 13
irb -> ?\ # That's a single space
=> 32
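The integer results above are Ruby 1.8 behavior. Since Ruby 1.9, ?A evaluates to the one-character string "A". A sketch of how to get the same numbers on newer versions, using String#ord and Integer#chr:

```ruby
p "A".ord   # character to code
p 65.chr    # code to character
p "\n".ord
p " ".ord
```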
[edit] The use of \ within strings
Any time you are building a string of any significant length, you should be asking yourself this important question:
- Do I want the \ characters in this string to be treated as escape characters or as literal '\' characters?
Note the difference between these 2 behaviors:
\ as escape character (special):
"Everyone" knows that in order to get your
irb -> "\n"
=> "\n" # This is a newline.
irb -> "\d"
=> "d" # This, however, is just an ordinary, lowly 'd'!
irb -> puts "\n"

=> nil
irb -> puts "\d"
d
=> nil
"\d" #=> "d"
%(\d) #=> "d"
%Q(\d) #=> "d"
"\n" #=> "\n"
%(\n) #=> "\n"
%Q(\n) #=> "\n"
\ as literal (inert, "safe"):
irb -> '\n'
=> "\\n" # A literal '\' character followed by a literal 'n' character.
irb -> '\d'
=> "\\d"
irb -> puts '\n'
\n
=> nil
irb -> puts '\d'
\d
=> nil
'\d' #=> "\\d"
%q(\d) #=> "\\d"
'\n' #=> "\\n"
%q(\n) #=> "\\n"
In summary, %q(...) is the same as '...' and both %(...) and %Q(...) are the same as "..." (for these test cases anyway).
I think the %q(...) form is typically the best choice for large strings that you want to be "safe" ("take these characters literally").
[edit] [Caveats (category)]: Be careful to consider how \ characters are treated when building code to be evaled
Here is one example of when I've forgotten about this behavior and have been bitten by it...
[Debugging stories (category)]
I had built up a string containing some code to be evaluated later in the context of my model:
$common_validation_code = %(
...
validates_format_of :zip, :with => /\d{5}(-\d{4})?/, :message => "should be in the form 12345 or 12345-1234"
...
)
class Model < ActiveRecord::Base
...
eval($common_validation_code)
...
end
However, this code was not working the way I expected it to. I expected the input '12345' to be considered valid, but it was telling me that it was not!
I did a quick sanity check in irb to convince myself that the regexp was in fact valid:
irb -> !!( '12345' =~ /\d{5}(-\d{4})?/ )
=> true
irb -> !!( '12345-1234' =~ /\d{5}(-\d{4})?/ )
=> true
irb -> !!( '1234' =~ /\d{5}(-\d{4})?/ )
=> false
Yeah, that's what I thought! So needless to say, I was a little bit confused as to why it wasn't working in my model.
It wasn't until I tried outputting the contents of my $common_validation_code variable to the screen that I realized what the problem was:
puts $common_validation_code
...
validates_format_of :zip, :with => /d{5}(-d{4})?/, :message => "should be in the form 12345 or 12345-1234"
...
Wait a second, my regexp is supposed to be /\d{5}(-\d{4})?/, not /d{5}(-d{4})?/.
Heh. So it would have been fine with accepting zip codes like "ddddd" as valid, but not zip codes that contained actual numerals!
irb -> !!( 'ddddd-dddd' =~ /#{"\d{5}(-\d{4})?"}/ )
=> true
irb -> !!( '12345-1234' =~ /#{"\d{5}(-\d{4})?"}/ )
=> false
Anyway, the fix was really, really simple -- just change a single character and it made all the difference in the world!
- $common_validation_code = %(
+ $common_validation_code = %q(
[Examples of a single character making a big difference (category)]
Moral of the story: %q( , not %( !
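The whole story can be reproduced in a few lines. This is a stripped-down sketch without ActiveRecord; the variable names are made up, and eval is used only to turn the stored source text into a Regexp:

```ruby
escaped = %(/\d{5}(-\d{4})?/)   # \d is processed as an escape and becomes "d"
literal = %q(/\d{5}(-\d{4})?/)  # the backslashes survive

puts escaped
puts literal

zip_pattern = eval(literal)
p !!("12345" =~ zip_pattern)
```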
[edit] Which types of strings interpolate variables and which do not
It looks like these partitions are the same as for the \ character being literal or an escape character...
irb -> a='!'
=> "!"
irb -> '#{a}'
=> "\#{a}"
irb -> %q(#{a})
=> "\#{a}"
irb -> "#{a}"
=> "!"
irb -> %Q(#{a})
=> "!"
irb -> %(#{a})
=> "!"
[edit] Strings and symbols: Symbols
[edit] What are they?
They're different than strings. They're identifiers.
Bruce Tate (2007-03-13). Crossing borders: Extensions in Rails: The anatomy of an acts_as plug-in (http://www-128.ibm.com/developerworks/java/library/j-cb03137/index.html).
(A symbol is a user-defined name.)
[edit] Symbols can contain characters other than the normally allowed symbol characters
Usually, you just make symbols with letters and underscores, :like_this . But you can also do this:
irb -> :'complicated.symbol!@#$%^&*()'
=> :"complicated.symbol!@\#$%^&*()"
You can also do this: "whatever#{variable}".to_sym .
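A short sketch of both forms. The names variable, dynamic and quoted are only for illustration:

```ruby
variable = "id"

dynamic = "user_#{variable}".to_sym  # symbol built from an interpolated string
quoted  = :"two words"               # quoted symbol literal containing a space

p dynamic
p quoted
p quoted.to_s
```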
[edit] [Caveat (category)]: Symbol#to_s doesn't retain the initial : character
This can be very confusing, especially when you are evaling something, and you expect that symbols interpolated into a string will ... well, stay looking like symbols.
irb -> def foo; end
irb -> method(:foo)
=> #<Method: Object#foo>
irb -> puts "method(#{:foo})"
method(foo)
I would have expected to see:
method(:foo)
If we try to eval it, we get a less-than-helpful/[less-than-intuitive error (category)]:
irb -> eval "method(#{:foo})"
TypeError: (eval):1:in `method': nil is not a symbol
from (irb):8
from (eval):1
from (irb):8
from :0
(The evaluated code is method(foo), which calls foo first. foo returns nil, which, of course, is not a symbol.)
To work around this, it looks like Symbol#inspect does what we (sometimes) want Symbol#to_s to do, so we can use it instead:
irb -> puts "method(#{:foo.inspect})"
method(:foo)
irb -> eval "method(#{:foo.inspect})"
=> #<Method: Object#foo>
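The same effect can be shown without defining any method. In this sketch the symbol :upcase and the respond_to? call are arbitrary choices for the illustration:

```ruby
name = :upcase

broken_code  = "'hello'.respond_to?(#{name})"          # uses Symbol#to_s
working_code = "'hello'.respond_to?(#{name.inspect})"  # keeps the leading colon

puts broken_code
puts working_code
p eval(working_code)
```

Evaluating broken_code would try to call a method or read a variable named upcase at the top level, and would be expected to fail with a NameError.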
