Ruby / Strings and symbols

From WhyNotWiki

Aliases: Ruby / Strings, Ruby / Symbols
See also: Ruby / Regular expressions

Strings and symbols: Strings

Searching (also has [Regexp (category)])

irb -> "abcde"["abc"]
    => "abc"

irb -> "abcde"[/.b./]
    => "abc"

irb -> "abcde".match /.b./
    => #<MatchData:0xb7eed6c4>

irb -> "abcde".match(/.b./)[0]
    => "abc"

irb -> "abcde"["z"]
    => nil
irb -> "<div><div>Contents</div></div>"[%r{<div>(.*)</div>}]
    => "<div><div>Contents</div></div>"

irb -> "<div><div>Contents</div></div>"[%r{<div>(.*)</div>}, 1]
    => "<div>Contents</div>"
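
The lookups above can be collected into one runnable sketch. The variable names are illustrative only; as far as I know, String#[] and String#match behave this way in Ruby 1.8 as well as in current versions.

```ruby
html = "<div><div>Contents</div></div>"
pattern = %r{<div>(.*)</div>}     # (.*) is greedy, so it captures the inner <div>

whole_match = html[pattern]       # text of the whole match, or nil
first_group = html[pattern, 1]    # text of capture group 1
match_data  = html.match(pattern) # MatchData; [0] is the whole match
no_match    = "abcde"["z"]        # nil when the substring is absent
```

The two-argument form saves a trip through MatchData when only one capture group is needed.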

Substrings (slice/[])

Unfortunately, in Ruby 1.8, [] returns a character code rather than a substring when you pass a single index instead of a range of indexes (Ruby 1.9 and later return a one-character string):

irb -> "abc"[0..2]
    => "abc"

irb -> "abc"[0..0]
    => "a"
irb -> "abc"[-1..-1]
    => "c"

But:

irb -> "abc"[-1]
    => 99
(not "c")

Another workaround (-1..-1 was the first workaround):

irb -> "abc"[-1].chr
    => "c"
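
If the code has to give the same answer regardless of Ruby version, the two-argument form of [] (start, length) is another option, since it returns a substring everywhere. A small sketch, with variable names chosen only for illustration:

```ruby
s = "abc"

last_by_length = s[-1, 1]     # start at index -1, take 1 character
last_by_range  = s[-1..-1]    # the range workaround shown above
first_char     = s[0, 1]

# On Ruby 1.9 and later, String#ord gives back the character code
# that 1.8's s[-1] used to return.
last_code = last_by_length.ord
```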

Delimiters / Different ways to delimit a string literal

String interpolation

You can even nest #{} inside of #{}!

p "#{field} = #{ object.send("#{field}") } !"
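
A self-contained version of that line, assuming a small Struct stands in for object (the struct and its fields are made up for illustration):

```ruby
# Hypothetical stand-in for `object`: a struct with two fields.
object = Struct.new(:name, :size).new("box", 3)
field = :size

# The inner "#{field}" is evaluated first, inside the outer #{ ... }.
line = "#{field} = #{ object.send("#{field}") } !"
```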

%q{}, %Q{}, %q<>, etc.

Choose your own delimiter ({}, (), <>, [], ||, whatever)!

%Q{} allows string interpolation; %q{} does not.
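
A short sketch of the difference, using a few of the delimiters listed above (the variable names are illustrative):

```ruby
name = "world"

interpolated      = %Q{Hello, #{name}!}   # behaves like a double-quoted string
also_interpolated = %(Hello, #{name}!)    # bare %( ) behaves like %Q
literal           = %q{Hello, #{name}!}   # behaves like a single-quoted string

# Paired delimiters may nest as long as they stay balanced.
nested = %q(balanced (parens) need no escaping)
quotes = %q<both "double" and 'single' quotes>
```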

Can be nested!

Can be useful for metaprogramming, or just building large strings...

['stdout', 'stderr'].each do |stream_name|
  eval(%Q{

    class Test_#{stream_name} < Test::Unit::TestCase
      def setup
        $#{stream_name} = StringIO.new
      end
      def test_simple_filter
        filter_#{stream_name}(lambda{|input| ''}) do
          noisy_command_#{stream_name}
        end
        assert_equal '', $#{stream_name}.string
      end
    end
  })
end

Question: If you nest one string inside of the other, how do you control in which one the string interpolation happens?

Answer: By escaping the { characters, of course!

Interpolate now:

irb -> $a = 'test'
irb -> puts %Q{
     "    puts %Q{
     "      #{$a}
     "    }
     "  }

   puts %Q{
     test
   }

    => nil

Interpolate later:

irb -> a = %Q{
     "   puts %Q{
     "     #\{$a\}
     "   }
     " }
    => "\n  puts %Q{\n    \#{$a}\n  }\n"

irb -> eval(a)

    test

    => nil
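
Escaping the # instead of the braces has the same effect. The sketch below reuses the $a global from the transcripts above and builds one string that interpolates immediately and one that waits until it is evaled (the other variable names are illustrative):

```ruby
$a = 'test'

now   = %Q{value: #{$a}}     # interpolated while the literal is built
later = %Q{value: \#{$a}}    # \# keeps the text #{$a} in the string

$a = 'changed'

# Wrap the deferred text in double quotes and eval it to interpolate now.
evaled = eval(%Q{"#{later}"})
```

The first string froze the old value of $a; the second picks up whatever $a holds at eval time.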

Crazy powerful kung-fu heredoc syntax

To allow your terminating delimiter to be indented

irb ->         <<-WayOutHere
     " la dee da
     "        la dee da
     "                       WayOutHere
    => "la dee da\n       la dee da\n"

If you use plain <<, an indented delimiter is treated as part of the string; the terminator is only recognized when it starts at the left margin, with no indentation.

irb -> <<WayOutHere
     "                 WayOutHere
     " WayOutHere
    => "                WayOutHere\n"

To disable string interpolation

(example from Phrogs on ruby-talk at 2007-01-17 08:55)

Do this:

b = <<'FOO'
b#{1+1}
FOO

instead of this:

a = <<FOO
a#{1+1}
FOO
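
For reference, here are both forms side by side so the resulting values can be compared. This is only a restatement of the two snippets above as one runnable piece:

```ruby
# Unquoted delimiter: the body behaves like a double-quoted string.
a = <<FOO
a#{1+1}
FOO

# Single-quoted delimiter: the body is taken literally.
b = <<'FOO'
b#{1+1}
FOO
```

Here a ends up as "a2\n", while b keeps the characters #{1+1} unevaluated.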

Can start heredoc in the middle of an expression, finish the rest of your expression, and then continue with the string

Kind of strange, but cool!

Example (mine):

irb -> /^=begin[ \t\f]*#{b=''}.*?\n(.*?)\n=end/mi.match(<<End )[1]
     " =begin
     " require 'foo'
     " foo
     " =end
     " End
    => "require 'foo'\nfoo"

Example from http://ruby-doc.org/core/classes/ERB.html

   def build
     b = binding
     # create and run templates, filling member data variables
     ERB.new(<<-'END_PRODUCT'.gsub(/^\s+/, ""), 0, "", "@product").result b
       <%= PRODUCT[:name] %>
       <%= PRODUCT[:desc] %>
     END_PRODUCT
     ERB.new(<<-'END_PRICE'.gsub(/^\s+/, ""), 0, "", "@price").result b
       <%= PRODUCT[:name] %> -- <%= PRODUCT[:cost] %>
       <%= PRODUCT[:desc] %>
     END_PRICE
   end

Example:

          puts Subversion.help(subcommand).gsub(<<End, '')
Subversion is a tool for version control.
For additional information, see http://subversion.tigris.org/
End

... makes for nicer syntax than

          puts Subversion.help(subcommand).gsub(<<End
Subversion is a tool for version control.
For additional information, see http://subversion.tigris.org/
End
          , '')

In fact, that syntax isn't even valid!

 syntax error, unexpected ',', expecting ')' (SyntaxError)
          , '')
           ^

Nor is this:

          puts Subversion.help(subcommand).gsub(<<End
Subversion is a tool for version control.
For additional information, see http://subversion.tigris.org/
End, '')
 can't find string "End" anywhere before EOF (SyntaxError)
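
A smaller sketch of the same idea: the heredoc marker sits where the argument goes, the rest of the line completes the call, and the body starts on the next line. The shout method is made up for the example.

```ruby
# Hypothetical helper, only here to have something to call.
def shout(text, suffix)
  text.upcase + suffix
end

shouted = shout(<<End, "!")
hello
End

# Two heredocs can start on the same line; their bodies follow in order.
pair = [<<FIRST, <<SECOND]
one
FIRST
two
SECOND
```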

Padding a string

   "hello".rjust(20, " ")           #=> "               hello"

Indenting / Changing tab/indent

Removing indent

[Facets (category)]

Let's say I want to remove the indent/leading-line-spaces from a multi-line string...

irb -> require 'facets/core/string/margin'
irb -> require 'facets/core/string/indent'

irb -> class String; def rchomp; self.gsub(/\A\n/, ''); end; end

irb -> input = %(
     "   line1
     "   line2
     " ).rchomp
    => "  line1\n  line2\n"

irb -> puts input.margin
ine1
ine2
# Not what I wanted!

irb -> puts input.indent(-2)  # Unindent by 2 spaces
line1
line2
# Good!

irb -> puts input.tab(0)      # Replace any existing leading-line-spaces with 0 spaces.
line1
line2
# Good!

irb -> input = %(
     "   line1
     "     line2
     " ).rchomp
    => "  line1\n    line2\n"

irb -> puts input.tab(0)      # Replace any existing leading-line-spaces with 0 spaces.
line1
line2
    => nil
# Not quite what I wanted! I wanted line 2 to be '  line2'.

irb -> puts input.indent(-2)  # Unindent by 2 spaces
line1
  line2
# Yes, like that!

irb -> puts input.indent(2)
    line1
      line2
    => nil

irb -> puts input.tab(4)
    line1
    line2
    => nil
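
Without Facets, the "remove the common indent but keep relative indentation" behavior can be approximated in plain Ruby. This is a minimal sketch, not the Facets implementation; unindent is a hypothetical helper name.

```ruby
# Hypothetical helper: strips the smallest leading indent found on any
# non-blank line from every line, preserving relative indentation.
def unindent(text)
  indents = text.scan(/^[ \t]*(?=\S)/)           # leading whitespace of non-blank lines
  width = indents.map { |ws| ws.length }.min || 0
  text.gsub(/^[ \t]{#{width}}/, '')
end

input = "  line1\n    line2\n"
result = unindent(input)
```

With the two-line input from the transcript, line2 keeps its extra two spaces, which matches the indent(-2) result above without having to hard-code the width.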

Processing a string one character at a time

irb -> "tyler".scan(/./) {|l| p l }
"t"
"y"
"l"
"e"
"r"

Checksums

irb -> "tyler".sum
    => 560

irb -> a = []; "tyler".each_byte {|l| a << l }; a.inject {|sum, i| sum + i}
    => 560
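
The byte total can also be computed without the temporary array. A sketch using String#unpack; note that String#sum is documented as an n-bit checksum (16 bits by default), so the two values are only guaranteed to agree for short strings like this one.

```ruby
word = "tyler"

# "C*" unpacks every byte as an unsigned 8-bit integer.
byte_total = word.unpack("C*").inject(0) { |sum, byte| sum + byte }
checksum   = word.sum
```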

How do I capitalize the first letter? (the equivalent of ucfirst in PHP)

irb -> "hi there".capitalize
    => "Hi there"

irb -> "hi there".upcase
    => "HI THERE"

# Destructive modification?
irb -> original = "hi there"; new = original.dup; new.capitalize; original + " => " + new
    => "hi there => hi there"
irb -> original = "hi there"; new = original.dup; new.capitalize!; original + " => " + new
    => "hi there => Hi there"

How do I capitalize the first letter of each word? (the equivalent of ucwords in PHP)

I want to be able to do this:

irb -> "hi there".ucwords
    => "Hi There"

String#capitalize_all [Ruby Facets (category)]

http://facets.rubyforge.org/src/doc/rdoc/core/classes/String.html#M000904

capitalize_all( pattern=$;, *limit )

Capitalize all words (or other patterned divisions) of a string.

  "this is a test".capitalize_all  #=> "This Is A Test"

Another implementation

If I had to implement it, I would first make a change_each_word(!) iterator, and then build capitalize_each_word(!) on top of that.

# TODO: move to quality_extensions

#require 'facets/string/partitions'  # Facets 2.0?
require 'facets/core/string/each_word'
require 'qualitysmith_extensions/enumerable/enum'   # Future version of Ruby?: obj.enum_for(method = :each, *args)

# irb -> s = 'anthony john doe'; s.change_each_word! {|a| a.capitalize}; s
#     => "Anthony John Doe"
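class String
  # Returns a modified copy; the receiver is left untouched.
  def change_each_word(&block)
    self.dup.change_each_word!(&block)
  end
  # Replaces each word in place with the block's result.
  def change_each_word!
    each_word do |value, range|
      self[range] = (yield value)
    end
    self  # return the string itself, so change_each_word returns the copy
  end
end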
class String
  def capitalize_each_word!
    change_each_word! do |word|
      word.capitalize
    end
  end
  alias_method :ucwords!, :capitalize_each_word!
end

# irb -> s = 'anthony john doe'; s.map_each_word {|a| a.capitalize}
#     => ["Anthony", "John", "Doe"]
class String
  def map_each_word
    enum(:each_word).map do |value, range|
      yield value
    end
  end
end
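
For comparison, here is a dependency-free sketch that needs neither Facets nor qualitysmith_extensions. The method name capitalize_words is hypothetical, and unlike String#capitalize it leaves the rest of each word untouched.

```ruby
# Hypothetical helper: upcases the first character of every run of
# non-whitespace characters and keeps the original spacing.
def capitalize_words(text)
  text.gsub(/\S+/) { |word| word[0, 1].upcase + word[1..-1] }
end

name = capitalize_words('anthony john doe')
greeting = capitalize_words('hi there')
```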

[Caveats (category)] [Built-in behavior is wrong (category)] s.downcase! returns nil rather than s when nothing changes!

You tell me: is this behavior intuitive?

irb -> 'd'.downcase!
    => nil
irb -> ['d'].include? 'd'
    => true

irb -> ['d'].include? 'd'.downcase
    => true

# But!
irb -> ['d'].include? 'd'.downcase!
    => false

irb -> 'd'.downcase!
    => nil

I find that unintuitive.

What's more, working around this unwanted behavior obfuscates the code.

Example:

irb -> response = ""
    => ""
irb -> response = $stdin.getc.chr while !['a', 'd', 'i', "\n"].include?(response.downcase!)
d
d
d
^DIRB::Abort: abort then interrupt!!
        from /usr/lib/ruby/1.8/irb.rb:81:in `irb_abort'
        from /usr/lib/ruby/1.8/irb.rb:243:in `signal_handle'
        from /usr/lib/ruby/1.8/irb.rb:66:in `start'
        from (irb):16:in `call'
        from (irb):16:in `getc'
        from (irb):16
        from :0

I had to Control-D out of the loop because the exit condition was never satisfied. My input, 'd', was already lowercase, so response.downcase! had nothing to change and returned nil. nil is not in the list of valid inputs, so the loop kept reading, hoping that maybe next time I'd enter something "more valid".

A workaround (that obfuscates mildly):

irb -> response = ""
    => ""
irb -> response = $stdin.getc.chr while !['a', 'd', 'i', "\n"].include?(begin response.downcase!; response end)
d
    => nil
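
A less obfuscated alternative, sketched here with a hypothetical valid_response? helper, is to use the non-destructive downcase, which always returns a string:

```ruby
# Hypothetical helper wrapping the membership test from the loop above.
def valid_response?(response)
  ['a', 'd', 'i', "\n"].include?(response.downcase)
end

unchanged = 'd'.dup.downcase!   # nil: nothing needed changing
changed   = 'D'.dup.downcase!   # "d": the string was modified
```

The bang method is still useful when the goal is in-place modification; it just should not be relied on for its return value.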

How do I prefix all the elements in my array of strings with a prefix string?

irb -> elements = ['a', 'b', 'c']
    => ["a", "b", "c"]

This output just isn't cutting it...

irb -> puts elements
a
b
c

Let's say instead you want to display it as a simple tree.

irb -> puts '+';
       puts '\\- ' + elements
+
TypeError: can't convert Array into String
        from (irb):3:in `+'
        from (irb):3

That certainly doesn't work. I guess we want to use map. This works...

irb -> puts '+';
       puts elements.map{|e| '\\- ' + e}
+
\- a
\- b
\- c

But it would kind of be nice to have the prefix come before the array to which it is prefixed...wouldn't it?

Hmm... how about this?

irb -> puts (['\\- ']*elements.size).map {|prefix| $a ||= -1; prefix + elements[$a += 1]}
\- a
\- b
\- c

Wow, is that ugly! And unsafe: it depends on a global counter, $a, that is never reset and may already hold some other value.

[To do: Find better solution]
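
One possible way to get the prefix to read first, offered as a sketch rather than a final answer to the to-do: turn the prefix's + method into a block with method(:+), or pair the prefix with every element using Array#product. Both are core Ruby (product and &:join need 1.8.7 or later).

```ruby
elements = ['a', 'b', 'c']
prefix = '\\- '

# prefix.method(:+) is a Method object; & converts it to a block,
# so each element e becomes prefix + e.
prefixed = elements.map(&prefix.method(:+))

# Alternative: build [prefix, element] pairs, then join each pair.
paired = [prefix].product(elements).map(&:join)
```

Neither version touches any global state, and puts prefixed prints the same three lines as the map example above.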

Split

String#split is Unicode-safe: 'unicode string'.split(//) will split a string into its individual characters, even for multibyte characters. (http://woss.name/2006/05/07/notes-from-a-rails-course/)

Example of [Heredoc (category)], Example of [String#margin (category)]

[Ruby Facets (category)]

    assert_equal <<-End.margin, output.chomp
      |3 + x
      |=> 4
    End


If we'd just done this (without using margin):

    assert_equal <<-End, output.chomp
      3 + x
      => 4
    End

then we would have gotten a failure:

<"      3 + x\n      => 4\n"> expected but was
<"3 + x\n=> 4">.

To make the strings be equal without using margin, we'd have had to left-align everything, all the way to the left margin:


    assert_equal <<-End, output
3 + x
=> 4
    End

Yuck. I think that's exactly the sort of thing that prompted the author of String#margin to write it...
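
On Ruby 2.3 and later, the core language has a "squiggly" heredoc, <<~, that strips the common leading whitespace, so neither Facets nor the | markers are needed. A sketch (the variable name is illustrative):

```ruby
expected = <<~End
  3 + x
  => 4
End

# The heredoc still ends with a newline, so chomp it before comparing
# against output that has already been chomped.
expected_chomped = expected.chomp
```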

The ? "byte" [operator]

irb -> ?A
    => 65

irb -> ?\n
    => 10

irb -> ?\n.chr
    => "\n"

irb -> ?\t
    => 9

irb -> ?\r
    => 13

irb -> ?\  # That's a single space
    => 32
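
The transcripts above are from Ruby 1.8. On Ruby 1.9 and later the ? literal evaluates to a one-character string instead, and String#ord recovers the code. A brief sketch:

```ruby
letter  = ?A      # "A" on Ruby 1.9+, 65 on Ruby 1.8
newline = ?\n     # "\n" on Ruby 1.9+, 10 on Ruby 1.8

letter_code  = letter.ord    # character code of "A"
newline_code = newline.ord   # character code of "\n"
back_again   = 65.chr        # Integer#chr goes the other way
```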

The use of \ within strings

Any time you are building a string of any significant length, you should be asking yourself this important question:

Do I want the \ characters in this string to be treated as escape characters or as literal '\' characters?

Note the difference between these 2 behaviors:

\ as escape character    \ as literal
special                  inert, "safe"

"Everyone" knows that in order to get your \n to be treated as a newline rather than a literal \n, you have to use double quotes ("\n") rather than single quotes ('\n'). But what happens if you use the intrepid \ escape character in front of other, less-suspecting characters, that normally don't appear following a \, like "d"...? Let's try it and see!

irb -> "\n"
    => "\n"   # This is a newline.

irb -> "\d"
    => "d"    # This, however, is just an 
              # ordinary, lowly 'd'!

irb -> puts "\n"

    => nil

irb -> puts "\d"
d
    => nil

irb -> '\n'
    => "\\n"  # A literal '\' character
              # followed by a literal 'n'
              # character.

irb -> '\d'
    => "\\d"

irb -> puts '\n'
\n
    => nil

irb -> puts '\d'
\d
    => nil

"\d"   #=> "d"
%(\d)  #=> "d"
%Q(\d) #=> "d"
'\d'   #=> "\\d"
%q(\d) #=> "\\d"
'\n'   #=> "\\n"
%q(\n) #=> "\\n"
"\n"   #=> "\n"
%(\n)  #=> "\n"
%Q(\n) #=> "\n"

In summary, %q(...) is the same as '...' and both %(...) and %Q(...) are the same as "..." (for these test cases anyway).

I think the %q(...) form is typically the best choice for large strings that you want to be "safe" ("take these characters literally").


[Caveats (category)]: Be careful to consider how \ characters are treated when building code to be evaled

Here is one example of when I've forgotten about this behavior and have been bitten by it...


[Debugging stories (category)]

I had built up a string containing some code to be evaluated later in the context of my model:

$common_validation_code = %(
  ...
  validates_format_of :zip, :with => /\d{5}(-\d{4})?/, :message => "should be in the form 12345 or 12345-1234"
  ...
)

class Model < ActiveRecord::Base
  ...
  eval($common_validation_code)
  ...
end

However, this code was not working the way I expected it to. I expected the input '12345' to be considered valid, but it was telling me that it was not!

I did a quick sanity check in irb to convince myself that the regexp was in fact valid:

irb -> !!( '12345' =~ /\d{5}(-\d{4})?/ )
    => true

irb -> !!( '12345-1234' =~ /\d{5}(-\d{4})?/ )
    => true

irb -> !!( '1234' =~ /\d{5}(-\d{4})?/ )
    => false

Yeah, that's what I thought! So needless to say, I was a little bit confused as to why it wasn't working in my model.

It wasn't until I tried outputting the contents of my $common_validation_code variable to the screen that I realized what the problem was:

puts $common_validation_code
  ...
  validates_format_of :zip, :with => /d{5}(-d{4})?/, :message => "should be in the form 12345 or 12345-1234"
  ...

Wait a second, my regexp is supposed to be /\d{5}(-\d{4})?/, not /d{5}(-d{4})?/.

Heh. So it would have been fine with accepting zip codes like "ddddd" as valid, but not zip codes that contained actual numerals!

irb -> !!( 'ddddd-dddd' =~ /#{"\d{5}(-\d{4})?"}/ )
    => true

irb -> !!( '12345-1234' =~ /#{"\d{5}(-\d{4})?"}/ )
    => false

Anyway, the fix was really, really simple -- just change a single character and it made all the difference in the world!

-  $common_validation_code = %(
+  $common_validation_code = %q(

[Examples of a single character making a big difference (category)]

Moral of the story: %q( , not %( !
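
The bug can be reproduced without Rails. In this sketch (variable names are illustrative) the same pattern text is written once with %( and once with %q(, then compiled with Regexp.new:

```ruby
escaped_source = %(\d{5}(-\d{4})?)    # \d collapses to a plain d
literal_source = %q(\d{5}(-\d{4})?)   # backslashes survive

broken  = Regexp.new(escaped_source)
working = Regexp.new(literal_source)

broken_accepts_digits  = !!(broken  =~ '12345')   # false
broken_accepts_ds      = !!(broken  =~ 'ddddd')   # true
working_accepts_digits = !!(working =~ '12345')   # true
```

Printing escaped_source, as in the story above, is the quickest way to spot the missing backslashes.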

 


Which types of strings interpolate variables and which do not

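It looks like the split is the same as for \ handling: the forms that treat \ as a literal ('...' and %q(...)) do not interpolate, and the forms that treat \ as an escape character ("...", %(...) and %Q(...)) do.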

irb -> a='!'
    => "!"
irb -> '#{a}'
    => "\#{a}"

irb -> %q(#{a})
    => "\#{a}"
irb -> "#{a}"
    => "!"

irb -> %Q(#{a})
    => "!"

irb -> %(#{a})
    => "!"


Strings and symbols: Symbols

What are they?

They're different than strings. They're identifiers.

Bruce Tate (2007-03-13). Crossing borders: Extensions in Rails: The anatomy of an acts_as plug-in (http://www-128.ibm.com/developerworks/java/library/j-cb03137/index.html). Retrieved on 2007-03-14 16:44.

(A symbol is a user-defined name.)


Symbols can contain characters other than the normally allowed symbol characters

Usually, you just make symbols with letters and underscores, :like_this . But you can also do this:

irb -> :'complicated.symbol!@#$%^&*()'
    => :"complicated.symbol!@\#$%^&*()"

You can also do this: "whatever#{variable}".to_sym .
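
A short sketch of the three spellings (the variable names are illustrative). The :"..." form interpolates like a double-quoted string, while :'...' does not:

```ruby
variable = "name"

from_string = "whatever#{variable}".to_sym   # build a string, then convert
quoted      = :"whatever#{variable}"         # interpolating symbol literal
odd         = :'complicated.symbol!@#$%^&*()'

same_symbol = from_string.equal?(quoted)     # symbols with the same name are one object
```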

[Caveat (category)]: Symbol#to_s doesn't retain the initial : character

This can be very confusing, especially when you are evaling something, and you expect that symbols interpolated into a string will ... well, stay looking like symbols.


irb -> def foo; end

irb -> method(:foo)
    => #<Method: Object#foo>

irb -> puts "method(#{:foo})"
method(foo)

I would have expected to see:

method(:foo)

If we try to eval it, we get a less-than-helpful/[less-than-intuitive error (category)]:

irb -> eval "method(#{:foo})"
TypeError: (eval):1:in `method': nil is not a symbol
        from (irb):8
        from (eval):1
        from (irb):8
        from :0

(foo returns nil, which, of course, is not a symbol)

To work around this, it looks like Symbol#inspect does what we (sometimes) want Symbol#to_s to do, so we can use it instead:

irb -> puts "method(#{:foo.inspect})"
method(:foo)

irb -> eval "method(#{:foo.inspect})"
    => #<Method: Object#foo>
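
The same comparison as a runnable sketch. It uses :to_s rather than :foo so that no method has to be defined first; the variable names are illustrative.

```ruby
with_to_s    = "method(#{:to_s})"           # interpolation calls Symbol#to_s
with_inspect = "method(#{:to_s.inspect})"   # inspect keeps the leading colon

# Only the inspect version evals to a Method object for the named method.
found = eval(with_inspect)
```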