
I have the following Haskell code, implementing a simple version of the "cat" Unix command-line utility. Testing performance with "time" on a 400MB file, it's about 3x slower than the system cat. (The exact script I am using to test it is below the code.)

My questions are:

  1. Is this a valid test of performance?
  2. How can I make this program run faster?
  3. How can I identify performance bottlenecks in Haskell programs in general?

Regarding questions 2 and 3: I have compiled with GHC's -prof flag and run with +RTS -p, but I'm finding the output a bit uninformative here.
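Concretely, a sketch of this profiling workflow (with -fprof-auto added so GHC generates cost centres automatically; without it the report only shows MAIN and CAFs). It assumes GHC is on the PATH and that Main.hs and the huge test file exist, and guards so it is a no-op otherwise:

```shell
# Sketch of the GHC profiling workflow: -prof enables profiling,
# -fprof-auto inserts cost centres on all top-level bindings.
# Guarded: does nothing if GHC or the test artifacts are missing.
if command -v ghc >/dev/null 2>&1 && [ -f Main.hs ] && [ -f huge ]; then
  ghc -O2 -prof -fprof-auto Main.hs -o MainProf
  ./MainProf huge +RTS -p -RTS > /dev/null
  # The time/alloc report is written to MainProf.prof
  head -n 20 MainProf.prof
fi
```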

Source (Main.hs)

module Main where

import System.IO
import System.Environment
import Data.ByteString as BS

import Control.Monad

-- Copied from cat source code
bufsize = 1024*128

go handle buf = do
  hPut stdout buf
  eof <- hIsEOF handle
  unless eof $ do
    buf <- hGetSome handle bufsize
    go handle buf

main = do
  file    <- fmap Prelude.head getArgs
  handle  <- openFile file ReadMode
  buf     <- hGetSome handle bufsize
  hSetBuffering stdin $ BlockBuffering (Just bufsize)
  hSetBuffering stdout $ BlockBuffering (Just bufsize)
  go handle buf

Timing script (run.sh):

#!/usr/bin/env bash

# Generate 10M lines of silly test data
yes aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | head -n 10000000 > huge

# Compile with optimisation
ghc -O2 Main.hs

# Run haskell
echo "timing Haskell"
time ./Main huge > /dev/null

echo ""
echo ""

# Run cat
echo "timing 'cat'"
time cat huge > /dev/null

My results:

timing Haskell

real    0m0.980s
user    0m0.296s
sys     0m0.684s

timing 'cat'

real    0m0.304s
user    0m0.001s
sys     0m0.302s

The profiling report when compiling with -prof and running with +RTS -p is below:

Sat Dec 13 21:26 2014 Time and Allocation Profiling Report  (Final)

       Main +RTS -p -RTS huge

  total time  =        0.92 secs   (922 ticks @ 1000 us, 1 processor)
  total alloc = 7,258,596,176 bytes  (excludes profiling overheads)

COST CENTRE MODULE  %time %alloc

MAIN        MAIN    100.0  100.0

                                                       individual     inherited
COST CENTRE MODULE                   no.     entries  %time %alloc   %time %alloc

MAIN        MAIN                      46           0  100.0  100.0   100.0  100.0
 CAF        GHC.Conc.Signal           84           0    0.0    0.0     0.0    0.0
 CAF        GHC.IO.FD                 82           0    0.0    0.0     0.0    0.0
 CAF        GHC.IO.Handle.FD          81           0    0.0    0.0     0.0    0.0
 CAF        System.Posix.Internals    76           0    0.0    0.0     0.0    0.0
 CAF        GHC.IO.Encoding           70           0    0.0    0.0     0.0    0.0
 CAF        GHC.IO.Encoding.Iconv     69           0    0.0    0.0     0.0    0.0
Tags: performance, unix, haskell

asked Dec 13 '14 at 21:10 by statusfailed (edited Dec 13 '14 at 21:19)
Comments:

It's good to see that you have also added -O2 optimization. You can also post the link to the profiled report here. – Sibi, Dec 13 '14 at 21:13

Instead of hGetSome use hGetNonBlocking; it increases the performance substantially. – Sibi, Dec 13 '14 at 21:25

Notice caching impacts the timing significantly. If I run cat first in the shell script "benchmark" instead of second, it performs about 60% slower on my system. – Thomas M. DuBuisson, Dec 13 '14 at 21:59

-prof won't be terribly useful if you don't have any cost centres; try compiling it (and the whole bytestring package too, if you can, probably in a Cabal sandbox) with -fprof-auto and maybe -fprof-cafs in addition to -prof. – icktoofay, Dec 13 '14 at 22:36

@statusfailed A pipes solution runs pretty close. I have also attached a benchmark with the code. – Sibi, Dec 13 '14 at 23:19
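Following Thomas M. DuBuisson's point about caching, a fairer comparison warms the OS page cache once before either timed run, so neither program benefits from running second. A minimal sketch, guarded in case the binary or test file from run.sh is absent:

```shell
# Sketch: warm the page cache untimed, then time both programs,
# so ordering in the benchmark script no longer skews the result.
# Guarded: no-op if ./Main or huge (the run.sh artifacts) are missing.
if [ -x ./Main ] && [ -f huge ]; then
  cat huge > /dev/null           # warm the cache, untimed
  time ./Main huge > /dev/null   # both timed runs now hit the cache
  time cat huge > /dev/null
fi
```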

3 Answers

14 votes

This is only a partial answer trying to address the second question:

I tried something like this using the GHC.IO.Buffer API:

module Main where

import System.IO
import System.Environment
import GHC.IO.Buffer
import Data.ByteString as BS

import Control.Monad

-- Copied from cat source code
bufsize = 1024*128

go handle bufPtr = do
  read <- hGetBuf handle bufPtr bufsize
  when (read > 0) $ do
    hPutBuf stdout bufPtr read
    go handle bufPtr

main = do
  file    <- fmap Prelude.head getArgs
  handle  <- openFile file ReadMode
  buf     <- newByteBuffer bufsize WriteBuffer
  withBuffer buf $ go handle

and it seems to come closer to the performance of 'cat', but still definitely slower...

time ./Cat huge > /dev/null
./Cat huge > /dev/null  0.00s user 0.06s system 76% cpu 0.081 total

time cat huge > /dev/null
cat huge > /dev/null  0.00s user 0.05s system 75% cpu 0.063 total

I think that by using the buffer API we explicitly avoid allocating a fresh ByteString for every chunk, as hGetSome does in the original code, but I am only guessing here and don't know exactly what is happening in either compiled program...

UPDATE: Adding the original code's performance on my laptop:

time ./Cat2 huge > /dev/null
./Cat2 huge > /dev/null  0.12s user 0.10s system 99% cpu 0.219 total

UPDATE 2: Adding some basic profiling results:

Original Code:

Cat2 +RTS -p -RTS huge

  total time  =        0.21 secs   (211 ticks @ 1000 us, 1 processor)
  total alloc = 6,954,068,112 bytes  (excludes profiling overheads)

COST CENTRE MODULE  %time %alloc

MAIN        MAIN    100.0  100.0

                                                       individual     inherited
COST CENTRE MODULE                   no.     entries  %time %alloc   %time %alloc

MAIN        MAIN                      46           0  100.0  100.0   100.0  100.0
 CAF        GHC.IO.Handle.FD          86           0    0.0    0.0     0.0    0.0
 CAF        GHC.Conc.Signal           82           0    0.0    0.0     0.0    0.0
 CAF        GHC.IO.Encoding           80           0    0.0    0.0     0.0    0.0
 CAF        GHC.IO.FD                 79           0    0.0    0.0     0.0    0.0
 CAF        System.Posix.Internals    75           0    0.0    0.0     0.0    0.0
 CAF        GHC.IO.Encoding.Iconv     72           0    0.0    0.0     0.0    0.0

Buffer-API Code:

Cat +RTS -p -RTS huge

  total time  =        0.06 secs   (61 ticks @ 1000 us, 1 processor)
  total alloc =   3,487,712 bytes  (excludes profiling overheads)

COST CENTRE MODULE  %time %alloc

MAIN        MAIN    100.0   98.9

                                                       individual     inherited
COST CENTRE MODULE                  no.     entries  %time %alloc   %time %alloc

MAIN        MAIN                     44           0  100.0   98.9   100.0  100.0
 CAF        GHC.IO.Handle.FD         85           0    0.0    1.0     0.0    1.0
 CAF        GHC.Conc.Signal          82           0    0.0    0.0     0.0    0.0
 CAF        GHC.IO.Encoding          80           0    0.0    0.1     0.0    0.1
 CAF        GHC.IO.FD                79           0    0.0    0.0     0.0    0.0
 CAF        GHC.IO.Encoding.Iconv    71           0    0.0    0.0     0.0    0.0

Notice especially the big difference in allocation costs...

answered Dec 13 '14 at 22:20 by bmk (edited Dec 13 '14 at 23:11)
Comments:

Note that there is also a fixed overhead of approximately 10 ms for every Haskell program, so if you subtract that from your Haskell timing it is even closer to cat. – Gabriel Gonzalez, Dec 13 '14 at 22:35

I know I didn't specify this in the question, but I'm hoping to go for a more idiomatic / "high level" Haskell solution. Upvoted for a good answer though. – statusfailed, Dec 13 '14 at 22:58
12 votes

The original question made me think it was about finding a performance issue in the exact code provided. Since the comment "I'm hoping to go for a more idiomatic / "high level" Haskell solution" contradicts that assumption, I'll give the reasonably performing idiomatic Haskell solution.

The way I would expect any programmer familiar with Haskell to solve this problem is with lazy ByteStrings. This allows the programmer to simply specify the task of reading input and writing output while letting the library worry about the buffering and looping constructs.

module Main where

import System.IO
import System.Environment
import Data.ByteString.Lazy as BS

import Control.Monad

main :: IO ()
main = do
  file    <- fmap Prelude.head getArgs
  handle  <- openFile file ReadMode
  buf     <- BS.hGetContents handle
  hPut stdout buf

The result is both more readable and better performing than the code in the original question:

timing 'cat'

real    0m0.075s
user    0m0.000s
sys     0m0.074s

timing strict bytestring with GHC -O2

real    0m0.254s
user    0m0.126s
sys     0m0.127s

timing strict bytestring with GHC -O2 -fllvm

real    0m0.267s
user    0m0.132s
sys     0m0.134s

timing lazy bytestring with GHC -O2

real    0m0.091s
user    0m0.023s
sys     0m0.067s

timing lazy bytestring with GHC -O2 -fllvm

real    0m0.091s
user    0m0.021s
sys     0m0.069s

That is, the lazy bytestring solution is 21% slower than cat. Putting cat last for preferential caching behavior results in a 59 ms runtime, placing the Haskell solution at 51% slower.

EDIT: Dons suggested that using memory-mapped IO would more accurately model cat's behavior. I'm not sure how accurate that statement is, but mmap almost always results in better performance, and this situation is certainly no exception:

timing memory mapped lazy bytestring with GHC -O2  real    0m0.008s user    0m0.004s sys     0m0.003s

Which was produced by:

module Main where

import System.IO (stdout)
import System.Environment
import System.IO.Posix.MMap.Lazy
import Data.ByteString.Lazy (hPut)

import Control.Monad

main :: IO ()
main = do
  file    <- fmap Prelude.head getArgs
  buf     <- unsafeMMapFile file
  hPut stdout buf