V8 JavaScript Engine: 2015

Thursday, December 17, 2015

There's Math.random(), and then there's Math.random()

Math.random()
Returns a Number value with positive sign, greater than or equal to 0 but less than 1, chosen randomly or pseudo randomly with approximately uniform distribution over that range, using an implementation-dependent algorithm or strategy. This function takes no arguments.

ES 2015, 20.2.2.27

Math.random() is the most well-known and frequently-used source of randomness in Javascript. In V8 and most other Javascript engines, it is implemented using a pseudo-random number generator (PRNG). As with all PRNGs, the random number is derived from an internal state, which is altered by a fixed algorithm for every new random number. So for a given initial state, the sequence of random numbers is deterministic. Since the bit size n of the internal state is limited, the numbers that a PRNG generates will eventually repeat themselves. The upper bound for the period length of this permutation cycle is of course 2ⁿ.

There are many different PRNG algorithms; among the most well-known ones are Mersenne-Twister and LCG. Each has its particular characteristics, advantages, and drawbacks. Ideally, it would use as little memory as possible for the initial state, be quick to perform, have a large period length, and offer a high quality random distribution. While memory usage, performance, and period length can easily be measured or calculated, the quality is harder to determine. There is a lot of math behind statistical tests to check the quality of random numbers. The de-facto standard PRNG test suite, TestU01, implements many of these tests.

Until recently (up to version 4.9.40), V8’s choice of PRNG was MWC1616 (multiply with carry, combining two 16-bit parts). It uses 64 bits of internal state and looks roughly like this:

uint32_t state0 = 1;
uint32_t state1 = 2;
uint32_t mwc1616() {
  state0 = 18030 * (state0 & 0xffff) + (state0 >> 16);
  state1 = 30903 * (state1 & 0xffff) + (state1 >> 16);
  return state0 << 16 + (state1 & 0xffff);

The 32-bit value is then turned into a floating point number between 0 and 1 in agreement with the specification.

MWC1616 uses little memory and is pretty fast to compute, but unfortunately offers sub-par quality:

The number of random values it can generate is limited to 2³² as opposed to the 2⁵² numbers between 0 and 1 that double precision floating point can represent.
The more significant upper half of the result is almost entirely dependent on the value of state0. The period length would be at most 2³², but instead of few large permutation cycles, there are many short ones. With a badly chosen initial state, the cycle length could be less than 40 million.
It fails many statistical tests in the TestU01 suite.

This has been pointed out to us, and having understood the problem and after some research, we decided to reimplement Math.random based on an algorithm called xorshift128+. It uses 128 bits of internal state, has a period length of 2¹²⁸ - 1, and passes all tests from the TestU01 suite.

uint64_t state0 = 1;
uint64_t state1 = 2;
uint64_t xorshift128plus() {
  uint64_t s1 = state0;
  uint64_t s0 = state1;
  state0 = s0;
  s1 ^= s1 << 23;
  s1 ^= s1 >> 17;
  s1 ^= s0;
  s1 ^= s0 >> 26;
  state1 = s1;
  return state0 + state1;
}

The new implementation landed in V8 4.9.41.0 within a few days of us becoming aware of the issue. It will become available with Chrome 49. Both Firefox and Safari switched to xorshift128+ as well.

Make no mistake however: even though xorshift128+ is a huge improvement over MWC1616, it still is not cryptographically secure. For use cases such as hashing, signature generation, and encryption/decryption, ordinary PRNGs are unsuitable. The Web Cryptography API introduces window.crypto.getRandomValues, a method that returns cryptographically secure random values, at a performance cost.

Please keep in mind, if you find areas of improvement in V8 and Chrome, even ones that—like this one—do not directly affect spec compliance, stability, or security, please file an issue on our bug tracker.

Posted by Yang Guo, Software Engineer and dice designer

Wednesday, November 25, 2015

V8 Release 4.8

Roughly every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before Chrome branches for a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 4.8, which will be in beta until it is released in coordination with Chrome 48 Stable. V8 4.8 contains a handful of developer-facing features, so we’d like to give you a preview of some of the highlights in anticipation of the release in several weeks.

Improved ECMAScript 2015 (ES6) support

This release of V8 provides support for two well-known symbols, built-in symbols from the ES2015 spec that allow developers to leverage several low-level language constructs which were previously hidden.

@@isConcatSpreadable

The name for a Boolean valued property that if true indicates an object should be flattened to its array elements by Array.prototype.concat.

(function(){
"use strict";
class AutomaticallySpreadingArray extends Array {
   get [Symbol.isConcatSpreadable]() {
      return true;
   }
}
let first = [1];
let second = new AutomaticallySpreadingArray();
second[0] = 2;
second[1] = 3;
let all = first.concat(second);
// Outputs [1, 2, 3]
console.log(all);
}());

@@toPrimitive

The name for a method to invoke on an object for implicit conversions to primitive values.

(function(){
"use strict";
class V8 {
   [Symbol.toPrimitive](hint) {
      if (hint === 'string') {
         console.log('string');
         return 'V8';
      } else if (hint === 'number') {
         console.log('number');
         return 8;
      } else {
         console.log('default:' + hint);
         return 8;
      }
   }
}

let engine = new V8();
console.log(Number(engine));
console.log(String(engine));
}());

ToLength

The ES2015 spec adjusts the abstract operation for type conversion to convert an argument to an integer suitable for use as the length of an array-like object. (While not directly observable, this change might be indirectly visible when dealing with array-like objects with negative length.)

V8 API

Please check out our summary of API changes. This document gets regularly updated a few weeks after each major release.

Developers with an active V8 checkout can use 'git checkout -b 4.8 -t branch-heads/4.8' to experiment with the new features in V8 4.8. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

Friday, October 30, 2015

Jank Busters Part One

Jank, or in other words visible stutters, can be noticed when Chrome fails to render a frame within 16.66ms (disrupting 60 frames per second motion). As of today most of the V8 garbage collection work is performed on the main rendering thread, c.f. Figure 1, often resulting in jank when too many objects need to be maintained. Eliminating jank has always been a high priority for the V8 team [1, 2, 3]. In this blog post we will discuss a few optimizations that were implemented between M41 and M46 which significantly reduce garbage collection pauses resulting in better user experience.

Figure 1: Garbage collection performed on the main thread.

A major source of jank during garbage collection is processing various bookkeeping data structures. Many of these data structures enable optimizations that are unrelated to garbage collection. Two examples are the list of all ArrayBuffers, and each ArrayBuffer’s list of views. These lists allow for an efficient implementation of the DetachArrayBuffer operation without imposing any performance hit on access to an ArrayBuffer view. In situations, however, where a web page creates millions of ArrayBuffers, (e.g., WebGL-based games), updating those lists during garbage collection causes significant jank. In M46, we removed these lists and instead detect detached buffers by inserting checks before every load and store to ArrayBuffers. This amortizes the cost of walking the big bookkeeping list during GC by spreading it throughout program execution resulting in less jank. Although the per-access checks can theoretically slow down the throughput of programs that heavily use ArrayBuffers, in practice V8's optimizing compiler can often remove redundant checks and hoist remaining checks out of loops, resulting in a much smoother execution profile with little or no overall performance penalty.

Another source of jank is the bookkeeping associated with tracking the lifetimes of objects shared between Chrome and V8. Although the Chrome and V8 memory heaps are distinct, they must be synchronized for certain objects, like DOM nodes, that are implemented in Chrome's C++ code but accessible from JavaScript. V8 creates an opaque data type called a handle that allows Chrome to manipulate a V8 heap object without knowing any of the details of the implementation. The object's lifetime is bound to the handle: as long as Chrome keeps the handle around, V8's garbage collector won't throw away the object. V8 creates an internal data structure called a global reference for each handle it passes back out to Chrome through the V8 API, and these global references are what tell V8's garbage collector that the object is still alive. For WebGL games, Chrome may create millions of such handles, and V8, in turn, needs to create the corresponding global references to manage their lifecycle. Processing these huge amounts of global references in the main garbage collection pause is observable as jank. Fortunately, objects communicated to WebGL are often just passed along and never actually modified, enabling simple static escape analysis. In essence, for WebGL functions that are known to usually take small arrays as parameters the underlying data is copied on the stack, making a global reference obsolete. The result of such a mixed approach is a reduction of pause time by up to 50% for rendering-heavy WebGL games.

Most of V8’s garbage collection is performed on the main rendering thread. Moving garbage collection operations to concurrent threads reduces the waiting time for the garbage collector and further reduces jank. This is an inherently complicated task since the main JavaScript application and the garbage collector may simultaneous observe and modify the same objects. Until now, concurrency was limited to sweeping the old generation of the regular object JS heap. Recently, we also implemented concurrent sweeping of the code and map space of the V8 heap. Additionally, we implemented concurrent unmapping of unused pages to reduce the work that has to be performed on the main thread, c.f. Figure 2.

Figure 2: Some garbage collection operations performed on the concurrent garbage collection threads.

The impact of the discussed optimizations is clearly visible in WebGL-based games, for example Turbolenz' Oort Online demo. The following video compares Chrome M41 to M46

We are currently in the process of making more garbage collection components incremental, concurrent, and parallel, to shrink garbage collection pause times on the main thread even further. Stay tuned as we have some interesting patches in the pipeline.

Posted by the jank busters: Jochen Eisinger, Michael Lippautz, and Hannes Payer

Wednesday, October 14, 2015

V8 Release 4.7

Roughly every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before Chrome branches for a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 4.7, which will be in beta until it is released in coordination with Chrome 47 Stable. V8 4.7 is filled will all sorts of developer-facing goodies, so we’d like to give you a preview of some of the highlights in anticipation of the release in several weeks.

Improved ECMAScript 2015 (ES6) support

Rest operator

The Rest operator enables the developer to pass an indefinite number of arguments to a function. It is similar to the arguments object.


//Without Rest operator
function concat() {
 var args = Array.prototype.slice.call(arguments, 1);
 return args.join("");
} 

//With Rest operator
function concatWithRest(...strings) {
 return strings.join("");
}

Support for upcoming ES features

Array.prototype.includes

The new Array.prototype.includes method is a new feature that is currently a stage 3 proposal for inclusion in ES2016. It provides a terse syntax for determining whether or not an element is in a given array by returning a boolean value.


[1, 2, 3].includes(3) // true
["apple", "banana", "cherry"].includes("apple") // true
["apple", "banana", "cherry"].includes("peach") // false

Ease the pressure on memory while parsing

Recent changes to the V8 parser greatly reduce the memory consumed by parsing files with large nested functions. In particular, this allows V8 to run larger asm.js modules than previously possible.

V8 API

Please check out our summary of API changes. This document gets regularly updated a few weeks after each major release. Developers with an active V8 checkout can use 'git checkout -b 4.7 -t branch-heads/4.7' to experiment with the new features in V8 4.7. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

Friday, September 25, 2015

Custom startup snapshots

The JavaScript specification includes a lot of built-in functionality, from math functions to a full-featured regular expression engine. Every newly-created V8 context has these functions available from the start. For this to work, the global object (for example, the window object in a browser) and all the built-in functionality must be set up and initialized into V8’s heap at the time the context is created. It takes quite some time to do this from scratch.

Fortunately, V8 uses a shortcut to speed things up: just like thawing a frozen pizza for a quick dinner, we deserialize a previously-prepared snapshot directly into the heap to get an initialized context. On a regular desktop computer, this can bring the time to create a context from 40 ms down to less than 2 ms. On an average mobile phone, this could mean a difference between 270 ms and 10 ms.

Applications other than Chrome that embed V8 may require more than vanilla Javascript. Many load additional library scripts at startup, before the “actual” application runs. For example, a simple TypeScript VM based on V8 would have to load the TypeScript compiler on startup in order to translate TypeScript source code into JavaScript on-the-fly.

As of the release of V8 4.3 two months ago, embedders can utilize snapshotting to skip over the startup time incurred by such an initialization.. The test case for this feature shows how this API works.

To create a snapshot, we can call v8::V8::CreateSnapshotDataBlob with the to-be-embedded script as a null-terminated C string. After creating a new context, this script is compiled and executed. In our example, we create two custom startup snapshots, each of which define functions on top of what JavaScript already has built in.

We can then use v8::Isolate::CreateParams to configure a newly-created isolate so that it initializes contexts from a custom startup snapshot. Contexts created in that isolate are exact copies of the one from which we took a snapshot. The functions defined in the snapshot are available without having to define them again.

There is an important limitation to this: the snapshot can only capture V8’s heap. Any interaction from V8 with the outside is off-limits when creating the snapshot. Such interactions include:

defining and calling API callbacks (i.e. functions created via v8::FunctionTemplate)
creating typed arrays, since the backing store may be allocated outside of V8

And of course, values derived from sources such as Math.random or Date.now are fixed once the snapshot has been captured. They are no longer really random nor reflect the current time.

Limitations aside, startup snapshots remain a great way to save time on initialization. We can shave off 100ms from the startup spent on loading the TypeScript compiler in our example above (on a regular desktop computer). We're looking forward to seeing how you might put custom snapshots to use!

Posted by Yang Guo, Software Engineer and engine pre-heater supplier

Friday, August 28, 2015

V8 Release 4.6

Roughly every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before Chrome branches for a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 4.6, which will be in beta until it is released in coordination with Chrome 46 Stable. V8 4.6 is filled will all sorts of developer-facing goodies, so we’d like to give you a preview of some of the highlights in anticipation of the release in several weeks.

Improved ECMAScript 2015 (ES6) support

V8 4.6 adds support for several ECMAScript 2015 (ES6) features.

Spread operator

The spread operator makes it much more convenient to work with arrays. For example it makes imperative code obsolete when you simply want to merge arrays.


// Merging arrays
// Code without spread operator
let inner = [3, 4];
let merged = [0, 1, 2].concat(inner, [5]);

// Code with spread operator
let inner = [3, 4];
let merged = [0, 1, 2, ...inner, 5];

Another good use of the spread operator to replace apply().


// Function parameters stored in an array
// Code without spread operator
function myFunction(a, b, c) {
   console.log(a);
   console.log(b);
   console.log(c);
}
let argsInArray = ["Hi ", "Spread ", "operator!"];
myFunction.apply(null, argsInArray);

// Code with spread operator
function myFunction (a,b,c) {
   console.log(a);
   console.log(b);
   console.log(c);
}

let argsInArray = ["Hi ", "Spread ", "operator!"];
myFunction(...argsInArray);

new.target

new.target is one of ES6's features designed to improve working with classes. Under the hood it’s actually an implicit parameter to every function. If a function is called with the keyword new, then the parameter holds a reference to the called function. If new is not used the parameter is undefined.

In practice, this means that you can use new.target to figure out whether a function was called normally or constructor-called via the new keyword.


function myFunction() {
   if (new.target === undefined) {
      throw "Try out calling it with new.";
   }
   console.log("Works;");
}

// Will break
myFunction();

// Will work
let a = new myFunction();

When ES6 classes and inheritance are used, new.target inside the constructor of a super-class is bound to the derived constructor that was invoked with new. In particular, this gives super-classes access to the prototype of the derived class during construction.

Reduce the jank

Jank can be a pain, especially when playing a game. Often, it's even worse when the game features multiple players. oortonline.gl is a WebGL benchmark that tests the limits of current browsers by rendering a complex 3D scene with particle effects and modern shader rendering. The V8 team set off in a quest to push the limits of Chrome’s performance in these environments. We’re not done yet, but the fruits of our efforts are already paying off. Chrome 46 shows incredible advances in oortonline.gl performance which you can see yourself below.

Some of the optimizations include:

TypedArray performance improvements

TypedArrays are heavily used in rendering engines such as Turbulenz (the engine behind oortonline.gl). For example, engines often create typed arrays (such as Float32Array) in JavaScript and pass them to WebGL after applying transformations.
The key point was optimizing the interaction between the embedder (Blink) and V8.

Performance improvements when passing TypedArrays and other memory from V8 to Blink

There’s no need to create additional handles (that are also tracked by V8) for typed arrays when they are passed to WebGL as part of a one-way communication.
On hitting external (Blink) allocated memory limits we now start an incremental garbage collection instead of a full one.

Idle garbage collection scheduling

Garbage collection operations are scheduled during idle times on the main thread which unblocks the compositor and results in smoother rendering.

Concurrent sweeping enabled for the whole old generation of the garbage collected heap

Freeing of unused memory chunks is performed on additional threads concurrent to the main thread which significantly reduces the main garbage collection pause time.

The good thing is that all changes related to oortonline.gl are general improvements that potentially affect all users of applications that make heavy use of WebGL.

V8 API

Please check out our summary of API changes. This document gets regularly updated a few weeks after each major release.

Developers with an active V8 checkout can use 'git checkout -b 4.6 -t branch-heads/4.6' to experiment with the new features in V8 4.6. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

Friday, August 7, 2015

Getting Garbage Collection for Free

JavaScript performance continues to be one of the key aspects of Chrome's values, especially when it comes to enabling a smooth experience. Starting in Chrome 41, V8 takes advantage of a new technique to increase the responsiveness of web applications by hiding expensive memory management operations inside of small, otherwise unused chunks of idle time. As a result, web developers should expect smoother scrolling and buttery animations with much reduced jank due to garbage collection.

Many modern language engines such as Chrome's V8 JavaScript engine dynamically manage memory for running applications so that developers don't need to worry about it themselves. The engine periodically passes over the memory allocated to the application, determines which data is no longer needed, and clears it out to free up room. This process is known as garbage collection.

In Chrome, we strive to deliver a smooth, 60 frames per second (FPS) visual experience. Although V8 already attempts to perform garbage collection in small chunks, larger garbage collection operations can and do occur at unpredictable times -- sometimes in the middle of an animation -- pausing execution and preventing Chrome from hitting that 60 FPS goal.

Chrome 41 included a task scheduler for the Blink rendering engine which enables prioritization of latency-sensitive tasks to ensure Chrome remains responsive and snappy. As well as being able to prioritize work, this task scheduler has centralized knowledge of how busy the system is, what tasks need to be performed and how urgent each of these tasks are. As such, it can estimate when Chrome is likely to be idle and roughly how long it expects to remain idle.

An example of this occurs when Chrome is showing an animation on a web page. The animation will update the screen at 60 FPS, giving Chrome around 16.6 ms of time to perform the update. As such, Chrome will start work on the current frame as soon as the previous frame has been displayed, performing input, animation and frame rendering tasks for this new frame. If Chrome completes all this work in less than 16.6 ms, then it has nothing else to do for the remaining time until it needs to start rendering the next frame. Chrome’s scheduler enables V8 to take advantage of this idle time period by scheduling special idle tasks when Chrome would otherwise be idle.

Figure 1: Frame rendering with idle tasks.

Idle tasks are special low-priority tasks which are run when the scheduler determines it is in an idle period. Idle tasks are given a deadline which is the scheduler's estimate of how long it expects to remain idle. In the animation example in Figure 1, this would be the time at which the next frame should start being drawn. In other situations (e.g., when no on-screen activity is happening) this could be the time when the next pending task is scheduled to be run, with an upper bound of 50 ms to ensure that Chrome remains responsive to unexpected user input. The deadline is used by the idle task to estimate how much work it can do without causing jank or delays in input response.

Garbage collection done in the idle tasks are hidden from critical, latency-sensitive operations. This means that these garbage collection tasks are done for “free”. In order to understand how V8 does this, it is worth reviewing V8’s current garbage collection strategy.

Deep dive into V8’s garbage collection engine

V8 uses a generational garbage collector with the Javascript heap split into a small young generation for newly allocated objects and a large old generation for long living objects. Since most objects die young, this generational strategy enables the garbage collector to perform regular, short garbage collections in the smaller young generation (known as scavenges), without having to trace objects in the old generation.

The young generation uses a semi-space allocation strategy, where new objects are initially allocated in the young generation’s active semi-space. Once that semi-space becomes full, a scavenge operation will move live objects to the other semi-space. Objects which have been moved once already are promoted to the old generation and are considered to be long-living. Once the live objects have been moved, the new semi-space becomes active and any remaining dead objects in the old semi-space are discarded.

The duration of a young generation scavenge therefore depends on the size of live objects in the young generation. A scavenge will be fast (<1 ms) when most of the objects become unreachable in the young generation. However, if most objects survive a scavenge, the duration of the scavenge may be significantly longer.

A major collection of the whole heap is performed when the size of live objects in the old generation grows beyond a heuristically-derived limit. The old generation uses a mark-and-sweep collector with several optimizations to improve latency and memory consumption. Marking latency depends on the number of live objects that have to be marked, with marking of the whole heap potentially taking more than 100 ms for large web applications. In order to avoid pausing the main thread for such long periods, V8 has long had the ability to incrementally mark live objects in many small steps, with the aim to keep each marking steps below 5 ms in duration.

After marking, the free memory is made available again for the application by sweeping the whole old generation memory. This task is performed concurrently by dedicated sweeper threads. Finally, memory compaction is performed to reduce memory fragmentation in the old generation. This task may be very time-consuming and is only performed if memory fragmentation is an issue.

In summary, there are four main garbage collection tasks:

Young generation scavenges, which usually are fast
Marking steps performed by the incremental marker, which can be arbitrarily long depending on the step size
Full garbage collections, which may take a long time
Full garbage collections with aggressive memory compaction, which may take a long time, but clean up fragmented memory

In order to perform these operations in idle periods, V8 posts garbage collection idle tasks to the scheduler. When these idle tasks are run they are provided with a deadline by which they should complete. V8’s garbage collection idle time handler evaluates which garbage collection tasks should be performed in order to reduce memory consumption, while respecting the deadline to avoid future jank in frame rendering or input latency.

The garbage collector will perform a young generation scavenge during an idle task if the application’s measured allocation rate shows that the young generation may be full before the next expected idle period. Additionally, it calculates the average time taken by recent scavenge tasks in order to predict the duration of future scavenges and ensure that it doesn’t violate idle task deadlines.

When the size of live objects in the old generation is close to the heap limit, incremental marking is started. Incremental marking steps can be linearly scaled by the number of bytes that should be marked. Based on the average measured marking speed, the garbage collection idle time handler tries to fit as much marking work as possible into a given idle task.

A full garbage collection is scheduled during an idle tasks if the old generation is almost full and if the deadline provided to the task is estimated to be long enough to complete the collection. The collection pause time is predicted based on the marking speed multiplied by the number of allocated objects. Full garbage collections with additional compaction are only performed if the webpage has been idle for a significant amount of time.

Performance evaluation

In order to evaluate the impact of running garbage collection during idle time, we used Chrome’s Telemetry performance benchmarking framework to evaluate how smoothly popular websites scroll while they load. We benchmarked the top 25 sites on a Linux workstation as well as typical mobile sites on an Android Nexus 6 smartphone, both of which open popular webpages (including complex webapps such as Gmail, Google Docs and YouTube) and scroll their content for a few seconds. Chrome aims to keep scrolling at 60 FPS for a smooth user experience.

Figure 2 shows the percentage of garbage collection that was scheduled during idle time. The workstation’s faster hardware results in more overall idle time compared to the Nexus 6, thereby enabling a greater percentage of garbage collection to be scheduled during this idle time (43% compared to 31% on the Nexus 6) resulting in about 7% improvement on our jank metric.

Figure 2: The percentage of garbage collection that occurs during idle time.

As well as improving the smoothness of page rendering, these idle periods also provide an opportunity to perform more aggressive garbage collection when the page becomes fully idle. Recent improvements in Chrome 45 take advantage of this to drastically reduce the amount of memory consumed by idle foreground tabs. Figure 3 shows a sneak peek at how memory usage of Gmail’s JavaScript heap can be reduced by about 45% when it becomes idle, compared to the same page in Chrome 43.

Figure 3: Memory usage for latest version of Chrome 45 (left) vs Chrome 43 on Gmail.

These improvements demonstrate that it is possible to hide garbage collection pauses by being smarter about when expensive garbage collection operations are performed. Web developers no longer have to fear the garbage collection pause, even when targeting silky smooth 60 FPS animations. Stay tuned for more improvements as we push the bounds of garbage collection scheduling.

Posted by Hannes Payer and Ross McIlroy, Idle Garbage Collectors

Monday, July 27, 2015

Code caching

V8 uses just-in-time compilation (JIT) to execute Javascript code. This means that immediately prior to running a script, it has to be parsed and compiled - which can cause considerable overhead. As we announced recently, code caching is a technique that lessens this overhead. When a script is compiled for the first time, cache data is produced and stored. The next time V8 needs to compile the same script, even in a different V8 instance, it can use the cache data to recreate the compilation result instead of compiling from scratch. As a result the script is executed much sooner.

Code caching has been available since V8 version 4.2 and not limited to Chrome alone. It is exposed through V8’s API, so that every V8 embedder can take advantage of it. The test case used to exercise this feature serves as an example of how to use this API.

When a script is compiled by V8, cache data can be produced to speed up later compilations by passing v8::ScriptCompiler::kProduceCodeCache as an option. If the compilation succeeds, the cache data is attached to the source object and can be retrieved via v8::ScriptCompiler::Source::GetCachedData. It can then be persisted for later, for example by writing it to disk.

During later compilations, the previously produced cache data can be attached to the source object and passed v8::ScriptCompiler::kConsumeCodeCache as an option. This time, code will be produced much faster, as V8 bypasses compiling the code and deserializes it from the provided cache data.

Producing cache data comes at a certain computational and memory cost. For this reason, Chrome only produces cache data if the same script is seen at least twice within a couple of days. This way Chrome is able to turn script files into executable code twice as fast on average, saving users valuable time on each subsequent page load.

Posted by Yang Guo, Software Engineer

Friday, July 17, 2015

V8 4.5 release

Roughly every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before Chrome branches for a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 4.5, which will be in beta until it is released in coordination with Chrome 45 Stable. V8 4.5 is filled will all sorts of developer-facing goodies, so we’d like to give you a preview of some of the highlights in anticipation of the release in several weeks.

Improved ECMAScript 2015 (ES6) support

V8 4.5 adds support for several ECMAScript 2015 (ES6) features.

Arrow Functions

With the help of Arrow Functions it is possible to write more streamlined code.

let data = [0, 1, 3];
// Code without Arrow Functions
let convertedData = data.map(function(value) { return value*2;});
// Code with Arrow Functions
let convertedData = data.map(value => value*2);
console.log(convertedData);

The lexical binding of 'this' is another major benefit of arrow functions. As a result, using callbacks in methods gets much easier.

class MyClass {
  constructor() { this.a = "Hello, "; }
  hello() { setInterval(() => console.log(this.a + "World!"), 1000); }
}
let myInstance = new MyClass();
myInstance.hello();

Array/TypedArray functions

All of the new methods on Arrays and TypedArrays that are specified in ES2015 are now supported in 4.5. They make working with Arrays and TypedArrays more convenient. Among the methods added are Array.from and Array.of. Methods which mirror most Array methods on each kind of TypedArray were added as well.

Object.assign

Object.assign enables developers to quickly merge and clone objects.

let target = {a: "Hello, "};
let source = {b: "world!"};
// Objects are merged
Object.assign(target, source);
console.log(target.a + target.b);

This feature can also be used to mix in functionality.

More JavaScript language features are “optimizable”

For many years, V8’s traditional optimizing compiler, Crankshaft, has done a great job of optimizing many common JavaScript patterns. However, it never had the capability to support the entire JavaScript language, and using certain language features in a function—such as “try/catch” and “with”—would prevent it from being optimized. V8 would have to fall back to its slower, baseline compiler for that function.

One of the design goals of V8’s new optimizing compiler, TurboFan, is to be able to eventually optimize all of JavaScript, including ECMAScript 2015 features. In V8 4.5, we’ve started using TurboFan to optimize some of the language features that are not supported by Crankshaft: ‘for-of’, ‘class’, ‘with’ and computed property names.

Here is an example of code that uses 'for-of', which can now be compiled by TurboFan:

let sequence = ["First", "Second", "Third"];
for (let value of sequence) {
  // This scope is now optimizable
  let object = {a: "Hello, ", b: "world!", c: value};
  console.log(object.a + object.b + object.c);
}

Although initially functions that use these language features won't reach the same peak performance as other code compiled by Crankshaft, TurboFan can now speed them up well beyond our current baseline compiler. Even better, performance will continue to improve quickly as we develop more optimizations for TurboFan.

V8 API

Please check out our summary of API changes. This document gets regularly updated a few weeks after each major release.

Developers with an active V8 checkout can use 'git checkout -b 4.5 -t branch-heads/4.5' to experiment with the new features in V8 4.5. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

Monday, July 13, 2015

Digging into the TurboFan JIT

Last week we announced that we've turned on TurboFan for certain types of JavaScript. In this post we wanted to dig deeper into the design of TurboFan.

Performance has always been at the core of V8’s strategy. TurboFan combines a cutting-edge intermediate representation with a multi-layered translation and optimization pipeline to generate better quality machine code than what was previously possible with the CrankShaft JIT. Optimizations in TurboFan are more numerous, more sophisticated, and more thoroughly applied than in CrankShaft, enabling fluid code motion, control flow optimizations, and precise numerical range analysis, all of which were more previously unattainable.

A layered architecture

Compilers tend to become complex over time as new language features are supported, new optimizations are added, and new computer architectures are targeted. With TurboFan, we've taken lessons from many compilers and developed a layered architecture to allow the compiler to cope with these demands over time. A clearer separation between the source-level language (JavaScript), the VM's capabilities (V8), and the architecture's intricacies (from x86 to ARM to MIPS) allows for cleaner and more robust code. Layering allows those working on the compiler to reason locally when implementing optimizations and features, as well as write more effective unit tests. It also saves code. Each of the 7 target architectures supported by TurboFan requires fewer than 3,000 lines of platform-specific code, versus 13,000-16,000 in CrankShaft. This enabled engineers at ARM, Intel, MIPS, and IBM to contribute to TurboFan in a much more effective way. TurboFan is able to more easily support all of the coming features of ES6 because its flexible design separates the JavaScript frontend from the architecture-dependent backends.

More sophisticated optimizations

The TurboFan JIT implements more aggressive optimizations than CrankShaft through a number of advanced techniques. JavaScript enters the compiler pipeline in a mostly unoptimized form and is translated and optimized to progressively lower forms until machine code is generated. The centerpiece of the design is a more relaxed sea-of-nodes internal representation (IR) of the code which allows more effective reordering and optimization.

Example TurboFan graph

Numerical range analysis helps TurboFan understand number-crunching code much better. The graph-based IR allows most optimizations to be expressed as simple local reductions which are easier to write and test independently. An optimization engine applies these local rules in a systematic and thorough way. Transitioning out of the graphical representation involves an innovative scheduling algorithm that makes use of the reordering freedom to move code out of loops and into less frequently executed paths. Finally, architecture-specific optimizations like complex instruction selection exploit features of each target platform for the best quality code.

Delivering a new level of performance

We're already seeing some great speedups with TurboFan, but there's still a ton of work to do. Stay tuned as we enable more optimizations and turn TurboFan on for more types of code!

Posted by Ben L. Titzer, Software Engineer and TurboFan Mechanic

Friday, July 10, 2015

Hello, World!

Welcome to the V8 Developer blog!

Historically, the V8 team has published blog posts on the Chromium blog that appeal to the wider web development audience. In order to deliver more in-depth posts especially targeted to the hardcore coder and virtual machine enthusiasts out there, we've spun off this V8-specific blog. Big-tickets items will continue to be posted to the Chromium blog, but we hope you'll stay tuned here for even more posts about advanced compiler techniques, low-level performance optimizations, and all the technical bits that make V8 tick.

Happy reading,

the V8 team