Adding a New Architecture (Pcode Backend)
One of the key features of Styx is its ability to add new processor architectures.
Styx has two CPU backends: the Unicorn backend, powered by the Unicorn Engine, and the Pcode backend, which emulates Ghidra Pcode. The Unicorn backend benefits from the maturity of the Unicorn Engine, including better performance. The Pcode backend is straightforward to extend with new architectures, provided they are supported by Ghidra.
Steps For Experienced Users
Below is a checklist, in order, for adding a new architecture to the Pcode backend. It doesn't include all the necessary details, so be sure to read the rest of this page if this is your first architecture port.
- Add the SLEIGH spec to Styx (see SLEIGH Specification):
  - Put it in the styx-sla custom folder, or find it in the Ghidra collection.
  - Add an arch_<arch name> feature flag to Cargo.toml, and add it to the default feature flags (more info in the features Rust docs).
  - Add an ArchFeature for your arch in styx-sla's build.rs.
  - Add the SLEIGH spec path to build.rs under the correct ArchFeature guard.
  - Build styx-sla to check correctness up to this point.
  - Add a module for the arch in lib.rs with a SlaRegisters impl.
  - Build styx-sla again to finish.
- Add the arch_<arch name> feature flag to the Cargo.toml of the styx-pcode-translator and styx-cpu-pcode-backend crates:
  - Guard arch-specific code, e.g. for ARM: #[cfg(feature = "arch_arm")]
  - Use xtask feature-add to automate this.
- Add the arch spec to the Pcode backend (see Architecture Specification):
  - Create a new module with your arch name, feature-gated by the arch feature flag.
  - Create a "StandardPcManager" for your arch and add it to the PcManager enum_dispatch enum (see PC Manager).
  - Add your arch to the build_arch_spec() match.
  - Add CallOtherHandlers and RegisterHandlers as you encounter them.
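The feature-gating steps in this checklist can be illustrated with a small, self-contained sketch of a build_arch_spec()-style match in which each architecture arm only exists when its arch_<arch name> feature is enabled. The Arch enum, the ArchSpec struct, and the arch_myarch feature are hypothetical placeholders, not the actual Styx API.

```rust
/// Hypothetical architecture selector (not the real Styx type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Arch {
    Arm,
    MyArch,
}

/// Hypothetical stand-in for the per-architecture behavior specification.
#[derive(Debug, PartialEq, Eq)]
pub struct ArchSpec {
    pub name: &'static str,
}

/// Sketch of a `build_arch_spec()`-style match. Each arm is compiled only when
/// its `arch_<arch name>` Cargo feature is enabled; disabled architectures fall
/// through to the final arm and produce an error instead of a compile failure.
pub fn build_arch_spec(arch: Arch) -> Result<ArchSpec, String> {
    match arch {
        #[cfg(feature = "arch_arm")]
        Arch::Arm => Ok(ArchSpec { name: "arm" }),
        #[cfg(feature = "arch_myarch")]
        Arch::MyArch => Ok(ArchSpec { name: "myarch" }),
        // Unreachable only when every arch feature is enabled.
        #[allow(unreachable_patterns)]
        other => Err(format!("architecture {other:?} is not enabled in this build")),
    }
}
```

With this layout, a build with any subset of arch features still compiles, and requesting a disabled architecture fails at run time with a readable message. This is one possible design; the real backend may treat disabled architectures differently.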
Styx Architecture Description
The Architecture description is common to all Styx backends and is how Styx users interface with the ISA.
SLEIGH Specification
To add a custom architecture, you will need a processor specification. The processor specification file provides the definitions used to generate Pcode from the architecture’s machine code.
More about SLEIGH
The SLEIGH processor specification language was developed for the GHIDRA project to define the translation from machine instructions to assembly and to Pcode, which supports GHIDRA's data-flow and decompilation analysis.
Processor specifications are written in the SLEIGH language (file extension .slaspec) and compiled into sla files (.sla). A SLEIGH compiler is included in GHIDRA as a part of libsleigh.
More information can be found in GHIDRA’s SLEIGH documentation.
The GHIDRA project has SLEIGH specifications for many common processors that can be used as-is for Pcode emulation or modified. For convenience, these are included in styx under the styx-sla crate in processors/ghidra/. Alternatively, custom SLEIGH specifications can be added under processors/custom/. SLEIGH specifications in these locations are compiled to .sla files automatically during the build.
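As an illustration of the build.rs steps from the checklist, the sketch below maps hypothetical ArchFeature variants to SLEIGH spec paths and keeps only those whose feature is enabled. It relies on Cargo's behavior of exposing each enabled feature to build scripts as a CARGO_FEATURE_<NAME> environment variable; the variant names, feature names, and spec paths are assumptions, not styx-sla's actual build script.

```rust
use std::path::PathBuf;

/// Hypothetical selector in the spirit of `ArchFeature` in styx-sla's build.rs.
/// Variant names, feature names, and paths are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArchFeature {
    Arm,
    MyArch,
}

impl ArchFeature {
    pub const ALL: [ArchFeature; 2] = [ArchFeature::Arm, ArchFeature::MyArch];

    /// Cargo feature that enables this architecture.
    pub fn feature_name(self) -> &'static str {
        match self {
            ArchFeature::Arm => "arch_arm",
            ArchFeature::MyArch => "arch_myarch",
        }
    }

    /// Variable Cargo sets for build scripts when the feature is enabled:
    /// `CARGO_FEATURE_` + feature name, uppercased, with `-` replaced by `_`.
    pub fn env_var(self) -> String {
        format!("CARGO_FEATURE_{}", self.feature_name().to_uppercase().replace('-', "_"))
    }

    /// SLEIGH specs to compile for this architecture (hypothetical paths).
    pub fn slaspec_paths(self) -> Vec<PathBuf> {
        match self {
            ArchFeature::Arm => vec![PathBuf::from("processors/ghidra/arm.slaspec")],
            ArchFeature::MyArch => vec![PathBuf::from("processors/custom/myarch.slaspec")],
        }
    }
}

/// Collect the specs whose feature is enabled. In a real build.rs, `is_set`
/// could be `|name| std::env::var_os(name).is_some()`.
pub fn specs_to_compile(is_set: impl Fn(&str) -> bool) -> Vec<PathBuf> {
    ArchFeature::ALL
        .iter()
        .filter(|arch| is_set(arch.env_var().as_str()))
        .flat_map(|arch| arch.slaspec_paths())
        .collect()
}
```

Passing the environment lookup in as a closure keeps the selection logic testable without touching real environment variables.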
Next, add your new architecture to incubation/styx-pcode-sleigh-backend/src/sla.rs. This includes an implementation of SlaRegisters, which defines the translation from Styx’s register nomenclature to the SLEIGH spec’s nomenclature. Usually, Styx register names are uppercase while SLEIGH spec register names are lowercase. Some registers may also need explicit, special-case mappings.
Registers are defined in styx/core/styx-cpu-type/src/arch/[ARCH]/registers.rs. Your arch’s register types probably don’t exist yet, in which case you’ll have to add them. See Adding an Architecture to learn how to add the ISA types to Styx.
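A minimal sketch of the name translation SlaRegisters is responsible for, assuming a hypothetical MyArchRegister enum and a simplified SlaRegisterNames trait (neither is the real Styx API): most registers map to their lowercase Styx name, and exceptions are listed explicitly.

```rust
use std::fmt;

/// Hypothetical Styx-side register enum for a new architecture.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MyArchRegister {
    R0,
    R1,
    Sp,
    Pc,
    Status,
}

impl fmt::Display for MyArchRegister {
    /// Styx-style (uppercase) register names.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            MyArchRegister::R0 => "R0",
            MyArchRegister::R1 => "R1",
            MyArchRegister::Sp => "SP",
            MyArchRegister::Pc => "PC",
            MyArchRegister::Status => "STATUS",
        };
        f.write_str(name)
    }
}

/// Simplified, hypothetical version of the SlaRegisters idea: map a Styx
/// register to the name used in the SLEIGH spec.
pub trait SlaRegisterNames {
    fn sla_name(&self) -> String;
}

impl SlaRegisterNames for MyArchRegister {
    fn sla_name(&self) -> String {
        match self {
            // Exception: this (hypothetical) SLEIGH spec calls the status register "sr".
            MyArchRegister::Status => "sr".to_string(),
            // Common case: the SLEIGH name is the lowercase Styx name.
            other => other.to_string().to_lowercase(),
        }
    }
}
```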
Architecture Specification
The Arch Spec is a behavior specification specific to the Pcode backend. Pcode emulation doesn’t have all the information needed to properly emulate the target; the Arch Spec fills in these gaps.
There are four parts to the Arch Spec:
CallOther handlers - Execute Pcode “userops”
Register handlers - Custom logic for complex registers
PcManager - Define program counter semantics
GeneratorHelper - Pre-instruction fetch hook
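The sketch below shows one way the four parts could fit together in a single per-architecture container. All trait and type names (UseropHandler, RegisterHook, PcModel, PrefetchHook, ArchSpecSketch, SimplePc, ConstRegister) are simplified, hypothetical stand-ins rather than the backend's real definitions.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for the four parts of an Arch Spec.

/// CallOther handler: executes a Pcode userop given its input values.
pub trait UseropHandler {
    fn handle(&self, inputs: &[u64]);
}

/// Register handler: custom read logic for a complex register.
pub trait RegisterHook {
    fn read(&self) -> u64;
}

/// PC manager: program counter semantics.
pub trait PcModel {
    fn isa_pc(&self) -> u64;
}

/// Generator helper: hook run before each instruction fetch.
pub trait PrefetchHook {
    fn prefetch(&mut self, pc: u64);
}

/// One spec per architecture: handlers keyed by userop or register name,
/// exactly one PC model, and an optional prefetch hook.
pub struct ArchSpecSketch {
    pub userops: HashMap<&'static str, Box<dyn UseropHandler>>,
    pub registers: HashMap<&'static str, Box<dyn RegisterHook>>,
    pub pc: Box<dyn PcModel>,
    pub prefetch: Option<Box<dyn PrefetchHook>>,
}

/// PC model whose ISA PC is simply the stored address.
pub struct SimplePc(pub u64);

impl PcModel for SimplePc {
    fn isa_pc(&self) -> u64 {
        self.0
    }
}

/// Register hook that always reads a fixed value (e.g. a hypothetical ID register).
pub struct ConstRegister(pub u64);

impl RegisterHook for ConstRegister {
    fn read(&self) -> u64 {
        self.0
    }
}

/// Build a spec for a hypothetical architecture with no userops, one custom
/// register, and no prefetch hook.
pub fn build_example_spec() -> ArchSpecSketch {
    let mut registers: HashMap<&'static str, Box<dyn RegisterHook>> = HashMap::new();
    registers.insert("ID", Box::new(ConstRegister(0x1234)));
    ArchSpecSketch {
        userops: HashMap::new(),
        registers,
        pc: Box::new(SimplePc(0x1000)),
        prefetch: None,
    }
}
```

Only the PC model is mandatory in this layout; the handler maps can start empty and grow as userops and complex registers are encountered.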
CallOther Handlers
Pcode has a special USERDEFINED opcode (also known as CALLOTHER) for operations that are not implemented in the SLEIGH spec. In mature SLEIGH specs, these are instructions with side effects beyond changing memory and register values. For example, the ARM SLEIGH spec defines a userop for software interrupts, which Styx handles with SoftwareInterruptCallOther. Other SLEIGH specs may use more CallOthers for complex instructions that are hard to implement in Pcode.
USERDEFINED opcodes are declared with define pcodeop <name>; in the SLEIGH spec.
The best way to implement userdefined opcodes correctly is to look through the SLEIGH spec to find which instructions use them, what arguments are passed to them, and whether their output varnode is used. Do this while cross-referencing the processor manual.
#[derive(Debug, Default)]
pub struct SoftwareInterruptCallOther;

impl CallOtherCallback for SoftwareInterruptCallOther {
    fn handle(
        &self,
        backend: &PcodeBackend,
        inputs: &[VarnodeData],
        _output: Option<&VarnodeData>,
    ) -> Result<PCodeStateChange, CallOtherHandleError> {
        // The first input varnode holds the SVC immediate (interrupt number).
        let input_value = backend.read(&inputs[0]).unwrap();
        let interrupt_number = input_value.to_u64().unwrap();
        let interrupt_number: i32 = interrupt_number.try_into().unwrap();
        trace!("Interrupt no: {interrupt_number}");
        // This example only expects `svc #0`.
        assert_eq!(interrupt_number, 0);
        // Report a delayed SVC interrupt to the backend.
        Ok(PCodeStateChange::DelayedInterrupt(SVC_IRQN))
    }
}
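Conceptually, a handler like the one above is looked up when generated Pcode reaches the corresponding userop. The following self-contained sketch mimics that flow with simplified, hypothetical types (StateChange, CallOtherError, CallOtherHandler, CallOtherTable; the userop names are illustrative, not Styx or Ghidra identifiers): handlers are registered by pcodeop name, and a userop without a handler is reported as unimplemented.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified state change reported by a userop handler.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StateChange {
    /// Raise the given interrupt after the current instruction.
    DelayedInterrupt(i32),
}

/// Hypothetical errors a userop dispatch can produce.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CallOtherError {
    /// The SLEIGH spec used a userop that has no registered handler.
    Unimplemented(String),
    /// The handler did not receive the inputs it expected.
    BadInput,
}

/// Simplified handler: inputs are values already read from the input varnodes.
pub trait CallOtherHandler {
    fn handle(&self, inputs: &[u64]) -> Result<StateChange, CallOtherError>;
}

/// Exception number of SVCall on ARMv7-M, used here as the interrupt to raise.
pub const SVC_IRQN: i32 = 11;

/// Mirrors the shape of SoftwareInterruptCallOther above.
pub struct SoftwareInterrupt;

impl CallOtherHandler for SoftwareInterrupt {
    fn handle(&self, inputs: &[u64]) -> Result<StateChange, CallOtherError> {
        // The first input carries the SVC immediate; this sketch only requires it to exist.
        let _immediate = inputs.first().ok_or(CallOtherError::BadInput)?;
        Ok(StateChange::DelayedInterrupt(SVC_IRQN))
    }
}

/// Maps `define pcodeop <name>` userop names to their handlers.
#[derive(Default)]
pub struct CallOtherTable {
    handlers: HashMap<String, Box<dyn CallOtherHandler>>,
}

impl CallOtherTable {
    pub fn register(&mut self, name: &str, handler: Box<dyn CallOtherHandler>) {
        self.handlers.insert(name.to_string(), handler);
    }

    /// Run the handler for `name`, or report the userop as unimplemented.
    pub fn dispatch(&self, name: &str, inputs: &[u64]) -> Result<StateChange, CallOtherError> {
        match self.handlers.get(name) {
            Some(handler) => handler.handle(inputs),
            None => Err(CallOtherError::Unimplemented(name.to_string())),
        }
    }
}
```

Returning an error for unknown userops, rather than silently ignoring them, makes missing handlers visible the first time a spec exercises them.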
Register Handlers
Register handlers are used to hook register reads and writes at the CPU backend level. This is used to implement additional logic beyond the SLEIGH spec.
Register handlers implement read(..) and write(..) functions that define the behavior when a Styx user accesses the register. When the emulated core reads or writes a register, the underlying Register Space is accessed directly. The Register Space is just a memory store; it has no behavior beyond what the generated Pcode does.
In contrast, if the register is read or written through the register API (cpu.read_register(..), cpu.write_register(..)), the RegisterManager first checks whether a Register Handler is associated with the queried register. If so, the Register Handler is called and its result is used. If no handler is associated with the register, the DefaultRegisterHandler is used. The DefaultRegisterHandler reads the value in the Pcode Register Space, which is the “correct” behavior for registers that simply hold a value.
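To make the dispatch described above concrete, here is a minimal sketch of a register manager that consults a per-register handler and otherwise falls back to a default handler backed by a plain register store. Reg, RegisterSpace, RegisterHook, DefaultHandler, StatusHandler, and RegisterManagerSketch are hypothetical, and the STATUS behavior is invented for illustration. The custom handler writes through to the register store, so anything reading the raw store (standing in for generated Pcode) sees the same value.

```rust
use std::collections::HashMap;

/// Hypothetical register identifiers for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Reg {
    R0,
    Status,
}

/// Stand-in for the Pcode Register Space: plain storage with no behavior.
#[derive(Default)]
pub struct RegisterSpace {
    values: HashMap<Reg, u64>,
}

impl RegisterSpace {
    pub fn get(&self, reg: Reg) -> u64 {
        self.values.get(&reg).copied().unwrap_or(0)
    }
    pub fn set(&mut self, reg: Reg, value: u64) {
        self.values.insert(reg, value);
    }
}

/// Simplified register handler interface (hypothetical).
pub trait RegisterHook {
    fn read(&self, space: &RegisterSpace, reg: Reg) -> u64;
    fn write(&self, space: &mut RegisterSpace, reg: Reg, value: u64);
}

/// Default behavior: read and write the Register Space directly.
pub struct DefaultHandler;

impl RegisterHook for DefaultHandler {
    fn read(&self, space: &RegisterSpace, reg: Reg) -> u64 {
        space.get(reg)
    }
    fn write(&self, space: &mut RegisterSpace, reg: Reg, value: u64) {
        space.set(reg, value);
    }
}

/// Invented example: the low four bits of STATUS always hold 0b0001.
pub struct StatusHandler;

impl RegisterHook for StatusHandler {
    fn read(&self, space: &RegisterSpace, reg: Reg) -> u64 {
        space.get(reg)
    }
    fn write(&self, space: &mut RegisterSpace, reg: Reg, value: u64) {
        // Store the adjusted value in the Register Space so generated Pcode sees it too.
        space.set(reg, (value & !0xF) | 0x1);
    }
}

/// Register API: use the handler registered for a register, else the default.
#[derive(Default)]
pub struct RegisterManagerSketch {
    pub space: RegisterSpace,
    handlers: HashMap<Reg, Box<dyn RegisterHook>>,
}

impl RegisterManagerSketch {
    pub fn add_handler(&mut self, reg: Reg, handler: Box<dyn RegisterHook>) {
        self.handlers.insert(reg, handler);
    }

    pub fn read_register(&self, reg: Reg) -> u64 {
        match self.handlers.get(&reg) {
            Some(handler) => handler.read(&self.space, reg),
            None => DefaultHandler.read(&self.space, reg),
        }
    }

    pub fn write_register(&mut self, reg: Reg, value: u64) {
        match self.handlers.get(&reg) {
            Some(handler) => handler.write(&mut self.space, reg, value),
            None => DefaultHandler.write(&mut self.space, reg, value),
        }
    }
}
```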
An example of a register that needs a Register Handler is the Armv7-M XPSR register. XPSR is a combination of the APSR, IPSR, and EPSR registers.
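#[derive(Debug, Default)]
pub struct XpsrHandler;

// ARMv7-M bit fields owned by each sub-register of XPSR.
const APSR_MASK: u32 = 0xF80F_0000; // N, Z, C, V, Q flags and GE[3:0]
const IPSR_MASK: u32 = 0x0000_01FF; // current exception number
const EPSR_MASK: u32 = 0x0700_FC00; // T bit and ICI/IT bits

impl RegisterCallback for XpsrHandler {
    fn read(
        &self,
        _register: ArchRegister,
        backend: &PcodeBackend,
    ) -> Result<SizedValue, RegisterHandleError> {
        let apsr = backend.read_register::<u32>(ArmRegister::Apsr).unwrap();
        let ipsr = backend.read_register::<u32>(ArmRegister::Ipsr).unwrap();
        let epsr = backend.read_register::<u32>(ArmRegister::Epsr).unwrap();
        // The three fields do not overlap, so XPSR is their bitwise OR.
        let xpsr = (apsr & APSR_MASK) | (ipsr & IPSR_MASK) | (epsr & EPSR_MASK);
        Ok(SizedValue::from_u64(xpsr as u64, 4))
    }

    fn write(
        &self,
        _register: ArchRegister,
        value: SizedValue,
        backend: &PcodeBackend,
    ) -> Result<(), RegisterHandleError> {
        let xpsr = value.to_u64().unwrap() as u32;
        // Give each sub-register only the bits it owns, so that reading
        // APSR, IPSR, or EPSR on its own returns just that field.
        backend.write_register(ArmRegister::Apsr, xpsr & APSR_MASK).unwrap();
        backend.write_register(ArmRegister::Ipsr, xpsr & IPSR_MASK).unwrap();
        backend.write_register(ArmRegister::Epsr, xpsr & EPSR_MASK).unwrap();
        Ok(())
    }
}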
Warning
Pcode emulation does not use the Register Handlers. If the register is used in generated Pcode, its value comes from the Pcode Register Space. To handle this, make sure to keep the Register Space in sync with values written through the Register Handler. The DefaultRegisterHandler shows an example of this.
PC Manager
The PC Manager is used to define the Program Counter of the processor. To properly abstract the ISA from the Pcode backend, two PC definitions are used:
pub trait ArchPcManager {
    /// Value of Program Counter as defined by the Instruction Set Architecture.
    ///
    /// This is the pc that is read inside machine instructions like `mov r0, pc`. This is also the
    /// pc that is returned from [CpuEngine::pc()](styx_cpu_engine_trait::CpuEngine::pc()).
    fn isa_pc(&self) -> u64;

    /// Value of Program Counter for internal backend use. Used to track the next instruction to
    /// translate and execute.
    ///
    /// This pc must hold the following: before execution the PC points to the next instruction,
    /// during fetch and execution this is set to the current instruction. After execution the PC is
    /// set to the next instruction to be executed.
    ///
    /// This pc is to track the next instruction to translate and execute.
    fn internal_pc(&self) -> u64 {
        self.isa_pc()
    }

    // ...
}
Blackfin implements a StandardPcManager, which may eventually be stabilized for use with any architecture and may be the correct PC manager for your implementation.
The main exception, and the justification for the PC manager’s existence, is ARM’s unusual PC, which reads as two instructions ahead of the currently executing instruction (the current address plus 8 in ARM state, or plus 4 in Thumb state). The PcManager could also help implement instruction packets for architectures that use them (e.g. Hexagon, Itanium, and TMS320).
The PcManager has several hooks that are called during execution so that it can keep its state correct.
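A sketch of the two-PC idea, using a simplified trait modeled on ArchPcManager above plus a hypothetical post_execute hook (the real hook names are not shown on this page). StandardPc is an assumed "standard" model in which both PCs are the current instruction address; ArmPc keeps the internal PC at the current instruction while the ISA PC reads as the current address plus 8 in ARM state or plus 4 in Thumb state.

```rust
/// Simplified, hypothetical version of the two-PC idea from ArchPcManager.
pub trait PcModel {
    /// PC as seen by machine instructions (e.g. `mov r0, pc`).
    fn isa_pc(&self) -> u64;
    /// PC used internally to pick the next instruction to translate and execute.
    fn internal_pc(&self) -> u64 {
        self.isa_pc()
    }
    /// Hypothetical hook: called after a non-branching instruction of `len` bytes.
    fn post_execute(&mut self, len: u64);
}

/// Assumed "standard" semantics: both PCs are the current instruction address.
pub struct StandardPc {
    pub pc: u64,
}

impl PcModel for StandardPc {
    fn isa_pc(&self) -> u64 {
        self.pc
    }
    fn post_execute(&mut self, len: u64) {
        self.pc += len;
    }
}

/// ARM-style semantics: reading PC yields the current instruction address plus 8
/// in ARM state or plus 4 in Thumb state (some instructions also word-align it).
pub struct ArmPc {
    pub current: u64,
    pub thumb: bool,
}

impl PcModel for ArmPc {
    fn isa_pc(&self) -> u64 {
        self.current + if self.thumb { 4 } else { 8 }
    }
    fn internal_pc(&self) -> u64 {
        self.current
    }
    fn post_execute(&mut self, len: u64) {
        self.current += len;
    }
}
```

Only ArmPc needs to override internal_pc; for architectures where the ISA PC already points at the current instruction, the default method is enough.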
Generator Helper
The Generator Helper is an optional part of the arch spec that provides a prefetch hook to assist Pcode generation. For example, ARM Pcode generation needs it because Thumb mode must be tracked during emulation and cannot be known statically. The Generator Helper's prefetch hook lets the architecture implementer read the system state and apply context options as needed.
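A minimal sketch of a prefetch hook, assuming a hypothetical CpuState that exposes the Thumb (T) bit and a hypothetical DecodeContext whose single t_mode flag stands in for a SLEIGH context variable. The helper reads the state before each fetch and returns the context the next instruction should be decoded with; none of these names come from Styx.

```rust
/// Hypothetical decoding context handed to the Pcode generator. `t_mode`
/// stands in for a SLEIGH context variable that selects Thumb decoding.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct DecodeContext {
    pub t_mode: bool,
}

/// Minimal view of CPU state available to the prefetch hook (hypothetical).
pub struct CpuState {
    /// Thumb (T) bit from the execution state register.
    pub thumb_bit: bool,
}

/// Simplified generator helper: inspect state before each fetch and return
/// the context to decode the next instruction with.
pub trait GeneratorHelperSketch {
    fn prefetch(&mut self, state: &CpuState) -> DecodeContext;
}

/// ARM-style helper: decode as Thumb whenever the T bit is set, and count
/// mode switches (e.g. after an interworking branch) for observability.
#[derive(Default)]
pub struct ArmGeneratorHelper {
    last: DecodeContext,
    pub mode_switches: u32,
}

impl GeneratorHelperSketch for ArmGeneratorHelper {
    fn prefetch(&mut self, state: &CpuState) -> DecodeContext {
        let ctx = DecodeContext { t_mode: state.thumb_bit };
        if ctx != self.last {
            self.mode_switches += 1;
            self.last = ctx;
        }
        ctx
    }
}
```

Because the context is derived from live CPU state on every fetch, a mode change made by the previous instruction is picked up before the next one is decoded.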