import sys
import re
import struct
import IPython
import copy


class AssemblerException(Exception):
    pass


class InvalidRegister(AssemblerException):

    def __init__(self, register):
        super().__init__("Invalid register: {}".format(register))


class InvalidOperation(AssemblerException):

    def __init__(self, operation):
        super().__init__("Invalid operation: {}".format(operation))


class ExpectedImmediate(AssemblerException):

    def __init__(self, value):
        super().__init__("Expected immediate, got {}".format(value))


class ExpectedRegister(AssemblerException):

    def __init__(self, value):
        super().__init__("Expected register, got {}".format(value))


class IPOverwrite(AssemblerException):

    def __init__(self, instruction=None):
        if instruction:
            super().__init__(
                "IP can't be overwritten. Instruction: {}".format(instruction))
        else:
            super().__init__("IP can't be overwritten.")


class InvalidValue(AssemblerException):

    def __init__(self, instruction):
        super().__init__(
            "Invalid value while assembling: {}".format(instruction))


class VMAssembler:

    def __init__(self, key, data):
        self.data = data
        self.assembled_code = bytearray()
        self.functions = []
        self.decrypt_ops(key)
        self.parse_functions()
        print(self.functions)
        main = next((x for x in self.functions if x.name == "main"), None)
        if main is None:
            print("Main has to be defined")
            return

    def parse_functions(self):
        cur_fun_size = 0
        cur_fun_name = None
        fun_start = 0

        # first pass: split the source into functions by name
        for i, line in enumerate(self.data):
            match = function_re.match(line)
            if match:
                if cur_fun_name:
                    f = VMFunction(cur_fun_name, self.data[fun_start:i])
                    self.functions.append(f)
                cur_fun_name = match.group(1)
                fun_start = i + 1
        f = VMFunction(cur_fun_name, self.data[fun_start:i + 1])
        self.functions.append(f)

        # putting main in first position in order to assemble it first
        for i, f in enumerate(self.functions):
            if f.name == "main" and i != 0:
                self.functions[0], self.functions[i] = \
                    self.functions[i], self.functions[0]
                break

        # calculating functions offsets
        for i in range(1, len(self.functions)):
            prev_fun_tot_size = self.functions[i - 1].size + \
                self.functions[i - 1].offset
            cur_fun_size = self.functions[i].size
            self.functions[i].set_offset(prev_fun_tot_size)
        return

    def parse(self):
        # dispatch every instruction to the encoder named by its opcode
        for f in self.functions:
            for i in f.instructions:
                action = getattr(self, "{}".format(i.opcode.method))
                action(i)

    def imm2reg(self, instruction):
        """
        Intel syntax -> REG, IMM
        """
        opcode = instruction.opcode
        reg = instruction.args[0]
        imm = instruction.args[1]
        if reg.name == "ip":
            raise IPOverwrite(instruction)
        if not imm.isimm():
            raise ExpectedImmediate(imm)
        if not reg.isreg():
            raise ExpectedRegister(reg)
        if not opcode.uint8() or not reg.uint8() or not imm.uint16():
            raise InvalidValue(instruction)
        self.assembled_code += opcode.uint8() + reg.uint8() + imm.uint16()
        return

    def reg2reg(self, instruction):
        """
        Intel syntax -> DST_REG, SRC_REG
        """
        opcode = instruction.opcode
        dst_reg = instruction.args[0]
        src_reg = instruction.args[1]
        if dst_reg.name == "ip" or src_reg.name == "ip":
            raise IPOverwrite(instruction)
        if not dst_reg.isreg():
            raise ExpectedRegister(dst_reg)
        if not src_reg.isreg():
            raise ExpectedRegister(src_reg)
        if not opcode.uint8() or not dst_reg.uint8() or not src_reg.uint8():
            raise InvalidValue(instruction)
        byte_with_nibbles = struct.pack(" IMM, REG """
        opcode = instruction.opcode
        imm = instruction.args[0]
        reg = instruction.args[1]
        if reg.name == "ip":
            raise IPOverwrite(instruction)
        if not imm.isimm():
            raise ExpectedImmediate(imm)
        if not reg.isreg():
            raise ExpectedRegister(reg)
        if not opcode.uint8() or not reg.uint8() or not imm.uint16():
            raise InvalidValue(instruction)
        self.assembled_code += opcode.uint8() + imm.uint16() + reg.uint8()
        return

    def byt2reg(self, instruction):
        """
        Intel syntax -> REG, [BYTE]IMM
        """
        opcode = instruction.opcode
        reg = instruction.args[0]
        imm = instruction.args[1]
        if reg.name == "ip":
            raise IPOverwrite(instruction)
        if not imm.isimm():
            raise ExpectedImmediate(imm)
        if not reg.isreg():
            raise ExpectedRegister(reg)
        if not opcode.uint8() or not reg.uint8() or not imm.uint8():
            raise InvalidValue(instruction)
        self.assembled_code += opcode.uint8() + reg.uint8() + imm.uint8()
        return

    def regonly(self, instruction):
        """
        Instruction with a single argument: a register
        """
        opcode = instruction.opcode
        reg = instruction.args[0]
        if reg.name == "ip":
            raise IPOverwrite(instruction)
        if not reg.isreg():
            raise ExpectedRegister(reg)
        if not opcode.uint8() or not reg.uint8():
            raise InvalidValue(instruction)
        self.assembled_code += opcode.uint8() + reg.uint8()
        return

    def immonly(self, instruction):
        """
        Instruction with a single argument: an immediate
        """
        opcode = instruction.opcode
        imm = instruction.args[0]
        if not imm.isimm():
            raise ExpectedImmediate(imm)
        if not opcode.uint8() or not imm.uint16():
            raise InvalidValue(instruction)
        self.assembled_code += opcode.uint8() + imm.uint16()
        return

    def jump(self, instruction):
        imm_op_re = re.compile(".*[iI]$")
        reg_op_re = re.compile(".*[rR]$")
        symcall = symcall_re.match(str(instruction))
        dst = instruction.args[0]

        # let's check if the jump is to a label or a function
        if symcall:
            # the symbol has not been resolved
            if dst.name == dst.value:
                # check whether it is a function
                val = next(
                    (x.offset for x in self.functions if x.name == dst.name),
                    None)
                # check whether it is a label
                if val is None:
                    for f in self.functions:
                        for i in f.instructions:
                            if i.label == dst.name:
                                val = f.offset_of_label(dst) + f.offset
                if val is None:
                    raise AssemblerException()
                # resolving the symbol
                instruction.args[0].set_value(val)

        # define the kind of jump: to immediate or to register
        if imm_op_re.match(instruction.opcode.name):
            self.immonly(instruction)
        elif reg_op_re.match(instruction.opcode.name):
            self.regonly(instruction)
        else:
            raise AssemblerException()

    def single(self, instruction):
        """
        Instruction with no arguments
        """
        opcode = instruction.opcode
        self.assembled_code += opcode.uint8()
        return

    def decrypt_ops(self, key):
        key_ba = bytearray(key, 'utf-8')
        olds = copy.deepcopy(ops)

        # RC4 KSA! :-P
        arr = [i for i in range(256)]
        j = 0
        for i in range(len(arr)):
            j = (j + arr[i] + key_ba[i % len(key)]) % len(arr)
            arr[i], arr[j] = arr[j], arr[i]

        # the permuted array provides the new opcode values
        for i, o in enumerate(ops):
            o.set_value(arr[i])

        for o, n in zip(olds, ops):
            print("{} : {}->{}".format(o.name, hex(o.value), hex(n.value)))


class VMFunction:

    def __init__(self, name, code):
        self.name = name
        self.size = 0
        self.offset = 0
        self.instructions = []

        # populating instructions
        i = 0
        while i < len(code):
            line = code[i]
            ins = instruction_re.match(line)
            label = label_re.match(line)
            if label:
                # a label applies to the instruction on the following line
                label_name = label.group(1)
                self.instructions.append(VMInstruction(code[i + 1], label_name))
                i += 2
            elif ins:
                self.instructions.append(VMInstruction(line))
                i += 1
            else:
                # neither a label nor an instruction: skip the line so the
                # loop always advances
                i += 1
        self.calc_size()

    def calc_size(self):
        for i in self.instructions:
            self.size += i.size

    def set_offset(self, offset):
        self.offset = offset

    def offset_of_label(self, label):
        offset = 0
        for i in self.instructions:
            offset += i.size
            if i.label == label:
                break
        return offset

    def __repr__(self):
        return "{}: size {}, offset {}".format(
            self.name, hex(self.size), hex(self.offset))


class VMInstruction:
    """
    Represents an instruction the VM recognizes.
    e.g: MOVI  [R0, 2]
          ^       ^
        opcode   args
    """

    def __init__(self, line, label=None):
        self.opcode = None
        self.args = []
        self.size = 1
        self.label = label

        ins = instruction_re.match(line)
        symcall = symcall_re.match(line)

        opcode = ins.group(1)
        self.opcode = next((x for x in ops if x.name == opcode), None)
        if self.opcode is None:
            raise InvalidOperation(opcode)

        args = [x for x in ins.groups()[1:] if x is not None]
        for a in args:
            if immediate_re.match(a) or symcall:
                # directly append the immediate
                self.args.append(VMComponent(a, a))
                self.size += 2
                continue
            elif register_re.match(a):
                # create a VM component for a register
                reg = next((x for x in regs if x.name == a), None)
                if reg is None:
                    raise InvalidRegister(a)
                self.args.append(reg)
                self.size += 1
                continue

    def __repr__(self):
        return "{} {}".format(
            self.opcode.name, ", ".join([x.name for x in self.args]))


class VMComponent:
    """
    Represents a register, operation or an immediate the VM recognizes
    """

    def __init__(self, name, value, method=None):
        self.name = name.casefold()
        self.value = value
        self.method = method

    def __repr__(self):
        return "{}".format(self.name)

    def set_name(self, name):
        self.name = name

    def set_value(self, value):
        self.value = value

    def uint8(self):
        numre = re.compile("^[0-9]+$")
        if isinstance(self.value, int):
            return struct.pack("