2D HIGH RESOLUTION MARINE SEISMIC DATA PROCESSING BY USING SEISMIC UNIX: PART-1: SEG-Y INPUT ~ TotalCorner

INTRODUCTION

Seismic data processing converts seismic data recordings into meaningful seismic sections that reveal and help delineate the subsurface stratigraphy and structures that may bear hydrocarbons. Seismic data processing has three main objectives: applying static and positioning corrections, increasing the signal-to-noise ratio, and improving resolution. These are achieved through a sequence of processing stages whose mathematical, geological, and physical foundations are beyond the scope of this article.

There are many seismic processing packages; the most common commercial ones include ProMAX, Omega, Vista, and Focus. Apart from these, there are also many free and open-source packages. Seismic Unix (SU) is one of the most widely used of them: a fast, free set of processing tools for Linux/Unix, developed and maintained by the Center for Wave Phenomena (CWP) at the Colorado School of Mines. The full source code packages can be downloaded from http://www.cwp.mines.edu/cwpcodes/index.html or via ftp://ftp.cwp.mines.edu/pub/cwpcodes.

After installing SU on a Linux/Unix platform (e.g., Red Hat Linux or Ubuntu), users can modify its ANSI C source code and extend its capabilities with shell scripts. Another advantage is that Unix is a multitasking operating system, so multiple processes can be strung together in a cascade via “pipes” (|). For usage and tutorials, see Thomas Benz (2005), “Seismic Data Processing with Seismic Un*x”, or the SU self-documentation.

This topic introduces the basic concepts of 2D high-resolution marine seismic reflection data processing, using Seismic Unix with a set of shell scripts that cover the necessary processing steps and produce a final migrated section. With a “learning-by-doing” approach, I hope we can explore and share ideas for understanding both fundamental and advanced seismic data processing.

PART-1: SEG-Y INPUT

DESCRIPTION:

The objective of this section is to load seismic data recorded in SEG-Y format (*.segy or *.sgy) and reformat them into SU format (*.su). While the SEG-Y data are read, the headers are extracted; they contain seismic information that guides the choice of processing parameters.
The result is raw data in SU format, whose traces keep the same 240-byte Trace Header as SEG-Y, but without the 3200-byte EBCDIC Header and the 400-byte Binary Header. For initial data QC, we view the range of each Trace Header key, create print-outs of the original field records, and plot the near-trace data.
Author : Henry Mulana Nainggolan

Date : April 27th, 2014

BASIC THEORY:

First, field data are demultiplexed (transposed) from recording mode (each record contains the same time sample from all traces) to trace mode (each record contains all time samples from one trace). Seismographs record data to a storage device in SEG-D, SEG-Y, or a manufacturer-specific format. The datasets are then converted into the internal format used throughout processing (e.g., the *.su format for Seismic Unix).
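Demultiplexing is, in effect, a matrix transpose: multiplexed data are stored sample by sample across all channels, while trace-sequential data are stored channel by channel. The sketch below only illustrates the idea on a small ASCII matrix with awk; the helper name `demux` and the one-row-per-time-sample text layout are assumptions for illustration, since real recorders write binary SEG-D/SEG-Y and the demultiplexing is done by the recording system or reformatting software.

```shell
#!/bin/sh
# demux: transpose a whitespace-separated matrix read from stdin.
# Input  (multiplexed):   one row per time sample, one column per channel.
# Output (demultiplexed): one row per channel (trace), one column per sample.
# Hypothetical helper for illustration; not part of Seismic Unix.
demux() {
    awk '
    {
        for (c = 1; c <= NF; c++) v[NR, c] = $c   # store sample NR of channel c
        if (NF > ncol) ncol = NF
    }
    END {
        for (c = 1; c <= ncol; c++) {             # one output line per channel
            line = v[1, c]
            for (r = 2; r <= NR; r++) line = line " " v[r, c]
            print line
        }
    }'
}

# Three time samples (rows) recorded on two channels (columns)
printf '1 10\n2 20\n3 30\n' | demux
# prints:
# 1 2 3
# 10 20 30
```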
The SEG-Y file format is one of several standards developed by the Society of Exploration Geophysicists (SEG) for storing seismic data. The original SEG-Y format (revision 0, 1975) was revised as SEG-Y revision 1 in 2002. For further reading, see SEG Technical Standards Committee (2002), “SEG Y rev 1 Data Exchange format”, which can be downloaded from http://www.seg.org/documents/10161/77915/seg_y_rev1.pdf.
Seismic datasets consist of navigation data and signal data. The signal data are accompanied by information about the data, which is stored in the headers of the files concerned or in separate acquisition and/or processing reports.
The header bytes encode the parameters of the recording, geometry, topography, values of transfers, dead traces, etc. Essentially, the SEG-Y data format is the same as the SU data format, except that a SEG-Y file begins with a 3200-byte EBCDIC Header and a 400-byte Binary Header before the first trace; in both formats, each trace starts with a 240-byte Trace Header.

Figure-1. Description of SEG-Y data structure.
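Because every trace has the same layout, the byte position of any trace follows directly from the header sizes. A minimal sketch, assuming fixed-length traces, 4-byte samples, and no Extended Textual File Headers (none of which is guaranteed for every file); `trace_offset` is a hypothetical helper name:

```shell
#!/bin/sh
# trace_offset FORMAT I NS
#   FORMAT: segy or su
#   I:      trace index, counted from 1
#   NS:     number of samples per trace (4 bytes per sample assumed)
# Prints the 0-based byte offset where the 240-byte header of trace I begins.
trace_offset() {
    fmt=$1; i=$2; ns=$3
    trace_bytes=$((240 + 4 * ns))          # trace header + trace data
    case $fmt in
        segy) start=3600 ;;                # 3200-byte EBCDIC + 400-byte binary header
        su)   start=0 ;;                   # SU files have no file headers
        *)    echo "unknown format: $fmt" >&2; return 1 ;;
    esac
    echo $((start + (i - 1) * trace_bytes))
}

trace_offset segy 1 3201   # 3600
trace_offset segy 2 3201   # 16644 (3600 + 13044)
trace_offset su   2 3201   # 13044
```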

The EBCDIC Header is the first header structure, occupying bytes 1 to 3200. It contains textual information about the survey description, acquisition parameters, and processing steps. SEG-Y rev 1 introduced an additional, more comprehensively structured textual header known as the Extended Textual File Header.
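segyread saves the textual header to a separate ASCII file (named header by default), but the header can also be inspected directly. A minimal sketch, assuming the header is EBCDIC-encoded (as in rev 0 and rev 1 files) and laid out as 40 card images of 80 characters; it relies on the POSIX dd conversion conv=ascii, and `show_ebcdic_header` is a hypothetical helper name:

```shell
#!/bin/sh
# show_ebcdic_header FILE
# Convert the first 3200 bytes (EBCDIC textual header) of a SEG-Y file to ASCII
# and print them as 80-character lines.
show_ebcdic_header() {
    dd if="$1" bs=3200 count=1 conv=ascii 2>/dev/null | fold -w 80
    echo    # the header has no trailing newline of its own
}

# Example (requires the input file):
# show_ebcdic_header data_raw.sgy
```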

The Binary Header is the second header structure, occupying bytes 3201 to 3600. It contains binary values defined as two-byte or four-byte two’s complement integers. The following byte locations are mandatory for all types of data:

* 3205 - 3208 = lino line number (numerical part of line name)

* 3213 - 3214 = ntrpr number of data traces per ensemble

* 3215 - 3216 = nart number of auxiliary traces per ensemble

* 3217 - 3218 = hdt sample interval in μs

* 3221 - 3222 = hns number of samples per data trace

* 3225 - 3226 = format data sample format code

* 3227 - 3228 = fold nominal CDP fold

* 3229 - 3230 = tsort trace sorting code

* 3501 - 3502 = rev SEGY format revision number

* 3503 - 3504 = flen fixed length trace flag

* 3505 - 3506 = netfh number of extended textual file headers
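SEG-Y headers are big-endian, so a field can be read byte by byte and combined most-significant byte first, independent of the host byte order. A minimal sketch using only od and shell arithmetic; `be16` and `binhdr` are hypothetical helper names, and the fields are read as unsigned values, which is adequate for hdt, hns, and format:

```shell
#!/bin/sh
# be16 FILE OFFSET: print the unsigned big-endian 16-bit value at 0-based byte OFFSET.
be16() {
    set -- $(od -An -t u1 -j "$2" -N 2 "$1")   # two unsigned bytes, e.g. "3 232"
    echo $(($1 * 256 + $2))                   # most significant byte first
}

# binhdr FILE: print selected Binary Header fields. Byte positions in the list
# above are 1-based while od offsets are 0-based, hence the "- 1".
binhdr() {
    echo "hdt=$(be16 "$1" $((3217 - 1)))"     # sample interval (microseconds)
    echo "hns=$(be16 "$1" $((3221 - 1)))"     # number of samples per data trace
    echo "format=$(be16 "$1" $((3225 - 1)))"  # data sample format code
}

# Example (requires the input file):
# binhdr data_raw.sgy
```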

The Trace Header is the third header structure, occupying bytes 1 to 240 of each trace (byte positions are counted from the start of the trace). It contains trace attributes defined as two-byte or four-byte two’s complement integers. The following byte locations are mandatory for all types of data:

* 001 - 004 = tracl trace sequence number within line

* 005 - 008 = tracr trace sequence number within reel

* 009 - 012 = fldr original field record number

* 013 - 016 = tracf trace number within the original field record

* 017 - 020 = ep energy source point number

* 021 - 024 = cdp ensemble number (CDP, CMP, CRP, etc)

* 029 - 030 = trid trace identification code

* 033 - 034 = nhs number of horizontally stacked traces yielding this trace

* 037 - 040 = offset distance from center source point to the center receiver group

* 069 - 070 = scalel scalar to be applied to all elevations and depths

* 071 - 072 = scalco scalar to be applied to all coordinates

* 073 - 076 = sx source coordinate X

* 077 - 080 = sy source coordinate Y

* 081 - 084 = gx group coordinate X

* 085 - 088 = gy group coordinate Y

* 089 - 090 = counit coordinate units

* 099 - 100 = sstat source static correction in milliseconds

* 101 - 102 = gstat group static correction in milliseconds

* 103 - 104 = tstat total static applied in milliseconds

* 115 - 116 = ns number of samples in this trace

* 117 - 118 = dt sample interval in μs

* 181 - 184 = cdpx X coordinate of ensemble (CDP) position

* 185 - 188 = cdpy Y coordinate of ensemble (CDP) position
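The coordinate keys (sx, sy, gx, gy, cdpx, cdpy) are stored as integers and must be combined with scalco: in SEG-Y rev 1, a positive scalar is used as a multiplier and a negative scalar as a divisor. A minimal sketch; treating scalco=0 as 1 is an assumption (common practice, but files differ), and `apply_scalco` is a hypothetical helper name:

```shell
#!/bin/sh
# apply_scalco VALUE SCALCO: print a coordinate after applying the SEG-Y scalar.
apply_scalco() {
    awk -v v="$1" -v s="$2" 'BEGIN {
        if (s == 0) s = 1              # assumption: treat an unset scalar as 1
        if (s > 0) x = v * s           # positive scalar: multiplier
        else       x = v / -s          # negative scalar: divisor
        printf "%.10g\n", x
    }'
}

apply_scalco 1234567 -100   # 12345.67
apply_scalco 5000 10        # 50000
```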

Trace data are the fourth and last structure in the seismic data; they follow each Trace Header. Raw data must be organized as recorded, i.e., with the trace sorting code in the Binary Header equal to 1 (as recorded, no sorting). Consider an example of signal data in SEG-Y format: 2095 records (shots), 96 channels, 3201 samples per trace, and 32 bits/sample. The total number of data traces is 2095 x 96 = 201120, and the total number of bits for all trace data is 201120 x 3201 samples x 32 bits/sample = 2.0601 x 10¹⁰ bits.
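The arithmetic above can be reproduced with shell integer arithmetic (64-bit in common shells such as bash and dash, which is large enough here). The sketch also adds the header bytes to estimate the SEG-Y file size, assuming 4-byte samples, fixed-length traces, and no extended textual headers:

```shell
#!/bin/sh
nrec=2095      # records (shots)
nch=96         # channels per record
ns=3201        # samples per trace
nbits=32       # bits per sample

ntr=$((nrec * nch))                               # total number of traces
bits=$((ntr * ns * nbits))                        # bits of sample data only
bytes=$((3600 + ntr * (240 + ns * nbits / 8)))    # plus file and trace headers

echo "traces = $ntr"      # 201120
echo "bits   = $bits"     # 20601123840
echo "bytes  = $bytes"    # 2623412880
```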

The SEG-Y format for seismic data is a digital (binary) format, which typically stores sample values as floating-point numbers in the IBM floating-point format (now rarely used elsewhere), a textual header in EBCDIC character encoding, and binary and trace headers as binary integers.

The digital data are produced by the seismic recording system, which converts the continuous analog signals generated when underwater seismic receivers (hydrophones) detect sound. The analog signal (continuous voltage vs. time) is amplified, filtered, converted from analog to digital (ADC), and resampled, in that order. For more details, see Gadallah and Fisher (2009), “Exploration Geophysics”.

Figure-2. Representation of a continuous analog signal digitized and resampled into a digital signal. Courtesy Gadallah and Fisher (2009), “Exploration Geophysics”.

A digital signal is a string of pulses representing the digits 0 and 1. Each such binary digit is called a bit, and each sample is stored as a word of several bits; higher precision means more bits per sample (e.g., 32 bits/sample). If we record 3 seconds of data per channel with a sampling interval of 1 ms (a sampling rate of 1000 Hz, i.e., 1000 samples/second) at 32 bits/sample, the total number of bits per channel is 3 s x 1000 samples/s x 32 bits/sample = 96000 bits.
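The per-channel count is simply (record length / sampling interval) x bits per sample. A minimal sketch with integer shell arithmetic, taking the record length in milliseconds and the sampling interval in microseconds (the same unit as the SEG-Y dt key); `bits_per_channel` is a hypothetical helper name:

```shell
#!/bin/sh
# bits_per_channel LENGTH_MS DT_US BITS_PER_SAMPLE
bits_per_channel() {
    nsamp=$(( $1 * 1000 / $2 ))   # number of samples = record length / sampling interval
    echo $(( nsamp * $3 ))        # total bits for one channel
}

bits_per_channel 3000 1000 32   # 3000 samples x 32 bits = 96000
```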

4-Byte (32-Bit) Two’s Complement Integer Fixed Point

Once we have obtained a stream of bits, we store it in an encoding format such as the 4-byte (32-bit) two’s complement integer. Its 32 bits (numbered 0 to 31) can represent 2³² (4294967296) different integers. As signed integers, the values range from -2^(32-1) to 2^(32-1) - 1, i.e., -2147483648 to 2147483647; as unsigned integers, they range from 0 to 2³² - 1, i.e., 0 to 4294967295. For more details, please refer to http://www.seg.org/documents/10161/77915/seg_y_rev1.pdf or http://en.wikipedia.org/wiki/Two's_complement.
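The same 32-bit pattern can therefore be read either as an unsigned value or as a signed two’s complement value. A minimal sketch of the reinterpretation with shell arithmetic, assuming the shell uses 64-bit integers (true for bash and dash on common platforms); `to_signed32` is a hypothetical helper name:

```shell
#!/bin/sh
# to_signed32 U: reinterpret an unsigned 32-bit value U (0..4294967295)
# as a signed two's complement value.
to_signed32() {
    u=$1
    if [ "$u" -ge 2147483648 ]; then    # sign bit (bit 31) is set
        echo $((u - 4294967296))        # subtract 2^32
    else
        echo "$u"
    fi
}

to_signed32 275          # 275
to_signed32 4294967021   # -275
to_signed32 2147483647   # 2147483647 (largest positive value)
to_signed32 2147483648   # -2147483648 (most negative value)
```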

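For example, consider a 32-bit sample whose binary pattern is 0000 0000 0000 0000 0000 0001 0001 0011.

Binary to two’s complement integer for n bits:

$$ b_{n-1} b_{n-2} \dots b_1 b_0 = -b_{n-1}\,2^{n-1} + \sum_{i=0}^{n-2} b_i\,2^{i} $$

$$
\begin{aligned}
0000\,0000\,0000\,0000\,0000\,0001\,0001\,0011_2
&= -0 \times 2^{31} + (0 \times 2^{30} + \dots + 0 \times 2^{9} + 1 \times 2^{8} + 0 \times 2^{7} + 0 \times 2^{6} + 0 \times 2^{5} + 1 \times 2^{4} \\
&\qquad + 0 \times 2^{3} + 0 \times 2^{2} + 1 \times 2^{1} + 1 \times 2^{0}) \\
&= 0 + \dots + 0 + 256 + 0 + 0 + 0 + 16 + 0 + 0 + 2 + 1 \\
&= 275
\end{aligned}
$$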
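The formula above can be evaluated bit by bit: accumulate the unsigned value with Horner's rule, then, if the sign bit is set, subtract 2^n (which turns the +b_{n-1} 2^{n-1} term into -b_{n-1} 2^{n-1}). A minimal POSIX sh sketch, assuming the input contains only 0, 1, and spaces and that n is at most 32; `bin2int` is a hypothetical helper name:

```shell
#!/bin/sh
# bin2int BITS: value of an n-bit two's complement binary string (spaces allowed).
bin2int() {
    bits=$(printf '%s' "$1" | tr -d ' ')
    n=${#bits}
    rest=$bits
    val=0
    while [ -n "$rest" ]; do
        b=${rest%"${rest#?}"}        # first remaining bit
        rest=${rest#?}               # drop it
        val=$((val * 2 + b))         # Horner's rule: accumulate the unsigned value
    done
    msb=${bits%"${bits#?}"}          # sign bit b_{n-1}
    if [ "$msb" = 1 ]; then
        val=$((val - (1 << n)))      # unsigned - 2^n gives the negative weight of b_{n-1}
    fi
    echo "$val"
}

bin2int "0000 0000 0000 0000 0000 0001 0001 0011"   # 275
bin2int "1111 1111 1111 1111 1111 1110 1110 1101"   # -275
```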

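Vice versa, to convert a two’s complement integer to binary, divide repeatedly by 2 and record the remainders:

* 275 ÷ 2 = 137, remainder 1 (least significant bit)

* 137 ÷ 2 = 68, remainder 1

* 68 ÷ 2 = 34, remainder 0

* 34 ÷ 2 = 17, remainder 0

* 17 ÷ 2 = 8, remainder 1

* 8 ÷ 2 = 4, remainder 0

* 4 ÷ 2 = 2, remainder 0

* 2 ÷ 2 = 1, remainder 0

* 1 ÷ 2 = 0, remainder 1 (most significant bit)

Reading the remainders from bottom to top gives 1 0001 0011; padding with leading zeros:

275 = 0000 0001 0001 0011 (16-bit two’s complement binary)

275 = 0000 0000 0000 0000 0000 0001 0001 0011 (32-bit two’s complement binary)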
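The repeated-division procedure translates directly into a loop: each remainder n % 2 becomes the next bit to the left. A minimal sketch for non-negative integers, assuming WIDTH is a multiple of 4 so the output can be grouped into nibbles; `dec2bin` is a hypothetical helper name:

```shell
#!/bin/sh
# dec2bin N WIDTH: binary representation of a non-negative integer N,
# zero-padded to WIDTH bits and grouped into 4-bit nibbles.
dec2bin() {
    n=$1; width=$2; bits=""
    while [ "$n" -gt 0 ]; do
        bits="$((n % 2))$bits"        # remainder is the next bit (read bottom to top)
        n=$((n / 2))
    done
    while [ "${#bits}" -lt "$width" ]; do
        bits="0$bits"                 # pad with leading zeros
    done
    echo "$bits" | sed 's/..../& /g; s/ $//'   # group into nibbles
}

dec2bin 275 16   # 0000 0001 0001 0011
dec2bin 275 32   # 0000 0000 0000 0000 0000 0001 0001 0011
```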

Another example: to encode the negative integer -275,

* Take the 32-bit two’s complement binary of 275: 0000 0000 0000 0000 0000 0001 0001 0011.

* Invert every bit: 1111 1111 1111 1111 1111 1110 1110 1100 (32-bit one’s complement binary).

* Add 1: -275 = 1111 1111 1111 1111 1111 1110 1110 1101 (32-bit two’s complement binary).
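The invert-and-add-1 rule can be checked with shell bitwise arithmetic. A minimal sketch, assuming 64-bit shell integers, so the result is masked back to 32 bits; the pattern is printed as an unsigned number and in hexadecimal, where FFFFFEED corresponds to the nibbles 1111 ... 1110 1110 1101. `neg32` is a hypothetical helper name:

```shell
#!/bin/sh
# neg32 N: 32-bit two's complement bit pattern of -N, computed as
# (one's complement of N) + 1 and kept to 32 bits with a mask.
neg32() {
    ones=$(( ~$1 & 0xFFFFFFFF ))          # invert every bit (one's complement)
    twos=$(( (ones + 1) & 0xFFFFFFFF ))   # add 1, discard any carry beyond bit 31
    printf '%u 0x%08X\n' "$twos" "$twos"
}

neg32 275   # 4294967021 0xFFFFFEED
```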

IBM Single Precision (32-Bit) Floating Point

Most modern seismic data are stored as either IBM single-precision (32-bit) or IEEE single-precision (32-bit) floating-point numbers. Both formats group the bits of each 32-bit word into three parts: the sign bit (S), the exponent (E), and the mantissa or fraction (F).

The first way to store the bit stream is IBM single precision (32-bit), in which the word is split into 3 groups: a 1-bit sign, a 7-bit exponent, and a 24-bit fraction. Details can also be read at http://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture. Consider again the 32-bit sample 0000 0000 0000 0000 0000 0001 0001 0011. Its IBM encoding splits as follows:

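S = 0, E = 0000000, F = 000000000000000100010011

Binary to IBM, where E is an excess-64 exponent of 16 and F is the 24-bit fraction read as 0.F (the k-th fraction bit has weight $2^{-k}$):

$$ x = (-1)^{S} \times 16^{E-64} \times F = 1 \times 16^{-64} \times \left(2^{-16} + 2^{-20} + 2^{-23} + 2^{-24}\right) \approx 1.41558 \times 10^{-82} $$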
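The IBM formula can be evaluated with awk, which uses double-precision arithmetic and represents every 32-bit unsigned value exactly. A minimal sketch that takes the word as a 32-character binary string (spaces allowed); `ibm2float` is a hypothetical helper name, and production code would decode raw bytes instead (SU's segyread already converts IBM samples for you):

```shell
#!/bin/sh
# ibm2float BITS: decode a 32-bit IBM single-precision word given as a binary string.
# value = (-1)^S * 16^(E-64) * F, with a 1-bit sign, 7-bit exponent (excess 64)
# and 24-bit fraction F in [0, 1) (no hidden bit).
ibm2float() {
    printf '%s\n' "$1" | tr -d ' ' | awk '{
        u = 0
        for (i = 1; i <= 32; i++) u = u * 2 + substr($0, i, 1)   # unsigned value of the word
        s = int(u / 2^31)                   # sign bit
        e = int(u / 2^24) % 128             # 7-bit exponent
        f = (u % 2^24) / 2^24               # 24-bit fraction
        printf "%.6g\n", (s ? -1 : 1) * 16^(e - 64) * f
    }'
}

ibm2float "0000 0000 0000 0000 0000 0001 0001 0011"   # 1.41558e-82
ibm2float "0100 0001 0001 0000 0000 0000 0000 0000"   # 1
```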

IEEE Single Precision (32-bit) Floating Point

The second way is IEEE single precision (32-bit), in which the word is split into 3 groups: a 1-bit sign, an 8-bit exponent, and a 23-bit fraction. Further information can be found at http://en.wikipedia.org/wiki/IEEE_floating_point. The IEEE encoding of the same sample splits as follows:

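S = 0, E = 00000000, F = 00000000000000100010011

Binary to IEEE for normalized numbers (1 ≤ E ≤ 254), where F is the 23-bit fraction read as 0.F:

$$ x = (-1)^{S} \times 2^{E-127} \times (1 + F) $$

Here, however, E = 0, which IEEE 754 reserves for subnormal numbers; these have no implicit leading 1 and use a fixed exponent of -126:

$$ x = (-1)^{S} \times 2^{-126} \times F = 1 \times 2^{-126} \times \left(2^{-15} + 2^{-19} + 2^{-22} + 2^{-23}\right) = 275 \times 2^{-149} \approx 3.85357 \times 10^{-43} $$

So far, we have obtained 3 different values from the same 32-bit binary sample 0000 0000 0000 0000 0000 0001 0001 0011: 275 (as a two’s complement integer), 1.41558 x 10^-82 (as IBM floating point), and 3.85357 x 10^-43 (as IEEE floating point). Figure-3 shows how the same trace data differ when displayed with the IBM and IEEE encodings.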
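The IEEE decoding can be sketched the same way, including the subnormal case (E = 0) that applies to this bit pattern. A minimal sketch taking the word as a 32-character binary string; `ieee2float` is a hypothetical helper name, and E = 255 (infinity and NaN) is deliberately not handled:

```shell
#!/bin/sh
# ieee2float BITS: decode a 32-bit IEEE 754 single-precision word given as a binary string.
# Normal (1 <= E <= 254): (-1)^S * 2^(E-127) * (1 + F)
# Subnormal (E = 0):      (-1)^S * 2^(-126) * F
# F is the 23-bit fraction divided by 2^23. E = 255 is not handled.
ieee2float() {
    printf '%s\n' "$1" | tr -d ' ' | awk '{
        u = 0
        for (i = 1; i <= 32; i++) u = u * 2 + substr($0, i, 1)   # unsigned value of the word
        s = int(u / 2^31)                   # sign bit
        e = int(u / 2^23) % 256             # 8-bit exponent, bias 127
        f = (u % 2^23) / 2^23               # 23-bit fraction
        sign = s ? -1 : 1
        if (e == 0) { v = sign * 2^(-126) * f }          # subnormal: no hidden 1
        else        { v = sign * 2^(e - 127) * (1 + f) } # normal: hidden leading 1
        printf "%.6g\n", v
    }'
}

ieee2float "0000 0000 0000 0000 0000 0001 0001 0011"   # 3.85357e-43
ieee2float "0011 1111 1000 0000 0000 0000 0000 0000"   # 1
```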

Figure-3. IBM-stored trace data displayed correctly (left), incorrectly as IEEE (center), and as IEEE with display gain (/2) (right). Courtesy Dennis Meisinger, "SEGY Floating Point Confusion", CSEG RECORDER September 2004.

Therefore, we must decode the stored trace data with the same encoding format that the recording system used: if the trace data were stored as 4-byte (32-bit) two’s complement integers, they must be loaded as 4-byte two’s complement integers. The encoding format is given by the keyword “format” (bytes 3225 - 3226) of the Binary Header.
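The codes stored in the “format” field are defined by the SEG-Y rev 1 standard. A minimal lookup sketch covering the rev 1 codes; `format_name` is a hypothetical helper name:

```shell
#!/bin/sh
# format_name CODE: describe a SEG-Y rev 1 data sample format code
# (Binary Header bytes 3225-3226).
format_name() {
    case $1 in
        1) echo "4-byte IBM floating point" ;;
        2) echo "4-byte two's complement integer" ;;
        3) echo "2-byte two's complement integer" ;;
        4) echo "4-byte fixed point with gain (obsolete)" ;;
        5) echo "4-byte IEEE floating point" ;;
        8) echo "1-byte two's complement integer" ;;
        *) echo "unknown or non-standard code: $1" ;;
    esac
}

format_name 1   # 4-byte IBM floating point
format_name 5   # 4-byte IEEE floating point
```

Since the field is big-endian and all valid codes are below 256, the code can be read from the low byte alone (1-based byte 3226, 0-based offset 3225), for example `format_name $(od -An -t u1 -j 3225 -N 1 data_raw.sgy)`; the unquoted substitution strips the leading spaces that od prints.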

INPUT - OUTPUT DATA:

* Survey description:

* Shotpoint (SP) interval: 12.5 m
* Group interval: 12.5 m
* Number of sources: 2089
* Number of channels: 96
* Vessel heading: 327.7°
* Near offset: 33 m
* Record length: 3 s
* Sampling interval: 1 ms
* Delay time: 33 ms
* Gun and streamer depth: 3 m
INDATA: raw data in SEG-Y format (data_raw.sgy or data_raw.segy).
OUTDATA: raw data in SU format (data_raw.su); segyread also writes the textual (EBCDIC) and binary file headers to separate files (by default named header and binary).

COMMANDS - PARAMETERS:

The SEG-Y input steps are:

* Put INDATA and the source code (01.segyin.sh) together in a project folder.

* Open the source code (01.segyin.sh) in a text editor.

* Set DATADIR (line 7) to the processing directory where the seismic datasets are located.

* Define INDATA and OUTDATA (lines 10 and 11).

* Run the command in a terminal: sh 01.segyin.sh

* Wait until the process finishes.
* When it has finished, make sure the print-out files 01.Raw_Data-QC_First_Shot.eps, 01.Raw_Data-QC_Last_Shot.eps, and 01.Raw_Data-QC_Near_Trace.eps exist in the working directory.

The mandatory SU commands and their required parameters in the source code 01.segyin.sh are as follows.

segyread tape=$INDATA endian=0 verbose=1 | segyclean > $OUTDATA

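This command reads a SEG-Y tape or file (INDATA) and converts it into SU format (OUTDATA); segyclean then zeroes out the unassigned portion of each trace header. The parameter endian=0 tells segyread that the processing machine is little-endian (e.g., a typical local PC or laptop), while endian=1 is for a big-endian machine (e.g., some servers). For further information, type segyread in a terminal to display its self-documentation.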
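Since endian= describes the byte order of the processing machine, it can be detected instead of hard-coded. A minimal sketch that relies on od printing 2-byte words in the host's native byte order; `host_endian` is a hypothetical helper name:

```shell
#!/bin/sh
# host_endian: print 0 on a little-endian host and 1 on a big-endian host
# (the values expected by the endian= parameter of segyread).
host_endian() {
    # Bytes 0x01 0x00 read as one native 2-byte integer:
    # 1 on a little-endian host, 256 on a big-endian host.
    v=$(printf '\001\000' | od -An -t u2 | tr -d ' ')
    if [ "$v" = 1 ]; then echo 0; else echo 1; fi
}

# Example (requires Seismic Unix and the input file):
# segyread tape=data_raw.sgy endian=$(host_endian) verbose=1 | segyclean > data_raw.su
```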

Below are additional SU commands and their required parameters in the source code 01.segyin.sh.

dt=`sugethw dt < $OUTDATA | sed 1q | sed 's/.*dt=//'`

This command gets the sample interval (dt) from the trace header: sed 1q keeps only the first line of the output, and sed 's/.*dt=//' strips everything up to the value.

ns=`sugethw < $OUTDATA key=ns | sed 1q | sed 's/ns=//' | awk '{print $1}'`

This command gets the number of samples (ns) from the trace header in the same way; awk '{print $1}' then removes the surrounding whitespace.

fldr_start=`sugethw < $OUTDATA output=geom key=fldr | sort -n | head -1 | awk '{print $1}'`

This command lists the field record numbers (keyword “fldr”) in geom (ASCII, not binary) output, sorts them numerically in ascending order (sort -n), and keeps the first value as the first field record number.

fldr_end=`sugethw < $OUTDATA output=geom key=fldr | sort -n | tail -1 | awk '{print $1}'`

This command does the same, but keeps the last value (tail -1) as the last field record number.
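Sorting the whole list twice works, but the first and last FFID can also be obtained in a single pass. A minimal sketch of that alternative with awk (not what 01.segyin.sh does); it assumes one value per line, as in the geom output of sugethw, which is simulated here with printf. `minmax` is a hypothetical helper name:

```shell
#!/bin/sh
# minmax: print the minimum and maximum of the first column of stdin in one pass.
minmax() {
    awk '{ v = $1 + 0 }                 # force numeric comparison
         NR == 1 || v < min { min = v }
         NR == 1 || v > max { max = v }
         END { if (NR) print min, max }'
}

# Simulated per-trace fldr values (two traces per record)
printf '98\n98\n99\n99\n2192\n2192\n' | minmax   # 98 2192

# With Seismic Unix:
# sugethw < data_raw.su output=geom key=fldr | minmax
```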

surange < $OUTDATA

This command prints the minimum and maximum of each trace header key in the output data (OUTDATA). It is essential for QC of the output data and for comparing input data with output data, so it will be used at every processing stage in Seismic Unix.

Note: the backquote (`) used for command substitution is a different character from the single quote (').

PRINT OUT AND QC:

Figure-4. FFID 98-101 showing raw data from the first field records, which include 3 “misfire” records (FFID 98-100) and 2 auxiliary traces per record.

Figure-5. FFID 2189-2192 showing raw data from the last field records, which include 3 “misfire” records (FFID 2190-2192) and 2 auxiliary traces per record.

Figure-6. Near Trace QC showing the near trace (trace number 1) from the original records (FFID 98-2192).

The following is the range of trace header values of the output SU data (OUTDATA).

205310 traces:

tracl 1 205310 (1 - 205310)

tracr 1 205310 (1 - 205310)

fldr 98 2192 (98 - 2192)

tracf -2 96 (-1 - 96)

ep 0 2192 (0 - 2192)

cdpt 1 98 (1 - 98)

trid 0 1 (0 - 1)

nhs 1

ns 3201

dt 1000

gain 3

afilf 400

afils 370

lcf 5

hcf 400

lcs 12

hcs 370

year 2013

Negative values of tracf in the Trace Header ranges above indicate auxiliary traces; their number per record can also be seen from the keyword “nart” in the Binary Header. The trace identification code (trid) values indicate unknown traces (trid=0) and seismic data (trid=1). The total number of traces in this raw data is 205310 (tracl=1 to tracl=205310), which equals the number of original records (2095) x the number of original traces per record (98).

Each original record has 98 traces (tracf=-2 to 96): 96 seismic traces and 2 auxiliary traces, which will be killed in the next processing stage. Based on the survey plan and observer log, the original records (fldr=98 to 2192) include 6 “misfire” records (fldr=98, 99, 100, 2190, 2191, and 2192). These “misfire” records contain dead traces and will not be used; they will also be removed in the next processing stage (Part-2: Trace Editing, *will be continued soon*).
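The counts quoted above can be cross-checked with simple arithmetic on the surange values. A minimal sketch, assuming the 98 traces per record are the 2 auxiliary traces (negative tracf) plus the 96 seismic channels:

```shell
#!/bin/sh
fldr_min=98;  fldr_max=2192    # surange: fldr
naux=2;       nseis=96         # auxiliary (tracf < 0) and seismic (tracf 1..96) traces
nmisfire=6                     # misfire records from the observer log

nrec=$((fldr_max - fldr_min + 1))            # original records
ntraces=$((nrec * (naux + nseis)))           # should match the tracl maximum
nshots=$((nrec - nmisfire))                  # usable shots

echo "records = $nrec"         # 2095
echo "traces  = $ntraces"      # 205310
echo "shots   = $nshots"       # 2089, the number of sources in the survey description
```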
SOURCE CODE:

#!/bin/sh
# filename: 01.segyin.sh
# READ AND CONVERT SEGY DATA INTO SU DATA FORMAT
# Created by geophenry124

# Set directory
DATADIR=/media/Data/data/marine

# Define input-output data
INDATA=$DATADIR/data_raw.sgy
OUTDATA=$DATADIR/data_raw.su

#=========================================================================
# PROCESS 1: READ SEG-Y DATA
echo "PROCESS 1: READ SEG-Y DATA"
echo ""
printf "Job started %s at %s\n" "`date +%Y-%m-%d`" "`date +%H:%M:%S`"
echo ""

# Read SEG-Y data
echo "Read SEGY data: $INDATA..."
segyread tape=$INDATA endian=0 verbose=1 | segyclean > $OUTDATA

# View range of header values
echo "Range of header values for $OUTDATA..."
surange < $OUTDATA

# Get header info
echo "Get sampling time..."
dt=`sugethw dt < $OUTDATA | sed 1q | sed 's/.*dt=//'`
echo "dt=$dt"

dtsec=`bc -l << END
       $dt / 1000000
END`
echo "Sampling time = $dtsec"

echo "Get total number of sampling time..."
ns=`sugethw < $OUTDATA key=ns | sed 1q | sed 's/ns=//' | awk '{print $1}'`
echo "Total number of sampling time = $ns"

echo "Check the first shot number..."
ep=`sugethw < $OUTDATA output=geom key=ep | sort -n | head -1 | awk '{print $1}'`
echo "First shot number = $ep"

echo "Check the near trace..."
# Auxiliary traces have negative tracf, so keep only positive trace numbers
tracf=`sugethw < $OUTDATA output=geom key=tracf | awk '$1 > 0' | sort -n | head -1 | awk '{print $1}'`
echo "Near trace = $tracf"

echo "Check the first field record..."
fldr_start=`sugethw < $OUTDATA output=geom key=fldr | sort -n | head -1 | awk '{print $1}'`
echo "First field file id (FFID) = $fldr_start"

echo "Check the last field record..."
fldr_end=`sugethw < $OUTDATA output=geom key=fldr | sort -n | tail -1 | awk '{print $1}'`
echo "Last field file id (FFID) = $fldr_end"

# Set field record number (min-max) to view QC of the first & last FFID
min1=$fldr_start
max1=`bc -l << END
       $min1 + 3
END`

min2=`bc -l << END
      $fldr_end - 3
END`
max2=$fldr_end

# Display QC field record
suwind < $OUTDATA key=fldr min=$min1 max=$max1 | suxwigb perc=95 label1="Time (s)" \
label2="Trace No." windowtitle="Raw Data" title="FFID $min1-$max1" &
suwind < $OUTDATA key=fldr min=$min2 max=$max2 | suxwigb perc=95 label1="Time (s)" \
label2="Trace No." windowtitle="Raw Data" title="FFID $min2-$max2" &

# Print out QC field record (run in the foreground so that the EPS files
# are completely written before the script finishes)
suwind < $OUTDATA key=fldr min=$min1 max=$max1 | supswigb perc=95 label1="Time (s)" \
label2="Trace No." title="Raw Data-FFID $min1-$max1" wbox=20 hbox=15 \
verbose=0 > 01.Raw_Data-QC_First_Shot.eps
suwind < $OUTDATA key=fldr min=$min2 max=$max2 | supswigb perc=95 label1="Time (s)" \
label2="Trace No." title="Raw Data-FFID $min2-$max2" wbox=20 hbox=15 \
verbose=0 > 01.Raw_Data-QC_Last_Shot.eps

# Display QC near traces
suwind < $OUTDATA key=tracf min=1 max=1 | suximage f2=$fldr_start perc=95 \
windowtitle="QC Near Trace" title="Raw Data-QC Near Trace" \
label1="Time (s)" label2="FFID" &

# Print out QC near traces
suwind < $OUTDATA key=tracf min=1 max=1 | supsimage f2=$fldr_start perc=95 \
label1="Time (s)" label2="FFID" title="Raw Data-QC Near Trace" \
wbox=10 hbox=15 > 01.Raw_Data-QC_Near_Trace.eps
#========================================================================
echo ""
printf "Job finished %s at %s\n" "`date +%Y-%m-%d`" "`date +%H:%M:%S`"
echo "Finish..."
echo ""
echo "NEXT PROCESS: TRACE EDIT"
exit