
Example: reading a sequence from a SWISSPROT-format file

Suppose we have a file in SWISSPROT flat format named OPSD_BOVIN, an excerpt of which is shown below:

ID   OPSD_BOVIN     STANDARD;      PRT;   348 AA.
AC   P02699;
TR   A.
TS   TMHMM83; TMHMM160; COPRETHI; HTP; HMMTOP; NON_RED.
DE   RHODOPSIN.
FT   DOMAIN        1     36       EXTRACELLULAR.
FT   TRANSMEM     37     63       1. (MEDLINE; 20385054)
FT   DOMAIN       64     73       CYTOPLASMIC.
FT   TRANSMEM     74     96       2. (MEDLINE; 20385054)
FT   DOMAIN       97    110       EXTRACELLULAR.
FT   TRANSMEM    111    133       3. (MEDLINE; 20385054)
FT   DOMAIN      134    152       CYTOPLASMIC.
FT   TRANSMEM    153    173       4. (MEDLINE; 20385054)
FT   DOMAIN      174    202       EXTRACELLULAR.
FT   TRANSMEM    203    224       5. (MEDLINE; 20385054)
FT   DOMAIN      225    252       CYTOPLASMIC.
FT   TRANSMEM    253    274       6. (MEDLINE; 20385054)
FT   DOMAIN      275    286       EXTRACELLULAR.
FT   TRANSMEM    287    308       7. (MEDLINE; 20385054)
FT   DOMAIN      309    348       CYTOPLASMIC.
SQ   SEQUENCE   348 AA;  39007 MW;  33FDA196803E81F3 CRC64;
     MNGTEGPNFY VPFSNKTGVV RSPFEAPQYY LAEPWQFSML AAYMFLLIML GFPINFLTLY
     VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG
     GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLVGWSRYIP
     EGMQCSCGID YYTPHEETNN ESFVIYMFVV HFIIPLIVIF FCYGQLVFTV KEAAAQQQES
     ATTQKAEKEV TRMVIIMVIA FLICWLPYAG VAFYIFTHQG SDFGPIFMTI PAFFAKTSAV
     YNPVIYIMMN KQFRNCMVTT LCCGKNPLGD DEASTTVSKT ETSQVAPA
//
To extract only the sequence from this file, we could use code such as the following:
def read_sp_seq(filename):
    ''' reads a swissprot flat file and 
        returns the protein sequence '''
    try:
        f = open(filename, 'r')
    except IOError:
        print('Error opening file %s' % filename)
        return None  # nothing to do
    wholefile = f.readlines()  # read the whole file
    f.close()  # close file
    numlines = len(wholefile)
    # skip lines until we find the 'SQ' keyword;
    # testing i<numlines first avoids indexing past the end of the list
    i = 0
    while (i < numlines) and (wholefile[i][0:2] != 'SQ'):
        i = i + 1
    if i == numlines:
        print('Error in file %s: not a swissprot file' % filename)
        return None  # nothing to do
    seq = ""  # our sequence
    i = i + 1  # the sequence starts on the line after 'SQ'
    # collect sequence lines until the '//' terminator (or end of file)
    while (i < numlines) and (wholefile[i][0:2] != '//'):
        listline = wholefile[i].split()  # split on the blanks between blocks
        seq = seq + "".join(listline)
        i = i + 1
    return seq
The function can then be used as follows (the returned string is shown wrapped over several lines here):
>>> seq=read_sp_seq('OPSD_BOVIN')
>>> seq
'MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLY
VTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGG
EIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEG
MQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATT
QKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPV
IYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA'
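
The same extraction can also be written without loading the whole file into a list first. The following is only a sketch of one possible alternative, not part of the original notes: the names extract_sequence and read_sp_seq_streaming are hypothetical, and it assumes, as in the excerpt above, that the sequence lies between the line starting with 'SQ' and the line starting with '//'.

```python
def extract_sequence(lines):
    """Return the sequence found between the 'SQ' line and the '//' line.

    `lines` is any iterable of text lines (an open file or a list).
    Returns None if no line starting with 'SQ' is found.
    (Hypothetical helper, written for illustration.)
    """
    in_sequence = False  # becomes True once the 'SQ' line has been seen
    chunks = []          # cleaned-up pieces of the sequence
    for line in lines:
        if in_sequence:
            if line.startswith('//'):
                break  # end-of-entry marker: stop reading
            # remove the blanks that separate the blocks of 10 residues
            chunks.append(''.join(line.split()))
        elif line.startswith('SQ'):
            in_sequence = True
    if not in_sequence:
        return None
    return ''.join(chunks)


def read_sp_seq_streaming(filename):
    """Open `filename` and extract its sequence line by line.

    (Hypothetical wrapper; the 'with' statement closes the file for us.)
    """
    with open(filename, 'r') as f:
        return extract_sequence(f)
```

Because extract_sequence accepts any iterable of lines, it can be tried on a short list of strings without creating a file, while read_sp_seq_streaming('OPSD_BOVIN') would be called in the same way as read_sp_seq above.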



2004-11-02